Recent leading zero-shot video object segmentation (ZVOS) works focus on integrating appearance and motion information by elaborately designing feature fusion modules and applying them identically across multiple feature stages.
Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation
Efficient RGB-D semantic segmentation, which plays a vital role in analyzing and recognizing environmental information, has received considerable attention in mobile robotics.
GitHub Link
The GitHub link is https://github.com/mvme-hbut/sgacnet

Introduce
Title: GitHub Repository Update for SGACNet (Spatial-information Guided Adaptive Context-aware Network).

Summary: This GitHub repository branch is current and synced with the main branch of the CyunXiong/SGACNet repository, which covers the development of SGACNet, a network designed for efficient RGB-D semantic segmentation that incorporates spatial-information guidance and adaptive context awareness.

Content
We provide the weights for our selected ESANet-R34-NBt1D (with ResNet34 NBt1D backbone) trained on NYUv2, SUN RGB-D, and Cityscapes. Download and extract the models to ./trained_models, then navigate to the cloned directory. Note that we use Python 3.7+, Torch 1.3.1, and torchvision 0.4.2. ImageNet pretrained weights for our selected backbones on the above datasets can also be downloaded and should be stored in <dir>/trained_models/imagenet. Note that some parameters differ for Cityscapes. Evaluation on SUN RGB-D is similar to NYUv2.

Citation: Yang Zhang, Chenyun Xiong, Junjie Liu, Xuhui Ye, and Guodong Sun. Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation[J]. IEEE Sensors Journal, 2023.
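The setup above amounts to preparing a matching PyTorch environment and placing the released checkpoints under ./trained_models. The snippet below is a minimal sketch of how such a checkpoint might be verified and inspected using only standard PyTorch calls; the checkpoint filename is a placeholder, and the actual file names and the SGACNet model class must be taken from the repository itself.

```python
# Minimal sketch: check the environment and inspect a downloaded checkpoint.
# Assumption: "nyuv2_weights.pth" is a placeholder name; the real checkpoint
# files and the model class come from the SGACNet repository.
import os
import torch
import torchvision

print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

ckpt_path = os.path.join("trained_models", "nyuv2_weights.pth")  # placeholder
if os.path.isfile(ckpt_path):
    # Load on CPU first so this also works on machines without a GPU.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # Checkpoints are often dicts that wrap the actual state_dict.
    state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint
    print(f"Loaded {len(state_dict)} parameter tensors, for example:")
    for name in list(state_dict)[:5]:
        print("  ", name, tuple(state_dict[name].shape))
    # For inference, instantiate the network defined in the repository, call
    # model.load_state_dict(state_dict), and switch the model to eval() mode.
else:
    print(f"Checkpoint not found at {ckpt_path}; download and extract it to ./trained_models first.")
```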
Alternatives & Similar Tools

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.
Google Gemini, a multimodal AI by DeepMind, processes text, audio, images, and more. Gemini achieves strong results on AI benchmarks, is optimized for a range of devices, and has been tested for safety and bias in line with responsible AI practices.
Video ReTalking edits real-world talking-head videos according to input audio, producing high-quality, lip-synced output.
LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.