AI Tool Profile

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation

Recent leading zero-shot video object segmentation (ZVOS) methods integrate appearance and motion information by elaborately designing feature fusion modules and applying them identically at multiple feature stages.

Website
github.com
Pricing model
Free
Starting price
Free

GitHub Link

The GitHub link is https://github.com/dlut-yyc/isomer

Introduction

The project "Isomer" introduces a transformer-based approach to zero-shot video object segmentation (ZVOS) that uses transformers to fuse appearance and motion information. It proposes two transformer variants: the Context-Sharing Transformer (CST) for low-level feature fusion and the Semantic Gathering-Scattering Transformer (SGST) for high-level feature fusion. Instead of applying one fusion module identically at every feature stage, Isomer uses a different module for each feature level, which the authors report yields improved ZVOS performance with real-time inference. The code and model are available on GitHub under the Apache 2.0 license, along with installation and usage instructions.
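
To make the difference between identical and level-dependent ("isomerous") fusion concrete, here is a toy, pure-Python sketch. It is not the authors' CST or SGST and has no learned weights. `shared_context_fusion` (one averaged context vector shared by all tokens) and `attention_fusion` (plain cross-attention) are hypothetical stand-ins, chosen only to show a cheap module at low levels and a more expressive one at high levels. The split after two low-level stages and the list-of-floats token format are also assumptions.

```python
import math
from typing import Callable, List

# Toy representation (hypothetical): a token is a plain list of floats.
Token = List[float]
Fusion = Callable[[List[Token], List[Token]], List[Token]]


def attention_fusion(appearance: List[Token], motion: List[Token]) -> List[Token]:
    """Plain cross-attention: each appearance token attends over all motion tokens.

    Cost grows with len(appearance) * len(motion). Motion tokens serve as both
    keys and values; this sketch has no learned projections.
    """
    scale = 1.0 / math.sqrt(len(appearance[0]))
    fused: List[Token] = []
    for query in appearance:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in motion]
        peak = max(scores)  # subtract the max before exp() for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        attended = [
            sum(w * value[d] for w, value in zip(weights, motion)) / total
            for d in range(len(query))
        ]
        fused.append([q + a for q, a in zip(query, attended)])  # residual add
    return fused


def shared_context_fusion(appearance: List[Token], motion: List[Token]) -> List[Token]:
    """Cheap stand-in: average motion into one context vector shared by every token.

    Cost grows with len(appearance) + len(motion).
    """
    dim = len(appearance[0])
    context = [sum(tok[d] for tok in motion) / len(motion) for d in range(dim)]
    return [[a + c for a, c in zip(tok, context)] for tok in appearance]


def identical(stage: int) -> Fusion:
    """Same fusion module at every stage (the design the tagline attributes to prior work)."""
    return attention_fusion


def isomerous(stage: int, num_low_level: int = 2) -> Fusion:
    """Level-dependent choice: cheap fusion for early stages, attention for later ones."""
    return shared_context_fusion if stage < num_low_level else attention_fusion


def fuse_all_stages(
    appearance_stages: List[List[Token]],
    motion_stages: List[List[Token]],
    choose: Callable[[int], Fusion],
) -> List[List[Token]]:
    """Fuse appearance and motion features stage by stage with the chosen strategy."""
    return [
        choose(i)(app, mot)
        for i, (app, mot) in enumerate(zip(appearance_stages, motion_stages))
    ]


if __name__ == "__main__":
    # Four stages of toy 2-D tokens; in a real model these would be learned feature maps.
    app = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
    mot = [[[0.5, 2.0], [1.5, 0.0]] for _ in range(4)]
    for name, strategy in (("identical", identical), ("isomerous", isomerous)):
        print(name, fuse_all_stages(app, mot, strategy))
```

The design point is the `choose` callback. The stage loop stays the same, and only the per-stage module changes. That is the distinction between an isomerous design and applying one module at every stage.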

Content

[ICCV2023] Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation

The code requires Python >= 3.7, PyTorch >= 1.7, and torchvision >= 0.8. Download the pretrained models, datasets, final checkpoints, and results from the link in the repository README (password: iiau), and organize the files as shown there. The model is licensed under the Apache 2.0 license.
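
The stated minimums can be checked before installing with a small script like the one below. This is a hedged sketch: the helper names (`parse_version`, `meets_minimum`, `installed_version`, `check_environment`) are hypothetical and not part of the Isomer repository. It reads each module's `__version__` attribute instead of using a packaging library, so version parsing is deliberately simple, and pre-release suffixes such as `a0` are ignored.

```python
import importlib
import re
import sys
from typing import List, Optional, Tuple

# Minimum versions taken from the requirements stated above.
MIN_PYTHON = (3, 7)
MIN_PACKAGES = {"torch": (1, 7), "torchvision": (0, 8)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn e.g. '1.13.1+cu117' into (1, 13, 1), keeping leading digits per component."""
    release = text.split("+", 1)[0]  # drop local build tags such as '+cu117'
    parts: List[int] = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # pre-release suffix like '0a0' or '0rc1': stop here
            break
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Tuple comparison: (1, 13, 1) >= (1, 7) is True, (1, 6, 0) >= (1, 7) is False."""
    return parse_version(version) >= minimum


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):  # OSError covers broken native-library installs
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)


def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means every minimum is met."""
    problems: List[str] = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d found, need >= 3.7" % sys.version_info[:2])
    for name, minimum in MIN_PACKAGES.items():
        found = installed_version(name)
        wanted = ".".join(str(n) for n in minimum)
        if found is None:
            problems.append("%s is not installed (need >= %s)" % (name, wanted))
        elif not meets_minimum(found, minimum):
            problems.append("%s %s found, need >= %s" % (name, found, wanted))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment meets the stated minimums.")
```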

Alternatives & Similar Tools

LongLLaMA: Handle very long text contexts, up to 256,000 tokens

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.

LAMA: Turn human motion data into realistic, complex 3D model actions

LAMA combines a reinforcement learning framework with a motion matching algorithm. Reinforcement learning helps the model make appropriate decisions in various scenarios, while motion matching ensures that synthesized actions match real human actions. LAMA also uses a manifold-learning-based motion editing framework to cover the various possible changes in interactions and operations.
