AI Tool Profile

3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking

Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning.

Paper and LLMs, Autonomous Vehicles

Website

github.com

Pricing model

Free

Price start

Free

GitHub Link

The GitHub link is https://github.com/dsx0511/3dmotformer

Introduction

The repository "3DMOTFormer" is the official implementation of the ICCV 2023 paper "3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking." The paper addresses the challenge of accurate 3D object tracking for autonomous vehicles. It introduces 3DMOTFormer, a learned tracking framework built on the transformer architecture. The framework uses an Edge-Augmented Graph Transformer to reason frame by frame on track-detection graphs and performs data association through edge classification. To reduce the gap between training and inference, the authors propose an online training strategy. Using CenterPoint detections, the approach achieves state-of-the-art results on the nuScenes validation and test splits. The repository provides installation instructions and data preparation steps for reproducing these results.
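The sketch below illustrates, under simplifying assumptions, how frame-by-frame data association via edge classification on a track-detection bipartite graph can work. It is not the 3DMOTFormer implementation: the learned Edge-Augmented Graph Transformer is replaced by a hypothetical distance-based `edge_score` function, and the greedy matching, the `threshold` of 0.5, the `scale` parameter, the `Track` class and the rule that unmatched tracks are dropped immediately are all illustrative choices. The `track_sequence` loop feeds each frame's own association result into the next frame's graph, which is the kind of autoregressive, recurrent pass that the online training strategy is meant to expose the model to.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    center: tuple  # (x, y, z) of the detection this track was last matched to


def edge_score(track_center, det_center, scale=2.0):
    # Hypothetical stand-in for the learned edge classifier: maps the Euclidean
    # centre distance to a score in (0, 1]. 3DMOTFormer instead predicts edge
    # scores with an Edge-Augmented Graph Transformer.
    return math.exp(-math.dist(track_center, det_center) / scale)


def associate(tracks, detections, threshold=0.5):
    # Score every edge of the complete track-detection bipartite graph.
    edges = [
        (edge_score(t.center, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    ]
    # Greedily keep the highest-scoring edges above the threshold so that each
    # track and each detection takes part in at most one match.
    edges.sort(reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in edges:
        if score < threshold:
            break
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


def track_sequence(frames, threshold=0.5):
    # Online, frame-by-frame loop: the tracks produced from the previous frame's
    # own associations become one side of the next frame's bipartite graph.
    tracks, next_id, history = [], 0, []
    for detections in frames:
        det_to_track = {di: ti for ti, di in associate(tracks, detections, threshold)}
        new_tracks = []
        for di, det in enumerate(detections):
            if di in det_to_track:
                track_id = tracks[det_to_track[di]].track_id  # continue a track
            else:
                track_id = next_id  # unmatched detection starts a new track
                next_id += 1
            new_tracks.append(Track(track_id, det))
        # Simplification: unmatched tracks are dropped at once (no track aging).
        tracks = new_tracks
        history.append([t.track_id for t in tracks])
    return history


# Hypothetical example: two objects; in frame 3 the second detection is far from
# every existing track, so it is not associated and gets a new ID.
example_frames = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (10.4, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
]
print(track_sequence(example_frames))  # [[0, 1], [0, 1], [0, 2]]
```

Greedy matching is used here only because it is short and easy to follow; a learned tracker may resolve conflicting edges differently, and a real system would also keep unmatched tracks alive for a few frames.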

Content

Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning. Building on the substantial progress in object detection in recent years, the tracking-by-detection paradigm has become a popular choice due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking (MOT) methods typically rely on non-learned, model-based algorithms such as the Kalman filter, which require many manually tuned parameters. Learning-based approaches, on the other hand, struggle to adapt training to the online setting, which leads to an inevitable distribution mismatch between training and inference and to suboptimal performance. In this work, we propose 3DMOTFormer, a learned geometry-based 3D MOT framework built upon the transformer architecture. We use an Edge-Augmented Graph Transformer to reason on the track-detection bipartite graph frame by frame and conduct data association via edge classification. To reduce the distribution mismatch between training and inference, we propose a novel online training strategy with an autoregressive and recurrent forward pass as well as sequential batch optimization. Using CenterPoint detections, our approach achieves a state-of-the-art AMOTA of 71.2% on the nuScenes validation split and 68.2% on the test split. In addition, a trained 3DMOTFormer model generalizes well across different object detectors.
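AMOTA, the headline metric above, averages a recall-normalized MOTA (often called MOTAR) over a set of recall thresholds. The sketch below follows that definition as used by the nuScenes tracking benchmark, to the best of our understanding. The per-threshold counts are hypothetical, and the official nuScenes devkit additionally chooses the recall thresholds, matches predictions to ground truth and averages over object classes, none of which is reproduced here, so this does not recompute the 71.2% or 68.2% figures.

```python
def motar(recall, ids, fp, fn, num_gt):
    # Recall-normalised MOTA at one recall threshold r:
    # max(0, 1 - (IDS_r + FP_r + FN_r - (1 - r) * P) / (r * P)),
    # where P is the number of ground-truth boxes. The (1 - r) * P term removes
    # the false negatives that are unavoidable when operating at recall r.
    return max(0.0, 1.0 - (ids + fp + fn - (1.0 - recall) * num_gt) / (recall * num_gt))


def amota(points, num_gt):
    # Average MOTAR over the sampled recall thresholds.
    # points: list of (recall, ids, fp, fn) tuples, one per threshold.
    return sum(motar(r, ids, fp, fn, num_gt) for r, ids, fp, fn in points) / len(points)


# Hypothetical counts for a single class with 100 ground-truth boxes.
points = [(0.5, 2, 5, 50), (1.0, 4, 16, 0)]
print(round(amota(points, num_gt=100), 4))  # 0.83
```

Clamping each term at zero keeps a single very poor threshold from pulling the average below zero, which is why the metric is reported on a 0 to 100% scale.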

Alternatives & Similar Tools

Replicate-AI model GFPGAN can help restore old photos Paid

Replicate – Run open-source machine learning models with a cloud API

Google Gemini: the largest and most capable AI model Free

Google Gemini, a multimodal AI model from Google DeepMind, processes text, audio, images, and more. Gemini reports strong results on AI benchmarks, is optimized for a range of devices, and has been tested for safety and bias in line with responsible AI practices.

LongLLaMA-handle very long text contexts, up to 256,000 tokens Open Source

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.

LAMA: Human motion data to realistic complex 3D model actions Open Source

LAMA utilizes a reinforcement learning framework combined with a motion matching algorithm. Reinforcement learning helps the model make appropriate decisions in various scenarios, while motion matching algorithms ensure that synthesized actions match real human actions. In addition, LAMA also utilizes the motion editing framework of manifold learning to cover various possible changes in interactions and operations.

Video ReTalking-focuses on audio-based lip synchronization for talking head video editing Open Source

Video ReTalking edits the faces in a real-world talking head video according to input audio, producing a high-quality, lip-synced output video.

UniSim-Chat Control Video and Virtual simulation Open Source

Policies trained in the simulator can then be transferred to the real world to solve complex problems.
