DFM-X: Augmentation by Leveraging Prior Knowledge of Shortcut Learning

We propose a data augmentation strategy, named DFM-X, that leverages knowledge about frequency shortcuts, encoded in Dominant Frequencies Maps computed for image classification models.

GitHub Link

The GitHub link is https://github.com/nis-research/dfmx-augmentation

Introduction

DFM-X is a data augmentation strategy designed to improve the generalization and robustness of neural network models. It uses Dominant Frequencies Maps (DFMs) to select training images for augmentation: by including frequencies from other classes' DFMs, it encourages models to learn deeper, task-related semantics and reduces shortcut learning. The method enhances robustness against common corruptions and adversarial attacks, and can be combined with other augmentation techniques. The code repository provides installation instructions and scripts for computing DFMs, evaluating model performance, and training with DFM-X. The work was presented at the International Conference on Computer Vision Workshops (ICCVW) by Shunxin Wang, Christoph Brune, Raymond Veldhuis, and Nicola Strisciuglio.

Content

Neural networks are prone to learn easy solutions from superficial statistics in the data, a phenomenon known as shortcut learning, which impairs the generalization and robustness of models. We propose a data augmentation strategy, named DFM-X, that leverages knowledge about frequency shortcuts, encoded in Dominant Frequencies Maps computed for image classification models. We randomly select X% of the training images of certain classes for augmentation, and process them by retaining only the frequencies included in the DFMs of other classes. This strategy compels the models to leverage a broader range of frequencies for classification, rather than relying on specific frequency sets. Thus, the models learn deeper and more task-related semantics compared to their counterparts trained with standard setups. Unlike other commonly used augmentation techniques, which focus on increasing the visual variation of the training data, our method exploits the original data efficiently by distilling prior knowledge about the detrimental learning behavior of models from the data itself. Our experimental results demonstrate that DFM-X improves robustness against common corruptions and adversarial attacks. It can be seamlessly integrated with other augmentation techniques to further enhance the robustness of models.
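The core operation described above, retaining only the frequencies marked in another class's DFM, can be sketched as a frequency-domain filter. The following is a minimal, hypothetical illustration (not the authors' implementation from the repository): it assumes the DFM is available as a binary mask of the same spatial size as the image, multiplies it against the image's shifted Fourier spectrum, and transforms back.

```python
import numpy as np

def dfmx_filter(image, dfm_mask):
    """Keep only the frequencies marked in a (binary) DFM mask.

    Hypothetical sketch of the DFM-X augmentation step: `image` is an
    H x W (or H x W x C) float array, and `dfm_mask` is an H x W binary
    map assumed to encode the dominant frequencies of another class.
    """
    # Per-channel 2D FFT, shifted so low frequencies sit at the center,
    # matching the usual centered layout of frequency maps.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    # Broadcast the mask over color channels if present.
    mask = dfm_mask[..., None] if image.ndim == 3 else dfm_mask
    filtered = spectrum * mask
    # Back to the spatial domain; discard the residual imaginary part.
    out = np.fft.ifft2(np.fft.ifftshift(filtered, axes=(0, 1)), axes=(0, 1))
    return np.real(out)
```

During training, an image selected for augmentation would be replaced by `dfmx_filter(image, dfm_of_other_class)`, so the class-discriminative content must be carried by frequencies beyond the model's usual shortcut set.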

Alternatives & Similar Tools

LongLLaMA: handle very long text contexts, up to 256,000 tokens

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.