Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation


In the first stage, a text-to-speech model converts the input text into audio. In the second stage, an audio-driven talking-head generation method is employed to produce compelling videos from the audio generated in the first stage.

GitHub Link

The GitHub link is https://github.com/zhichaowang970201/text-to-video

Introduction

This GitHub repository, titled "Text-to-Video," presents a two-stage framework for generating talking-head videos without requiring training data for the target identity. It includes various components: Text-to-Speech models (Tacotron, VITS, YourTTS, Tortoise), Audio-driven Talking Head Generation methods (Audio2Head, StyleHEAT, SadTalker), and VideoRetalking. The repository provides links to the code and assets for these models, facilitating research and development in zero-shot identity-agnostic talking-head generation.
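The two-stage flow can be sketched in Python. This is a minimal illustrative sketch, not the repository's actual API: the function names `synthesize_speech`, `generate_talking_head`, and `text_to_video` are placeholders standing in for whichever TTS and talking-head models are plugged into each stage.

```python
def synthesize_speech(text: str) -> bytes:
    """Stage 1 (stub): a TTS model such as Tacotron, VITS, YourTTS, or
    Tortoise would convert the input text into an audio waveform here.
    This placeholder just returns the text bytes as stand-in 'audio'."""
    return text.encode("utf-8")


def generate_talking_head(audio: bytes, face_image: str) -> str:
    """Stage 2 (stub): an audio-driven method such as Audio2Head,
    StyleHEAT, or SadTalker would render a talking-head video of the
    given face, lip-synced to the audio. This placeholder returns a
    hypothetical output path."""
    return f"{face_image}_talking_head.mp4"


def text_to_video(text: str, face_image: str) -> str:
    """Chain the two stages: text -> audio -> talking-head video."""
    audio = synthesize_speech(text)
    return generate_talking_head(audio, face_image)


if __name__ == "__main__":
    print(text_to_video("Hello, world!", "speaker"))
```

The key design point the sketch captures is that the stages are decoupled: any TTS model can feed any audio-driven talking-head generator, which is what makes the framework identity-agnostic.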

Content

KDD workshop: Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

Alternatives & Similar Tools

LongLLaMA: handles very long text contexts, up to 256,000 tokens

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.