hao-ai-lab/FastVideo

| Documentation | Quick Start | Weekly Dev Meeting | 🟣💬 Slack | 🟣💬 WeChat |

FastVideo is a unified post-training and real-time inference framework for accelerated video generation.

NEWS

Key Features

FastVideo has the following features:

  • End-to-end post-training support for bidirectional and autoregressive models:
    • Full finetuning and LoRA finetuning for state-of-the-art open video DiTs
    • Data preprocessing pipeline for video, image, and text data
    • Stepwise distillation with Distribution Matching Distillation (DMD2)
    • Sparse attention with Video Sparse Attention
    • Sparse distillation to achieve >50x denoising speedup
    • Scalable training with FSDP2, sequence parallelism, and selective activation checkpointing
    • Causal distillation through Self-Forcing
    • See this page for the full list of supported models and recipes.
  • State-of-the-art performance optimizations for inference:
    • Sequence parallelism for distributed inference
    • Multiple state-of-the-art attention backends
    • User-friendly CLI and Python API
    • See this page for the full list of supported optimizations.
  • Diverse hardware and OS support:
    • Supports H100, A100, and 4090 GPUs
    • Supports Linux, Windows, and macOS
    • See this page for the full list of supported models, hardware assumptions, and optimization compatibility.
  • Realtime video generation & editing:
    • Dreamverse: stream and "vibe direct" video in realtime (live demo), deployable on a local GPU, a self-hosted B200 server, Docker, or serverless Modal

Getting Started

We recommend using uv to create a clean environment. If you previously used Conda, switching to uv generally gives faster and more stable installs.

# Create and activate a new uv environment
uv venv --python 3.12 --seed
source .venv/bin/activate

# Install FastVideo on NVIDIA CUDA 12
UV_TORCH_BACKEND=cu126 uv pip install fastvideo

Use UV_TORCH_BACKEND=cu130 on CUDA 13. Apple silicon users should follow the MPS installation guide.

Please see our docs for more detailed installation instructions.
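
The backend choice above (cu126 for CUDA 12, cu130 for CUDA 13) can be scripted. The sketch below is not part of FastVideo; the helper names are hypothetical. It assumes that the `CUDA Version` field printed by `nvidia-smi`, which reports the highest CUDA version the driver supports and not the installed toolkit, is a good enough signal for picking `UV_TORCH_BACKEND`.

```python
import re
import subprocess
from typing import Optional

# Mapping taken from this README: CUDA 12 -> cu126, CUDA 13 -> cu130.
BACKENDS = {12: "cu126", 13: "cu130"}


def torch_backend_for_cuda(cuda_major: int) -> Optional[str]:
    """Return the UV_TORCH_BACKEND value for a CUDA major version, if listed."""
    return BACKENDS.get(cuda_major)


def parse_cuda_major(nvidia_smi_output: str) -> Optional[int]:
    """Extract the major version from the 'CUDA Version: X.Y' header field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.\d+", nvidia_smi_output)
    return int(match.group(1)) if match else None


def detect_cuda_major() -> Optional[int]:
    """Run nvidia-smi and parse its header; None if no NVIDIA driver is found."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_cuda_major(result.stdout)


if __name__ == "__main__":
    major = detect_cuda_major()
    backend = torch_backend_for_cuda(major) if major is not None else None
    if backend is None:
        print("No listed CUDA backend detected; follow the install docs instead.")
    else:
        print(f"UV_TORCH_BACKEND={backend} uv pip install fastvideo")
```

Running the script prints the install command to copy, or a fallback message on machines without a supported NVIDIA driver, such as Apple silicon.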

On an NVIDIA DGX Spark (GB10 / ARM64 + CUDA 13), there is no prebuilt ARM wheel for the FastVideo CUDA kernel. Use an editable from-source install (UV_TORCH_BACKEND=cu130 uv pip install -e ., which compiles that kernel for you) instead of UV_TORCH_BACKEND=cu130 uv pip install fastvideo. A compatible prebuilt ARM64 FlashAttention wheel is available separately. Follow the DGX Spark install guide.

Install with an AI coding agent

FastVideo is a monorepo with rich agent guidance (see AGENTS.md). If you use Claude Code, Cursor, or another coding agent, paste the prompt below. It instructs the agent to detect your platform and follow the matching install guide:

Install FastVideo (https://github.com/hao-ai-lab/FastVideo) into a fresh uv virtual environment.

1. Detect the platform: run `uname -m`, `nvidia-smi`, and `nvcc --version`.
2. Read and follow the matching install guide exactly (in this repo, or at
   https://hao-ai-lab.github.io/FastVideo/getting_started/installation/):
     - NVIDIA GPU, x86_64                         -> docs/getting_started/installation/gpu.md
     - NVIDIA DGX Spark / GB10, aarch64, CUDA 13  -> docs/getting_started/installation/spark.md
     - Apple Silicon, macOS                       -> docs/getting_started/installation/mps.md
3. Use uv for every step. If a command fails, debug it and tell me what you changed.
4. Verify the result:
     python -c "import fastvideo, torch; print('cuda', torch.cuda.is_available())"
     fastvideo --help
5. Report which platform you detected and any deviations you had to make.
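
For readers who want to see the platform-to-guide mapping from step 2 without an agent, here is a rough sketch. The function name is hypothetical, and the check is simplified: it treats any NVIDIA aarch64 machine as a DGX Spark and does not verify GB10 or CUDA 13, which the prompt above does require.

```python
import platform
import shutil
from typing import Optional

# Guide paths copied from the agent prompt above.
GUIDES = {
    "gpu": "docs/getting_started/installation/gpu.md",
    "spark": "docs/getting_started/installation/spark.md",
    "mps": "docs/getting_started/installation/mps.md",
}


def pick_install_guide(system: str, machine: str, has_nvidia: bool) -> Optional[str]:
    """Map (OS, CPU architecture, NVIDIA presence) to an install guide path.

    Returns None when the combination is not covered by the prompt above.
    """
    machine = machine.lower()
    if system == "Darwin" and machine == "arm64":
        return GUIDES["mps"]  # Apple Silicon, macOS
    if has_nvidia and machine == "x86_64":
        return GUIDES["gpu"]  # NVIDIA GPU, x86_64
    if has_nvidia and machine == "aarch64":
        return GUIDES["spark"]  # simplification: assumes DGX Spark / GB10
    return None


if __name__ == "__main__":
    guide = pick_install_guide(
        platform.system(),
        platform.machine(),
        has_nvidia=shutil.which("nvidia-smi") is not None,
    )
    print(guide or "No matching guide; see the installation docs.")
```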

Sparse Distillation

For our sparse distillation techniques, please see our distillation docs and check out our blog.

See below for recipes and datasets:

| Model | Sparse Distillation | Dataset |
|---|---|---|
| FastWan2.1-T2V-1.3B | Recipe | FastVideo Synthetic Wan2.1 480P |
| FastWan2.2-TI2V-5B | Recipe | FastVideo Synthetic Wan2.2 720P |

Dreamverse — Realtime Video Generation & Editing

Dreamverse is FastVideo's realtime video generation and editing platform — "vibe directing" a video as it streams. It lives in the monorepo under apps/dreamverse/ and ships its own backend (dreamverse-server) plus a web UI.

Try the live demo, read the blog, or run it yourself. Dreamverse deploys on a local GPU, a self-hosted B200 server over SSH, Docker, or serverless Modal — see the Dreamverse README.

Inference

Generating Your First Video

Here's a minimal example that generates a video using the default settings. It selects the VIDEO_SPARSE_ATTN attention backend, so make sure the VSA kernels are installed first. Create a file called example.py with the following code:

import os
from fastvideo import VideoGenerator

def main():
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "VIDEO_SPARSE_ATTN"

    # Create a video generator with a pre-trained model
    generator = VideoGenerator.from_pretrained(
        "FastVideo/FastWan2.1-T2V-1.3B-Diffusers",
        num_gpus=1,  # Adjust based on your hardware
    )

    # Define a prompt for your video
    prompt = "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest."

    # Generate the video
    video = generator.generate_video(
        prompt,
        output_path="my_videos/",  # Controls where videos are saved
        save_video=True
    )

if __name__ == '__main__':
    main()

Run the script with:

python example.py

For a more detailed guide, please see our inference quick start.
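
To generate several videos with one loaded model, the call from example.py can be wrapped in a loop. This is a minimal sketch: `generate_many` is a hypothetical helper, not a FastVideo API, and it relies only on the `generate_video(prompt, output_path=..., save_video=True)` call shown above. The per-prompt subdirectory layout is an assumption made for illustration, and whatever `generate_video` returns is collected unchanged.

```python
import os
from typing import Any, Iterable, List


def generate_many(generator: Any, prompts: Iterable[str], output_root: str) -> List[Any]:
    """Call generator.generate_video once per prompt, one output folder each.

    `generator` is any object exposing the generate_video call used in
    example.py, for instance the VideoGenerator created there.
    """
    results = []
    for index, prompt in enumerate(prompts):
        # Trailing separator mirrors the "my_videos/" form used in example.py.
        out_dir = os.path.join(output_root, f"prompt_{index:03d}", "")
        results.append(
            generator.generate_video(prompt, output_path=out_dir, save_video=True)
        )
    return results
```

Inside `main()` from example.py this would be called as `generate_many(generator, ["first prompt", "second prompt"], "my_videos")`, which reuses the loaded model across prompts.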

More Guides

Awesome work using FastVideo or our research projects

  • SGLang: SGLang's diffusion inference functionality is based on a fork of FastVideo taken on Sept. 24, 2025.
  • DanceGRPO: A unified framework to adapt Group Relative Policy Optimization (GRPO) to visual generation paradigms. Code based on FastVideo.
  • SRPO: A method to directly align the full diffusion trajectory with fine-grained human preference. Code based on FastVideo.
  • DCM: Dual-expert consistency model for efficient and high-quality video generation. Code based on FastVideo.
  • HY-WorldPlay: An action-conditioned world model trained using the FastVideo framework.
  • Hunyuan Video 1.5: A leading lightweight video generation model whose authors proposed SSTA, based on Sliding Tile Attention.
  • Kandinsky-5.0: A family of diffusion models for video & image generation whose NABLA attention includes a Sliding Tile Attention branch.
  • LongCat Video: A foundational video generation model with 13.6B parameters that uses block-sparse attention similar to Video Sparse Attention.

🤝 Contributing

We welcome all contributions. Please check out our guide here, and see the development roadmap for details.

Acknowledgement

We learned from the design of, and reused code from, the following projects: Wan-Video, ThunderKittens, DMD2, diffusers, xDiT, vLLM, SGLang. We thank MBZUAI, Anyscale, and GMI Cloud for their support throughout this project.

Citation

If you find FastVideo useful, please consider citing our research work:

@article{zhang2025vsa,
  title={VSA: Faster video diffusion with trainable sparse attention},
  author={Zhang, Peiyuan and Chen, Yongqi and Huang, Haofeng and Lin, Will and Liu, Zhengzhong and Stoica, Ion and Xing, Eric and Zhang, Hao},
  journal={arXiv preprint arXiv:2505.13389},
  year={2025}
}

@article{zhang2025fast,
  title={Fast video generation with sliding tile attention},
  author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
  journal={arXiv preprint arXiv:2502.04507},
  year={2025}
}