| Documentation | Quick Start | Weekly Dev Meeting | 🟣💬 Slack | 🟣💬 WeChat |
FastVideo is a unified post-training and real-time inference framework for accelerated video generation.
- 2026/06/23: Release FastWan-QAD: 5s of Video generated in 1.8s E2E. FastWan-QAD models, check out the Blog.
- 2026/03/17: Release demo: Into the Dreamverse: Vibe Directing in FastVideo, check out the Blog.
- 2026/03/13: Release demo: Create a 5s 1080p Video in 4.5s with FastVideo on a Single GPU, check out the Blog.
- 2025/11/19: Release CausalWan2.2 I2V A14B Preview models, Blog and Inference Code!
- 2025/08/04: Release FastWan models and Sparse-Distillation.
- 2025/06/14: Release finetuning and inference code for VSA.
- 2025/04/24: FastVideo V1 is released!
- 2025/02/18: Release the inference code for Sliding Tile Attention.
FastVideo has the following features:
- End-to-end post-training support for bidirectional and autoregressive models:
  - Full finetuning and LoRA finetuning for state-of-the-art open video DiTs
  - Data preprocessing pipeline for video, image, and text data
  - Distribution Matching Distillation (DMD2) stepwise distillation
  - Sparse attention with Video Sparse Attention
  - Sparse distillation to achieve >50x denoising speedup
  - Scalable training with FSDP2, sequence parallelism, and selective activation checkpointing
  - Causal distillation through Self-Forcing
  - See this page for the full list of supported models and recipes.
- State-of-the-art performance optimizations for inference
  - Sequence parallelism for distributed inference
  - Multiple state-of-the-art attention backends
  - User-friendly CLI and Python API
  - See this page for the full list of supported optimizations.
- Diverse hardware and OS support
  - Supports H100, A100, 4090
  - Supports Linux, Windows, macOS
  - See this page for the full list of supported models, hardware assumptions, and optimization compatibility.
- Realtime video generation & editing
  - Dreamverse: stream and "vibe direct" video in realtime (live demo), deployable on a local GPU, a self-hosted B200 server, Docker, or serverless Modal
We recommend using uv to create a clean environment. If you previously used Conda, switching to uv generally gives faster and more stable installs.
# Create and activate a new uv environment
uv venv --python 3.12 --seed
source .venv/bin/activate
# Install FastVideo on NVIDIA CUDA 12
UV_TORCH_BACKEND=cu126 uv pip install fastvideo

Use `UV_TORCH_BACKEND=cu130` on CUDA 13. Apple silicon users should follow the MPS installation guide.
Please see our docs for more detailed installation instructions.
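The choice between `cu126` and `cu130` can also be made programmatically. The following is a minimal sketch, not part of FastVideo: the helper names (`cuda_major_from_nvcc`, `uv_torch_backend`, `detect_backend`) are hypothetical, and it assumes that `nvcc --version` prints a `release <major>.<minor>` string. Only the two backends named above are mapped; any other CUDA version returns `None` so that you fall back to the detailed installation docs.

```python
import re
import subprocess
from typing import Optional

# Mapping taken from the install notes above: cu126 on CUDA 12, cu130 on CUDA 13.
BACKEND_BY_CUDA_MAJOR = {12: "cu126", 13: "cu130"}


def cuda_major_from_nvcc(nvcc_output: str) -> Optional[int]:
    """Extract the CUDA major version from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return int(match.group(1)) if match else None


def uv_torch_backend(nvcc_output: str) -> Optional[str]:
    """Return the UV_TORCH_BACKEND value for the detected CUDA major version."""
    major = cuda_major_from_nvcc(nvcc_output)
    return BACKEND_BY_CUDA_MAJOR.get(major)


def detect_backend() -> Optional[str]:
    """Run `nvcc --version` (hypothetical helper) and map it to a backend."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # nvcc missing or failing: no recommendation
    return uv_torch_backend(result.stdout)
```

Calling `detect_backend()` returns a string you can export as `UV_TORCH_BACKEND` before running `uv pip install fastvideo`, or `None` when no supported CUDA toolkit is found.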
On an NVIDIA DGX Spark (GB10 / ARM64 + CUDA 13)? There is no prebuilt ARM wheel for the FastVideo CUDA kernel, so use an editable from-source install (`UV_TORCH_BACKEND=cu130 uv pip install -e .`, which compiles that kernel for you) rather than `UV_TORCH_BACKEND=cu130 uv pip install fastvideo`. A compatible prebuilt ARM64 FlashAttention wheel is available separately. Follow the DGX Spark install guide.
FastVideo is a monorepo with rich agent guidance (see AGENTS.md). If you use Claude Code, Cursor, or another coding agent, paste the prompt below — it detects your platform and follows the matching guide:
Install FastVideo (https://github.com/hao-ai-lab/FastVideo) into a fresh uv virtual environment.
1. Detect the platform: run `uname -m`, `nvidia-smi`, and `nvcc --version`.
2. Read and follow the matching install guide exactly (in this repo, or at
https://hao-ai-lab.github.io/FastVideo/getting_started/installation/):
- NVIDIA GPU, x86_64 -> docs/getting_started/installation/gpu.md
- NVIDIA DGX Spark / GB10, aarch64, CUDA 13 -> docs/getting_started/installation/spark.md
- Apple Silicon, macOS -> docs/getting_started/installation/mps.md
3. Use uv for every step. If a command fails, debug it and tell me what you changed.
4. Verify the result:
python -c "import fastvideo, torch; print('cuda', torch.cuda.is_available())"
fastvideo --help
5. Report which platform you detected and any deviations you had to make.
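If you would rather script step 2 of the prompt than hand it to an agent, the platform-to-guide mapping can be sketched as below. This is an illustrative helper, not FastVideo code: `pick_install_guide` and `detect_install_guide` are hypothetical names, the presence of `nvidia-smi` on `PATH` stands in for "has an NVIDIA GPU", and every NVIDIA aarch64 machine is treated as a DGX Spark, which is a simplifying assumption.

```python
import platform
import shutil
from typing import Optional

# Guide paths copied from the prompt above.
GUIDE_DIR = "docs/getting_started/installation"


def pick_install_guide(machine: str, system: str, has_nvidia: bool) -> Optional[str]:
    """Map (architecture, OS, NVIDIA presence) to an install guide path."""
    machine = machine.lower()
    system = system.lower()
    if system == "darwin" and machine in ("arm64", "aarch64"):
        return f"{GUIDE_DIR}/mps.md"  # Apple Silicon, macOS
    if has_nvidia and machine in ("x86_64", "amd64"):
        return f"{GUIDE_DIR}/gpu.md"  # NVIDIA GPU, x86_64
    if has_nvidia and machine in ("aarch64", "arm64"):
        # Assumption: an NVIDIA aarch64 host is a DGX Spark / GB10.
        return f"{GUIDE_DIR}/spark.md"
    return None  # no matching guide; consult the installation docs


def detect_install_guide() -> Optional[str]:
    """Inspect the current machine and return the matching guide, if any."""
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_install_guide(platform.machine(), platform.system(), has_nvidia)
```

A `None` result means the machine does not match any of the three guides listed in the prompt.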
For our sparse distillation techniques, please see our distillation docs and check out our blog.
See below for recipes and datasets:
| Model | Sparse Distillation | Dataset |
|---|---|---|
| FastWan2.1-T2V-1.3B | Recipe | FastVideo Synthetic Wan2.1 480P |
| FastWan2.2-TI2V-5B | Recipe | FastVideo Synthetic Wan2.2 720P |
Dreamverse is FastVideo's realtime video generation
and editing platform — "vibe directing" a video as it streams. It lives in the
monorepo under apps/dreamverse/ and ships its own backend
(dreamverse-server) plus a web UI.
Try the live demo, read the blog, or run it yourself. Dreamverse deploys on a local GPU, a self-hosted B200 server over SSH, Docker, or serverless Modal — see the Dreamverse README.
Here's a minimal example to generate a video using the default settings. Make sure VSA kernels are installed. Create a file called example.py with the following code:
import os

from fastvideo import VideoGenerator


def main():
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "VIDEO_SPARSE_ATTN"

    # Create a video generator with a pre-trained model
    generator = VideoGenerator.from_pretrained(
        "FastVideo/FastWan2.1-T2V-1.3B-Diffusers",
        num_gpus=1,  # Adjust based on your hardware
    )

    # Define a prompt for your video
    prompt = "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest."

    # Generate the video
    video = generator.generate_video(
        prompt,
        output_path="my_videos/",  # Controls where videos are saved
        save_video=True
    )


if __name__ == '__main__':
    main()

Run the script with:

python example.py

For a more detailed guide, please see our inference quick start.
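Building on the minimal script, you may want to load the model once and reuse it for several prompts. The sketch below uses only the calls shown in the example (`VideoGenerator.from_pretrained` and `generate_video` with `output_path` and `save_video`); the `generate_many` helper and the second prompt are hypothetical additions. How output files are named when several videos share one `output_path` is not covered here, so check the inference quick start before relying on it.

```python
import os
from typing import Any, Iterable, List


def generate_many(
    generator: Any, prompts: Iterable[str], output_path: str = "my_videos/"
) -> List[Any]:
    """Call generate_video once per prompt, reusing a single loaded generator."""
    results = []
    for prompt in prompts:
        results.append(
            generator.generate_video(prompt, output_path=output_path, save_video=True)
        )
    return results


def main() -> None:
    # Same backend selection as the minimal example.
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "VIDEO_SPARSE_ATTN"

    # Imported here so generate_many can be exercised without FastVideo installed.
    from fastvideo import VideoGenerator

    generator = VideoGenerator.from_pretrained(
        "FastVideo/FastWan2.1-T2V-1.3B-Diffusers",
        num_gpus=1,  # Adjust based on your hardware
    )
    prompts = [
        "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest.",
        "A sailboat drifts across a calm lake at sunrise.",  # hypothetical extra prompt
    ]
    generate_many(generator, prompts)
```

Call `main()` from your own script to run this on real hardware. Keeping the loop in a separate function means the model is loaded only once, which is usually the expensive step.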
- SGLang: SGLang's diffusion inference functionality is based on a fork of FastVideo on Sept. 24, 2025.
- DanceGRPO: A unified framework to adapt Group Relative Policy Optimization (GRPO) to visual generation paradigms. Code based on FastVideo.
- SRPO: A method to directly align the full diffusion trajectory with fine-grained human preference. Code based on FastVideo.
- DCM: Dual-expert consistency model for efficient and high-quality video generation. Code based on FastVideo.
- HY-WorldPlay: An action-conditioned world model trained using the FastVideo framework.
- Hunyuan Video 1.5: A leading lightweight video generation model, where they proposed SSTA based on Sliding Tile Attention.
- Kandinsky-5.0: A family of diffusion models for video & image generation, where their NABLA attention includes a Sliding Tile Attention branch.
- LongCat Video: A foundational video generation model with 13.6B parameters with block-sparse attention similar to Video Sparse Attention.
We welcome all contributions. Please check out our guide here, and see the development roadmap for details.
We learned the design and reused code from the following projects: Wan-Video, ThunderKittens, DMD2, diffusers, xDiT, vLLM, SGLang. We thank MBZUAI, Anyscale, and GMI Cloud for their support throughout this project.
If you find FastVideo useful, please consider citing our research work:
@article{zhang2025vsa,
title={{VSA}: Faster video diffusion with trainable sparse attention},
author={Zhang, Peiyuan and Chen, Yongqi and Huang, Haofeng and Lin, Will and Liu, Zhengzhong and Stoica, Ion and Xing, Eric and Zhang, Hao},
journal={arXiv preprint arXiv:2505.13389},
year={2025}
}
@article{zhang2025fast,
title={Fast video generation with sliding tile attention},
author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
journal={arXiv preprint arXiv:2502.04507},
year={2025}
}