---
version: "1.0.1"
name: vllm-omni-video-gen
description: Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.
---
# vLLM-Omni Video Generation

## Overview

vLLM-Omni supports video generation through diffusion transformer models, primarily the Wan2.2 family. Three modes are supported: text-to-video (T2V), image-to-video (I2V), and text+image-to-video (TI2V).
## Supported Video Models

| Model | HF ID | Mode | Min VRAM |
|---|---|---|---|
| Wan2.2-T2V-A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers | Text-to-video | 48 GB |
| Wan2.2-TI2V-5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | Text+Image-to-video | 24 GB |
| Wan2.2-I2V-A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | Image-to-video | 48 GB |
| NextStep-1.1 | stepfun-ai/NextStep-1.1 | Text-to-video | 24 GB |
| Helios-Distilled | naver-ai/Helios-Distilled | Text-to-video | 24 GB |
| daVinci-MagiHuman | SII-GAIR/daVinci-MagiHuman-Base-1080p | Image-to-video + audio | 24 GB |
daVinci-MagiHuman is an image-to-video model that also generates an audio track (44100 Hz audio, 25 fps video).

Pass `--enable-diffusion-pipeline-profiler` to include per-stage timing (`stage_durations`) and peak memory (`peak_memory_mb`) in video responses, either in the async poll JSON or in the sync HTTP headers.
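When post-processing daVinci-MagiHuman output (for example, trimming or muxing), it helps to know how many audio samples correspond to each video frame. The sketch below does that arithmetic from the two rates quoted above; it assumes the audio track spans exactly the same duration as the video, which this document does not state explicitly.

```python
# Sketch: align 44100 Hz audio with 25 fps video.
# Assumption: the audio track covers exactly the same duration as the video.
from fractions import Fraction

AUDIO_SAMPLE_RATE = 44100  # Hz, as quoted for daVinci-MagiHuman
VIDEO_FPS = 25             # frames per second

def samples_per_frame(sample_rate: int = AUDIO_SAMPLE_RATE, fps: int = VIDEO_FPS) -> Fraction:
    """Audio samples covered by one video frame, as an exact rational value."""
    return Fraction(sample_rate, fps)

def expected_audio_samples(num_video_frames: int,
                           sample_rate: int = AUDIO_SAMPLE_RATE,
                           fps: int = VIDEO_FPS) -> int:
    """Audio sample count for a clip of num_video_frames frames, rounded down."""
    return (num_video_frames * sample_rate) // fps

if __name__ == "__main__":
    print(samples_per_frame())           # 1764
    print(expected_audio_samples(125))   # 125 frames = 5 s -> 220500
```

Because 44100 divides evenly by 25, every frame maps to a whole number of samples (1764), so frame-accurate cuts never split an audio sample.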
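## Quick Start: Text-to-Video

### Offline

```python
from vllm_omni.entrypoints.omni import Omni

# Load the Wan2.2 text-to-video pipeline
omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")

# Generate a video from a text prompt
outputs = omni.generate("A dog running on a beach at sunset")

# The video is attached to the first output of the first request
video = outputs[0].request_output[0].video
video.save("dog_beach.mp4")
```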
### Online API

```bash
# Start the server (runs in the foreground)
vllm serve Wan-AI/Wan2.2-T2V-A14B-Diffusers --omni --port 8091

# In a second terminal, send a generation request
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "A dog running on a beach at sunset"}],
    "extra_body": {"num_inference_steps": 50, "guidance_scale": 5.0, "seed": 42}
  }'
```
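The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8091` and that the response body is JSON. The response schema is not described in this document, so the function simply returns the decoded JSON for you to inspect. `build_payload` and `generate_video` are hypothetical helper names.

```python
# Sketch: stdlib client mirroring the curl example above.
# Assumptions: server at localhost:8091, JSON response (schema not documented here).
import json
import urllib.request

BASE_URL = "http://localhost:8091"

def build_payload(prompt: str, steps: int = 50, guidance: float = 5.0, seed: int = 42) -> dict:
    """Chat-style messages plus generation options under extra_body, as in the curl body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "num_inference_steps": steps,
            "guidance_scale": guidance,
            "seed": seed,
        },
    }

def generate_video(prompt: str, base_url: str = BASE_URL, timeout: float = 900.0) -> dict:
    """POST to /v1/chat/completions and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A single video can take several minutes, so the timeout is generous.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call `generate_video("A dog running on a beach at sunset")`. Keeping payload construction separate from the network call makes it easy to check the request body without a live server.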
## Image-to-Video

Animate a static image into a video:

```python
from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
outputs = omni.generate(
    prompt="The person starts walking forward",
    images=["portrait.jpg"],
)
outputs[0].request_output[0].video.save("animated.mp4")
```
## Text+Image-to-Video (TI2V)

Combine a text description with a reference image:

```python
from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="Wan-AI/Wan2.2-TI2V-5B-Diffusers")
outputs = omni.generate(
    prompt="The city lights up at night with moving traffic",
    images=["cityscape.jpg"],
)
outputs[0].request_output[0].video.save("city_night.mp4")
```
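Choosing among the three Wan2.2 checkpoints mostly comes down to the mode you need and the GPU memory you have. The sketch below encodes the mode and Min VRAM columns from the Supported Video Models table; the figures are the table's, not new measurements, and `pick_model` is a hypothetical helper.

```python
# Sketch: pick a Wan2.2 checkpoint by mode and available GPU memory.
# Data copied from the "Supported Video Models" table in this document.
from typing import NamedTuple, Optional

class VideoModel(NamedTuple):
    hf_id: str
    mode: str         # "t2v", "i2v" or "ti2v"
    min_vram_gb: int  # "Min VRAM" column

WAN22_MODELS = [
    VideoModel("Wan-AI/Wan2.2-T2V-A14B-Diffusers", "t2v", 48),
    VideoModel("Wan-AI/Wan2.2-TI2V-5B-Diffusers", "ti2v", 24),
    VideoModel("Wan-AI/Wan2.2-I2V-A14B-Diffusers", "i2v", 48),
]

def pick_model(mode: str, vram_gb: int) -> Optional[str]:
    """Return the first checkpoint for `mode` whose Min VRAM fits in `vram_gb`, else None."""
    for m in WAN22_MODELS:
        if m.mode == mode and m.min_vram_gb <= vram_gb:
            return m.hf_id
    return None
```

A `None` result only means the table's minimum is not met on a single GPU; tensor parallelism and CPU offloading, covered under Performance Considerations, can lower the per-GPU requirement.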
## Video Generation Parameters

| Parameter | Description | Typical Range |
|---|---|---|
| `num_inference_steps` | Number of denoising steps | 30-100 |
| `guidance_scale` | Classifier-free guidance (CFG) scale | 3.0-7.0 |
| `seed` | Random seed | Any integer |
| `num_frames` | Number of output frames | Model-dependent |
| `fps` | Output frames per second | 8-24 |
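Two things are easy to get wrong here: drifting outside the typical ranges, and forgetting that clip length is `num_frames / fps`. The sketch below checks values against the table and reports duration. The ranges are guidance rather than hard limits, so it returns warnings instead of raising. The defaults `num_frames=81` and `fps=16` are hypothetical values chosen for illustration, and `VideoParams` is not a vLLM-Omni class.

```python
# Sketch: validate generation parameters against the "Typical Range" column
# and compute the resulting clip length. VideoParams is a hypothetical helper.
from dataclasses import dataclass
from typing import List

@dataclass
class VideoParams:
    num_inference_steps: int = 50
    guidance_scale: float = 5.0
    seed: int = 42
    num_frames: int = 81  # hypothetical default; valid values are model-dependent
    fps: int = 16         # hypothetical default inside the 8-24 range

    def duration_seconds(self) -> float:
        """Clip length implied by frame count and frame rate."""
        return self.num_frames / self.fps

    def warnings(self) -> List[str]:
        """Human-readable notes for values outside the typical ranges."""
        out = []
        if not 30 <= self.num_inference_steps <= 100:
            out.append(f"num_inference_steps={self.num_inference_steps} is outside 30-100")
        if not 3.0 <= self.guidance_scale <= 7.0:
            out.append(f"guidance_scale={self.guidance_scale} is outside 3.0-7.0")
        if not 8 <= self.fps <= 24:
            out.append(f"fps={self.fps} is outside 8-24")
        return out
```

For example, `VideoParams(num_frames=48, fps=24).duration_seconds()` gives a 2-second clip, and `VideoParams(guidance_scale=9.0).warnings()` flags the CFG scale.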
## Performance Considerations

Video generation is significantly more compute-intensive than image generation:

- A single video may take 2-10 minutes on a single GPU.
- Multi-GPU tensor parallelism is strongly recommended for the 14B (A14B) models.
- Multi-threaded weight loading (enabled by default) significantly reduces cold-start time for Wan2.2 models.
- Enable TeaCache for diffusion acceleration (see the vllm-omni-perf skill).
- CPU offloading can help fit larger models:

  ```bash
  vllm serve <model> --omni --cpu-offload-gb 20
  ```

- For multi-transformer pipelines (e.g., Wan2.2-T2V, which has `transformer` and `transformer-2`), the sequential offloader keeps only the running DiT on the GPU and offloads all the others to CPU. This lets Wan2.2-T2V fit on 64 GB GPUs with `--enable-cpu-offload --tensor-parallel-size 2`.
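If you launch servers from scripts, building the command as an argument list avoids quoting mistakes. This sketch uses only the flags that appear in this section (`--omni`, `--port`, `--tensor-parallel-size`, `--cpu-offload-gb`, `--enable-cpu-offload`); confirm them with `vllm serve --help` for your installed version. `build_serve_cmd` is a hypothetical helper.

```python
# Sketch: assemble a `vllm serve` command from the flags shown in this section.
# Verify flag availability with `vllm serve --help` for your version.
import shlex
from typing import List, Optional

def build_serve_cmd(model: str,
                    port: int = 8091,
                    tensor_parallel_size: Optional[int] = None,
                    cpu_offload_gb: Optional[int] = None,
                    enable_cpu_offload: bool = False) -> List[str]:
    """Return an argv list suitable for subprocess.run()."""
    cmd = ["vllm", "serve", model, "--omni", "--port", str(port)]
    if tensor_parallel_size:
        cmd += ["--tensor-parallel-size", str(tensor_parallel_size)]
    if cpu_offload_gb:
        # Amount of model memory, in GB, to offload to CPU
        cmd += ["--cpu-offload-gb", str(cpu_offload_gb)]
    if enable_cpu_offload:
        # Per the note above: sequential DiT offloading for pipelines like Wan2.2-T2V
        cmd.append("--enable-cpu-offload")
    return cmd

if __name__ == "__main__":
    # The 64 GB Wan2.2-T2V configuration described above
    cmd = build_serve_cmd("Wan-AI/Wan2.2-T2V-A14B-Diffusers",
                          tensor_parallel_size=2, enable_cpu_offload=True)
    print(shlex.join(cmd))
```

Returning a list rather than a string means the result can go straight to `subprocess.run(cmd)` without a shell; `shlex.join` is only used to display it.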
## Troubleshooting

**Generation too slow:** Use tensor parallelism or enable TeaCache/Cache-DiT acceleration. Helios supports Cache-DiT (`--enable-cache-dit`) for a ~20% speedup.

**LTX-2 error with diffusers>=0.38.0:** Fixed in #3661, which moved text encoder normalization into the diffusers connector. When upgrading diffusers to 0.38.0 or later, also update vllm-omni to the latest version.

**Out of memory:** Reduce the resolution or frame count, or use CPU offloading.

**Choppy output:** Increase `num_inference_steps` and `num_frames`.
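The "reduce frame count" advice can be automated as a retry loop. This is a sketch under several assumptions: `generate` is any callable you supply that takes a frame count (a hypothetical wrapper, not a vLLM-Omni API), and it raises `oom_error` when memory runs out. With PyTorch that might be `torch.cuda.OutOfMemoryError`, if the error propagates to your process with that type. Frame counts are snapped to the form 4k + 1, which we assume matches Wan-style VAEs; check your model's constraints.

```python
# Sketch: retry with fewer frames after an out-of-memory error.
# `generate` and `oom_error` are supplied by the caller (hypothetical wrapper).
# Assumption: valid frame counts have the form 4k + 1, as for Wan-style VAEs.
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def snap_frames(n: int) -> int:
    """Largest value <= n of the form 4k + 1 (at least 1)."""
    return max(1, (n - 1) // 4 * 4 + 1)

def generate_with_fallback(generate: Callable[[int], T],
                           num_frames: int,
                           oom_error: Type[BaseException] = MemoryError,
                           min_frames: int = 17) -> Tuple[T, int]:
    """Halve the frame count after each OOM until generation succeeds.

    Re-raises the last OOM once the next attempt would drop below min_frames.
    """
    frames = snap_frames(num_frames)
    while True:
        try:
            return generate(frames), frames
        except oom_error:
            next_frames = snap_frames(frames // 2)
            if next_frames < min_frames or next_frames == frames:
                raise
            frames = next_frames
```

Halving converges quickly (81, 37, 17 frames), and the `min_frames` floor stops the loop from silently producing clips too short to be useful.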
## References

- For Wan2.2 model details and advanced configuration, see `references/wan-models.md`.