AI Video Generation with Wan 2.2 in ComfyUI

Learn how to generate stunning AI videos using Wan 2.2 in ComfyUI. Complete setup guide for 5B and 14B models, workflows, and best practices for consumer GPUs.


Wan 2.2 is the latest generation of video foundation models from Wan-AI, offering consumer-friendly video generation right in ComfyUI. Unlike LTX-2’s massive 19B parameter model requiring enterprise GPUs, Wan 2.2’s 5B variant runs on just 12GB VRAM—making it accessible to anyone with a mid-range graphics card.

Why Wan 2.2?

Wan 2.2 represents a significant leap forward in accessible AI video generation:

  • Consumer-friendly: The 5B model runs on 12GB VRAM (RTX 3060/4070 Ti)
  • Unified model: One 5B model handles both text-to-video AND image-to-video
  • Quality options: 14B models available for higher quality output
  • Native ComfyUI integration: Direct support without custom nodes
  • Visual text: Generate Chinese and English text in videos

Model Variants

| Model | Parameters | Task | VRAM | Best For |
|-------|------------|------|------|----------|
| ti2v_5B | 5B | T2V + I2V | 12GB | Consumers, fast iteration |
| t2v_14B | 14B | Text-to-Video | 24GB+ | High-quality output |
| i2v_14B | 14B | Image-to-Video | 24GB+ | Animation from images |

Setup Guide

Prerequisites

Before installing Wan 2.2, ensure your system meets these requirements:

Minimum (5B model):

  • GPU: NVIDIA RTX 3060 12GB or equivalent
  • VRAM: 12GB
  • RAM: 16GB
  • Storage: 30GB for models
  • Python: 3.10+

Recommended (14B models):

  • GPU: RTX 4090 / RTX 5090 / A100
  • VRAM: 24GB+
  • RAM: 32GB+
  • Storage: 50GB+
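
A quick way to sanity-check your system against these requirements (a minimal sketch; assumes an NVIDIA GPU with the standard drivers installed):

# Check GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Check Python version (3.10+ required)
python --version

# Check free disk space in the current directory
df -h .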

Step 1: Install ComfyUI

If you haven’t installed ComfyUI yet:

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
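
Then start the server; ComfyUI serves its UI on http://127.0.0.1:8188 by default:

python main.py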

Step 2: Download Model Files

Wan 2.2 requires several components. Here’s how to download them:

Text Encoder (Required)

# Create text_encoders directory
mkdir -p models/text_encoders

# Download UMT5 text encoder
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
  --local-dir .

The text encoder goes in: ComfyUI/models/text_encoders/
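
With --local-dir ., huggingface-cli preserves the repository's folder layout, so the file actually lands in ./split_files/text_encoders/. Assuming you ran the command from the ComfyUI root, move it into place (a minimal sketch):

# Move the downloaded encoder into ComfyUI's model folder
mv split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
  models/text_encoders/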

For the 5B Model (Recommended)

# Download 5B diffusion model
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors \
  --local-dir .

# Download 5B VAE
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/vae/wan2.2_vae.safetensors \
  --local-dir .

Files go in:

  • Diffusion model: ComfyUI/models/diffusion_models/
  • VAE: ComfyUI/models/vae/
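
As with the text encoder, the downloads land under ./split_files/; move them into the ComfyUI model folders (assuming you're still in the ComfyUI root):

# Move the diffusion model and VAE into place
mv split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors \
  models/diffusion_models/
mv split_files/vae/wan2.2_vae.safetensors \
  models/vae/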

For 14B Models (High Quality)

The 14B models use a two-stage approach: a high-noise model handles the early denoising steps and a low-noise model refines the later ones:

# Download T2V 14B models
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors \
  --local-dir .

huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors \
  --local-dir .

Note: 14B models use wan_2.1_vae.safetensors, not the 2.2 VAE.
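
If you don't already have the 2.1 VAE from an earlier Wan setup, it can be fetched the same way. The repository name and file path below follow the same naming pattern as the 2.2 repackage but are an assumption; verify them on Hugging Face before downloading:

# Assumed location of the Wan 2.1 VAE; check the repository listing
huggingface-cli download Comfy-Org/Wan_2.1_ComfyUI_repackaged \
  split_files/vae/wan_2.1_vae.safetensors \
  --local-dir .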

Step 3: Download Workflows

Grab the official workflow JSON files from the ComfyUI examples repository (comfyanonymous/ComfyUI_examples on GitHub).

Load these in ComfyUI via the Load button and select the JSON file, or simply drag and drop the JSON onto the canvas.

Generating Your First Video

Text-to-Video Workflow

  1. Load the T2V workflow JSON in ComfyUI
  2. Find the WanTextToVideoCheckpointLoader node
  3. Select your model: wan2.2_ti2v_5B_fp16
  4. Set your prompt in the Text Encode node:
    A cat playing piano in a sunlit room,
    soft afternoon light, cinematic, 4K
  5. Adjust settings:
    • Width: 832 (480P) or 1280 (720P)
    • Height: 480 or 720
    • Frames: 33 (about 1.3 seconds at 24fps)
    • CFG: 6.0
  6. Click Queue Prompt
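
Once a workflow runs from the UI, you can also queue it headlessly. ComfyUI exposes an HTTP API on its default port (8188); the sketch below assumes you exported the workflow in API format (enable dev mode in the settings to get the API-format save/export option) to a file named workflow_api.json:

# Queue a generation via ComfyUI's HTTP API
curl -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": $(cat workflow_api.json)}"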

Image-to-Video Workflow

  1. Load the I2V workflow JSON
  2. Connect your input image to the Load Image node
  3. Write a motion prompt:
    The character walks forward,
    turning to look at the camera,
    natural movement
  4. Adjust frames and CFG as needed
  5. Generate

Hardware Optimization

Memory-Saving Techniques

If you’re running into OOM (Out of Memory) errors:

Use FP8 Variants

FP8 models use ~40% less VRAM with minimal quality loss:

# Instead of FP16
wan2.2_ti2v_5B_fp16.safetensors

# Use FP8
wan2.2_ti2v_5B_fp8_scaled.safetensors
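
If an FP8 build of the 5B model is published in the repackaged repository, it downloads the same way. The exact filename here is an assumption based on the naming pattern above; check the repository's file listing first:

# Filename assumed; verify it exists in the repo before downloading
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_repackaged \
  split_files/diffusion_models/wan2.2_ti2v_5B_fp8_scaled.safetensors \
  --local-dir .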

CPU Offloading

For systems with limited VRAM, let ComfyUI offload model weights to system RAM:

# Launch ComfyUI with aggressive weight offloading
python main.py --lowvram

# --novram offloads even more aggressively; --cpu runs everything
# on the CPU and is a very slow last resort

Reduce Frame Count

Lower the frame count for faster generation:

| Resolution | Frames | Duration | VRAM |
|------------|--------|----------|------|
| 480P | 33 | 1.3s | Lower |
| 480P | 81 | 3.4s | Medium |
| 720P | 33 | 1.3s | Medium |
| 720P | 81 | 3.4s | Higher |

Quality vs Speed

| Format | Quality | Speed | File Size |
|--------|---------|-------|-----------|
| FP16 | Best | Slowest | Largest |
| BF16 | High | Medium | Large |
| FP8 Scaled | Good | Fast | Medium |
| FP8 E4M3FN | Acceptable | Fastest | Smallest |

Prompt Engineering

Structure Your Prompts

Use this formula for best results:

[Subject] + [Action/Motion] + [Environment] + [Lighting] + [Quality/Style]

Example Prompts

Cinematic:

A vintage car driving through a neon-lit Tokyo street at night,
rain reflections on wet pavement, Blade Runner aesthetic,
cinematic lighting, 4K, photorealistic

Animation:

An animated character waving at the camera,
bright studio lighting, Pixar style,
vibrant colors, smooth motion

Nature:

Ocean waves crashing on rocky cliffs during golden hour,
spray catching the light, slow motion,
cinematic, 4K, nature documentary

Negative Prompts

Wan doesn’t use traditional negative prompts, but you can improve results by being specific about what you DO want, avoiding ambiguous descriptions.

Comparing Wan 2.2 to Other Models

Wan 2.2 vs LTX-2

| Feature | Wan 2.2 5B | Wan 2.2 14B | LTX-2 19B |
|---------|------------|-------------|-----------|
| Parameters | 5B | 14B | 19B |
| VRAM (min) | 12GB | 24GB | 32GB+ |
| Resolution | 480-720P | 480-720P | Native 4K |
| Audio Sync | ❌ No | ❌ No | ✅ Yes |
| Consumer GPU | ✅ Yes | ⚠️ High-end | ❌ No |
| Unified Model | ✅ Yes | ❌ Separate | ❌ Separate |

Winner for consumers: Wan 2.2 5B runs on a 12GB card, while LTX-2 requires enterprise hardware.

Winner for quality: LTX-2 at 4K with synchronized audio, if you have the hardware.

Wan 2.2 vs Wan 2.1

Wan 2.2 introduces:

  • Unified 5B model (both T2V and I2V)
  • Improved quality over 2.1’s 1.3B
  • Better motion consistency
  • New VAE architecture for 5B variant

Troubleshooting

Out of Memory Errors

Problem: CUDA out of memory during generation

Solutions:

  1. Switch to FP8 model variants
  2. Reduce frame count
  3. Enable CPU offload
  4. Close other GPU applications
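
To see which other processes are holding VRAM, a quick check using standard nvidia-smi query options:

# List processes currently using GPU memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv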

Slow Generation

Problem: Video takes 10+ minutes to generate

Solutions:

  1. Use 5B model instead of 14B
  2. Reduce resolution to 480P
  3. Reduce frame count
  4. Check GPU utilization to confirm the GPU is actually doing the work (see the check below)
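
A simple way to confirm the GPU is busy while a job runs:

# Refresh GPU stats every second during generation;
# GPU-Util near 0% suggests the model fell back to CPU
watch -n 1 nvidia-smi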

Poor Quality Output

Problem: Generated videos look blurry or have artifacts

Solutions:

  1. Use FP16 instead of FP8
  2. Increase CFG value (try 6-8)
  3. Improve prompt clarity
  4. Use 14B model for better quality

Resources

  • Repackaged models: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_repackaged
  • ComfyUI: https://github.com/comfyanonymous/ComfyUI

Wan 2.2 democratizes AI video generation by making it accessible to consumer hardware. Start with the 5B unified model for quick iterations, then scale up to the 14B variants when you need higher quality output. The native ComfyUI integration makes it easy to experiment with different prompts and settings without leaving your workflow.

Anthony Lattanzio

Tech Enthusiast & Builder

I'm a tech enthusiast who loves building things with hardware and software. By night, I run a homelab that's grown way beyond what any reasonable person needs. Check out about me for more.
