Fixing OOM Errors in ComfyUI with LTX-2.3 and NVIDIA GPUs
Troubleshoot and resolve out-of-memory errors when running LTX-2.3 video generation in ComfyUI on NVIDIA GPUs. Learn memory optimization techniques, precision settings, and workflow configurations.
Table of Contents
- Understanding the OOM Error
- Types of Memory Errors
- Why LTX-2.3 Is Particularly Demanding
- Quick Fixes (Start Here)
- 1. Update ComfyUI and PyTorch
- 2. Add the --lowvram Flag
- 3. Reserve VRAM for Your OS
- 4. Use FP8 Model + Distilled LoRA
- Precision Settings Explained
- FP8 vs FP16 vs BF16
- Complete Precision Flags Reference
- ComfyUI Memory Management Flags
- VRAM Mode Flags
- Advanced Memory Flags
- Recommended Configurations by GPU
- Workflow Optimizations
- Resolution Guidelines
- Frame Count Limits
- Kijai Optimized Workflow
- GGUF Quantized Workflows
- Memory Cleanup Between Generations
- Why torch.cuda.empty_cache() Isn’t Enough
- ComfyUI Memory Cleanup Workflow
- Custom Nodes for Memory Management
- Common OOM Scenarios and Solutions
- Scenario 1: First Generation Works, Second Fails
- Scenario 2: OOM During VAE Decode
- Scenario 3: System Freezes Completely
- Scenario 4: Black Images After Generation
- Troubleshooting Checklist
- Monitoring Your Setup
- Check VRAM Usage
- Find Model Sizes
- ComfyUI Memory Stats
- What’s Coming: NVFP4 Support
- Summary
Fixing OOM Errors in ComfyUI with LTX-2.3 and NVIDIA GPUs
You’ve just installed LTX-2.3, loaded up the workflow, hit Queue Prompt, and then—disaster. The dreaded RuntimeError: CUDA out of memory error crashes your generation. Sound familiar?
LTX-2.3 is a massive 22-billion parameter video model that can consume up to 46GB of VRAM at full precision. Even the optimized FP8 version needs 23-30GB. But here’s the good news: with the right configuration, you can run LTX-2.3 on as little as 6GB of VRAM.
This guide walks through every OOM troubleshooting step, from quick fixes to advanced optimizations, specifically for NVIDIA GPUs running LTX-2.3 in ComfyUI.
Understanding the OOM Error
Types of Memory Errors
CUDA Out of Memory is the most common error you’ll encounter:
RuntimeError: CUDA out of memory. Tried to allocate 12.50 GiB
This means your GPU’s VRAM is exhausted. But there are actually three distinct memory issues:
- VRAM Exhaustion — GPU memory is completely filled with model weights, intermediate tensors, and cached data
- System RAM Issues — ComfyUI offloads to system RAM when VRAM runs out, which can crash your entire system
- Memory Leaks — Memory isn’t properly released between generations, accumulating until failure
Why LTX-2.3 Is Particularly Demanding
LTX-2.3 introduces several memory-intensive features:
- 22B parameters — One of the largest open video models
- Native 9:16 portrait support — Requires processing different aspect ratios
- 4x larger text connector — Better prompt adherence, but more memory for text encoding
- Improved VAE — Higher quality output but larger intermediate tensors
The baseline FP8 model weighs in at ~30GB. Without optimization, you’d need a 4090 or better just to load it.
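A quick back-of-envelope check makes these numbers concrete. Weight memory is just parameter count times bytes per parameter; this sketch covers weights only, so real-world figures run higher once activations, the text encoder, and VAE buffers are added:

```python
# Weight memory alone: parameters x bytes per parameter. Activations,
# the text encoder, and VAE buffers come on top of these floors.
def weight_gb(params_billion: float, bytes_per_param: int) -> float:
    # params_billion * 1e9 params * bytes / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

for name, bytes_pp in [("FP16/BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{weight_gb(22, bytes_pp):.0f} GB of weights")
```

The ~44GB and ~22GB weight floors line up with the ~46GB full-precision and 23-30GB FP8 figures quoted above once overhead is included.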
Quick Fixes (Start Here)
1. Update ComfyUI and PyTorch
ComfyUI’s recent versions include significant memory optimizations:
cd ComfyUI
git pull
pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128
As of March 2026, ComfyUI has Dynamic VRAM enabled by default, which significantly reduces RAM usage and helps prevent VRAM OOMs.
2. Add the --lowvram Flag
The single most effective quick fix:
python main.py --lowvram
This instructs ComfyUI to:
- Split the UNET model across CPU and GPU
- Offload inactive model weights to system RAM
- Use minimal VRAM for text encoders
For extremely limited VRAM, use --novram, which keeps as little as possible on the GPU (or --cpu to run entirely on the CPU, at a severe speed cost).
3. Reserve VRAM for Your OS
Prevent system instability by reserving VRAM:
python main.py --lowvram --reserve-vram 2
The value 2 reserves 2GB of VRAM for your operating system and other applications.
4. Use FP8 Model + Distilled LoRA
The official LTX-2.3 workflow includes an 8-step distilled LoRA that dramatically reduces generation steps:
- Reduces VRAM from ~46GB to ~23-30GB
- Faster generation (8 steps vs 30+)
- Minimal quality loss
Download the FP8 model and LoRA from the official LTX-Video repository.
Precision Settings Explained
FP8 vs FP16 vs BF16
| Precision | VRAM Usage | Quality | Speed | Supported GPUs |
|---|---|---|---|---|
| FP32 | 100% | Best | Slowest | All |
| FP16 | 50% | Good | Fast | GTX 10-series+ |
| BF16 | 50% | Better | Fast | RTX 30-series+ |
| FP8 | 25% | Good | Fastest | RTX 40-series+ |
FP8 (Recommended for LTX-2.3)
NVIDIA’s FP8 format uses 8 bits instead of 16 or 32, reducing memory by 75% compared to FP32:
python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc
The e4m3fn variant is optimized for inference (4 exponent bits, 3 mantissa bits).
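To see what those bit counts mean in practice, here is a small pure-Python decoder for the e4m3fn layout, a sketch following the OCP FP8 convention (exponent bias 7, no infinities, and the all-ones exponent-plus-mantissa pattern reserved for NaN):

```python
# Sketch of the FP8 e4m3fn layout: 1 sign bit, 4 exponent bits (bias 7),
# 3 mantissa bits. "fn" = finite-only: no infinities; S.1111.111 is NaN.
def decode_e4m3fn(byte: int) -> float:
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")
    if exp == 0:  # subnormal range
        return sign * (man / 8) * 2 ** (1 - 7)
    return sign * (1 + man / 8) * 2 ** (exp - 7)

print(decode_e4m3fn(0b0_1111_110))  # largest finite value: 448.0
```

The narrow 3-bit mantissa is why FP8 weights pair well with a distilled LoRA: the LoRA was trained to tolerate the reduced precision.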
BF16 (Alternative)
If you have an RTX 30-series or newer, BF16 offers FP32-like dynamic range with FP16 memory usage:
python main.py --lowvram --bf16-unet --bf16-vae
Complete Precision Flags Reference
# UNET (diffusion model) precision
--fp8_e4m3fn-unet # Recommended for LTX-2.3
--fp8_e5m2-unet # Alternative FP8 format
--fp16-unet # Standard half precision
--bf16-unet # Best for newer GPUs
# VAE precision
--fp16-vae # Saves VRAM, potential quality loss
--bf16-vae # Better quality than fp16
--fp32-vae # Best quality, highest VRAM
# Text encoder precision
--fp8_e4m3fn-text-enc # Minimal VRAM for text
--fp16-text-enc # Standard precision
ComfyUI Memory Management Flags
VRAM Mode Flags
# Limited VRAM (8-12GB) — Most common for consumer GPUs
python main.py --lowvram
# Very limited VRAM (<8GB) — Uses system RAM heavily
python main.py --novram
# Abundant VRAM (24GB+) — Keeps models in GPU
python main.py --highvram
# Force normal mode (if lowvram auto-enabled incorrectly)
python main.py --normalvram
Advanced Memory Flags
# Disable smart memory (may help with specific OOM issues)
python main.py --disable-smart-memory
# Cache control
--cache-none # No caching, lowest memory usage
--cache-lru 5 # LRU cache, limited items
--cache-classic # Traditional aggressive caching
# Cross-attention methods
--use-split-cross-attention # Lower VRAM, slower
--use-pytorch-cross-attention # Faster, more VRAM
# VAE on CPU (significant VRAM savings, slower)
python main.py --lowvram --cpu-vae
Recommended Configurations by GPU
RTX 3060/3070 (8-12GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc --reserve-vram 1 --cpu-vae
RTX 3080/4070 (12-16GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2
RTX 4080/4090 (16-24GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae
RTX 5090 (32GB+ VRAM)
python main.py --highvram --fp8_e4m3fn-unet --bf16-vae
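These presets can be folded into a small launcher that picks flags from the VRAM nvidia-smi reports. A sketch with hypothetical helper names (`pick_flags`, `detect_vram_mib`); the thresholds mirror the table above, and the last line prints the command rather than launching it:

```python
import shutil
import subprocess

def pick_flags(vram_mib: int) -> str:
    """Map total VRAM (MiB) to the launch flags from the table above."""
    if vram_mib >= 30000:
        return "--highvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_mib >= 16000:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_mib >= 12000:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2"
    return ("--lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc "
            "--reserve-vram 1 --cpu-vae")

def detect_vram_mib(default: int = 12288) -> int:
    """Query nvidia-smi; fall back to `default` when it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return default
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    try:
        return int(out.stdout.splitlines()[0].strip())
    except (IndexError, ValueError):
        return default

print("python main.py", pick_flags(detect_vram_mib()))
```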
Workflow Optimizations
Resolution Guidelines
| Resolution | Minimum VRAM | Recommended Config |
|---|---|---|
| 512x512 | 6GB | GGUF Q4 + CPU VAE |
| 768x512 | 12GB | FP8 + lowvram |
| 720x480 | 12GB | FP8 + lowvram |
| 1024x576 | 16GB | FP8 + lowvram |
| 1280x720 | 24GB | FP8 + standard |
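The VRAM column roughly tracks pixel count per frame. A quick ratio check shows why each resolution tier costs so much more; this is a rule-of-thumb sketch that ignores attention's superlinear terms:

```python
# Activation memory grows at least linearly with pixels per frame.
base = 512 * 512
for w, h in [(512, 512), (768, 512), (1024, 576), (1280, 720)]:
    print(f"{w}x{h}: {w * h / base:.2f}x the pixels of 512x512")
```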
Frame Count Limits
LTX-2.3 generates 257 frames by default (8.5 seconds at 30fps). Each frame increases VRAM linearly during VAE decode:
- 257 frames: Full model memory
- 121 frames: ~50% memory reduction
- 50 frames: Suitable for 6GB GPUs
For long videos, process in segments or use a batch manager node.
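Segment processing can be as simple as splitting the frame budget into fixed-size chunks. A minimal sketch, using the 50-frame figure quoted above as the default chunk size:

```python
def frame_chunks(total_frames: int, chunk: int = 50) -> list[int]:
    """Split a long generation into decode-sized segments."""
    return [min(chunk, total_frames - i)
            for i in range(0, total_frames, chunk)]

print(frame_chunks(257))  # [50, 50, 50, 50, 50, 7]
```

Each segment is then generated (or VAE-decoded) independently and the clips concatenated afterwards.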
Kijai Optimized Workflow
The Kijai workflow separates the VAE from the model, reducing VRAM:
- Load LTX-2.3 FP8 model
- VAE runs separately (can offload to CPU)
- Distilled LoRA for 8-step generation
- VRAM reduction: 29GB → 23GB
GGUF Quantized Workflows
QuantStack’s GGUF quantized models shrink memory further:
| Quantization | VRAM Usage | Quality Loss |
|---|---|---|
| Q4 K-means | ~18GB | Moderate |
| Q5 | ~20GB | Minor |
| Q6 | ~22GB | Minimal |
| Q8 | ~28GB | Negligible |
Use with tiled VAE decode for additional memory savings.
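Tiled decode works by covering the latent plane with overlapping tiles so peak memory scales with tile size rather than full resolution. A coordinate-generator sketch with assumed tile and overlap sizes; the actual decode-and-blend step is omitted:

```python
def tile_origins(size: int, tile: int = 256, overlap: int = 32) -> list[int]:
    """Origins of overlapping tiles covering one axis of a frame."""
    if size <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, size - tile, step))
    origins.append(size - tile)  # final tile flush with the edge
    return origins

# Cover a 1280x720 frame: decode each tile x tile window separately,
# then blend the overlapping regions.
coords = [(x, y) for y in tile_origins(720) for x in tile_origins(1280)]
print(len(coords), "tiles instead of one full-frame decode")
```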
Memory Cleanup Between Generations
Why torch.cuda.empty_cache() Isn’t Enough
Many users discover that torch.cuda.empty_cache() doesn’t actually free memory. Here’s why:
import gc
import torch

# On its own, this frees only unreferenced cached blocks — usually nothing
torch.cuda.empty_cache()

# Proper cleanup requires all three steps
del model                 # delete the Python references first
del latent
gc.collect()              # run Python garbage collection
torch.cuda.empty_cache()  # now the cached blocks are actually released
PyTorch caches allocated memory for reuse. Objects with active references remain in memory even after empty_cache().
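Because this is ordinary Python reference counting, the behavior can be demonstrated without a GPU. In this sketch a weakref watches a stand-in "tensor" object to show that collection only happens once the last reference is dropped:

```python
import gc
import weakref

class FakeTensor:  # stand-in for a large CUDA tensor
    pass

t = FakeTensor()
alive = weakref.ref(t)        # observe the object without keeping it alive

gc.collect()                  # a cache-clearing step alone cannot free it...
assert alive() is not None    # ...because `t` still holds a reference

del t                         # drop the last reference, as with `del model`
gc.collect()
assert alive() is None        # now the allocator can return the memory
```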
ComfyUI Memory Cleanup Workflow
Insert cleanup nodes between stages:
- Free Memory (Model) — After model unload
- Free Memory (Latent) — After latent operations
- Clean VRAM Used — Between major workflow sections
Custom Nodes for Memory Management
Install via ComfyUI Manager:
- ComfyUI-FreeMemory — Free CUDA/system RAM at workflow points
- ComfyUI-MemoryManagement — Smart manager with leak detection
- ComfyUI-MemoryCleaner — Comprehensive cleanup + RAM overflow prevention
Common OOM Scenarios and Solutions
Scenario 1: First Generation Works, Second Fails
Cause: Memory isn’t released between generations
Solution:
# Add to startup
python main.py --lowvram --disable-smart-memory
# In workflow, add Free Memory nodes after VAE decode
Scenario 2: OOM During VAE Decode
Cause: VAE requires significant VRAM for video frames
Solutions:
- Run VAE on CPU with --cpu-vae
- Use tiled VAE decode (process frames in batches)
- Lower resolution before VAE, upscale after
Scenario 3: System Freezes Completely
Cause: System RAM exhaustion from VRAM offloading
Solutions:
- Increase swap file (1.5x your RAM on NVMe SSD)
- Close other applications
- Use --reserve-vram to keep VRAM headroom for the OS and display
Scenario 4: Black Images After Generation
Cause: VAE precision issues (FP16 can cause black outputs)
Solution:
python main.py --lowvram --fp8_e4m3fn-unet --fp32-vae
Keep VAE at FP32 even with FP8 model.
Troubleshooting Checklist
Run through this list before posting for help:
- ComfyUI updated to latest git version
- PyTorch updated to CUDA 12.8+
- Using FP8 model + distilled LoRA
- --lowvram flag added
- VRAM reserved for OS (--reserve-vram 2)
- Memory cleared between generations
- VRAM monitored with nvidia-smi
- Swap file increased (1.5x RAM on SSD)
- Other GPU applications closed
- ComfyUI restarted between long sessions
Monitoring Your Setup
Check VRAM Usage
# Real-time monitoring
watch -n 1 nvidia-smi
# Or use Python
python -c "import torch; print(f'VRAM: {torch.cuda.memory_allocated()/1e9:.2f}GB / {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB')"
Find Model Sizes
# Check model files
ls -lh models/checkpoints/
ls -lh models/vae/
ls -lh models/loras/
ComfyUI Memory Stats
The ComfyUI interface shows memory usage in the footer. For detailed stats:
# Add to a custom node or script
import torch
import gc
def print_memory():
    print(f"CUDA Allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
    print(f"CUDA Reserved: {torch.cuda.memory_reserved()/1e9:.2f}GB")
    print(f"CUDA Max Allocated: {torch.cuda.max_memory_allocated()/1e9:.2f}GB")
    torch.cuda.reset_peak_memory_stats()
What’s Coming: NVFP4 Support
NVIDIA has announced NVFP4 support for LTX-2.3, expected in 2026:
- 60% memory reduction compared to FP8
- 2.5x faster generation with optimizations
- Requires Blackwell architecture (RTX 50-series)
If you’re running a 5090, you’ll soon be able to run LTX-2.3 at full resolution without the memory tricks in this guide.
Summary
Running LTX-2.3 on consumer GPUs comes down to three principles:
- Precision reduction — Use FP8 models and text encoders
- Aggressive offloading — --lowvram and --cpu-vae when needed
- Memory discipline — Clean up between generations with gc.collect() + torch.cuda.empty_cache()
With these optimizations, even a 12GB RTX 3060 can generate LTX-2.3 videos—albeit at lower resolutions. As NVIDIA continues improving FP8 and NVFP4 support, the memory requirements will only decrease.
For the latest workflows and optimizations, check the LTX-Video GitHub discussions and the ComfyUI community.