Fixing OOM Errors in ComfyUI with LTX-2.3 and NVIDIA GPUs
Troubleshoot and resolve out-of-memory errors when running LTX-2.3 video generation in ComfyUI on NVIDIA GPUs. Learn memory optimization techniques, precision settings, and workflow configurations.
Table of Contents
- Understanding the OOM Error
- Types of Memory Errors
- Why LTX-2.3 Is Particularly Demanding
- Quick Fixes (Start Here)
- 1. Update ComfyUI and PyTorch
- 2. Add the --lowvram Flag
- 3. Reserve VRAM for Your OS
- 4. Use FP8 Model + Distilled LoRA
- Precision Settings Explained
- FP8 vs FP16 vs BF16
- Complete Precision Flags Reference
- ComfyUI Memory Management Flags
- VRAM Mode Flags
- Advanced Memory Flags
- Recommended Configurations by GPU
- Workflow Optimizations
- Resolution Guidelines
- Frame Count Limits
- Kijai Optimized Workflow
- GGUF Quantized Workflows
- Memory Cleanup Between Generations
- Why torch.cuda.empty_cache() Isn’t Enough
- ComfyUI Memory Cleanup Workflow
- Custom Nodes for Memory Management
- Common OOM Scenarios and Solutions
- Scenario 1: First Generation Works, Second Fails
- Scenario 2: OOM During VAE Decode
- Scenario 3: System Freezes Completely
- Scenario 4: Black Images After Generation
- Troubleshooting Checklist
- Monitoring Your Setup
- Check VRAM Usage
- Find Model Sizes
- ComfyUI Memory Stats
- What’s Coming: NVFP4 Support
- Summary
Fixing OOM Errors in ComfyUI with LTX-2.3 and NVIDIA GPUs
You’ve just installed LTX-2.3, loaded up the workflow, hit Queue Prompt, and then—disaster. The dreaded RuntimeError: CUDA out of memory error crashes your generation. Sound familiar?
LTX-2.3 is a massive 22-billion parameter video model that can consume up to 46GB of VRAM at full precision. Even the optimized FP8 version needs 23-30GB. But here’s the good news: with the right configuration, you can run LTX-2.3 on as little as 6GB of VRAM.
This guide walks through every OOM troubleshooting step, from quick fixes to advanced optimizations, specifically for NVIDIA GPUs running LTX-2.3 in ComfyUI.
Understanding the OOM Error
Types of Memory Errors
CUDA Out of Memory is the most common error you’ll encounter:
RuntimeError: CUDA out of memory. Tried to allocate 12.50 GiB
This means your GPU’s VRAM is exhausted. But there are actually three distinct memory issues:
- VRAM Exhaustion — GPU memory is completely filled with model weights, intermediate tensors, and cached data
- System RAM Issues — ComfyUI offloads to system RAM when VRAM runs out, which can crash your entire system
- Memory Leaks — Memory isn’t properly released between generations, accumulating until failure
Why LTX-2.3 Is Particularly Demanding
LTX-2.3 introduces several memory-intensive features:
- 22B parameters — One of the largest open video models
- Native 9:16 portrait support — Requires processing different aspect ratios
- 4x larger text connector — Better prompt adherence, but more memory for text encoding
- Improved VAE — Higher quality output but larger intermediate tensors
The baseline FP8 model weighs in at ~30GB. Without optimization, you’d need a 4090 or better just to load it.
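A quick back-of-envelope check makes these numbers concrete. Weight memory is just parameter count times bytes per parameter; this sketch covers weights only, so real-world figures run higher once activations, the text encoder, and VAE buffers are added:

```python
# Weight memory alone: parameters x bytes per parameter. Activations,
# the text encoder, and VAE buffers come on top of these floors.
def weight_gb(params_billion: float, bytes_per_param: int) -> float:
    # params_billion * 1e9 params * bytes / 1e9 bytes-per-GB
    return params_billion * bytes_per_param

for name, bytes_pp in [("FP16/BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{weight_gb(22, bytes_pp):.0f} GB of weights")
```

The ~44GB and ~22GB weight floors line up with the ~46GB full-precision and 23-30GB FP8 figures quoted above once overhead is included.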
Quick Fixes (Start Here)
1. Update ComfyUI and PyTorch
ComfyUI’s recent versions include significant memory optimizations:
cd ComfyUI
git pull
pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128
As of March 2026, ComfyUI has Dynamic VRAM enabled by default, which significantly reduces RAM usage and helps prevent VRAM OOMs.
2. Add the --lowvram Flag
The single most effective quick fix:
python main.py --lowvram
This instructs ComfyUI to:
- Split the UNET model across CPU and GPU
- Offload inactive model weights to system RAM
- Use minimal VRAM for text encoders
For extremely limited VRAM, use --novram, which keeps as little as possible on the GPU (or --cpu to run entirely on the CPU, at a severe speed cost).
3. Reserve VRAM for Your OS
Prevent system instability by reserving VRAM:
python main.py --lowvram --reserve-vram 2
The value 2 reserves 2GB of VRAM for your operating system and other applications.
4. Use FP8 Model + Distilled LoRA
The official LTX-2.3 workflow includes an 8-step distilled LoRA that dramatically reduces generation steps:
- Reduces VRAM from ~46GB to ~23-30GB
- Faster generation (8 steps vs 30+)
- Minimal quality loss
Download the FP8 model and LoRA from the official LTX-Video repository.
Precision Settings Explained
FP8 vs FP16 vs BF16
| Precision | VRAM Usage | Quality | Speed | Supported GPUs |
|---|---|---|---|---|
| FP32 | 100% | Best | Slowest | All |
| FP16 | 50% | Good | Fast | GTX 10-series+ |
| BF16 | 50% | Better | Fast | RTX 30-series+ |
| FP8 | 25% | Good | Fastest | RTX 40-series+ |
FP8 (Recommended for LTX-2.3)
NVIDIA’s FP8 format uses 8 bits instead of 16 or 32, reducing memory by 75% compared to FP32:
python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc
The e4m3fn variant is optimized for inference (4 exponent bits, 3 mantissa bits).
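To see what those bit counts mean in practice, here is a small pure-Python decoder for the e4m3fn layout, a sketch following the OCP FP8 convention (exponent bias 7, no infinities, and the all-ones exponent-plus-mantissa pattern reserved for NaN):

```python
# Sketch of the FP8 e4m3fn layout: 1 sign bit, 4 exponent bits (bias 7),
# 3 mantissa bits. "fn" = finite-only: no infinities; S.1111.111 is NaN.
def decode_e4m3fn(byte: int) -> float:
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")
    if exp == 0:  # subnormal range
        return sign * (man / 8) * 2 ** (1 - 7)
    return sign * (1 + man / 8) * 2 ** (exp - 7)

print(decode_e4m3fn(0b0_1111_110))  # largest finite value: 448.0
```

The narrow 3-bit mantissa is why FP8 weights pair well with a distilled LoRA: the LoRA was trained to tolerate the reduced precision.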
BF16 (Alternative)
If you have an RTX 30-series or newer, BF16 offers FP32-like dynamic range with FP16 memory usage:
python main.py --lowvram --bf16-unet --bf16-vae
Complete Precision Flags Reference
# UNET (diffusion model) precision
--fp8_e4m3fn-unet # Recommended for LTX-2.3
--fp8_e5m2-unet # Alternative FP8 format
--fp16-unet # Standard half precision
--bf16-unet # Best for newer GPUs
# VAE precision
--fp16-vae # Saves VRAM, potential quality loss
--bf16-vae # Better quality than fp16
--fp32-vae # Best quality, highest VRAM
# Text encoder precision
--fp8_e4m3fn-text-enc # Minimal VRAM for text
--fp16-text-enc # Standard precision
ComfyUI Memory Management Flags
VRAM Mode Flags
# Limited VRAM (8-12GB) — Most common for consumer GPUs
python main.py --lowvram
# Very limited VRAM (<8GB) — Uses system RAM heavily
python main.py --novram
# Abundant VRAM (24GB+) — Keeps models in GPU
python main.py --highvram
# Force normal mode (if lowvram auto-enabled incorrectly)
python main.py --normalvram
Advanced Memory Flags
# Disable smart memory (may help with specific OOM issues)
python main.py --disable-smart-memory
# Cache control
--cache-none # No caching, lowest memory usage
--cache-lru 5 # LRU cache, limited items
--cache-classic # Traditional aggressive caching
# Cross-attention methods
--use-split-cross-attention # Lower VRAM, slower
--use-pytorch-cross-attention # Faster, more VRAM
# VAE on CPU (significant VRAM savings, slower)
python main.py --lowvram --cpu-vae
Recommended Configurations by GPU
RTX 3060/3070 (8-12GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc --reserve-vram 1 --cpu-vae
RTX 3080/4070 (12-16GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2
RTX 4080/4090 (16-24GB VRAM)
python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae
RTX 5090 (32GB+ VRAM)
python main.py --highvram --fp8_e4m3fn-unet --bf16-vae
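These presets can be folded into a small launcher that picks flags from the VRAM nvidia-smi reports. A sketch with hypothetical helper names (`pick_flags`, `detect_vram_mib`); the thresholds mirror the table above, and the last line prints the command rather than launching it:

```python
import shutil
import subprocess

def pick_flags(vram_mib: int) -> str:
    """Map total VRAM (MiB) to the launch flags from the table above."""
    if vram_mib >= 30000:
        return "--highvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_mib >= 16000:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_mib >= 12000:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2"
    return ("--lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc "
            "--reserve-vram 1 --cpu-vae")

def detect_vram_mib(default: int = 12288) -> int:
    """Query nvidia-smi; fall back to `default` when it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return default
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    try:
        return int(out.stdout.splitlines()[0].strip())
    except (IndexError, ValueError):
        return default

print("python main.py", pick_flags(detect_vram_mib()))
```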
Workflow Optimizations
Resolution Guidelines
| Resolution | Minimum VRAM | Recommended Config |
|---|---|---|
| 512x512 | 6GB | GGUF Q4 + CPU VAE |
| 768x512 | 12GB | FP8 + lowvram |
| 720x480 | 12GB | FP8 + lowvram |
| 1024x576 | 16GB | FP8 + lowvram |
| 1280x720 | 24GB | FP8 + standard |
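The VRAM column roughly tracks pixel count per frame. A quick ratio check shows why each resolution tier costs so much more; this is a rule-of-thumb sketch that ignores attention's superlinear terms:

```python
# Activation memory grows at least linearly with pixels per frame.
base = 512 * 512
for w, h in [(512, 512), (768, 512), (1024, 576), (1280, 720)]:
    print(f"{w}x{h}: {w * h / base:.2f}x the pixels of 512x512")
```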
Frame Count Limits
LTX-2.3 generates 257 frames by default (8.5 seconds at 30fps). Each frame increases VRAM linearly during VAE decode:
- 257 frames: Full model memory
- 121 frames: ~50% memory reduction
- 50 frames: Suitable for 6GB GPUs
For long videos, process in segments or use a batch manager node.
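Segment processing can be as simple as splitting the frame budget into fixed-size chunks. A minimal sketch, using the 50-frame figure quoted above as the default chunk size:

```python
def frame_chunks(total_frames: int, chunk: int = 50) -> list[int]:
    """Split a long generation into decode-sized segments."""
    return [min(chunk, total_frames - i)
            for i in range(0, total_frames, chunk)]

print(frame_chunks(257))  # [50, 50, 50, 50, 50, 7]
```

Each segment is then generated (or VAE-decoded) independently and the clips concatenated afterwards.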
Kijai Optimized Workflow
The Kijai workflow separates the VAE from the model, reducing VRAM:
- Load LTX-2.3 FP8 model
- VAE runs separately (can offload to CPU)
- Distilled LoRA for 8-step generation
- VRAM reduction: 29GB → 23GB
GGUF Quantized Workflows
QuantStack’s GGUF quantized models shrink memory further:
| Quantization | VRAM Usage | Quality Loss |
|---|---|---|
| Q4 K-means | ~18GB | Moderate |
| Q5 | ~20GB | Minor |
| Q6 | ~22GB | Minimal |
| Q8 | ~28GB | Negligible |
Use with tiled VAE decode for additional memory savings.
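Tiled decode works by covering the latent plane with overlapping tiles so peak memory scales with tile size rather than full resolution. A coordinate-generator sketch with assumed tile and overlap sizes; the actual decode-and-blend step is omitted:

```python
def tile_origins(size: int, tile: int = 256, overlap: int = 32) -> list[int]:
    """Origins of overlapping tiles covering one axis of a frame."""
    if size <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, size - tile, step))
    origins.append(size - tile)  # final tile flush with the edge
    return origins

# Cover a 1280x720 frame: decode each tile x tile window separately,
# then blend the overlapping regions.
coords = [(x, y) for y in tile_origins(720) for x in tile_origins(1280)]
print(len(coords), "tiles instead of one full-frame decode")
```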
Memory Cleanup Between Generations
Why torch.cuda.empty_cache() Isn’t Enough
Many users discover that torch.cuda.empty_cache() doesn’t actually free memory. Here’s why:
import gc
import torch

# On its own, this frees only unreferenced cached blocks — usually nothing
torch.cuda.empty_cache()

# Proper cleanup requires all three steps
del model                 # delete the Python references first
del latent
gc.collect()              # run Python garbage collection
torch.cuda.empty_cache()  # now the cached blocks are actually released
PyTorch caches allocated memory for reuse. Objects with active references remain in memory even after empty_cache().
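Because this is ordinary Python reference counting, the behavior can be demonstrated without a GPU. In this sketch a weakref watches a stand-in "tensor" object to show that collection only happens once the last reference is dropped:

```python
import gc
import weakref

class FakeTensor:  # stand-in for a large CUDA tensor
    pass

t = FakeTensor()
alive = weakref.ref(t)        # observe the object without keeping it alive

gc.collect()                  # a cache-clearing step alone cannot free it...
assert alive() is not None    # ...because `t` still holds a reference

del t                         # drop the last reference, as with `del model`
gc.collect()
assert alive() is None        # now the allocator can return the memory
```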
ComfyUI Memory Cleanup Workflow
Insert cleanup nodes between stages:
- Free Memory (Model) — After model unload
- Free Memory (Latent) — After latent operations
- Clean VRAM Used — Between major workflow sections
Custom Nodes for Memory Management
Install via ComfyUI Manager:
- ComfyUI-FreeMemory — Free CUDA/system RAM at workflow points
- ComfyUI-MemoryManagement — Smart manager with leak detection
- ComfyUI-MemoryCleaner — Comprehensive cleanup + RAM overflow prevention
Common OOM Scenarios and Solutions
Scenario 1: First Generation Works, Second Fails
Cause: Memory isn’t released between generations
Solution:
# Add to startup
python main.py --lowvram --disable-smart-memory
# In workflow, add Free Memory nodes after VAE decode
Scenario 2: OOM During VAE Decode
Cause: VAE requires significant VRAM for video frames
Solutions:
- Run VAE on CPU with --cpu-vae
- Use tiled VAE decode (process frames in batches)
- Lower resolution before VAE, upscale after
Scenario 3: System Freezes Completely
Cause: System RAM exhaustion from VRAM offloading
Solutions:
- Increase swap file (1.5x your RAM on NVMe SSD)
- Close other applications
- Use --reserve-vram to keep VRAM headroom for the OS and display
Scenario 4: Black Images After Generation
Cause: VAE precision issues (FP16 can cause black outputs)
Solution:
python main.py --lowvram --fp8_e4m3fn-unet --fp32-vae
Keep VAE at FP32 even with FP8 model.
Troubleshooting Checklist
Run through this list before posting for help:
- ComfyUI updated to latest git version
- PyTorch updated to CUDA 12.8+
- Using FP8 model + distilled LoRA
- --lowvram flag added
- VRAM reserved for OS (--reserve-vram 2)
- Memory cleared between generations
- VRAM monitored with nvidia-smi
- Swap file increased (1.5x RAM on SSD)
- Other GPU applications closed
- ComfyUI restarted between long sessions
Monitoring Your Setup
Check VRAM Usage
# Real-time monitoring
watch -n 1 nvidia-smi
# Or use Python
python -c "import torch; print(f'VRAM: {torch.cuda.memory_allocated()/1e9:.2f}GB / {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB')"
Find Model Sizes
# Check model files
ls -lh models/checkpoints/
ls -lh models/vae/
ls -lh models/loras/
ComfyUI Memory Stats
The ComfyUI interface shows memory usage in the footer. For detailed stats:
# Add to a custom node or script
import torch
import gc
def print_memory():
    print(f"CUDA Allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
    print(f"CUDA Reserved: {torch.cuda.memory_reserved()/1e9:.2f}GB")
    print(f"CUDA Max Allocated: {torch.cuda.max_memory_allocated()/1e9:.2f}GB")
    torch.cuda.reset_peak_memory_stats()
What’s Coming: NVFP4 Support
NVIDIA has announced NVFP4 support for LTX-2.3, expected in 2026:
- 60% memory reduction compared to FP8
- 2.5x faster generation with optimizations
- Requires Blackwell architecture (RTX 50-series)
If you’re running a 5090, you’ll soon be able to run LTX-2.3 at full resolution without the memory tricks in this guide.
Summary
Running LTX-2.3 on consumer GPUs comes down to three principles:
- Precision reduction — Use FP8 models and text encoders
- Aggressive offloading — --lowvram and --cpu-vae when needed
- Memory discipline — Clean up between generations with gc.collect() + torch.cuda.empty_cache()
With these optimizations, even a 12GB RTX 3060 can generate LTX-2.3 videos—albeit at lower resolutions. As NVIDIA continues improving FP8 and NVFP4 support, the memory requirements will only decrease.
For the latest workflows and optimizations, check the LTX-Video GitHub discussions and the ComfyUI community.