Fixing OOM Errors in ComfyUI with LTX-2.3 and NVIDIA GPUs

Troubleshoot and resolve out-of-memory errors when running LTX-2.3 video generation in ComfyUI on NVIDIA GPUs. Learn memory optimization techniques, precision settings, and workflow configurations.

You’ve just installed LTX-2.3, loaded up the workflow, hit Queue Prompt, and then—disaster. The dreaded RuntimeError: CUDA out of memory error crashes your generation. Sound familiar?

LTX-2.3 is a massive 22-billion parameter video model that can consume up to 46GB of VRAM at full precision. Even the optimized FP8 version requires at least 23-30GB. But here’s the good news: with the right configuration, you can run LTX-2.3 on as little as 6GB of VRAM.

This guide walks through every OOM troubleshooting step, from quick fixes to advanced optimizations, specifically for NVIDIA GPUs running LTX-2.3 in ComfyUI.

Understanding the OOM Error

Types of Memory Errors

CUDA Out of Memory is the most common error you’ll encounter:

RuntimeError: CUDA out of memory. Tried to allocate 12.50 GiB

This means your GPU’s VRAM is exhausted. But there are actually three distinct memory issues:

  1. VRAM Exhaustion — GPU memory is completely filled with model weights, intermediate tensors, and cached data
  2. System RAM Issues — ComfyUI offloads to system RAM when VRAM runs out, which can crash your entire system
  3. Memory Leaks — Memory isn’t properly released between generations, accumulating until failure
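When an OOM hits inside a custom node or script, the error message itself tells you how large the failed allocation was, which is useful for deciding how much resolution or frame count to shave off. A small hypothetical helper (not part of ComfyUI) that pulls the size out of PyTorch's standard message format:

```python
import re

def parse_oom_request(message: str):
    """Extract the requested allocation size (in GiB) from a CUDA OOM message.

    Returns None if the message doesn't match PyTorch's usual format.
    """
    m = re.search(r"Tried to allocate ([\d.]+) (GiB|MiB)", message)
    if m is None:
        return None
    size = float(m.group(1))
    # Normalize MiB to GiB so callers can compare sizes directly
    return size / 1024 if m.group(2) == "MiB" else size

msg = "RuntimeError: CUDA out of memory. Tried to allocate 12.50 GiB"
print(parse_oom_request(msg))  # 12.5
```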

Why LTX-2.3 Is Particularly Demanding

LTX-2.3 introduces several memory-intensive features:

  • 22B parameters — One of the largest open video models
  • Native 9:16 portrait support — Requires processing different aspect ratios
  • 4x larger text connector — Better prompt adherence, but more memory for text encoding
  • Improved VAE — Higher quality output but larger intermediate tensors

The baseline FP8 model weighs in at ~30GB. Without optimization, you’d need a 4090 or better just to load it.
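The arithmetic behind these figures is simple: the weights alone cost parameters × bytes per parameter. A back-of-the-envelope sketch (weights only; activations, the text encoder, and the VAE add more on top, which is why the real totals run higher):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in GB."""
    return params_billions * bytes_per_param

# 22B parameters at each precision
for name, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("FP8", 1)]:
    print(f"{name}: ~{weight_gb(22, nbytes):.0f} GB of weights")
```

The 16-bit result (~44GB of weights) lines up with the ~46GB "full precision" figure above, and FP8 halves it to ~22GB before runtime overhead.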

Quick Fixes (Start Here)

1. Update ComfyUI and PyTorch

ComfyUI’s recent versions include significant memory optimizations:

cd ComfyUI
git pull
pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128

As of March 2026, ComfyUI has Dynamic VRAM enabled by default, which massively reduces RAM usage and prevents VRAM OOMs.

2. Add the --lowvram Flag

The single most effective quick fix:

python main.py --lowvram

This instructs ComfyUI to:

  • Split the UNET model across CPU and GPU
  • Offload inactive model weights to system RAM
  • Use minimal VRAM for text encoders

For extremely limited VRAM, use --novram (keeps model weights in system RAM, streaming them to the GPU only as needed); to avoid the GPU entirely, use --cpu.

3. Reserve VRAM for Your OS

Prevent system instability by reserving VRAM:

python main.py --lowvram --reserve-vram 2

The trailing 2 reserves 2GB of VRAM for your operating system and other applications.

4. Use FP8 Model + Distilled LoRA

The official LTX-2.3 workflow includes an 8-step distilled LoRA that dramatically reduces generation steps:

  • Reduces VRAM from ~46GB to ~23-30GB
  • Faster generation (8 steps vs 30+)
  • Minimal quality loss

Download the FP8 model and LoRA from the official LTX-Video repository.

Precision Settings Explained

FP8 vs FP16 vs BF16

Precision   VRAM Usage   Quality   Speed     Supported GPUs
FP32        100%         Best      Slowest   All
FP16        50%          Good      Fast      GTX 10-series+
BF16        50%          Better    Fast      RTX 30-series+
FP8         25%          Good      Fastest   RTX 40-series+

FP8 (Recommended for LTX-2.3)

NVIDIA’s FP8 format uses 8 bits instead of 16 or 32, reducing memory by 75% compared to FP32:

python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc

The e4m3fn variant is optimized for inference (4 exponent bits, 3 mantissa bits).
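For intuition, the bit split determines the representable range. A small sketch that computes the largest finite value of each FP8 variant from its exponent/mantissa layout (e4m3fn reclaims the exponent code normally used for Inf/NaN, which extends its range; e5m2 follows the usual IEEE convention):

```python
def fp8_max(exp_bits: int, man_bits: int, extended: bool = False) -> float:
    """Largest finite value of a sign+exponent+mantissa float format.

    extended=True models e4m3fn, which repurposes the top exponent code
    (normally reserved for Inf/NaN) for finite values.
    """
    bias = 2 ** (exp_bits - 1) - 1
    if extended:
        # Top exponent, all-but-lowest mantissa bits set (e4m3fn: S.1111.110)
        return (2 - 2 * 2 ** -man_bits) * 2 ** (2 ** exp_bits - 1 - bias)
    # Top *normal* exponent, all mantissa bits set
    return (2 - 2 ** -man_bits) * 2 ** (2 ** exp_bits - 2 - bias)

print(fp8_max(4, 3, extended=True))  # 448.0   (e4m3fn)
print(fp8_max(5, 2))                 # 57344.0 (e5m2)
```

e4m3fn's narrow range (max 448) is fine for inference activations, while e5m2 trades mantissa precision for a much wider range, closer to FP16 behavior.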

BF16 (Alternative)

If you have an RTX 30-series or newer, BF16 offers FP32-like dynamic range with FP16 memory usage:

python main.py --lowvram --bf16-unet --bf16-vae

Complete Precision Flags Reference

# UNET (diffusion model) precision
--fp8_e4m3fn-unet    # Recommended for LTX-2.3
--fp8_e5m2-unet      # Alternative FP8 format
--fp16-unet          # Standard half precision
--bf16-unet          # Best for newer GPUs

# VAE precision
--fp16-vae           # Saves VRAM, potential quality loss
--bf16-vae           # Better quality than fp16
--fp32-vae           # Best quality, highest VRAM

# Text encoder precision
--fp8_e4m3fn-text-enc  # Minimal VRAM for text
--fp16-text-enc        # Standard precision

ComfyUI Memory Management Flags

VRAM Mode Flags

# Limited VRAM (8-12GB) — Most common for consumer GPUs
python main.py --lowvram

# Very limited VRAM (<8GB) — Uses system RAM heavily
python main.py --novram

# Abundant VRAM (24GB+) — Keeps models in GPU
python main.py --highvram

# Force normal mode (if lowvram auto-enabled incorrectly)
python main.py --normalvram

Advanced Memory Flags

# Disable smart memory (may help with specific OOM issues)
python main.py --disable-smart-memory

# Cache control
--cache-none          # No caching, lowest memory usage
--cache-lru 5         # LRU cache, limited items
--cache-classic       # Traditional aggressive caching

# Cross-attention methods
--use-split-cross-attention      # Lower VRAM, slower
--use-pytorch-cross-attention    # Faster, more VRAM

# VAE on CPU (significant VRAM savings, slower)
python main.py --lowvram --cpu-vae

Recommended Configurations by GPU

RTX 3060/3070 (8-12GB VRAM)

python main.py --lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc --reserve-vram 1 --cpu-vae

RTX 3080/4070 (12-16GB VRAM)

python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2

RTX 4080/4090 (16-24GB VRAM)

python main.py --lowvram --fp8_e4m3fn-unet --bf16-vae

RTX 5090 (32GB+ VRAM)

python main.py --highvram --fp8_e4m3fn-unet --bf16-vae
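The per-GPU commands above can be condensed into a small flag picker keyed on available VRAM. This is a sketch encoding this guide's recommendations, not an official ComfyUI utility, and the thresholds are the ones used in the sections above:

```python
def suggest_flags(vram_gb: float) -> str:
    """Suggest ComfyUI launch flags for LTX-2.3 based on available VRAM."""
    if vram_gb >= 32:
        return "--highvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_gb >= 16:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae"
    if vram_gb >= 12:
        return "--lowvram --fp8_e4m3fn-unet --bf16-vae --reserve-vram 2"
    # 8-12GB: squeeze everything, including the text encoder and VAE
    return ("--lowvram --fp8_e4m3fn-unet --fp8_e4m3fn-text-enc "
            "--reserve-vram 1 --cpu-vae")

print(suggest_flags(24))  # --lowvram --fp8_e4m3fn-unet --bf16-vae
```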

Workflow Optimizations

Resolution Guidelines

Resolution   Minimum VRAM   Recommended Config
512x512      6GB            GGUF Q4 + CPU VAE
768x512      12GB           FP8 + lowvram
720x480      12GB           FP8 + lowvram
1024x576     16GB           FP8 + lowvram
1280x720     24GB           FP8 + standard

Frame Count Limits

LTX-2.3 generates 257 frames by default (about 8.5 seconds at 30fps), and VRAM usage during VAE decode grows roughly linearly with frame count:

  • 257 frames: Full model memory
  • 121 frames: ~50% memory reduction
  • 50 frames: Suitable for 6GB GPUs

For long videos, process in segments or use a batch manager node.
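Segmenting amounts to splitting the frame range into overlapping windows, where the overlap frames give each segment shared context so cuts stay coherent. A minimal helper sketch (the segment and overlap sizes here are illustrative, not LTX-2.3 requirements):

```python
def frame_segments(total_frames: int, segment: int, overlap: int = 8):
    """Split a long video into overlapping (start, end) frame ranges.

    Overlapping frames can be blended or reused as motion context so
    segment boundaries stay temporally coherent.
    """
    assert overlap < segment
    ranges, start = [], 0
    while start < total_frames:
        end = min(start + segment, total_frames)
        ranges.append((start, end))
        if end == total_frames:
            break
        start = end - overlap  # rewind by the overlap for the next window
    return ranges

print(frame_segments(257, 121))  # [(0, 121), (113, 234), (226, 257)]
```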

Kijai Optimized Workflow

The Kijai workflow separates the VAE from the model, reducing VRAM:

  1. Load LTX-2.3 FP8 model
  2. VAE runs separately (can offload to CPU)
  3. Distilled LoRA for 8-step generation
  4. VRAM reduction: 29GB → 23GB

GGUF Quantized Workflows

QuantStack’s GGUF quantized models shrink memory further:

Quantization   VRAM Usage   Quality Loss
Q4 K-means     ~18GB        Moderate
Q5             ~20GB        Minor
Q6             ~22GB        Minimal
Q8             ~28GB        Negligible
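As a rough cross-check, weight-only footprints follow from bits per weight. The bits-per-weight values below are approximate llama.cpp-style figures (an assumption, since exact values vary by quant layout), and the table's totals run higher because they also include activations, the text encoder, and the VAE:

```python
# Approximate GGUF bits-per-weight (llama.cpp-style quants; rough values)
BPW = {"Q4_K": 4.5, "Q5": 5.5, "Q6": 6.6, "Q8": 8.5}

def gguf_weight_gb(params_billions: float, quant: str) -> float:
    """Weight-only footprint in GB for a given GGUF quantization level."""
    return params_billions * BPW[quant] / 8  # 8 bits per byte

for q in BPW:
    print(f"{q}: ~{gguf_weight_gb(22, q):.1f} GB of weights")
```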

Use with tiled VAE decode for additional memory savings.

Memory Cleanup Between Generations

Why torch.cuda.empty_cache() Isn’t Enough

Many users discover that torch.cuda.empty_cache() doesn’t actually free memory. Here’s why:

import gc
import torch

# On its own this frees only unreferenced cached blocks — objects
# with live Python references stay in VRAM
torch.cuda.empty_cache()

# Proper cleanup requires all three steps
del model          # Drop Python references
del latent
gc.collect()               # Python garbage collection
torch.cuda.empty_cache()   # Now the cached blocks are returned to CUDA

PyTorch caches allocated memory for reuse. Objects with active references remain in memory even after empty_cache().
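One way to enforce this discipline is a small context manager that runs collection and cache release after every generation block. This is a sketch, not a ComfyUI API, and it is guarded so it also runs on machines without torch or CUDA:

```python
import gc
from contextlib import contextmanager

@contextmanager
def vram_scope():
    """Run a generation step, then force memory cleanup on exit.

    Any tensors or models created (and dereferenced) inside the block
    are garbage-collected and their cached VRAM returned to CUDA.
    """
    try:
        yield
    finally:
        gc.collect()
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass  # CPU-only environment: gc.collect() is all we need
```

Usage: wrap each generation in `with vram_scope(): ...` and delete large intermediates before the block exits.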

ComfyUI Memory Cleanup Workflow

Insert cleanup nodes between stages:

  1. Free Memory (Model) — After model unload
  2. Free Memory (Latent) — After latent operations
  3. Clean VRAM Used — Between major workflow sections

Custom Nodes for Memory Management

Install via ComfyUI Manager:

  • ComfyUI-FreeMemory — Free CUDA/system RAM at workflow points
  • ComfyUI-MemoryManagement — Smart manager with leak detection
  • ComfyUI-MemoryCleaner — Comprehensive cleanup + RAM overflow prevention

Common OOM Scenarios and Solutions

Scenario 1: First Generation Works, Second Fails

Cause: Memory isn’t released between generations

Solution:

# Add to startup
python main.py --lowvram --disable-smart-memory

# In workflow, add Free Memory nodes after VAE decode

Scenario 2: OOM During VAE Decode

Cause: VAE requires significant VRAM for video frames

Solutions:

  1. Run VAE on CPU: --cpu-vae
  2. Use tiled VAE decode (process frames in batches)
  3. Lower resolution before VAE, upscale after
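Tiled decode works by splitting each frame into overlapping spatial tiles, decoding them independently, and blending the overlaps to hide seams; peak VRAM then scales with the tile size instead of the full frame. A sketch of the tile-coordinate computation (tile and overlap sizes are illustrative defaults):

```python
def vae_tiles(width: int, height: int, tile: int = 512, overlap: int = 64):
    """Overlapping (x0, y0, x1, y1) tiles covering a frame for tiled decode.

    Smaller tiles lower peak VRAM at the cost of more decode passes.
    """
    def starts(size):
        if size <= tile:
            return [0]
        step = tile - overlap
        s = list(range(0, size - tile + 1, step))
        if s[-1] != size - tile:
            s.append(size - tile)  # make sure the far edge is covered
        return s
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]

print(len(vae_tiles(1280, 720)))  # 6 tiles for a 720p frame
```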

Scenario 3: System Freezes Completely

Cause: System RAM exhaustion from VRAM offloading

Solutions:

  1. Increase swap file (1.5x your RAM on NVMe SSD)
  2. Close other applications
  3. Use --reserve-vram 2 so your desktop compositor keeps enough VRAM to stay responsive

Scenario 4: Black Images After Generation

Cause: VAE precision issues (FP16 can cause black outputs)

Solution:

python main.py --lowvram --fp8_e4m3fn-unet --fp32-vae

Keep VAE at FP32 even with FP8 model.
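A cheap sanity check for this failure mode is to test whether decoded frames come out essentially all-black. A hypothetical helper on pixel values normalized to 0..1 (not a ComfyUI node; the threshold is an assumption):

```python
def looks_black(frame, threshold: float = 0.02) -> bool:
    """Return True when every pixel in the frame is near zero.

    `frame` is a nested iterable of floats (rows of pixel values in 0..1),
    e.g. one channel of a decoded frame.
    """
    flat = [p for row in frame for p in row]
    return max(flat, default=0.0) < threshold

print(looks_black([[0.0, 0.001], [0.0, 0.0]]))  # True  — decode failed
print(looks_black([[0.5, 0.2], [0.1, 0.0]]))    # False — normal frame
```

If this fires on otherwise successful generations, switch the VAE back to FP32 before suspecting the workflow.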

Troubleshooting Checklist

Run through this list before posting for help:

  • ComfyUI updated to latest git version
  • PyTorch updated to CUDA 12.8+
  • Using FP8 model + distilled LoRA
  • --lowvram flag added
  • VRAM reserved for OS (--reserve-vram 2)
  • Memory cleared between generations
  • VRAM monitored with nvidia-smi
  • Swap file increased (1.5x RAM on SSD)
  • Other GPU applications closed
  • ComfyUI restarted between long sessions

Monitoring Your Setup

Check VRAM Usage

# Real-time monitoring
watch -n 1 nvidia-smi

# Or use Python
python -c "import torch; print(f'VRAM: {torch.cuda.memory_allocated()/1e9:.2f}GB / {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB')"

Find Model Sizes

# Check model files
ls -lh models/checkpoints/
ls -lh models/vae/
ls -lh models/loras/

ComfyUI Memory Stats

The ComfyUI interface shows memory usage in the footer. For detailed stats:

# Add to a custom node or script
import torch
import gc

def print_memory():
    print(f"CUDA Allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
    print(f"CUDA Reserved: {torch.cuda.memory_reserved()/1e9:.2f}GB")
    print(f"CUDA Max Allocated: {torch.cuda.max_memory_allocated()/1e9:.2f}GB")
    torch.cuda.reset_peak_memory_stats()

What’s Coming: NVFP4 Support

NVIDIA has announced NVFP4 support for LTX-2.3, expected in 2026:

  • 60% memory reduction compared to FP8
  • 2.5x faster generation with optimizations
  • Requires Blackwell architecture (RTX 50-series)

If you’re running a 5090, you’ll soon be able to run LTX-2.3 at full resolution without the memory tricks in this guide.

Summary

Running LTX-2.3 on consumer GPUs comes down to three principles:

  1. Precision reduction — Use FP8 models and text encoders
  2. Aggressive offloading — use --lowvram and --cpu-vae when needed
  3. Memory discipline — Clean up between generations with gc.collect() + empty_cache()

With these optimizations, even a 12GB RTX 3060 can generate LTX-2.3 videos—albeit at lower resolutions. As NVIDIA continues improving FP8 and NVFP4 support, the memory requirements will only decrease.

For the latest workflows and optimizations, check the LTX-Video GitHub discussions and the ComfyUI community.

Anthony Lattanzio

Tech Enthusiast & Builder

I'm a tech enthusiast who loves building things with hardware and software. By night, I run a homelab that's grown way beyond what any reasonable person needs. Check out about me for more.