LTX 2 Cinematic Video Generation with ComfyUI

The landscape of AI video generation has evolved dramatically. With LTX-2’s release as open source, creators now have access to a professional-grade video generation model that delivers synchronized audio and video in a single pass. Let’s explore how to harness its cinematic capabilities through ComfyUI.

What Makes LTX-2 Different

LTX-2 isn’t just another video model—it’s the first DiT-based foundation model to generate audio and video simultaneously. This matters because motion, dialogue, ambience, and music flow together naturally, eliminating the tedious process of manually synchronizing separate audio tracks.

Key capabilities:

Native 4K at 50 FPS — Sharp textures and clean motion for production-ready output
Synchronized audio — Audio generated with video, not layered on after
Multiple duration options — 6, 8, or 10 second clips (15 seconds coming soon)
Three performance tiers — Fast for concepting, Pro for reviews, Ultra for delivery
ComfyUI native integration — Built directly into ComfyUI for seamless workflows

Setting Up LTX-2 in ComfyUI

Installation

LTX-2 is integrated into ComfyUI through the LTXVideo custom nodes. To get started:

# Clone the custom nodes repository
cd ComfyUI/custom_nodes
git clone https://github.com/Lightricks/ComfyUI-LTXVideo.git

The model weights are available on HuggingFace:

ltxv-13b-0.9.8-dev — Highest quality, more VRAM
ltxv-13b-0.9.8-distilled — Balanced quality and speed
ltxv-2b-0.9.8-distilled — Fastest, lowest VRAM

Choosing the Right Model

Model	Quality	Speed	VRAM Required	Best For
13B-dev	Highest	Slower	High	Final delivery
13B-distilled	Good	Fast	Medium	Iteration
2B-distilled	Good	Fastest	Low	Rapid prototyping

For cinematic work, I recommend the 13B-dev model when quality is paramount, and the 13B-distilled model for faster iteration during concept development.
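
The table above can also be read as a simple lookup. The following is a minimal sketch: the checkpoint names come from the list above, while the stage labels (`prototype`, `iterate`, `final`) and the `pick_model` helper are hypothetical names invented for this illustration and are not part of LTX-2 or ComfyUI.

```python
# Checkpoint names are taken from the list above; the stage labels are
# hypothetical and exist only for this illustration.
MODEL_BY_STAGE = {
    "prototype": "ltxv-2b-0.9.8-distilled",   # fastest, lowest VRAM
    "iterate": "ltxv-13b-0.9.8-distilled",    # balanced quality and speed
    "final": "ltxv-13b-0.9.8-dev",            # highest quality, more VRAM
}


def pick_model(stage: str) -> str:
    """Return the checkpoint name suggested for a production stage."""
    try:
        return MODEL_BY_STAGE[stage]
    except KeyError:
        valid = ", ".join(sorted(MODEL_BY_STAGE))
        raise ValueError(f"unknown stage {stage!r}; expected one of: {valid}") from None


print(pick_model("iterate"))
```

Keeping the mapping in one place makes it easy to switch from the distilled checkpoint to the dev checkpoint once a shot is approved.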

[Figure: LTX-2 workflow pipeline]

The Cinematic Prompting Framework

LTX-2 responds exceptionally well to structured prompts. The key is thinking like a cinematographer rather than a casual user.

The Anatomy of a Cinematic Prompt

A well-crafted cinematic prompt follows this structure:

[Scene Description] + [Lighting & Atmosphere] + [Camera Movement] + [Character Action] + [Audio Elements] + [Technical Specs]

Let’s break this down with a real example:

A woman stands alone on a balcony late at night as warm yellow city glow and 
scattered neon reflections fall across her shoulders and the metal railing. 
The camera begins with a wide shot from a distance, slowly pushing forward 
through the cool night air. A gentle breeze moves strands of her hair while 
distant city lights blur softly. As the camera approaches, the framing 
transitions into a medium close-up, revealing her three-quarter profile. 
Color grading is slightly desaturated with teal shadows and warm highlights, 
inspired by Kodak 2383 print film emulation. Shot with a 50mm anamorphic 
equivalent lens at f2.0, natural film grain, 180 degree shutter.

Why this works:

Atmospheric foundation — Sets mood before action
Explicit camera path — Describes movement progression
Color science reference — Guides aesthetic direction
Technical specs — Locks in professional look
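
The six-slot structure above can be assembled programmatically so that every shot in a project follows the same order. This is only a sketch: the `CinematicPrompt` class and its field names are hypothetical, and the audio line in the usage example is illustrative text that is not part of the original prompt.

```python
from dataclasses import dataclass


@dataclass
class CinematicPrompt:
    """One field per slot in the structure above (field names are illustrative)."""
    scene: str
    lighting: str
    camera: str
    action: str
    audio: str
    technical: str

    def render(self) -> str:
        # Keep the slots in the documented order and skip any left empty.
        parts = [self.scene, self.lighting, self.camera,
                 self.action, self.audio, self.technical]
        sentences = [p.strip().rstrip(".") for p in parts if p.strip()]
        return ". ".join(sentences) + "." if sentences else ""


prompt = CinematicPrompt(
    scene="A woman stands alone on a balcony late at night",
    lighting="Warm yellow city glow and neon reflections fall across her shoulders",
    camera="The camera begins wide and slowly pushes in to a medium close-up",
    action="A gentle breeze moves strands of her hair",
    audio="Soft wind and a distant city hum",  # illustrative, not from the example above
    technical="50mm anamorphic equivalent lens, natural film grain, 180 degree shutter",
)
print(prompt.render())
```

Because each slot is a separate field, one element (for example the camera path) can be varied while the rest of the prompt stays fixed.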

Essential Camera Movement Keywords

Keyword	Effect	Use Case
`stable dolly movement`	Smooth tracking	Product reveals
`tripod locked stability`	Static control	Interviews
`smooth gimbal tracking`	Fluid following	Action sequences
`constant speed pan`	Even horizontal	Landscape reveals
`natural motion blur`	Realistic motion	Cinematic look
`180 degree shutter`	Film-style blur	Narrative content
`controlled slow dolly`	Dramatic tension	Emotional scenes

Things to Avoid at High Frame Rates

When generating at 50 FPS, avoid these terms:

chaotic handheld motion — introduces distortion
shaky camera movement — amplifies jitter
irregular speed changes — breaks temporal consistency
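
A prompt can be checked for these phrases before it is queued. The sketch below uses a plain substring match; the `find_risky_terms` helper and its 50 FPS default threshold are assumptions made for illustration, not an LTX-2 feature.

```python
# Phrases listed above as problematic at high frame rates.
AVOID_AT_HIGH_FPS = (
    "chaotic handheld motion",
    "shaky camera movement",
    "irregular speed changes",
)


def find_risky_terms(prompt: str, fps: int, high_fps_threshold: int = 50) -> list[str]:
    """Return the listed phrases found in the prompt when fps is at or above the threshold."""
    if fps < high_fps_threshold:
        return []
    lowered = prompt.lower()
    return [term for term in AVOID_AT_HIGH_FPS if term in lowered]


draft = "Shaky camera movement follows a runner through a crowded market."
print(find_risky_terms(draft, fps=50))
print(find_risky_terms(draft, fps=25))
```

The check is deliberately literal: it only catches the exact phrases listed above, so paraphrases such as "jittery handheld" would need to be added to the tuple.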

Cinematic Scene Types

Product Showcase

For e-commerce and brand content:

An ultra-thin aluminum mechanical keyboard rests on a minimalist white marble 
surface. Soft morning light enters from a window on the left, creating subtle 
shadows across the brushed metal frame. The camera begins with an extreme macro 
shot of the keycaps, revealing their matte texture. As the backlight illuminates 
beneath the keys, the camera pulls back into a medium shot. A hand enters from 
the right, fingers hovering before touching the keys. Ambient audio includes 
soft tactile keyboard clicks and quiet room atmosphere. Shot on a 50mm lens, 
f/2.8 aperture, shallow depth of field, smooth gimbal stabilization.

Pro tip: Lock the seed across multiple shots for consistent lighting throughout a campaign.

Tutorial and Educational Content

For instructional videos with clarity:

A history lecturer stands in a bright modern classroom in front of a 
high-resolution interactive digital whiteboard. The camera frames him in a 
stable medium shot at chest height as he gestures toward ancient map images 
displayed on the screen. As he speaks, his right hand moves deliberately toward 
the screen and pauses mid-air to emphasize a key point. The camera slowly pushes 
in, keeping both his face and visual content in frame. Soft overhead lighting 
blends with the cool white glow of the display. Ambient audio includes quiet 
classroom atmosphere, faint page turning sounds, and clear speech with natural 
room echo. Tripod locked, 35mm equivalent lens, paced for educational clarity.

Dramatic Narrative

For film-quality storytelling:

A figure emerges from dense forest fog at dawn. Golden-hour light filters 
through ancient trees, creating volumetric rays that catch mist particles. 
The camera tracks laterally, matching their steady pace as they navigate 
moss-covered roots. Their silhouette gradually sharpens against the warming 
sky. No dialogue, only wind through branches, distant bird calls, and 
footsteps on wet leaves. Color grading emphasizes warm highlights cutting 
through cool morning haze. Anamorphic lens flare catches the sun, shot at 
24fps with authentic film grain, gradual atmospheric shift.

Resolution and Frame Rate Considerations

Configuration Matrix

Configuration	Best For	Considerations
4K @ 50 FPS	Final delivery, VFX	Highest quality, longer render times
4K @ 25 FPS	Cinematic narrative	Natural film motion blur, faster than 50fps
1080p @ 50 FPS	Social media	Smooth motion, rapid iteration
1080p @ 25 FPS	Concept testing	Fastest rendering, draft quality
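
One rough way to compare the four configurations is the number of pixels that must be produced per second of footage. The sketch below assumes that 4K means 3840x2160 and 1080p means 1920x1080. Actual render time depends on the model and hardware and will not scale exactly with this figure.

```python
# Assumed pixel dimensions: 4K as 3840x2160 (UHD), 1080p as 1920x1080.
CONFIGS = {
    "4K @ 50 FPS": (3840, 2160, 50),
    "4K @ 25 FPS": (3840, 2160, 25),
    "1080p @ 50 FPS": (1920, 1080, 50),
    "1080p @ 25 FPS": (1920, 1080, 25),
}


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixels generated for one second of video."""
    return width * height * fps


baseline = pixels_per_second(*CONFIGS["1080p @ 25 FPS"])
relative = {
    name: pixels_per_second(*cfg) // baseline for name, cfg in CONFIGS.items()
}
for name, factor in relative.items():
    print(f"{name}: {factor}x the pixels of 1080p @ 25 FPS")
```

Under these assumptions, 4K @ 50 FPS produces eight times as many pixels per second as 1080p @ 25 FPS, which is one reason to keep early drafts at the lower setting.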

Why 25 FPS Often Looks More Cinematic

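Film is traditionally shot at 24 fps, and with a 180-degree shutter each frame is exposed for about 1/48 of a second, which produces the motion blur audiences associate with cinema. At 50 fps each frame covers a shorter slice of time, so everything looks sharper, but that can actually reduce the cinematic feel. For narrative content, 4K @ 25 FPS often yields better results than 4K @ 50 FPS because its motion blur is close to that of traditional film.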
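
The link between frame rate and motion blur can be made concrete with the standard shutter-angle formula from cinematography: exposure time per frame equals (shutter angle / 360) divided by the frame rate. The sketch below is ordinary camera arithmetic. How closely LTX-2 follows it when a prompt mentions a shutter angle is an assumption.

```python
from fractions import Fraction


def exposure_time(fps: int, shutter_angle: int = 180) -> Fraction:
    """Exposure time per frame in seconds: (shutter_angle / 360) / fps."""
    if fps <= 0 or not 0 < shutter_angle <= 360:
        raise ValueError("fps must be positive and shutter_angle in (0, 360]")
    return Fraction(shutter_angle, 360 * fps)


for fps in (24, 25, 50):
    print(f"{fps} fps -> {exposure_time(fps)} s")
# 24 fps -> 1/48 s
# 25 fps -> 1/50 s
# 50 fps -> 1/100 s
```

With a 180-degree shutter, a frame at 25 fps is exposed for almost exactly as long as a frame at 24 fps, while a frame at 50 fps is exposed for half that time and so carries roughly half the blur.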

Advanced Techniques

Multiscale Rendering

LTX-2 supports multiscale pipelines that can render faster by:

Generating a low-quality draft
Iteratively adding detail layers
Upscaling to final resolution

This approach can be 30x faster than single-pass 4K generation.
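
The coarse-to-fine idea can be pictured as a resolution schedule that starts from a small draft and ends at the target size. The sketch below is an illustration only: the `plan_stages` helper, the factor-of-two step between stages, and the default of three stages are assumptions, not LTX-2's actual pipeline settings.

```python
def plan_stages(target_width: int, target_height: int,
                num_stages: int = 3) -> list[tuple[int, int]]:
    """List resolutions from the coarsest draft up to the target, doubling each stage."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    stages = []
    for level in range(num_stages - 1, -1, -1):
        scale = 2 ** level  # assumed 2x step between consecutive stages
        stages.append((target_width // scale, target_height // scale))
    return stages


stages = plan_stages(3840, 2160)
print(stages)
# [(960, 540), (1920, 1080), (3840, 2160)]
```

In this schedule the first draft contains one sixteenth of the pixels of the final frame, which is where most of the time saving in a multiscale approach would come from.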

Control Models (IC-LoRAs)

LTX-2 supports several control models for precise generation:

Depth Control — Use depth maps to guide spatial structure
Pose Control — Transfer poses from reference images
Canny Control — Control generation with edge detection
Detailer — Enhance fine details in upscaled outputs

Multi-Keyframe Conditioning

For precise scene control, provide multiple keyframes:

Start frame: close-up on subject's eyes
Mid frame: pull back to reveal environment  
End frame: wide establishing shot of location

The model interpolates smoothly between these visual anchors.
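
Before keyframes can be attached to a clip, each one needs a position on the timeline. The sketch below converts normalized positions into frame indices for a given duration and frame rate. The `keyframe_indices` helper is hypothetical, and any frame-count constraints imposed by the LTX nodes themselves are not modeled here.

```python
def keyframe_indices(duration_s: float, fps: int, positions: list[float]) -> list[int]:
    """Map normalized positions (0.0 = first frame, 1.0 = last frame) to frame indices."""
    total_frames = round(duration_s * fps)
    if total_frames < 1:
        raise ValueError("clip must contain at least one frame")
    last = total_frames - 1
    indices = []
    for pos in positions:
        if not 0.0 <= pos <= 1.0:
            raise ValueError(f"position {pos} is outside [0.0, 1.0]")
        indices.append(round(pos * last))
    return indices


# Start, mid, and end keyframes for an 8 second clip at 25 FPS (200 frames).
print(keyframe_indices(8, 25, [0.0, 0.5, 1.0]))
# [0, 100, 199]
```

Working in normalized positions means the same start, mid, and end layout can be reused when the clip length changes from 6 to 8 or 10 seconds.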

Post-Processing Workflow

After generation, consider these enhancements:

Temporal Upscaling — Use LTX temporal upscaler for smoother motion
Spatial Upscaling — Enhance resolution with spatial upscaler models
Seed Consistency — Lock seeds across shots for unified color grading
Audio Separation — Toggle audio generation per project needs

Practical Workflow Example

Here’s a complete Cinematic Product Reveal workflow:

Step 1: Concept Development

Define the mood, lighting, and camera path
Write a structured prompt using the framework above

Step 2: Model Selection

Start with 13B-distilled for iteration
Switch to 13B-dev for final output

Step 3: Generation

Generate at 1080p @ 25 FPS for quick review
Refine prompt based on output
Lock seed when satisfied with lighting
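
Seed locking can be scripted when workflows are exported from ComfyUI in API format, where each node is an entry with `class_type` and `inputs`. The sketch below overwrites literal seed values in a copy of the workflow. It assumes the seed inputs are named `seed` or `noise_seed`, which is true of ComfyUI's built-in sampler nodes but should be verified for the LTX nodes in your own export. The two toy workflows are hypothetical.

```python
import copy


def lock_seed(workflow: dict, seed: int, seed_keys=("seed", "noise_seed")) -> dict:
    """Return a copy of an API-format workflow with every seed input set to `seed`."""
    locked = copy.deepcopy(workflow)
    for node in locked.values():
        inputs = node.get("inputs", {})
        for key in seed_keys:
            # Only overwrite literal values; a list here is a link to another node.
            if key in inputs and not isinstance(inputs[key], list):
                inputs[key] = seed
    return locked


# Toy single-node workflows standing in for two shots of the same campaign.
shots = {
    "shot_a": {"3": {"class_type": "KSampler", "inputs": {"seed": 111, "steps": 20}}},
    "shot_b": {"3": {"class_type": "KSampler", "inputs": {"seed": 222, "steps": 20}}},
}
locked_shots = {name: lock_seed(wf, seed=42) for name, wf in shots.items()}
print(locked_shots["shot_b"]["3"]["inputs"]["seed"])
```

The function returns a modified copy and leaves the original untouched, so the exploratory version of each shot remains available for comparison.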

Step 4: Upscale

Apply spatial upscaler for 4K output
Run detailer model for fine textures

Step 5: Audio (if needed)

Use LTX-2’s native audio generation
Or generate silent and add audio separately

Tips for Consistent Results

Lock Your Seeds: When you find lighting you like, lock the seed across multiple shots to maintain visual consistency throughout a sequence.

Reference Film Stocks: Mentioning Kodak 2383, ARRI Alexa, or Fuji Pro 400H guides the model toward specific color science looks.

Specify Technical Camera Settings: Including depth of field (f/2.0), shutter angle (180 degree), and lens type (50mm anamorphic) helps lock in professional aesthetics.

Use Negative Prompts Sparingly: LTX-2’s prompt understanding is strong, so focus on what you want rather than what you don’t.

Conclusion

LTX-2 represents a significant leap forward for AI video generation. With synchronized audio, native 4K output, and deep ComfyUI integration, it enables creators to produce genuinely professional content without the traditional barriers of video production.

The key to cinematic results lies in structured prompting that thinks like a cinematographer. By combining atmospheric description, explicit camera movement, and technical specifications, you can guide the model toward outputs that rival traditional production.

Ready to start generating? The model weights and workflows are available on HuggingFace and GitHub—open source and ready for your next project.

Anthony Lattanzio
