🌍 SANA-WM — Camera-Controlled World Model

Image-to-video generation with 6-DoF camera control using Efficient-Large-Model/SANA-WM_bidirectional and the NVlabs/Sana Stage-1 pipeline.
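As a rough illustration of what 6-DoF camera control means (three translation plus three rotation degrees of freedom), the sketch below integrates a queue of per-step camera actions into a sequence of camera poses. Everything here is an assumption made for illustration: the `CameraAction` class, the `accumulate` helper, the axis conventions (yaw about Y, pitch about X, roll about Z, +Z forward), and the units are hypothetical and are not SANA-WM's actual DSL or the NVlabs/Sana API.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraAction:
    """Hypothetical 6-DoF step: local translation plus rotation in degrees."""
    dx: float = 0.0
    dy: float = 0.0
    dz: float = 0.0
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec3(a, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Return R = Ry(yaw) @ Rx(pitch) @ Rz(roll) (assumed convention)."""
    y, p, r = (math.radians(d) for d in (yaw_deg, pitch_deg, roll_deg))
    ry = [[math.cos(y), 0.0, math.sin(y)],
          [0.0, 1.0, 0.0],
          [-math.sin(y), 0.0, math.cos(y)]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(p), -math.sin(p)],
          [0.0, math.sin(p), math.cos(p)]]
    rz = [[math.cos(r), -math.sin(r), 0.0],
          [math.sin(r), math.cos(r), 0.0],
          [0.0, 0.0, 1.0]]
    return matmul3(matmul3(ry, rx), rz)


def accumulate(actions):
    """Integrate local-frame actions into a list of (rotation, position) poses."""
    rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pos = [0.0, 0.0, 0.0]
    poses = [(rot, pos)]
    for a in actions:
        # Translate along the camera's current axes, then apply the rotation.
        step = matvec3(rot, [a.dx, a.dy, a.dz])
        pos = [pos[i] + step[i] for i in range(3)]
        rot = matmul3(rot, rotation_matrix(a.yaw, a.pitch, a.roll))
        poses.append((rot, pos))
    return poses


# Example queue: move forward, turn 90 degrees, move forward again.
queue = [CameraAction(dz=1.0), CameraAction(yaw=90.0), CameraAction(dz=1.0)]
trajectory = accumulate(queue)
```

Under these assumed conventions, the second forward step moves along the rotated viewing direction rather than the original one, which is the behavior a per-step camera action queue would typically need.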

First frame

Prompt

Resulting DSL

Output (704×1280)

The output is the Sana VAE decode of the Stage-1 latents, with no refiner applied. For peak quality, run the full pipeline offline without the --no_refiner flag, so that the refiner is enabled.

Example image + prompt + camera action queue (lazy-cached)