๐ŸŒ SANA-WM โ€” Camera-Controlled World Model

Image-to-video generation with 6-DoF camera control using Efficient-Large-Model/SANA-WM_bidirectional and the NVlabs/Sana Stage-1 pipeline.
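This page does not describe how the demo encodes camera actions internally. The sketch below is a hypothetical illustration of one common way a queue of discrete camera actions can be turned into a 6-DoF trajectory: each action is a small local move (translation plus rotation), and the moves are chained into camera-to-world poses. The action names, step size, turn angle, and the OpenCV-style axis convention (x right, y down, z forward) are all assumptions. None of this is SANA-WM's actual interface.

```python
import numpy as np

# Assumed step sizes for each discrete action (hypothetical values).
STEP = 0.1                  # translation per action, in scene units
TURN = np.deg2rad(5.0)      # rotation per action, in radians


def rot_x(theta):
    """Rotation about the camera x axis (pitch)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_y(theta):
    """Rotation about the camera y axis (yaw)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])


ZERO = np.zeros(3)
I3 = np.eye(3)

# Hypothetical action vocabulary: (local translation, local rotation).
# With y pointing down, positive rot_y turns right and positive rot_x tilts up.
ACTIONS = {
    "forward": (np.array([0.0, 0.0, STEP]), I3),
    "backward": (np.array([0.0, 0.0, -STEP]), I3),
    "left": (np.array([-STEP, 0.0, 0.0]), I3),
    "right": (np.array([STEP, 0.0, 0.0]), I3),
    "turn_left": (ZERO, rot_y(-TURN)),
    "turn_right": (ZERO, rot_y(TURN)),
    "look_up": (ZERO, rot_x(TURN)),
    "look_down": (ZERO, rot_x(-TURN)),
}


def make_delta(t, R):
    """Pack a local rotation R and translation t into a 4x4 transform."""
    d = np.eye(4)
    d[:3, :3] = R
    d[:3, 3] = t
    return d


def rollout(actions, start=None):
    """Chain queued actions into camera-to-world poses, shape (n + 1, 4, 4).

    Right-multiplying by each delta applies the motion in the camera's own
    frame, so "forward" always moves along the current viewing direction.
    """
    pose = np.eye(4) if start is None else np.array(start, dtype=float)
    poses = [pose.copy()]
    for name in actions:
        t, R = ACTIONS[name]  # raises KeyError for unknown actions
        pose = pose @ make_delta(t, R)
        poses.append(pose.copy())
    return np.stack(poses)
```

For example, `rollout(["turn_right"] * 18 + ["forward"])` yaws the camera 90 degrees to the right and then steps along the new viewing direction, ending at roughly (0.1, 0, 0). Composing in the camera frame, rather than the world frame, is what makes relative actions like "forward" behave intuitively after a turn.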

🎮 Camera action queue

The output shown here is the Sana VAE decode of the Stage-1 latents, without the refiner stage. For peak quality, run the full pipeline offline without the --no_refiner flag so that the refiner is applied.

Example image + prompt + camera action queue (lazy-cached)