Part 10/14:
Craya has introduced Craya Realtime, a 14-billion-parameter autoregressive video model derived from a diffusion backbone through a method called self-forcing. It generates 11 frames per second on an Nvidia B200 with only a few inference steps. Because that throughput depends on a data-center GPU, the model is suited to studio environments rather than consumer devices.
This model enables interactive video creation: changing prompts mid-generation, quickly previewing the first frame, and streaming video primitives for editing or composition. Its architecture includes memory-efficient techniques such as KV cache recomputation and attention bias, which help keep output stable over long generation sequences.
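
To make the interactive loop concrete, here is a minimal Python sketch of frame-by-frame generation with a prompt change partway through. It is a toy illustration and not the model's actual API: `StreamingSession`, `set_prompt`, `step`, and the `window` parameter are all hypothetical names, no real video model is called, and the "cache" is just a list of frame indices rebuilt from a sliding window. Attention bias is not modeled at all.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Frame:
    index: int
    prompt: str        # prompt that was active when this frame was generated
    context_size: int  # number of earlier frames the toy "model" conditioned on


class StreamingSession:
    """Toy stand-in for an autoregressive video session (hypothetical API)."""

    def __init__(self, prompt: str, window: int = 4) -> None:
        self.prompt = prompt
        # Bounded context: only the most recent `window` frames are kept.
        self._context: Deque[Frame] = deque(maxlen=window)
        self._next_index = 0

    def set_prompt(self, prompt: str) -> None:
        # Affects only frames generated after this call.
        self.prompt = prompt

    def _recompute_cache(self) -> List[int]:
        # Assumption: stands in for KV cache recomputation by rebuilding the
        # cache from the frames still in the window on every step.
        return [frame.index for frame in self._context]

    def step(self) -> Frame:
        cache = self._recompute_cache()
        frame = Frame(self._next_index, self.prompt, len(cache))
        self._context.append(frame)  # the oldest frame drops out when full
        self._next_index += 1
        return frame


session = StreamingSession("a red fox running", window=4)
frames = [session.step() for _ in range(3)]   # first frame is available at once
session.set_prompt("a red fox running in snow")
frames += [session.step() for _ in range(5)]  # later frames follow the new prompt
```

Because each frame is produced by its own `step` call, a caller can show the first frame as soon as it exists and swap the prompt between any two steps, which is the property the interactive features above rely on. The bounded window is one simple way to keep per-step context from growing with video length; the real model's mechanism may differ.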