MLLM Planner · DiT Renderer · Unified Video Generation and Editing

Bernini: Latent Semantic Planning for Video Diffusion

Bernini Team, ByteDance

Bernini is a unified framework for video generation and editing with self-supervised vision-text reasoning. It combines a semantic planner based on a multimodal large language model (MLLM) with a renderer based on a diffusion transformer (DiT).

Coming Soon

We are preparing demo videos and additional visual results.