Pyramidal Flow Matching for Efficient Video Generative Modeling

Anonymous Institute

Qualitative Results

Text-to-Video Generation (1280x768, 10s, 24fps)

Text-to-Video Generation (1280x768, 5s, 24fps)

Text-conditioned Image-to-Video Generation (1280x768, 5s, 24fps)