Scale RAE • Collection • Collection for "Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders" • 7 items • Updated 15 days ago • 3
RAE • Collection • Collection for Diffusion Transformers with Representation Autoencoders • 1 item • Updated Oct 14, 2025 • 11
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows • Paper • 2510.03506 • Published Oct 3, 2025 • 15
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training • Paper • 2509.26625 • Published Sep 30, 2025 • 43
V-JEPA 2 • Collection • A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 192
Cosmos-Tokenize1 • Collection • A suite of image and video tokenizers • 9 items • Updated 13 days ago • 9
Motion-Guided Masking for Spatiotemporal Representation Learning • Paper • 2308.12962 • Published Aug 24, 2023 • 1
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning • Paper • 2412.14164 • Published Dec 18, 2024 • 4
Video Token Merging for Long-form Video Understanding • Paper • 2410.23782 • Published Oct 31, 2024 • 2