Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 11 days ago • 41
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published Mar 31 • 48
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data Paper • 2603.25319 • Published Mar 26 • 32
Manifold-Aware Exploration for Reinforcement Learning in Video Generation Paper • 2603.21872 • Published Mar 23 • 34
Learning Latent Proxies for Controllable Single-Image Relighting Paper • 2603.15555 • Published Mar 16 • 8
Efficient Multimodal Learning from Data-centric Perspective Paper • 2402.11530 • Published Feb 18, 2024 • 1
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation Paper • 2502.11903 • Published Feb 17, 2025
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published Jun 23, 2025 • 79
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published Jun 5, 2025 • 5
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis Paper • 2508.13618 • Published Aug 19, 2025 • 19
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Paper • 2504.03140 • Published Apr 4, 2025