DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations Paper • 2601.00623 • Published Jan 2
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework Paper • 2511.05385 • Published Nov 7, 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model Paper • 2504.15843 • Published Apr 22, 2025 • 16
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization Paper • 2505.19000 • Published May 25, 2025 • 42
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25, 2025 • 74