videoscore2/vs2_qwen3vl_sft_27k_5e-5_2fps_960_720_8192 Image-to-Text • 770k • Updated 30 days ago • 87
videoscore2/vs2_internvl3_5_sft_27k_5e-5_2fps_960_720_8192 Any-to-Any • 695k • Updated 30 days ago • 81
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 71
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions Paper • 2510.10666 • Published Oct 12, 2025 • 27
TIGER-Lab/VideoScore2-RL-no-SFT Visual Question Answering • 8B • Updated Oct 13, 2025 • 9 • 1