RADIO Collection A collection of Foundation Vision Models that combine multiple models (CLIP, DINOv2, SAM, etc.). • 19 items • Updated 1 day ago • 32
google/siglip2-giant-opt-patch16-384 Zero-Shot Image Classification • 2B • Updated Feb 21, 2025 • 138k • 35
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published 14 days ago • 20
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025 • 27
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers Paper • 2601.14133 • Published 23 days ago • 60