LightThinker++: From Reasoning Compression to Memory Management • Paper • 2604.03679 • Published 6 days ago • 29
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale • Paper • 2604.04771 • Published 4 days ago • 106
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model • Paper • 2604.02097 • Published 8 days ago • 30
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation • Paper • 2603.26661 • Published 14 days ago • 25
Representation Alignment for Just Image Transformers is not Easier than You Think • Paper • 2603.14366 • Published 26 days ago • 13
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting • Paper • 2603.25745 • Published 15 days ago • 15
AVControl: Efficient Framework for Training Audio-Visual Controls • Paper • 2603.24793 • Published 15 days ago • 26
PixelSmile: Toward Fine-Grained Facial Expression Editing • Paper • 2603.25728 • Published 15 days ago • 117
Repurposing Geometric Foundation Models for Multi-view Diffusion • Paper • 2603.22275 • Published 18 days ago • 47
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow • Paper • 2603.19598 • Published 21 days ago • 32
InCoder-32B: Code Foundation Model for Industrial Scenarios • Paper • 2603.16790 • Published 24 days ago • 307
Grounding World Simulation Models in a Real-World Metropolis • Paper • 2603.15583 • Published 25 days ago • 153
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation • Paper • 2603.11647 • Published 29 days ago • 31
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs • Paper • 2603.09095 • Published Mar 10 • 29