Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model Paper • 2310.15110 • Published Oct 23, 2023 • 3
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1, 2024 • 13
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 18
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 60
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation Paper • 2507.01957 • Published Jul 2, 2025 • 23
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Paper • 2503.22020 • Published Mar 27, 2025
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer Paper • 2507.04947 • Published Jul 7, 2025 • 1
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Paper • 2512.01031 • Published Nov 30, 2025 • 26
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization Paper • 2602.02958 • Published Feb 3, 2026 • 34