Hongje Seong

hongjeseong

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

VLM3: Vision Language Models Are Native 3D Learners

upvoted a paper 1 day ago

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

upvoted a paper 6 days ago

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction

Organizations

None yet

upvoted 2 papers 1 day ago

VLM3: Vision Language Models Are Native 3D Learners

Paper • 2605.30561 • Published 7 days ago • 20

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 7 days ago • 133

upvoted 6 papers 6 days ago

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction

Paper • 2605.26230 • Published 10 days ago • 41

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Paper • 2605.27367 • Published 9 days ago • 70

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published 9 days ago • 135

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Paper • 2605.28763 • Published 8 days ago • 14

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

Paper • 2605.23163 • Published 10 days ago • 17

GEM: Generative Supervision Helps Embodied Intelligence

Paper • 2605.28548 • Published 8 days ago • 40

upvoted a paper 8 days ago

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Paper • 2605.21573 • Published 15 days ago • 107

upvoted a paper 13 days ago

VGGT-Ω

Paper • 2605.15195 • Published 21 days ago • 3

upvoted a paper 20 days ago

Asymmetric Flow Models

Paper • 2605.12964 • Published 22 days ago • 22

upvoted a paper 23 days ago

RLDX-1 Technical Report

Paper • 2605.03269 • Published 30 days ago • 125

upvoted 4 papers about 1 month ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published Apr 22 • 21

Vista4D: Video Reshooting with 4D Point Clouds

Paper • 2604.21915 • Published Apr 23 • 12

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Paper • 2604.18168 • Published Apr 20 • 96

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published Apr 15 • 122

upvoted 2 papers about 2 months ago

Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself

Paper • 2604.14048 • Published Apr 15 • 16

Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

Paper • 2604.10030 • Published Apr 11 • 15