Stoney Kang

sikang99

AI & ML interests

Remote Control based on Vision

Recent Activity

upvoted a paper about 4 hours ago

World Action Models: A Survey

Paper • 2606.20781 • Published 7 days ago • 45

upvoted 2 papers 2 days ago

GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning

Paper • 2606.17480 • Published 9 days ago • 3

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Paper • 2606.19534 • Published 8 days ago • 58

upvoted a paper 3 days ago

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Paper • 2606.17800 • Published 9 days ago • 13

upvoted 6 papers 5 days ago

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Paper • 2604.13416 • Published 7 days ago • 31

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Paper • 2606.20521 • Published 7 days ago • 10

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Paper • 2606.15133 • Published 12 days ago • 72

Playful Agentic Robot Learning

Paper • 2606.19419 • Published 8 days ago • 48

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Paper • 2606.20563 • Published 7 days ago • 20

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Paper • 2606.20515 • Published 7 days ago • 39

upvoted 9 papers 6 days ago

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

Paper • 2606.18231 • Published 9 days ago • 5

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Paper • 2606.20083 • Published 7 days ago • 9

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Paper • 2606.19980 • Published 7 days ago • 14

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Paper • 2606.17682 • Published 9 days ago • 26

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Paper • 2606.18558 • Published 8 days ago • 50

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Paper • 2606.14024 • Published 13 days ago • 9

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Paper • 2606.18375 • Published 9 days ago • 11

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Paper • 2606.17539 • Published 9 days ago • 15

Kairos: A Native World Model Stack for Physical AI

Paper • 2606.16533 • Published 9 days ago • 36

upvoted a paper 7 days ago

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Paper • 2606.18249 • Published 9 days ago • 14