AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
Abstract
AnchorWorld advances egocentric simulation through enhanced interaction integrity and flexible world customization using 3D human motion and anchor view definitions.
Despite being a pivotal frontier, interactive world modeling remains underexplored with respect to the versatile controllability that practical scenarios require. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we use 3D human motion as the primary interaction modality. To compensate for body parts that are out of view or truncated in egocentric views, we introduce an auxiliary training supervision that incorporates exocentric viewpoints decoupled from the agent's first-person sensorium. This supervision lets the model observe the agent's full-body positioning relative to the environment, enabling more robust spatial grounding of human-world interactions. Furthermore, we propose a simple yet effective mechanism for customizing self-evolving worlds: we define anchor views within a unified world coordinate system and pair them with textual descriptions that dictate the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, and ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.
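The anchor-view customization idea in the abstract can be illustrated with a minimal, hypothetical sketch. It assumes each anchor view is a position in the unified world coordinate system paired with a text prompt, and that an anchor becomes "active" when it lies inside the agent's current egocentric view cone and within a distance cutoff; the prompts of active anchors are then joined into conditioning text. The names `AnchorView`, `visible_anchors`, and `conditioning_text`, the cone-based visibility test, and the `fov_deg`/`max_dist` parameters are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of anchor-view world customization; NOT the AnchorWorld code.
# Assumption: an anchor is a world-frame position plus a text prompt, and it is
# "active" when it lies inside the agent's egocentric view cone and within range.
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AnchorView:
    name: str
    position: Vec3  # anchor location in the unified world coordinate system
    prompt: str     # text describing how the local scene should evolve


def visible_anchors(
    anchors: Sequence[AnchorView],
    cam_pos: Vec3,
    cam_forward: Vec3,
    fov_deg: float = 90.0,
    max_dist: float = 10.0,
) -> List[AnchorView]:
    """Return anchors inside the view cone and within max_dist, nearest first."""
    f_norm = math.sqrt(sum(c * c for c in cam_forward))
    if f_norm == 0:
        raise ValueError("cam_forward must be a non-zero vector")
    cos_half_fov = math.cos(math.radians(fov_deg / 2))
    hits = []
    for anchor in anchors:
        offset = [a - c for a, c in zip(anchor.position, cam_pos)]
        dist = math.sqrt(sum(d * d for d in offset))
        if dist > max_dist:
            continue  # too far away to influence the current view
        if dist == 0:
            hits.append((dist, anchor))  # anchor at the camera counts as visible
            continue
        # Cosine of the angle between the viewing direction and the anchor offset.
        cos_angle = sum(d * f for d, f in zip(offset, cam_forward)) / (dist * f_norm)
        if cos_angle >= cos_half_fov:
            hits.append((dist, anchor))
    hits.sort(key=lambda pair: pair[0])  # nearest anchors first
    return [anchor for _, anchor in hits]


def conditioning_text(active: Sequence[AnchorView]) -> str:
    """Join the prompts of active anchors into one string (assumed format)."""
    return "; ".join(a.prompt for a in active)


if __name__ == "__main__":
    anchors = [
        AnchorView("kitchen", (2.0, 0.0, 0.0), "the kettle starts to boil"),
        AnchorView("window", (-3.0, 0.0, 0.0), "rain begins outside"),
        AnchorView("garden", (50.0, 0.0, 0.0), "flowers bloom"),
    ]
    active = visible_anchors(anchors, cam_pos=(0.0, 0.0, 0.0), cam_forward=(1.0, 0.0, 0.0))
    print([a.name for a in active])   # ['kitchen']
    print(conditioning_text(active))  # the kettle starts to boil
```

In this toy setup the agent faces the +x axis, so only the "kitchen" anchor (ahead and in range) contributes its prompt; the "window" anchor is behind the agent and the "garden" anchor exceeds the distance cutoff. A real system would more plausibly use full camera poses and a learned or geometric visibility criterion.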
Community
We propose AnchorWorld, a framework that combines embodied egocentric action control with world customization. AnchorWorld enables human-motion-driven exploration and interaction within customizable, self-evolving worlds.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- E$^3$C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control (2026)
- EgoExo-WM: Unlocking Exo Video for Ego World Models (2026)
- EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting (2026)
- World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks (2026)
- Embody4D: A Generalist 4D World Model for Embodied AI (2026)
- WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models (2026)
- Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization (2026)
Get this paper in your agent:
hf papers read 2606.07326
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0
Datasets citing this paper: 1
Spaces citing this paper: 0