Abstract
Geometry-aware generative models leveraging spherical manifolds and optimal transport techniques outperform traditional Euclidean approaches for natural image synthesis.
Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. Yet for natural images, the field remains confined to Euclidean assumptions and fails to exploit the intrinsic geometric structure of the data. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in the directional components, while the norm components can be approximated by the global average. This property holds in both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics directly to the manifold. Our experiments demonstrate that these geometry-aware methods outperform Euclidean baselines. Ultimately, this work provides a novel perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
Community
TL;DR: Natural images live on a hypersphere — and treating them that way improves flow matching.
Geometry-aware generative modeling has worked well on known manifolds (molecules, crystals, proteins), but natural images have stayed stuck in Euclidean space because nobody knew what manifold they lived on.
We show a surprisingly simple answer: their semantic content is almost entirely in the direction, not the norm. Projecting images (both RGB and VAE latents) onto a sphere of the dataset's mean radius leaves them perceptually indistinguishable from the originals.
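To make the projection concrete, here is a minimal NumPy sketch of rescaling each sample to a shared radius while keeping its direction. The function name `project_to_mean_sphere`, the flattening of each image into one vector, and the random stand-in batch are assumptions for illustration, not the paper's code; the paper may compute the mean norm over the full dataset rather than a batch.

```python
import numpy as np

def project_to_mean_sphere(x, radius=None, eps=1e-12):
    """Rescale every sample to a common norm while keeping its direction.

    x      : array of shape (batch, ...), e.g. RGB images or VAE latents.
    radius : target norm; if None, the mean norm of this batch is used
             (an assumption; a dataset-level mean could be passed instead).
    """
    flat = x.reshape(x.shape[0], -1)                       # one vector per sample
    norms = np.linalg.norm(flat, axis=1, keepdims=True)    # per-sample norm
    if radius is None:
        radius = float(norms.mean())
    projected = flat / np.maximum(norms, eps) * radius     # same direction, new norm
    return projected.reshape(x.shape), radius

# Stand-in data: random tensors shaped like a small image batch.
rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 3, 4, 4))
projected, r = project_to_mean_sphere(batch)
```

After the call, every sample in `projected` has norm `r`, and its cosine similarity with the corresponding original sample is 1, so only the magnitude information has been discarded.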
Building on this, we propose SOT-CFM (angular OT cost) and SFM (fully Riemannian flow matching on the sphere). SFM is, to our knowledge, the first successful application of a fully manifold-based generative framework to large-scale natural images.
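The two ingredients named above can be sketched in NumPy as follows: a pairwise angular cost matrix, which an optimal-transport or assignment solver could consume to pair noise with data, and a great-circle interpolant with its tangent velocity, which is the usual regression target in flow matching on a sphere. All function names here are hypothetical, the inputs are assumed to be flattened unit vectors (scale by the radius for a larger sphere), and the solver step is omitted; this is not the paper's implementation.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Scale each row to unit length."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def angular_cost(x0, x1):
    """Pairwise geodesic angle in radians between rows of x0 and rows of x1."""
    cos = np.clip(normalize(x0) @ normalize(x1).T, -1.0, 1.0)
    return np.arccos(cos)

def geodesic_point_and_velocity(x0, x1, t, eps=1e-7):
    """Point and tangent velocity at time t on the great circle from x0 to x1.

    x0, x1 are unit vectors paired row by row; t is a scalar in [0, 1].
    """
    cos = np.clip(np.sum(x0 * x1, axis=-1, keepdims=True), -1.0 + eps, 1.0 - eps)
    theta = np.arccos(cos)                      # angle between each pair
    sin = np.sin(theta)
    # Spherical linear interpolation (slerp) and its time derivative.
    xt = (np.sin((1 - t) * theta) * x0 + np.sin(t * theta) * x1) / sin
    vt = theta * (-np.cos((1 - t) * theta) * x0 + np.cos(t * theta) * x1) / sin
    return xt, vt

# Stand-in data: random unit vectors playing the role of noise and images.
rng = np.random.default_rng(0)
x0 = normalize(rng.normal(size=(4, 16)))
x1 = normalize(rng.normal(size=(4, 16)))
cost = angular_cost(x0, x1)                     # shape (4, 4)
xt, vt = geodesic_point_and_velocity(x0, x1, t=0.3)
```

The interpolated points stay on the unit sphere, the velocity is orthogonal to the point (it lies in the tangent space), and its length equals the angle between the endpoints, so the path is traversed at constant speed.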
I really enjoyed your approach to utilizing spherical manifolds for natural image generation.
I am reaching out because I recently published complementary work that also tackles generative modeling on intrinsic manifolds: "Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders" (arXiv:2602.10099). While your work elegantly focuses on flow matching, ours extends manifold learning to standard Diffusion Transformers.
Given the strong thematic overlap in bridging manifold-based modeling with image generation, we would be honored if you considered referencing our paper in your related work section in future revisions.
Congratulations again on the excellent paper!
Thanks for the kind note! Just for context, our paper was first introduced back in September 2025 (OpenReview, ICLR 2026 submission: https://openreview.net/forum?id=y7xF1bQ0C5; recently accepted to ICML 2026: https://icml.cc/virtual/2026/poster/61879). We'd be glad to cite your work in our follow-up paper, and it would be wonderful if you could consider including ours in your next revision or future work as well.
Congratulations on your interesting work.
The core idea of projecting images to a fixed-radius sphere and doing the flow in angular space is a clean way to decouple semantic content from magnitude. The two variants, angular OT and geodesic sphere flow, feel like a low-friction tweak that actually leverages directional structure that Euclidean models tend to blur. I'd like to see a careful ablation on the radius estimate and a robustness check when the mean norm shifts across batches, because a small mismatch could distort the angular geometry. By the way, arxivlens has a nice breakdown that helps parse the method details: https://arxivlens.com/PaperView/Details/geometry-aware-image-flow-matching-4056-bef939b9
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Aligning Latent Geometry for Spherical Flow Matching in Image Generation (2026)
- Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces (2026)
- Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution (2026)
- Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis (2026)
- HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models (2026)
- SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation (2026)
- Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation (2026)