Activity Feed

Recent Activity

hysts 
in Gradio-Blocks/ViTPose 6 months ago

runtime error

2
#10 opened 6 months ago by liangnanying
DmitryRyumin 
posted an update 7 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔝

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔝

📝 Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models. This method addresses misalignment and local optima issues using a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔝

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde , K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
nouamanetazi 
posted an update 7 months ago
After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.
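
To put the all-reduce figures above in relative terms, here is a minimal sketch of the arithmetic. It uses only the numbers quoted in this post (480 GB/s on a single node, 320-350 GB/s across 16 nodes); the function and variable names are illustrative and not taken from the playbook.

```python
# All-reduce bandwidth figures quoted in the post, in GB/s.
SINGLE_NODE_ALLREDUCE = 480.0
MULTI_NODE_ALLREDUCE = (320.0, 350.0)  # reported range across 16 nodes


def retained_fraction(multi_node: float, single_node: float) -> float:
    """Fraction of single-node all-reduce bandwidth still available at scale."""
    return multi_node / single_node


# Low and high ends of the retained-bandwidth range.
low, high = (
    retained_fraction(bw, SINGLE_NODE_ALLREDUCE) for bw in MULTI_NODE_ALLREDUCE
)

print(f"retained: {low:.0%} to {high:.0%}")        # retained: 67% to 73%
print(f"lost:     {1 - high:.0%} to {1 - low:.0%}")  # lost:     27% to 33%
```

In other words, going from 1 to 16 nodes costs roughly a quarter to a third of the single-node all-reduce bandwidth, according to these measurements.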

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
DmitryRyumin 
posted an update 7 months ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔝

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #Cross-AttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔝

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for large soft labels. It uses a lightweight, online image-to-label projector based on CLIP. The projector is adapted with LoRA-style, parameter-efficient tuning and initialized with text embeddings.

👥 Authors: @roseannelexie , @Huage001 , Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔝

📝 Description: The one-shot pruning method efficiently compresses networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
DmitryRyumin 
posted an update 7 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔝

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
s3nh 
posted an update 8 months ago
EduHelp with more empathy, based on a model fine-tuned on
psychotherapeutic preferences, just landed on


Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
s3nh 
posted an update 8 months ago
Just tried to create an educational assistant for younger people who may struggle with visualising 'what is this sorcery all about'.
It's the first step of my spare-time projects: SFT on Qwen3-8B.

EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
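
As a rough illustration of why LoRA counts as parameter-efficient, the sketch below compares the trainable parameters of a low-rank update against a full weight matrix. LoRA freezes a weight W and trains two small factors B and A so that the effective weight is W + BA. The layer size and rank used here are hypothetical values picked for the example; they are not the settings used for EduHelp-8B, which the post does not state.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical projection layer and rank, chosen only for illustration.
D_OUT, D_IN, RANK = 4096, 4096, 16

full = full_params(D_OUT, D_IN)
lora = lora_trainable_params(D_OUT, D_IN, RANK)

print(f"full fine-tuning: {full:,} parameters")  # full fine-tuning: 16,777,216 parameters
print(f"LoRA (rank {RANK}): {lora:,} parameters")  # LoRA (rank 16): 131,072 parameters
print(f"trainable share: {lora / full:.2%}")      # trainable share: 0.78%
```

Under these assumed dimensions, the adapter trains under 1% of the parameters of the layer it modifies, which is what makes fine-tuning an 8B model practical as a spare-time project.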

s3nh/EduHelp-8B

Glad to share my work, have a wonderful day!