Open to Collab
131.0 TFLOPS
Soumik Rakshit
geekyrakshit
25 followers · 40 following
http://geekyrakshit.dev
soumikRakshit96
soumik12345
soumikrakshit
AI & ML interests
Computer vision
Recent Activity
reacted to m-ric's post about 11 hours ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU. If they had needed all this time, we would have GPU stories from the time of Pharaoh: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, pipeline. And it is infamously hard to do, making for bloated code repos that hold together only by magic.

But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team has built Nanotron, already widely used in industry. And now another team releases Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real engineering feat that makes it much easier to understand what's actually happening!

It's tiny, yet powerful: measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this.)

Go take a look: https://github.com/huggingface/picotron/tree/main/picotron
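The arithmetic behind the post's numbers can be sketched in a few lines of Python. This is only a rough illustration, not Picotron's own benchmarking code: the function names are hypothetical, the MFU estimate uses the common approximation of 6 FLOPs per parameter per training token (ignoring attention FLOPs), and the peak throughput of roughly 989 TFLOPS (dense BF16 on an H100 SXM) is an assumption that is not stated in the post.

```python
HOURS_PER_YEAR = 24 * 365

def gpu_hours_to_years(gpu_hours: float) -> float:
    """Time a single GPU would need to deliver the given GPU-hours, in years."""
    return gpu_hours / HOURS_PER_YEAR

def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Ideal wall-clock time in days if the work scales perfectly across num_gpus."""
    return gpu_hours / num_gpus / 24

def mfu(num_params: float, tokens_per_second: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s over the hardware peak.

    Assumes ~6 FLOPs per parameter per token (forward + backward pass),
    which ignores attention and is only an approximation.
    """
    achieved_flops_per_second = 6 * num_params * tokens_per_second
    return achieved_flops_per_second / (num_gpus * peak_flops_per_gpu)

# Figures from the post: 39 million GPU-hours, 24k H100s.
print(f"{gpu_hours_to_years(39e6):.0f} years on one GPU")    # about 4452 years
print(f"{wall_clock_days(39e6, 24_000):.1f} days on 24k GPUs")  # about 67.7 days

# Hypothetical throughput, chosen only to show what ~50% MFU would look like
# for a 1.7B-parameter model on 8 GPUs; it is not a measured number.
H100_PEAK_BF16 = 989e12  # assumed dense BF16 peak, FLOPs/s
print(f"MFU: {mfu(1.7e9, 387_843, 8, H100_PEAK_BF16):.2f}")
```

The ideal wall-clock figure assumes perfect scaling; real runs lose time to communication, restarts, and utilization below peak, which is consistent with the post's "a few months".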
liked a dataset 1 day ago: ILSVRC/imagenet-1k
upvoted an article 15 days ago: Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler
Organizations
geekyrakshit's models (5)
geekyrakshit/binary-classifier • 67M • Updated Nov 29, 2024 • 4
geekyrakshit/grays-anatomy-index-medcpt • Updated Nov 3, 2024 • 2
geekyrakshit/grays-anatomy-index-contriever • Updated Nov 3, 2024 • 5
geekyrakshit/grays-anatomy-index • Updated Nov 3, 2024 • 4
geekyrakshit/DeepLabV3-Plus • Updated Jul 3, 2023 • 131