🔄 In a Training Loop

Lê Võ Quyết Thắng

thangvip

·

https://vualidon.icu

AI & ML interests

None yet

Recent Activity

updated a Space about 1 month ago

build-small-hackathon/dempsters-court

updated a Space about 1 month ago

build-small-hackathon/analog-town

updated a Space about 1 month ago

build-small-hackathon/compliment-forest

View all activity

Organizations

upvoted a paper 10 months ago

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 551

upvoted 3 papers 12 months ago

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18, 2025 • 24

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

RecGPT Technical Report

Paper • 2507.22879 • Published Jul 30, 2025 • 39

upvoted 2 papers about 1 year ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 132

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

Paper • 2505.21600 • Published May 27, 2025 • 71

upvoted an article about 1 year ago

Article

Vision Language Models (Better, faster, stronger)

+3

merve, sergiopaniego, ariG23498, pcuenq, andito

•

May 12, 2025

• 614

upvoted a paper over 1 year ago

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20, 2025 • 92

upvoted 4 articles over 1 year ago

Article

Training and Finetuning Reranker Models with Sentence Transformers

tomaarsen

•

Mar 26, 2025

• 196

Article

Open R1: Update #3

open-r1

•

Mar 11, 2025

• 298

Article

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

+1

andito, mfarre, merve

•

Jan 23, 2025

• 192

Article

Open-R1: a fully open reproduction of DeepSeek-R1

+1

eliebak, lvwerra, lewtun

•

Jan 28, 2025

• 891

upvoted 5 papers over 1 year ago

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8, 2025 • 95

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9, 2025 • 98

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 134

Natural Language Reinforcement Learning

Paper • 2411.14251 • Published Nov 21, 2024 • 31

Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20, 2024 • 50

upvoted a collection over 1 year ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated Mar 2 • 102

upvoted 2 papers over 1 year ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 68

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 111