wo-datacraft's Collections
Toolkit - AI Papers
Neural Machine Translation by Jointly Learning to Align and Translate
Paper
•
1409.0473
•
Published
•
7
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
108
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
•
1810.04805
•
Published
•
25
Hierarchical Reasoning Model
Paper
•
2506.21734
•
Published
•
46
Scaling Laws for Neural Language Models
Paper
•
2001.08361
•
Published
•
9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
•
1910.01108
•
Published
•
21
Language Models are Few-Shot Learners
Paper
•
2005.14165
•
Published
•
18
LoRA: Low-Rank Adaptation of Large Language Models
Paper
•
2106.09685
•
Published
•
56
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper
•
2005.11401
•
Published
•
14
Training language models to follow instructions with human feedback
Paper
•
2203.02155
•
Published
•
24
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper
•
2101.03961
•
Published
•
13
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper
•
2208.07339
•
Published
•
5
PaLM: Scaling Language Modeling with Pathways
Paper
•
2204.02311
•
Published
•
3
A Survey on Large Language Model based Autonomous Agents
Paper
•
2308.11432
•
Published
•
3
GPT-4 Technical Report
Paper
•
2303.08774
•
Published
•
7
Large Language Models are Zero-Shot Reasoners
Paper
•
2205.11916
•
Published
•
3
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper
•
2312.16171
•
Published
•
37
Toolformer: Language Models Can Teach Themselves to Use Tools
Paper
•
2302.04761
•
Published
•
12
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
•
2405.04434
•
Published
•
24
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
434
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
•
2505.03335
•
Published
•
188
Qwen3 Technical Report
Paper
•
2505.09388
•
Published
•
321
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Paper
•
2505.00675
•
Published
•
3
Small Language Models are the Future of Agentic AI
Paper
•
2506.02153
•
Published
•
23
gpt-oss-120b & gpt-oss-20b Model Card
Paper
•
2508.10925
•
Published
•
12
Large Language Diffusion Models
Paper
•
2502.09992
•
Published
•
124
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
501
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Paper
•
2510.07192
•
Published
•
5
A Survey of Vibe Coding with Large Language Models
Paper
•
2510.12399
•
Published
•
49
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper
•
2511.18538
•
Published
•
282
Denoising Diffusion Probabilistic Models
Paper
•
2006.11239
•
Published
•
8
Denoising Diffusion Implicit Models
Paper
•
2010.02502
•
Published
•
4
Score-Based Generative Modeling through Stochastic Differential Equations
Paper
•
2011.13456
•
Published
•
2
Learning Transferable Visual Models From Natural Language Supervision
Paper
•
2103.00020
•
Published
•
19
Hierarchical Text-Conditional Image Generation with CLIP Latents
Paper
•
2204.06125
•
Published
•
3
Classifier-Free Diffusion Guidance
Paper
•
2207.12598
•
Published
•
4
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper
•
1910.10683
•
Published
•
15
LLaMA: Open and Efficient Foundation Language Models
Paper
•
2302.13971
•
Published
•
20
Mistral 7B
Paper
•
2310.06825
•
Published
•
56
Gemma 2: Improving Open Language Models at a Practical Size
Paper
•
2408.00118
•
Published
•
78
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
•
2502.02737
•
Published
•
253
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper
•
2010.11929
•
Published
•
15