Toolkit - AI Papers - a wo-datacraft Collection

updated 18 days ago

Neural Machine Translation by Jointly Learning to Align and Translate

Paper • 1409.0473 • Published Sep 1, 2014 • 7
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 108
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 25
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 46
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 21
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 18
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 56
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Paper • 2005.11401 • Published May 22, 2020 • 14
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Paper • 2101.03961 • Published Jan 11, 2021 • 13
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Paper • 2208.07339 • Published Aug 15, 2022 • 5
PaLM: Scaling Language Modeling with Pathways

Paper • 2204.02311 • Published Apr 5, 2022 • 3
A Survey on Large Language Model based Autonomous Agents

Paper • 2308.11432 • Published Aug 22, 2023 • 3
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 7
Large Language Models are Zero-Shot Reasoners

Paper • 2205.11916 • Published May 24, 2022 • 3
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Paper • 2312.16171 • Published Dec 26, 2023 • 37
Toolformer: Language Models Can Teach Themselves to Use Tools

Paper • 2302.04761 • Published Feb 9, 2023 • 12
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 24
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 434
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 188
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 321
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

Paper • 2505.00675 • Published May 1, 2025 • 3
Small Language Models are the Future of Agentic AI

Paper • 2506.02153 • Published Jun 2, 2025 • 23
gpt-oss-120b & gpt-oss-20b Model Card

Paper • 2508.10925 • Published Aug 8, 2025 • 12
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 124
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 501
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Paper • 2510.07192 • Published Oct 8, 2025 • 5
A Survey of Vibe Coding with Large Language Models

Paper • 2510.12399 • Published Oct 14, 2025 • 49
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 282
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 8
Denoising Diffusion Implicit Models

Paper • 2010.02502 • Published Oct 6, 2020 • 4
Score-Based Generative Modeling through Stochastic Differential Equations

Paper • 2011.13456 • Published Nov 26, 2020 • 2
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 19
Hierarchical Text-Conditional Image Generation with CLIP Latents

Paper • 2204.06125 • Published Apr 13, 2022 • 3
Classifier-Free Diffusion Guidance

Paper • 2207.12598 • Published Jul 26, 2022 • 4
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 20
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 56
Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 253
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 15