PeppePasti's Collections: LLMs
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Paper
• 2408.15545
• Published • 38
Controllable Text Generation for Large Language Models: A Survey
Paper
• 2408.12599
• Published • 65
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper
• 2408.10914
• Published • 45
Automated Design of Agentic Systems
Paper
• 2408.08435
• Published • 40
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published • 64
Fast Inference from Transformers via Speculative Decoding
Paper
• 2211.17192
• Published • 11
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Paper
• 2401.07851
• Published • 3
Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding
Paper
• 2305.00633
• Published • 1
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
• 2309.11495
• Published • 40
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper
• 2306.01693
• Published • 3
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper
• 2408.12570
• Published • 32
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published • 78
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published • 628
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper
• 2312.11514
• Published • 264
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published • 190
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper
• 2309.03883
• Published • 36
Textbooks Are All You Need
Paper
• 2306.11644
• Published • 154
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Paper
• 2306.02707
• Published • 51
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
• 2408.11796
• Published • 60
Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution
Paper
• 2408.10548
• Published
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
Paper
• 2408.08869
• Published
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper
• 2408.15237
• Published • 42
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
Paper
• 2408.15915
• Published • 19
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature
Paper
• 2408.15836
• Published • 14
Training Compute-Optimal Large Language Models
Paper
• 2203.15556
• Published • 11
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper
• 2402.19427
• Published • 57
Scaling Law with Learning Rate Annealing
Paper
• 2408.11029
• Published • 4
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Paper
• 2407.20311
• Published • 5
Physics of Language Models: Part 1, Context-Free Grammar
Paper
• 2305.13673
• Published • 7
Physics of Language Models: Part 3.2, Knowledge Manipulation
Paper
• 2309.14402
• Published • 7
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Paper
• 2404.05405
• Published • 10
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Paper
• 2309.14316
• Published • 9
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper
• 2408.16293
• Published • 27
Language Models are Few-Shot Learners
Paper
• 2005.14165
• Published • 20
ContextCite: Attributing Model Generation to Context
Paper
• 2409.00729
• Published • 14
OLMoE: Open Mixture-of-Experts Language Models
Paper
• 2409.02060
• Published • 80
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Paper
• 2409.00509
• Published • 42
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Paper
• 2409.02897
• Published • 48
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
Paper
• 2409.02326
• Published • 19
Attention Heads of Large Language Models: A Survey
Paper
• 2409.03752
• Published • 92
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Paper
• 2408.16737
• Published • 1
Many-Shot In-Context Learning
Paper
• 2404.11018
• Published • 4
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
Paper
• 2409.03810
• Published • 35
Configurable Foundation Models: Building LLMs from a Modular Perspective
Paper
• 2409.02877
• Published • 32
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Paper
• 2402.10110
• Published • 3
Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers
Paper
• 2408.16241
• Published
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper
• 2409.02795
• Published • 72
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Paper
• 2409.06820
• Published • 68
Can Large Language Models Unlock Novel Scientific Research Ideas?
Paper
• 2409.06185
• Published • 15
Self-Harmonized Chain of Thought
Paper
• 2409.04057
• Published • 18
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Paper
• 2409.10516
• Published • 43
Kolmogorov-Arnold Transformer
Paper
• 2409.10594
• Published • 45
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B
Paper
• 2409.11055
• Published • 17
A Controlled Study on Long Context Extension and Generalization in LLMs
Paper
• 2409.12181
• Published • 45
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper
• 2409.12183
• Published • 39
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published • 140
Language Models Learn to Mislead Humans via RLHF
Paper
• 2409.12822
• Published • 11
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Paper
• 2409.15277
• Published • 38
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Paper
• 2409.14988
• Published • 22
A Case Study of Web App Coding with OpenAI Reasoning Models
Paper
• 2409.13773
• Published • 7
EuroLLM: Multilingual Language Models for Europe
Paper
• 2409.16235
• Published • 29
OmniBench: Towards The Future of Universal Omni-Language Models
Paper
• 2409.15272
• Published • 30
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models
Paper
• 2409.16493
• Published • 10
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Paper
• 2409.17481
• Published • 47
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Paper
• 2409.17422
• Published • 25
Skywork Open Reasoner 1 Technical Report
Paper
• 2505.22312
• Published • 55
SageAttention2++: A More Efficient Implementation of SageAttention2
Paper
• 2505.21136
• Published • 45
Let's Predict Sentence by Sentence
Paper
• 2505.22202
• Published • 19