ShowAndTell
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models (arXiv:2412.11605, 18 upvotes)
- Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871, 108 upvotes)
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization (arXiv:2412.17739, 41 upvotes)
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval (arXiv:2412.15443, 10 upvotes)
- ProgCo: Program Helps Self-Correction of Large Language Models (arXiv:2501.01264, 26 upvotes)
- SDPO: Segment-Level Direct Preference Optimization for Social Agents (arXiv:2501.01821, 20 upvotes)
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use (arXiv:2501.02506, 10 upvotes)
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models (arXiv:2501.03124, 13 upvotes)
- Evaluating Sample Utility for Data Selection by Mimicking Model Weights (arXiv:2501.06708, 5 upvotes)
- Atla Selene Mini: A General Purpose Evaluation Model (arXiv:2501.17195, 35 upvotes)
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781, 58 upvotes)
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference (arXiv:2502.18137, 59 upvotes)
- StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following (arXiv:2502.14494, 15 upvotes)
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems (arXiv:2502.19328, 23 upvotes)
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? (arXiv:2502.19361, 28 upvotes)
- Towards an AI co-scientist (arXiv:2502.18864, 51 upvotes)
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches (arXiv:2503.00808, 56 upvotes)
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens (arXiv:2502.18890, 30 upvotes)
- SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity (arXiv:2503.01506, 10 upvotes)
- General Reasoning Requires Learning to Reason from the Get-go (arXiv:2502.19402, 5 upvotes)
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition (arXiv:2503.00735, 23 upvotes)
- Process-based Self-Rewarding Language Models (arXiv:2503.03746, 39 upvotes)
- IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval (arXiv:2503.04644, 21 upvotes)
- Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles (arXiv:2502.18968, 3 upvotes)
- TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention (arXiv:2503.10602, 4 upvotes)
- Temporal Consistency for LLM Reasoning Process Error Identification (arXiv:2503.14495, 11 upvotes)
- EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees (arXiv:2503.08893, 6 upvotes)
- Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base (arXiv:2503.23361, 5 upvotes)
- Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization (arXiv:2503.20286, 3 upvotes)
- ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations (arXiv:2504.00824, 43 upvotes)
- Agentic Knowledgeable Self-awareness (arXiv:2504.03553, 27 upvotes)
- Heimdall: test-time scaling on the generative verification (arXiv:2504.10337, 33 upvotes)
- [title missing in source] (arXiv:2504.11442, 30 upvotes)
- Efficient Process Reward Model Training via Active Learning (arXiv:2504.10559, 13 upvotes)
- AI-University: An LLM-based platform for instructional alignment to scientific classrooms (arXiv:2504.08846, 9 upvotes)
- Learning Adaptive Parallel Reasoning with Language Models (arXiv:2504.15466, 44 upvotes)
- Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks (arXiv:2505.00234, 26 upvotes)
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models (arXiv:2505.00551, 36 upvotes)
- Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models (arXiv:2504.20157, 37 upvotes)
- TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering (arXiv:2504.20114, 4 upvotes)
- SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning (arXiv:2504.19162, 18 upvotes)
- Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation (arXiv:2503.12854)
- LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis (arXiv:2505.02625, 22 upvotes)
- Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts (arXiv:2504.21117, 26 upvotes)
- CORG: Generating Answers from Complex, Interrelated Contexts (arXiv:2505.00023, 9 upvotes)
- RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference (arXiv:2505.02922, 28 upvotes)
- Scalable Chain of Thoughts via Elastic Reasoning (arXiv:2505.05315, 26 upvotes)
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (arXiv:2505.03981, 15 upvotes)
- AutoLibra: Agent Metric Induction from Open-Ended Feedback (arXiv:2505.02820, 3 upvotes)
- Phare: A Safety Probe for Large Language Models (arXiv:2505.11365, 7 upvotes)
- ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning (arXiv:2505.15776, 10 upvotes)
- BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs (arXiv:2505.13529, 11 upvotes)
- Text Generation Beyond Discrete Token Sampling (arXiv:2505.14827, 10 upvotes)
- Scaling Diffusion Transformers Efficiently via μP (arXiv:2505.15270, 35 upvotes)
- TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations (arXiv:2505.18125, 112 upvotes)
- QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization (arXiv:2505.18092, 43 upvotes)
- Quartet: Native FP4 Training Can Be Optimal for Large Language Models (arXiv:2505.14669, 78 upvotes)
- Learning to Reason without External Rewards (arXiv:2505.19590, 29 upvotes)
- Can Large Language Models Infer Causal Relationships from Real-World Text? (arXiv:2505.18931, 1 upvote)
- MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900, 93 upvotes)
- ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists (arXiv:2506.01241, 9 upvotes)
- What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models (arXiv:2506.06485, 5 upvotes)
- Cartridges: Lightweight and general-purpose long context representations via self-study (arXiv:2506.06266, 7 upvotes)
- Improving large language models with concept-aware fine-tuning (arXiv:2506.07833, 3 upvotes)
- HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization (arXiv:2506.04255, 5 upvotes)
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning (arXiv:2505.24726, 277 upvotes)
- MemMamba: Rethinking Memory Patterns in State Space Model (arXiv:2510.03279, 72 upvotes)