NeuLab @ LTI/CMU

university

https://www.cs.cmu.edu/~neulab/

neulab


AI & ML interests

None defined yet.

Recent Activity


seungone

authored 5 papers 7 days ago

Measuring Sycophancy of Language Models in Multi-turn Dialogues

Paper • 2505.23840 • Published May 28, 2025 • 2

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Paper • 2508.13141 • Published Aug 18, 2025

VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding

Paper • 2509.21451 • Published Sep 25, 2025

SPICE: Self-Play In Corpus Environments Improves Reasoning

Paper • 2510.24684 • Published Oct 28, 2025 • 17

yuexiang96

authored 4 papers 20 days ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 28

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 45

Simulating Environments with Reasoning Models for Agent Training

Paper • 2511.01824 • Published Nov 3, 2025 • 2

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 28 days ago • 36

lintang

published a model 25 days ago

neulab/qwen3-8b-cso-alpha

Updated 25 days ago

yueqis

updated a dataset about 1 month ago

neulab/agent-data-collection

Preview • Updated Dec 2, 2025 • 2.45k • 105

seungone

authored a paper about 1 month ago

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

Paper • 2511.22173 • Published Nov 27, 2025 • 14

yueqis

updated a dataset about 1 month ago

neulab/VisualPuzzles

Viewer • Updated Nov 29, 2025 • 1.17k • 340 • 11

akariasai

authored a paper about 1 month ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 60

yueqis

in neulab/agent-data-collection about 1 month ago

Data in chat template agnostic format

#4 opened about 2 months ago by

license please

#2 opened 2 months ago by

Nyandwi

authored a paper 5 months ago

Grounding Multilingual Multimodal LLMs With Cultural Knowledge

Paper • 2508.07414 • Published Aug 10, 2025 • 1

ProKil

authored 2 papers 5 months ago

Sotopia-RL: Reward Design for Social Intelligence

Paper • 2508.03905 • Published Aug 5, 2025 • 23

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Paper • 2506.23046 • Published Jun 29, 2025 • 1

yuexiang96

authored a paper 6 months ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17, 2025 • 39