Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers Paper • 2604.17632 • Published 9 days ago • 11
Dual-View Training for Instruction-Following Information Retrieval Paper • 2604.18845 • Published 8 days ago • 10
Thinking Out Loud: Do Reasoning Models Know When They're Right? Paper • 2504.06564 • Published Apr 9, 2025
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models Paper • 2505.18497 • Published May 24, 2025 • 2
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Paper • 2505.20236 • Published May 26, 2025 • 3
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router Paper • 2507.22050 • Published Jul 29, 2025
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs Paper • 2505.11413 • Published May 16, 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards Paper • 2509.21882 • Published Sep 26, 2025
Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where? Paper • 2510.04434 • Published Oct 6, 2025 • 6
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published Jan 12 • 24
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models Paper • 2407.11282 • Published Jul 15, 2024 • 1
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? Paper • 2404.07066 • Published Apr 10, 2024
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation Paper • 2503.10497 • Published Mar 13, 2025 • 2