D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
This repository contains the model weights for D-CORE (Decomposing tasks and Composing Reasoning processes), trained with a two-stage framework designed to enhance the task decomposition and reflective reasoning capabilities of Large Reasoning Models (LRMs) for complex tool use.
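The checkpoints can be loaded like a standard causal language model. Below is a minimal usage sketch, assuming the weights ship in the usual Hugging Face `transformers` format; the repository id, chat-template behavior, and example tool schema are placeholders for illustration, not taken from this card.

```python
# Minimal loading sketch -- assumes a standard Hugging Face causal-LM checkpoint.
# The repo id and the tool schema below are placeholders, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/D-CORE-8B"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative OpenAI-style function schema, as accepted by most
# tool-calling chat templates (assumed, not documented here).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin right now?"}]

inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```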
Introduction
Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we find that current LRMs lack sub-task decomposition capability in complex tool-use scenarios, leading to "Lazy Reasoning."
To address this, D-CORE proposes a two-stage training framework:
- Self-distillation: Incentivizes the LRM's task decomposition reasoning capability.
- Diversity-aware Reinforcement Learning (RL): Restores the LRM's reflective reasoning capability.
D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Notably, D-CORE-14B establishes a new state-of-the-art on BFCLv3, outperforming 70B models despite being 5× smaller.
Resources
- Paper: D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
- arXiv: 2602.02160
- Code: EfficientAI (GitHub)
Authors
Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang
Performance
BFCL
In our network environment, certain websites (e.g., Wikipedia) required by the Web Search No Snippet task are unreachable, which causes some deviation in the No Snippet scores.
| Model | Overall | Agentic: Web Search (Summary) | Agentic: Web Search (Base) | Agentic: Web Search (No Snippet) | Agentic: Memory (Summary) | Agentic: Memory (KV) | Agentic: Memory (Vector) | Agentic: Memory (Recursive Sum) | Multi Turn (Overall Acc) | Multi Turn (Base) | Multi Turn (Miss Func) | Multi Turn (Miss Param) | Multi Turn (Long Context) | Single Turn: Non-live (Overall Acc) | Single Turn: Non-live (Simple) | Single Turn: Non-live (Multiple) | Single Turn: Non-live (Parallel) | Single Turn: Non-live (Multiple Parallel) | Single Turn: Live (Overall Acc) | Single Turn: Live (Simple) | Single Turn: Live (Multiple) | Single Turn: Live (Parallel) | Single Turn: Live (Multiple Parallel) | Hallucination Measurement (Relevance) | Hallucination Measurement (Irrelevance) | Format Sensitivity (Max Delta) | Format Sensitivity (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D-CORE-8B | 53.15 | 23.00 | 36.00 | 10.00 | 19.14 | 9.03 | 16.77 | 31.61 | 64.88 | 75.50 | 65.00 | 60.50 | 58.50 | 86.85 | 75.92 | 92.50 | 92.00 | 87.00 | 75.80 | 78.29 | 75.02 | 100.00 | 66.67 | 75.00 | 89.99 | 75.0 | 24.67 |
Tau-Bench & Tau2-Bench
We use Qwen3-235B-A22B-Instruct-2507 as the user model. For each task, we sample 5 times and take the average as the final result.
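Concretely, the aggregation described above is just the mean of the five per-run rewards for each task; a toy illustration (the per-run values below are made up, not actual benchmark results):

```python
# Toy illustration of the average-of-5 aggregation for a single task.
# The per-run rewards are made-up values, not actual Tau-Bench results.
runs = [1.0, 0.0, 1.0, 1.0, 0.0]  # reward from each of the 5 sampled attempts
task_score = sum(runs) / len(runs)
print(f"final score for this task: {task_score:.1f}")  # -> 0.6
```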
| Model | Tau-Bench (Overall) | Tau-Bench (Retail) | Tau-Bench (Airline) | Tau2-Bench (Overall) | Tau2-Bench (Retail) | Tau2-Bench (Airline) | Tau2-Bench (Telecom) |
|---|---|---|---|---|---|---|---|
| D-CORE-8B | 44.9 | 53.0 | 36.8 | 35.8 | 43.2 | 37.1 | 27.2 |
ACEBench
| Model | Overall | Atom (Summary) | Atom (Bool) | Atom (Enum) | Atom (Number) | Atom (List) | Atom (Object Short) | Atom (Object Deep) | Single Turn (Summary) | Single Turn (Single Function) | Single Turn (Parallel Function) | Multi Turn (Summary) | Multi Turn (Switch) | Multi Turn (Adjust) | Similar API | Preference | Summary | Special (Summary) | Special (Incomplete) | Special (Error) | Special (Irrelevant) | Agent (Summary) | Agent (Multi Turn) | Agent (Multi Turn Process) | Agent (Multi Step) | Agent (Multi Step Process) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D-CORE-8B | 75.2 | 82.7 | 90.0 | 98.0 | 98.0 | 98.0 | 36.0 | 76.0 | 77.5 | 85.0 | 70.0 | 62.0 | 64.0 | 60.0 | 78.0 | 82.0 | 77.9 | 78.7 | 58.0 | 82.0 | 96.0 | 59.2 | 43.3 | 66.8 | 75.0 | 80.8 |
Citation
If you find our work useful, please cite:
```bibtex
@article{xu2026dcore,
  title={D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use},
  author={Xu, Bowen and Wu, Shaoyu and Jiang, Hao and Liu, Kai and Chen, Xin and Hu, Lulu and Yang, Bin},
  journal={arXiv preprint arXiv:2602.02160},
  year={2026}
}
```