SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
Paper: arXiv:2512.02337
This model extends yuhuili/EAGLE3-LLaMA3.1-Instruct-8B with YaRN-based positional interpolation to support context lengths of up to 64K tokens.
It is designed to serve as the draft model in self-speculative decoding for long-context generation, as described in the SpecPV paper.
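As a rough illustration of what YaRN-based context extension involves, the sketch below builds the kind of `rope_scaling` entry that appears in a model's `config.json`. The base context length and the resulting scaling factor here are assumptions for illustration only, not values taken from the released checkpoint:

```python
# Illustrative sketch only: the base context below is an assumption,
# not a value read from this model's released config.
BASE_CONTEXT = 8192         # assumed pre-extension context window
TARGET_CONTEXT = 64 * 1024  # 64K tokens, as stated in this card

# A YaRN entry as it could appear under "rope_scaling" in config.json;
# the scaling factor is simply the context-extension ratio.
rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CONTEXT / BASE_CONTEXT,
    "original_max_position_embeddings": BASE_CONTEXT,
}

print(rope_scaling["factor"])  # → 8.0
```

Runtimes that read this field rescale the rotary position embeddings so the model can attend over the extended window.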
To cite the model, please use:
```bibtex
@article{tan2025specpv,
  title={SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification},
  author={Tan, Zhendong and Zhang, Xingjun and Hu, Chaoyi and Peng, Junjie and Xia, Kun},
  journal={arXiv preprint arXiv:2512.02337},
  year={2025}
}
```
Base model: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B