SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
Paper: arXiv:2512.02337
This model extends yuhuili/EAGLE3-LLaMA3.1-Instruct-8B with YaRN-based positional interpolation to support context lengths of up to 64K tokens.
It is designed to serve as the draft model in self-speculative decoding for long-context generation, as described in the SpecPV paper.
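As a rough illustration of what YaRN-based context extension involves, the sketch below builds the kind of `rope_scaling` entry that appears in a model's `config.json`. The base context length and the resulting scaling factor here are assumptions for illustration only, not values taken from the released checkpoint:

```python
# Illustrative sketch only: the base context below is an assumption,
# not a value read from this model's released config.
BASE_CONTEXT = 8192         # assumed pre-extension context window
TARGET_CONTEXT = 64 * 1024  # 64K tokens, as stated in this card

# A YaRN entry as it could appear under "rope_scaling" in config.json;
# the scaling factor is simply the context-extension ratio.
rope_scaling = {
    "rope_type": "yarn",
    "factor": TARGET_CONTEXT / BASE_CONTEXT,
    "original_max_position_embeddings": BASE_CONTEXT,
}

print(rope_scaling["factor"])  # → 8.0
```

Runtimes that read this field rescale the rotary position embeddings so the model can attend over the extended window.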
To cite the model, please use:
```bibtex
@article{tan2025specpv,
  title={SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification},
  author={Tan, Zhendong and Zhang, Xingjun and Hu, Chaoyi and Peng, Junjie and Xia, Kun},
  journal={arXiv preprint arXiv:2512.02337},
  year={2025}
}
```
Base model: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B