NVFP4-Quantized RedHatAI/Kimi-K2.6-NVFP4

This is a preliminary version (and subject to change) of NVFP4-quantized moonshotai/Kimi-K2.6 model, for performant inference on NVIDIA Blackwell GPUs. The model has both weights and activations quantized in NVFP4 format with vllm-project/llm-compressor.

It is compatible and tested against vllm v0.20.0. Deploy it via vllm serve using the recipes at https://recipes.vllm.ai/moonshotai/Kimi-K2.6.

Creation Script:

Kimi K2.6 support will land in https://github.com/vllm-project/llm-compressor/pull/2662. The script to create will be posted as a link shortly

Preliminary Evaluations

GSM8K Platinum:

lm_eval --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=RedHatAI/Kimi-K2.6-NVFP4,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
  --num_fewshot 0 \
  --apply_chat_template \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000,presence_penalty=1.5,repetition_penalty=1.0,seed=5678"

Recovery:

	moonshotai/Kimi-K2.6 (original in W4A16)	RedHatAI/Kimi-K2.6-NVFP4 (this model)
Accuracy	94.29	93.96
Recovery	-	99.6%

Note: More rigorous evaluations are currently in progress and will be available soon.

Every Eval Ever Results (June 2026)

Appended results from the uploaded every_eval_ever/ artifacts (seed-averaged).

Benchmark	Score
AIME25 (`pass@k:k=1&n=1`)	96.25%
GPQA Diamond (`gpqa_pass@k:k=1`)	91.08%
GSM8K Platinum (`strict-match`)	92.50%
GSM8K Platinum (`flexible-extract`)	96.94%
IFEval (`prompt_level_strict_acc`)	94.02%
MATH-500 (`pass@k:k=1&n=1`)	93.13%
MMLU-Pro Chat (`custom-extract`)	86.75%

Downloads last month: 10,997

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for RedHatAI/Kimi-K2.6-NVFP4

Base model

moonshotai/Kimi-K2.6

Quantized

(41)

this model