Instructions to use MiniMaxAI/MiniMax-M2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MiniMaxAI/MiniMax-M2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MiniMaxAI/MiniMax-M2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MiniMaxAI/MiniMax-M2 with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "MiniMaxAI/MiniMax-M2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```sh
docker model run hf.co/MiniMaxAI/MiniMax-M2
```
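Since the vLLM server exposes an OpenAI-compatible API, you can also call it from Python instead of curl. A minimal sketch using the official `openai` client, assuming the pip-served endpoint above on `localhost:8000` (the `api_key` value is a placeholder; vLLM does not check it by default):
```python
# Query the OpenAI-compatible vLLM server from Python.
from openai import OpenAI

# vLLM ignores the API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```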
- SGLang
How to use MiniMaxAI/MiniMax-M2 with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MiniMaxAI/MiniMax-M2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```sh
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MiniMaxAI/MiniMax-M2" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use MiniMaxAI/MiniMax-M2 with Docker Model Runner:
```sh
docker model run hf.co/MiniMaxAI/MiniMax-M2
```
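Running the command with no further arguments opens an interactive chat. Docker Model Runner also accepts a one-shot prompt as a trailing argument; a sketch, assuming a recent Docker release with Model Runner enabled:
```sh
# Send a single prompt and print the response instead of starting a chat session.
docker model run hf.co/MiniMaxAI/MiniMax-M2 "What is the capital of France?"
```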
Invalid reasoning-parser
I followed the deployment guide at https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/vllm_deploy_guide.md, but I encounter the following error:
```
vllm serve: error: argument --reasoning-parser: invalid choice: 'minimax_m2_append_think' (choose from 'deepseek_r1', 'glm45', 'openai_gptoss', 'granite', 'hunyuan_a13b', 'mistral', 'qwen3', 'seed_oss', 'step3')
```
Do you use nightly vLLM?
The latest vLLM nightly should have it; see https://github.com/vllm-project/vllm/blob/d9ab1ad9d1be96885f4387a33a3a82233c009ce9/vllm/reasoning/__init__.py#L59
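For example, a nightly install could look like this (a sketch, assuming the official nightly wheel index; the exact command may change between releases):
```sh
# Upgrade to the latest vLLM nightly wheel, which should include the minimax_m2 parsers.
pip install -U vllm --extra-index-url https://wheels.vllm.ai/nightly
```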
The parser doesn't seem to be working: I'm receiving the model's message in the `<think>reasoning</think>answer` format, and it isn't parsed. I expected the reasoning part to be in a separate `reasoning_content` field, but the `minimax_m2_append_think` parser doesn't separate it from the content.
I got the same error, and I've installed the latest vLLM.
You can use `--reasoning-parser minimax_m2`.
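Once a server starts with that flag, one way to verify the parser is working is to check for the separated reasoning in the response. A sketch, assuming the server runs on `localhost:8000` and that, as vLLM's reasoning-output docs describe, the parsed reasoning comes back in a `reasoning_content` field:
```python
# Verify that the reasoning parser splits <think>...</think> out of the answer.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
msg = response.choices[0].message
# With a working parser, the reasoning lands in reasoning_content
# rather than being embedded in content.
print("reasoning:", getattr(msg, "reasoning_content", None))
print("answer:", msg.content)
```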
```sh
SAFETENSORS_FAST_GPU=1 CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve /data2/models/MiniMax-M2 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2
```
```
INFO 11-12 07:03:34 [__init__.py:216] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --reasoning-parser: invalid choice: 'minimax_m2' (choose from 'deepseek_r1', 'glm45', 'openai_gptoss', 'granite', 'hunyuan_a13b', 'mistral', 'qwen3', 'seed_oss', 'step3')
```
I ran
```sh
pip install 'triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels' \
    vllm --extra-index-url https://wheels.vllm.ai/nightly
```
and it still does not work; it still raises the invalid parser error.
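A quick sanity check in this situation (a sketch, assuming `pip` and `python` point at the same environment) is to confirm which vLLM build is actually being picked up, since an older install elsewhere on the path would still lack the parser:
```sh
# Confirm the installed vLLM version and where it was loaded from.
python -c "import vllm; print(vllm.__version__, vllm.__file__)"

# List the reasoning parsers the installed build actually accepts.
vllm serve --help | grep -A2 reasoning-parser
```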