Instructions for using Minhdn/deepseek-prover-sinq-4bit with libraries and local apps.

Libraries

How to use Minhdn/deepseek-prover-sinq-4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Minhdn/deepseek-prover-sinq-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Minhdn/deepseek-prover-sinq-4bit")
model = AutoModelForCausalLM.from_pretrained("Minhdn/deepseek-prover-sinq-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Minhdn/deepseek-prover-sinq-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Minhdn/deepseek-prover-sinq-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minhdn/deepseek-prover-sinq-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
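
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper name `build_chat_request` is illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "Minhdn/deepseek-prover-sinq-4bit",
    "What is the capital of France?",
)

# Sending needs a running server, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Pointing `base_url` at `http://localhost:30000` should work the same way for the SGLang server, since both expose the same OpenAI-compatible route.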

Use Docker

docker model run hf.co/Minhdn/deepseek-prover-sinq-4bit

SGLang

How to use Minhdn/deepseek-prover-sinq-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Minhdn/deepseek-prover-sinq-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minhdn/deepseek-prover-sinq-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Minhdn/deepseek-prover-sinq-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Minhdn/deepseek-prover-sinq-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Minhdn/deepseek-prover-sinq-4bit with Docker Model Runner:
```
docker model run hf.co/Minhdn/deepseek-prover-sinq-4bit
```

DeepSeek-Prover-V1.5-Base SINQ 4-bit

This is a 4-bit quantized version of DeepSeek-Prover-V1.5-Base, produced with SINQ (Sinkhorn-Normalized Quantization).

Model Details

Base Model: DeepSeek-Prover-V1.5-Base (7B parameters)
Quantization Method: SINQ 4-bit with 2D tiling
Group Size: 128
Model Size: ~3.5 GB (75% smaller than the ~14 GB original)
Memory Usage: ~5 GB GPU memory at inference
Quality: Lower than the original model (word overlap ~40-50%)
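
The size figures above follow from simple arithmetic on the parameter count. A rough sketch, assuming exactly 7 billion weights and decimal gigabytes, and ignoring the per-group scale and offset metadata that a group-wise quantizer also stores:

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: the "7B" figure taken as exactly 7 billion

fp16_gb = weight_size_gb(N_PARAMS, 16)  # original 16-bit weights
int4_gb = weight_size_gb(N_PARAMS, 4)   # 4-bit quantized weights
reduction = 1 - int4_gb / fp16_gb

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, reduction: {reduction:.0%}")
# → 16-bit: 14.0 GB, 4-bit: 3.5 GB, reduction: 75%
```

The ~5 GB inference figure is higher than the weight size because activations, the KV cache, and quantization metadata also occupy GPU memory.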

Quantization Configuration

BaseQuantizeConfig(
    nbits=4,
    group_size=128,
    method="sinq",
    tiling_mode="2D",
    axis=1
)
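
To make `nbits` and `group_size` concrete, here is a minimal sketch of plain round-to-nearest group-wise quantization in pure Python. It is not the SINQ algorithm: the Sinkhorn normalization and 2D tiling are omitted, and the function names are illustrative rather than taken from the SINQ library.

```python
import random


def quantize_group(values, nbits=4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    levels = 2 ** nbits - 1                # 15 steps for 4-bit codes (0..15)
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


def quantize_row(row, nbits=4, group_size=128):
    """Split a row into consecutive groups; each group gets its own scale and offset."""
    return [
        quantize_group(row[i:i + group_size], nbits)
        for i in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]  # stand-in for one weight row
groups = quantize_row(row)

restored = [v for codes, scale, lo in groups for v in dequantize_group(codes, scale, lo)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs error:", max_error)
```

Each group of 128 weights shares one scale and one offset, so the rounding error within a group is bounded by half that group's scale.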

Usage

Installation

pip install torch transformers
pip install git+https://github.com/huawei-csl/SINQ.git

Loading the Model

import torch
from sinq.patch_model import AutoSINQHFModel
from transformers import AutoTokenizer

# Load quantized model
tokenizer = AutoTokenizer.from_pretrained("Minhdn/deepseek-prover-sinq-4bit")
model = AutoSINQHFModel.from_quantized_safetensors(
    "Minhdn/deepseek-prover-sinq-4bit",
    device="cuda:0",
    compute_dtype=torch.bfloat16
)

# Generate
prompt = "theorem add_comm (a b : Nat) : a + b = b + a := by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
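
For reference, a correct completion of a prompt like the one above can be very short. The following Lean 4 snippet is an illustrative, hand-written proof and not model output. The theorem is renamed `add_comm_example` (a hypothetical name) to avoid clashing with existing library lemmas.

```lean
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

Compiling generated text with Lean is the reliable way to tell a valid proof from one that only looks plausible.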

Performance

Inference Speed: ~3-4 tokens/second (slow because gemlite supports only 1D tiling, not the 2D tiling used here)
Memory Savings: 75% compared to original model
Quality: Moderate - best for experimentation or resource-constrained environments
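
Throughput depends heavily on hardware, which this card does not specify, so it is worth measuring on your own setup. A minimal timing sketch follows; `tokens_per_second` is a hypothetical helper, and the `time.sleep` lambda stands in for a real `model.generate` call.

```python
import time


def tokens_per_second(generate, n_new_tokens: int) -> float:
    """Time one call to `generate` and convert it to new tokens per second."""
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed


# Stand-in workload so the sketch runs without a GPU; replace it with e.g.
#   lambda: model.generate(**inputs, max_new_tokens=100)
rate = tokens_per_second(lambda: time.sleep(0.05), n_new_tokens=100)
print(f"{rate:.1f} tokens/second")
```

If generation stops early at an end-of-sequence token, pass the number of tokens actually produced rather than the `max_new_tokens` limit.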

Limitations

Lower quality than the original model (word overlap ~40-50%)
Slower inference because of 2D tiling (gemlite supports only 1D tiling)
May produce incorrect Lean 4 proofs more often than the original model
Not recommended for production use where correctness is critical
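
The card does not say how the word-overlap figure was computed. One simple definition, assumed here purely for illustration, is the fraction of whitespace-separated words in the original model's output that also appear in the quantized model's output, counting repeats:

```python
from collections import Counter


def word_overlap(reference: str, candidate: str) -> float:
    """Fraction of reference words (with multiplicity) that also occur in candidate."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    shared = sum((ref_counts & cand_counts).values())  # multiset intersection
    return shared / total


# Made-up strings standing in for outputs of the original and quantized models.
original_output = "intro a b ; exact Nat.add_comm a b"
quantized_output = "intro a b ; simp [Nat.add_comm]"
print(word_overlap(original_output, quantized_output))  # → 0.5
```

A surface metric like this says nothing about whether a proof is valid, so it complements checking the output with Lean rather than replacing it.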

Recommendations

For better quality at the cost of larger model size, consider:

6-bit version: Minhdn/deepseek-prover-sinq-6bit (~90% quality, ~5GB)
Original model: deepseek-ai/DeepSeek-Prover-V1.5-Base (100% quality, ~14GB)

Citation

If you use this model, please cite both the original DeepSeek-Prover paper and SINQ:

@article{deepseek2024prover,
  title={DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data},
  author={DeepSeek-AI},
  journal={arXiv preprint arXiv:2405.14333},
  year={2024}
}

@article{sinq2025,
  title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
  author={SINQ Authors},
  journal={arXiv preprint arXiv:2509.22944},
  year={2025}
}

License

This model inherits the MIT license from the original DeepSeek-Prover-V1.5-Base model.

Safetensors

Model size: 4B params
Tensor type: F16

Model tree for Minhdn/deepseek-prover-sinq-4bit

Base model: deepseek-ai/DeepSeek-Prover-V1.5-Base
Finetuned (3): this model

Papers for Minhdn/deepseek-prover-sinq-4bit

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper 2509.22944, published Sep 26, 2025

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper 2405.14333, published May 23, 2024