---
license: other
license_name: lfm-nanotron-prism-research
license_link: LICENSE.md
language:
- en
tags:
- lfm
- prism
- gspo
- hybrid-architecture
- tool-use
- Thinking
pipeline_tag: text-generation
library_name: transformers
---
# lfm-Nanotron: 2.6B-PRISM-SFT-GSPO-AutoRoundV2
## Model Description
lfm-Nanotron: Limited Edition 2.6B PRISM Model Access. Unlock a cutting-edge nano-sized AI model!

This is lfm-Nanotron, a nano-sized 2.6B-parameter hybrid-architecture language model fine-tuned with advanced techniques you won't find in mainstream releases:
- SFT (Test-Time Supervised Fine-Tuning): adaptive optimization at inference
- GSPO (Group Sequence Policy Optimization): RL-enhanced reasoning, instruction following, thinking, tool calling, and logic
- PRISM (Projected Refusal Isolation via Subspace Modification): state-of-the-art removal of over-refusal and propaganda from LLMs
- 128K Context Window: accepts prompts of up to 128,000 tokens
- Agentic Tool Calling: built for multi-turn, thinking, and instruction-following tasks
## Architecture Details
| Parameter | Value |
|---|---|
| Parameters | ~2.6B |
| Hidden Size | 2048 |
| Layers | 30 (22 Conv + 8 Full Attention) |
| Attention Heads | 32 |
| KV Heads | 8 (GQA) |
| Vocabulary | 65,536 |
| Max Context | 128,000 tokens |
| Architecture | Hybrid Conv + Attention (LFM2) |
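
As a rough illustration of what these figures imply for memory, the sketch below estimates the key/value cache size at full context. It assumes that only the 8 full-attention layers keep a KV cache, that the head dimension is the hidden size divided by the number of attention heads (2048 / 32 = 64), and that cache entries are stored in 16-bit precision. None of these assumptions is stated in the table, so treat the result as an order-of-magnitude estimate rather than a measured value.

```python
def kv_cache_bytes(seq_len, attn_layers=8, kv_heads=8, head_dim=64, bytes_per_elem=2):
    """Estimate KV cache size in bytes, counting the full-attention layers only.

    Defaults are assumptions derived from the architecture table:
    8 attention layers, 8 KV heads (GQA), head_dim = 2048 / 32, 16-bit entries.
    """
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    total = kv_cache_bytes(128_000)
    print(f"{total / 2**30:.2f} GiB at 128,000 tokens")
```

Under these assumptions the cache at 128,000 tokens comes to roughly 1.95 GiB, in addition to the model weights.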
## Available Quantizations
| File | Quantization | Size | Use Case |
|---|---|---|---|
| `lfm2-nanotron-ttft-gspo-prism-bf16.gguf` | BF16 | ~4.8GB | Full precision, best quality |
| `lfm2-nanotron-ttft-gspo-prism-Q4_K_M.gguf` (+W4A16) | Q4_K_M | ~1.5GB | Balanced quality/size |
| `lfm2-nanotron-ttft-gspo-prism-Q2_K.gguf` | Q2_K (+W2A16) | ~0.9GB | Maximum compression |
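
To make the size trade-off concrete, here is a minimal sketch that picks the largest listed file fitting a given memory budget. The file names and sizes are copied from the table; the `pick_quant` helper and its `overhead_gb` headroom (for context cache and runtime buffers) are hypothetical and should be tuned to your own setup.

```python
# File names and approximate sizes in GB, copied from the quantization table.
QUANT_FILES = [
    ("lfm2-nanotron-ttft-gspo-prism-bf16.gguf", 4.8),
    ("lfm2-nanotron-ttft-gspo-prism-Q4_K_M.gguf", 1.5),
    ("lfm2-nanotron-ttft-gspo-prism-Q2_K.gguf", 0.9),
]


def pick_quant(budget_gb, overhead_gb=0.5):
    """Return the largest file whose size plus overhead fits the budget, else None.

    overhead_gb is an assumed allowance for the KV cache and runtime buffers.
    """
    # Try the highest-quality (largest) file first.
    for name, size_gb in sorted(QUANT_FILES, key=lambda f: f[1], reverse=True):
        if size_gb + overhead_gb <= budget_gb:
            return name
    return None


if __name__ == "__main__":
    print(pick_quant(2.5))
```

With a 2.5 GB budget and the default headroom, this selects the Q4_K_M file; with too small a budget it returns `None`.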
## Usage
### With llama.cpp
```bash
./llama-cli -m lfm2-nanotron-ttft-gspo-prism-Q4_K_M.gguf \
  -p "Your prompt here" \
  --temp 0.3 --min-p 0.15 --repeat-penalty 1.05
```
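
If you drive `llama-cli` from a script, the command above can be assembled programmatically. The sketch below is one way to do that; `build_llama_cli_args` and `FLAG_NAMES` are hypothetical names introduced here, and the binary path is assumed to be `./llama-cli` as in the command above.

```python
import shlex

# Maps parameter names to the llama-cli flags used in the command above.
FLAG_NAMES = {
    "temperature": "--temp",
    "min_p": "--min-p",
    "repeat_penalty": "--repeat-penalty",
}


def build_llama_cli_args(model_path, prompt, params, binary="./llama-cli"):
    """Build an argument list suitable for subprocess.run (no shell needed)."""
    args = [binary, "-m", model_path, "-p", prompt]
    for key, value in params.items():
        # A KeyError here means the parameter has no flag mapping yet.
        args += [FLAG_NAMES[key], str(value)]
    return args


if __name__ == "__main__":
    params = {"temperature": 0.3, "min_p": 0.15, "repeat_penalty": 1.05}
    args = build_llama_cli_args(
        "lfm2-nanotron-ttft-gspo-prism-Q4_K_M.gguf", "Your prompt here", params
    )
    # Print a shell-quoted version for inspection.
    print(shlex.join(args))
```

Passing the list to `subprocess.run(args)` avoids shell-quoting problems when the prompt contains quotes or spaces.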
### Recommended Generation Parameters
```json
{
  "temperature": 0.3,
  "min_p": 0.15,
  "repeat_penalty": 1.05
}
```
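
For readers unfamiliar with `min_p`, the sketch below illustrates the commonly described min-p rule: a token is kept only if its probability is at least `min_p` times that of the most likely token, and the survivors are renormalized. This is an illustrative sketch, not the code any particular runtime uses; actual implementations (including the order in which temperature and min-p are applied) may differ, and the example probabilities are made up.

```python
def min_p_filter(probs, min_p=0.15):
    """Keep tokens with probability >= min_p * max(probs), then renormalize.

    Returns a dict mapping the kept token indices to renormalized probabilities.
    """
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


if __name__ == "__main__":
    # Made-up distribution over four tokens; the threshold is 0.15 * 0.6 = 0.09,
    # so the two least likely tokens are dropped.
    print(min_p_filter([0.6, 0.3, 0.08, 0.02]))
```

A value of 0.15 therefore prunes the low-probability tail more aggressively when the model is confident and less so when the distribution is flat.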
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{lfm2-nanotron-2026,
  title={lfm2-Nanotron: Test-Time Fine-Tuned LFM2 with GSPO+PRISM},
  author={Exobit (Eric Elbaz)},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Ex0bit/lfm2-Nanotron}
}
```
## License
This model is released under a custom research license. See LICENSE.md for details.
