Instructions to use UCSB-SURFI/TermiGen-32B with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use UCSB-SURFI/TermiGen-32B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="UCSB-SURFI/TermiGen-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("UCSB-SURFI/TermiGen-32B")
model = AutoModelForCausalLM.from_pretrained("UCSB-SURFI/TermiGen-32B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use UCSB-SURFI/TermiGen-32B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "UCSB-SURFI/TermiGen-32B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "UCSB-SURFI/TermiGen-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/UCSB-SURFI/TermiGen-32B
```
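The curl call above can also be made from Python with only the standard library. A minimal sketch, assuming a vLLM server is already running on localhost:8000 (`build_chat_request` and `call_server` are illustrative helper names, not part of vLLM):

```python
import json
from urllib import request

def build_chat_request(model: str, user_message: str) -> dict:
    # Mirrors the curl payload: a single-turn chat completion request.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def call_server(payload: dict,
                url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    # POST the JSON payload to the OpenAI-compatible endpoint.
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_chat_request("UCSB-SURFI/TermiGen-32B",
                                 "What is the capital of France?")
    # Requires the running server from the step above:
    # print(call_server(payload)["choices"][0]["message"]["content"])
```

The same request works against the SGLang server below after changing the port to 30000.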
- SGLang
How to use UCSB-SURFI/TermiGen-32B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "UCSB-SURFI/TermiGen-32B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "UCSB-SURFI/TermiGen-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "UCSB-SURFI/TermiGen-32B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "UCSB-SURFI/TermiGen-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use UCSB-SURFI/TermiGen-32B with Docker Model Runner:
```shell
docker model run hf.co/UCSB-SURFI/TermiGen-32B
```
TermiGen-32B
TermiGen-32B achieves 31.3% pass@1 on TerminalBench 1.0, establishing a new open-weight state-of-the-art and surpassing proprietary models like o4-mini with Codex CLI (20.0%).
📄 Paper: TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
💻 Environments: https://github.com/ucsb-mlsec/terminal-bench-env
🧪 Benchmark: https://github.com/laude-institute/terminal-bench
Model Description
This model is fine-tuned from Qwen2.5-Coder-32B-Instruct using the TermiGen pipeline, which synthesizes high-fidelity training data through two phases:
Phase I: Environment Synthesis
- Multi-agent system generates 3,500+ verified Docker environments
- Tasks span 11 categories: system administration, security forensics, scientific computing, MLOps, etc.
- 420 unique command-line tools across 16 functional domains
- Automated unit test validation ensures task solvability
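The validation step in the last bullet can be pictured as a run-the-solution-then-run-the-tests gate: a synthesized task is kept only if its unit tests pass after the reference solution executes. A hypothetical sketch (`is_solvable` is an invented name; the actual checker lives in the linked terminal-bench-env repo and runs inside each task's Docker image, not on the host):

```python
import subprocess
import sys

def is_solvable(solution_cmd: list, test_cmd: list, timeout: int = 60) -> bool:
    """Run the reference solution, then the task's unit tests.

    The task counts as solvable only if the solution exits cleanly
    and the tests then pass. Illustrative only.
    """
    solved = subprocess.run(solution_cmd, timeout=timeout)
    if solved.returncode != 0:
        return False
    tests = subprocess.run(test_cmd, timeout=timeout)
    return tests.returncode == 0

if __name__ == "__main__":
    # Trivial stand-ins for a real solution script and test suite:
    print(is_solvable([sys.executable, "-c", "pass"],
                      [sys.executable, "-c", "pass"]))
```

Tasks whose tests fail even on the reference solution are discarded rather than shipped as unsolvable environments.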
Phase II: Error-Correction Trajectory Collection
- Generator-Critic framework with 20% error injection rate
- Teaches error → diagnosis → recovery cycles
- 3,291 trajectories (avg. 25.5 turns, 8,722 tokens each)
- Teacher model: Claude-4.5-Sonnet
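The 20% error-injection rate above can be sketched as corrupting roughly one in five commands so the teacher must demonstrate recovery instead of an always-clean transcript. A hypothetical sketch (the paper's Generator-Critic framework decides where and how to inject errors; the append-a-bad-flag corruption and `inject_errors` name here are invented stand-ins):

```python
import random

def inject_errors(commands: list, rate: float = 0.2, seed: int = 0) -> list:
    """Corrupt a fraction of commands to create error turns.

    Returns (command, was_corrupted) pairs; corrupted commands fail,
    forcing an error -> diagnosis -> recovery cycle in the trajectory.
    """
    rng = random.Random(seed)
    out = []
    for cmd in commands:
        if rng.random() < rate:
            out.append((cmd + " --no-such-flag", True))  # guaranteed-to-fail variant
        else:
            out.append((cmd, False))
    return out
```

With `rate=0.2`, about one turn in five becomes an error turn that the teacher must diagnose and repair.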
Training Details
Training Hyperparameters:
- Base Model: Qwen2.5-Coder-32B-Instruct
- Learning Rate: 5e-6 (cosine schedule, 10% warmup)
- Batch Size: 32 (8 GPUs × 4 gradient accumulation)
- Sequence Length: 20,000 tokens
- Epochs: 5
- Precision: BF16 with DeepSpeed ZeRO-3
- Hardware: 8× AMD MI325X GPUs
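The hyperparameters above can be restated as a single config container; note that a per-device batch size of 1 is implied by 8 GPUs × 4 accumulation steps = 32. A minimal sketch (`TrainConfig` is an illustrative container, not the authors' training code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list above.
    base_model: str = "Qwen/Qwen2.5-Coder-32B-Instruct"
    learning_rate: float = 5e-6
    lr_schedule: str = "cosine"
    warmup_ratio: float = 0.10
    num_gpus: int = 8
    per_device_batch_size: int = 1  # implied by 8 x 4 = 32
    gradient_accumulation: int = 4
    max_seq_len: int = 20_000
    epochs: int = 5
    precision: str = "bf16"  # with DeepSpeed ZeRO-3

    @property
    def effective_batch_size(self) -> int:
        return self.num_gpus * self.per_device_batch_size * self.gradient_accumulation
```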
Dataset Statistics:
- 3,500+ verified environments across 11 task categories
- 3,291 training trajectories
- Tool diversity: 420 unique CLI tools
- Average trajectory: 25.5 turns, 8,722 tokens
Evaluation Results
TerminalBench Performance
| Benchmark | Pass@1 |
|---|---|
| TerminalBench 1.0 | 31.3% |
| TerminalBench 2.0 | 18.0% |
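Pass@1 in the table is the standard pass@k metric at k = 1: the probability that a randomly chosen attempt solves the task, which for k = 1 reduces to the fraction of successful attempts. A minimal sketch of the usual unbiased estimator (Chen et al., 2021):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts (c correct), passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 this is simply c/n, so 31.3% pass@1 means the agent solves a TerminalBench 1.0 task on its first attempt about 31% of the time.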
Usage
We implemented a minimal BashAgent framework based on TerminalBench for agentic terminal execution. The agent interacts with Docker containers through a bash shell, generating a ReAct-style response at each turn.
For detailed usage and integration examples, please refer to our GitHub repository.
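One turn of such an agent loop can be sketched in a few lines. A hypothetical sketch, assuming the model emits its action inside a bash-fenced code block (the real BashAgent and its response format live in the GitHub repository; `extract_command` and `run_step` are invented names, and the real agent executes inside the task's Docker container, not on the host):

```python
import re
import subprocess
from typing import Optional

COMMAND_RE = re.compile(r"```bash\n(.*?)\n```", re.DOTALL)

def extract_command(response: str) -> Optional[str]:
    """Pull the bash action out of a ReAct-style model response."""
    m = COMMAND_RE.search(response)
    return m.group(1).strip() if m else None

def run_step(response: str) -> str:
    """One agent turn: extract the action, execute it, return the
    observation (stdout + stderr) to feed back to the model."""
    cmd = extract_command(response)
    if cmd is None:
        return "(no command found)"
    result = subprocess.run(cmd, shell=True, capture_output=True,
                            text=True, timeout=60)
    return result.stdout + result.stderr
```

For example, a response like `Thought: greet.` followed by a bash block containing `echo hello` would return the observation `hello`, which becomes the next turn's input.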
Citation
```bibtex
@article{zhu2026termigen,
  title={TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents},
  author={Zhu, Kaijie and Nie, Yuzhou and Li, Yijiang and Huang, Yiming and Wu, Jialian and Liu, Jiang and Sun, Ximeng and Yin, Zhenfei and Wang, Lun and Liu, Zicheng and Barsoum, Emad and Wang, William Yang and Guo, Wenbo},
  journal={arXiv preprint arXiv:2602.07274},
  url={https://arxiv.org/abs/2602.07274},
  year={2026}
}
```
License
Apache 2.0 (inherited from Qwen2.5-Coder base model)
Contact: Kaijie Zhu (kaijiezhu@ucsb.edu)