Instructions for using QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF",
    filename="deepthought-8b-llama-v0.01-alpha.Q2_K.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Use Docker
docker model run hf.co/QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
- Ollama
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Ollama:
ollama run hf.co/QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
- Unsloth Studio
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF to start chatting
- Pi
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Docker Model Runner:
docker model run hf.co/QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
- Lemonade
How to use QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.deepthought-8b-llama-v0.01-alpha-GGUF-Q4_K_M
List all available models
lemonade list
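Several of the options above (llama-server, vLLM) expose an OpenAI-compatible HTTP endpoint, so the curl call shown for vLLM can also be made from Python. The following is a minimal sketch using only the standard library. It assumes a server is already running on port 8080 (the port used in the Pi and Hermes configurations above; the vLLM example uses 8000) and that the server accepts the model ID shown. The helper names `build_chat_request`, `extract_reply`, and `ask` are illustrative and not part of any of these tools.

```python
import json
import urllib.request

# Assumed defaults: port 8080 matches the llama-server base URL used in the
# Pi and Hermes configs; change to 8000 for the vLLM example.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF:Q4_K_M"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, **kwargs):
    """Send the prompt to the running server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

With a server running, `ask("What is the capital of France?")` should return the reply text. The same `extract_reply` helper also applies to the dictionary returned by `llm.create_chat_completion(...)` in llama-cpp-python, which follows the same response layout.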
QuantFactory/deepthought-8b-llama-v0.01-alpha-GGUF
This is a quantized version of ruliad/deepthought-8b-llama-v0.01-alpha, created using llama.cpp.
Original Model Card
Deepthought-8B
Deepthought-8B is a small and capable reasoning model built on LLaMA-3.1 8B, designed to make AI reasoning more transparent and controllable. Despite its relatively small size, it achieves sophisticated reasoning capabilities that rival much larger models.
Model Description
Deepthought-8B is designed with a unique approach to problem-solving, breaking down its thinking into clear, distinct, documented steps. The model outputs its reasoning process in a structured JSON format, making it easier to understand and validate its decision-making process.
Key Features
- Transparent Reasoning: Step-by-step documentation of the thought process
- Programmable Approach: Customizable reasoning patterns without model retraining
- Test-time Compute Scaling: Flexible reasoning depth based on task complexity
- Efficient Scale: Runs on 16GB+ VRAM
- Structured Output: JSON-formatted reasoning chains for easy integration
Try out Deepthought-8B on our Ruliad interface: https://chat.ruliad.co
Technical Requirements
- Python 3.6+
- PyTorch
- Transformers library
- 16GB+ VRAM
- Optional: Flash Attention 2 for improved performance
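Before installing, it can help to check the requirements above against the current machine. The sketch below is one way to do that, not an official script: it reports the Python version check, whether `torch`, `transformers`, and `flash_attn` can be found, and the total memory of the first CUDA device if PyTorch and a GPU are present. The function name `check_environment` and the report keys are illustrative.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # taken from the requirements list above


def check_environment():
    """Return a dict describing which of the listed requirements appear to be met."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
        "vram_gib": None,  # filled in only when a CUDA device is visible
    }
    if report["torch_installed"]:
        import torch

        if torch.cuda.is_available():
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            report["vram_gib"] = total_bytes / 1024**3
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

The script reports the VRAM figure rather than comparing it against 16, because a card sold as 16GB typically reports slightly less than 16 GiB of total memory.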
Installation
pip install torch transformers
# Optional: Install Flash Attention 2 for better performance
pip install flash-attn
Usage
- First, set your HuggingFace token as an environment variable:
export HF_TOKEN=your_token_here
export HF_HUB_ENABLE_HF_TRANSFER=1
- Use the model in your Python code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Initialize the tokenizer and model
model_name = "ruliad/deepthought-8b-llama-v0.01-alpha"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    add_bos_token=False,
    trust_remote_code=True,
    padding_side="left",  # left padding is set with padding_side, not padding
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # Use "eager" (or omit) if flash_attn is not installed
    use_cache=True,
    trust_remote_code=True,
)
- Run the provided example script:
python deepthought_inference.py
Example Output
The model provides structured reasoning in JSON format:
{
  "step": 1,
  "type": "problem_understanding",
  "thought": "Understanding the user's objective for the task."
}
Each reasoning chain includes multiple steps:
- Problem understanding
- Data gathering
- Analysis
- Calculation (when applicable)
- Verification
- Conclusion drawing
- Implementation
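Because each step is a JSON object, a reasoning chain can be pulled out of raw model output with the standard library. The sketch below assumes step objects carry the three keys shown in the example (`step`, `type`, `thought`) and may be surrounded by other text or nested inside a wrapper object. The exact envelope the model emits is not specified here, so treat `extract_steps` as an illustrative helper rather than the model's official parser.

```python
import json

REQUIRED_KEYS = {"step", "type", "thought"}  # keys from the example above


def extract_steps(text):
    """Return every JSON object in text that looks like a reasoning step."""
    decoder = json.JSONDecoder()
    steps = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            pos = start + 1
            continue
        if isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys():
            steps.append(obj)
            pos = end
        else:
            # Valid JSON but not a step: look inside it for nested steps.
            pos = start + 1
    return steps


if __name__ == "__main__":
    sample = (
        '{"step": 1, "type": "problem_understanding", '
        '"thought": "Understanding the user\'s objective for the task."}'
    )
    for step in extract_steps(sample):
        print(step["step"], step["type"], step["thought"])
```

Scanning brace by brace keeps the helper tolerant of extra text around the JSON, at the cost of some wasted parse attempts on braces that appear inside strings.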
Performance
Deepthought-8B demonstrates strong performance in the following areas:
- Step-by-step problem-solving
- Coding and mathematical tasks
- Instruction following with transparent reasoning
- Scalable performance with test-time compute
Limitations
Current known limitations include:
- Complex mathematical reasoning
- Long-context processing
- Edge case handling
License
The model is available under a commercial license for enterprise use.
Citation
If you use this model in your research, please cite:
@misc{Deepthought2024,
author = {Ruliad},
title = {Deepthought-8B: A Small and Capable Reasoning Model},
year = {2024},
publisher = {Ruliad}
}
Support
For questions and feedback:
- Twitter: @ruliad_ai
- Email: team@ruliad.co