Instructions for using raidhon/coven_7b_128k_orpo_alpha with libraries and local apps.

Libraries

How to use raidhon/coven_7b_128k_orpo_alpha with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="raidhon/coven_7b_128k_orpo_alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("raidhon/coven_7b_128k_orpo_alpha")
model = AutoModelForCausalLM.from_pretrained("raidhon/coven_7b_128k_orpo_alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use raidhon/coven_7b_128k_orpo_alpha with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "raidhon/coven_7b_128k_orpo_alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raidhon/coven_7b_128k_orpo_alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
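
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM; the response layout (`choices[0].message.content`) follows the OpenAI chat completions format that the server emulates.

```python
import json
import urllib.request

MODEL = "raidhon/coven_7b_128k_orpo_alpha"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (only works while the server is up):
# print(chat("What is the capital of France?"))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper talk to any other server that exposes the same OpenAI-compatible route.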

Use Docker

docker model run hf.co/raidhon/coven_7b_128k_orpo_alpha

SGLang

How to use raidhon/coven_7b_128k_orpo_alpha with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raidhon/coven_7b_128k_orpo_alpha" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raidhon/coven_7b_128k_orpo_alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raidhon/coven_7b_128k_orpo_alpha" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raidhon/coven_7b_128k_orpo_alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use raidhon/coven_7b_128k_orpo_alpha with Docker Model Runner:
```
docker model run hf.co/raidhon/coven_7b_128k_orpo_alpha
```

🧙 Coven 7B 128K ORPO

Coven 7B 128K ORPO is a fine-tune of Mistral-7B-Instruct-v0.2 that extends the context window and refines the model's preference alignment. The context window is extended to 128K tokens using the YaRN technique, which lets the model process much longer inputs. The model is fine-tuned with ORPO (Monolithic Preference Optimization without Reference Model). ORPO simplifies fine-tuning by directly optimizing an odds ratio that separates favored from disfavored generation styles, improving model performance without a separate preference-alignment step.
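
To make the odds-ratio idea concrete, here is a minimal sketch of the ORPO objective for a single preference pair, following the formulation in the ORPO paper. It is not the training code used for this model. The function names and the weight `lam=0.1` are illustrative assumptions, and the inputs are assumed to be length-averaged token log-probabilities (strictly negative) for the chosen and rejected responses.

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) where p = exp(avg_logp) and avg_logp < 0."""
    p = math.exp(avg_logp)
    return avg_logp - math.log1p(-p)


def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """Supervised loss on the chosen response plus a weighted odds-ratio penalty.

    lam is a hypothetical weighting chosen only for illustration.
    """
    # Log odds ratio between the chosen and the rejected response.
    log_or = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    penalty = math.log1p(math.exp(-log_or))
    return nll_chosen + lam * penalty


# The penalty shrinks as the chosen response becomes more likely than the rejected one.
print(orpo_loss(1.0, avg_logp_chosen=-0.2, avg_logp_rejected=-1.5))
print(orpo_loss(1.0, avg_logp_chosen=-1.5, avg_logp_rejected=-0.2))
```

Because the penalty is added to the ordinary supervised loss, no frozen reference model is needed, which is the simplification the paragraph above describes.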

Eval

Task	Model	Metric	Value	Change (%)
Winogrande	Mistral-7B-Instruct-v0.2	Accuracy	73.64%	-
	Coven 7B 128K ORPO	Accuracy	77.82%	+5.67%
TruthfulQA	Mistral-7B-Instruct-v0.2	Accuracy	59.54%	-
	Coven 7B 128K ORPO	Accuracy	49.55%	-16.78%
PIQA	Mistral-7B-Instruct-v0.2	Accuracy	80.03%	-
	Coven 7B 128K ORPO	Accuracy	82.05%	+2.52%
OpenBookQA	Mistral-7B-Instruct-v0.2	Accuracy	36.00%	-
	Coven 7B 128K ORPO	Accuracy	34.60%	-3.89%
	Mistral-7B-Instruct-v0.2	Accuracy Normalized	45.20%	-
	Coven 7B 128K ORPO	Accuracy Normalized	48.00%	+6.19%
MMLU	Mistral-7B-Instruct-v0.2	Accuracy	58.79%	-
	Coven 7B 128K ORPO	Accuracy	63.00%	+7.16%
Hellaswag	Mistral-7B-Instruct-v0.2	Accuracy	66.08%	-
	Coven 7B 128K ORPO	Accuracy	65.37%	-1.08%
	Mistral-7B-Instruct-v0.2	Accuracy Normalized	83.68%	-
	Coven 7B 128K ORPO	Accuracy Normalized	84.29%	+0.73%
GSM8K (Strict)	Mistral-7B-Instruct-v0.2	Exact Match	41.55%	-
	Coven 7B 128K ORPO	Exact Match	72.18%	+73.65%
GSM8K (Flexible)	Mistral-7B-Instruct-v0.2	Exact Match	41.93%	-
	Coven 7B 128K ORPO	Exact Match	72.63%	+73.29%
BoolQ	Mistral-7B-Instruct-v0.2	Accuracy	85.29%	-
	Coven 7B 128K ORPO	Accuracy	87.43%	+2.51%
ARC Easy	Mistral-7B-Instruct-v0.2	Accuracy	81.36%	-
	Coven 7B 128K ORPO	Accuracy	85.02%	+4.50%
	Mistral-7B-Instruct-v0.2	Accuracy Normalized	76.60%	-
	Coven 7B 128K ORPO	Accuracy Normalized	82.95%	+8.29%
ARC Challenge	Mistral-7B-Instruct-v0.2	Accuracy	54.35%	-
	Coven 7B 128K ORPO	Accuracy	59.64%	+9.74%
	Mistral-7B-Instruct-v0.2	Accuracy Normalized	55.80%	-
	Coven 7B 128K ORPO	Accuracy Normalized	61.69%	+10.52%
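
The Change (%) column appears to be the relative change with respect to the Mistral-7B-Instruct-v0.2 baseline, not the difference in percentage points. This is an inference from the numbers, not something the table states. A minimal sketch of that calculation, with scores copied from the table and a helper name chosen for illustration:

```python
def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


# (baseline, Coven) scores copied from the table above.
scores = {
    "TruthfulQA": (59.54, 49.55),
    "PIQA": (80.03, 82.05),
    "MMLU": (58.79, 63.00),
}

for task, (base, coven) in scores.items():
    print(f"{task}: {relative_change(base, coven):+.2f}%")
```

Recomputing other rows from the rounded scores can differ from the table by up to roughly 0.1, presumably because the published figures were derived from unrounded scores.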

Model Details

Model name: Coven 7B 128K ORPO alpha
Fine-tuned by: raidhon
Base model: mistralai/Mistral-7B-Instruct-v0.2
Parameters: 7B
Context: 128K
Language(s): Multilingual
License: Apache 2.0

💻 Usage

# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="raidhon/coven_7b_128k_orpo_alpha", torch_dtype=torch.float16, device_map="auto")

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
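# Render the chat messages into a single prompt string using the model's chat template.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Sample up to 4096 new tokens; lower max_new_tokens for shorter replies.
outputs = pipe(prompt, max_new_tokens=4096, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
# generated_text contains the prompt followed by the model's reply.
print(outputs[0]["generated_text"])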