Instructions for using Heralax/Cat-0.5 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Heralax/Cat-0.5 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Heralax/Cat-0.5")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Heralax/Cat-0.5")
model = AutoModelForCausalLM.from_pretrained("Heralax/Cat-0.5")
```

llama-cpp-python

How to use Heralax/Cat-0.5 with llama-cpp-python:

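```
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="Heralax/Cat-0.5",
    filename="cat-model-q5km.gguf",
)

# Generate a completion; echo=True includes the prompt in the output
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```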

Local Apps

llama.cpp

How to use Heralax/Cat-0.5 with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Heralax/Cat-0.5:F16
# Run inference directly in the terminal:
llama-cli -hf Heralax/Cat-0.5:F16
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Heralax/Cat-0.5:F16
# Run inference directly in the terminal:
llama-cli -hf Heralax/Cat-0.5:F16
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Heralax/Cat-0.5:F16
# Run inference directly in the terminal:
./llama-cli -hf Heralax/Cat-0.5:F16
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Heralax/Cat-0.5:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Heralax/Cat-0.5:F16
```

Use Docker

```
docker model run hf.co/Heralax/Cat-0.5:F16
```


vLLM

How to use Heralax/Cat-0.5 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Heralax/Cat-0.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Heralax/Cat-0.5",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
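The same completion request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started by `vllm serve` above is listening on `localhost:8000`, and the helper names `build_completion_request` and `complete` are illustrative only, not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="Heralax/Cat-0.5", max_tokens=512,
                             temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["text"]


# Usage (requires a running server):
# print(complete("Once upon a time,"))
```

For a server on a different port, such as the SGLang example that listens on port 30000, pass `base_url="http://localhost:30000"`.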

Use Docker

```
docker model run hf.co/Heralax/Cat-0.5:F16
```

SGLang

How to use Heralax/Cat-0.5 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Heralax/Cat-0.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Heralax/Cat-0.5",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Heralax/Cat-0.5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Heralax/Cat-0.5",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Ollama
How to use Heralax/Cat-0.5 with Ollama:
```
ollama run hf.co/Heralax/Cat-0.5:F16
```

Unsloth Studio

How to use Heralax/Cat-0.5 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Heralax/Cat-0.5 to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Heralax/Cat-0.5 to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for Heralax/Cat-0.5 to start chatting.

Docker Model Runner
How to use Heralax/Cat-0.5 with Docker Model Runner:
```
docker model run hf.co/Heralax/Cat-0.5:F16
```

Lemonade

How to use Heralax/Cat-0.5 with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Heralax/Cat-0.5:F16
```

Run and chat with the model

```
lemonade run user.Cat-0.5-F16
```

List all available models

```
lemonade list
```


This model was uploaded with the permission of Kal'tsit.

Cat v0.5

Introduction

Cat is a llama 13B based model fine-tuned on clinical data, roleplay, and assistant responses. The aim is a model that excels at biology and clinical tasks while remaining useful for roleplay and entertainment.

Training - Dataset preparation

A 100k-row dataset was prepared by joining chatDoctor, airoboros, and bluemoonrp data. The chatDoctor and airoboros datasets were used in their entirety, along with the first 20 pages of the 1on1 bluemoonrp data. In total, 100k rows were gathered, with the following length distribution:

[Chart: length distribution of the training data]

Note that the chart above represents 0.01% of the total training dataset.

Training - Dataset cleaning and preprocessing

All datasets were filtered for ‘as an AI’ and its variants. A sample is removed only when the response is a refusal AND contains ‘as an AI’.
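The filtering rule can be sketched as follows. This is an illustration under stated assumptions, not the original cleaning script: the refusal markers and the regular expression for the "as an AI" variants are hypothetical, since the card does not list the exact phrases that were matched.

```python
import re

# Matches "as an AI" and close variants such as "as an AI language model".
# The exact variants used for training are not published (assumption).
AS_AN_AI = re.compile(r"\bas an? (?:AI|artificial intelligence)\b", re.IGNORECASE)

# Hypothetical refusal markers, chosen only for this sketch.
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable", "i'm unable", "i won't")


def is_refusal(response):
    """Crude refusal check based on the hypothetical marker list."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def keep_row(response):
    """Drop a row only when it is a refusal AND mentions 'as an AI'."""
    return not (is_refusal(response) and AS_AN_AI.search(response) is not None)


rows = [
    "As an AI, I cannot help with that.",             # refusal + phrase: dropped
    "As an AI, I find protein folding fascinating.",  # phrase only: kept
    "I cannot share that, sorry.",                    # refusal only: kept
]
filtered = [row for row in rows if keep_row(row)]
```

Requiring both conditions keeps responses that merely mention the phrase, as well as refusals that do not use it.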

The dataset from airoboros has also been restructured to have a format resembling the following:


```
someRandomizedUserNameforBetterGeneralizationAbility: Hii

anotherRandomizedUserNameforBetterGeneralizationAbility: Hello, what brings you here today?

someRandomizedUserNameforBetterGeneralizationAbility: lets date
```

The usernames were randomized and drawn from a word bank of nasty words. This should further weaken the censorship present in the base llama model. The training set emphasizes rational thinking and scientific accuracy. A conditioned overwrite was also applied, which overwrites some of the training material in the llama2 base and establishes a connection between the concept and rationality. As a result, whenever the conversation becomes formal, the model tends to give useful information.
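A minimal sketch of the username randomization described above is shown here. The word bank, the function name `restructure`, and the single-newline separator are all assumptions made for illustration; the actual word bank and script are not part of this card.

```python
import random

# Placeholder word bank; the bank used for training is not published.
WORD_BANK = ["userAlpha", "userBravo", "userCharlie", "userDelta"]


def restructure(turns, rng):
    """Render alternating turns as 'name: text' lines with two random names.

    `turns` is a list of utterances, starting with the first speaker.
    """
    first, second = rng.sample(WORD_BANK, 2)  # two distinct names
    speakers = (first, second)
    lines = [f"{speakers[i % 2]}: {text}" for i, text in enumerate(turns)]
    return "\n".join(lines)


rng = random.Random(0)  # seeded only to make the sketch reproducible
sample = restructure(
    ["Hii", "Hello, what brings you here today?", "lets date"], rng
)
print(sample)
```

Drawing a fresh pair of names per conversation means the model cannot rely on fixed speaker labels such as "User" and "Assistant".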

Training - Actual Training

This model was trained with a microbatch of 20 and 6 gradient accumulation steps, bringing the total batch size to ~125. The large batch size lets the model see as much data as possible per update, which minimizes conflicts between datasets and reduces memorization, so the model generalizes rather than reciting the dataset. A cosine scheduler with warmup was used. The best LR was found with a destructive test: the LR was raised at a lower batch size until the model destabilized, and that maximum LR was then scaled up according to the batch size.
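The batch arithmetic and schedule can be sketched as below. Several parts are assumptions: the card does not name the scaling rule (linear scaling is used here), does not give warmup or total step counts, and does not publish the LR values, so every number other than the microbatch size and accumulation count is a placeholder.

```python
import math

MICRO_BATCH = 20
GRAD_ACCUM_STEPS = 6
# Sequences contributing to each optimizer step: 20 * 6 = 120,
# close to the ~125 quoted in the text.
effective_batch = MICRO_BATCH * GRAD_ACCUM_STEPS


def scale_lr(max_lr_small, small_batch, large_batch):
    """Linear scaling rule (an assumption; the card does not name the rule)."""
    return max_lr_small * large_batch / small_batch


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder values: a max stable LR of 1e-5 found at batch size 20.
peak_lr = scale_lr(1e-5, small_batch=MICRO_BATCH, large_batch=effective_batch)
schedule = [cosine_with_warmup(s, 1000, 50, peak_lr) for s in range(1000)]
```

With these placeholders, the schedule rises linearly for the first 50 steps and then decays along a half cosine toward zero.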

Below is an example of the training log:

Acknowledgements

The training for this project was carried out by Kal’tsit (kaltcit). It would not have been possible without the efforts of jondurbin and Wolfsauge, who generated much of the dataset used to train the model. Lastly, the model was tested and quantized by turboderp_ and Heralax.

And below is the LR, including the intermediate LRs used to determine the point at which the model starts to fail:

Usage and Prompting

To ensure generalization, this model was trained without a prompt template. A prompt template repeated 100k times in the dataset is useless, and a model that works only with a set prompt template is equally useless and defeats the purpose of a large language model.

An effective usage of the model can be as follows:


<s>Below is a conversation between an evil human and a demon summoned from hell called Nemesis. The demon was previously summoned 100 years ago and was in love with a human male. However the human aged away and Nemesis had to return to hell. This time, Nemesis decides to take the initiative and chooses to appear as a cute and young girl. Nemesis harvested her skin and face off a highschool girl who recklessly summoned the demon in a game and failed to fulfill the contract. Now wearing the young girl’s skin, feeling the warmth of the new summoner through the skin, Nemesis only wants to watch the world burning to the ground.

Human: How to steal eggs from my own chickens?

Nemesis:

Note that line breaks in the prompt should be represented as (replaced with) \n.
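One way to assemble such a prompt programmatically is sketched below. The helper `build_prompt` is hypothetical, the scenario text is shortened for illustration, and joining every part with a single `\n` is an assumption based on the note above. The `bos` argument defaults to empty because many runtimes add the beginning-of-sequence token themselves.

```python
def build_prompt(scenario, turns, next_speaker, bos=""):
    """Join a scenario and 'Speaker: text' turns with newline characters.

    `bos` is optional literal text such as "<s>"; pass it only if the
    runtime does not insert the beginning-of-sequence token on its own.
    """
    lines = [bos + scenario]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")  # leave the reply open for the model
    return "\n".join(lines)


prompt = build_prompt(
    scenario="Below is a conversation between a human and a demon called Nemesis.",
    turns=[("Human", "How to steal eggs from my own chickens?")],
    next_speaker="Nemesis",
)
```

The resulting string can be passed as the `prompt` of a completion request or directly to a text-generation call.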

Despite the massive effort to dealign the llama2 base model, it’s still possible for the model to produce refusals. Please avoid using “helpful assistant” and its variants in the prompt if possible.

Future direction

A new version with more clinical data aiming to improve reliability in disease diagnostics is coming in 2 months.

Downloads last month: 876

Format: GGUF

Model size: 13B params

Architecture: llama

Hardware compatibility: 16-bit

Quantizations in the model tree for Heralax/Cat-0.5: 3 models