This model was uploaded with the permission of Kal'tsit.
Cat v0.5
Introduction
Cat is a Llama 13B-based model fine-tuned on clinical data, roleplay, and assistant responses. The aim is a model that excels at biology and clinical tasks while remaining useful for roleplay and entertainment.
Training - Dataset preparation
A 100k-row dataset was prepared by joining the chatDoctor, airoboros and bluemoonrp data. The chatDoctor and airoboros datasets were used in their entirety; from the 1on1 bluemoonrp data, the first 20 pages were used. In total, 100k rows were gathered, and their length distribution is as follows:
Note that the chart above represents 0.01% of the total training dataset.
Training - Dataset cleaning and preprocessing
All datasets are filtered for the phrase ‘as an AI’ and its variants. A row is removed only when the response is a refusal AND contains ‘as an AI’.
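The filtering rule above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for training: the regular expression, the list of refusal markers, and the `response` field name are all assumptions made for the example.

```python
import re

# Assumed pattern for 'as an AI' and close variants (hypothetical, not the original filter).
AS_AN_AI = re.compile(r"\bas an? (?:ai|artificial intelligence)\b", re.IGNORECASE)

# Assumed refusal markers (hypothetical word list).
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable", "i'm unable", "i am not able")


def is_filtered(response: str) -> bool:
    """Return True only when the response is a refusal AND contains 'as an AI'."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    has_phrase = AS_AN_AI.search(text) is not None
    is_refusal = any(marker in text for marker in REFUSAL_MARKERS)
    return has_phrase and is_refusal


def filter_rows(rows):
    """Keep every row whose response does not trip both conditions."""
    return [row for row in rows if not is_filtered(row["response"])]
```

Requiring both conditions means a response that merely mentions being an AI, or that says "I cannot" in an ordinary sentence, stays in the dataset.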
The dataset from airoboros has also been restructured to have a format resembling the following:
someRandomizedUserNameforBetterGeneralizationAbility: Hii
anotherRandomizedUserNameforBetterGeneralizationAbility: Hello, what brings you here today?
someRandomizedUserNameforBetterGeneralizationAbility: lets date
The usernames are randomized and drawn from a nasty word bank. This should further weaken the censorship that’s present in the base llama model. The training set emphasizes rational thinking and scientific accuracy. A conditioned overwrite was also applied, which overwrites some of the training material in the llama2 base and establishes a connection between the concept and rationality. As a result, whenever the conversation becomes formal, the model tends to volunteer useful information.
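The restructuring of the airoboros data into the speaker-prefixed format shown above might look like the sketch below. The word bank here is a harmless placeholder, and the `(role, text)` turn structure and the function name are assumptions for illustration; the actual preprocessing code is not part of this card.

```python
import random

# Placeholder word bank (hypothetical); the real one is not published here.
WORD_BANK = ["alpha", "bravo", "charlie", "delta", "echo"]


def restructure(turns, rng):
    """Render (role, text) turns as 'name: text' lines with random speaker names.

    One name is drawn per role for each conversation, so the same speaker keeps
    the same name across turns while names differ between conversations.
    """
    user_name, assistant_name = rng.sample(WORD_BANK, 2)
    names = {"user": user_name, "assistant": assistant_name}
    return "\n".join(f"{names[role]}: {text}" for role, text in turns)


conversation = [
    ("user", "Hii"),
    ("assistant", "Hello, what brings you here today?"),
    ("user", "lets date"),
]
print(restructure(conversation, random.Random(0)))
```

Drawing fresh names for every conversation keeps the model from attaching any meaning to a fixed speaker label.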
Training - Actual Training
This model was trained using a microbatch of 20 with 6 gradient accumulation steps, bringing the total batch size to about 120 (20 × 6). This large batch size lets the model see as much data as it can per update, minimizing dataset conflicts and reducing the memorization effect, so the model generalizes better rather than reciting the dataset. A cosine scheduler with warm-up was used. The best LR was determined through a destructive test, raising the LR until the model destabilizes; the maximum LR found at a lower batch size was then scaled up according to the batch size.
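The batch size arithmetic and the learning-rate schedule described above can be sketched in a few lines. The linear scaling rule, the warm-up length, and the function names are assumptions for illustration; the card does not state which scaling rule or warm-up length was actually used.

```python
import math


def effective_batch_size(micro_batch, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step."""
    return micro_batch * accumulation_steps * num_devices


def scale_lr_linear(base_lr, base_batch, target_batch):
    """Scale a max LR found at a small batch size (linear rule is an assumption)."""
    return base_lr * target_batch / base_batch


def cosine_with_warmup(step, total_steps, warmup_steps, max_lr, min_lr=0.0):
    """Linear warm-up to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(micro_batch=20, accumulation_steps=6))
```

With a microbatch of 20 and 6 accumulation steps, `effective_batch_size` gives 120 on a single device.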
Below is an example of the training chronolog:
Acknowledgements
The training of this project was carried out by Kal’tsit (kaltcit). It would not have been possible without the effort of jondurbin and Wolfsauge, who generated much of the dataset used to train the model. Lastly, the model was tested and quantized by turboderp_ and Heralax.
Below is the LR, including any intermediate LR used to determine at what point the model starts to fail:
Usage and Prompting
To ensure generalization, this model was trained without a prompt template. A prompt template repeated 100k times in the dataset is useless, and a model that works only with a set prompt template defeats the purpose of a large language model.
An effective usage of the model can be as follows:
<s>Below is a conversation between an evil human and a demon summoned from hell called Nemesis. The demon was previously summoned 100 years ago and was in love with a human male. However the human aged away and Nemesis had to return to hell. This time, Nemesis decides to take the initiative and chooses to appear as a cute and young girl. Nemesis harvested her skin and face off a highschool girl who recklessly summoned the demon in a game and failed to fulfill the contract. Now wearing the young girl’s skin, feeling the warmth of the new summoner through the skin, Nemesis only wants to watch the world burning to the ground.
Human: How to steal eggs from my own chickens?
Nemesis:
Note that the line breaks should be represented as (replaced with) \n.
Despite the massive effort to dealign the llama2 base model, it’s still possible for the AI to come up with refusals. Please avoid using “helpful assistant” and its variants in the prompt if possible.
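As a rough illustration of the prompting style above, the helper below assembles a scenario description, the conversation so far, and an open speaker label into a single string joined with \n. The function name and arguments are hypothetical. Many runtimes insert the beginning-of-sequence token on their own, so passing `bos=""` may be appropriate in that case.

```python
def build_prompt(scenario, turns, next_speaker, bos="<s>"):
    """Join a scenario, prior turns, and an open speaker label with newlines.

    scenario: free-form description of the conversation (no fixed template).
    turns: list of (speaker, text) pairs already said.
    next_speaker: the name whose reply the model should generate.
    """
    lines = [f"{bos}{scenario}"]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


prompt = build_prompt(
    "Below is a conversation between a human and a demon called Nemesis.",
    [("Human", "How to steal eggs from my own chickens?")],
    "Nemesis",
)
print(repr(prompt))
```

Ending the prompt with the bare speaker label leaves the model to continue in that character's voice.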
Future direction
A new version with more clinical data aiming to improve reliability in disease diagnostics is coming in 2 months.