Used quants, but model is not recognized to support tools, though it does
I don't know details. I have used iquants from mrradermacher
Maybe some settings are needed.
Hello JLouisBiz, thank you for contacting us!
It's likely that mrradermacher's quantized versions aren't enabling tool use for this model.
Which platform are you using? Ollama, LM Studio? If you tell us, we can try to help you!
Anyway, if you're going to use quantization, I recommend LM Studio, as it's efficient, supports Hugging Face GGUFs, and already has integrated tools!
GRM2 has been optimized for tool use, being one of the most efficient models for tools up to 3B parameters.
That's because the jinja template in this repo is missing, so I assume that when it got quant'd, the GGUF ended up without a template.
If you're using LMStudio, you can manually set a ChatML template (you can download and copy-paste my template) in the model's settings.
If you're using llama.cpp add "--chat-template ChatML" (or "--chat-template-file [path_to_my_template_file]") to your command-line argument for llama-server.
If you're using ollama, use something else :D
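For reference, a minimal ChatML Jinja template looks roughly like this (a sketch assuming the standard `<|im_start|>`/`<|im_end|>` tokens; the exact tool-calling markup GRM2 was trained on is not confirmed here, so it is omitted):

```jinja
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```

Saved to a file, this can be passed to llama-server via `--chat-template-file [path]`. Note that llama.cpp's built-in preset names are lowercase, so the preset form is `--chat-template chatml`.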
I am using llama-server from llama.cpp, and for 100+ models I haven't needed the chat template flag, but let me try.
/usr/local/bin/llama-server --reasoning-format none --reasoning-budget 0 --jinja -fa on -c 131072 -v --log-timestamps --host 192.168.1.68 --threads 2 --threads-batch 2 --threads-http 4 --batch-size 4096 --ubatch-size 1024 --mlock --mmap --no-warmup --cont-batching -m /mnt/nvme0n1/LLM/quantized/GRM2-3b.i1-Q6_K.gguf --chat-template ChatML
With that, it just goes wild; nothing useful happens. Why don't you make GGUF files that work?
Hi JLouisBiz
The GGUFs you are using are not official Orion LLM Labs files, and we do not create GGUFs for our models.
Likely not a GGUF problem. The issue is probably the prompt format in llama.cpp: forcing --chat-template ChatML can break generation if the model wasn’t trained for ChatML, and llama.cpp expects chatml lowercase anyway. If you don’t use chat templates, remove that flag and use raw /v1/completions. If you want chat mode, check /props and use the model’s actual template instead of forcing ChatML.
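The raw /v1/completions route mentioned above means formatting the prompt client-side instead of relying on the server's template. A minimal sketch of ChatML formatting in Python (whether GRM2 expects exactly these tokens is an assumption based on the thread; the helper name is hypothetical):

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into a ChatML prompt string."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|>
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The resulting string can be sent as the `prompt` field of a raw completion request, bypassing any server-side template entirely.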
@DedeProGames What do you mean it's not trained for ChatML?! It has the special tokens for it: <|im_start|>, <|im_end|>, and so on. And you even ship a ChatML Jinja template in your own "tokenizer_config_search.json" file, except it's missing the tool-calling part of the template... And it's in a file no backend will look for a template in, but whatever.
Do you even know how your own model works?
@JLouisBiz Yeah, that's because normally authors don't self-sabotage their own releases. Just use my file, it works. But given the quality of their support and release, I'd just skip this one if I were you.
I thought @SerialKicked would know what to do, so I followed that advice. I was using i-quants from @MrRad; now I downloaded his normal quant, and I am getting a functional GRM model.
Just that the output on long text was repetitive: too many repeated sections, worded differently but still repetitive. I tried Qwen3.5 4B on the same text and got coherent output.
I cannot use this model at this stage; I may try again in the future. Thanks very much.