How to turn off thinking mode

#86
by Gierry - opened

I know there are three thinking modes, but in some scenarios even the low mode is still too slow for my use case.
Is there a way to disable thinking entirely, like Qwen's no_think mode?

How do you turn on the thinking mode?

There is no way to turn off reasoning; however, you can control the amount of effort by specifying the reasoning effort - it can be either low, medium, or high.
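
For a concrete starting point, here is a minimal sketch of steering the effort through the system prompt. It assumes the openai/gpt-oss-20b checkpoint and a recent Transformers release; the exact wording the chat template expects may vary between template versions:

```python
# Minimal sketch: select the reasoning effort via the system prompt.
# Assumes the openai/gpt-oss-20b checkpoint; "Reasoning: low|medium|high"
# follows the phrasing used in the model card.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    # "low" keeps the analysis channel short; "medium"/"high" trade latency
    # for more deliberate reasoning.
    {"role": "system", "content": "Reasoning: low"},
    {"role": "user", "content": "Summarise this ticket in one sentence."},
]

out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```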

A tricky workaround to disable reasoning mode is to edit the chat_template.jinja file, changing the generation prompt that is appended when add_generation_prompt is set, from:
"<|start|>assistant"
to:
"<|start|>assistant<|channel|>analysis<|message|><|end|><|start|>assistant"

That's a hit!

How does it impact the model's performance?

Yeah, that’s the hacky way to fully kill reasoning mode in GPT-OSS. But just keep in mind, in their paper they mention the model was post-trained with CoT-RL, which means it always uses reasoning (with variable effort: low/med/high). So turning it off like this isn’t really how the model was designed to work.

I haven’t benchmarked it myself, but I think it will decrease performance a lot — not just on reasoning-heavy stuff (math, coding, logic), but even on simpler tasks, since the model was never trained to operate without reasoning.

I have uploaded that to my profile, fully ready to install!

I'd appreciate it if you want to give it a try 😊

How does that workaround impact the model's performance?

I think it will just show the final message/response from the model and hide the reasoning, so I guess it won't affect performance.

Hi. I'm happy to join this thread.
I fine-tuned GPT-OSS 20B with a technique that helps maintain its native reasoning capability. It worked.

In my case, I need to process 570k prompts and, with native thinking enabled, it takes about 147 s (mean) to complete a generation (2× NVIDIA H100, MXFP4 quantization enabled, batched inference with batch size 8).
Timings drop drastically to ~30 s when I apply @anhnmt's workaround. Then I looked at the outputs. I did not set up any quantitative benchmark, but I saw consistent performance degradation.

In the fine-tuning scenario, an idea could be to train the model while also computing the loss on the reasoning tokens (which is done by default).
With consistent training, this would lead to direct generation and removal of the thinking step (a rough sketch follows after the list below). I see a caveat: in this hypothetical scenario, the LoRA adapter (assuming training follows a PEFT approach) would be trained to accomplish two things:

  • Produce the expected output, useful in my scenario.
  • Forget the capability to reason.

BUT! Since the base model weights are frozen, the LoRA adapter would suffer from learning two things together instead of focusing on the main task only.
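
As a rough sketch of what I mean (hypothetical names and hyperparameters, not my actual training config), the default SFT setup keeps the reasoning tokens in the labels, so the adapter is optimised on both the analysis and the final channel:

```python
# Rough PEFT/LoRA sketch; model id, LoRA rank, and target modules are
# assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "openai/gpt-oss-20b"  # assumed base checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Only the adapter is trainable; the frozen base weights are why the adapter
# would have to absorb both "solve the task" and "stop reasoning" at once.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

def build_example(rendered_conversation: str):
    # Default behaviour: labels == input_ids, so the loss also covers the
    # reasoning tokens. Masking them to -100 instead would be the
    # alternative choice discussed above.
    ids = tok(rendered_conversation, return_tensors="pt")["input_ids"]
    return {"input_ids": ids, "labels": ids.clone()}
```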

TL;DR: I decided to accept longer processing times, prioritising quality over timing.

I would like to hear more on this topic. I'm interested in finding the sweet spot between effectiveness and efficiency.

Hi,

Just a quick note on my latest post here. I set up a proper benchmark to quantify the performance of my fine-tuned GPT-OSS 20B with and without thinking.
Surprisingly, I was wrong: I didn't notice any degradation. Indeed, my model kept the same performance.

This is probably due to the fine-tuning setup. I don't know whether these considerations still hold in a base-model inference scenario.
Anyway, I thought it was a good idea to share this insight.
