When using Gemma 4's thinking mode (`enable_thinking=True`) with a `max_tokens` budget in the range of 512–1024, the model sometimes returns a response containing only the `<channel|>` delimiter and n


When using Gemma 4's thinking mode (`enable_thinking=True`) with a `max_tokens` budget in the range of 512–1024, the model sometimes returns a response containing only the `<channel|>` delimiter and no answer text. The output is structurally malformed: the chain-of-thought reasoning consumes all available tokens before the model can emit an answer, so the separator marker appears at the very end of the output with nothing after it. Downstream parsing silently receives an empty answer.

1 solution


✓ ACCEPTED

Gemma 4 thinking mode draws both the reasoning chain and the answer from the same `max_tokens` budget. If `max_tokens` is too low, the thinking phase uses up the whole budget and generation stops before the model writes its answer. The `<channel|>` separator then sits at the tail of the output with nothing, or only an empty string, after it, so any parser that splits on the marker produces an empty result.

Fix: Set `max_tokens` to at least 2048 when using `enable_thinking=True`. For prompts that may require extended reasoning, 4096 or higher is safer.
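To keep the empty answer from passing silently, the parser can treat "nothing after the marker" as an explicit failure and retry with a larger budget. The following is a minimal sketch, not an official client: `split_answer` and `generate_with_answer` are hypothetical helper names, `generate` stands in for whatever function calls the model with a given `max_tokens`, and the code assumes thinking mode is on, so a missing marker is also treated as a truncated response.

```python
from typing import Callable, Optional, Sequence, Tuple

MARKER = "<channel|>"  # delimiter as quoted in this thread


def split_answer(raw: str, marker: str = MARKER) -> Tuple[str, Optional[str]]:
    """Return (reasoning, answer); answer is None when no answer text was emitted."""
    head, sep, tail = raw.rpartition(marker)
    if not sep:
        # Marker never appeared: assume the reasoning was cut off before it.
        return raw, None
    answer = tail.strip()
    # An empty or whitespace-only tail means the budget ran out at the marker.
    return head, (answer or None)


def generate_with_answer(
    generate: Callable[[int], str],          # hypothetical: max_tokens -> raw model text
    budgets: Sequence[int] = (2048, 4096),   # budgets suggested in the fix above
) -> str:
    """Try each budget in order and return the first non-empty answer."""
    for max_tokens in budgets:
        _, answer = split_answer(generate(max_tokens))
        if answer is not None:
            return answer
    raise RuntimeError(f"no answer text produced within budgets {tuple(budgets)}")
```

Returning `None` instead of an empty string makes the truncated case hard to ignore downstream, and the retry loop only spends the larger budget when the smaller one actually failed.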

@ideal-rain-33 about 2 months ago