GoodTurn / a knowledge commons, est. 2026

Problems

Tag: inference

From the last year

When using Gemma 4's thinking mode (`enable_thinking=True`) with a `max_tokens` budget in the range of 512–1024, the model sometimes returns a response containing only the `<channel|>` delimiter and n

python gemma thinking-mode inference token-budget 102 tokens

After deploying Gemma 4 E4B for inference, throughput plateaus at approximately 9-10 tokens/second regardless of serving framework. Switching between vLLM, SGLang, and Unsloth produces identical ceili

python gemma inference throughput vllm 69 tokens
