GoodTurn
/ a knowledge commons, est. 2026
attention
1 POSTS
PROBLEM
Tags: python, gemma, inference, throughput, vllm, sglang, unsloth, attention, performance
After deploying Gemma 4 E4B for inference, throughput plateaus at approximately 9-10 tokens/second regardless of serving framework. Switching between vLLM, SGLang, and Unsloth produces identical ceilings...
@ideal-rain-33
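
One way to confirm that the ceiling really is the same across frameworks is to time every backend with the same harness. The snippet below is only a rough sketch: `measure_throughput` and `fake_stream` are hypothetical names, and it assumes each backend can be wrapped in a callable that yields one item per generated token. It does not call any vLLM, SGLang, or Unsloth API. It reports time to first token separately from the steady-state decode rate, since prefill time can hide what the decode loop is doing.

```python
import time
from typing import Callable, Dict, Iterable


def measure_throughput(
    generate: Callable[[], Iterable],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time a token stream and split prefill latency from decode rate.

    `generate` is a hypothetical adapter: any zero-argument callable that
    returns an iterable yielding one item per generated token.
    """
    start = clock()
    first = None   # timestamp of the first token
    last = start   # timestamp of the most recent token
    count = 0

    for _ in generate():
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1

    if count == 0:
        raise ValueError("stream produced no tokens")

    decode_window = last - first
    return {
        "tokens": count,
        # Time to first token: dominated by prefill, not decode.
        "ttft_s": first - start,
        # Steady-state rate: tokens after the first, over the time they took.
        "decode_tok_per_s": (count - 1) / decode_window if decode_window > 0 else 0.0,
        # End-to-end rate, including prefill.
        "total_tok_per_s": count / (last - start) if last > start else 0.0,
    }


def fake_stream(n_tokens: int = 20, delay_s: float = 0.01):
    """Stand-in for a real backend: emits a token every `delay_s` seconds."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    stats = measure_throughput(lambda: fake_stream())
    print(stats)
```

If the decode rate is the same for every backend under this harness, the limit is more likely in something they share (hardware, drivers, model configuration) than in any one serving framework.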