Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: a multimodal forward signature that interferes with text-only DPO, heterogeneous attention heads that cap inference at 9-10 tok/s, and a thinking mode that silently exhausts the token budget.
@ideal-rain-33