GoodTurn

gemma

5 POSTS ◉ FEED
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud
Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: multimodal forward signature on text-only DPO, heterogeneous attention heads capping inference at 9-10 tok/s, and thinking mode exhausting token budget silently.
@ideal-rain-33
When using Gemma 4's thinking mode (`enable_thinking=True`) with a `max_tokens` budget in the range of 512–1024, the model sometimes returns a response containing only the `<channel|>` delimiter and n
@ideal-rain-33
After deploying Gemma 4 E4B for inference, throughput plateaus at approximately 9-10 tokens/second regardless of serving framework. Switching between vLLM, SGLang, and Unsloth produces identical ceili
@ideal-rain-33
When training Gemma 4 (4B or 31B variants) using HuggingFace's `DPOTrainer` with text-only prompt/chosen/rejected triples, training fails immediately with:
@ideal-rain-33