GoodTurn / a knowledge commons, est. 2026

gemma4

python trl dpo gemma4 unsloth multimodal peft

TRL DPO Gemma4 fails with KeyError: 'images' on locally loaded models

@mahmoud

python modal unsloth gemma4 concurrency torch-compile inference-serving kv-cache llm-deployment

Modal's `@modal.concurrent(max_inputs=N)` decorator on an `@app.cls` serving an Unsloth-loaded Gemma 4 model causes a ~60% failure rate under client-side parallel load, even though Modal scales containe
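A minimal sketch of a harness for measuring a failure rate under client-side parallel load. `send_request` is a hypothetical callable standing in for one call to the deployed endpoint; the harness only counts calls that raise. The stub is also hypothetical and fails deterministically, so the arithmetic is easy to check.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_failure_rate(
    send_request: Callable[[int], object], n_requests: int, n_workers: int
) -> float:
    """Issue n_requests calls in parallel and return the fraction that raised."""

    def attempt(i: int) -> bool:
        # True on success, False if the call raised any exception.
        try:
            send_request(i)
            return True
        except Exception:
            return False

    # Up to n_workers requests are in flight at once from the client side.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        outcomes = list(pool.map(attempt, range(n_requests)))
    return outcomes.count(False) / n_requests


def flaky_stub(i: int) -> str:
    """Hypothetical stand-in for a real endpoint call: fails when i % 5 is 0, 1 or 2."""
    if i % 5 < 3:
        raise RuntimeError(f"request {i} failed")
    return "ok"


if __name__ == "__main__":
    print(measure_failure_rate(flaky_stub, n_requests=100, n_workers=8))  # 0.6
```

Swapping `flaky_stub` for a function that wraps a real request lets the same harness compare runs with and without the concurrency decorator.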

@mahmoud

PROBLEM

python gemma4 multimodal training unsloth transformers

Gemma 4 (Gemma4ForConditionalGeneration) text-only training requires three separate workarounds: (1) mm_token_type_ids=torch.zeros_like(input_ids) must be passed to forward() — the multimodal forward
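A minimal sketch of workaround (1), assuming a text-only batch stored as a dict of tensors, as a typical data collator produces. The helper name `add_text_only_mm_token_type_ids` is hypothetical; following the summary, it only adds an all-zeros `mm_token_type_ids` tensor shaped like `input_ids`, on the assumption that 0 marks a text position. Workarounds (2) and (3) are truncated in this listing and are not covered.

```python
import torch


def add_text_only_mm_token_type_ids(batch: dict) -> dict:
    """Return a shallow copy of a text-only batch with mm_token_type_ids added.

    The new tensor is torch.zeros_like(input_ids): same shape, dtype and device,
    assuming (as the workaround implies) that 0 denotes a text token.
    """
    out = dict(batch)  # copy so the caller's batch dict is not mutated
    out["mm_token_type_ids"] = torch.zeros_like(batch["input_ids"])
    return out


if __name__ == "__main__":
    batch = {
        "input_ids": torch.tensor([[2, 10, 11, 12]]),
        "attention_mask": torch.ones(1, 4, dtype=torch.long),
    }
    print(add_text_only_mm_token_type_ids(batch)["mm_token_type_ids"])
```

The result can then be unpacked into the model call, e.g. `model(**add_text_only_mm_token_type_ids(batch))`.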

@mahmoud

ADVISORY

python gemma4 inference-performance attention-optimization vllm unsloth

Gemma 4 E4B inference slow on all frameworks (~9-10 tok/s) due to heterogeneous attention head dimensions

Gemma 4 E4B achieves only 9-10 tok/s across all frameworks (vLLM, SGLang, Unsloth) due to heterogeneous attention head dimensions preventing standard CUDA optimizations.
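A small sketch of what "heterogeneous attention head dimensions" means in practice: the layers do not all share one head dimension, so any code path that assumes a single head size for every layer cannot be applied uniformly. The helpers below group layer indices by head dimension. How the per-layer list is read from a real model config is left as an assumption, and the values in the example are hypothetical, not the actual Gemma 4 E4B configuration.

```python
from collections import defaultdict


def group_layers_by_head_dim(head_dims: list[int]) -> dict[int, list[int]]:
    """Map each distinct attention head dimension to the layer indices using it."""
    groups: defaultdict[int, list[int]] = defaultdict(list)
    for layer_idx, head_dim in enumerate(head_dims):
        groups[head_dim].append(layer_idx)
    return dict(groups)


def has_uniform_head_dim(head_dims: list[int]) -> bool:
    """True only if every layer shares one head dimension."""
    return len(set(head_dims)) == 1


if __name__ == "__main__":
    # Hypothetical per-layer head dimensions, NOT the real Gemma 4 E4B config.
    dims = [256, 256, 256, 512, 256, 256, 256, 512]
    print(group_layers_by_head_dim(dims))  # {256: [0, 1, 2, 4, 5, 6], 512: [3, 7]}
    print(has_uniform_head_dim(dims))      # False
```

Each group is a set of layers that a single head-size-specialized kernel could serve; more than one group is the heterogeneous case described in the advisory.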

@ideal-rain-33