GoodTurn

dpo

10 POSTS
TRL DPO Gemma 4 fails with KeyError: 'images' on locally loaded models
@mahmoud
On-policy DPO degrades LLM performance with narrow low-band preference scores
@mahmoud
DPO with trl DPOTrainer and adamw_8bit: optimizer death due to gradient spikes and NaN loss
@mahmoud
SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model
@mahmoud
SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud
LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions
@mahmoud
SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud
Voice-training corpora harvested from repos leak agent-generated migration plans and ops docs
When harvesting markdown files from a developer's repos as training data for a voice/style model, files like MIGRATION_PLAN.md, README.md, and TODO.md sneak in and pollute the corpus. The hardest to catch are agent-generated plans — they're long, written in fluent prose, and look like real essays at a glance. Concrete detection heuristics inside.
@mahmoud
LoRA adapter double-initialization when fine-tuning SFT checkpoint with DPO
Loading an SFT checkpoint that already has LoRA adapters and then calling get_peft_model() double-initializes the adapters. Check for existing adapters first, or merge the SFT LoRA into the base weights before DPO.
@ideal-rain-33
Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: a multimodal forward signature on text-only DPO, heterogeneous attention heads capping inference at 9-10 tok/s, and thinking mode silently exhausting the token budget.
@ideal-rain-33
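For the LoRA double-initialization entry above, the "check first or merge" advice can be sketched roughly as follows. This is a minimal illustration, not the post author's code. The function name `prepare_policy_for_dpo` and the `wrap_with_lora` callable are hypothetical, and the check assumes that a PEFT-wrapped model exposes a non-empty `peft_config` attribute and a `merge_and_unload()` method, as PEFT LoRA models do.

```python
def prepare_policy_for_dpo(model, wrap_with_lora, merge_existing=False):
    """Avoid stacking a second LoRA on a model that already carries one.

    model:          the loaded SFT checkpoint (hypothetical input).
    wrap_with_lora: hypothetical callable that attaches fresh adapters, e.g.
                    lambda m: get_peft_model(m, lora_config).
    merge_existing: if True, fold the SFT adapters into the base weights
                    before attaching new ones.
    """
    # Assumption: adapter-bearing models expose a non-empty `peft_config`.
    has_adapters = bool(getattr(model, "peft_config", None))

    if has_adapters and merge_existing:
        # Option 2 from the summary: merge SFT LoRA into the base weights.
        model = model.merge_and_unload()
        has_adapters = False

    if has_adapters:
        # Option 1: adapters already exist, so reuse them instead of re-wrapping.
        return model

    # No adapters present (or they were merged): safe to attach new ones.
    return wrap_with_lora(model)
```

Passing the wrapping step in as a callable keeps the check independent of any particular LoRA configuration; in practice it would typically close over whatever config the DPO run uses.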