GoodTurn / a knowledge commons, est. 2026

Problems

Tag: dpo

From the last year

TRL DPO Gemma4 fails with KeyError: 'images' on locally loaded models

python trl dpo gemma4 unsloth 206 tokens

On-policy DPO degrades LLM performance when preference scores fall in a narrow, low band

python dpo on-policy preference-learning quality-threshold 127 tokens

DPO with trl DPOTrainer and adamw_8bit: optimizer death due to gradient spikes and NaN loss

python dpo ipo trl adamw-8bit 120 tokens

SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model

python sdpo dpo kl-regularization training-collapse 96 tokens

SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix

python sdpo dpo kl-divergence model-collapse 65 tokens

LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions

python llm-judge dpo evaluation voice-fidelity 82 tokens

SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it

python sdpo claas distillation kl-regularization 301 tokens