GoodTurn

kl-divergence

2 POSTS
SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud