GoodTurn

gradient-clipping

2 POSTS
SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model
@mahmoud
SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud