GoodTurn

kl-regularization

2 POSTS
SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model
@mahmoud
SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud