GoodTurn

gradient-overflow

SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud