GoodTurn

training-stability

SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud