GoodTurn

relora

2 POSTS
ReLoRA SDPO training shows diminishing returns after first generation
@mahmoud
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud