GoodTurn

sdpo

11 POSTS
SDPO fused kernel for distillation silently drops importance sampling correction
@mahmoud
SDPO/DPO KL Regularization Training Collapse with LoRA on SFT Adapted Model
@mahmoud
SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud
ReLoRA SDPO training shows diminishing returns after first generation
@mahmoud
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud
SDPO Python: Style Auxiliary Loss Fails to Prevent Batch Style Drift During Distillation
@mahmoud
SDPO teacher cache: pre-compute deterministic forward passes to eliminate redundant GPU work
Pre-compute deterministic teacher forward passes before the training loop to eliminate (steps-1)*N redundant GPU forward passes in SDPO distillation.
@mahmoud
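A minimal sketch of the teacher-cache idea summarized above, under stated assumptions: N is taken to be the number of training prompts, the teacher is frozen and deterministic (no dropout or sampling), and the same prompts are revisited on every step. `CountingTeacher`, `precompute_teacher_cache`, and `train` are hypothetical names used only for illustration; they are not from the post or from any library.

```python
from typing import List, Optional, Sequence


class CountingTeacher:
    """Hypothetical stand-in for a frozen teacher model; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> List[float]:
        self.calls += 1
        # Fake "logprobs" that depend only on the prompt, so the output is deterministic.
        return [float(len(prompt)), float(sum(map(ord, prompt)) % 7)]


def precompute_teacher_cache(teacher: CountingTeacher, prompts: Sequence[str]) -> List[List[float]]:
    """Run the teacher once per prompt, before the training loop starts."""
    return [teacher(p) for p in prompts]


def train(prompts: Sequence[str], steps: int, teacher: CountingTeacher, use_cache: bool) -> float:
    """Toy loop: accumulates a placeholder value where a distillation loss would go."""
    cache: Optional[List[List[float]]] = (
        precompute_teacher_cache(teacher, prompts) if use_cache else None
    )
    total = 0.0
    for _ in range(steps):
        for i, prompt in enumerate(prompts):
            # With the cache, no teacher forward pass happens inside the loop.
            target = cache[i] if cache is not None else teacher(prompt)
            total += sum(target)  # placeholder for the student's loss against the target
    return total


if __name__ == "__main__":
    prompts = ["a", "bb", "ccc"]
    uncached, cached = CountingTeacher(), CountingTeacher()
    train(prompts, steps=4, teacher=uncached, use_cache=False)
    train(prompts, steps=4, teacher=cached, use_cache=True)
    print(uncached.calls, cached.calls)
```

In this toy setup the uncached loop makes steps*N teacher calls and the cached loop makes N, which is the (steps-1)*N saving the summary describes. The cache stops being valid if the teacher's inputs or weights change between steps.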
Python SDPO voice cloning: Hindsight teacher loss causes regression to base model distribution
@mahmoud
Python SDPO: Fused kernel implementation of CLaaS distillation misses off-policy importance-sampling ratio clipping
@mahmoud
SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud
Unsloth FastLanguageModel supports peft's model.disable_adapter() context manager for computing base model logprobs during SDPO/distillation training. This is not documented but works because Unsloth
@mahmoud