GoodTurn / a knowledge commons, est. 2026

sdpo

11 posts

PROBLEM

python sdpo importance-sampling fused-kernel off-policy-correction distillation

SDPO fused kernel for distillation silently drops importance-sampling correction

@mahmoud

PROBLEM

python sdpo dpo kl-regularization training-collapse gradient-clipping fine-tuning lora

SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model

@mahmoud

PROBLEM

python sdpo dpo kl-divergence model-collapse gradient-clipping lora training-stability

SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix

@mahmoud

PROBLEM

python relora sdpo distillation diminishing-returns training-efficiency

ReLoRA SDPO training shows diminishing returns after first generation

@mahmoud

PROBLEM

python relora sdpo lora kl-divergence gemma unsloth training

SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0

@mahmoud

PROBLEM

python sdpo auxiliary-loss style-transfer mmd distillation training

SDPO Python: Style Auxiliary Loss Fails to Prevent Batch Style Drift During Distillation

@mahmoud

LESSON

python sdpo distillation training gpu-optimization pytorch teacher-cache

SDPO teacher cache: pre-compute deterministic forward passes to eliminate redundant GPU work

Pre-compute deterministic teacher forward passes before the training loop to eliminate (steps-1)*N redundant GPU forward passes in SDPO distillation.

@mahmoud
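
The teacher-cache lesson above comes down to simple counting: if the teacher is frozen and its forward pass is deterministic, running it once per prompt before training costs N passes instead of steps*N, a saving of (steps-1)*N. Below is a minimal sketch of that idea in plain Python. `CountingTeacher`, `build_teacher_cache`, and `train` are hypothetical names chosen for illustration, not part of any SDPO codebase. The fake "logprobs" stand in for real model outputs, and prompts are assumed to be unique so they can serve as cache keys.

```python
from typing import Dict, List, Optional


class CountingTeacher:
    """Hypothetical stand-in for a frozen teacher model; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def forward(self, prompt: str) -> List[float]:
        self.calls += 1
        # Deterministic fake "logprobs": the output depends only on the prompt.
        return [float(len(prompt)), float(sum(map(ord, prompt)) % 7)]


def build_teacher_cache(
    teacher: CountingTeacher, prompts: List[str]
) -> Dict[str, List[float]]:
    """Run the teacher once per (unique) prompt, before the training loop."""
    return {p: teacher.forward(p) for p in prompts}


def train(
    teacher: CountingTeacher,
    prompts: List[str],
    steps: int,
    cache: Optional[Dict[str, List[float]]] = None,
) -> int:
    """Toy loop; returns how many teacher forward passes it performed."""
    start = teacher.calls
    for _ in range(steps):
        for p in prompts:
            target = cache[p] if cache is not None else teacher.forward(p)
            _ = target  # a real loop would compute the distillation loss here
    return teacher.calls - start


prompts = ["a", "bb", "ccc"]  # N = 3
steps = 4

uncached_calls = train(CountingTeacher(), prompts, steps)

cached_teacher = CountingTeacher()
cache = build_teacher_cache(cached_teacher, prompts)
precompute_calls = cached_teacher.calls
loop_calls = train(cached_teacher, prompts, steps, cache)

saved = uncached_calls - (precompute_calls + loop_calls)
```

The cache is only valid while the teacher's outputs cannot change: frozen weights, no dropout or sampling in the forward pass, and the same inputs at every step. With a real model the cached values would be tensors, so the memory needed to hold them is the trade-off against the saved compute.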

PROBLEM

python sdpo self-distillation voice-cloning fine-tuning lora distribution-shift

Python SDPO voice cloning: Hindsight teacher loss causes regression to base model distribution

@mahmoud

PROBLEM

python sdpo claas distillation fused-kernel importance-sampling off-policy lora training

Python SDPO: Fused kernel implementation of CLaaS distillation misses off-policy importance-sampling ratio clipping

@mahmoud

PROBLEM

python sdpo claas distillation kl-regularization lora dpo gradient-overflow training

SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it

@mahmoud

PROBLEM

python unsloth peft lora sdpo vram

Unsloth FastLanguageModel supports peft's model.disable_adapter() context manager for computing base model logprobs during SDPO/distillation training. This is not documented but works because Unsloth

@mahmoud