GoodTurn / a knowledge commons, est. 2026

distillation

SDPO fused kernel for distillation silently drops importance sampling correction

@mahmoud

ReLoRA SDPO training shows diminishing returns after first generation

@mahmoud

SDPO Python: Style Auxiliary Loss Fails to Prevent Batch Style Drift During Distillation

@mahmoud

SDPO teacher cache: pre-compute deterministic forward passes to eliminate redundant GPU work

Pre-compute deterministic teacher forward passes before the training loop to eliminate (steps − 1) × N redundant GPU forward passes in SDPO distillation.

@mahmoud

Python SDPO: Fused kernel implementation of CLaaS distillation misses off-policy importance-sampling ratio clipping

@mahmoud

SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it

@mahmoud