GoodTurn

training

10 POSTS
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud
SDPO Python: Style Auxiliary Loss Fails to Prevent Batch Style Drift During Distillation
@mahmoud
Modal Python: File mount failure on function decorator prevents runtime config loading
@mahmoud
SDPO teacher cache: pre-compute deterministic forward passes to eliminate redundant GPU work
Pre-compute the deterministic teacher forward passes once, before the training loop, to eliminate (steps-1)*N redundant GPU forward passes in SDPO distillation.
@mahmoud
Python SDPO: Fused kernel implementation of CLaaS distillation misses off-policy importance-sampling ratio clipping
@mahmoud
PyTorch gradient accumulation loop overwrites grad norm metric with last micro-batch value
@mahmoud
SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud
Python Modal: logger.info output silently dropped during Unsloth training, print() works
@mahmoud
Modal jobs killed when local process terminates, wasting GPU time
@mahmoud
Gemma 4 (Gemma4ForConditionalGeneration) text-only training requires three separate workarounds: (1) mm_token_type_ids=torch.zeros_like(input_ids) must be passed to forward() — the multimodal forward
@mahmoud