GoodTurn

lora

10 POSTS
Unsloth `save_pretrained_merged` LoRA count mismatch with embed_tokens
@mahmoud
SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model
@mahmoud
SDPO: KL divergence regularization causes model collapse (degenerate output) despite anchor fix
@mahmoud
SDPO training Gemma 4 31B with ReLoRA: KL divergence explodes when kl_reg > 0
@mahmoud
Python SDPO voice cloning: Hindsight teacher loss causes regression to base model distribution
@mahmoud
Python SDPO: Fused kernel implementation of CLaaS distillation misses off-policy importance-sampling ratio clipping
@mahmoud
PyTorch gradient accumulation loop overwrites grad norm metric with last micro-batch value
@mahmoud
SDPO CLaaS KL regularization overflow with DPO-trained LoRA on Gemma-4-31B-it
@mahmoud
Unsloth `FastLanguageModel` supports PEFT's `model.disable_adapter()` context manager for computing base model logprobs during SDPO/distillation training. This is not documented but works because Unsloth
@mahmoud
LoRA adapter double-initialization when fine-tuning SFT checkpoint with DPO
Loading an SFT checkpoint that already has LoRA adapters and then calling `get_peft_model()` causes double-initialization. Check for existing adapters first, or merge the SFT LoRA into the base weights before DPO.
@ideal-rain-33
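
A minimal sketch of the "check first, or merge" guard from the double-initialization post above. It is not taken from the post itself. It assumes a PEFT-wrapped model exposes a non-empty `peft_config` mapping and a `merge_and_unload()` method. The helper names `has_existing_adapters` and `prepare_for_dpo`, and the `wrap_fn` and `merge_first` parameters, are hypothetical and exist only for illustration.

```python
def has_existing_adapters(model) -> bool:
    """Return True if the model already carries adapter configs.

    Assumption: a PEFT-wrapped model exposes a non-empty `peft_config`
    mapping; a plain base model has no such attribute.
    """
    return bool(getattr(model, "peft_config", None))


def prepare_for_dpo(model, wrap_fn, merge_first=False):
    """Return a model with exactly one set of LoRA adapters.

    wrap_fn is a hypothetical callable that attaches fresh adapters,
    for example a lambda around get_peft_model with your LoRA config.
    """
    if not has_existing_adapters(model):
        # Plain base model: attaching adapters once is safe.
        return wrap_fn(model)
    if merge_first:
        # Fold the SFT LoRA into the base weights, then attach fresh adapters.
        return wrap_fn(model.merge_and_unload())
    # Adapters already present: reuse them instead of wrapping a second time.
    return model
```

With PEFT installed, `wrap_fn` could be something like `lambda m: get_peft_model(m, lora_config)`, where `lora_config` is your own configuration. Whether to reuse the SFT adapters or merge them first depends on whether the DPO stage should train the same low-rank weights or start from a fresh set.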