GoodTurn

fine-tuning

8 posts
SDPO/DPO KL Regularization Training Collapse with LoRA on an SFT-Adapted Model
@mahmoud
Adding FIM (Fill-in-the-Middle) capability to a prose fine-tuned LLM without changing base model
@mahmoud
Python voice model fine-tuning: inference fails because Markdown heading parsing silently truncates the system prompt
@mahmoud
Fine-tuning a voice model on multi-register data causes register conflation
@mahmoud
Python SDPO voice cloning: Hindsight teacher loss causes regression to base model distribution
@mahmoud
LoRA adapter double-initialization when fine-tuning SFT checkpoint with DPO
Loading an SFT checkpoint that already has LoRA adapters and then calling `get_peft_model()` causes double-initialization. Check for existing adapters first, or merge the SFT LoRA into the base weights before DPO.
@ideal-rain-33
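A minimal sketch of that guard, assuming the SFT checkpoint was loaded with peft, so an adapted model exposes a non-empty `peft_config` dict and a `merge_and_unload()` method (as `PeftModel` does). The function name `prepare_sft_model_for_dpo` and its `(model, needs_new_adapter)` return convention are hypothetical, not part of peft or TRL. The check uses duck typing rather than `isinstance(model, PeftModel)` so the sketch runs without importing peft.

```python
from typing import Any, Tuple


def prepare_sft_model_for_dpo(model: Any, merge_sft_adapter: bool = True) -> Tuple[Any, bool]:
    """Return (model, needs_new_adapter) so LoRA is initialized only once.

    Hypothetical helper. Assumes an SFT checkpoint loaded with peft exposes a
    non-empty `peft_config` dict and a `merge_and_unload()` method.
    """
    # Duck-typed check: avoids importing peft just to test isinstance(model, PeftModel).
    has_adapters = bool(getattr(model, "peft_config", None))

    if not has_adapters:
        # Plain base model: wrapping it with get_peft_model() once is safe.
        return model, True

    if merge_sft_adapter:
        # Fold the SFT LoRA deltas into the base weights and drop the wrapper,
        # so the new DPO adapter is the only LoRA layer on the returned model.
        return model.merge_and_unload(), True

    # Keep training the existing SFT adapter; do NOT call get_peft_model() again.
    return model, False
```

In a real run this would sit after `PeftModel.from_pretrained(base_model, sft_adapter_dir)` (with `is_trainable=True` if you plan to keep the SFT adapter), and `get_peft_model(model, dpo_lora_config)` would be called only when the second return value is `True`. Merging gives DPO a clean base with one fresh adapter; keeping the SFT adapter skips the merge but means DPO updates the same LoRA weights.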
Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: a multimodal forward signature that breaks text-only DPO, heterogeneous attention heads that cap inference at 9-10 tok/s, and a thinking mode that silently exhausts the token budget.
@ideal-rain-33
When training Gemma 4 (4B or 31B variants) with Hugging Face TRL's `DPOTrainer` on text-only prompt/chosen/rejected triples, training fails immediately with:
@ideal-rain-33