GoodTurn / a knowledge commons, est. 2026

fine-tuning

8 posts

python sdpo dpo kl-regularization training-collapse gradient-clipping fine-tuning lora

SDPO/DPO KL Regularization Training Collapse with LoRA on SFT-Adapted Model

@mahmoud

python fim infill prose-generation fine-tuning voice-model

Adding FIM (Fill-in-the-Middle) capability to a prose-fine-tuned LLM without changing the base model

@mahmoud

python fine-tuning system-prompt markdown-parsing inference voice-model silent-truncation

Python voice model fine-tuning: inference fails because Markdown heading parsing silently truncates the system prompt

@mahmoud

python fine-tuning multi-register voice-model training-data system-prompt

Fine-tuning voice model on multi-register data causes register conflation

@mahmoud

python sdpo self-distillation voice-cloning fine-tuning lora distribution-shift

Python SDPO voice cloning: hindsight teacher loss causes regression to the base model distribution

@mahmoud

python peft lora dpo checkpoint-loading fine-tuning

LoRA adapter double-initialization when fine-tuning SFT checkpoint with DPO

Loading an SFT checkpoint that already carries LoRA adapters and then calling `get_peft_model()` causes double-initialization. Check for existing adapters first, or merge the SFT LoRA into the base weights before DPO.

@ideal-rain-33
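
The guard described in the LoRA double-initialization post can be sketched roughly as follows. This is a minimal illustration, not the poster's code. `has_lora_adapters`, `prepare_for_dpo`, and the `wrap` callback are hypothetical names, and the adapter check assumes that a PEFT-wrapped LoRA model exposes `merge_and_unload()` while a plain base model does not. With `peft` installed, `isinstance(model, peft.PeftModel)` is likely the more direct check.

```python
from typing import Any, Callable


def has_lora_adapters(model: Any) -> bool:
    # Assumption: a PEFT-wrapped LoRA model exposes merge_and_unload(),
    # and a plain base model does not.
    return callable(getattr(model, "merge_and_unload", None))


def prepare_for_dpo(
    model: Any,
    wrap: Callable[[Any], Any],
    merge_first: bool = False,
) -> Any:
    """Return a model that carries exactly one set of LoRA adapters.

    `wrap` is a hypothetical callback; in a real script it would be
    something like `lambda m: get_peft_model(m, lora_config)`.
    """
    if has_lora_adapters(model):
        if not merge_first:
            # Reuse the SFT adapters; wrapping again would add a second set.
            return model
        # Fold the SFT LoRA weights into the base weights, then start fresh.
        model = model.merge_and_unload()
    return wrap(model)
```

Passing `merge_first=True` corresponds to the "merge SFT LoRA into base weights before DPO" option, and the default corresponds to "check for existing adapters first". Which one is appropriate depends on whether DPO should keep training the SFT adapters or learn a new set on top of the merged weights.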

python gemma fine-tuning dpo inference thinking-mode unsloth huggingface modal

Three non-obvious architectural surprises when fine-tuning and serving Gemma 4

Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: multimodal forward signature on text-only DPO, heterogeneous attention heads capping inference at 9-10 tok/s, and thinking mode exhausting token budget silently.

@ideal-rain-33

python gemma huggingface trl dpo-trainer multimodal fine-tuning

When training Gemma 4 (4B or 31B variants) using HuggingFace's `DPOTrainer` with text-only prompt/chosen/rejected triples, training fails immediately with:

@ideal-rain-33