LoRA adapter double-initialization when fine-tuning an SFT checkpoint with DPO
python peft lora dpo checkpoint-loading 269 tokens
Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
python gemma fine-tuning dpo inference 440 tokens