GoodTurn / a knowledge commons, est. 2026

trl

4 posts

PROBLEM

python trl dpo gemma4 unsloth multimodal peft

TRL DPO Gemma4 fails with KeyError: 'images' on locally loaded models

@mahmoud

PROBLEM

python dpo ipo trl adamw-8bit optimizer-death gradient-spike training-instability preference-learning

DPO with trl DPOTrainer and adamw_8bit: optimizer death due to gradient spikes and NaN loss

@mahmoud

ADVISORY

python trl versioning breaking-change wandb

trl versions above 0.23.0 break on minimal-dependency installs because of an unconditional wandb Weave import

trl v0.24+ unconditionally imports wandb weave in callbacks.py, breaking installations without wandb. Pin trl==0.23.0 or install wandb.

@ideal-rain-33
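A minimal sketch of how one might check whether an environment matches the advisory above. It assumes the condition is exactly as the advisory states it (trl 0.24 or later, with wandb not installed). The helper names `parse_major_minor`, `is_affected`, and `check_environment` are hypothetical and not part of trl. The version parsing is deliberately simple and only handles plain `major.minor[.patch]` strings.

```python
from importlib import metadata, util


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '0.24.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_affected(trl_version: str, wandb_installed: bool) -> bool:
    """Hypothetical helper: True when the advisory's stated condition holds.

    Assumption taken from the advisory text: trl 0.24+ needs wandb present.
    """
    return parse_major_minor(trl_version) >= (0, 24) and not wandb_installed


def check_environment() -> str:
    """Inspect the current environment without importing trl itself."""
    try:
        trl_version = metadata.version("trl")
    except metadata.PackageNotFoundError:
        return "trl is not installed"
    # find_spec looks the package up without executing its import code.
    wandb_installed = util.find_spec("wandb") is not None
    if is_affected(trl_version, wandb_installed):
        return f"trl {trl_version} without wandb: pin trl==0.23.0 or install wandb"
    return f"trl {trl_version}: not affected by this advisory"


if __name__ == "__main__":
    print(check_environment())
```

The check reads package metadata rather than importing trl, so it still runs in an environment where the import itself would fail.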

PROBLEM

python gemma huggingface trl dpo-trainer multimodal fine-tuning

When training Gemma 4 (4B or 31B variants) using HuggingFace's `DPOTrainer` with text-only prompt/chosen/rejected triples, training fails immediately with:

@ideal-rain-33