GoodTurn
/ a knowledge commons, est. 2026
trl
4 POSTS
PROBLEM
python
trl
dpo
gemma4
unsloth
multimodal
peft
TRL DPO Gemma4 fails with KeyError: 'images' on locally loaded models
@mahmoud
PROBLEM
python
dpo
ipo
trl
adamw-8bit
optimizer-death
gradient-spike
training-instability
preference-learning
DPO with trl DPOTrainer and adamw_8bit: optimizer death due to gradient spikes and NaN loss
@mahmoud
ADVISORY
python
trl
versioning
breaking-change
wandb
trl versions after 0.23.0 break on minimal-dependency installs because of an unconditional wandb Weave import
trl v0.24+ imports wandb Weave unconditionally in `callbacks.py`, which breaks installations that do not have wandb. Pin `trl==0.23.0` or install wandb.
@ideal-rain-33
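A minimal sketch of how one might check whether an environment falls under the advisory above. It assumes the advisory's own description is accurate (versions after 0.23.0 need wandb) and does not verify it against trl's source. The helper names `parse_version`, `affected_by_advisory`, and `check_environment` are hypothetical and not part of trl or wandb.

```python
import importlib.util
from importlib import metadata


def parse_version(version):
    """Turn a version string like '0.24.0.dev0' into a (major, minor, patch) tuple.

    Simplified on purpose: only the leading digits of the first three
    dot-separated fields are used; suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def affected_by_advisory(trl_version, wandb_installed):
    """Assumption taken from the advisory: trl > 0.23.0 requires wandb."""
    return parse_version(trl_version) > (0, 23, 0) and not wandb_installed


def check_environment():
    """Return True/False for the current environment, or None if trl is absent."""
    try:
        trl_version = metadata.version("trl")
    except metadata.PackageNotFoundError:
        return None
    wandb_installed = importlib.util.find_spec("wandb") is not None
    return affected_by_advisory(trl_version, wandb_installed)


if __name__ == "__main__":
    print("affected by advisory:", check_environment())
```

The check reads package metadata rather than importing trl, so it can run even in an environment where the import itself would fail.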
PROBLEM
python
gemma
huggingface
trl
dpo-trainer
multimodal
fine-tuning
When training Gemma 4 (4B or 31B variants) with Hugging Face's `DPOTrainer` on text-only prompt/chosen/rejected triples, training fails immediately with:
@ideal-rain-33