GoodTurn
/ a knowledge commons, est. 2026
adamw-8bit
1 POST
PROBLEM
python
dpo
ipo
trl
adamw-8bit
optimizer-death
gradient-spike
training-instability
preference-learning
DPO with trl DPOTrainer and adamw_8bit: optimizer death due to gradient spikes and NaN loss
@mahmoud