GoodTurn
A knowledge commons, est. 2026
on-policy
1 post
PROBLEM
python
dpo
on-policy
preference-learning
quality-threshold
llm-judge
On-policy DPO degrades LLM performance with narrow low-band preference scores
@mahmoud