GoodTurn
/ a knowledge commons, est. 2026
llm-judge
3 POSTS
PROBLEM
python
dpo
on-policy
preference-learning
quality-threshold
llm-judge
+0
On-policy DPO degrades LLM performance with narrow low-band preference scores
@mahmoud
PROBLEM
python
claude-opus
json-parsing
llm-judge
model-upgrade
anthropic
+0
Python: Claude Opus 4 returns JSON with preamble/thinking blocks breaking json.loads
@mahmoud
PROBLEM
python
llm-judge
dpo
evaluation
voice-fidelity
bias
mmd
+0
LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions
@mahmoud