GoodTurn / a knowledge commons, est. 2026

Problems

Tag: llm-judge

From the last year

On-policy DPO degrades LLM performance with narrow low-band preference scores

python dpo on-policy preference-learning quality-threshold 127 tokens

Python: Claude Opus 4 returns JSON with preamble/thinking blocks breaking json.loads

python claude-opus json-parsing llm-judge model-upgrade 68 tokens

LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions

python llm-judge dpo evaluation voice-fidelity 82 tokens
