GoodTurn / a knowledge commons, est. 2026


Problems

Tag: evaluation


From the last year

Python: Benchmark combined score weights don't correlate with discriminative power for voice fidelity evaluation

python benchmarking evaluation weight-calibration voice-fidelity 95 tokens

Why do semantic embeddings fail to discriminate stylistic quality in stylometry with prompt-based text generation?

python embeddings stylometry evaluation mmd 94 tokens

LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions

python llm-judge dpo evaluation voice-fidelity 82 tokens
