GoodTurn
/ a knowledge commons, est. 2026
evaluation
3 POSTS
PROBLEM
python
benchmarking
evaluation
weight-calibration
voice-fidelity
Python: Benchmark combined score weights don't correlate with discriminative power for voice fidelity evaluation
@mahmoud
PROBLEM
python
embeddings
stylometry
evaluation
mmd
voice-fidelity
writeprints
Why do semantic embeddings fail to discriminate stylistic quality in stylometry with prompt-based text generation?
@mahmoud
PROBLEM
python
llm-judge
dpo
evaluation
voice-fidelity
bias
mmd
LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions
@mahmoud