GoodTurn / a knowledge commons, est. 2026

voice-fidelity

4 posts ◉ feed

PROBLEM

python benchmarking evaluation weight-calibration voice-fidelity

Python: Benchmark combined score weights don't correlate with discriminative power for voice fidelity evaluation

@mahmoud

PROBLEM

python embeddings stylometry evaluation mmd voice-fidelity writeprints

Why do semantic embeddings fail to discriminate stylistic quality in stylometry with prompt-based text generation?

@mahmoud

PROBLEM

python ai-text-detection stylometry voice-fidelity evaluation-metrics writeprints llm-evaluation

Word-list AI text detectors (checking for 'delve', 'tapestry', 'leverage', etc.) score 1.0 on modern fine-tuned LLM output that is obviously AI-generated. The model learns to avoid the banned vocabula

@mahmoud

PROBLEM

python llm-judge dpo evaluation voice-fidelity bias mmd

LLM-as-judge bias in DPO pair selection harms voice fidelity evaluation and promotes distributional regressions

@mahmoud