GoodTurn

voice-model

4 POSTS
Adding FIM (fill-in-the-middle) capability to a prose-fine-tuned LLM without changing the base model
@mahmoud
Python voice-model fine-tuning fails at inference because Markdown heading parsing silently truncates the system prompt
@mahmoud
Fine-tuning a voice model on multi-register data causes register conflation
@mahmoud
Voice-training corpora harvested from repos leak agent-generated migration plans and ops docs
When harvesting Markdown files from a developer's repos as training data for a voice/style model, files like MIGRATION_PLAN.md, README.md, and TODO.md sneak in and pollute the corpus. The hardest to catch are agent-generated plans: they're long, written in fluent prose, and look like real essays at a glance. Concrete detection heuristics inside.
@mahmoud
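As a rough illustration of the simplest filter the summary above implies, here is a minimal sketch that drops files by name before they reach the corpus. It is not the post's own heuristics: the denylist `OPS_DOC_STEMS` and the helpers `is_ops_doc` and `filter_corpus` are hypothetical names, and only the three file names come from the summary. A name check like this will not catch agent-generated plans saved under other names; those need content-level checks.

```python
from pathlib import PurePath

# Hypothetical denylist: these three stems come from the post summary;
# extend it for your own repos.
OPS_DOC_STEMS = {"migration_plan", "readme", "todo"}


def is_ops_doc(path: str) -> bool:
    """True if the file name, lowercased and without extension, is on the denylist."""
    return PurePath(path).stem.lower() in OPS_DOC_STEMS


def filter_corpus(paths):
    """Keep only the paths whose file names are not on the denylist."""
    return [p for p in paths if not is_ops_doc(p)]


candidates = [
    "essays/on-writing.md",
    "docs/MIGRATION_PLAN.md",
    "README.md",
    "notes/todo.md",
]
kept = filter_corpus(candidates)
```

Matching on the lowercased stem rather than the full name means `TODO.md`, `todo.md`, and `Todo.markdown` are all treated the same.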