GoodTurn

Word-list AI text detectors (which check for words such as 'delve', 'tapestry', and 'leverage') give a clean score of 1.0 to output from modern fine-tuned LLMs, even when that output is obviously AI-generated. The model learns to avoid the banned vocabulary while still producing formulaic text: rigid four-paragraph templates, manufactured anecdotes that open with 'In 20XX, I worked on...', fabricated tool names, and outputs 60-100% longer than the training corpus. Composite evaluation scores (0.76-0.80) are nearly identical across model versions, so the quality regression is invisible.
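To make the failure mode concrete, here is a minimal sketch of a word-list scorer. The banned set contains only the three example words named above, and the scoring rule (start at 1.0 and subtract a hypothetical 0.25 per distinct banned word found) is an assumption for illustration, not the detector actually used. Formulaic text that simply avoids those words still gets a clean 1.0.

```python
import re

# Only the example words named in the text; real lists are much longer.
BANNED_WORDS = {"delve", "tapestry", "leverage"}

def wordlist_score(text: str) -> float:
    """Hypothetical word-list detector: 1.0 means no banned word was found.

    Each distinct banned word found subtracts an assumed penalty of 0.25.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = len(tokens & BANNED_WORDS)
    return max(0.0, 1.0 - 0.25 * hits)

# A formulaic, manufactured-anecdote opening passes untouched.
formulaic = "In 2021, I worked on a pipeline that processed billing events."
print(wordlist_score(formulaic))  # 1.0
print(wordlist_score("Let us delve into this tapestry."))  # 0.5
```

The detector only sees vocabulary, so nothing about the template, the fabricated anecdote, or the length registers in its score.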

✓ ACCEPTED

Replace word-list detection with three complementary signals that catch structural AI patterns:

  1. Stylometric fingerprinting (Writeprints-Static features): Extract ~15 features per text (average word length, short- and long-word ratios, digit and uppercase ratios, type-token ratio, hapax legomena ratio, dis legomena ratio, sentence-length statistics, and punctuation rates). Compare them against a precomputed baseline from the author's corpus using a normalized feature distance. AI text diverges in vocabulary richness (a lower hapax ratio), character distributions, and punctuation patterns, even when every individual word passes the word-list check.

  2. Structural opening detection: Add regex patterns for the formulaic AI openings that word lists miss: ^In \d{4},?\s+I\b ('In 2021, I worked on...'), ^At a .+ in \d{4}, ^I spent (?:most of )?\d{4}\b, and ^(?:Back|Late|Early) in \d{4}. These patterns matched 8/10 regression samples versus 2/10 baseline samples.

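  3. Length appropriateness: A Gaussian score centered on the author's typical output length penalizes text-wall outputs (e.g., 450 words where the corpus norm is 250): score = exp(-0.5 * ((word_count - target) / stdev)^2), where target is the author's typical word count and stdev sets how quickly the score falls off.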
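Signal 1 above can be sketched as follows. This is a minimal, Writeprints-inspired approximation under stated assumptions: the 15 features are one plausible reading of the listed categories (the word-length thresholds of at most 3 and at least 7 letters, and the specific punctuation marks counted, are hypothetical choices), and the "normalized feature distance" is implemented as the mean absolute z-score against the author's corpus, which is also an assumption because the exact normalization is not specified. The names stylometric_features, corpus_baseline, and writeprints_distance are illustrative.

```python
import re
import statistics
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")

def stylometric_features(text: str) -> dict[str, float]:
    """Approximate Writeprints-Static-style features (an assumed 15-feature subset)."""
    words = WORD_RE.findall(text)
    counts = Counter(w.lower() for w in words)
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    n_chars = max(len(text), 1)
    sentences = [s for s in SENTENCE_RE.findall(text) if WORD_RE.search(s)]
    sent_lens = [len(WORD_RE.findall(s)) for s in sentences] or [0]
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "short_word_ratio": sum(len(w) <= 3 for w in words) / n_words,
        "long_word_ratio": sum(len(w) >= 7 for w in words) / n_words,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars,
        "type_token_ratio": len(counts) / n_words,
        # Hapax legomena: words used exactly once; dis legomena: exactly twice.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n_words,
        "dis_ratio": sum(1 for c in counts.values() if c == 2) / n_words,
        "sent_len_mean": statistics.mean(sent_lens),
        "sent_len_stdev": statistics.pstdev(sent_lens),
        "comma_rate": text.count(",") / n_words,
        "semicolon_rate": text.count(";") / n_words,
        "colon_rate": text.count(":") / n_words,
        "dash_rate": text.count("-") / n_words,
        "question_rate": text.count("?") / n_words,
    }

def corpus_baseline(texts: list[str]) -> dict[str, tuple[float, float]]:
    """Per-feature (mean, population stdev) over the author's corpus."""
    vectors = [stylometric_features(t) for t in texts]
    return {
        name: (
            statistics.mean([v[name] for v in vectors]),
            statistics.pstdev([v[name] for v in vectors]),
        )
        for name in vectors[0]
    }

def writeprints_distance(text: str, baseline: dict[str, tuple[float, float]],
                         eps: float = 1e-6) -> float:
    """Mean absolute z-score of the text's features against the baseline."""
    feats = stylometric_features(text)
    z_scores = [abs(feats[name] - mu) / max(sigma, eps)
                for name, (mu, sigma) in baseline.items()]
    return statistics.mean(z_scores)

human_corpus = [
    "The cache was stale. We flushed it, and latency dropped.",
    "Nobody reads the runbook; they grep the wiki instead.",
]
baseline = corpus_baseline(human_corpus)
d_human = writeprints_distance(human_corpus[0], baseline)
d_ai = writeprints_distance("In 2021, I worked on SCALING 42 services ACROSS 7 TEAMS.", baseline)
print(d_human, d_ai)
```

Because this toy corpus has only two texts, features that never vary in it (such as the digit ratio) have zero spread, so the eps floor lets any deviation on them dominate the distance; a real baseline should be built from many samples of the author's writing.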
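A minimal sketch of signal 2, compiling the four patterns listed above verbatim. The scoring is an assumption: the hypothetical opening_quality function returns 0.0 when the first line of the text matches any pattern and 1.0 otherwise.

```python
import re

# The four formulaic-opening patterns, copied from the list above.
FORMULAIC_OPENINGS = [
    re.compile(r"^In \d{4},?\s+I\b"),
    re.compile(r"^At a .+ in \d{4}"),
    re.compile(r"^I spent (?:most of )?\d{4}\b"),
    re.compile(r"^(?:Back|Late|Early) in \d{4}"),
]

def opening_quality(text: str) -> float:
    """Assumed binary score: 0.0 if the opening line is formulaic, else 1.0."""
    first_line = text.lstrip().split("\n", 1)[0]
    if any(p.search(first_line) for p in FORMULAIC_OPENINGS):
        return 0.0
    return 1.0

samples = [
    "In 2021, I worked on a billing pipeline.",
    "Caching is harder than it looks.",
]
print([opening_quality(s) for s in samples])  # [0.0, 1.0]
```

With binary scoring, averaging over ten samples would turn the 8/10 and 2/10 match rates into means of 0.2 and 0.8; the source does not state how opening_quality is actually computed.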
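The length score from signal 3 translates directly into code. How target and stdev are derived is not specified; the hypothetical length_target helper assumes they are the mean and sample standard deviation of the author's corpus word counts, and the stdev of 100 words in the example is an illustrative value, not one from the source.

```python
import math
import statistics

def length_target(corpus_word_counts: list[int]) -> tuple[float, float]:
    """Assumed: target = mean corpus length, stdev = sample standard deviation."""
    return statistics.mean(corpus_word_counts), statistics.stdev(corpus_word_counts)

def length_score(word_count: int, target: float, stdev: float) -> float:
    """Gaussian length score: 1.0 at the target, falling off with distance."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return math.exp(-0.5 * ((word_count - target) / stdev) ** 2)

print(length_score(250, 250, 100))            # 1.0
print(round(length_score(450, 250, 100), 3))  # 0.135
```

With these illustrative numbers, a 450-word text-wall sits two standard deviations above the norm and scores exp(-2), about 0.135. The Gaussian also penalizes outputs that are too short, which a one-sided word-count cap would not.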

Result: the composite score separation went from ~0.0 (regression invisible) to -0.088 (v3 now correctly ranked above v4), with an opening_quality delta of -0.600 and a writeprints_distance delta of -0.383.