Merge pull request 'Add LLM prose tells reference and copyediting checklist' (#7) from add-llm-prose-tells into main

Reviewed-on: #7
2026-03-04 23:03:15 +01:00
parent a1ffb1591b a2dd953601
commit a1052b758f
2 changed files with 403 additions and 7 deletions
--- a/prompts/LLM_PROSE_TELLS.md
+++ b/prompts/LLM_PROSE_TELLS.md
@@ -0,0 +1,396 @@
+# LLM Prose Tells
+
+All of these show up in human writing occasionally, and no single one is
+conclusive on its own. The difference is concentration, because a person might
+lean on one or two of these habits across an entire essay while LLM output will
+use ten or more of them per page, consistently, throughout the entire piece.
+
+---
+
+## Sentence Structure
+
+### The Em-Dash Pivot: "Not X—but Y"
+
+A negation followed by an em-dash and a reframe. The single most recognizable
+LLM construction.
+
+> "It's not just a tool—it's a paradigm shift." "This isn't about
+> technology—it's about trust."
+
+Models produce this at roughly 10–50x the rate of human writers, and when it
+appears four times in the same essay you're almost certainly reading generated
+text.
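A rough way to count this construction is sketched below. The regular expression is an assumption made for illustration, not part of the reference: it treats any "not" or "n't" followed by an em-dash within the same sentence as a pivot, so it will miss some variants and flag some innocent sentences. The name `count_em_dash_pivots` is hypothetical.

```python
import re

EM_DASH = "\u2014"

# A negation ("not" or a contraction ending in "n't"), then any run of
# characters that stays inside the sentence, then an em-dash.
PIVOT = re.compile(r"(?:\bnot\b|n['\u2019]t\b)[^.!?\u2014]*\u2014", re.IGNORECASE)


def count_em_dash_pivots(text: str) -> int:
    """Count negation-then-em-dash constructions (heuristic, not exact)."""
    return len(PIVOT.findall(text))


sample = (
    f"It's not just a tool{EM_DASH}it's a paradigm shift. "
    f"This isn't about technology{EM_DASH}it's about trust."
)
print(count_em_dash_pivots(sample))
```

A count of four or more in one essay would match the threshold described above, though the heuristic only narrows down what to read by hand.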
+
+### The Colon Elaboration
+
+A short declarative clause, then a colon, then a longer explanation.
+
+> "The answer is simple: we need to rethink our approach from the ground up."
+
+Models reach for this in nearly every other paragraph. The construction itself
+is perfectly normal, which is why the frequency is what gives it away.
+
+### The Triple Construction
+
+> "It's fast, it's scalable, and it's open source."
+
+Three parallel items in a list, usually escalating, with exactly three items
+every time (rarely two, almost never four) and strict grammatical parallelism
+that human writers rarely bother maintaining.
+
+### The Staccato Burst
+
+> "This matters. It always has. And it always will." "The data is clear. The
+> trend is undeniable. The conclusion is obvious."
+
+Runs of very short sentences at the same cadence. Human writers will use a short
+sentence for emphasis occasionally, but they don't stack three or four of them
+in a row at matching length, because real prose has variable rhythm. When you
+see a paragraph where every sentence is under ten words and they're all roughly
+the same size, that mechanical regularity is a strong signal.
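The check described above can be approximated mechanically. The sketch below assumes a naive sentence splitter (break after `.`, `!`, or `?` followed by whitespace) and tests only the length condition, not whether the sentences are of matching size. The function names and the default thresholds (`max_words=10`, `min_run=3`) are illustrative choices taken from the wording of this section.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_staccato_runs(text: str, max_words: int = 10, min_run: int = 3) -> list[list[str]]:
    """Return runs of at least min_run consecutive sentences under max_words words."""
    runs: list[list[str]] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if len(sentence.split()) < max_words:
            current.append(sentence)
        else:
            # A long sentence ends the current run.
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs


sample = "This matters. It always has. And it always will."
for run in find_staccato_runs(sample):
    print(run)
```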
+
+### Uniform Sentences Per Paragraph
+
+Model-generated paragraphs almost always contain between three and five
+sentences, and this count holds remarkably steady across an entire piece. If the
+first paragraph has four sentences, nearly every subsequent paragraph will too.
+Human writers produce much more varied paragraph lengths, with a one-sentence
+paragraph followed by one that runs to eight or nine sentences, as a natural
+result of following the shape of an idea rather than filling a template.
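One possible way to measure this is sketched here. It assumes paragraphs are separated by blank lines and sentences end in `.`, `!`, or `?`. The 80% cutoff in `looks_uniform` is an arbitrary stand-in for "almost always", and both function names are hypothetical.

```python
import re


def sentences_per_paragraph(text: str) -> list[int]:
    """Count sentences in each blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    counts = []
    for paragraph in paragraphs:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        counts.append(len(sentences))
    return counts


def looks_uniform(counts: list[int], low: int = 3, high: int = 5, threshold: float = 0.8) -> bool:
    """True when at least `threshold` of paragraphs have low..high sentences."""
    if not counts:
        return False
    in_band = sum(1 for c in counts if low <= c <= high)
    return in_band / len(counts) >= threshold


text = (
    "A one. A two. A three.\n\n"
    "B one. B two. B three. B four.\n\n"
    "C one. C two. C three."
)
counts = sentences_per_paragraph(text)
print(counts, looks_uniform(counts))
```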
+
+### The Dramatic Fragment
+
+Sentence fragments used as standalone paragraphs for emphasis, like "Full stop."
+or "Let that sink in." on their own line. One of these in an entire essay is a
+stylistic choice. One per section is a tic, and models drop them in at that rate
+or higher.
+
+### The Pivot Paragraph
+
+> "But here's where it gets interesting." "Which raises an uncomfortable truth."
+
+One-sentence paragraphs that exist only to transition between ideas. They
+contain zero information, and the actual point always comes in the paragraph
+that follows them. Delete every one of these and the piece reads better.
+
+### The Parenthetical Qualifier
+
+> "This is, of course, a simplification." "There are, to be fair, exceptions."
+
+Parenthetical asides inserted to look thoughtful. The qualifier almost never
+changes the argument that follows it, and its purpose is to perform nuance
+rather than to express an actual reservation about what's being said.
+
+### The Unnecessary Contrast
+
+Models append a contrasting clause to statements that don't need one, tacking on
+"whereas," "as opposed to," "unlike," or "except that" to draw a comparison that
+adds nothing the reader couldn't already infer.
+
+> "Models write one register above where a human would, whereas human writers
+> tend to match register to context." "The lists use rigidly parallel grammar,
+> as opposed to the looser structure you'd see in human writing."
+
+The first clause already makes the point. The contrasting clause just restates
+it from the other direction. This happens because models are trained to be
+thorough and to anticipate objections, so they compulsively spell out both sides
+of a distinction even when one side is obvious. If you delete the "whereas"
+clause and the sentence still says everything it needs to, the contrast was
+filler.
+
+### The Question-Then-Answer
+
+> "So what does this mean for the average user? It means everything."
+
+A rhetorical question immediately followed by its own answer. Models lean on
+this two or three times per piece because it generates the feeling of forward
+momentum without requiring any actual argumentative work. A human writer might
+do it once.
+
+---
+
+## Word Choice
+
+### Overused Intensifiers
+
+The following words appear at dramatically elevated rates in model output
+compared to human-written text: "crucial," "vital," "robust," "comprehensive,"
+"fundamental," "arguably," "straightforward," "noteworthy," "realm,"
+"landscape," "leverage" (used as a verb), "delve," "tapestry," "multifaceted,"
+"nuanced" (which models almost always apply to their own analysis), "pivotal,"
+"unprecedented" (frequently applied to things that have plenty of precedent),
+"navigate," "foster," "underscores," "resonates," "embark," "streamline," and
+"spearhead." Three or more on the same page is a strong signal.
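The "three or more on the same page" rule lends itself to a simple count. The sketch below is a minimal version with assumptions stated plainly: it treats whatever text it receives as one page, matches exact word forms only (so "leveraging" is missed), and cannot tell whether "leverage" is used as a verb. The names `FLAGGED_WORDS` and `count_flagged` are hypothetical.

```python
import re
from collections import Counter

# Word list copied from the paragraph above.
FLAGGED_WORDS = frozenset({
    "crucial", "vital", "robust", "comprehensive", "fundamental", "arguably",
    "straightforward", "noteworthy", "realm", "landscape", "leverage", "delve",
    "tapestry", "multifaceted", "nuanced", "pivotal", "unprecedented",
    "navigate", "foster", "underscores", "resonates", "embark", "streamline",
    "spearhead",
})


def count_flagged(text: str) -> Counter:
    """Count occurrences of each flagged word, case-insensitively."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in FLAGGED_WORDS)


page = "A robust and comprehensive plan is crucial. We must navigate a robust landscape."
hits = count_flagged(page)
print(hits.most_common())
print("strong signal:", sum(hits.values()) >= 3)
```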
+
+### Elevated Register Drift
+
+Models consistently write one register above where a human would for the same
+content, replacing "use" with "utilize," "start" with "commence," "help" with
+"facilitate," "show" with "demonstrate," "try" with "endeavor," "change" with
+"transform," and "make" with "craft." The tendency holds across every topic
+regardless of audience.
+
+### Filler Adverbs
+
+"Importantly," "essentially," "fundamentally," "ultimately," "inherently,"
+"particularly," and "increasingly" get dropped in to signal that something
+matters. If the writing itself has already made the importance clear through its
+content and structure, these adverbs aren't doing anything except taking up
+space.
+
+### "In an era of..."
+
+> "In an era of rapid technological change..."
+
+Almost exclusively a model habit as an essay opener, because almost no human
+writer begins a piece by zooming out to the civilizational scale before they've
+said anything specific. The model uses it to stall while it figures out what
+the actual argument is.
+
+---
+
+## Rhetorical Patterns
+
+### The Balanced Take
+
+> "While X has its drawbacks, it also offers significant benefits."
+
+Every argument followed by a concession, every criticism softened. A direct
+artifact of RLHF training, which penalizes strong stances and produces models
+that reflexively both-sides everything even when a clear position would serve
+the reader better.
+
+### The Throat-Clearing Opener
+
+> "In today's rapidly evolving digital landscape, the question of data privacy
+> has never been more important."
+
+The first paragraph of most model-generated essays adds no information. You can
+delete it and the piece improves immediately, because the actual argument always
+starts in the second paragraph.
+
+### The False Conclusion
+
+> "At the end of the day, what matters most is..." "Moving forward, we must..."
+
+The high school "In conclusion,..." dressed up for a professional audience. It
+signals that the model is wrapping up without actually landing on anything.
+
+### The Sycophantic Frame
+
+> "Great question!" "That's a really insightful observation."
+
+No one who writes for a living opens by complimenting the assignment.
+
+### The Listicle Instinct
+
+Models default to numbered or bulleted lists even when prose would be more
+appropriate. The lists almost always contain exactly 3, 5, 7, or 10 items
+(rarely 4, 6, or 9), use rigidly parallel grammar, and get introduced with a
+preamble like "Here are the key considerations:"
+
+### The Hedge Stack
+
+> "It's worth noting that, while this may not be universally applicable, in many
+> cases it can potentially offer significant benefits."
+
+Five hedges in one sentence ("worth noting," "while," "may not be," "in many
+cases," "can potentially"), communicating almost nothing, because the model
+would rather be vague than risk being wrong about anything.
+
+### The Empathy Performance
+
+> "This can be a deeply challenging experience." "Your feelings are valid."
+
+Generic emotional language that could apply equally to a bad day at work or a
+natural disaster. That interchangeability is exactly what makes it identifiable.
+
+---
+
+## Structural Tells
+
+### Symmetrical Section Length
+
+If the first section of a model-generated essay runs about 150 words, every
+subsequent section will fall between 130 and 170. Human writing is much more
+uneven, with some sections running 50 words and others running 400.
+
+### The Five-Paragraph Prison
+
+Model essays follow a rigid introduction-body-conclusion arc even when nobody
+asked for one. The introduction previews the argument, the body presents 3–5
+supporting points, and the conclusion restates the thesis in slightly different
+words.
+
+### Connector Addiction
+
+Look at the first word of each paragraph in model output and you'll find an
+unbroken chain of transition words — "However," "Furthermore," "Moreover,"
+"Additionally," "That said," "To that end," "With that in mind," "Building on
+this." Human prose moves between ideas without announcing every transition.
+
+### Absence of Mess
+
+Model prose doesn't contradict itself mid-paragraph and then catch the
+contradiction, go on a tangent and have to walk it back, use an obscure idiom
+without explaining it, make a joke that risks falling flat, leave a thought
+genuinely unfinished, or keep a sentence the writer liked the sound of even
+though it doesn't quite work.
+
+Human writing does all of those things. The total absence of rough edges, false
+starts, and odd rhythmic choices is one of the strongest signals that text was
+machine-generated.
+
+---
+
+## Framing Tells
+
+### "Broader Implications"
+
+> "This has implications far beyond just the tech industry."
+
+Zooming out to claim broader significance without substantiating it. The model
+has learned that essays are supposed to gesture at big ideas, so it gestures,
+but nothing concrete is behind the gesture.
+
+### "It's important to note that..."
+
+This phrase and its variants ("it's worth noting," "it bears mentioning," "it
+should be noted") appear at absurd rates in model output and function as verbal
+tics before a qualification the model believes someone expects.
+
+### The Metaphor Crutch
+
+Models rely on a small, predictable set of metaphors — "double-edged sword,"
+"tip of the iceberg," "north star," "building blocks," "elephant in the room,"
+"perfect storm," "game-changer" — and reach for them with unusual regularity
+across every topic. The pool they draw from is noticeably smaller than what
+human writers use.
+
+---
+
+## How to Actually Spot It
+
+No single pattern on this list proves anything by itself, since humans use
+em-dashes and humans write "crucial" and humans ask rhetorical questions.
+
+What gives it away is how many of these show up at once. Model output will hit
+10–20 of these patterns per page, while human writing might trigger 2–3,
+distributed unevenly and mixed with idiosyncratic constructions that no model
+would produce. When every paragraph on the page reads like it came from the same
+careful, balanced, slightly formal, structurally predictable process, it was
+probably generated by one.
+
+---
+
+## Copyediting Checklist: Removing LLM Tells
+
+Follow this checklist when editing any document to remove machine-generated
+patterns. Go through the entire list for every piece, and run the whole
+checklist at least twice, because fixing one pattern often introduces another.
+
+### Pass 1: Word-Level Cleanup
+
+1. Search the document for every word in the overused intensifiers list
+   ("crucial," "vital," "robust," "comprehensive," "fundamental," "arguably,"
+   "straightforward," "noteworthy," "realm," "landscape," "leverage," "delve,"
+   "tapestry," "multifaceted," "nuanced," "pivotal," "unprecedented,"
+   "navigate," "foster," "underscores," "resonates," "embark," "streamline,"
+   "spearhead") and replace each one with a plainer word, or delete it entirely
+   if the sentence works without it.
+
+2. Search for the filler adverbs ("importantly," "essentially," "fundamentally,"
+   "ultimately," "inherently," "particularly," "increasingly") and delete every
+   instance where the sentence still makes sense without it, which will be most
+   of them.
+
+3. Look for elevated register drift ("utilize," "commence," "facilitate,"
+   "demonstrate," "endeavor," "transform," "craft" and similar) and replace with
+   the simpler word.
+
+4. Search for "it's important to note," "it's worth noting," "it bears
+   mentioning," and "it should be noted" and delete the phrase in every case.
+   The sentence that follows always stands on its own.
+
+5. Search for the stock metaphors ("double-edged sword," "tip of the iceberg,"
+   "north star," "building blocks," "elephant in the room," "perfect storm,"
+   "game-changer," "at the end of the day") and replace them with something
+   specific to the topic, or just state the point directly without a metaphor.
+
+### Pass 2: Sentence-Level Restructuring
+
+6. Find every em-dash pivot ("not X—but Y," "not just X—Y," "more than X—Y") and
+   rewrite it as two separate clauses or a single sentence that makes the point
+   without the negation-then-correction structure.
+
+7. Find every colon elaboration and check whether it's doing real work. If the
+   clause before the colon could be deleted without losing meaning, rewrite the
+   sentence to start with the substance that comes after the colon.
+
+8. Find every triple construction (three parallel items in a row) and either
+   reduce it to two, expand it to four or more, or break the parallelism so the
+   items don't share the same grammatical structure.
+
+9. Find every staccato burst (three or more short sentences in a row at similar
+   length) and combine at least two of them into a longer sentence, or vary
+   their lengths so they don't land at the same cadence.
+
+10. Find every unnecessary contrast ("whereas," "as opposed to," "unlike," "as
+    compared to," "except that") and check whether the contrasting clause adds
+    information that isn't already obvious from the main clause. If the sentence
+    says the same thing twice from two directions, delete the contrast.
+
+11. Find every rhetorical question that is immediately followed by its own
+    answer and rewrite the passage as a direct statement.
+
+12. Find every sentence fragment being used as its own paragraph and either
+    delete it or expand it into a complete sentence that adds actual
+    information.
+
+13. Find every pivot paragraph ("But here's where it gets interesting." and
+    similar) and delete it. The paragraph after it always contains the actual
+    point.
+
+### Pass 3: Paragraph and Section-Level Review
+
+14. Check paragraph lengths across the piece and verify they actually vary. If
+    most paragraphs have between three and five sentences, rewrite some to be
+    one or two sentences and let others run to six or seven.
+
+15. Check section lengths for suspicious uniformity. If every section is roughly
+    the same word count, combine some shorter ones or split a longer one
+    unevenly.
+
+16. Check the first word of every paragraph for chains of connectors ("However,"
+    "Furthermore," "Moreover," "Additionally," "That said"). If more than two
+    consecutive paragraphs start with a transition word, rewrite those openings
+    to start with their subject.
+
+17. Check whether every argument is followed by a concession or qualifier. If
+    the piece both-sides every point, pick a side on at least some of them and
+    cut the hedging.
+
+18. Read the first paragraph and ask whether deleting it would improve the
+    piece. If it's just scene-setting that previews the argument, delete it and
+    start with paragraph two.
+
+19. Read the last paragraph and check whether it restates the thesis or uses a
+    phrase like "at the end of the day" or "moving forward." If so, either
+    delete it or rewrite it to say something the piece hasn't said yet.
+
+### Pass 4: Overall Texture
+
+20. Read the piece aloud and listen for passages that sound too smooth, too
+    even, or too predictable. Human prose has rough patches. If there aren't
+    any, the piece still reads as machine output regardless of whether
+    individual patterns have been addressed.
+
+21. Check that the piece contains at least a few constructions that feel
+    idiosyncratic — a sentence with unusual word order, a parenthetical that
+    goes on a bit long, an aside only loosely connected to the main point, a
+    word choice that's specific and unexpected. If every sentence is clean and
+    correct and unremarkable, it will still read as generated.
+
+22. Verify that you haven't introduced new patterns while fixing the original
+    ones, which happens constantly. Run the entire checklist again from the top
+    on the revised version.
--- a/prompts/REPO_POLICIES.md
+++ b/prompts/REPO_POLICIES.md
@@ -145,13 +145,13 @@ style conventions are in separate documents:

 - Database migrations live in `internal/db/migrations/` and must be embedded in
  the binary.
-  - `000_migration.sql` — contains ONLY the creation of the migrations tracking
-    table itself. Nothing else.
-  - `001_schema.sql` — the full application schema.
-  - **Pre-1.0.0:** never add additional migration files (002, 003, etc.). There
-    is no installed base to migrate. Edit `001_schema.sql` directly.
-  - **Post-1.0.0:** add new numbered migration files for each schema change.
-    Never edit existing migrations after release.
+    - `000_migration.sql` — contains ONLY the creation of the migrations
+      tracking table itself. Nothing else.
+    - `001_schema.sql` — the full application schema.
+    - **Pre-1.0.0:** never add additional migration files (002, 003, etc.).
+      There is no installed base to migrate. Edit `001_schema.sql` directly.
+    - **Post-1.0.0:** add new numbered migration files for each schema change.
+      Never edit existing migrations after release.

 - All repos should have an `.editorconfig` enforcing the project's indentation
  settings.