Merge pull request 'Add LLM prose tells reference and copyediting checklist' (#7) from add-llm-prose-tells into main

Reviewed-on: #7
This commit was merged in pull request #7.
2026-03-04 23:03:15 +01:00
2 changed files with 403 additions and 7 deletions

prompts/LLM_PROSE_TELLS.md Normal file

@@ -0,0 +1,396 @@
# LLM Prose Tells
All of these show up in human writing occasionally, and no single one is
conclusive on its own. The difference is concentration, because a person might
lean on one or two of these habits across an entire essay while LLM output will
use fifteen of them per paragraph, consistently, throughout the entire piece.
---
## Sentence Structure
### The Em-Dash Pivot: "Not X—but Y"
A negation followed by an em-dash and a reframe. The single most recognizable
LLM construction.
> "It's not just a tool—it's a paradigm shift." "This isn't about
> technology—it's about trust."
Models produce this at roughly 10 to 50x the rate of human writers, and when it
appears four times in the same essay you're almost certainly reading generated
text.
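A rough way to count this construction is a regular expression. The sketch below is illustrative only: the negation list, the 60-character window, and the name `count_em_dash_pivots` are assumptions, and it will miss pivots that use a spaced hyphen instead of an em-dash.

```python
import re

# Assumed pattern: a negation word, then up to 60 characters with no
# sentence break, then an em-dash (U+2014). Both the negation list and
# the window size are guesses, not values from this document.
EM_DASH_PIVOT = re.compile(
    r"\b(?:not|isn't|aren't|wasn't|doesn't|don't)\b[^.!?\u2014]{0,60}\u2014",
    re.IGNORECASE,
)


def count_em_dash_pivots(text: str) -> int:
    """Count negation-then-em-dash constructions in text."""
    # Normalize curly apostrophes so that typeset contractions still match.
    normalized = text.replace("\u2019", "'")
    return len(EM_DASH_PIVOT.findall(normalized))
```

A count of four or more in one essay matches the threshold described above; lower counts need a human read.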
### The Colon Elaboration
A short declarative clause, then a colon, then a longer explanation.
> "The answer is simple: we need to rethink our approach from the ground up."
Models reach for this in nearly every other paragraph. The construction itself
is perfectly normal, which is why the frequency is what gives it away.
### The Triple Construction
> "It's fast, it's scalable, and it's open source."
Three parallel items in a list, usually escalating. Models almost always use
exactly three (rarely two, almost never four) and keep a strict grammatical
parallelism that human writers rarely bother maintaining.
### The Staccato Burst
> "This matters. It always has. And it always will." "The data is clear. The
> trend is undeniable. The conclusion is obvious."
Runs of very short sentences at the same cadence. Human writers will use a short
sentence for emphasis occasionally, but they don't stack three or four of them
in a row at matching length, because real prose has variable rhythm. When you
see a paragraph where every sentence is under ten words and they're all roughly
the same size, that mechanical regularity is a strong signal.
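The ten-word test can be approximated in code. This is a minimal sketch with a naive sentence splitter; the function names and the `min_run` default of three are assumptions chosen to match the description above, and abbreviations such as "e.g." will confuse the splitter.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def staccato_runs(text: str, max_words: int = 10, min_run: int = 3) -> list[list[str]]:
    """Return runs of at least min_run consecutive sentences under max_words words."""
    runs: list[list[str]] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        if len(sentence.split()) < max_words:
            current.append(sentence)
        else:
            # A long sentence ends the run; keep it only if it was long enough.
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs
```

Each returned run is a candidate for merging or rewriting; a single short sentence used for emphasis never forms a run and is left alone.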
### Uniform Sentences Per Paragraph
Model-generated paragraphs almost always contain between three and five
sentences, and this count holds remarkably steady across an entire piece. If the
first paragraph has four sentences, nearly every subsequent paragraph will too.
Human writers produce much more varied paragraph lengths — a single sentence
followed by one that runs eight or nine — as a natural result of following the
shape of an idea rather than filling a template.
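Paragraph uniformity is easy to measure once sentences are counted. The sketch below assumes paragraphs are separated by blank lines and treats a population standard deviation of 1.0 or less as "uniform"; that threshold and the function names are illustrative assumptions, not figures from this document.

```python
import re
from statistics import pstdev


def sentences_per_paragraph(text: str) -> list[int]:
    """Count sentence terminators in each blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Naive count: runs of ., !, or ? followed by whitespace or end of paragraph.
    return [len(re.findall(r"[.!?]+(?=\s|$)", p)) for p in paragraphs]


def looks_uniform(counts: list[int], max_spread: float = 1.0) -> bool:
    """True when there are at least three paragraphs and their counts barely vary."""
    return len(counts) >= 3 and pstdev(counts) <= max_spread
```

A piece whose counts look like `[4, 4, 3, 5, 4]` is flagged, while one that mixes a single-sentence paragraph with an eight-sentence one is not.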
### The Dramatic Fragment
Sentence fragments used as standalone paragraphs for emphasis, like "Full stop."
or "Let that sink in." on their own line. One of these in an entire essay is a
stylistic choice. One per section is a tic, and models drop them in at that rate
or higher.
### The Pivot Paragraph
> "But here's where it gets interesting." "Which raises an uncomfortable truth."
One-sentence paragraphs that exist only to transition between ideas. They
contain zero information, and the actual point always comes in the paragraph
that follows them. Delete every one of these and the piece reads better.
### The Parenthetical Qualifier
> "This is, of course, a simplification." "There are, to be fair, exceptions."
Parenthetical asides inserted to look thoughtful. The qualifier almost never
changes the argument that follows it, and its purpose is to perform nuance
rather than to express an actual reservation about what's being said.
### The Unnecessary Contrast
Models append a contrasting clause to statements that don't need one, tacking on
"whereas," "as opposed to," "unlike," or "except that" to draw a comparison that
adds nothing the reader couldn't already infer.
> "Models write one register above where a human would, whereas human writers
> tend to match register to context." "The lists use rigidly parallel grammar,
> as opposed to the looser structure you'd see in human writing."
The first clause already makes the point. The contrasting clause just restates
it from the other direction. This happens because models are trained to be
thorough and to anticipate objections, so they compulsively spell out both sides
of a distinction even when one side is obvious. If you delete the "whereas"
clause and the sentence still says everything it needs to, the contrast was
filler.
### The Question-Then-Answer
> "So what does this mean for the average user? It means everything."
A rhetorical question immediately followed by its own answer. Models lean on
this two or three times per piece because it generates the feeling of forward
momentum without requiring any actual argumentative work. A human writer might
do it once.
---
## Word Choice
### Overused Intensifiers
The following words appear at dramatically elevated rates in model output
compared to human-written text: "crucial," "vital," "robust," "comprehensive,"
"fundamental," "arguably," "straightforward," "noteworthy," "realm,"
"landscape," "leverage" (used as a verb), "delve," "tapestry," "multifaceted,"
"nuanced" (which models almost always apply to their own analysis), "pivotal,"
"unprecedented" (frequently applied to things that have plenty of precedent),
"navigate," "foster," "underscores," "resonates," "embark," "streamline," and
"spearhead." Three or more on the same page is a strong signal.
### Elevated Register Drift
Models consistently write one register above where a human would for the same
content, replacing "use" with "utilize," "start" with "commence," "help" with
"facilitate," "show" with "demonstrate," "try" with "endeavor," "change" with
"transform," and "make" with "craft." The tendency holds across every topic
regardless of audience.
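The word pairs above amount to a lookup table, so a first pass can be scripted. This is a sketch under stated assumptions: the name `plainer` is hypothetical, only the exact base forms are matched (not "utilized" or "crafting"), and the output still needs a human read because words like "craft" and "transform" are often the right choice.

```python
import re

# Mapping taken from the pairs listed above, reversed so the elevated
# word is the key and the plain word is the replacement.
PLAIN = {
    "utilize": "use",
    "commence": "start",
    "facilitate": "help",
    "demonstrate": "show",
    "endeavor": "try",
    "transform": "change",
    "craft": "make",
}

_PATTERN = re.compile(r"\b(" + "|".join(PLAIN) + r")\b", re.IGNORECASE)


def plainer(text: str) -> str:
    """Replace each elevated base-form word with its plain counterpart."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        plain = PLAIN[word.lower()]
        # Keep sentence-initial capitalization.
        return plain.capitalize() if word[0].isupper() else plain

    return _PATTERN.sub(swap, text)
```

For example, `plainer("We utilize the cache to facilitate lookups.")` returns `"We use the cache to help lookups."`.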
### Filler Adverbs
"Importantly," "essentially," "fundamentally," "ultimately," "inherently,"
"particularly," and "increasingly" get dropped in to signal that something
matters. If the writing itself has already made the importance clear through its
content and structure, these adverbs aren't doing anything except taking up
space.
### "In an era of..."
> "In an era of rapid technological change..."
As an essay opener, this is almost exclusively a model habit. Almost no human
writer begins a piece by zooming out to the civilizational scale before saying
anything specific. The model uses the phrase to stall while it works out what
the actual argument is.
---
## Rhetorical Patterns
### The Balanced Take
> "While X has its drawbacks, it also offers significant benefits."
Every argument followed by a concession, every criticism softened. A direct
artifact of RLHF training, which penalizes strong stances and produces models
that reflexively both-sides everything even when a clear position would serve
the reader better.
### The Throat-Clearing Opener
> "In today's rapidly evolving digital landscape, the question of data privacy
> has never been more important."
The first paragraph of most model-generated essays adds no information. You can
delete it and the piece improves immediately, because the actual argument always
starts in the second paragraph.
### The False Conclusion
> "At the end of the day, what matters most is..." "Moving forward, we must..."
The high school "In conclusion,..." dressed up for a professional audience. It
signals that the model is wrapping up without actually landing on anything.
### The Sycophantic Frame
> "Great question!" "That's a really insightful observation."
No one who writes for a living opens by complimenting the assignment.
### The Listicle Instinct
Models default to numbered or bulleted lists even when prose would be more
appropriate. The lists almost always contain exactly 3, 5, 7, or 10 items (never
4, 6, or 9), use rigidly parallel grammar, and get introduced with a preamble
like "Here are the key considerations:"
### The Hedge Stack
> "It's worth noting that, while this may not be universally applicable, in many
> cases it can potentially offer significant benefits."
Five hedges in one sentence ("worth noting," "while," "may not be," "in many
cases," "can potentially"), communicating almost nothing, because the model
would rather be vague than risk being wrong about anything.
### The Empathy Performance
> "This can be a deeply challenging experience." "Your feelings are valid."
Generic emotional language that could apply equally to a bad day at work or a
natural disaster. That interchangeability is exactly what makes it identifiable.
---
## Structural Tells
### Symmetrical Section Length
If the first section of a model-generated essay runs about 150 words, every
subsequent section will fall between 130 and 170. Human writing is much more
uneven, with some sections running 50 words and others running 400.
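Section lengths can be compared mechanically when the piece is Markdown. The sketch below assumes sections are delimited by ATX headings (`#` through `######`) and uses a 15% tolerance around the first section, which roughly covers the 130 to 170 range around 150 words; both function names and the tolerance are assumptions.

```python
import re


def section_word_counts(markdown: str) -> list[int]:
    """Word count of the body text under each ATX heading."""
    sections = re.split(r"^#{1,6} .*$", markdown, flags=re.MULTILINE)
    return [len(s.split()) for s in sections if s.strip()]


def suspiciously_even(counts: list[int], tolerance: float = 0.15) -> bool:
    """True when every section is within tolerance of the first section's length."""
    if len(counts) < 3:
        return False
    base = counts[0]
    return all(abs(c - base) <= tolerance * base for c in counts)
```

Counts such as `[150, 140, 165]` are flagged; counts such as `[50, 400, 120]` are not.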
### The Five-Paragraph Prison
Model essays follow a rigid introduction-body-conclusion arc even when nobody
asked for one. The introduction previews the argument, the body presents 3 to 5
supporting points, and the conclusion restates the thesis in slightly different
words.
### Connector Addiction
Look at the first word of each paragraph in model output and you'll find an
unbroken chain of transition words — "However," "Furthermore," "Moreover,"
"Additionally," "That said," "To that end," "With that in mind," "Building on
this." Human prose moves between ideas without announcing every transition.
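Checking paragraph openers is simple to script. This sketch assumes blank-line-separated paragraphs and uses only the connectors quoted above; `longest_connector_chain` is a hypothetical helper, and the prefix match is deliberately crude.

```python
import re

# Connectors quoted in the text above, lowercased for prefix matching.
CONNECTORS = (
    "however",
    "furthermore",
    "moreover",
    "additionally",
    "that said",
    "to that end",
    "with that in mind",
    "building on this",
)


def connector_openers(text: str) -> list[bool]:
    """For each paragraph, report whether it opens with a listed connector."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p.lower().startswith(CONNECTORS) for p in paragraphs]


def longest_connector_chain(text: str) -> int:
    """Length of the longest run of consecutive connector-led paragraphs."""
    longest = current = 0
    for opens_with_connector in connector_openers(text):
        current = current + 1 if opens_with_connector else 0
        longest = max(longest, current)
    return longest
```

A chain longer than two is the condition the copyediting checklist treats as worth rewriting.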
### Absence of Mess
Model prose doesn't contradict itself mid-paragraph and then catch the
contradiction, go on a tangent and have to walk it back, use an obscure idiom
without explaining it, make a joke that risks falling flat, leave a thought
genuinely unfinished, or keep a sentence the writer liked the sound of even
though it doesn't quite work.
Human writing does all of those things. The total absence of rough edges, false
starts, and odd rhythmic choices is one of the strongest signals that text was
machine-generated.
---
## Framing Tells
### "Broader Implications"
> "This has implications far beyond just the tech industry."
Zooming out to claim broader significance without substantiating it. The model
has learned that essays are supposed to gesture at big ideas, so it gestures,
but nothing concrete is behind the gesture.
### "It's important to note that..."
This phrase and its variants ("it's worth noting," "it bears mentioning," "it
should be noted") appear at absurd rates in model output and function as verbal
tics before a qualification the model believes someone expects.
### The Metaphor Crutch
Models rely on a small, predictable set of metaphors — "double-edged sword,"
"tip of the iceberg," "north star," "building blocks," "elephant in the room,"
"perfect storm," "game-changer" — and reach for them with unusual regularity
across every topic. The pool they draw from is noticeably smaller than what
human writers use.
---
## How to Actually Spot It
No single pattern on this list proves anything by itself, since humans use
em-dashes and humans write "crucial" and humans ask rhetorical questions.
What gives it away is how many of these show up at once. Model output will hit
10 to 20 of these patterns per page, while human writing might trigger 2 or 3,
distributed unevenly and mixed with idiosyncratic constructions that no model
would produce. When every paragraph on the page reads like it came from the same
careful, balanced, slightly formal, structurally predictable process, it was
probably generated by one.
---
## Copyediting Checklist: Removing LLM Tells
Follow this checklist when editing any document to remove machine-generated
patterns. Go through the entire list for every piece, and do at least two full
passes, because fixing one pattern often introduces another.
### Pass 1: Word-Level Cleanup
1. Search the document for every word in the overused intensifiers list
("crucial," "vital," "robust," "comprehensive," "fundamental," "arguably,"
"straightforward," "noteworthy," "realm," "landscape," "leverage," "delve,"
"tapestry," "multifaceted," "nuanced," "pivotal," "unprecedented,"
"navigate," "foster," "underscores," "resonates," "embark," "streamline,"
"spearhead") and replace each one with a plainer word, or delete it entirely
if the sentence works without it.
2. Search for the filler adverbs ("importantly," "essentially," "fundamentally,"
"ultimately," "inherently," "particularly," "increasingly") and delete every
instance where the sentence still makes sense without it, which will be most
of them.
3. Look for elevated register drift ("utilize," "commence," "facilitate,"
"demonstrate," "endeavor," "transform," "craft" and similar) and replace with
the simpler word.
4. Search for "it's important to note," "it's worth noting," "it bears
mentioning," and "it should be noted" and delete the phrase in every case.
The sentence that follows always stands on its own.
5. Search for the stock metaphors ("double-edged sword," "tip of the iceberg,"
"north star," "building blocks," "elephant in the room," "perfect storm,"
"game-changer," "at the end of the day") and replace them with something
specific to the topic, or just state the point directly without a metaphor.
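The search in step 1 can be partly automated. The sketch below counts exact matches for the listed intensifiers and applies the "three or more on the same page" signal from earlier in the document; the function names are assumptions, and inflected forms such as "leveraging" or "fostered" are not caught.

```python
import re
from collections import Counter

# The overused intensifiers listed in step 1.
INTENSIFIERS = {
    "crucial", "vital", "robust", "comprehensive", "fundamental", "arguably",
    "straightforward", "noteworthy", "realm", "landscape", "leverage", "delve",
    "tapestry", "multifaceted", "nuanced", "pivotal", "unprecedented",
    "navigate", "foster", "underscores", "resonates", "embark", "streamline",
    "spearhead",
}


def count_intensifiers(text: str) -> Counter:
    """Count exact, case-insensitive occurrences of each listed word."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in INTENSIFIERS)


def flag_page(text: str, threshold: int = 3) -> bool:
    """True when the total number of hits reaches the threshold."""
    return sum(count_intensifiers(text).values()) >= threshold
```

The counter only locates candidates. Choosing a plainer word, or deleting the word outright, remains a manual decision.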
### Pass 2: Sentence-Level Restructuring
6. Find every em-dash pivot ("not X—but Y," "not just X—Y," "more than X—Y") and
rewrite it as two separate clauses or a single sentence that makes the point
without the negation-then-correction structure.
7. Find every colon elaboration and check whether it's doing real work. If the
clause before the colon could be deleted without losing meaning, rewrite the
sentence to start with the substance that comes after the colon.
8. Find every triple construction (three parallel items in a row) and either
reduce it to two, expand it to four or more, or break the parallelism so the
items don't share the same grammatical structure.
9. Find every staccato burst (three or more short sentences in a row at similar
length) and combine at least two of them into a longer sentence, or vary
their lengths so they don't land at the same cadence.
10. Find every unnecessary contrast ("whereas," "as opposed to," "unlike," "as
compared to," "except that") and check whether the contrasting clause adds
information that isn't already obvious from the main clause. If the sentence
says the same thing twice from two directions, delete the contrast.
11. Find every rhetorical question that is immediately followed by its own
answer and rewrite the passage as a direct statement.
12. Find every sentence fragment being used as its own paragraph and either
delete it or expand it into a complete sentence that adds actual
information.
13. Find every pivot paragraph ("But here's where it gets interesting." and
similar) and delete it. The paragraph after it always contains the actual
point.
### Pass 3: Paragraph and Section-Level Review
14. Check paragraph lengths across the piece and verify they actually vary. If
most paragraphs have between three and five sentences, rewrite some to be
one or two sentences and let others run to six or seven.
15. Check section lengths for suspicious uniformity. If every section is roughly
the same word count, combine some shorter ones or split a longer one
unevenly.
16. Check the first word of every paragraph for chains of connectors ("However,"
"Furthermore," "Moreover," "Additionally," "That said"). If more than two
transition words start consecutive paragraphs, rewrite those openings to
start with their subject.
17. Check whether every argument is followed by a concession or qualifier. If
the piece both-sides every point, pick a side on at least some of them and
cut the hedging.
18. Read the first paragraph and ask whether deleting it would improve the
piece. If it's just scene-setting that previews the argument, delete it and
start with paragraph two.
19. Read the last paragraph and check whether it restates the thesis or uses a
phrase like "at the end of the day" or "moving forward." If so, either
delete it or rewrite it to say something the piece hasn't said yet.
### Pass 4: Overall Texture
20. Read the piece aloud and listen for passages that sound too smooth, too
even, or too predictable. Human prose has rough patches. If there aren't
any, the piece still reads as machine output regardless of whether
individual patterns have been addressed.
21. Check that the piece contains at least a few constructions that feel
idiosyncratic — a sentence with unusual word order, a parenthetical that
goes on a bit long, an aside only loosely connected to the main point, a
word choice that's specific and unexpected. If every sentence is clean and
correct and unremarkable, it will still read as generated.
22. Verify that you haven't introduced new patterns while fixing the original
ones, which happens constantly. Run the entire checklist again from the top
on the revised version.


@@ -145,13 +145,13 @@ style conventions are in separate documents:
- Database migrations live in `internal/db/migrations/` and must be embedded in
the binary.
- `000_migration.sql` — contains ONLY the creation of the migrations tracking
table itself. Nothing else.
- `001_schema.sql` — the full application schema.
- **Pre-1.0.0:** never add additional migration files (002, 003, etc.). There
is no installed base to migrate. Edit `001_schema.sql` directly.
- **Post-1.0.0:** add new numbered migration files for each schema change.
Never edit existing migrations after release.
- `000_migration.sql` — contains ONLY the creation of the migrations
tracking table itself. Nothing else.
- `001_schema.sql` — the full application schema.
- **Pre-1.0.0:** never add additional migration files (002, 003, etc.).
There is no installed base to migrate. Edit `001_schema.sql` directly.
- **Post-1.0.0:** add new numbered migration files for each schema change.
Never edit existing migrations after release.
- All repos should have an `.editorconfig` enforcing the project's indentation
settings.