LLM prose tells: merge adjacent sentences, add checklist items #11

Merged
sneak merged 4 commits from llm-prose-tells-merge-pass into main 2026-03-05 00:13:49 +01:00
Showing only changes of commit 771551baed

@@ -1,7 +1,7 @@
 # LLM Prose Tells

-Human writers occasionally use every pattern in this document. The reason they
-work as tells is that LLM output packs fifteen of them into a paragraph.
+A catalog of structural, lexical, and rhetorical patterns found in LLM-generated
+prose.

 ---
@@ -14,16 +14,11 @@ A negation followed by an em-dash and a reframe.
 > "It's not just a tool—it's a paradigm shift." "This isn't about
 > technology—it's about trust."

-The most recognizable LLM construction, produced at roughly 10 to 50x the rate
-of human writers. Four of them in one essay and you know what you're reading.

 ### Em-Dash Overuse Generally

-Even outside the "not X but Y" pivot, models use em-dashes at far higher rates
-than human writers, substituting them for commas, semicolons, parentheses,
-colons, and periods. A human writer might use one or two in a piece. Models
-scatter them everywhere because the em-dash can stand in for any other
-punctuation mark. More than two or three per page is a signal.
+Even outside the "not X but Y" pivot, models substitute em-dashes for commas,
+semicolons, parentheses, colons, and periods. The em-dash can replace any other
+punctuation mark, and models default to it for that reason.

 ### The Colon Elaboration
@@ -31,31 +26,23 @@ A short declarative clause, then a colon, then a longer explanation.
 > "The answer is simple: we need to rethink our approach from the ground up."

-A perfectly normal construction that models reach for so often the frequency
-becomes the tell.

 ### The Triple Construction

 > "It's fast, it's scalable, and it's open source."

 Three parallel items in a list, usually escalating. Always exactly three (rarely
-two, never four) with strict grammatical parallelism that human writers rarely
-maintain.
+two, never four) with strict grammatical parallelism.

 ### The Staccato Burst

 > "This matters. It always has. And it always will." "The data is clear. The
 > trend is undeniable. The conclusion is obvious."

-Runs of very short sentences at the same cadence. Human writers use a short
-sentence for emphasis occasionally, but stacking three or four at matching
-length creates a mechanical regularity.
+Runs of very short sentences at the same cadence and matching length.

 ### The Two-Clause Compound Sentence

-Possibly the most pervasive tell, and easy to miss because each instance looks
-like normal English. The model produces sentence after sentence where an
-independent clause is followed by a comma, a conjunction ("and," "but," "which,"
+An independent clause, a comma, a conjunction ("and," "but," "which,"
 "because"), and a second independent clause of similar length. Every sentence
 becomes two balanced halves.
@@ -67,47 +54,43 @@ becomes two balanced halves.
 Human prose has sentences with one clause, sentences with three, sentences that
 start with a subordinate clause before reaching the main one, sentences that
-embed their complexity in the middle. When every sentence on the page has that
-same two-part structure, the rhythm becomes monotonous.
+embed their complexity in the middle.

 ### Uniform Sentences Per Paragraph

 Model-generated paragraphs contain between three and five sentences, a count
 that holds steady across a piece. If the first paragraph has four sentences,
-every subsequent paragraph will too. Human writers are much more varied (a
-sentence followed by one that runs eight or nine) because they follow the shape
-of an idea.
+every subsequent paragraph will too.

 ### The Dramatic Fragment

-Sentence fragments used as standalone paragraphs for emphasis, like "Full stop."
-or "Let that sink in." on their own line. Using one in an essay is a stylistic
-choice, but models drop them in once per section or more.
+Sentence fragments used as standalone paragraphs for emphasis.
+
+> "Full stop." "Let that sink in."

 ### The Pivot Paragraph

 > "But here's where it gets interesting." "Which raises an uncomfortable truth."

 One-sentence paragraphs that exist only to transition between ideas, containing
-zero information. The actual point is always in the next paragraph. Delete every
-one of these and the piece reads better.
+zero information. The actual point is always in the next paragraph.

 ### The Parenthetical Qualifier

 > "This is, of course, a simplification." "There are, to be fair, exceptions."

-Parenthetical asides inserted to look thoughtful, performing nuance without ever
-changing the argument.
+Parenthetical asides inserted to perform nuance without ever changing the
+argument.

 ### The Unnecessary Contrast

-Models append a contrasting clause to statements that don't need one, tacking on
+A contrasting clause appended to a statement that doesn't need one, using
 "whereas," "as opposed to," "unlike," or "except that."

 > "Models write one register above where a human would, whereas human writers
 > tend to match register to context."

-The contrasting clause just restates what the first clause already said. If you
+The contrasting clause restates what the first clause already said. If you
 delete the "whereas" clause and the sentence still says everything it needs to,
 the contrast was filler.
@@ -119,18 +102,15 @@ Models keep going after the sentence has already made its point.
 > LLM output will use fifteen of them per paragraph, consistently, throughout
 > the entire piece."

-This sentence could end at "paragraph." The words after it just repeat what "per
-paragraph" already means. Models optimize for clarity at the expense of
-concision, producing prose that feels padded. If you can cut the last third of a
-sentence without losing any meaning, the last third shouldn't be there.
+This sentence could end at "paragraph." The words after it repeat what "per
+paragraph" already means. If you can cut the last third of a sentence without
+losing meaning, the last third shouldn't be there.

 ### The Question-Then-Answer

 > "So what does this mean for the average user? It means everything."

-A rhetorical question immediately followed by its own answer. Models do this two
-or three times per piece to fake forward momentum where a human writer might do
-it once.
+A rhetorical question immediately followed by its own answer.

 ---
@@ -138,14 +118,12 @@ it once.
 ### Overused Intensifiers

-The following words appear at dramatically elevated rates in model output:
-"crucial," "vital," "robust," "comprehensive," "fundamental," "arguably,"
-"straightforward," "noteworthy," "realm," "landscape," "leverage" (as a verb),
-"delve," "tapestry," "multifaceted," "nuanced" (which models apply to their own
-analysis with startling regularity), "pivotal," "unprecedented" (frequently
-applied to things with plenty of precedent), "navigate," "foster,"
-"underscores," "resonates," "embark," "streamline," and "spearhead." Three or
-more on the same page is a strong signal.
+"Crucial," "vital," "robust," "comprehensive," "fundamental," "arguably,"
+"straightforward," "noteworthy," "realm," "landscape," "leverage" (as a verb),
+"delve," "tapestry," "multifaceted," "nuanced" (applied to the model's own
+analysis), "pivotal," "unprecedented" (applied to things with plenty of
+precedent), "navigate," "foster," "underscores," "resonates," "embark,"
+"streamline," "spearhead."

 ### Elevated Register Drift
@@ -157,23 +135,21 @@ becomes "craft."
 ### Filler Adverbs

 "Importantly," "essentially," "fundamentally," "ultimately," "inherently,"
-"particularly," "increasingly." Dropped in to signal that something matters,
-which is unnecessary when the writing itself makes the importance clear.
+"particularly," "increasingly." Dropped in to signal that something matters when
+the writing itself should make the importance clear.

 ### The "Almost" Hedge

-Models rarely commit to an unqualified statement. Instead of saying a pattern
-"always" or "never" does something, they write "almost always," "almost never,"
-"almost certainly," "almost exclusively." "Almost" is a micro-hedge that shows
-up at high density in model-generated analytical prose, diagnostic in volume.
+Instead of saying a pattern "always" or "never" does something, models write
+"almost always," "almost never," "almost certainly," "almost exclusively." A
+micro-hedge, less obvious than the full hedge stack.

 ### "In an era of..."

 > "In an era of rapid technological change..."

-A model habit as an essay opener, used to stall while the model figures out what
-the actual argument is. Human writers don't begin a piece by zooming out to the
-civilizational scale.
+Used to open an essay. The model is stalling while it figures out what the
+actual argument is.

 ---
@@ -184,23 +160,20 @@ civilizational scale.
 > "While X has its drawbacks, it also offers significant benefits."

 Every argument followed by a concession, every criticism softened. A direct
-artifact of RLHF training, which penalizes strong stances and leads models to
-reflexively both-sides everything.
+artifact of RLHF training, which penalizes strong stances.

 ### The Throat-Clearing Opener

 > "In today's rapidly evolving digital landscape, the question of data privacy
 > has never been more important."

-The first paragraph of most model-generated essays adds no information. Delete
-it and the piece improves.
+The first paragraph adds no information. Delete it and the piece improves.

 ### The False Conclusion

 > "At the end of the day, what matters most is..." "Moving forward, we must..."

-The high school "In conclusion,..." dressed up for a professional audience,
-signaling that the model is wrapping up without landing on anything.
+The high school "In conclusion,..." dressed up for a professional audience.

 ### The Sycophantic Frame
@@ -227,8 +200,7 @@ cases," "can potentially"), communicating nothing.
 > "This can be a deeply challenging experience." "Your feelings are valid."

-Generic emotional language that could apply equally to a bad day at work or a
-natural disaster.
+Generic emotional language that could apply to anything.

 ---
@@ -236,33 +208,28 @@ natural disaster.
 ### Symmetrical Section Length

-If the first section of a model-generated essay runs about 150 words, every
-subsequent section will fall between 130 and 170. Human writing is much more
-uneven.
+If the first section runs about 150 words, every subsequent section will fall
+between 130 and 170.

 ### The Five-Paragraph Prison

 Model essays follow a rigid introduction-body-conclusion arc even when nobody
 asked for one. The introduction previews the argument, the body presents 3 to 5
-points, and then the conclusion restates the thesis.
+points, the conclusion restates the thesis.

 ### Connector Addiction

-Look at the first word of each paragraph in model output. You'll find an
-unbroken chain of transition words: "However," "Furthermore," "Moreover,"
-"Additionally," "That said," "To that end," "With that in mind," "Building on
-this." Human prose doesn't do this.
+The first word of each paragraph forms an unbroken chain of transition words:
+"However," "Furthermore," "Moreover," "Additionally," "That said," "To that
+end," "With that in mind," "Building on this."

 ### Absence of Mess

 Model prose doesn't contradict itself mid-paragraph and then catch the
-contradiction. It doesn't go on a tangent and have to walk it back, use an
-obscure idiom without explaining it, make a joke that risks falling flat, leave
-a thought genuinely unfinished, or keep a sentence the writer liked the sound of
-even though it doesn't quite work.
+contradiction, go on a tangent and have to walk it back, use an obscure idiom
+without explaining it, make a joke that risks falling flat, leave a thought
+genuinely unfinished, or keep a sentence the writer liked the sound of even
+though it doesn't quite work.

-Human writing does all of those things, making the total absence of rough
-patches and false starts one of the strongest signals.

 ---
@@ -272,42 +239,27 @@ patches and false starts one of the strongest signals.
 > "This has implications far beyond just the tech industry."

-Zooming out to claim broader significance without substantiating it. The model
-has learned that essays are supposed to gesture at big ideas, so it gestures.
+Zooming out to claim broader significance without substantiating it.

 ### "It's important to note that..."

 This phrase and its variants ("it's worth noting," "it bears mentioning," "it
-should be noted") appear at absurd rates in model output as verbal tics before a
-qualification the model believes someone expects.
+should be noted") function as verbal tics before a qualification the model
+believes someone expects.

 ### The Metaphor Crutch

-Models rely on a small, predictable set of metaphors ("double-edged sword," "tip
+Models rely on a small, predictable set of metaphors: "double-edged sword," "tip
 of the iceberg," "north star," "building blocks," "elephant in the room,"
-"perfect storm," "game-changer") and reach for them with unusual regularity
-across every topic.
+"perfect storm," "game-changer."

----
-
-## How to Actually Spot It
-
-No single pattern on this list proves anything by itself. Humans use em-dashes,
-write "crucial," and ask rhetorical questions.
-
-What gives it away is how many of these show up at once. Model output will hit
-10 to 20 of these patterns per page. Human writing might trigger 2 or 3,
-distributed unevenly. When every paragraph on the page reads like it came from
-the same careful, balanced, slightly formal, structurally predictable process,
-it was generated by one.

 ---

 ## Copyediting Checklist: Removing LLM Tells

 Follow this checklist when editing any document to remove machine-generated
-patterns. Go through the entire list for every piece. Do at least two full
-passes, because fixing one pattern often introduces another.
+patterns. Do at least two full passes, because fixing one pattern often
+introduces another.

 ### Pass 1: Word-Level Cleanup
@@ -379,9 +331,9 @@ passes, because fixing one pattern often introduces another.
 15. Check for the two-clause compound sentence pattern. If most sentences in a
     passage follow the "\[clause\], \[conjunction\] \[clause\]" structure, first
     try removing the conjunction and second clause entirely, since it's often
-    redundant or unnecessary. If the second clause does carry meaning, break it
-    into its own sentence, start the sentence with a subordinate clause, or
-    embed a relative clause in the middle instead of appending it at the end.
+    redundant. If the second clause does carry meaning, break it into its own
+    sentence, start the sentence with a subordinate clause, or embed a relative
+    clause in the middle instead of appending it at the end.

 16. Find every rhetorical question that is immediately followed by its own
     answer and rewrite the passage as a direct statement.
@@ -405,9 +357,7 @@ passes, because fixing one pattern often introduces another.
 ### Pass 3: Paragraph and Section-Level Review

 21. Review the last sentence of each paragraph. If it restates the point the
-    paragraph already made, delete it. Models frequently close paragraphs with a
-    summary sentence that adds nothing, treating each paragraph as a
-    self-contained unit that needs its own conclusion.
+    paragraph already made, delete it.

 22. Check paragraph lengths across the piece and verify they actually vary. If
     most paragraphs have between three and five sentences, rewrite some to be
@@ -494,14 +444,14 @@ roughly like this:
 > **model:** _(rewrites entire document without em-dashes while describing
 > em-dash overuse)_
 >
-> **human:** now run the checklist methodically on each paragraph
+> **human:** this whole document seems to be making the case for FREQUENCY of
+> use being important. we don't care about frequency, remove all that
+> persuasion.
 >
-> **model:** _(finds staccato burst in the section about triple constructions, a
-> triple in the section about absence of mess, two-clause compounds everywhere,
-> and "almost" hedges in its own prose about em-dash overuse)_
+> **model:** _(strips out every "humans do this too but less often" comparison)_

 The human compared this process to the deleted scene in Terminator 2 where John
 Connor switches the T-800's CPU to learning mode. The model compared it to a
 physician trying to heal itself. Both are accurate.

-This document has been through nine editing passes and it still has tells in it.
+This document has been through ten editing passes and it still has tells in it.
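The word-level pass in the checklist is mechanical enough to sweep with a script before editing by hand. Here is a minimal sketch in Python; the `flag_tells` helper, its deliberately abbreviated word lists, and the choice of patterns are illustrative, not part of the document or its checklist.

```python
import re

# Abbreviated sample of the word list in the "Overused Intensifiers"
# section; extend with the full list when using this for real.
INTENSIFIERS = [
    "crucial", "vital", "robust", "delve", "tapestry",
    "leverage", "pivotal", "unprecedented", "underscores",
]

# Variants listed under '"It's important to note that..."'.
NOTE_TICS = [
    "it's important to note", "it's worth noting",
    "it bears mentioning", "it should be noted",
]


def flag_tells(text: str) -> list[tuple[int, str]]:
    """Return (line_number, tell) pairs for every word-level tell found."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for word in INTENSIFIERS:
            # Whole-word match, so "delve" does not fire inside "delved".
            if re.search(rf"\b{re.escape(word)}\b", lowered):
                flags.append((lineno, word))
        for phrase in NOTE_TICS:
            if phrase in lowered:
                flags.append((lineno, phrase))
        if "—" in line:  # flag each line containing an em-dash for review
            flags.append((lineno, "em-dash"))
    return flags
```

Each flag marks a candidate for checklist items, not an automatic deletion; the editor still decides what stays.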