5 Commits

Author  SHA1        Date                        Message
user    e09360b46d  2026-03-04 14:31:19 -08:00  remove hedge from final line
user    c4ae355189  2026-03-04 14:29:26 -08:00  restore em-dash in pivot section heading (it's an example)
user    985c48bf19  2026-03-04 14:27:50 -08:00  replace semicolons with periods
user    318da3666c  2026-03-04 14:24:50 -08:00  add em-dash overuse tell, remove all em-dashes from prose, checklist now 25 items
user    729fea84de  2026-03-04 14:21:22 -08:00  update LLM prose tells: add compound sentence, almost-hedge, unnecessary contrast, lol section

All checks were successful on every commit (check / check (push)).


@@ -1,8 +1,9 @@
 # LLM Prose Tells
 
-Every pattern in this document shows up in human writing occasionally. They
-become diagnostic only through density. A person might use one or two across an
-entire essay, but LLM output packs fifteen into a single paragraph.
+All of these show up in human writing occasionally. No single one is conclusive
+on its own. The difference is concentration. A person might lean on one or two
+of these habits across an entire essay, but LLM output will use fifteen of them
+per paragraph, consistently, throughout the entire piece.
 
 ---
@@ -25,9 +26,10 @@ Even outside the "not X but Y" pivot, models use em-dashes at far higher rates
 than human writers. They substitute em-dashes for commas, semicolons,
 parentheses, colons, and periods, often multiple times per paragraph. A human
 writer might use one or two in an entire piece for a specific parenthetical
-effect. Models scatter them everywhere because the em-dash can stand in for any
-other punctuation mark, so they default to it. More than two or three per page
-is a meaningful signal on its own.
+effect. Models scatter them everywhere because the em-dash is a flexible
+punctuation mark that can replace almost any other, and models default to
+flexible options. When a piece of prose has more than two or three em-dashes per
+page, that alone is a meaningful signal.
 
 ### The Colon Elaboration
@@ -42,9 +44,9 @@ normal. The frequency gives it away.
 
 > "It's fast, it's scalable, and it's open source."
 
-Three parallel items in a list, usually escalating. Always exactly three (rarely
-two, never four) with strict grammatical parallelism that human writers rarely
-bother maintaining.
+Three parallel items in a list, usually escalating. Always exactly three. Rarely
+two. Never four. Strict grammatical parallelism that human writers rarely bother
+maintaining.
 
 ### The Staccato Burst
@@ -57,11 +59,12 @@ at matching length creates a mechanical regularity that reads as generated.
 
 ### The Two-Clause Compound Sentence
 
-Possibly the most pervasive structural tell, and easy to miss because each
-individual instance looks like normal English. The model produces sentence after
-sentence where an independent clause is followed by a comma, a conjunction
-("and," "but," "which," "because"), and a second independent clause of similar
-length. Every sentence becomes two balanced halves joined in the middle.
+This might be the single most pervasive structural tell, and it's easy to miss
+because each individual instance looks like normal English. The model produces
+sentence after sentence in the same shape: an independent clause, a comma, a
+conjunction ("and," "but," "which," "because"), and a second independent clause
+of similar length. Over and over. Every sentence is two balanced halves joined
+in the middle.
 
 > "The construction itself is perfectly normal, which is why the frequency is
 > what gives it away." "They contain zero information, and the actual point
@@ -71,9 +74,9 @@ length. Every sentence becomes two balanced halves joined in the middle.
 
 Human prose has sentences with one clause, sentences with three, sentences that
 start with a subordinate clause before reaching the main one, sentences that
-embed their complexity in the middle. When every sentence on the page has that
-same two-part structure, the rhythm becomes monotonous in a way that's hard to
-pinpoint but easy to feel.
+embed their complexity in the middle. When every sentence on the page has the
+same two-part comma-conjunction-comma structure, the rhythm becomes monotonous
+in a way that's hard to pinpoint but easy to feel.
 
 ### Uniform Sentences Per Paragraph
@@ -88,7 +91,7 @@ shape of an idea, not a template.
 Sentence fragments used as standalone paragraphs for emphasis, like "Full stop."
 or "Let that sink in." on their own line. Using one in an entire essay is a
 reasonable stylistic choice, but models drop them in once per section or more,
-at which point it becomes a habit rather than a deliberate decision.
+at which point it stops being deliberate and becomes a habit.
 
 ### The Pivot Paragraph
@@ -119,28 +122,13 @@ The first clause already makes the point. The contrasting clause restates it
 from the other direction. If you delete the "whereas" clause and the sentence
 still says everything it needs to, the contrast was filler.
 
-### Unnecessary Elaboration
-
-Models keep going after the sentence has already made its point, tacking on
-clarifying phrases, adverbial modifiers, or restatements that add nothing.
-
-> "A person might lean on one or two of these habits across an entire essay, but
-> LLM output will use fifteen of them per paragraph, consistently, throughout
-> the entire piece."
-
-This sentence could end at "paragraph." The words after it just repeat what "per
-paragraph" already means. Models do this because they're optimizing for clarity
-at the expense of concision, and because their training rewards thoroughness.
-The result is prose that feels padded. If you can cut the last third of a
-sentence without losing any meaning, the last third shouldn't be there.
-
 ### The Question-Then-Answer
 
 > "So what does this mean for the average user? It means everything."
 
-A rhetorical question immediately followed by its own answer. Models do this two
-or three times per piece because it fakes forward momentum. A human writer might
-do it once.
+A rhetorical question immediately followed by its own answer. Models lean on
+this two or three times per piece because it generates the feeling of forward
+momentum without requiring any actual argument. A human writer might do it once.
 
 ---
@@ -196,8 +184,9 @@ out to the civilizational scale before they've said anything specific.
 
 > "While X has its drawbacks, it also offers significant benefits."
 
 Every argument followed by a concession, every criticism softened. A direct
-artifact of RLHF training, which penalizes strong stances. Models reflexively
-both-sides everything even when a clear position would serve the reader better.
+artifact of RLHF training, which penalizes strong stances. The result is a model
+that reflexively both-sides everything even when a clear position would serve
+the reader better.
 
 ### The Throat-Clearing Opener
@@ -257,9 +246,8 @@ uneven, with 50 words in one section and 400 in the next.
 
 ### The Five-Paragraph Prison
 
 Model essays follow a rigid introduction-body-conclusion arc even when nobody
-asked for one. The introduction previews the argument, the body presents 3 to 5
-points, and then the conclusion restates the thesis using slightly different
-words.
+asked for one. Introduction previews the argument. Body presents 3 to 5 points.
+Conclusion restates the thesis in different words.
 
 ### Connector Addiction
@@ -276,8 +264,8 @@ obscure idiom without explaining it, make a joke that risks falling flat, leave
 a thought genuinely unfinished, or keep a sentence the writer liked the sound of
 even though it doesn't quite work.
 
-Human writing does all of those things regularly. That total absence of rough
-patches and false starts is one of the strongest signals that text was
+Human writing does all of those things. The total absence of rough edges, false
+starts, and odd rhythmic choices is one of the strongest signals that text was
 machine-generated.
 
 ---
@@ -318,7 +306,7 @@ What gives it away is how many of these show up at once. Model output will hit
 distributed unevenly, mixed with idiosyncratic constructions no model would
 produce. When every paragraph on the page reads like it came from the same
 careful, balanced, slightly formal, structurally predictable process, it was
-generated by one.
+probably generated by one.
 
 ---
@@ -364,7 +352,7 @@ passes, because fixing one pattern often introduces another.
 7. Search for em-dashes and replace each one with the punctuation mark that
    would normally be used in that position (comma, semicolon, colon, period, or
    parentheses). If you can't identify which one it should be, the sentence
-   needs to be restructured.
+   probably needs to be restructured.
 
 ### Pass 2: Sentence-Level Restructuring
@@ -403,54 +391,50 @@ passes, because fixing one pattern often introduces another.
     delete it or expand it into a complete sentence that adds actual
     information.
 
-16. Check for unnecessary elaboration at the end of sentences. Read the last
-    clause or phrase of each sentence and ask whether the sentence would lose
-    any meaning without it. If not, cut it.
-
-17. Find every pivot paragraph ("But here's where it gets interesting." and
+16. Find every pivot paragraph ("But here's where it gets interesting." and
     similar) and delete it. The paragraph after it always contains the actual
     point.
 
 ### Pass 3: Paragraph and Section-Level Review
 
-18. Check paragraph lengths across the piece and verify they actually vary. If
+17. Check paragraph lengths across the piece and verify they actually vary. If
     most paragraphs have between three and five sentences, rewrite some to be
     one or two sentences and let others run to six or seven.
 
-19. Check section lengths for suspicious uniformity. If every section is roughly
+18. Check section lengths for suspicious uniformity. If every section is roughly
     the same word count, combine some shorter ones or split a longer one
     unevenly.
 
-20. Check the first word of every paragraph for chains of connectors ("However,"
+19. Check the first word of every paragraph for chains of connectors ("However,"
     "Furthermore," "Moreover," "Additionally," "That said"). If more than two
     transition words start consecutive paragraphs, rewrite those openings to
     start with their subject.
 
-21. Check whether every argument is followed by a concession or qualifier. If
+20. Check whether every argument is followed by a concession or qualifier. If
     the piece both-sides every point, pick a side on at least some of them and
     cut the hedging.
 
-22. Read the first paragraph and ask whether deleting it would improve the
+21. Read the first paragraph and ask whether deleting it would improve the
     piece. If it's scene-setting that previews the argument, delete it and start
     with paragraph two.
 
-23. Read the last paragraph and check whether it restates the thesis or uses a
+22. Read the last paragraph and check whether it restates the thesis or uses a
     phrase like "at the end of the day" or "moving forward." If so, either
     delete it or rewrite it to say something the piece hasn't said yet.
 
 ### Pass 4: Overall Texture
 
-24. Read the piece aloud and listen for passages that sound too smooth, too
+23. Read the piece aloud and listen for passages that sound too smooth, too
     even, or too predictable. Human prose has rough patches. If there aren't
     any, the piece still reads as machine output.
 
-25. Check that the piece contains at least a few constructions that feel
+24. Check that the piece contains at least a few constructions that feel
     idiosyncratic: a sentence with unusual word order, a parenthetical that goes
     on a bit long, an aside only loosely connected to the main point, a word
     choice that's specific and unexpected. If every sentence is clean and
     correct and unremarkable, it will still read as generated.
 
-26. Verify that you haven't introduced new patterns while fixing the original
+25. Verify that you haven't introduced new patterns while fixing the original
     ones. This happens constantly. Run the entire checklist again from the top
     on the revised version.
@@ -499,16 +483,10 @@ roughly like this:
 >
 > **model:** _(rewrites entire document without em-dashes while describing
 > em-dash overuse)_
->
-> **human:** now run the checklist methodically on each paragraph
->
-> **model:** _(finds staccato burst in the section about triple constructions, a
-> triple in the section about absence of mess, two-clause compounds everywhere,
-> and "almost" hedges in its own prose about em-dash overuse)_
 
 The human compared this process to the deleted scene in Terminator 2 where John
 Connor switches the T-800's CPU to learning mode. The model compared it to a
-physician trying to heal itself. Both are accurate.
+physician trying to heal itself. Both descriptions are probably accurate.
 
-This document has been through eight editing passes and it still has tells in
+This document has been through seven editing passes and it still has tells in
 it.
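Two of the tells the diff above keeps quantitative, em-dash density per page and uniform sentence counts per paragraph, are mechanical enough to sketch as a screening script. This is a hypothetical illustration, not part of the repository: the 300-word page size, the sentence-terminator regex, and the function name are all assumptions.

```python
import re
import statistics


def prose_tell_stats(text: str) -> dict:
    """Rough stats for two tells: em-dash overuse and uniform
    sentences-per-paragraph. The 300-word "page" and the sentence
    regex are assumptions; the source only says "more than two or
    three em-dashes per page" is a signal.
    """
    # Paragraphs are blocks separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    words = len(text.split())
    # Normalize the em-dash count to a nominal 300-word page.
    em_dashes = text.count("\u2014")
    per_page = em_dashes * 300 / words if words else 0.0
    # Crude sentence count per paragraph: terminators followed by
    # whitespace or end of paragraph.
    sentence_counts = [len(re.findall(r"[.!?](?:\s|$)", p)) or 1
                      for p in paragraphs]
    # Low spread means suspiciously uniform paragraph structure.
    spread = (statistics.pstdev(sentence_counts)
              if len(sentence_counts) > 1 else 0.0)
    return {
        "em_dashes_per_page": round(per_page, 1),
        "sentences_per_paragraph": sentence_counts,
        "paragraph_spread": round(spread, 2),
    }
```

On the document's own numbers, an `em_dashes_per_page` above two or three would flag, and a `paragraph_spread` near zero with counts clustered between three and five matches the "Uniform Sentences Per Paragraph" tell.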