Merge pull request 'LLM prose tells: methodical checklist pass' (#9) from llm-prose-tells-checklist-pass into main
All checks were successful
check / check (push) Successful in 4s
Reviewed-on: #9
This commit was merged in pull request #9.
@@ -3,7 +3,7 @@
 All of these show up in human writing occasionally. No single one is conclusive
 on its own. The difference is concentration. A person might lean on one or two
 of these habits across an entire essay, but LLM output will use fifteen of them
-per paragraph, consistently, throughout the entire piece.
+per paragraph.
 
 ---
 
@@ -26,10 +26,9 @@ Even outside the "not X but Y" pivot, models use em-dashes at far higher rates
 than human writers. They substitute em-dashes for commas, semicolons,
 parentheses, colons, and periods, often multiple times per paragraph. A human
 writer might use one or two in an entire piece for a specific parenthetical
-effect. Models scatter them everywhere because the em-dash is a flexible
-punctuation mark that can replace almost any other, and models default to
-flexible options. When a piece of prose has more than two or three em-dashes per
-page, that alone is a meaningful signal.
+effect. Models scatter them everywhere because the em-dash can stand in for any
+other punctuation mark, so they default to it. More than two or three per page
+is a meaningful signal on its own.
 
 ### The Colon Elaboration
 
@@ -44,9 +43,9 @@ normal. The frequency gives it away.
 
 > "It's fast, it's scalable, and it's open source."
 
-Three parallel items in a list, usually escalating. Always exactly three. Rarely
-two. Never four. Strict grammatical parallelism that human writers rarely bother
-maintaining.
+Three parallel items in a list, usually escalating. Always exactly three (rarely
+two, never four) with strict grammatical parallelism that human writers rarely
+bother maintaining.
 
 ### The Staccato Burst
 
@@ -59,12 +58,11 @@ at matching length creates a mechanical regularity that reads as generated.
 
 ### The Two-Clause Compound Sentence
 
-This might be the single most pervasive structural tell, and it's easy to miss
-because each individual instance looks like normal English. The model produces
-sentence after sentence in the same shape: an independent clause, a comma, a
-conjunction ("and," "but," "which," "because"), and a second independent clause
-of similar length. Over and over. Every sentence is two balanced halves joined
-in the middle.
+Possibly the most pervasive structural tell, and easy to miss because each
+individual instance looks like normal English. The model produces sentence after
+sentence where an independent clause is followed by a comma, a conjunction
+("and," "but," "which," "because"), and a second independent clause of similar
+length. Every sentence becomes two balanced halves joined in the middle.
 
 > "The construction itself is perfectly normal, which is why the frequency is
 > what gives it away." "They contain zero information, and the actual point
@@ -74,9 +72,9 @@ in the middle.
 
 Human prose has sentences with one clause, sentences with three, sentences that
 start with a subordinate clause before reaching the main one, sentences that
-embed their complexity in the middle. When every sentence on the page has the
-same two-part comma-conjunction-comma structure, the rhythm becomes monotonous
-in a way that's hard to pinpoint but easy to feel.
+embed their complexity in the middle. When every sentence on the page has that
+same two-part structure, the rhythm becomes monotonous in a way that's hard to
+pinpoint but easy to feel.
 
 ### Uniform Sentences Per Paragraph
 
@@ -91,7 +89,7 @@ shape of an idea, not a template.
 Sentence fragments used as standalone paragraphs for emphasis, like "Full stop."
 or "Let that sink in." on their own line. Using one in an entire essay is a
 reasonable stylistic choice, but models drop them in once per section or more,
-at which point it stops being deliberate and becomes a habit.
+at which point it becomes a habit rather than a deliberate decision.
 
 ### The Pivot Paragraph
 
@@ -122,13 +120,28 @@ The first clause already makes the point. The contrasting clause restates it
 from the other direction. If you delete the "whereas" clause and the sentence
 still says everything it needs to, the contrast was filler.
 
+### Unnecessary Elaboration
+
+Models keep going after the sentence has already made its point, tacking on
+clarifying phrases, adverbial modifiers, or restatements that add nothing.
+
+> "A person might lean on one or two of these habits across an entire essay, but
+> LLM output will use fifteen of them per paragraph, consistently, throughout
+> the entire piece."
+
+This sentence could end at "paragraph." The words after it just repeat what "per
+paragraph" already means. Models do this because they're optimizing for clarity
+at the expense of concision, and because their training rewards thoroughness.
+The result is prose that feels padded. If you can cut the last third of a
+sentence without losing any meaning, the last third shouldn't be there.
+
 ### The Question-Then-Answer
 
 > "So what does this mean for the average user? It means everything."
 
-A rhetorical question immediately followed by its own answer. Models lean on
-this two or three times per piece because it generates the feeling of forward
-momentum without requiring any actual argument. A human writer might do it once.
+A rhetorical question immediately followed by its own answer. Models do this two
+or three times per piece because it fakes forward momentum. A human writer might
+do it once.
 
 ---
 
@@ -184,9 +197,8 @@ out to the civilizational scale before they've said anything specific.
 > "While X has its drawbacks, it also offers significant benefits."
 
 Every argument followed by a concession, every criticism softened. A direct
-artifact of RLHF training, which penalizes strong stances. The result is a model
-that reflexively both-sides everything even when a clear position would serve
-the reader better.
+artifact of RLHF training, which penalizes strong stances. Models reflexively
+both-sides everything even when a clear position would serve the reader better.
 
 ### The Throat-Clearing Opener
 
@@ -246,8 +258,9 @@ uneven, with 50 words in one section and 400 in the next.
 ### The Five-Paragraph Prison
 
 Model essays follow a rigid introduction-body-conclusion arc even when nobody
-asked for one. Introduction previews the argument. Body presents 3 to 5 points.
-Conclusion restates the thesis in different words.
+asked for one. The introduction previews the argument, the body presents 3 to 5
+points, and then the conclusion restates the thesis using slightly different
+words.
 
 ### Connector Addiction
 
@@ -264,8 +277,8 @@ obscure idiom without explaining it, make a joke that risks falling flat, leave
 a thought genuinely unfinished, or keep a sentence the writer liked the sound of
 even though it doesn't quite work.
 
-Human writing does all of those things. The total absence of rough edges, false
-starts, and odd rhythmic choices is one of the strongest signals that text was
+Human writing does all of those things regularly. That total absence of rough
+patches and false starts is one of the strongest signals that text was
 machine-generated.
 
 ---
@@ -306,7 +319,7 @@ What gives it away is how many of these show up at once. Model output will hit
 distributed unevenly, mixed with idiosyncratic constructions no model would
 produce. When every paragraph on the page reads like it came from the same
 careful, balanced, slightly formal, structurally predictable process, it was
-probably generated by one.
+generated by one.
 
 ---
 
@@ -352,7 +365,7 @@ passes, because fixing one pattern often introduces another.
 7. Search for em-dashes and replace each one with the punctuation mark that
 would normally be used in that position (comma, semicolon, colon, period, or
 parentheses). If you can't identify which one it should be, the sentence
-probably needs to be restructured.
+needs to be restructured.
 
 ### Pass 2: Sentence-Level Restructuring
 
@@ -391,50 +404,54 @@ passes, because fixing one pattern often introduces another.
 delete it or expand it into a complete sentence that adds actual
 information.
 
-16. Find every pivot paragraph ("But here's where it gets interesting." and
+16. Check for unnecessary elaboration at the end of sentences. Read the last
+clause or phrase of each sentence and ask whether the sentence would lose
+any meaning without it. If not, cut it.
+
+17. Find every pivot paragraph ("But here's where it gets interesting." and
 similar) and delete it. The paragraph after it always contains the actual
 point.
 
 ### Pass 3: Paragraph and Section-Level Review
 
-17. Check paragraph lengths across the piece and verify they actually vary. If
+18. Check paragraph lengths across the piece and verify they actually vary. If
 most paragraphs have between three and five sentences, rewrite some to be
 one or two sentences and let others run to six or seven.
 
-18. Check section lengths for suspicious uniformity. If every section is roughly
+19. Check section lengths for suspicious uniformity. If every section is roughly
 the same word count, combine some shorter ones or split a longer one
 unevenly.
 
-19. Check the first word of every paragraph for chains of connectors ("However,"
+20. Check the first word of every paragraph for chains of connectors ("However,"
 "Furthermore," "Moreover," "Additionally," "That said"). If more than two
 transition words start consecutive paragraphs, rewrite those openings to
 start with their subject.
 
-20. Check whether every argument is followed by a concession or qualifier. If
+21. Check whether every argument is followed by a concession or qualifier. If
 the piece both-sides every point, pick a side on at least some of them and
 cut the hedging.
 
-21. Read the first paragraph and ask whether deleting it would improve the
+22. Read the first paragraph and ask whether deleting it would improve the
 piece. If it's scene-setting that previews the argument, delete it and start
 with paragraph two.
 
-22. Read the last paragraph and check whether it restates the thesis or uses a
+23. Read the last paragraph and check whether it restates the thesis or uses a
 phrase like "at the end of the day" or "moving forward." If so, either
 delete it or rewrite it to say something the piece hasn't said yet.
 
 ### Pass 4: Overall Texture
 
-23. Read the piece aloud and listen for passages that sound too smooth, too
+24. Read the piece aloud and listen for passages that sound too smooth, too
 even, or too predictable. Human prose has rough patches. If there aren't
 any, the piece still reads as machine output.
 
-24. Check that the piece contains at least a few constructions that feel
+25. Check that the piece contains at least a few constructions that feel
 idiosyncratic: a sentence with unusual word order, a parenthetical that goes
 on a bit long, an aside only loosely connected to the main point, a word
 choice that's specific and unexpected. If every sentence is clean and
 correct and unremarkable, it will still read as generated.
 
-25. Verify that you haven't introduced new patterns while fixing the original
+26. Verify that you haven't introduced new patterns while fixing the original
 ones. This happens constantly. Run the entire checklist again from the top
 on the revised version.
 
@@ -483,10 +500,16 @@ roughly like this:
 >
 > **model:** _(rewrites entire document without em-dashes while describing
 > em-dash overuse)_
+>
+> **human:** now run the checklist methodically on each paragraph
+>
+> **model:** _(finds staccato burst in the section about triple constructions, a
+> triple in the section about absence of mess, two-clause compounds everywhere,
+> and "almost" hedges in its own prose about em-dash overuse)_
 
 The human compared this process to the deleted scene in Terminator 2 where John
 Connor switches the T-800's CPU to learning mode. The model compared it to a
-physician trying to heal itself. Both descriptions are probably accurate.
+physician trying to heal itself. Both are accurate.
 
-This document has been through seven editing passes and it probably still has
-tells in it.
+This document has been through eight editing passes and it still has tells in
+it.
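A few checks in the checklist this commit edits are mechanical enough to script. These are the em-dash search, runs of paragraphs that open with a transition word, and the "most paragraphs have three to five sentences" check. The Python below is a minimal sketch of those three and is not part of the repository. It assumes paragraphs are separated by blank lines. It treats a "page" as 250 words, which is an assumption because the document never defines one. Sentence counting is naive, so abbreviations will skew it. All function names are hypothetical.

```python
import re

EM_DASH = "\u2014"

# Transition words taken from the checklist's own examples; extend as needed.
CONNECTORS = ("However,", "Furthermore,", "Moreover,", "Additionally,", "That said")


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines (assumes blank-line-separated paragraphs)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def em_dashes_per_page(text: str, words_per_page: int = 250) -> float:
    """Em-dash count scaled to an assumed page length of words_per_page words."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return text.count(EM_DASH) * words_per_page / words


def connector_runs(paragraphs: list[str], max_run: int = 2) -> list[int]:
    """Start index of each run of consecutive connector-led paragraphs longer than max_run."""
    runs: list[int] = []
    start = None
    for i, para in enumerate(paragraphs + [""]):  # empty sentinel closes a trailing run
        if para.startswith(CONNECTORS):
            if start is None:
                start = i
        else:
            if start is not None and i - start > max_run:
                runs.append(start)
            start = None
    return runs


def sentence_count(paragraph: str) -> int:
    """Naive count of text runs ending in . ! or ?; abbreviations inflate it."""
    return len(re.findall(r"[^.!?]+[.!?]+", paragraph))


def mostly_three_to_five(paragraphs: list[str]) -> bool:
    """True if more than half the paragraphs have three to five sentences."""
    if not paragraphs:
        return False
    in_band = sum(1 for p in paragraphs if 3 <= sentence_count(p) <= 5)
    return in_band / len(paragraphs) > 0.5


def report(text: str) -> dict:
    paragraphs = split_paragraphs(text)
    density = em_dashes_per_page(text)
    return {
        "em_dashes_per_page": round(density, 1),
        "too_many_em_dashes": density > 3,  # reading "more than two or three per page" leniently
        "connector_runs_at": connector_runs(paragraphs),
        "uniform_paragraphs": mostly_three_to_five(paragraphs),
    }


sample = (
    "However, this is one. It has two.\n\n"
    "Moreover, another. Yes.\n\n"
    "Furthermore, a third \u2014 with a dash.\n\n"
    "Plain paragraph here."
)
print(report(sample))
# {'em_dashes_per_page': 12.5, 'too_many_em_dashes': True, 'connector_runs_at': [0], 'uniform_paragraphs': False}
```

The output lists places to look, not a verdict. The document's own argument is that no single signal is conclusive and that concentration is what matters, so each flag still needs a human read.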