The article on LLM poetry and “greatness” circles a question most people are asking sideways instead of head-on.
Can AI write great poetry?
The honest answer is: it can write convincing poetry. It can write competent poetry. It can even write lines that feel startling or beautiful in isolation. But greatness—real, durable greatness—keeps slipping out of reach.
That failure is not accidental. And it is not about creativity.
It is about judgment.
Craft Was Never the Problem
The article gets one thing right from the start: technique is no longer the bottleneck.
Modern language models can handle:
- meter
- rhyme
- formal constraint
- metaphor
- turn and closure
They can imitate almost any poetic surface that has been digitized. If greatness were only a matter of skill, we would already be done.
But anyone who has spent time with poetry knows this: craft is the entry fee, not the destination.
The Real Test: Particular → Universal
The author’s definition of greatness is old, durable, and correct.
A great poem begins in a specific life, in a specific place, inside a specific culture—and somehow reaches beyond it. The universal emerges because the poem refuses to float free of its particulars.
This is where AI poetry consistently falters.
Not because it lacks vocabulary.
Not because it lacks training data.
But because it lacks position.
Pattern Is Not Position
Language models work by moving from general pattern toward manufactured detail. Even when they produce specificity, it is usually decorative rather than earned.
They can describe:
- a street
- a face
- a ritual
- a historical reference
But they do not naturally carry what matters most: stake.
A great poem costs something to write.
It risks being wrong.
It risks being embarrassing.
It risks being misunderstood.
That risk shapes the language. Readers feel it even if they can’t name it.
Models don’t risk anything unless the system forces them to.
Why Gwern’s Experiments Matter
Gwern’s work stands out because he doesn’t treat the model as an oracle. He treats it as a workshop apprentice.
His process matters more than any single poem:
- analysis before generation
- multiple divergent drafts
- explicit critique
- role separation
- revision across time
That is not automation. That is authorship distributed across tools.
In other words, the human is still holding the line on judgment.
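To make that concrete, here is a minimal sketch of such a workshop loop. Everything in it is hypothetical: ask_model() stands in for any LLM call, and the role names are invented, not Gwern's actual tooling.

```python
# A hypothetical sketch of a "workshop apprentice" loop, not Gwern's code.
# Analysis, drafting, critique, and revision are kept as separate roles,
# and the final selection is deliberately left outside the function.

def ask_model(role: str, prompt: str) -> str:
    """Hypothetical LLM call. Swap in a real client here."""
    return f"[{role}] response to: {prompt[:40]}"

def workshop(brief: str, n_drafts: int = 4, rounds: int = 2) -> list[str]:
    # Analysis before generation: name the problem and constraints first.
    analysis = ask_model("analyst", f"What would a good poem on '{brief}' require?")

    # Multiple divergent drafts, not a single "best" answer.
    drafts = [ask_model("drafter", f"{analysis}\nDraft {i}: {brief}")
              for i in range(n_drafts)]

    # Explicit critique and revision across time, roles kept separate.
    for _ in range(rounds):
        critiques = [ask_model("critic", f"What fails in:\n{d}") for d in drafts]
        drafts = [ask_model("reviser", f"Revise:\n{d}\nGiven critique:\n{c}")
                  for d, c in zip(drafts, critiques)]

    # No scorer here: the function returns candidates, not a verdict.
    return drafts
```

Note what the sketch leaves out: any rule for deciding which draft wins. That gap is deliberate.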
Gwern is supplying:
- the problem worth solving
- the constraints that matter
- the sense of taste
- the refusal to accept the first answer
The model supplies leverage. Not authority.
That distinction is everything.
Why Mercor Is Aiming at a Different Target
Mercor’s approach is not wrong—but it is aimed elsewhere.
They are optimizing for:
- preference alignment
- consistency
- professional acceptability
- scalable judgment
Poetry is being used as a stress test for taste, not as an end in itself.
The problem is structural: when you reward what most experts agree is “good,” you inevitably smooth away the edge cases. But in poetry—and in judgment more broadly—the edge cases are often where truth lives.
A rubric can penalize clichés.
A rubric can reward coherence.
A rubric can score technique.
A rubric cannot detect earned necessity.
That’s not a failure of poetry. It’s a limit of optimization.
This Is Where the Baseline Enters Quietly
What the article is really diagnosing—without naming it—is the same failure mode the Baseline was built to prevent.
The Baseline starts from a different assumption:
Judgment cannot be trusted unless it is anchored to position, constraint, and consequence.
Not preference.
Not popularity.
Not satisfaction scores.
Position means:
- Who is speaking
- From where
- Under what constraints
- With what responsibility
Without that, outputs drift toward median comfort. That’s true in poetry, law, medicine, arbitration, and governance.
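Position can be written down. Here is a sketch only, with invented field names, of what carrying it might look like:

```python
# Hypothetical schema, not the Baseline's actual one. The point is that
# who is speaking, from where, under which constraints, and with what
# responsibility travel with every output.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Position:
    speaker: str                  # who is speaking
    vantage: str                  # from where
    constraints: tuple[str, ...]  # under what constraints
    responsibility: str           # with what responsibility

@dataclass
class Output:
    position: Position            # no position, no output
    claim: str
    audit_trail: list[str] = field(default_factory=list)

ruling = Output(
    position=Position(
        speaker="appointed arbitrator",
        vantage="inside this one dispute",
        constraints=("cite the record", "answer only the question submitted"),
        responsibility="reviewable on appeal",
    ),
    claim="...",
)
```

The schema is trivial. The discipline of refusing output without it is not.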
The Inversion That Matters
Most AI systems move like this:
General pattern → Manufactured particular → User approval
The Baseline reverses the flow:
Concrete situation → Defined role → Named constraints → Reasoned judgment → Limited generalization
That inversion is the difference between imitation and responsibility.
It is also why the Baseline resists “traction” as a primary metric. Traction measures comfort. It does not measure correctness, truth, or durability.
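Sketched as code, with placeholder functions throughout, the difference is where each pipeline puts its gate:

```python
# Hypothetical sketch. Only the order of the gates matters here.

def pattern_first(candidates: list[str]) -> str:
    """General pattern -> manufactured particular -> user approval."""
    # The only gate is at the end, and it measures comfort.
    return max(candidates, key=approval_score)

def situation_first(situation: str, role: str,
                    constraints: list[str]) -> str:
    """Concrete situation -> defined role -> named constraints
    -> reasoned judgment -> limited generalization."""
    if not (situation and role and constraints):
        raise ValueError("no position declared, no output")  # gate is up front
    judgment = f"{role}, facing {situation}, bound by: {'; '.join(constraints)}"
    # Generalize no further than the named constraints reach.
    return judgment + " (holds only under these constraints)"

def approval_score(text: str) -> float:
    """Stand-in for a preference model: rewards the comfortable middle."""
    return -abs(len(text) - 80)
```

One flow gates at the end, on comfort. The other gates at the start, on position.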
Culture Is Not the Missing Ingredient — Stake Is
The article says LLMs lack culture. That’s partially true, but it misses the sharper point.
Models can absorb cultural artifacts.
They can reference history.
They can reproduce symbolic systems.
What they lack is stake.
The Baseline does not try to give AI culture. That would be fantasy.
Instead, it temporarily loans the system a position:
- a role
- a boundary
- an obligation
- an audit trail
That loan can be revoked. That is what makes it safe.
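One way to picture the loan, again as a hypothetical sketch, is a scoped grant that cannot outlive its context and that writes both ends to the trail:

```python
# Hypothetical sketch: the position exists only inside the block, and
# revocation is automatic, not a favor.

from contextlib import contextmanager

@contextmanager
def loaned_position(role: str, boundary: str, obligation: str,
                    audit_trail: list[str]):
    audit_trail.append(f"granted: {role} / {boundary} / {obligation}")
    try:
        yield {"role": role, "boundary": boundary, "obligation": obligation}
    finally:
        audit_trail.append(f"revoked: {role}")  # the loan always comes back

trail: list[str] = []
with loaned_position("manuscript reviewer", "this draft only",
                     "flag problems, decide nothing", trail) as position:
    pass  # the system acts here, inside the boundary, under the obligation
# Outside the block the position no longer exists; trail records both ends.
```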
Why Poetry Is the Canary
Poetry exposes this problem early because it compresses everything:
- judgment without a single correct answer
- long-range coherence
- emotional truth
- cultural embedding
- resistance to averaging
If a system fails here, it will fail quietly elsewhere—just with higher stakes.
The article gets this right without saying it outright.
The Quiet Conclusion
If AI ever produces something we are comfortable calling “great,” it will not be because models got bigger.
It will be because:
- position was enforced
- consequence was acknowledged
- selection was human
- revision was slow
- and judgment was allowed to resist optimization
That is not a poetry lesson.
That is a governance lesson.
And it happens to be the ground the Baseline has been standing on all along.
Unauthorized commercial use prohibited.
© 2026 The Faust Baseline LLC






