It is not the voice that commands the story: it is the ear.

The word “chat” in ChatGPT always amuses me. It’s brilliant branding: one syllable that makes machine output feel like human conversation. But with AI, the “chat” rarely feels participatory, at least not in the way we’re used to.

The best thing about human conversation is locality. You can interrupt—“wait, what do you mean by that?”—request a rephrase, and get the updated version right where the confusion happened. The whole experience is fundamentally nonlinear: it emerges, loops back, and self-corrects.

Fast forward to the texting era. Yes, conversation gets forced into sequential, computer-like steps; the interface is basically a log file. But texting preserves some locality because turns are short, which creates natural openings to cut in and repair the thread before it runs away.

ChatGPT inherits the messaging-app template. The irony is that AI doesn’t speak like a person, yet we force it into a UI designed for people who do. The locality is gone, and that makes real knowledge hard to build.

Who chats this long?

The first problem is non-interruptible, verbose generation. The knowledge is happening on stage; you’re watching in the dark. That builds a fourth wall. If a sentence trips you up, the only move is endurance: you let the model finish its performance. Even when you spot an error, your objection can’t land where the error is. You can only respond after.

Behind the scenes, the interface quietly assumes a linear user: read the whole output, hold it in mind, then reply “cleanly.”

Most of us can’t. The moment we feel lost, our eyes drop to the composer. We fire a follow-up—and the page scrolls, appending the answer at the bottom. The clarification you needed is now physically detached from the sentence that triggered it.

And once repairs are forced to happen at the bottom of the page, every clarification comes with a tax: scrolling, reloading context, re-reading. Each new turn pushes the earlier structure further out of reach. After two or three of those, the thread is a nest of follow-ups, and the structure is out of your control.

The cheapest move isn’t “understand”—it’s “try again.” There’s nothing left to do with the answer except reroll it. The interface starts to feel like a slot machine.

Why GPT doesn’t read like real knowledge

And even when the reroll finally lands on the answer you wanted, it doesn’t feel like a discovery. It feels like a lucky hit—so it doesn’t stick.

Learning research has blunt phrases for what’s missing here: desirable difficulty and effort justification. When the answer arrives smooth and complete—without requiring any meaningful participation—you confuse ease of reading with depth of grasp, and you skip the moves that turn exposure into knowledge.

But there’s a second problem: the response is often not the knowledge you’d hope for. LLMs are strongly incentivized to produce what I’d call a locally optimal answer: a response that feels coherent and complete for the question in front of it. It fits the moment so well that it reads like closure.

What you need for long-term knowledge is usually composable knowledge: reusable pieces—definitions with boundaries, assumptions made explicit, and a map of what depends on what.

But the typical LLM answer doesn’t expose its joints. It rarely tells you which assumptions are load-bearing, what’s general versus contingent on your setup, or what would change the conclusion. You get a polished paragraph, but not the parts you could reuse.

The argument, then, is one thing with two mechanisms: the medium puts you in a passive posture, and the outputs arrive as finished artifacts that don’t naturally break into reusable parts. So you don’t accumulate understanding; you accumulate rerolls.

Solution: bring back locality

Notice what books get right. You can’t change the ink—but you can change what the text does for you. And crucially, your interaction stays local: next to the sentence you’re wrestling with. You underline, argue in the margin, write your own paraphrase, and return later to the same spot with the history of your confusion still visible. That’s what makes reading feel free.

The only way to bring locality to AI chatbots is to drop a superstition inherited from the chat UI: that the model’s words have “integrity,” and that changing them would be tampering. That norm made sense for human messages. It’s unexamined baggage for generated text.

The model doesn’t say anything; it produces candidate text, and humans decide what to keep and treat as a claim. That’s why rewriting isn’t vandalism—it’s the point. And it reframes “provenance,” too: not as a source of truth, but as lineage. The UI can keep prior versions in the background as a backup and a diff, while the foreground lets you handle the text.

So the proposal is: stop treating the model’s output as one indivisible message. Break it into semantic blocks, and move the discussion to the point of friction. If a sentence is confusing, your question should attach to that sentence, not to the bottom of a thread. If a paragraph is sloppy, you should be able to rewrite it in place, keep versions, and annotate why the revision is better.
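
To make that concrete, here’s a minimal sketch in TypeScript of “provenance as lineage.” The Block and BlockVersion shapes are hypothetical names for illustration, not Coo’s actual schema; the point is only that a rewrite appends a version to the history instead of overwriting it.

```ts
// Hypothetical data model: a block of model output with its full
// revision history. Rewriting appends; it never destroys.

interface BlockVersion {
  text: string;
  author: "model" | "human"; // who produced this revision
  note?: string;             // annotation: why the revision is better
  timestamp: number;
}

interface Block {
  id: string;
  versions: BlockVersion[];  // versions[0] is the original model output
}

// An in-place rewrite pushes a new version onto the lineage.
function rewrite(block: Block, text: string, note?: string): Block {
  return {
    ...block,
    versions: [
      ...block.versions,
      { text, author: "human", note, timestamp: Date.now() },
    ],
  };
}

// The "current" text is just the tip of the lineage; a diff view
// falls out by comparing any earlier version against the tip.
const currentText = (block: Block) =>
  block.versions[block.versions.length - 1].text;
```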

That means span-anchored questions, in-place rewrite and annotation, versioning, and a stable structure that doesn’t dissolve into an infinite scroll.
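
And a matching sketch for span-anchored questions, using the same hypothetical block ids: a question attaches to a character range inside a specific block, so the repair lives next to the sentence that caused it.

```ts
// Hypothetical anchor model: an annotation points into a block's text
// rather than appending to the bottom of a thread.

interface SpanAnchor {
  blockId: string;
  start: number; // character offsets into the block's current text
  end: number;
}

interface Annotation {
  anchor: SpanAnchor;
  kind: "question" | "objection" | "paraphrase";
  body: string;
}

// Attach the question where the friction happened.
function askAt(
  notes: Annotation[],
  anchor: SpanAnchor,
  body: string
): Annotation[] {
  return [...notes, { anchor, kind: "question", body }];
}
```

One design question the sketch skips: anchors drift when a block is rewritten, so a real implementation would re-resolve offsets against the version lineage.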

This is what I’m building with Coo as an existence proof: block-level interaction where the model’s output is not a monologue you accept or replace, but something you can handle—inside your notes, at the granularity where understanding is formed.

Ending philosophy: computers are fun; interaction is human

While everyone is talking about building agents, most of what people actually do in ChatGPT is still simple: they ask questions. That means epistemology and interface design aren’t side quests. They’re under-provided public goods—because better ways of asking, checking, and revising help everyone.

And you can already hear the default reply: a better model will solve all of this. But deep down this isn’t mainly a model problem. It’s human science: how people notice confusion, test claims, form concepts, and revise beliefs. The dream is frictionless knowledge—answers delivered like electricity—but learning doesn’t work that way.

Computers aren’t just appliances. They’re tools people like using because they invite agency: you poke, you try, you iterate, you make something, you change your mind. That’s the pleasure of software at its best. And it’s also the route to understanding. Learning is not merely receiving; it’s participating.

So the goal isn’t to delete friction. It’s to keep the meaningful friction—the moments where you have to ask, rewrite, test, and connect—while removing the dumb friction of scrolling, losing context, and being forced into a spectator seat.

The problem isn’t that AI gives too much knowledge. It’s that it gives too little participation.