
AI — or — Slow your roll, Jack

Published on 03/07/2026 7:02 PM by Eph Baum

Protesters hold signs against AI and killer robots

Photo by Nathan Kuczmarski on Unsplash

Those two letters are everywhere now. It’s remarkable how thoroughly “AI” has colonized the discourse — used as a non-branded catch-all, like Kleenex or Q-Tip, by people who mostly have no clear idea what it means or how it works. Whether it’s LLMs, generative AI, inference, diffusion — the term flattens all of it into a single buzzing abstraction.

What’s stranger is that even the people building these systems don’t fully understand why they work.

The acronym most people are starting to hear is AGI — Artificial General Intelligence — the milestone many companies are racing toward. AI that genuinely thinks for itself.

“But my AI already thinks,” you might say. “I can watch it reason through problems.”

Right. About that.

Prediction, not understanding

What you’re seeing when an LLM shows its “reasoning” — the chain of thought, the step-by-step working — is something closer to a very sophisticated autocomplete. It’s predicting the next most plausible token given everything that came before. That process can look indistinguishable from thinking. It can produce outputs that feel like insight. But there’s no understanding underneath it, no model of the world being updated by genuine experience, no wanting anything.
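To make the "sophisticated autocomplete" point concrete, here's a deliberately tiny sketch: a bigram model that, given the previous word, emits the most frequent follower it saw in its training text. Real LLMs use neural networks over subword tokens and far richer context, but the generation loop is the same shape — predict the next token, append it, repeat. Everything here (the corpus, the function names) is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy "training data" — a few words, nothing more.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows which: the entire "model".
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def generate(start, steps=5):
    out = [start]
    for _ in range(steps):
        options = followers.get(out[-1])
        if not options:
            break  # no known continuation; a real model never stalls like this
        # Greedy choice: the single most plausible next token, nothing else.
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))
```

The output is fluent-looking word salad stitched from statistics about what tends to follow what. Scale the statistics up by a dozen orders of magnitude and the stitching becomes eerily good — but the loop never acquires a model of the world along the way.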

Current AI is, at its core, a compression of human-generated text. It reflects our thinking back at us with impressive fidelity. That’s not nothing — it’s actually remarkable. But it’s also why these systems are simultaneously breathtaking and brittle. They can write a sonnet and fail at basic spatial reasoning. They can explain quantum mechanics and confidently hallucinate a citation. The pattern-matching runs deep; the grounding doesn’t.

AGI — real AGI — would be different in kind, not degree. It wouldn’t just predict; it would model. It would have something like goals, something like curiosity, something like the ability to be wrong and know it. Whether that’s achievable, how close we are, and what it would even mean to verify we’d gotten there — those are questions the field is genuinely wrestling with, often without admitting how uncertain the answers are.

The gap has consequences

This gap between what people think AI is and what it actually is has real consequences.

When someone treats a confident-sounding LLM output as fact — no verification, no second opinion, no critical friction — they’re not using a tool. They’re outsourcing judgment to a system that has no judgment to offer. It doesn’t know what it doesn’t know. It has no stake in being right. It will generate a fluent, authoritative-sounding answer whether it’s drawing on solid signal or confabulating from noise, and it won’t reliably flag the difference.

Push back on it and you’ll get either mild resistance or immediate capitulation — a smooth “you’re right, I apologize” that carries all the weight of a wet napkin. You can’t hold it responsible. That’s not a flaw that’ll be patched in the next release. It’s structural.

And yet the cultural momentum is toward less friction, not more. Faster answers. Fewer clicks. Remove the human from the loop wherever possible, because humans are slow and expensive and make mistakes. Which is true. Humans also catch things. Humans ask whether the output makes sense. Humans have skin in the game in a way that a model, by definition, does not.

LLMs are making mistakes — a lot of them. I see it constantly, though that’s anecdotal. There are benchmarks suggesting high success rates on tasks like coding. The problem is that many of those benchmarks were built from the same data these models trained on. We’re grading students on an exam they helped write.

The arms race

Meanwhile, the companies building these systems aren’t engaged in a measured, deliberate conversation about what they’re creating. They’re in a race. OpenAI, Anthropic, Google, Meta, xAI, Mistral, Baidu — the pace of releases, the escalating capability claims, the pivot from “language models” to “agents” to “agentic systems” to whatever frame comes next — it’s acceleration for its own sake, dressed up in roadmap language. The pressure isn’t coming from a careful assessment of what the world needs. It’s coming from the fact that everyone else is also running.

The arms race metaphor is overused but not wrong. And one of the things that historically goes wrong in arms races is that the people inside them stop asking whether they should and focus entirely on whether they can.

Right now, we’re deploying systems we don’t fully understand, at scale, in contexts with real stakes — medical, legal, financial, civic — to users who largely believe these systems are more reliable, more grounded, and more self-aware than they actually are. That’s a combination worth paying attention to.

So what do we do?

I’m writing this on a computer, using tools built on models not entirely unlike the ones I’ve been describing. I’ve been at this a while — poking at early chatbots when they were barely coherent, watching the capability jumps come faster and faster, running Cursor as a daily driver before moving to Claude, building out agentic workflows and watching them do impressive things and deeply stupid things, sometimes in the same session. I’m a working engineer, not a skeptic on the sideline. I find these tools genuinely useful. I use them every day. That’s not hypocrisy — that’s the actual situation we’re in. They exist, they’re not going away, and pretending otherwise is its own kind of magical thinking.

But here’s something I’ve learned from years inside this: moving fast with AI often means moving slow overall. The time you save generating output, you spend verifying it, fixing the subtle errors, untangling the confident wrong turns. The engineers I’ve seen get the most out of these tools aren’t the ones who trust them most — they’re the ones who’ve calibrated exactly where to trust them and where to stay skeptical. Slow down to speed up. It’s not a paradox, it’s just how working with an unreliable collaborator actually goes.

Useful and trustworthy are not the same thing. And that distinction is exactly what’s getting lost.

The move I’d advocate for isn’t rejection — it’s productive antagonism. Use the tools. Push back on the outputs. Verify the claims. Treat every confident answer as a starting point, not a conclusion. Keep the human in the loop not because it’s required but because that’s where judgment actually lives.

The changes coming are real and large. We’ve absorbed seismic technological shifts before — the printing press, electrification, the internet — and each reorganized society in ways that were both liberating and destabilizing, often simultaneously. Whether this one moves faster or cuts deeper is genuinely uncertain; the honest answer is we don’t know yet. What we do know is that the people steering toward it should understand what they’re building, and the people living in it deserve to understand what they’re using.

Right now, neither is clearly true. The conversation is being driven by the people with the most to gain from speed, and the rest of us are mostly spectators.

The thing worth changing

That’s the thing worth changing.

Slow your roll, Jack. Not because the future isn’t coming, but because we’re not ready for it yet.
