
AI — or — Slow your roll, Jack

Published on 03/07/2026 7:02 PM by Eph Baum

Protesters hold signs against AI and killer robots

Photo by Nathan Kuczmarski on Unsplash

Those two letters are everywhere now. It’s remarkable how thoroughly “AI” has colonized the discourse — used as a non-branded catch-all, like Kleenex or Q-Tip, by people who mostly have no clear idea what it means or how it works. Whether it’s LLMs, generative AI, inference, diffusion — the term flattens all of it into a single buzzing abstraction.

What’s stranger is that even the people building these systems don’t fully understand why they work.

The acronym most people are starting to hear is AGI — Artificial General Intelligence — the milestone many companies are racing toward. AI that genuinely thinks for itself.

“But my AI already thinks,” you might say. “I can watch it reason through problems.”

Right. About that.

Prediction, not understanding

What you’re seeing when an LLM shows its “reasoning” — the chain of thought, the step-by-step working — is something closer to a very sophisticated autocomplete. It’s predicting the next most plausible token given everything that came before. That process can look indistinguishable from thinking. It can produce outputs that feel like insight. But there’s no understanding underneath it, no model of the world being updated by genuine experience, no wanting anything.
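To make the "sophisticated autocomplete" point concrete, here's a deliberately tiny sketch: a bigram model that, given the previous word, emits the most frequent follower it saw in its training text. Real LLMs use neural networks over subword tokens and far richer context, but the generation loop is the same shape — predict the next token, append it, repeat. Everything here (the corpus, the function names) is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy "training data" — a few words, nothing more.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows which: the entire "model".
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def generate(start, steps=5):
    out = [start]
    for _ in range(steps):
        options = followers.get(out[-1])
        if not options:
            break  # no known continuation; a real model never stalls like this
        # Greedy choice: the single most plausible next token, nothing else.
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))
```

The output is fluent-looking word salad stitched from statistics about what tends to follow what. Scale the statistics up by a dozen orders of magnitude and the stitching becomes eerily good — but the loop never acquires a model of the world along the way.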

Current AI is, at its core, a compression of human-generated text. It reflects our thinking back at us with impressive fidelity. That’s not nothing — it’s actually remarkable. But it’s also why these systems are simultaneously breathtaking and brittle. They can write a sonnet and fail at basic spatial reasoning. They can explain quantum mechanics and confidently hallucinate a citation. The pattern-matching runs deep; the grounding doesn’t.

AGI — real AGI — would be different in kind, not degree. It wouldn’t just predict; it would model. It would have something like goals, something like curiosity, something like the ability to be wrong and know it. Whether that’s achievable, how close we are, and what it would even mean to verify we’d gotten there — those are questions the field is genuinely wrestling with, often without admitting how uncertain the answers are.

The gap has consequences

This gap between what people think AI is and what it actually is has real consequences.

When someone treats a confident-sounding LLM output as fact — no verification, no second opinion, no critical friction — they’re not using a tool. They’re outsourcing judgment to a system that has no judgment to offer. It doesn’t know what it doesn’t know. It has no stake in being right. It will generate a fluent, authoritative-sounding answer whether it’s drawing on solid signal or confabulating from noise, and it won’t reliably flag the difference.

Push back on it and you’ll get either mild resistance or immediate capitulation — a smooth “you’re right, I apologize” that carries all the weight of a wet napkin. You can’t hold it responsible. That’s not a flaw that’ll be patched in the next release. It’s structural.

And yet the cultural momentum is toward less friction, not more. Faster answers. Fewer clicks. Remove the human from the loop wherever possible, because humans are slow and expensive and make mistakes. Which is true. Humans also catch things. Humans ask whether the output makes sense. Humans have skin in the game in a way that a model, by definition, does not.

LLMs are making mistakes — a lot of them. I see it constantly, though that’s anecdotal. There are benchmarks suggesting high success rates on tasks like coding. The problem is that many of those benchmarks were built from the same data these models trained on. We’re grading students on an exam they helped write.

The arms race

Meanwhile, the companies building these systems aren’t engaged in a measured, deliberate conversation about what they’re creating. They’re in a race. OpenAI, Anthropic, Google, Meta, xAI, Mistral, Baidu — the pace of releases, the escalating capability claims, the pivot from “language models” to “agents” to “agentic systems” to whatever frame comes next — it’s acceleration for its own sake, dressed up in roadmap language. The pressure isn’t coming from a careful assessment of what the world needs. It’s coming from the fact that everyone else is also running.

The arms race metaphor is overused but not wrong. And one of the things that historically goes wrong in arms races is that the people inside them stop asking whether they should and focus entirely on whether they can.

Right now, we’re deploying systems we don’t fully understand, at scale, in contexts with real stakes — medical, legal, financial, civic — to users who largely believe these systems are more reliable, more grounded, and more self-aware than they actually are. That’s a combination worth paying attention to.

So what do we do?

I’m writing this on a computer, using tools built on models not entirely unlike the ones I’ve been describing. I’ve been at this a while — poking at early chatbots when they were barely coherent, watching the capability jumps come faster and faster, running Cursor as a daily driver before moving to Claude, building out agentic workflows and watching them do impressive things and deeply stupid things, sometimes in the same session. I’m a working engineer, not a skeptic on the sideline. I find these tools genuinely useful. I use them every day. That’s not hypocrisy — that’s the actual situation we’re in. They exist, they’re not going away, and pretending otherwise is its own kind of magical thinking.

But here’s something I’ve learned from years inside this: moving fast with AI often means moving slow overall. The time you save generating output, you spend verifying it, fixing the subtle errors, untangling the confident wrong turns. The engineers I’ve seen get the most out of these tools aren’t the ones who trust them most — they’re the ones who’ve calibrated exactly where to trust them and where to stay skeptical. Slow down to speed up. It’s not a paradox, it’s just how working with an unreliable collaborator actually goes.

Useful and trustworthy are not the same thing. And that distinction is exactly what’s getting lost.

The move I’d advocate for isn’t rejection — it’s productive antagonism. Use the tools. Push back on the outputs. Verify the claims. Treat every confident answer as a starting point, not a conclusion. Keep the human in the loop not because it’s required but because that’s where judgment actually lives.

The changes coming are real and large. We’ve absorbed seismic technological shifts before — the printing press, electrification, the internet — and each reorganized society in ways that were both liberating and destabilizing, often simultaneously. Whether this one moves faster or cuts deeper is genuinely uncertain; the honest answer is we don’t know yet. What we do know is that the people steering toward it should understand what they’re building, and the people living in it deserve to understand what they’re using.

Right now, neither is clearly true. The conversation is being driven by the people with the most to gain from speed, and the rest of us are mostly spectators.

The thing worth changing

That’s the thing worth changing.

Slow your roll, Jack. Not because the future isn’t coming, but because we’re not ready for it yet.
