
If You’ve Never Broken It, You Don’t Really Know It

The following article originally appeared on Medium and is being republished here with the author’s permission.

There’s a fake confidence you can carry around when you’re learning a new technology. You watch a few videos, skim some docs, get a toy example working, and tell yourself, “Yeah, I’ve got this.” I’ve done that. It never lasts. A difficult lesson often accompanies the only experience that matters.

You learn through failure—falling flat on your face, looking at the mess, and figuring out why it broke. Anything that feels too easy? It probably was, and you didn’t exit the process with anything worth learning.

Ask About Failure: Failure === Experience

When I’m hiring someone who claims relational database expertise, I ask a “trick” question:

Tell me about the worst database schema you ever created. What did it teach you to avoid?

It’s not really a trick. Anyone who’s been knee‑deep in relational databases knows there’s no perfect schema. There are competing use cases that constantly pull against each other. You design for transaction workloads, but inevitably, someone tries to use it for reporting, then everyone wonders why queries crawl. Another developer on the team inadvertently optimizes the schema (usually years later) for the reporting use case only to make the transactional workload unworkable.

The correct answer usually sounds like:

We built for transactional throughput—one of the founders of the company thought MySQL was a database, which was our first mistake. The business then used it for reporting purposes. The system changed hands several times over the course of several years. Joins became gnarly, indices didn’t match the access patterns, and nightly jobs started interfering with user traffic. We had to split read replicas, eventually introduce a warehouse, and after 5–6 years, we ended up simplifying the transactions and moving them over to Cassandra.

That’s a person who has lived the trade-offs. They’ve experienced a drawn-out existential failure related to running a database. While they might not know how to solve some of the silly logic questions that are increasingly popular in job interviews, this is the sort of experience that carries far more weight with me.

The Schema That Nearly Broke Me

I once shipped a transactional schema that looked fine on paper: normalized, neat, everything in its proper place.

Then analytics showed up with “just a couple of quick dashboards.” Next thing you know, my pretty 3NF model, now connected to every elementary classroom in America, was being used like a million-row Excel spreadsheet to summarize an accounting report. For a few months it was fine, until it wasn’t: the database was doing a slow-motion faceplant because it spent 80% of its time updating an index. It wasn’t as if I could fix anything, because that would mean several days of downtime coupled with a rewrite for a project whose contract was almost up.

And how were we trying to fix it? If you’ve been in this situation, you’ll understand that what I’m about to write is the sign that you have reached a new level of desperate failure. Instead of considering a rational approach to reform the schema, or peeling what had become a “web-scale” workload (this was 2007) off into a NoSQL database, we were trying to figure out how to purchase faster hard drives with higher IOPS.

I learned a lot of things:

  • I learned that upgrading hardware (buying a faster machine or dropping a million dollars on hard drives) will only delay your crisis. The real fix is unavoidable—massive horizontal scaling is incompatible with relational databases.
  • I learned the meaning of “query plan from hell.” We band‑aided it with materialized views and read replicas. Then we did what we should’ve done from day one: set up an actual reporting path (a minimal sketch follows this list).
  • If you are having to optimize for a query plan every week? Your database is sending you an important signal, which you should translate to, “It’s time to start looking for an alternative.”
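
To make that reporting path concrete, here’s a minimal sketch of the split, with hypothetical connection strings and a trivial routing helper; it’s the shape of the idea, not the setup we actually ran:

# Minimal sketch of a reporting-path split; the DSNs are hypothetical placeholders.
PRIMARY_DSN = "mysql://primary.internal/app"   # transactional reads and writes
REPLICA_DSN = "mysql://replica.internal/app"   # reporting and analytics reads only

def dsn_for(workload: str) -> str:
    """Route reporting traffic to a replica so nightly jobs stop competing with user traffic."""
    return REPLICA_DSN if workload == "reporting" else PRIMARY_DSN

assert dsn_for("reporting") == REPLICA_DSN
assert dsn_for("checkout") == PRIMARY_DSN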

Lesson burned in: Design for the use case you actually have, not the one you hope to have—and assume the use case will change.

What Does This Have to Do with Cursor and Copilot?

I’m seeing a lot of people writing on LinkedIn and other sites about how amazing vibe coding is. These celebratory posts reveal more about the people posting them than they realize, as they rarely acknowledge the reality of the process—it’s not all fun and games. While it is astonishing how much progress one can make in a day or a week, those of us who are actually using these tools to write code are the first to tell you that we’re learning a lot of difficult lessons.

It’s not “easy.” There’s nothing “vibey” about the process, and if you are doing it right, you are starting to use curse words in your prompts. For example, one of my prompts in response to a Cursor Agent yesterday was: “You have got to be kidding me, I have a rule that stated that I never wanted you to do that, and you just ignored it?”

Whenever I see people get excited about the latest, greatest fad that’s changing the world, I’m also the first to notice that maybe they aren’t using it at all. If they were, they’d understand that it’s not as “easy” as they are reporting.

The failure muscle you build with databases is the same one you need with AI coding tools. You can’t tiptoe in. You have to push until something breaks. Then you figure out how to approach a new technology as a professional.

  • Ask an agent to refactor one file—great.
  • Ask it to coordinate changes across 20 files, rethink error handling, and keep tests passing—now we’re learning.
  • Watch where it stumbles, and learn to frame the work so it can succeed next time.
  • Spend an entire weekend on a “wild goose chase” because your agentic coder decided to ignore your Cursor rules completely. ← This is expensive, but it’s how you learn.

The trick isn’t avoiding failure. It’s failing in a controlled, reversible way.

The Meta Lesson

If you’ve never broken it, you don’t really know it. This is true for coding, budgeting, managing, cooking, and skiing. If you haven’t failed, you don’t know it. And most of the people talking about “vibe coding” haven’t.

The people I trust most as engineers can tell me why something failed and how they adjusted their approach as a result. That’s the entire game with AI coding tools. The faster you can run the loop—try → break → inspect → refine—the better you get.

The End of Debugging

The following article originally appeared on Medium and is being republished here with the author’s permission.

This post is a follow-up to a post from last week on the progress of logging. A colleague pushed back on the idea that we’d soon be running code we don’t fully understand. He was skeptical: “We’ll still be the ones writing the code, right? You can only support the code if you wrote it, right?…right?”

That’s the assumption—but it’s already slipping.

You Don’t Have to Write (or Even Read) Every Line Anymore

I gave him a simple example. I needed drag-and-drop ordering in a form. I’ve built it before, but this time I asked Cursor: “Take this React component, make the rows draggable, persist the order, and generate tests.”

It did. I ran the tests, and everything passed; I then shipped the feature without ever opening the code. Not because I couldn’t but because I didn’t have to. That doesn’t mean I always ship this way. Most of the time, I still review, but it’s becoming more common that I don’t need to.

And this isn’t malpractice or vibe coding. The trust comes from two things: I know I can debug and fix if something goes wrong, and I have enough validation to know when the output is solid. If the code works, passes tests, and delivers the feature, I don’t need to micromanage every line of code. That shift is already here—and it’s only accelerating.

Already Comfortable Ceding Control

Which brings me back to site reliability. Production systems are on the same trajectory. We’re walking into a world where the software is watching itself, anticipating failures, and quietly fixing them before a human would ever notice. Consider how Airbus advises pilots to keep the autopilot on during turbulence. Computers don’t panic or overcorrect; they ride it out smoothly. That’s what’s coming for operations—systems that absorb the bumps without asking you to grab the controls.

This shift doesn’t eliminate humans, but it does change the work. We won’t be staring at charts all day, because the essential decisions won’t be visible in dashboards. Vendors like Elastic, Grafana, and Splunk won’t vanish, but they’ll need to reinvent their value in a world where the software is diagnosing and correcting itself before alerts even fire.

And this will happen faster than you think: not through slow, predictable technology maturation, but because the incentives are brutal. The first companies to eliminate downtime and pager duty will have an unassailable advantage, and everyone else will scramble to follow. Within a couple of years (sorry, I meant weeks), the default assumption will be that you’re building for an MCP—the standard machine control plane that consumes your logs, interprets your signals, and acts on your behalf. If you’re not writing for it, you’ll be left behind.

More Powerful Primitives (We May Not Fully Understand)

I’ll end with this. I majored in computer engineering. I know how to design an 8-bit microprocessor on FPGAs…in the late 1990s. Do you think I fully understand the Apple M4 chip in the laptop I’m writing on? Conceptually, yes—I understand the principles. But I don’t know everything it’s doing, instruction by instruction. And that’s fine.

We already accept that kind of abstraction all the time. As Edsger W. Dijkstra said: “The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.” Abstractions give us new building blocks—smaller, sharper units of thought—that let us stop worrying about every transistor and instead design at the level of processors, operating systems, or languages.

Code generation is about to redefine that building block again. It’s not just another abstraction layer; it’s a new “atom” for how we think about software. Once that shift takes hold, we’ll start leveling up—not because we know less but because we’ll be working with more powerful primitives.

Job for 2027: Senior Director of Million-Dollar Regexes

The following article originally appeared on Medium and is being republished here with the author’s permission.

Don’t get me wrong, I’m up all night using these tools.

But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.

I joked that this was going to be a “million-dollar regular expression.”

Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: A million messages a day at ~50K characters each works out to around 12.5 billion tokens daily, or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
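
If you want to sanity-check those numbers, the back-of-the-envelope arithmetic is easy to write down. The per-million-token prices below are assumptions based on list pricing at the time of writing, and the 4-characters-per-token ratio is a rough rule of thumb:

# Back-of-the-envelope cost estimate; prices and the chars-per-token ratio are assumptions.
DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
CHARS_PER_TOKEN = 4  # rough rule of thumb, not exact

tokens_per_day = DOCS_PER_DAY * CHARS_PER_DOC / CHARS_PER_TOKEN  # ~12.5 billion

# Assumed input-token list prices, USD per million tokens.
PRICE_PER_MTOK = {"GPT-5": 1.25, "GPT-5 Nano": 0.05}

for model, price in PRICE_PER_MTOK.items():
    per_day = tokens_per_day / 1_000_000 * price
    print(f"{model}: ~${per_day:,.0f}/day, ~${per_day * 365:,.0f}/year")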

That’s a success. You “saved” $5.77 million a year…

How about running this code for a million documents a day? How much would this cost:

import re; s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)

A plain old EC2 instance could handle this: something like an m1.small at 30 bucks a month could churn through the same workload with a regex and cost you a few hundred dollars a year.

Which means that in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that—can you fix it?”

From $15,000/day to $0.96/day—I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is way more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.

Your AI Pair Programmer Is Not a Person

The following article originally appeared on Medium and is being republished here with the author’s permission.

Early on, I caught myself saying “you” to my AI tools—“Can you add retries?” “Great idea!”—like I was talking to a junior dev. And then I’d get mad when it didn’t “understand” me.

That’s on me. These models aren’t people. An AI model doesn’t understand. It generates, and it follows patterns. But the keyword here is “it.”

The Illusion of Understanding

It feels like there’s a mind on the other side because the output is fluent and polite. It says things like “Great idea!” and “I recommend…” as if it weighed options and judged your plan. It didn’t. The model doesn’t have opinions. It recognized patterns from training data and your prompt, then synthesized the next token.

That doesn’t make the tool useless. It means you are the one doing the understanding. The model is clever, fast, and often correct, but it can also be wildly wrong in ways that will confound you. And when that happens, it’s your fault: you didn’t give it enough context.

Here’s an example of naive pattern following:

A friend asked his model to scaffold a project. It spit out a block comment that literally said “This is authored by <Random Name>.” He Googled the name. It was someone’s public snippet that the model had basically learned as a pattern—including the “authored by” comment—and parroted it back into a new file. Not malicious. Just mechanical. It didn’t “know” that adding a fake author attribution was absurd.

Build Trust Before Code

The first mistake most folks make is overtrust. The second is lazy prompting. The fix for both is the same: Be precise about inputs, and validate the assumptions you are throwing at the model.

Spell out context, constraints, directory boundaries, and success criteria.

Require diffs. Run tests. Ask it to second-guess your assumptions.

Make it restate your problem, and require it to ask for confirmation.

Before you throw a $500/hour problem at a set of parallel model executions, do your own homework to make sure that you’ve communicated all of your assumptions and that the model has understood what your criteria are for success.
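
None of this requires special tooling; I find it easier to stay consistent when the checklist lives in a reusable template. Here’s a minimal sketch; the field names and the example values are my own convention, not anything Cursor or Copilot requires:

# A reusable prompt scaffold; field names and example values are my own convention.
PROMPT_TEMPLATE = """\
Context: {context}
Constraints: {constraints}
Directory boundaries: only touch files under {allowed_dirs}
Success criteria: {success_criteria}

Before writing code:
1. Restate the problem in your own words and wait for my confirmation.
2. List the assumptions you are making and ask about anything ambiguous.
After writing code: show a diff only, and list the tests you ran.
"""

prompt = PROMPT_TEMPLATE.format(
    context="Flask API for order intake; Python 3.12; SQLAlchemy models in models.py",
    constraints="no new dependencies; do not touch the database schema",
    allowed_dirs="src/orders/",
    success_criteria="existing pytest suite passes and the new endpoint returns 201",
)
print(prompt)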

Failure? Look Within

I continue to fall into this trap when I ask this tool to take on too much complexity without giving it enough context. And when it fails, I’ll type things like, “You’ve got to be kidding me? Why did you…”

Just remember, there is no “you” here other than yourself.

  • It doesn’t share your assumptions. If you didn’t tell it not to update the database, and it wrote an idiotic migration, you did that by never spelling out that it should leave the database alone.
  • It didn’t read your mind about the scope. If you don’t lock it to a folder, it will “helpfully” refactor the world. If it tries to remove your home directory to be helpful? That’s on you.
  • It wasn’t trained on only “good” code. A lot of code on the internet… is not great. Your job is to specify constraints and success criteria.

The Mental Model I Use

Treat the model like a compiler for instructions. Garbage in, garbage out. Assume it’s smart about patterns, not about your domain. Make it prove correctness with tests, invariants, and constraints.
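
To make “prove correctness” concrete, here’s the kind of tiny executable contract I hand over along with the task. The function below is just a stand-in (a simple SSN-redaction regex); the point is that the tests, not the model’s confidence, decide whether the output is accepted:

# A tiny executable contract; redact_ssn is a stand-in for whatever the model generates.
import re

def redact_ssn(text: str) -> str:
    return re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", text)

def test_redacts_common_formats():
    assert redact_ssn("SSN: 123-45-6789") == "SSN: [REDACTED]"
    assert redact_ssn("123 45 6789 on file") == "[REDACTED] on file"

def test_leaves_other_numbers_alone():
    assert redact_ssn("order 1234567890") == "order 1234567890"  # ten digits, not an SSN

test_redacts_common_formats()
test_leaves_other_numbers_alone()
print("contract holds")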

It’s not a person. That’s not an insult. It’s your advantage. If you stop expecting human-level judgment and start supplying machine-level clarity, your results jump. Just don’t let sycophantic agreement lull you into thinking that you have a pair programmer next to you.

Code Generation and the Shifting Value of Software

This article originally appeared on Medium. Tim O’Brien has given us permission to repost here on Radar.

One of the most unexpected changes in software development right now comes from code generation. We’ve all known that it could speed up certain kinds of work, but what’s becoming clear is that it also reshapes the economics of libraries, frameworks, and even the way we think about open source.

Just to be clear, I don’t view this as a threat to the employment of developers. I think we’ll end up needing more developers, and I also think that more people will start to consider themselves developers. But I do think that there are practices that are expiring:

  1. Purchasing software—It will become more challenging to sell software unless it provides a compelling and difficult-to-reproduce product.
  2. Adopting open source frameworks—Don’t get me wrong, open source will continue to play a role, but there’s going to be more of it, and there will be fewer “star stage” projects.
  3. Software architects—Again, I’m not saying that we won’t have software architects, but the human process of considering architecture alternatives and having very expensive discussions about abstractions is already starting to disappear.

Why Are You Paying for That?

Take paid libraries as an example. For years, developers paid for specific categories of software simply because they solved problems that felt tedious or complex to recreate. A table renderer with pagination, custom cell rendering, and filtering might have justified a license fee because of the time it saved. What developer wants to stop and rewrite the pagination logic for that React table library?

Lately, I’ve started answering, “me.” Instead of upgrading the license and paying some ridiculous per-developer fee, why not just ask Claude Sonnet to “render this component with an HTML table that also supports on-demand pagination”? At first, it feels like a mistake, but then you realize it’s cheaper and faster to ask a generative model to write a tailored implementation for that table—and it’s simpler.

Most developers who buy software libraries end up using one or two features, while most of the library’s surface area goes untouched. Flipping the switch and moving to a simpler custom approach makes your build cleaner. (I know some of you pay for a very popular React component library with a widely used table implementation that recently raised prices. I also know some of you started asking, “Do I really need this?”)

If you can point your IDE at it and say, “Hey, can you implement this in HTML with some simple JavaScript?” and it generates flawless code in five minutes—why wouldn’t you? The next question becomes: Will library creators start adding new legal clauses to lock you in? (My prediction: That’s next.)

The moat around specific, specialized libraries keeps shrinking. If you can answer “Can I just replace that?” in five minutes, then replace it.

Did You Need That Library?

This same shift also touches open source. Many of the libraries we use came out of long-term community efforts to solve straightforward problems. Logging illustrates this well: Packages like Log4j or Winston exist because developers needed consistent logging across projects. However, most teams utilize only a fraction of that functionality. These days, generating a lightweight logging library with exactly the levels and formatting you need often proves easier.

Although adopting a shared library still offers interoperability benefits, the balance tilts toward custom solutions. I just needed to format logs in a standard way. Instead of adding a dependency, we wrote a 200-line internal library. Done.
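
For a sense of scale, the core of that kind of internal logger is tiny. This is a minimal sketch of the idea, not the actual library we wrote, and it’s far shorter than 200 lines:

# Minimal structured-logging sketch: fixed levels, one JSON-lines format, nothing else.
import json
import sys
import time

LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}
_threshold = LEVELS["INFO"]

def set_level(name: str) -> None:
    global _threshold
    _threshold = LEVELS[name]

def log(level: str, message: str, **fields) -> None:
    """Emit one JSON line per event so downstream tooling can parse it."""
    if LEVELS[level] < _threshold:
        return
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    sys.stdout.write(json.dumps(record) + "\n")

log("INFO", "order accepted", order_id=42)   # printed
log("DEBUG", "cache miss", key="user:42")    # filtered out at the default INFO level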

Five years ago, that might have sounded wild. Why rewrite Winston? But once you see the level of complexity these libraries carry, and you realize Claude Opus can generate that same logging library to your exact specifications in five minutes, the whole discussion shifts. Again, I’m not saying you should drop everything and craft your own logging library. But look at the 100 dependencies you have in your software—some of them add complexity you’ll never use.

Say Goodbye to “Let’s Think About”

Another subtle change shows up in how we solve problems. In the past, a new requirement meant pausing to consider the architecture, interfaces, or patterns before implementing anything. Increasingly, I delegate that “thinking” step to a model. It runs in parallel, proposing solutions while I evaluate and refine. The time between idea and execution keeps shrinking. Instead of carefully choosing among frameworks or libraries, I can ask for a bespoke implementation and iterate from there.

Compare that to five years ago. Back then, you assembled your most senior engineers and architects to brainstorm an approach. That still happens, but more often today, you end up discussing the output of five or six independent models that have already generated solutions. You discuss outcomes of models, not ideas for abstractions.

The bigger implication: Entire categories of software may lose relevance. I’ve spent years working on open source libraries like Jakarta Commons—collections of utilities that solved countless minor problems. Those projects may no longer matter when developers can write simple functionality on demand. Even build tools face this shift. Maven, for example, once justified an ecosystem of training and documentation. But in the future, documenting your build system in a way that a generative model can understand might prove more useful than teaching people how to use Maven.

The Common Thread

The pattern across all of this is simple: Software generation makes it harder to justify paying for prepackaged solutions. Both proprietary and open source libraries lose value when it’s faster to generate something custom. Direct automation displaces tooling and frameworks. Frameworks existed to capture standard code that generative models can now produce on demand.

As a result, the future may hold more custom-built code and fewer compromises to fit preexisting systems. In short, code generation doesn’t just speed up development—it fundamentally changes what’s worth building, buying, and maintaining.

Control Codegen Spend

This article originally appeared on Medium. Tim O’Brien has given us permission to repost here on Radar.

When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just having access to different models—it’s knowing when to use them. Some jobs are OK with Auto. Others need a stronger model. And sometimes you should bail and switch rather than keep spending money on a complex problem with a lower-quality model. If you don’t, you’ll waste both time and money.

And this is the missing discussion in code generation. There are a few “camps” here: the majority of people writing about code generation appear to view it as a fantastical and fun “vibe coding” experience, while a few people out there are trying to use this technology to deliver real products. If you are in that last category, you’ve probably started to realize that you can spend a fantastic amount of money if you don’t have a strategy for model selection.

Let’s make it very specific—if you sign up for Cursor and drop $20/month on a subscription using Auto and you are happy with the output, there’s not much to worry about. But if you are starting to run agents in parallel and are paying for token consumption atop a monthly subscription, this post will make sense. In my own experience, a single developer working alone can easily spend $200–$300/day (or four times that figure) if they are trying to tackle a project and have opted for the most expensive model.

And—if you are a company and you give your developers unlimited access to these tools—get ready for some surprises.

My Escalation Ladder for Models…

  1. Start here: Auto. Let Cursor route to a strong model with good capacity. If output quality degrades or the agent gets stuck in a loop, escalate to the next tier. (Cursor explicitly says Auto selects among premium models and will switch when output is degraded.)
  2. Medium-complexity tasks: Sonnet 4/GPT‑5/Gemini. Use for focused tasks on a handful of files: robust unit tests, targeted refactors, API remodels.
  3. Heavy lift: Sonnet 4 with the 1M-token context window. If I need something that requires more context but I still don’t want to pay top dollar, I’ve started moving up to models that don’t quickly max out on context.
  4. Ultraheavy lift: Opus 4/4.1. Use this when the task spans multiple projects or requires long context and careful reasoning, then switch back once the big move is done. (Anthropic positions Opus 4 as a deep‑reasoning, long‑horizon model for coding and agent workflows.)

Auto works fine, but there are times when you can sense that it’s selected the wrong model. If you use these models enough, you can tell when you’re looking at Gemini Pro output by its verbosity, or at the ChatGPT models by the way they go about solving a problem.

I’ll admit that my heavy and ultraheavy choices here are biased towards the models I’ve had more experience with—your own experience might vary. Still, you should also have a similar escalation list. Start with Auto and only upgrade if you need to; otherwise, you are going to learn some lessons about how much this costs.
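
If it helps, you can write your ladder down as data and a trivial routing rule. The tiers, model names, and thresholds below are just my current preferences, not anything Cursor enforces:

# My escalation ladder as data; tiers, model names, and thresholds are personal preferences.
ESCALATION_LADDER = {
    "auto": "Auto (let Cursor route)",
    "medium": "Sonnet 4 / GPT-5 / Gemini",
    "heavy": "Sonnet 4 with the 1M-token context window",
    "ultraheavy": "Opus 4 / 4.1",
}

def pick_tier(files_touched: int, needs_long_context: bool, needs_deep_reasoning: bool) -> str:
    """Crude heuristic: only escalate when the cheaper tier is clearly not enough."""
    if needs_deep_reasoning:
        return "ultraheavy"
    if needs_long_context:
        return "heavy"
    if files_touched > 3:
        return "medium"
    return "auto"

tier = pick_tier(files_touched=2, needs_long_context=False, needs_deep_reasoning=False)
print(tier, "->", ESCALATION_LADDER[tier])  # auto -> Auto (let Cursor route)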

Watch Out for “Thinking” Model Costs

Some models support explicit “thinking” (longer reasoning). Useful, but costlier. Cursor’s docs note that enabling thinking on specific Sonnet versions can count as two requests under team request accounting, and in the individual plans, the same idea translates to more tokens burned. In short, thinking mode is excellent—use it when you need it.

And when do you need it? My rule of thumb: when I already understand what needs to be done, when I’m asking for a unit test to be polished or a method to be implemented in the pattern of another… I usually don’t need a thinking model. On the other hand, if I’m asking it to analyze a problem and propose various options for me to choose from, or (something I do often) when I’m asking it to challenge my decisions and play devil’s advocate, I will pay the premium for the best model.

Max Mode and When to Use It

If you need giant context windows or extended reasoning (e.g., sweeping changes across 20+ files), Max Mode can help—but it will consume more usage. Make Max Mode a temporary tool, not your default. If you find yourself constantly requiring Max Mode to be turned on, there’s a good chance you are “overapplying” this technology.

If it needs to consume a million tokens for hours on end? That’s usually a hint that you need another programmer. More on that later, but what I’ve seen too often are managers who think this is like the “vibe coding” they are witnessing. Spoiler alert: Vibe coding is that thing that people do in presentations because it takes five minutes to make a silly video game. It’s 100% not programming, and to use codegen, here’s the secret: You have to understand how to program.

Max Mode and thinking models are not a shortcut, and neither are they a replacement for good programmers. If you think they are, you are going to be paying top dollar for code that will one day have to be rewritten by a good programmer using these same tools.

Most Important Tip: Watch Your Bill as It Happens

The most important tip is to regularly monitor your utilization and usage fees in Cursor, since they appear within a minute or two of running something. You can see usage by the minute, the number of tokens consumed, and in some cases, how much you’re being charged beyond your subscription. Make a habit of checking a couple of times a day, and during heavy sessions ideally every half hour. This helps you catch runaway costs—like spending $100 an hour—before they get out of hand, which is entirely possible if you’re running many parallel agents or doing resource-intensive work. Paying attention ensures you stay in control of both your usage and your bill.

Keep Track and Avoid Loops

The other thing you need to do is keep track of what works and what doesn’t. Over time, you’ll notice it’s very easy to make mistakes, and the models themselves can sometimes fall into loops. You might give an instruction, and instead of resolving it, the system keeps running the same process again and again. If you’re not paying attention, you can burn through a lot of tokens—and a lot of money—without actually getting sound output. That’s why it’s essential to watch your sessions closely and be ready to interrupt if something looks like it’s stuck.

Another pitfall is pushing the models beyond their limits. There are tasks they can’t handle well, and when that happens, it’s tempting to keep rephrasing the request and asking again, hoping for a better result. In practice, that often leads to the same cycle of failure, except you’re footing the bill for every attempt. Knowing where the boundaries are and when to stop is critical.

A practical way to stay on top of this is to maintain a running diary of what worked and what didn’t. Record prompts, outcomes, and notes about efficiency so you can learn from experience instead of repeating expensive mistakes. Combined with keeping an eye on your live usage metrics, this habit will help you refine your approach and avoid wasting both time and money.
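
The diary doesn’t need to be fancy; even an append-only file you can grep is enough. A minimal sketch (the file name and fields are arbitrary):

# Append-only codegen diary: one JSON line per attempt, easy to grep later.
import datetime
import json
import pathlib

DIARY = pathlib.Path("codegen_diary.jsonl")  # arbitrary file name

def record(model: str, prompt: str, outcome: str, note: str = "") -> None:
    entry = {
        "when": datetime.datetime.now().isoformat(timespec="seconds"),
        "model": model,
        "prompt": prompt,
        "outcome": outcome,  # e.g. "worked", "looped", "wrong scope"
        "note": note,
    }
    with DIARY.open("a") as f:
        f.write(json.dumps(entry) + "\n")

record("auto", "add retries to the upload client", "looped", "killed it after three passes")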
