The following article originally appeared on Medium and is being republished here with the author’s permission.
Don’t get me wrong, I’m up all night using these tools.
But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.
I joked that this was going to be a “million-dollar regular expression.”
Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: A million documents a day at ~50K characters each works out to around 12.5 billion tokens daily, or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
That’s a success. You “saved” $5.77 million a year…
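If you want to check that math yourself, it fits in a few lines. The four-characters-per-token rule of thumb and the per-million-token prices here are assumptions; swap in whatever the current list prices are.

# Back-of-the-envelope sketch; the token ratio and prices are assumptions.
DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
CHARS_PER_TOKEN = 4  # rough average for English text

tokens_per_day = DOCS_PER_DAY * CHARS_PER_DOC / CHARS_PER_TOKEN  # ~12.5 billion

for model, price_per_million in [("gpt-5", 1.25), ("gpt-5-nano", 0.05)]:
    daily = tokens_per_day / 1_000_000 * price_per_million
    print(f"{model}: ${daily:,.0f}/day, ${daily * 365:,.0f}/year")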
How about running this code for a million documents a day? How much would this cost:
import re
s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)
A plain old EC2 instance could handle this. Something like an m1.small at 30 bucks a month could churn through the same workload with that regex and cost you a few hundred dollars a year.
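That throughput claim is easy to sanity-check. Here’s a rough timing loop you could run yourself; the synthetic document is made up, and the numbers it prints will depend entirely on your hardware.

import re
import time

# Quick sanity check: time the regex on a ~50K-character synthetic document.
SSN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")
doc = "Lorem ipsum 123-45-6789 dolor sit amet. " * 1250  # ~50K characters of filler

runs = 1_000
start = time.perf_counter()
for _ in range(runs):
    SSN.sub("[REDACTED]", doc)
per_doc = (time.perf_counter() - start) / runs

print(f"{per_doc * 1000:.2f} ms per document, "
      f"~{per_doc * 1_000_000 / 3600:.1f} CPU-hours for a million documents")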
Which means that in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that—can you fix it?”
From $15,000/day to $0.96/day—I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is way more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.
Early on, I caught myself saying “you” to my AI tools—“Can you add retries?” “Great idea!”—like I was talking to a junior dev. And then I’d get mad when it didn’t “understand” me.
That’s on me. These models aren’t people. An AI model doesn’t understand. It generates, and it follows patterns. But the keyword here is “it.”
The Illusion of Understanding
It feels like there’s a mind on the other side because the output is fluent and polite. It says things like “Great idea!” and “I recommend…” as if it weighed options and judged your plan. It didn’t. The model doesn’t have opinions. It recognized patterns from training data and your prompt, then synthesized the next token.
That doesn’t make the tool useless. It means you are the one doing the understanding. The model is clever, fast, and often correct, but it can also be wildly wrong in ways that will confound you. What’s important to understand is that when this happens, it’s your fault: you didn’t give it enough context.
Here’s an example of naive pattern following:
A friend asked his model to scaffold a project. It spit out a block comment that literally said “This is authored by <Random Name>.” He Googled the name. It was someone’s public snippet that the model had basically learned as a pattern—including the “authored by” comment—and parroted back into a new file. Not malicious. Just mechanical. It didn’t “know” that adding a fake author attribution was absurd.
Build Trust Before Code
The first mistake most folks make is overtrust. The second is lazy prompting. The fix for both is the same: Be precise about inputs, and validate the assumptions you are throwing at the model.
Spell out context, constraints, directory boundaries, and success criteria.
Require diffs. Run tests. Ask it to second-guess your assumptions.
Make it restate your problem, and require it to ask for confirmation.
Before you throw a $500/hour problem at a set of parallel model executions, do your own homework to make sure that you’ve communicated all of your assumptions and that the model has understood what your criteria are for success.
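Concretely, my prompts for nontrivial tasks read less like questions and more like checklists. Here’s a hypothetical sketch; the paths and details are invented, and nothing about the format is required by any tool.

# Hypothetical prompt scaffold; none of this is required by any tool. The point
# is to put context, constraints, scope, and success criteria into the request
# rather than leaving them in your head. Paths and details below are made up.
TASK_PROMPT = """
Context: Python 3.12 service under src/billing/; tests live in tests/billing/ and run with pytest.
Task: add retry-with-backoff to the HTTP client in src/billing/client.py.
Constraints:
- Touch only files under src/billing/ and tests/billing/.
- Do not modify database models or migrations.
- Show the changes as a diff before applying anything.
Success criteria: all existing tests pass, and new tests cover the retry path.
Before starting: restate the task in your own words, list the assumptions
you are making, and wait for my confirmation.
"""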
Failure? Look Within
I continue to fall into this trap when I ask this tool to take on too much complexity without giving it enough context. And when it fails, I’ll type things like, “You’ve got to be kidding me? Why did you…”
Just remember, there is no “you” here other than yourself.
It doesn’t share your assumptions. If you didn’t tell it not to update the database, and it wrote an idiotic migration, you did that by never spelling out that it should leave the database alone.
It didn’t read your mind about the scope. If you don’t lock it to a folder, it will “helpfully” refactor the world. If it tries to remove your home directory to be helpful? That’s on you.
It wasn’t trained on only “good” code. A lot of code on the internet… is not great. Your job is to specify constraints and success criteria.
The Mental Model I Use
Treat the model like a compiler for instructions. Garbage in, garbage out. Assume it’s smart about patterns, not about your domain. Make it prove correctness with tests, invariants, and constraints.
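One cheap way to “make it prove correctness” is to write the test before the model writes the code, then tell it the job isn’t done until the test passes. A minimal sketch, with a hypothetical module and function name:

# Hypothetical: this test exists before the model writes redact_ssn();
# the generated implementation has to satisfy it, not the other way around.
import pytest
from redaction import redact_ssn  # hypothetical module and function

@pytest.mark.parametrize("text,expected", [
    ("SSN: 123-45-6789", "SSN: [REDACTED]"),
    ("ID 123456789 on file", "ID [REDACTED] on file"),
    ("Call 555-1234", "Call 555-1234"),  # not an SSN; must pass through untouched
])
def test_redact_ssn(text, expected):
    assert redact_ssn(text) == expected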
It’s not a person. That’s not an insult. It’s your advantage. If you stop expecting human‑level judgment and start supplying machine‑level clarity, your results jump. But don’t let sycophantic agreement lull you into thinking that you have a pair programmer next to you.
One of the most unexpected changes in software development right now comes from code generation. We’ve all known that it could speed up certain kinds of work, but what’s becoming clear is that it also reshapes the economics of libraries, frameworks, and even the way we think about open source.
Just to be clear, I don’t view this as a threat to the employment of developers. I think we’ll end up needing more developers, and I also think that more people will start to consider themselves developers. But I do think that there are practices that are expiring:
Purchasing software—It will become more challenging to sell software unless it provides a compelling and difficult-to-reproduce product.
Adopting open source frameworks—Don’t get me wrong, open source will continue to play a role, but there’s going to be more of it, and there will be fewer “star stage” projects.
Software architects—Again, I’m not saying that we won’t have software architects, but the human process of considering architecture alternatives and having very expensive discussions about abstractions is already starting to disappear.
Why Are You Paying for That?
Take paid libraries as an example. For years, developers paid for specific categories of software simply because they solved problems that felt tedious or complex to recreate. A table renderer with pagination, custom cell rendering, and filtering might have justified a license fee because of the time it saved. What developer wants to stop and rewrite the pagination logic for that React table library?
Lately, I’ve started answering, “me.” Instead of upgrading the license and paying some ridiculous per-developer fee, why not just ask Claude Sonnet to “render this component with an HTML table that also supports on-demand pagination”? At first, it feels like a mistake, but then you realize it’s cheaper and faster to ask a generative model to write a tailored implementation for that table—and it’s simpler.
Most developers who buy software libraries end up using one or two features, while most of the library’s surface area goes untouched. Flipping the switch and moving to a simpler custom approach makes your build cleaner. (I know some of you pay for a very popular React component library with a widespread table implementation that recently raised prices. I also know some of you started asking, “Do I really need this?”)
If you can point your IDE at it and say, “Hey, can you implement this in HTML with some simple JavaScript?” and it generates flawless code in five minutes—why wouldn’t you? The next question becomes: Will library creators start adding new legal clauses to lock you in? (My prediction: That’s next.)
The moat around specific, specialized libraries keeps shrinking. If you can answer “Can I just replace that?” in five minutes, then replace it.
Did You Need That Library?
This same shift also touches open source. Many of the libraries we use came out of long-term community efforts to solve straightforward problems. Logging illustrates this well: Packages like Log4j or Winston exist because developers needed consistent logging across projects. However, most teams utilize only a fraction of that functionality. These days, generating a lightweight logging library with exactly the levels and formatting you need often proves easier.
Although adopting a shared library still offers interoperability benefits, the balance tilts toward custom solutions. We just needed to format logs in a standard way. Instead of adding a dependency, we wrote a 200-line internal library. Done.
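For a sense of scale, here’s a stripped-down sketch of that kind of internal logger, built on the standard library rather than a new dependency. The details are illustrative, not our actual 200 lines.

# Illustrative sketch of a tiny internal logging wrapper: one standard
# format, the levels we actually use, and nothing else.
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_logger("payments")
log.info("payment retried")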
Five years ago, that might have sounded wild. Why rewrite Winston? But once you see the level of complexity these libraries carry, and you realize Claude Opus can generate that same logging library to your exact specifications in five minutes, the whole discussion shifts. Again, I’m not saying you should drop everything and craft your own logging library. But look at the 100 dependencies you have in your software—some of them add complexity you’ll never use.
Say Goodbye to “Let’s Think About”
Another subtle change shows up in how we solve problems. In the past, a new requirement meant pausing to consider the architecture, interfaces, or patterns before implementing anything. Increasingly, I delegate that “thinking” step to a model. It runs in parallel, proposing solutions while I evaluate and refine. The time between idea and execution keeps shrinking. Instead of carefully choosing among frameworks or libraries, I can ask for a bespoke implementation and iterate from there.
Compare that to five years ago. Back then, you assembled your most senior engineers and architects to brainstorm an approach. That still happens, but more often today, you end up discussing the output of five or six independent models that have already generated solutions. You discuss outcomes of models, not ideas for abstractions.
The bigger implication: Entire categories of software may lose relevance. I’ve spent years working on open source libraries like Jakarta Commons—collections of utilities that solved countless minor problems. Those projects may no longer matter when developers can write simple functionality on demand. Even build tools face this shift. Maven, for example, once justified an ecosystem of training and documentation. But in the future, documenting your build system in a way that a generative model can understand might prove more useful than teaching people how to use Maven.
The Common Thread
The pattern across all of this is simple: Software generation makes it harder to justify paying for prepackaged solutions. Both proprietary and open source libraries lose value when it’s faster to generate something custom. Direct automation displaces tooling and frameworks. Frameworks existed to capture standard code that generative models can now produce on demand.
As a result, the future may hold more custom-built code and fewer compromises to fit preexisting systems. In short, code generation doesn’t just speed up development—it fundamentally changes what’s worth building, buying, and maintaining.
When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just having access to different models—it’s knowing when to use them. Some jobs are fine on Auto. Others need a stronger model. And sometimes you should bail and switch instead of continuing to pour money into a complex problem with a lower-quality model. If you don’t, you’ll waste both time and money.
And this is the missing discussion in code generation. There are a few “camps” here: The majority of people writing about this seem to treat it as a fantastical and fun “vibe coding” experience, while a few people out there are trying to use this technology to deliver real products. If you are in that second category, you’ve probably started to realize that you can spend a fantastic amount of money if you don’t have a strategy for model selection.
Let’s make it very specific—if you sign up for Cursor and drop $20/month on a subscription using Auto and you are happy with the output, there’s not much to worry about. But if you are starting to run agents in parallel and are paying for token consumption atop a monthly subscription, this post will make sense. In my own experience, a single developer working alone can easily spend $200–$300/day (or four times that figure) if they are trying to tackle a project and have opted for the most expensive model.
And—if you are a company and you give your developers unlimited access to these tools—get ready for some surprises.
My Escalation Ladder for Models…
Start here: Auto. Let Cursor route to a strong model with good capacity. If output quality degrades or the model gets stuck in a loop, escalate. (Cursor explicitly says Auto selects among premium models and will switch when output is degraded.)
Medium-complexity tasks: Sonnet 4/GPT‑5/Gemini. Use for focused tasks on a handful of files: robust unit tests, targeted refactors, API remodels.
Heavy lift: Sonnet 4 with the 1M-token context window. If I need to do something that requires more context but still don’t want to pay top dollar, I’ve started moving up to models that don’t quickly max out on context.
Ultraheavy lift: Opus 4/4.1. Use this when the task spans multiple projects or requires long context and careful reasoning, then switch back once the big move is done. (Anthropic positions Opus 4 as a deep‑reasoning, long‑horizon model for coding and agent workflows.)
Auto works fine, but there are times when you can sense that it’s selected the wrong model, and if you use these models enough, you know when you are looking at Gemini Pro output by the verbosity or the ChatGPT models by the way they go about solving a problem.
I’ll admit that my heavy and ultraheavy choices here are biased towards the models I’ve had more experience with—your own experience might vary. Still, you should also have a similar escalation list. Start with Auto and only upgrade if you need to; otherwise, you are going to learn some lessons about how much this costs.
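If it helps to see the ladder as something mechanical, here’s roughly how I’d write it down. The tiers mirror the list above; the model labels and thresholds are my own rough heuristics, not anything Cursor enforces.

# The ladder above as a lookup table. Model labels are illustrative;
# the thresholds are personal heuristics, not tool behavior.
ESCALATION_LADDER = {
    "default": "auto",                             # let the tool route
    "medium": ["sonnet-4", "gpt-5", "gemini-pro"], # focused work on a handful of files
    "heavy": "sonnet-4-1m-context",                # long context, not top dollar
    "ultra_heavy": ["opus-4", "opus-4.1"],         # multi-project, deep reasoning
}

def pick_tier(files_touched: int, needs_long_context: bool, needs_deep_reasoning: bool) -> str:
    if needs_deep_reasoning:
        return "ultra_heavy"
    if needs_long_context:
        return "heavy"
    if files_touched > 3:  # arbitrary cutoff; tune to taste
        return "medium"
    return "default"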
Watch Out for “Thinking” Model Costs
Some models support explicit “thinking” (longer reasoning). Useful, but costlier. Cursor’s docs note that enabling thinking on specific Sonnet versions can count as two requests under team request accounting, and in the individual plans, the same idea translates to more tokens burned. In short, thinking mode is excellent—use it when you need it.
And when do you need it? My rule of thumb is that when I already understand what needs to be done, when I’m asking for a unit test to be polished or a method to be implemented in the pattern of another, I usually don’t need a thinking model. On the other hand, if I’m asking it to analyze a problem and propose various options for me to choose from, or (something I do often) when I’m asking it to challenge my decisions and play devil’s advocate, I will pay the premium for the best model.
Max Mode and When to Use It
If you need giant context windows or extended reasoning (e.g., sweeping changes across 20+ files), Max Mode can help—but it will consume more usage. Make Max Mode a temporary tool, not your default. If you find yourself constantly requiring Max Mode to be turned on, there’s a good chance you are “overapplying” this technology.
If it needs to consume a million tokens for hours on end? That’s usually a hint that you need another programmer. More on that later, but what I’ve seen too often is managers assuming this is the same “vibe coding” they’ve watched in demos. Spoiler alert: Vibe coding is that thing people do in presentations because it takes five minutes to make a silly video game. It’s 100% not programming, and here’s the secret to using codegen: You have to understand how to program.
Max Mode and thinking models are not a shortcut, and neither are they a replacement for good programmers. If you think they are, you are going to be paying top dollar for code that will one day have to be rewritten by a good programmer using these same tools.
Most Important Tip: Watch Your Bill as It Happens
The most important tip is to regularly monitor your utilization and usage fees in Cursor, since they appear within a minute or two of running something. You can see usage by the minute, the number of tokens consumed, and in some cases, how much you’re being charged beyond your subscription. Make a habit of checking a couple of times a day, and during heavy sessions, ideally every half hour. This helps you catch runaway costs—like spending $100 an hour—before they get out of hand, which is entirely possible if you’re running many parallel agents or doing resource-intensive work. Paying attention ensures you stay in control of both your usage and your bill.
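Cursor’s dashboard gives you the raw numbers; if you want something you can alarm on, even a crude calculation helps. This sketch assumes you record token counts and prices yourself, and the prices below are placeholders; it doesn’t talk to Cursor or any billing API.

# Crude burn-rate alarm: you supply the token counts and assumed
# per-million-token prices; it only flags when the current hour is running hot.
def hourly_burn_usd(input_tokens: int, output_tokens: int,
                    in_price_per_m: float, out_price_per_m: float) -> float:
    return input_tokens / 1e6 * in_price_per_m + output_tokens / 1e6 * out_price_per_m

HOURLY_BUDGET_USD = 20.0  # pick your own pain threshold

cost = hourly_burn_usd(input_tokens=4_000_000, output_tokens=600_000,
                       in_price_per_m=3.00, out_price_per_m=15.00)
if cost > HOURLY_BUDGET_USD:
    print(f"Burn rate ${cost:.2f}/hour is over budget; stop and look at what's running")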
Keep Track and Avoid Loops
The other thing you need to do is keep track of what works and what doesn’t. Over time, you’ll notice it’s very easy to make mistakes, and the models themselves can sometimes fall into loops. You might give an instruction, and instead of resolving it, the system keeps running the same process again and again. If you’re not paying attention, you can burn through a lot of tokens—and a lot of money—without actually getting sound output. That’s why it’s essential to watch your sessions closely and be ready to interrupt if something looks like it’s stuck.
Another pitfall is pushing the models beyond their limits. There are tasks they can’t handle well, and when that happens, it’s tempting to keep rephrasing the request and asking again, hoping for a better result. In practice, that often leads to the same cycle of failure, except you’re footing the bill for every attempt. Knowing where the boundaries are and when to stop is critical.
A practical way to stay on top of this is to maintain a running diary of what worked and what didn’t. Record prompts, outcomes, and notes about efficiency so you can learn from experience instead of repeating expensive mistakes. Combined with keeping an eye on your live usage metrics, this habit will help you refine your approach and avoid wasting both time and money.
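The diary doesn’t need to be fancy. Mine is effectively an append-only file; here’s a sketch of the idea, with field names that are simply what I happen to find useful.

# Append-only prompt diary: one JSON line per attempt, so you can grep later
# for what worked, what looped, and roughly what it cost you.
import json
from datetime import datetime, timezone
from pathlib import Path

DIARY = Path("prompt_diary.jsonl")  # hypothetical filename

def log_attempt(model: str, prompt: str, outcome: str,
                approx_cost_usd: float, notes: str = "") -> None:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "outcome": outcome,  # e.g. "worked", "looped", "wrong approach"
        "approx_cost_usd": approx_cost_usd,
        "notes": notes,
    }
    with DIARY.open("a") as f:
        f.write(json.dumps(entry) + "\n")

log_attempt("sonnet-4", "refactor retry logic in billing client",
            "worked", 1.40, "needed explicit file boundaries in the prompt")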