
With GWM-1 family of “world models,” Runway shows ambitions beyond Hollywood

11 December 2025 at 18:47

AI company Runway has announced what it calls its first world model, GWM-1. It’s a significant step in a new direction for a company that has made its name primarily on video generation, and it’s part of a wider gold rush to build a new class of frontier models as large language models and image and video generation move out of the untapped-frontier stage and into a refinement phase.

GWM-1 is a blanket term for a trio of autoregressive models, each built on top of Runway’s Gen-4.5 text-to-video generation model and then post-trained with domain-specific data for different kinds of applications. Here’s what each does.

Runway’s world model announcement livestream video.

GWM Worlds

GWM Worlds offers an interface for exploring digital environments with real-time user input that affects how upcoming frames are generated, which Runway suggests can remain consistent and coherent “across long sequences of movement.”



A new open-weights AI coding model is closing in on proprietary options

10 December 2025 at 15:38

On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didn’t just release an AI model; it also released a new development app called Mistral Vibe. It’s a command line interface (CLI) similar to Claude Code, OpenAI Codex, and Gemini CLI that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.

It’s always wise to take AI benchmarks with a large grain of salt, but we’ve heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, it’s one of the few standardized ways to compare coding models.



Software 2.0 Means Verifiable AI

9 December 2025 at 07:23

Quantum computing (QC) and AI have one thing in common: They make mistakes.

There are two keys to handling mistakes in QC. First, we’ve made tremendous progress in error correction in the last year. Second, QC focuses on problems where generating a solution is extremely difficult but verifying it is easy. Think about factoring a 2048-bit number (around 600 decimal digits) into its two prime factors. That’s a problem that would take years on a classical computer, but a quantum computer can solve it quickly—with a significant chance of an incorrect answer. So you have to test the result by multiplying the factors to see if you get the original number. Multiplying two 1024-bit numbers? Easy, very easy for a modern classical computer. And if the answer’s wrong, the quantum computer tries again.
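As a minimal illustration of that asymmetry, here’s a Python sketch (with small stand-in numbers rather than 1024-bit primes): finding the factors is the hard part, while checking a claimed answer is a single multiplication and comparison.

    def verify_factorization(n: int, p: int, q: int) -> bool:
        """Check a claimed factorization n = p * q.

        Verification is one multiplication, no matter how hard it was
        to find p and q in the first place.
        """
        return 1 < p < n and 1 < q < n and p * q == n

    # Toy example: 3233 = 61 * 53 stands in for a 2048-bit modulus.
    n = 3233
    print(verify_factorization(n, 61, 53))  # True: accept the answer
    print(verify_factorization(n, 59, 55))  # False: wrong answer, try again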

One of the problems with AI is that we often shoehorn it into applications where verification is difficult. Tim Bray recently read his AI-generated biography on Grokipedia. There were some big errors, but there were also many subtle errors that no one but him would detect. We’ve all done the same, with one chat service or another, and all had similar results. Worse, some of the sources referenced in the biography, which purport to verify its claims, actually “entirely fail to support the text”—a well-known problem with LLMs.

Andrej Karpathy recently proposed a definition for Software 2.0 (AI) that places verification at the center. He writes: “In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well.” This formulation is conceptually similar to quantum computing, though in most cases verification for AI will be much more difficult than verification for quantum computers. The minor facts of Tim Bray’s life are verifiable, but what does that mean? That a verification system has to contact Tim to confirm the details before authorizing a bio? Or does it mean that this kind of work should not be done by AI? Although the European Union’s AI Act has laid a foundation for what AI applications should and shouldn’t do, we’ve never had anything that’s easily, well, “computable.” Furthermore, in quantum computing it’s clear that if a machine fails to produce correct output, it’s OK to try again. The same will be true for AI; we already know that all interesting models produce different output if you ask the question again. We shouldn’t underestimate the difficulty of verification, which might prove to be more difficult than training LLMs.

Regardless of the difficulty of verification, Karpathy’s focus on verifiability is a huge step forward. Again from Karpathy: “The more a task/job is verifiable, the more amenable it is to automation…. This is what’s driving the ‘jagged’ frontier of progress in LLMs.”

What differentiates this from Software 1.0 is simple:

Software 1.0 easily automates what you can specify.
Software 2.0 easily automates what you can verify.

That’s the challenge Karpathy lays down for AI developers: determine what is verifiable and how to verify it. Quantum computing gets off easily because we only have a small number of algorithms that solve straightforward problems, like factoring large numbers. Verification for AI won’t be easy, but it will be necessary as we move into the future.

What If? AI in 2026 and Beyond

8 December 2025 at 12:58

The market is betting that AI is an unprecedented technology breakthrough, valuing Sam Altman and Jensen Huang like demigods already astride the world. The slow progress of enterprise AI adoption from pilot to production, however, still suggests at least the possibility of a less earthshaking future. Which is right?

At O’Reilly, we don’t believe in predicting the future. But we do believe you can see signs of the future in the present. Every day, news items land, and if you read them with a kind of soft focus, they slowly add up. Trends are vectors with both a magnitude and a direction, and by watching a series of data points light up those vectors, you can see possible futures taking shape.

This is how we’ve always identified topics to cover in our publishing program, our online learning platform, and our conferences. We watch what we call “the alpha geeks”: paying attention to hackers and other early adopters of technology with the conviction that, as William Gibson put it, “The future is here, it’s just not evenly distributed yet.” As a great example of this today, note how the industry hangs on every word from AI pioneer Andrej Karpathy, hacker Simon Willison, and AI-for-business guru Ethan Mollick.

We are also fans of a discipline called scenario planning, which we learned decades ago during a workshop with Lawrence Wilkinson about possible futures for what is now the O’Reilly learning platform. The point of scenario planning is not to predict any future but rather to stretch your imagination in the direction of radically different futures and then to identify “robust strategies” that can survive either outcome. Scenario planners also use a version of our “watching the alpha geeks” methodology. They call it “news from the future.”

Is AI an Economic Singularity or a Normal Technology?

For AI in 2026 and beyond, we see two fundamentally different scenarios that have been competing for attention. Nearly every debate about AI, whether about jobs, about investment, about regulation, or about the shape of the economy to come, is really an argument about which of these scenarios is correct.

Scenario one: AGI is an economic singularity. AI boosters are already backing away from predictions of imminent superintelligent AI leading to a complete break with all human history, but they still envision a fast takeoff of systems capable enough to perform most cognitive work that humans do today. Not perfectly, perhaps, and not in every domain immediately, but well enough, and improving fast enough, that the economic and social consequences will be transformative within this decade. We might call this the economic singularity (to distinguish it from the more complete singularity envisioned by thinkers from John von Neumann, I. J. Good, and Vernor Vinge to Ray Kurzweil).

In this possible future, we aren’t experiencing an ordinary technology cycle. We are experiencing the start of a civilization-level discontinuity. The nature of work changes fundamentally. The question is not which jobs AI will take but which jobs it won’t. Capital’s share of economic output rises dramatically; labor’s share falls. The companies and countries that master this technology first will gain advantages that compound rapidly.

If this scenario is correct, most of the frameworks we use to think about technology adoption are wrong, or at least inadequate. The parallels to previous technology transitions such as electricity, the internet, or mobile are misleading because they suggest gradual diffusion and adaptation. What’s coming will be faster and more disruptive than anything we’ve experienced.

Scenario two: AI is a normal technology. In this scenario, articulated most clearly by Arvind Narayanan and Sayash Kapoor of Princeton, AI is a powerful and important technology but nonetheless subject to all the normal dynamics of adoption, integration, and diminishing returns. Even if we develop true AGI, adoption will still be a slow process. Like previous waves of automation, it will transform some industries, augment many workers, displace some, but most importantly, take decades to fully diffuse through the economy.

In this world, AI faces the same barriers that every enterprise technology faces: integration costs, organizational resistance, regulatory friction, security concerns, training requirements, and the stubborn complexity of real-world workflows. Impressive demos don’t translate smoothly into deployed systems. The ROI is real but incremental. The hype cycle does what hype cycles do: Expectations crash before realistic adoption begins.

If this scenario is correct, the breathless coverage and trillion-dollar valuations are symptoms of a bubble, not harbingers of transformation.

Reading News from the Future

These two scenarios lead to radically different conclusions. If AGI is an economic singularity, then massive infrastructure investment is rational, and companies borrowing hundreds of billions to spend on data centers to be used by companies that haven’t yet found a viable economic model are making prudent bets. If AI is a normal technology, that spending looks like the fiber-optic overbuild of 1999. It’s capital that will largely be written off.

If AGI is an economic singularity, then workers in knowledge professions should be preparing for fundamental career transitions; firms should be thinking how to radically rethink their products, services, and business models; and societies should be planning for disruptions to employment, taxation, and social structure that dwarf anything in living memory.

If AI is normal technology, then workers should be learning to use new tools (as they always have), but the breathless displacement predictions will join the long list of automation anxieties that never quite materialized.

So, which scenario is correct? We don’t know yet, or even whether this face-off is the right framing of possible futures, but we do know that a year or two from now, we will tell ourselves that the answer was right there, in plain sight. How could we not have seen it? We weren’t reading the news from the future.

Some news is hard to miss: The change in tone of reporting in the financial markets, and perhaps more importantly, the change in tone from Sam Altman and Dario Amodei. If you follow tech closely, it’s also hard to miss news of real technical breakthroughs, and if you’re involved in the software industry, as we are, it’s hard to miss the real advances in programming tools and practices. There’s also an area that we’re particularly interested in, one which we think tells us a great deal about the future, and that is market structure, so we’re going to start there.

The Market Structure of AI

The economic singularity scenario has been framed as a winner-takes-all race for AGI that creates a massive concentration of power and wealth. The normal technology scenario suggests much more of a rising tide, where the technology platforms become dominant precisely because they create so much value for everyone else. Winners emerge over time rather than with a big bang.

Quite frankly, we have one big signal that we’re watching here: Which of OpenAI, Anthropic, and Google achieves product-market fit first? By product-market fit we don’t just mean that users love the product or that one company has dominant market share but that a company has found a viable economic model, where what people are willing to pay for AI-based services is greater than the cost of delivering them.

OpenAI appears to be trying to blitzscale its way to AGI, building out capacity far in excess of the company’s ability to pay for it. This is a massive one-way bet on the economic singularity scenario, which makes ordinary economics irrelevant. Sam Altman has even said that he has no idea what his business will be post-AI or what the economy will look like. So far, investors have been buying it, but doubts are beginning to shape their decisions.

Anthropic is clearly in pursuit of product-market fit, and its success in one target market, software development, is leading the company on a shorter and more plausible path to profitability. Anthropic leaders talk AGI and economic singularity, but they walk the walk of a normal technology believer. The fact that Anthropic is likely to beat OpenAI to an IPO is a very strong normal technology signal. It’s also a good example of what scenario planners view as a robust strategy, good in either scenario.

Google gives us a different take on normal technology: an incumbent looking to balance its existing business model with advances in AI. In Google’s normal technology vision, AI disappears “into the walls” like networks did. Right now, Google is still foregrounding AI with AI overviews and NotebookLM, but it’s in a position to make it recede into the background of its entire suite of products, from Search and Google Cloud to Android and Google Docs. It has too much at stake in the current economy to believe that the route to the future consists in blowing it all up. That being said, Google also has the resources to place big bets on new markets with clear economic potential, like self-driving cars, drug discovery, and even data centers in space. It’s even competing with Nvidia, not just with OpenAI and Anthropic. This is also a robust strategy.

What to watch for: What tech stack are developers and entrepreneurs building on?

Right now, Anthropic’s Claude appears to be winning that race, though that could change quickly. Developers are increasingly not locked into a proprietary stack but are easily switching based on cost or capability differences. Open standards such as MCP are gaining traction.

On the consumer side, Google Gemini is gaining on ChatGPT in terms of daily active users, and investors are starting to question OpenAI’s lack of a plausible business model to support its planned investments.

These developments suggest that the key idea behind the massive investment driving the AI boom, that one winner gets all the advantages, just doesn’t hold up.

Capability Trajectories

The economic singularity scenario depends on capabilities continuing to improve rapidly. The normal technology scenario is comfortable with limits rather than hyperscaled discontinuity. There is already so much to digest!

On the economic singularity side of the ledger, positive signs would include a capability jump that surprises even insiders, such as Yann LeCun’s objections being overcome. That is, AI systems demonstrably have world models, can reason about physics and causality, and aren’t just sophisticated pattern matchers. Another game changer would be a robotics breakthrough: embodied AI that can navigate novel physical environments and perform useful manipulation tasks.

Evidence that AI is a normal technology includes: AI systems that are good enough to be useful but not good enough to be trusted, continuing to require human oversight that limits productivity gains; prompt injection and other security vulnerabilities remaining unsolved, constraining what agents can be trusted to do; domain complexity continuing to defeat generalization, so that what works in coding doesn’t transfer to medicine, law, or science; regulatory and liability barriers proving high enough to slow adoption regardless of capability; and professional guilds successfully protecting their territory. These problems may be solved over time, but they don’t just disappear with a new model release.

Regard benchmark performance with skepticism. Benchmarks are already being gamed now, while everyone is afraid of missing out; they will be gamed even more aggressively once investors start losing enthusiasm.

Reports from practitioners actually deploying AI systems are far more important. Right now, tactical progress is strong. We see software developers in particular making profound changes in development workflows. Watch for whether they are seeing continued improvement or a plateau. Is the gap between demo and production narrowing or persisting? How much human oversight do deployed systems require? Listen carefully to reports from practitioners about what AI can actually do in their domain versus what it’s hyped to do.

We are not persuaded by surveys of corporate attitudes. Having lived through the realities of internet and open source software adoption, we know that, like Hemingway’s marvelous metaphor of bankruptcy, corporate adoption happens gradually, then suddenly, with late adopters often full of regret.

If AI is achieving general intelligence, though, we should see it succeed across multiple domains, not just the ones where it has obvious advantages. Coding has been the breakout application, but coding is in some ways the ideal domain for current AI. It’s characterized by well-defined problems, immediate feedback loops, formally defined languages, and massive training data. The real test is whether AI can break through in domains that are harder and farther away from the expertise of the people developing the AI models.

What to watch for: Real-world constraints start to bite. For example, what if there is not enough power to train or run the next generation of models at the scale companies’ ambitions require? What if capital for the AI build-out dries up?

Our bet is that various real-world constraints will become more clearly recognized as limits to the adoption of AI, despite continued technical advances.

Bubble or Bust?

It’s hard not to notice how the narrative in the financial press has shifted in the past few months, from mindless acceptance of industry narratives to a growing consensus that we are in the throes of a massive investment bubble, with the chief question on everyone’s mind seeming to be when and how it will pop.

The current moment does bear uncomfortable similarities to previous technology bubbles. Famed short investor Michael Burry is comparing Nvidia to Cisco and warning of a worse crash than the dot-com bust of 2000. The circular nature of AI investment—in which Nvidia invests in OpenAI, which buys Nvidia chips; Microsoft invests in OpenAI, which pays Microsoft for Azure; and OpenAI commits to massive data center build-outs with little evidence that it will ever have enough profit to justify those commitments—has reached levels that would be comical if the numbers weren’t so large.

But there’s a counterargument: Every transformative infrastructure build-out begins with a bubble. The railroads of the 1840s, the electrical grid of the 1900s, the fiber-optic networks of the 1990s all involved speculative excess, but all left behind infrastructure that powered decades of subsequent growth. One question is whether AI infrastructure is like the dot-com bubble (which left behind useful fiber and data centers) or the housing bubble (which left behind empty subdivisions and a financial crisis).

The real question when faced with a bubble is: What will be the source of value in what is left? It most likely won’t be in the AI chips, which have a short useful life. It may not even be in the data centers themselves. It may be in a new approach to programming that unlocks entirely new classes of applications. But one pretty good bet is that there will be enduring value in the energy infrastructure build-out. Given the Trump administration’s war on renewable energy, the market demand for energy in the AI build-out may be its saving grace. A future of abundant, cheap energy rather than the current fight for access that drives up prices for consumers could be a very nice outcome.

Signs pointing toward economic singularity: widespread job losses across multiple industries and a spiking business bankruptcy rate; storied companies wiped out by major new applications that just couldn’t exist without AI; sustained high utilization of AI infrastructure (data centers, GPU clusters) over multiple years, with actual demand meeting or exceeding capacity; continued spiking of energy prices, especially in areas with many data centers.

Signs pointing toward bubble: continued reliance on circular financing structures (vendor financing, equity swaps between AI companies); enterprise AI projects stalling in the pilot phase and failing to scale; a “show me the money” moment in which investors demand profitability and AI companies can’t deliver.

Signs pointing toward a normal technology recovery post-bubble: strong revenue growth at AI application companies, not just infrastructure providers; enterprises reporting concrete, measurable ROI from AI deployments.

What to watch: There are so many possibilities that this is an act of imagination! Start with Wile E. Coyote running over a cliff in pursuit of Road Runner in the classic Warner Bros. cartoons. Imagine the moment when investors realize that they are trying to defy gravity.

Going over a cliff
Image generated with Gemini and Nano Banana Pro

What made them notice? Was it the failure of a much-hyped data center project? Was it that it couldn’t get financing, that it couldn’t get completed because of regulatory constraints, that it couldn’t get enough chips, that it couldn’t get enough power, that it couldn’t get enough customers?

Imagine one or more storied AI labs or startups unable to complete their next fundraise. Imagine Oracle or SoftBank trying to get out of a big capital commitment. Imagine Nvidia announcing a revenue miss. Imagine another DeepSeek moment coming out of China.

Our bet for the pin most likely to prick the bubble is that Anthropic’s and Google’s success against OpenAI persuades investors that OpenAI will not be able to pay for the massive amount of data center capacity it has contracted for. Given the company’s centrality to the AGI singularity narrative, a failure of belief in OpenAI could bring down the whole web of interconnected data center bets, many of them financed by debt. But that’s not the only possibility.

Always Update Your Priors

DeepSeek’s emergence in January was a signal that the American AI establishment may not have the commanding lead it assumed. Rather than racing for AGI, China seems to be heavily betting on normal technology, building towards low-cost, efficient AI, industrial capacity, and clear markets. While claims about what DeepSeek spent on training its V3 model have been contested, training isn’t the only cost: There’s also the cost of inference and, for increasingly popular reasoning models, the cost of reasoning. And when these are taken into account, DeepSeek is very much a leader.

If DeepSeek and other Chinese AI labs are right, the US may be intent on winning the wrong race. What’s more, our conversations with Chinese AI investors reveal a much heavier tilt towards embodied AI (robotics and all its cousins) than towards consumer or even enterprise applications. Given the geopolitical tensions between China and the US, it’s worth asking what kind of advantage a GPT-9 with limited access to the real world might provide against an army of drones and robots powered by the equivalent of GPT-8!

The point is that the discussion above is meant to be provocative, not exhaustive. Expand your horizons. Think about how US and international politics, advances in other technologies, and financial market impacts ranging from a massive market collapse to a simple change in investor priorities might change industry dynamics.

What you’re watching for is not any single data point but the pattern across multiple vectors over time. Remember that the AGI versus normal technology framing is not the only or maybe even the most useful way to look at the future.

The most likely outcome, even restricted to these two hypothetical scenarios, is something in between. AI may achieve something like AGI for coding, text, and video while remaining a normal technology for embodied tasks and complex reasoning. It may transform some industries rapidly while others resist for decades. The world is rarely as neat as any scenario.

But that’s precisely why the “news from the future” approach matters. Rather than committing to a single prediction, you stay alert to the signals, ready to update your thinking as evidence accumulates. You don’t need to know which scenario is correct today. You need to recognize which scenario is becoming correct as it happens.

AI in 2026 and Beyond infographic
Infographic created with Gemini and Nano Banana Pro

What If? Robust Strategies in the Face of Uncertainty

The second part of scenario planning is to identify robust strategies that will help you do well regardless of which possible future unfolds. In this final section, as a way of making clear what we mean by that, we’ll consider 10 “What if?” questions and ask what the robust strategies might be.

1. What if the AI bubble bursts in 2026?

The vector: We are seeing massive funding rounds for AI foundries and massive capital expenditure on GPUs and data centers without a corresponding explosion in revenue for the application layer.

The scenario: The “revenue gap” becomes undeniable. Wall Street loses patience. Valuations for foundational model companies collapse and the river of cheap venture capital dries up.

In this scenario, we would see responses like OpenAI’s “Code Red” reaction to improvements in competing products. We would see declines in prices for stocks that aren’t yet traded publicly. And we might see signs that the massive fundraising for data centers and power is performative, not backed by real capital. In the words of one commenter, such announcements are “bragawatts.”

A robust strategy: Don’t build a business model that relies on subsidized intelligence. If your margins only work because VC money is paying for 40% of your inference costs, you are vulnerable. Focus on unit economics. Build products where the AI adds value that customers are willing to pay for now, not in a theoretical future where AI does everything. If the bubble bursts, infrastructure will remain, just as the dark fiber did, becoming cheaper for the survivors to use.

2. What if energy becomes the hard limit?

The vector: Data centers are already stressing grids. We are seeing a shift from the AI equivalent of Moore’s law to a world where progress may be limited by energy constraints.

The scenario: In 2026, we hit a wall. Utilities simply cannot provision power fast enough. Inference becomes a scarce resource, available only to the highest bidders or those with private nuclear reactors. Highly touted data center projects are put on hold because there isn’t enough power to run them, and rapidly depreciating GPUs are put in storage because there aren’t enough data centers to deploy them.

A robust strategy: Efficiency is your hedge. Stop treating compute as infinite. Invest in small language models (SLMs) and edge AI that run locally. If you can run 80% of your workload on a laptop-grade chip rather than an H100 in the cloud, you are at least partially insulated from the energy crunch.

3. What if inference becomes a commodity?

The vector: Chinese labs continue to release open-weight models whose performance is comparable to the previous generation of top-of-the-line US frontier models, but at a fraction of the training and inference cost. What’s more, they are training them with lower-cost chips. And it appears to be working.

The scenario: The price of “intelligence” collapses to near zero. The moat of having the biggest model and the best cutting-edge chips for training evaporates.

A robust strategy: Move up the stack. If the model is a commodity, the value is in the integration, the data, and the workflow. Build applications and services using the unique data, context, and workflows that no one else has.

4. What if Yann LeCun is right?

The vector: LeCun has long argued that auto-regressive LLMs are an “off-ramp” on the highway to AGI because they can’t reason or plan; they only predict the next token. He bets on world models (JEPA). OpenAI cofounder Ilya Sutskever has also argued that the AI industry needs fundamental research to solve basic problems like the ability to generalize.

The scenario: In 2026, LLMs hit a plateau. The market realizes we’ve spent billions on a dead-end technology for true AGI.

A robust strategy: Diversify your architecture. Don’t bet the farm on today’s AI. Focus on compound AI systems that use LLMs as just one component, while relying on deterministic code, databases, and small, specialized models for additional capabilities. Keep your eyes and your options open.

5. What if there is a major security incident?

The vector: We are currently hooking insecure LLMs up to banking APIs, email, and purchasing agents. Security researchers have been screaming about indirect prompt injection for years.

The scenario: A worm spreads through email auto-replies, tricking AI agents into transferring funds or approving fraudulent invoices at scale. Trust in agentic AI collapses.

A robust strategy: “Trust but verify” is dead; use “verify then trust.” Implement well-known security practices like least privilege (restrict your agents to the minimal list of resources they need) and zero trust (require authentication before every action). Stay on top of OWASP’s lists of AI vulnerabilities and mitigations. Keep a “human in the loop” for high-stakes actions. Advocate for and adopt standard AI disclosure and audit trails. If you can’t trace why your agent did something, you shouldn’t let it handle money.

6. What if China is actually ahead?

The vector: While the US focuses on raw scale and chip export bans, China is focusing on efficiency and embedded AI in manufacturing, EVs, and consumer hardware.

The scenario: We discover that 2026’s “iPhone moment” comes from Shenzhen, not Cupertino, because Chinese companies integrated AI into hardware better while we were fighting over chatbot and agentic AI dominance.

A robust strategy: Look globally. Don’t let geopolitical narratives blind you to technical innovation. If the best open source models or efficiency techniques are coming from China, study them. Open source has always been the best way to bridge geopolitical divides. Keep your stack compatible with the global ecosystem, not just the US silo.

7. What if robotics has its “ChatGPT moment”?

The vector: End-to-end learning for robots is advancing rapidly.

The scenario: Suddenly, physical labor automation becomes as possible as digital automation.

A robust strategy: If you are in a “bits” business, ask how you can bridge to “atoms.” Can your software control a machine? How might you embody useful intelligence into your products?

8. What if vibe coding is just the start?

The vector: Anthropic and Cursor are changing programming from writing syntax to managing logic and workflow. Vibe coding lets nonprogrammers build apps by just describing what they want.

The scenario: The barrier to entry for software creation drops to zero. We see a Cambrian explosion of apps built for a single meeting or a single family vacation. Alex Komoroske calls it disposable software: “Less like canned vegetables and more like a personal farmer’s market.”

A robust strategy: In a world where AI is good enough to generate whatever code we ask for, value shifts to knowing what to ask for. Coding is much like writing: Anyone can do it, but some people have more to say than others. Programming isn’t just about writing code; it’s about understanding problems, contexts, organizations, and even organizational politics to come up with a solution. Create systems and tools that embody unique knowledge and context that others can use to solve their own problems.

9. What if AI kills the aggregator business model?

The vector: Amazon and Google make money by being the tollbooth between you and the product or information you want. If people get answers from AI, or an AI agent buys for you, it bypasses the ads and the sponsored listings, undermining the business model of internet incumbents.

The scenario: Search traffic (and ad revenue) plummets. Brands lose their ability to influence consumers via display ads. AI has destroyed the source of internet monetization and hasn’t yet figured out what will take its place.

A robust strategy: Own the customer relationship directly. If Google stops sending you traffic, you need an MCP, an API, or a channel for direct brand loyalty that an AI agent respects. Make sure your information is accessible to bots, not just humans. Optimize for agent readability and reuse.

10. What if a political backlash arrives?

The vector: The divide between the AI rich and those who fear being replaced by AI is growing.

The scenario: A populist movement targets Big Tech and AI automation. We see taxes on compute, robot taxes, or strict liability laws for AI errors.

A robust strategy: Focus on value creation, not value capture. If your AI strategy is “fire 50% of the support staff,” you are not only making a shortsighted business decision; you are painting a target on your back. If your strategy is “supercharge our staff to do things we couldn’t do before,” you are building a defensible future. Align your success with the success of both your workers and customers.

In Conclusion

The future isn’t something that happens to us; it’s something we create. The most robust strategy of all is to stop asking “What will happen?” and start asking “What future do we want to build?”

As Alan Kay once said, “The best way to predict the future is to invent it.” Don’t wait for the AI future to happen to you. Do what you can to shape it. Build the future you want to live in.

Software in the Age of AI

4 December 2025 at 07:19

In 2025 AI reshaped how teams think, build, and deliver software. We’re now at a point where “AI coding assistants have quickly moved from novelty to necessity [with] up to 90% of software engineers us[ing] some kind of AI for coding,” Addy Osmani writes. That’s a very different world to the one we were in 12 months ago. As we look ahead to 2026, here are three key trends we have seen driving change and how we think developers and architects can prepare for what’s ahead.

Evolving Coding Workflows

New AI tools changed coding workflows in 2025, enabling developers to write and work with code faster than ever before. This doesn’t mean AI is replacing developers. It’s opening up new frontiers to be explored and skills to be mastered, something we explored at our first AI Codecon in May.

AI tools in the IDE and on the command line have revived the debate about the IDE’s future, echoing past arguments (e.g., VS Code versus Vim). It’s more useful to focus on the tools’ purpose. As Kent Beck and Tim O’Reilly discussed in November, developers are ultimately responsible for the code their chosen AI tool produces. We know that LLMs “actively reward existing top tier software engineering practices” and “amplify existing expertise,” as Simon Willison has pointed out. And a good coder will “factor in” questions that AI doesn’t. Does it really matter which tool is used?

The critical transferable skill for working with any of these tools is understanding how to communicate effectively with the underlying model. AI tools generate better code if they’re given all the relevant background on a project. Managing what the AI knows about your project (context engineering) and communicating it (prompt engineering) are going to be key to doing good work.

The core skills for working effectively with code won’t change in the face of AI. Understanding code review, design patterns, debugging, testing, and documentation and applying those to the work you do with AI tools will be the differential.

The Rise of Agentic AI

With the rise of agents and Model Context Protocol (MCP) in the second half of 2025, developers gained the ability to use AI not just as a pair programmer but as an entire team of developers. The speakers at our Coding for the Agentic World live AI Codecon event in September 2025 explored new tools, workflows, and hacks that are shaping this emerging discipline of agentic AI.

Software engineers aren’t just working with single coding agents. They’re building and deploying their own custom agents, often within complex setups involving multi-agent scenarios, teams of coding agents, and agent swarms. This shift from conducting AI to orchestrating AI elevates the importance of truly understanding how good software is built and maintained.

We know that AI generates better code with context, and this is also true of agents. As with coding workflows, this means understanding context engineering is essential. However, the differential for senior engineers in 2026 will be how well they apply intermediate skills such as product thinking, advanced testing, system design, and architecture to their work with agentic systems.

AI and Software Architecture

We began 2025 with our January Superstream, Software Architecture in the Age of AI, where speaker Rebecca Parsons explored the architectural implications of AI, dryly noting that “given the pace of change, this could be out of date by Friday.” By the time of our Superstream in August, things had solidified a little more and our speakers were able to share AI-based patterns and antipatterns and explain how they intersect with software architecture. Our December 9 event will look at enterprise architecture and how architects can navigate the impact of AI on systems, processes, and governance. (Registration is still open—save your seat.) As these events show, AI has progressed from being something architects might have to consider to something that is now essential to their work.

We’re seeing successful AI-enhanced architectures using event-driven models, enabling AI agents to act on incoming triggers rather than fixed prompts. This means it’s more important than ever to understand event-driven architecture concepts and trade-offs. In 2026, topics that align with evolving architectures (evolutionary architectures, fitness functions) will also become more important as architects look to find ways to modernize existing systems for AI without derailing them. AI-native architectures will also bring new considerations and patterns for system design next year, as will the trend toward agentic AI.

As was the case for their engineer coworkers, architects still have to know the basics: when to add an agent or a microservice, how to consider cost, how to define boundaries, and how to act on the knowledge they already have. As Thomas Betts, Sarah Wells, Eran Stiller, and Daniel Bryant note on InfoQ, they also “nee[d] to understand how an AI element relates to other parts of their system: What are the inputs and outputs? How can they measure performance, scalability, cost, and other cross-functional requirements?”

Companies will continue to decentralize responsibilities across different functions this year, and AI brings new sets of trade-offs to be considered. It’s true that regulated industries remain understandably wary of granting access to their systems. They’re rolling out AI more carefully with greater guardrails and governance, but they are still rolling it out. So there’s never been a better time to understand the foundations of software architecture. It will prepare you for the complexity on the horizon.

Strong Foundations Matter

AI has changed the way software is built, but it hasn’t changed what makes good software. As we enter 2026, the most important developer and architecture skills won’t be defined by the tool you know. They’ll be defined by how effectively you apply judgment, communicate intent, and handle complexity when working with (and sometimes against) intelligent assistants and agents. AI rewards strong engineering; it doesn’t replace it. It’s an exciting time to be involved.


Join us at the Software Architecture Superstream on December 9 to learn how to better navigate the impact of AI on systems, processes, and governance. Over four hours, host Neal Ford and our lineup of experts including Metro Bank’s Anjali Jain and Philip O’Shaughnessy, Vercel’s Dom Sipowicz, Intel’s Brian Rogers, Microsoft’s Ron Abellera, and Equal Experts’ Lewis Crawford will share their hard-won insights about building adaptive, AI-ready architectures that support continuous innovation, ensure governance and security, and align seamlessly with business goals.

O’Reilly members can register here. Not a member? Sign up for a 10-day free trial before the event to attend—and explore all the other resources on O’Reilly.

Dogecoin Developer Creates New Way To Use DOGE With Banking IBAN – Here’s How

3 December 2025 at 23:00

Paulo Vidal, a Dogecoin Foundation developer, has created a new protocol that transforms DOGE addresses into International Bank Account Numbers (IBANs). This development could make it easier to link Dogecoin with conventional financial systems, offering a new level of usability for both crypto enthusiasts and mainstream players. While the protocol is still in its early stages, Vidal has shared updates on its development and insights into its core features.

Dogecoin Dev Introduces Banking IBAN For DOGE

Dogecoin could be taking a step closer to mainstream financial integration as Vidal unveils an innovative protocol that allows addresses tied to the meme coin to function like bank-validated IBANs. In an announcement on X this week, the Dogecoin developer explained that his effort to simplify Dogecoin addresses has evolved into a D-IBAN system fully compliant with the ISO 13616-1:2020 standard.

Vidal has explained that the D-IBAN protocol allows Dogecoin addresses to be formatted in a way that banking systems can easily validate, effectively bridging the gap between cryptocurrency and traditional finance. The system supports multiple address types, including P2PKH, P2SH, P2WPKH, and time-locked addresses, automatically detecting the type from the address prefix, and it uses the same MOD-97-10 checksum algorithm that banks use worldwide.

The Dogecoin developer notes that the D-IBAN encoding is fully reversible, allowing users to convert back and forth without losing any data. The protocol also formats the IBAN into standard four-character groups for readability, making DOGE addresses more user-friendly and giving them a bank-compliant appearance.
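For context, here’s a minimal Python sketch of the generic ISO 13616 / MOD-97-10 check that banking systems apply to any IBAN, plus the four-character grouping mentioned above. It illustrates only the standard checksum mechanics; the country code, length rules, and the way D-IBAN packs a Dogecoin address into the account field are specific to Vidal’s protocol and aren’t reproduced here.

    def mod97_valid(iban: str) -> bool:
        """Validate an IBAN-style string with the MOD-97-10 rule (ISO 7064)."""
        s = iban.replace(" ", "").upper()
        rearranged = s[4:] + s[:4]  # move country code + check digits to the end
        digits = "".join(str(int(c, 36)) for c in rearranged)  # A=10 ... Z=35, digits unchanged
        return int(digits) % 97 == 1

    def group_by_four(iban: str) -> str:
        """Format an IBAN into the standard four-character groups for readability."""
        s = iban.replace(" ", "")
        return " ".join(s[i:i + 4] for i in range(0, len(s), 4))

    # A well-known sample IBAN (not a D-IBAN) just to exercise the check:
    print(mod97_valid("GB82WEST12345698765432"))    # True
    print(group_by_four("GB82WEST12345698765432"))  # GB82 WEST 1234 5698 7654 32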

Beyond the core D-IBAN functionality, Vidal has also introduced playful and practical extensions of the system. The DogeMoji protocol converts addresses into memorable, visually appealing emoji sequences—ideal for social media or QR codes. 

A second extension, DogeWords, encodes addresses into short, positive word sequences that are easy to read and remember, while maintaining complete reversibility and ensuring accuracy through validation. Both extensions are designed to make Dogecoin addresses easier to share and interact with in creative ways.

Community Reacts To D-IBAN Invention

Members of the crypto community who read about Vidal’s new D-IBAN protocol responded with a mix of enthusiasm, curiosity, and caution. Crypto analyst Astro noted that sending fiat to a crypto address via IBAN would require compliance with Anti-Money Laundering (AML) rules, KYC verification, and potentially obtaining a Virtual Asset Service Provider (VASP) license. 

Astro warned that integration with traditional banks could undermine the decentralized narrative of blockchain technology, contending that banks and crypto have inherently conflicting interests. A community member also highlighted that creating a mathematically valid IBAN from a Dogecoin address does not guarantee that banks will process actual transactions. He stated that only IBANs issued by authorized institutions are recognized for fund transfers. 

Vidal addressed these concerns by emphasizing that the D-IBAN protocol is intended to provide optional banking integration rather than enforce it. He argued that banks could handle Dogecoin in a familiar format while users retain full control of their wallets, preserving self-custody and upholding the core principles of decentralization.


AI Agents Need Guardrails

3 December 2025 at 07:13

When AI systems were just a single model behind an API, life felt simpler. You trained, deployed, and maybe fine-tuned a few hyperparameters.

But that world’s gone. Today, AI feels less like a single engine and more like a busy city—a network of small, specialized agents constantly talking to each other, calling APIs, automating workflows, and making decisions faster than humans can even follow.

And here’s the real challenge: The smarter and more independent these agents get, the harder it becomes to stay in control. Performance isn’t what slows us down anymore. Governance is.

How do we make sure these agents act ethically, safely, and within policy? How do we log what happened when multiple agents collaborate? How do we trace who decided what in an AI-driven workflow that touches user data, APIs, and financial transactions?

That’s where the idea of engineering governance into the stack comes in. Instead of treating governance as paperwork at the end of a project, we can build it into the architecture itself.

From Model Pipelines to Agent Ecosystems

In the old days of machine learning, things were pretty linear. You had a clear pipeline: collect data, train the model, validate it, deploy, monitor. Each stage had its tools and dashboards, and everyone knew where to look when something broke.

But with AI agents, that neat pipeline turns into a web. A single customer-service agent might call a summarization agent, which then asks a retrieval agent for context, which in turn queries an internal API—all happening asynchronously, sometimes across different systems.

It’s less like a pipeline now and more like a network of tiny brains, all thinking and talking at once. And that changes how we debug, audit, and govern. When an agent accidentally sends confidential data to the wrong API, you can’t just check one log file anymore. You need to trace the whole story: which agent called which, what data moved where, and why each decision was made. In other words, you need full lineage, context, and intent tracing across the entire ecosystem.

Why Governance Is the Missing Layer

Governance in AI isn’t new. We already have frameworks like NIST’s AI Risk Management Framework (AI RMF) and the EU AI Act defining principles like transparency, fairness, and accountability. The problem is these frameworks often stay at the policy level, while engineers work at the pipeline level. The two worlds rarely meet. In practice, that means teams might comply on paper but have no real mechanism for enforcement inside their systems.

What we really need is a bridge—a way to turn those high-level principles into something that runs alongside the code, testing and verifying behavior in real time. Governance shouldn’t be another checklist or approval form; it should be a runtime layer that sits next to your AI agents—ensuring every action follows approved paths, every dataset stays where it belongs, and every decision can be traced when something goes wrong.

The Four Guardrails of Agent Governance

Policy as code

Policies shouldn’t live in forgotten PDFs or static policy docs. They should live next to your code. By using tools like the Open Policy Agent (OPA), you can turn rules into version-controlled code that’s reviewable, testable, and enforceable. Think of it like writing infrastructure as code, but for ethics and compliance. You can define rules such as:

  • Which agents can access sensitive datasets
  • Which API calls require human review
  • When a workflow needs to stop because the risk feels too high

This way, developers and compliance folks stop talking past each other—they work in the same repo, speaking the same language.

And the best part? You can spin up a Dockerized OPA instance right next to your AI agents inside your Kubernetes cluster. It just sits there quietly, watching requests, checking rules, and blocking anything risky before it hits your APIs or data stores.

Governance stops being some scary afterthought. It becomes just another microservice. Scalable. Observable. Testable. Like everything else that matters.
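To make that concrete, here’s a minimal sketch of how an agent gateway written in Python might ask a sidecar OPA instance for a decision before letting a call through, using OPA’s standard REST data API. The policy package path (agents/authz/allow) and the input fields are hypothetical; the policy itself would live in version-controlled Rego files.

    import requests  # assumes the requests library is installed

    OPA_URL = "http://localhost:8181/v1/data/agents/authz/allow"  # hypothetical policy path

    def is_allowed(agent: str, action: str, dataset: str) -> bool:
        """Ask the local OPA sidecar whether this agent action is permitted."""
        resp = requests.post(
            OPA_URL,
            json={"input": {"agent": agent, "action": action, "dataset": dataset}},
            timeout=2,
        )
        resp.raise_for_status()
        # Deny by default if the rule is undefined or returns nothing.
        return resp.json().get("result", False) is True

    if is_allowed("FinanceBot", "read", "customer_pii"):
        print("forward the request")
    else:
        print("blocked: refuse or route to human review")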

Observability and auditability

Agents need to be observable not just in performance terms (latency, errors) but in decision terms. When an agent chain executes, we should be able to answer:

  • Who initiated the action?
  • What tools were used?
  • What data was accessed?
  • What output was generated?

Modern observability stacks—Cloud Logging, OpenTelemetry, Prometheus, or Grafana Loki—can already capture structured logs and traces. What’s missing is semantic context: linking actions to intent and policy.

Imagine extending your logs to capture not only “API called” but also “Agent FinanceBot requested API X under policy Y with risk score 0.7.” That’s the kind of metadata that turns telemetry into governance.

When your system runs in Kubernetes, sidecar containers can automatically inject this metadata into every request, creating a governance trace as natural as network telemetry.
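As a sketch of what that metadata might look like in practice (the field names here are illustrative, not a standard schema), each agent action can be emitted as one structured, machine-parseable governance record alongside the usual traces:

    import json
    import logging
    import time

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    logger = logging.getLogger("governance")

    def log_agent_action(agent: str, tool: str, policy: str, risk_score: float, data_accessed: list) -> None:
        """Emit one governance trace record as a single JSON log line."""
        record = {
            "ts": time.time(),
            "agent": agent,            # who initiated the action
            "tool": tool,              # what tool or API was used
            "policy": policy,          # which policy authorized it
            "risk_score": risk_score,  # how risky the action was judged to be
            "data_accessed": data_accessed,
        }
        logger.info(json.dumps(record))

    log_agent_action("FinanceBot", "payments_api.create_invoice", "policy-Y", 0.7, ["vendor_ledger"])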

Dynamic risk scoring

Governance shouldn’t mean blocking everything; it should mean evaluating risk intelligently. In an agent network, different actions have different implications. A “summarize report” request is low risk. A “transfer funds” or “delete records” request is high risk.

By assigning dynamic risk scores to actions, you can decide in real time whether to:

  • Allow it automatically
  • Require additional verification
  • Escalate to a human reviewer

You can compute risk scores using metadata such as agent role, data sensitivity, and confidence level. Cloud providers like Google Cloud Vertex AI Model Monitoring already support risk tagging and drift detection—you can extend those ideas to agent actions.

The point isn’t to slow agents down but to make their behavior context-aware.
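A minimal sketch of that idea follows; the weights, categories, and thresholds are invented for illustration, not taken from any particular product.

    # Illustrative risk weights; in practice these would be tuned per organization.
    ACTION_RISK = {"summarize_report": 0.1, "transfer_funds": 0.9, "delete_records": 0.8}
    DATA_SENSITIVITY = {"public": 0.0, "internal": 0.3, "pii": 0.6, "financial": 0.8}

    def risk_score(action: str, data_class: str, model_confidence: float) -> float:
        """Combine action risk, data sensitivity, and model uncertainty into a 0-1 score."""
        base = ACTION_RISK.get(action, 0.5)  # unknown actions default to medium risk
        sensitivity = DATA_SENSITIVITY.get(data_class, 0.5)
        uncertainty = 1.0 - model_confidence
        return min(1.0, 0.5 * base + 0.3 * sensitivity + 0.2 * uncertainty)

    def decide(score: float) -> str:
        """Map a risk score to one of the three dispositions described above."""
        if score < 0.4:
            return "allow"
        if score < 0.7:
            return "verify"    # e.g., require additional verification or a policy re-check
        return "escalate"      # route to a human reviewer

    print(decide(risk_score("summarize_report", "internal", 0.9)))  # allow
    print(decide(risk_score("transfer_funds", "financial", 0.6)))   # escalate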

Regulatory mapping

Frameworks like NIST AI RMF and the EU AI Act are often seen as legal mandates. In reality, they can double as engineering blueprints.

Governance principle → engineering implementation:

  • Transparency: agent activity logs, explainability metadata
  • Accountability: immutable audit trails in Cloud Logging/Chronicle
  • Robustness: canary testing, rollout control in Kubernetes
  • Risk management: real-time scoring, human-in-the-loop review

Mapping these requirements into cloud and container tools turns compliance into configuration.

Once you start thinking of governance as a runtime layer, the next step is to design what that actually looks like in production.

Building a Governed AI Stack

Let’s visualize a practical, cloud native setup—something you could deploy tomorrow.

[Agent Layer]

[Governance Layer]
→ Policy Engine (OPA)
→ Risk Scoring Service
→ Audit Logger (Pub/Sub + Cloud Logging)

[Tool / API Layer]
→ Internal APIs, Databases, External Services

[Monitoring + Dashboard Layer]
→ Grafana, BigQuery, Looker, Chronicle

All of these can run on Kubernetes with Docker containers for modularity. The governance layer acts as a smart proxy—it intercepts agent calls, evaluates policy and risk, then logs and forwards the request if approved.

In practice:

  • Each agent’s container registers itself with the governance service.
  • Policies live in Git, deployed as ConfigMaps or sidecar containers.
  • Logs flow into Cloud Logging or Elastic Stack for searchable audit trails.
  • A Chronicle or BigQuery dashboard visualizes high-risk agent activity.

This separation of concerns keeps things clean: Developers focus on agent logic, security teams manage policy rules, and compliance officers monitor dashboards instead of sifting through raw logs. It’s governance you can actually operate—not bureaucracy you try to remember later.
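Tying the layers together, the governance proxy itself can be little more than a function that consults the policy engine, the risk scorer, and the audit logger before forwarding a call. Here’s a minimal sketch with those three services stubbed as plain callables rather than real deployments:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class AgentRequest:
        agent: str
        action: str
        dataset: str
        payload: dict

    def governed_call(
        request: AgentRequest,
        policy_allows: Callable[[AgentRequest], bool],   # e.g., an OPA client like the one above
        score_risk: Callable[[AgentRequest], float],     # e.g., a risk scorer like the one above
        audit: Callable[[AgentRequest, str, float], None],
        forward: Callable[[AgentRequest], dict],
        risk_threshold: float = 0.7,
    ) -> dict:
        """Intercept an agent call: check policy, score risk, log, then forward or refuse."""
        if not policy_allows(request):
            audit(request, "denied_by_policy", 1.0)
            return {"status": "denied", "reason": "policy"}

        score = score_risk(request)
        if score >= risk_threshold:
            audit(request, "escalated_to_human", score)
            return {"status": "pending_review", "risk": score}

        audit(request, "forwarded", score)
        return forward(request)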

Lessons from the Field

When I started integrating governance layers into multi-agent pipelines, I learned three things quickly:

  1. It’s not about more controls—it’s about smarter controls.
    When all operations have to be manually approved, you will paralyze your agents. Focus on automating the 90% that’s low risk.
  2. Logging everything isn’t enough.
    Governance requires interpretable logs. You need correlation IDs, metadata, and summaries that map events back to business rules.
  3. Governance has to be part of the developer experience.
    If compliance feels like a gatekeeper, developers will route around it. If it feels like a built-in service, they’ll use it willingly.

In one real-world deployment for a financial-tech environment, we used a Kubernetes admission controller to enforce policy before pods could interact with sensitive APIs. Each request was tagged with a “risk context” label that traveled through the observability stack. The result? Governance without friction. Developers barely noticed it—until the compliance audit, when everything just worked.

Human in the Loop, by Design

Despite all the automation, people should still be involved in some decisions. A healthy governance stack knows when to ask for help. Imagine a risk-scoring service that occasionally flags “Agent Alpha has exceeded its transaction threshold three times today.” Instead of blocking outright, it might forward the request to a human operator via Slack or an internal dashboard. An automated system that knows when to ask a person to review its decisions isn’t showing weakness; it’s showing maturity. Reliable AI doesn’t mean eliminating people; it means knowing when to bring them back in.

Avoiding Governance Theater

Every company wants to say they have AI governance. But there’s a difference between governance theater—policies written but never enforced—and governance engineering—policies turned into running code.

Governance theater produces binders. Governance engineering produces metrics:

  • Percentage of agent actions logged
  • Number of policy violations caught pre-execution
  • Average human review time for high-risk actions

When you can measure governance, you can improve it. That’s how you move from pretending to protect systems to proving that you do. The future of AI isn’t just about building smarter models; it’s about building smarter guardrails. Governance isn’t bureaucracy—it’s infrastructure for trust. And just as we’ve made automated testing part of every CI/CD pipeline, we’ll soon treat governance checks the same way: built in, versioned, and continuously improved.

True progress in AI doesn’t come from slowing down. It comes from giving it direction, so innovation moves fast but never loses sight of what’s right.


What MCP and Claude Skills Teach Us About Open Source for AI

3 December 2025 at 03:58

The debate about open source AI has largely featured open weight models. But that’s a bit like arguing that in the PC era, the most important goal would have been to have Intel open source its chip designs. That might have been useful to some people, but it wouldn’t have created Linux, Apache, or the collaborative software ecosystem that powers the modern internet. What makes open source transformative is the ease with which people can learn from what others have done, modify it to meet their own needs, and share those modifications with others. And that can’t just happen at the lowest, most complex level of a system. And it doesn’t come easily when what you are providing is access to a system that takes enormous resources to modify, use, and redistribute. It comes from what I’ve called the architecture of participation.

This architecture of participation has a few key properties:

  • Legibility: You can understand what a component does without understanding the whole system.
  • Modifiability: You can change one piece without rewriting everything.
  • Composability: Pieces work together through simple, well-defined interfaces.
  • Shareability: Your small contribution can be useful to others without them adopting your entire stack.

The most successful open source projects are built from small pieces that work together. Unix gave us a small operating system kernel surrounded by a library of useful functions, together with command-line utilities that could be chained together with pipes and combined into simple programs using the shell. Linux followed and extended that pattern. The web gave us HTML pages you could “view source” on, letting anyone see exactly how a feature was implemented and adapt it to their needs, and HTTP connected every website as a linkable component of a larger whole. Apache didn’t beat Netscape and Microsoft in the web server market by adding more and more features, but instead provided an extension layer so a community of independent developers could add modules like mod_perl, mod_php, and mod_rewrite.

MCP and Skills Are “View Source” for AI

MCP and Claude Skills remind me of those early days of Unix/Linux and the web. MCP lets you write small servers that give AI systems new capabilities such as access to your database, your development tools, your internal APIs, or third-party services like GitHub, GitLab, or Stripe. A skill is even more atomic: a set of plain language instructions, often with some tools and resources, that teaches Claude how to do something specific. Matt Bell from Anthropic remarked in comments on a draft of this piece that a skill can be defined as “the bundle of expertise to do a task, and is typically a combination of instructions, code, knowledge, and reference materials.” Perfect.

What is striking about both is their ease of contribution. You write something that looks like the shell scripts and web APIs developers have been writing for decades. If you can write a Python function or format a Markdown file, you can participate.
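For example, a minimal MCP server really can be just a few lines of Python. This sketch assumes the official Python SDK's FastMCP helper (its interface may evolve), and the lookup_ticket tool and its data are invented for illustration:

```python
# A minimal MCP server, assuming the official Python SDK's FastMCP helper
# (pip install mcp); the lookup_ticket tool and its data are invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-tools")

FAKE_TICKETS = {"T-1001": "Login page returns 500 after password reset"}


@mcp.tool()
def lookup_ticket(ticket_id: str) -> str:
    """Return the summary for a support ticket."""
    return FAKE_TICKETS.get(ticket_id, "ticket not found")


if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so any MCP-capable client can call it
```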

This is the same quality that made the early web explode. When someone created a clever navigation menu or form validation, you could view source, copy their HTML and JavaScript, and adapt it to your site. You learned by doing, by remixing, by seeing patterns repeated across sites you admired. You didn’t have to be an Apache contributor to get the benefit of learning from others and reusing their work.

Anthropic’s MCP Registry and third-party directories like punkpeye/awesome-mcp-servers show early signs of this same dynamic. Someone writes an MCP server for Postgres, and suddenly dozens of AI applications gain database capabilities. Someone creates a skill for analyzing spreadsheets in a particular way, and others fork it, modify it, and share their versions. Anthropic still seems to be feeling its way with user contributed skills, listing in its skills gallery only those they and select partners have created, but they document how to create them, making it possible for anyone to build a reusable tool based on their specific needs, knowledge, or insights. So users are developing skills that make Claude more capable and sharing them via GitHub. It will be very exciting to see how this develops. Groups of developers with shared interests creating and sharing collections of interrelated skills and MCP servers that give models deep expertise in a particular domain will be a potent frontier for both AI and open source.

GPTs Versus Skills: Two Models of Extension

It’s worth contrasting the MCP and skills approach with OpenAI’s custom GPTs, which represent a different vision of how to extend AI capabilities.

GPTs are closer to apps. You create one by having a conversation with ChatGPT, giving it instructions and uploading files. The result is a packaged experience. You can use a GPT or share it for others to use, but they can’t easily see how it works, fork it, or remix pieces of it into their own projects. GPTs live in OpenAI’s store, discoverable and usable but ultimately contained within the OpenAI ecosystem.

This is a valid approach, and for many use cases, it may be the right one. It’s user-friendly. If you want to create a specialized assistant for your team or customers, GPTs make that straightforward.

But GPTs aren’t participatory in the open source sense. You can’t “view source” on someone’s GPT to understand how they got it to work well. You can’t take the prompt engineering from one GPT and combine it with the file handling from another. You can’t easily version control GPTs, diff them, or collaborate on them the way developers do with code. (OpenAI offers team plans that do allow collaboration by a small group using the same workspace, but this is a far cry from open source–style collaboration.)

Skills and MCP servers, by contrast, are files and code. A skill is literally just a Markdown document you can read, edit, fork, and share. An MCP server is a GitHub repository you can clone, modify, and learn from. They’re artifacts that exist independently of any particular AI system or company.

This difference matters. The GPT Store is an app store, and however rich it becomes, an app store remains a walled garden. The iOS App Store and Google Play store host millions of apps for phones, but you can’t view source on an app, can’t extract the UI pattern you liked, and can’t fork it to fix a bug the developer won’t address. The open source revolution comes from artifacts you can inspect, modify, and share: source code, markup languages, configuration files, scripts. These are all things that are legible not just to computers but to humans who want to learn and build.

That’s the lineage skills and MCP belong to. They’re not apps; they’re components. They’re not products; they’re materials. The difference is architectural, and it shapes what kind of ecosystem can grow around them.

Nothing prevents OpenAI from making GPTs more inspectable and forkable, and nothing prevents skills or MCP from becoming more opaque and packaged. The tools are young. But the initial design choices reveal different instincts about what kind of participation matters. OpenAI seems deeply rooted in the proprietary platform model. Anthropic seems to be reaching for something more open.[1]

Complexity and Evolution

Of course, the web didn’t stay simple. HTML begat CSS, which begat JavaScript frameworks. View source becomes less useful when a page is generated by megabytes of minified React.

But the participatory architecture remained. The ecosystem became more complex, but it did so in layers, and you can still participate at whatever layer matches your needs and abilities. You can write vanilla HTML, or use Tailwind, or build a complex Next.js app. There are different layers for different needs, but all are composable, all shareable.

I suspect we’ll see a similar evolution with MCP and skills. Right now, they’re beautifully simple. They’re almost naive in their directness. That won’t last. We’ll see:

  • Abstraction layers: Higher-level frameworks that make common patterns easier.
  • Composition patterns: Skills that combine other skills, MCP servers that orchestrate other servers.
  • Optimization: When response time matters, you might need more sophisticated implementations.
  • Security and safety layers: As these tools handle sensitive data and actions, we’ll need better isolation and permission models.

The question is whether this evolution will preserve the architecture of participation or whether it will collapse into something that only specialists can work with. Given that Claude itself is very good at helping users write and modify skills, I suspect that we are about to experience an entirely new frontier of learning from open source, one that will keep skill creation open to all even as the range of possibilities expands.

What Does This Mean for Open Source AI?

Open weights are necessary but not sufficient. Yes, we need models whose parameters aren’t locked behind APIs. But model weights are like processor instructions. They are important but not where the most innovation will happen.

The real action is at the interface layer. MCP and skills open up new possibilities because they create a stable, comprehensible interface between AI capabilities and specific uses. This is where most developers will actually participate. Not only that, it’s where people who are not now developers will participate, as AI further democratizes programming. At bottom, programming is not the use of some particular set of “programming languages.” It is the skill set that starts with understanding a problem that the current state of digital technology can solve, imagining possible solutions, and then effectively explaining to a set of digital tools what we want them to help us do. The fact that this may now be possible in plain language rather than a specialized dialect means that more people can create useful solutions to the specific problems they face rather than looking only for solutions to problems shared by millions. This has always been a sweet spot for open source. I’m sure many people have said this about the driving impulse of open source, but I first heard it from Eric Allman, the creator of Sendmail, at what became known as the open source summit in 1998: “scratching your own itch.” And of course, history teaches us that this creative ferment often leads to solutions that are indeed useful to millions. Amateur programmers become professionals, enthusiasts become entrepreneurs, and before long, the entire industry has been lifted to a new level.

Standards enable participation. MCP is a protocol that works across different AI systems. If it succeeds, it won’t be because Anthropic mandates it but because it creates enough value that others adopt it. That’s the hallmark of a real standard.

Ecosystems beat models. The most generative platforms are those in which the platform creators are themselves part of the ecosystem. There isn’t an AI “operating system” platform yet, but the winner-takes-most race for AI supremacy is based on that prize. Open source and the internet provide an alternate, standards-based platform that not only allows people to build apps but to extend the platform itself.

Open source AI means rethinking open source licenses. Most of the software shared on GitHub has no explicit license, which means that default copyright laws apply: The software is under exclusive copyright, and the creator retains all rights. Others generally have no right to reproduce, distribute, or create derivative works from the code, even if it is publicly visible on GitHub. But as Shakespeare wrote in The Merchant of Venice, “The brain may devise laws for the blood, but a hot temper leaps o’er a cold decree.” Much of this code is de facto open source, even if not de jure. People can learn from it, easily copy from it, and share what they’ve learned.

But perhaps more importantly for the current moment in AI, it was all used to train LLMs, which means that this de facto open source code became a vector through which all AI-generated code is created today. This, of course, has made many developers unhappy, because they believe that AI has been trained on their code without either recognition or recompense. For open source, recognition has always been a fundamental currency. For open source AI to mean something, we need new approaches to recognizing contributions at every level.

Licensing issues also come up around what happens to data that flows through an MCP server. What happens when people connect their databases and proprietary data flows through an MCP so that an LLM can reason about it? Right now I suppose it falls under the same license as you have with the LLM vendor itself, but will that always be true?  And, would I, as a provider of information, want to restrict the use of an MCP server depending on a specific configuration of a user’s LLM settings? For example, might I be OK with them using a tool if they have turned off “sharing” in the free version, but not want them to use it if they hadn’t? As one commenter on a draft of this essay put it, “Some API providers would like to prevent LLMs from learning from data even if users permit it. Who owns the users’ data (emails, docs) after it has been retrieved via a particular API or MCP server might be a complicated issue with a chilling effect on innovation.”

There are efforts such as RSL (Really Simple Licensing) and CC Signals that are focused on content licensing protocols for the consumer/open web, but they don’t yet really have a model for MCP, or more generally for transformative use of content by AI. For example, if an AI uses my credentials to retrieve academic papers and produces a literature review, what encumbrances apply to the results? There is a lot of work to be done here.

Open Source Must Evolve as Programming Itself Evolves

It’s easy to be amazed by the magic of vibe coding. But treating the LLM as a code generator that takes input in English or other human languages and produces Python, TypeScript, or Java echoes the use of a traditional compiler or interpreter to generate byte code. It reads what we call a “higher-level language” and translates it into code that operates further down the stack. And there’s a historical lesson in that analogy. In the early days of compilers, programmers had to inspect and debug the generated assembly code, but eventually the tools got good enough that few people need to do that any more. (In my own career, when I was writing the manual for Lightspeed C, the first C compiler for the Mac, I remember Mike Kahl, its creator, hand-tuning the compiler output as he was developing it.)

Now programmers are increasingly finding themselves having to debug the higher-level code generated by LLMs. But I’m confident that will become a smaller and smaller part of the programmer’s role. Why? Because eventually we come to depend on well-tested components. I remember how the original Macintosh user interface guidelines, with predefined user interface components, standardized frontend programming for the GUI era, and how the Win32 API meant that programmers no longer needed to write their own device drivers. In my own career, I remember working on a book about curses, the Unix cursor-manipulation library for CRT screens, and a few years later the manuals for Xlib, the low-level programming interfaces for the X Window System. This kind of programming soon was superseded by user interface toolkits with predefined elements and actions. So too, the roll-your-own era of web interfaces was eventually standardized by powerful frontend JavaScript frameworks.

Once developers come to rely on libraries of preexisting components that can be combined in new ways, what developers are debugging is no longer the lower-level code (first machine code, then assembly code, then hand-built interfaces) but the architecture of the systems they build, the connections between the components, the integrity of the data they rely on, and the quality of the user interface. In short, developers move up the stack.

LLMs and AI agents are calling for us to move up once again. We are groping our way towards a new paradigm in which we are not just building MCPs as instructions for AI agents but developing new programming paradigms that blend the rigor and predictability of traditional programming with the knowledge and flexibility of AI. As Phillip Carter memorably noted, LLMs are inverted computers relative to those with which we’ve been familiar: “We’ve spent decades working with computers that are incredible at precision tasks but need to be painstakingly programmed for anything remotely fuzzy. Now we have computers that are adept at fuzzy tasks but need special handling for precision work.” That being said, LLMs are becoming increasingly adept at knowing what they are good at and what they aren’t. Part of the whole point of MCP and skills is to give them clarity about how to use the tools of traditional computing to achieve their fuzzy aims.

Consider the evolution of agents from those based on “browser use” (that is, working with the interfaces designed for humans) to those based on making API calls (that is, working with the interfaces designed for traditional programs) to those based on MCP (relying on the intelligence of LLMs to read documents that explain the tools that are available to do a task). An MCP server looks a lot like the formalization of prompt and context engineering into components. A look at what purports to be a leaked system prompt for ChatGPT suggests that the pattern of MCP servers was already hidden in the prompts of proprietary AI apps: “Here’s how I want you to act. Here are the things that you should and should not do. Here are the tools available to you.”

But while system prompts are bespoke, MCP and skills are a step towards formalizing plain text instructions to an LLM so that they can become reusable components. In short, MCP and skills are early steps towards a system of what we can call “fuzzy function calls.”

Fuzzy Function Calls: Magic Words Made Reliable and Reusable

This view of how prompting and context engineering fit with traditional programming connects to something I wrote about recently: LLMs natively understand high-level concepts like “plan,” “test,” and “deploy”; industry standard terms like “TDD” (Test Driven Development) or “PRD” (Product Requirements Document); competitive features like “study mode”; or specific file formats like “.md file.” These “magic words” are prompting shortcuts that bring in dense clusters of context and trigger particular patterns of behavior that have specific use cases.

But right now, these magic words are unmodifiable. They exist in the model’s training, within system prompts, or locked inside proprietary features. You can use them if you know about them, and you can write prompts to modify how they work in your current session. But you can’t inspect them to understand exactly what they do, you can’t tweak them for your needs, and you can’t share your improved version with others.

Skills and MCPs are a way to make magic words visible and extensible. They formalize the instructions and patterns that make an LLM application work, and they make those instructions something you can read, modify, and share.

Take ChatGPT’s study mode as an example. It’s a particular way of helping someone learn, by asking comprehension questions, testing understanding, and adjusting difficulty based on responses. That’s incredibly valuable. But it’s locked inside ChatGPT’s interface. You can’t even access it via the ChatGPT API. What if study mode was published as a skill? Then you could:

  • See exactly how it works. What instructions guide the interaction?
  • Modify it for your subject matter. Maybe study mode for medical students needs different patterns than study mode for language learning.
  • Fork it into variants. You might want a “Socratic mode” or “test prep mode” that builds on the same foundation.
  • Use it with your own content and tools. You might combine it with an MCP server that accesses your course materials.
  • Share your improved version and learn from others’ modifications.

This is the next level of AI programming “up the stack.” You’re not training models or vibe coding Python. You’re elaborating on concepts the model already understands, more adapted to specific needs, and sharing them as building blocks others can use.
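As a rough sketch of where this could go, here is what a “fuzzy function” might look like as an ordinary, inspectable Python object; the study-mode instructions and the call_llm stub are hypothetical placeholders for whatever skill text and model client you actually use:

```python
# A sketch of a "fuzzy function call": a reusable, inspectable prompt component
# wrapped as an ordinary Python function. The study-mode instructions and the
# call_llm() stub are hypothetical.
from dataclasses import dataclass

STUDY_MODE_SKILL = """\
You are a tutor. Ask one comprehension question at a time about {topic},
wait for the learner's answer, then adjust difficulty based on how they did.
Never reveal an answer before the learner has attempted it.
"""


def call_llm(prompt: str) -> str:
    """Placeholder for your model client (OpenAI, Anthropic, local, ...)."""
    raise NotImplementedError("wire this up to the LLM client of your choice")


@dataclass
class FuzzyFunction:
    """A versionable, forkable prompt component with a typed entry point."""
    template: str

    def __call__(self, **params: str) -> str:
        return call_llm(self.template.format(**params))


study_mode = FuzzyFunction(STUDY_MODE_SKILL)
# study_mode(topic="cell biology")  -> the tutor's first question
```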

Building reusable libraries of fuzzy functions is the future of open source AI.

The Economics of Participation

There’s a deeper pattern here that connects to a rich tradition in economics: mechanism design. Over the past few decades, economists like Paul Milgrom and Al Roth won Nobel Prizes for showing how to design better markets: matching systems for medical residents, spectrum auctions for wireless licenses, kidney exchange networks that save lives. These weren’t just theoretical exercises. They were practical interventions that created more efficient, more equitable outcomes by changing the rules of the game.

Some tech companies understood this. As chief economist at Google, Hal Varian didn’t just analyze ad markets, he helped design the ad auction that made Google’s business model work. At Uber, Jonathan Hall applied mechanism design insights to dynamic pricing and marketplace matching to build a “thick market” of passengers and drivers. These economists brought economic theory to bear on platform design, creating systems where value could flow more efficiently between participants.

Though not guided by economists, the web and the open source software revolution were also not just technical advances but breakthroughs in market design. They created information-rich, participatory markets where barriers to entry were lowered. It became easier to learn, create, and innovate. Transaction costs plummeted. Sharing code or content went from expensive (physical distribution, licensing negotiations) to nearly free. Discovery mechanisms emerged: Search engines, package managers, and GitHub made it easy to find what you needed. Reputation systems were discovered or developed. And of course, network effects benefited everyone. Each new participant made the ecosystem more valuable.

These weren’t accidents. They were the result of architectural choices that made internet-enabled software development into a generative, participatory market.

AI desperately needs similar breakthroughs in mechanism design. Right now, most economic analysis of AI focuses on the wrong question: “How many jobs will AI destroy?” This is the mindset of an extractive system, where AI is something done to workers and to existing companies rather than with them. The right question is: “How do we design AI systems that create participatory markets where value can flow to all contributors?”

Consider what’s broken right now:

  • Attribution is invisible. When an AI model benefits from training on someone’s work, there’s no mechanism to recognize or compensate for that contribution.
  • Value capture is concentrated. A handful of companies capture the gains, while millions of content creators, whose work trained the models and are consulted during inference, see no return.
  • Improvement loops are closed. If you find a better way to accomplish a task with AI, you can’t easily share that improvement or benefit from others’ discoveries.
  • Quality signals are weak. There’s no good way to know if a particular skill, prompt, or MCP server is well-designed without trying it yourself.

MCP and skills, viewed through this economic lens, are early-stage infrastructure for a participatory AI market. The MCP Registry and skills gallery are primitive but promising marketplaces with discoverable components and inspectable quality. When a skill or MCP server is useful, it’s a legible, shareable artifact that can carry attribution. While this may not redress the “original sin” of copyright violation during model training, it does perhaps point to a future where content creators, not just AI model creators and app developers, may be able to monetize their work.

But we’re nowhere near having the mechanisms we need. We need systems that efficiently match AI capabilities with human needs, that create sustainable compensation for contribution, that enable reputation and discovery, that make it easy to build on others’ work while giving them credit.

This isn’t just a technical challenge. It’s a challenge for economists, policymakers, and platform designers to work together on mechanism design. The architecture of participation isn’t just a set of values. It’s a powerful framework for building markets that work. The question is whether we’ll apply these lessons of open source and the web to AI or whether we’ll let AI become an extractive system that destroys more value than it creates.

A Call to Action

I’d love to see OpenAI, Google, Meta, and the open source community develop a robust architecture of participation for AI.

Make innovations inspectable. When you build a compelling feature or an effective interaction pattern or a useful specialization, consider publishing it in a form others can learn from. Not as a closed app or an API to a black box but as instructions, prompts, and tool configurations that can be read and understood. Sometimes competitive advantage comes from what you share rather than what you keep secret.

Support open protocols. MCP’s early success demonstrates what’s possible when the industry rallies around an open standard. Since Anthropic introduced it in late 2024, MCP has been adopted by OpenAI (across ChatGPT, the Agents SDK, and the Responses API), Google (in the Gemini SDK), Microsoft (in Azure AI services), and a rapidly growing ecosystem of development tools from Replit to Sourcegraph. This cross-platform adoption proves that when a protocol solves real problems and remains truly open, companies will embrace it even when it comes from a competitor. The challenge now is to maintain that openness as the protocol matures.

Create pathways for contribution at every level. Not everyone needs to fork model weights or even write MCP servers. Some people should be able to contribute a clever prompt template. Others might write a skill that combines existing tools in a new way. Still others will build infrastructure that makes all of this easier. All of these contributions should be possible, visible, and valued.

Document magic. When your model responds particularly well to certain instructions, patterns, or concepts, make those patterns explicit and shareable. The collective knowledge of how to work effectively with AI shouldn’t be scattered across X threads and Discord channels. It should be formalized, versioned, and forkable.

Reinvent open source licenses. Take into account the need for recognition not only during training but inference. Develop protocols that help manage rights for data that flows through networks of AI agents.

Engage with mechanism design. Building a participatory AI market isn’t just a technical problem, it’s an economic design challenge. We need economists, policymakers, and platform designers collaborating on how to create sustainable, participatory markets around AI. Stop asking “How many jobs will AI destroy?” and start asking “How do we design AI systems that create value for all participants?” The architecture choices we make now will determine whether AI becomes an extractive force or an engine of broadly shared prosperity.

The future of programming with AI won’t be determined by who publishes model weights. It’ll be determined by who creates the best ways for ordinary developers to participate, contribute, and build on each other’s work. And that includes the next wave of developers: users who can create reusable AI skills based on their special knowledge, experience, and human perspectives.

We’re at a choice point. We can make AI development look like app stores and proprietary platforms, or we can make it look like the open web and the open source lineages that descended from Unix. I know which future I’d like to live in.


Footnotes

  1. I shared a draft of this piece with members of the Anthropic MCP and Skills team, and in addition to providing a number of helpful technical improvements, they confirmed a number of points where my framing captured their intentions. Comments ranged from “Skills were designed with composability in mind. We didn’t want to confine capable models to a single system prompt with limited functions” to “I love this phrasing since it leads into considering the models as the processing power, and showcases the need for the open ecosystem on top of the raw power a model provides” and “In a recent talk, I compared the models to processors, agent runtimes/orchestrations to the OS, and Skills as the application.” However, all of the opinions are my own and Anthropic is not responsible for anything I’ve said here.

Job for 2027: Senior Director of Million-Dollar Regexes

24 November 2025 at 07:04
The following article originally appeared on Medium and is being republished here with the author’s permission.

Don’t get me wrong, I’m up all night using these tools.

But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.

I joked that this was going to be a “million-dollar regular expression.”

Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: A million messages a day at ~50K characters each works out to around 12.5 billion tokens daily, or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
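For reference, the arithmetic behind those numbers, treating the per-token prices (and the four-characters-per-token rule of thumb) as assumptions that may have changed by the time you read this:

```python
# Back-of-the-envelope cost check, using the article's assumptions: roughly
# 4 characters per token and per-million-input-token prices that were current
# when this was written (both may have changed).
DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
TOKENS_PER_DAY = DOCS_PER_DAY * CHARS_PER_DOC / 4          # ~12.5 billion

PRICE_PER_M_TOKENS = {"gpt-5": 1.25, "gpt-5-nano": 0.05}   # USD, input tokens (assumed)

for model, price in PRICE_PER_M_TOKENS.items():
    daily = TOKENS_PER_DAY / 1_000_000 * price
    print(f"{model}: ${daily:,.0f}/day, ${daily * 365:,.0f}/year")
# gpt-5:      ~$15,625/day, ~$5.7M/year
# gpt-5-nano:    ~$625/day, ~$228K/year
```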

That’s a success. You “saved” $5.77 million a year…

How about running this code for a million documents a day? How much would this cost:

import re; s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)

A plain old EC2 instance could handle this… A single EC2 instance—something like an m1.small at 30 bucks a month—could churn through the same workload with a regex and cost you a few hundred dollars a year.

Which means that in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that—can you fix it?”

From $15,000/day to $0.96/day—I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is way more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.

How Agentic AI Empowers Architecture Governance

19 November 2025 at 12:01

One of the principles in our upcoming book Architecture as Code is the ability for architects to design automated governance checks for important architectural concerns, creating fast feedback loops when things go awry. This idea isn’t new—Neal and his coauthors Rebecca Parsons and Patrick Kua espoused this idea back in 2017 in the first edition of Building Evolutionary Architectures, and many of our clients adopted these practices with great success. However, our most ambitious goals were largely thwarted by a common problem in modern architectures: brittleness. Fortunately, the advent of the Model Context Protocol (MCP) and agentic AI has largely solved this problem for enterprise architects.

Fitness Functions

Building Evolutionary Architectures defines the concept of an architectural fitness function: any mechanism that provides an objective integrity check for architectural characteristics. Architects can think of fitness functions sort of like unit tests, but for architectural concerns.
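As a minimal illustration of the idea, and not tied to any particular library, a structural fitness function can be an ordinary test that fails when a dependency rule is broken; the layer names here are hypothetical:

```python
# A minimal structural fitness function, written as an ordinary test: fail if
# any module under domain/ imports from the infrastructure layer. The layer
# names are hypothetical; real projects would typically use a library such as
# ArchUnit or PyTestArch instead.
import pathlib
import re

FORBIDDEN_IMPORT = re.compile(r"^\s*(from|import)\s+infrastructure\b", re.MULTILINE)


def test_domain_does_not_import_infrastructure():
    violations = [
        str(path)
        for path in pathlib.Path("domain").rglob("*.py")
        if FORBIDDEN_IMPORT.search(path.read_text())
    ]
    assert not violations, f"domain modules importing infrastructure: {violations}"
```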

While many fitness functions run like unit tests to test structure (using tools like ArchUnit, NetArchTest, PyTestArch, arch-go, and so on), architects can write fitness functions to perform all sorts of important checks, including tasks normally reserved for relational databases.

Fitness functions and referential integrity

Consider the architecture illustrated in Figure 1.

Figure 1: Strategically splitting a database in a distributed architecture

In Figure 1, the team has decided to split the data into two databases for better scalability and availability. However, the common disadvantage of that approach lies with the fact that the team can no longer rely on the database to enforce referential integrity. In this situation, each ticket must have a corresponding customer to model this workflow correctly.

While many teams seem to think that referential integrity is only possible within a relational database, we separate the governance activity (data integrity) from the implementation (the relational database) and realize we can create our own check using an architectural fitness function, as shown in Figure 2.

Figure 2: Implementing referential integrity as a fitness function

In Figure 2, the architect has created a small fitness function that monitors the queue between customer and ticket. When the queue depth drops to zero (meaning that the system isn’t processing any messages), the fitness function creates a set of customer keys from the customer service and a set of customer foreign keys from the ticket service and asserts that all of the ticket foreign keys are contained within the set of customer keys.

Why not just query the databases directly from the fitness function? Abstracting them as sets allows flexibility—querying across databases on a constant basis introduces overhead that may have negative side effects. Abstracting the fitness function check from the mechanics of how the data is stored to an abstract data structure has at least a couple of advantages. First, using sets allows architects to cache nonvolatile data (like customer keys), avoiding constant querying of the database. Many solutions exist for write-through caches in the rare event we do add a customer. Second, using sets of keys abstracts us from actual data items. Data engineers prefer synthetic keys to using domain data; the same is true for architects. While the database schema might change over time, the team will always need the relationship between customers and tickets, which this fitness function validates in an abstract way.
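A minimal sketch of that check, with the two fetch functions stubbed with in-memory data; in a real system they would be thin adapters over each service, with the customer keys cached since they rarely change:

```python
# A minimal sketch of the referential-integrity fitness function described
# above. The two fetch functions are hypothetical adapters over each service;
# they are stubbed with in-memory data here so the check itself is runnable.
def fetch_customer_keys() -> set[str]:
    # In production: ask the customer service (and cache, since these keys
    # rarely change); stubbed for illustration.
    return {"C-1", "C-2", "C-3"}


def fetch_ticket_customer_fks() -> set[str]:
    # In production: ask the ticket service for its customer foreign keys.
    return {"C-1", "C-3"}


def check_referential_integrity(queue_depth: int) -> None:
    """Run only once the queue between the services has drained."""
    if queue_depth != 0:
        return
    orphans = fetch_ticket_customer_fks() - fetch_customer_keys()
    assert not orphans, f"tickets reference missing customers: {sorted(orphans)}"


check_referential_integrity(queue_depth=0)  # passes with the stub data above
```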

Who executes this code? As this problem is typical in distributed architectures such as microservices, the common place to execute this governance code is within the service mesh of the microservices architecture. Service mesh is a general pattern for handling operational concerns in microservices, such as logging, monitoring, naming, service discovery, and other nondomain concerns. In mature microservices ecosystems, the service mesh also acts as a governance mesh, applying fitness functions and other rules at runtime.

This is a common way that architects at the application level can validate data integrity, and we’ve implemented these types of fitness functions on hundreds of projects. However, the specificity of the implementation details makes it difficult to expand the scope of these types of fitness functions to the enterprise architect level because they include too many implementation details about how the project works.

Brittleness for metadomains

One of the key lessons from domain-driven design was the idea of keeping implementation details as tightly bound as possible, using anticorruption layers to prevent integration points from understanding too many details. Architects have embraced this philosophy in architectures like microservices.

Yet we see the same problem here at the metalevel, where enterprise architects would like to broadly control concerns like data integrity yet are hampered by the distance and specificity of the governance requirement. Distance refers to the scope of the activity. While application and integration architects have a narrow scope of responsibility, enterprise architects by their nature sit at the enterprise level. Thus, for an enterprise architect to enforce governance such as referential integrity requires them to know too many specific details about how the team has implemented the project.

One of our biggest global clients has a role within their enterprise architecture group called evolutionary architect, whose job is to identify global governance concerns, and we have other clients who have tried to implement this level of holistic governance with their enterprise architects. However, the brittleness defeats these efforts: As soon as the team needs to change an implementation detail, the fitness function breaks. Even though we often couch fitness functions as “unit tests for architecture,” in reality, they break much less often than unit tests. (How often do changes affect some fundamental architectural concern versus a change to the domain?) However, by exposing implementation details outside the project to enterprise architects, these fitness functions do break enough to limit their value.

We’ve tried a variety of anticorruption layers for metaconcerns, but generative AI and MCP have provided the best solution to date.

MCP and Agentic Governance

MCP defines a general integration layer for agents to query and consume capabilities within a particular metascope. For example, teams can set up an MCP server at the application or integration architecture level to expose tools and data sources to AI agents. This provides the perfect anticorruption layer for enterprise architects to state the intent of governance without relying on implementation details.

This allows teams to implement the type of governance that the strategically minded enterprise architects want but create a level of indirection for the details. For example, see the updated referential integrity check illustrated in Figure 3.

Figure 3. Using MCP for indirection to hide the fitness function implementation details

In Figure 3, the enterprise architect issues the general request to validate referential integrity to the MCP server for the project. It in turn exposes fitness functions via tools (or data sources such as log files) to carry out the request.
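Here is roughly how that indirection could look in code, again assuming the MCP Python SDK's FastMCP helper; the fitness function itself is stubbed and the tool name is illustrative:

```python
# Sketch of the indirection in Figure 3: the project exposes its fitness
# function as an MCP tool, so an enterprise-level agent can state the intent
# ("validate referential integrity") without knowing the implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-governance")


def run_referential_integrity_check() -> list[str]:
    """Project-owned implementation detail, free to change as services evolve."""
    customer_keys = {"C-1", "C-2", "C-3"}      # really fetched from each service
    ticket_fks = {"C-1", "C-3"}
    return sorted(ticket_fks - customer_keys)  # orphaned foreign keys, if any


@mcp.tool()
def validate_referential_integrity() -> str:
    """The governance intent exposed to enterprise architects' agents."""
    orphans = run_referential_integrity_check()
    return "PASS" if not orphans else f"FAIL: orphaned customer keys {orphans}"


if __name__ == "__main__":
    mcp.run()
```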

By creating an anticorruption layer between the project details and enterprise architect, we can use MCP to handle implementation details so that when the project evolves in the future, it doesn’t break the governance because of brittleness, as shown in Figure 4.

Figure 4. Using agentic AI to create metalevel indirection

In Figure 4, the enterprise architect concern (validate referential integrity) hasn’t changed, but the project details have. The team added another service for experts, who work on tickets, meaning we now need to validate integrity across three databases. The team changes the internal MCP tool that implements the fitness function, and the enterprise architect request stays the same.

This allows enterprise architects to effectively state governance intent without diving into implementation details, removing the brittleness of far-reaching fitness functions and enabling much more proactive holistic governance by architects at all levels.

Defining the Intersections of Architecture

In Architecture as Code, we discuss nine different intersections with software architecture and other parts of the software development ecosystem (data representing one of them), all expressed as architectural fitness functions (the “code” part of architecture as code). In defining the intersection of architecture and enterprise architect, we can use MCP and agents to state intent holistically, deferring the actual details to individual projects and ecosystems. This solves one of the nagging problems for enterprise architects who want to build more automated feedback loops within their systems.

MCP is almost ideally suited for this purpose, designed to expose tools, data sources, and prompt libraries to external contexts outside a particular project domain. This allows enterprise architects to holistically define broad intent and leave it to teams to implement (and evolve) their solutions.

X as code (where X can be a wide variety of things) typically arises when the software development ecosystem reaches a certain level of maturity and automation. Teams tried for years to make infrastructure as code work, but it didn’t until tools such as Puppet and Chef came along that could enable that capability. The same is true with other “as code” initiatives (security, policy, and so on): The ecosystem needs to provide tools and frameworks to allow it to work. Now, with the combination of powerful fitness function libraries for a wide variety of platforms and ecosystem innovations such as MCP and agentic AI, architecture itself has enough support to join the “as code” communities.


Learn more about how AI is reshaping enterprise architecture at the Software Architecture Superstream on December 9. Join host Neal Ford and a lineup of experts including Metro Bank’s Anjali Jain and Philip O’Shaughnessy, Vercel’s Dom Sipowicz, Intel’s Brian Rogers, Microsoft’s Ron Abellera, and Equal Experts’ Lewis Crawford to hear hard-won insights about building adaptive, AI-ready architectures that support continuous innovation, ensure governance and security, and align seamlessly with business goals.


Build to Last

19 November 2025 at 05:55
The following originally appeared on fast.ai and is reposted here with the author’s permission.

I’ve spent decades teaching people to code, building tools that help developers work more effectively, and championing the idea that programming should be accessible to everyone. Through fast.ai, I’ve helped millions learn not just to use AI but to understand it deeply enough to build things that matter.

But lately, I’ve been deeply concerned. The AI agent revolution promises to make everyone more productive, yet what I’m seeing is something different: developers abandoning the very practices that lead to understanding, mastery, and software that lasts. When CEOs brag about their teams generating 10,000 lines of AI-written code per day, when junior engineers tell me they’re “vibe-coding” their way through problems without understanding the solutions, are we racing toward a future where no one understands how anything works, and competence craters?

I needed to talk to someone who embodies the opposite approach: someone whose code continues to run the world decades after he created it. That’s why I called Chris Lattner, cofounder and CEO of Modular AI and creator of LLVM, the Clang compiler, the Swift programming language, and the MLIR compiler infrastructure.

Chris and I chatted on Oct 5, 2025, and he kindly let me record the conversation. I’m glad I did, because it turned out to be thoughtful and inspiring. Check out the video for the full interview, or read on for my summary of what I learned.

Talking with Chris Lattner

Chris Lattner builds infrastructure that becomes invisible through ubiquity.

Twenty-five years ago, as a PhD student, he created LLVM: the most fundamental system for translating human-written code into instructions computers can execute. In 2025, LLVM sits at the foundation of most major programming languages: the Rust that powers Firefox, the Swift running on your iPhone, and even Clang, a C++ compiler created by Chris that Google and Apple now use to create their most critical software. He describes the Swift programming language he created as “Syntax sugar for LLVM”. Today it powers the entire iPhone/iPad ecosystem.

When you need something to last not just years but decades, to be flexible enough that people you’ll never meet can build things you never imagined on top of it, you build it the way Chris built LLVM, Clang, and Swift.

I first met Chris when he arrived at Google in 2017 to help them with TensorFlow. Instead of just tweaking it, he did what he always does: he rebuilt from first principles. He created MLIR (think of it as LLVM for modern hardware and AI), and then left Google to create Mojo: a programming language designed to finally give AI developers the kind of foundation that could last.

Chris architects systems that become the bedrock others build on for decades, by being a true craftsman. He cares deeply about the craft of software development.

I told Chris about my concerns, and the pressures I was feeling as both a coder and a CEO:

“Everybody else around the world is doing this, ‘AGI is around the corner. If you’re not doing everything with AI, you’re an idiot.’ And honestly, Chris, it does get to me. I question myself… I’m feeling this pressure to say, ‘Screw craftsmanship, screw caring.’ We hear VCs say, ‘My founders are telling me they’re getting out 10,000 lines of code a day.’ Are we crazy, Chris? Are we old men yelling at the clouds, being like, ‘Back in my day, we cared about craftsmanship’? Or what’s going on?”

Chris told me he shares my concerns:

“A lot of people are saying, ‘My gosh, tomorrow all programmers are going to be replaced by AGI, and therefore we might as well give up and go home. Why are we doing any of this anymore? If you’re learning how to code or taking pride in what you’re building, then you’re not doing it right.’ This is something I’m pretty concerned about…

But the question of the day is: how do you build a system that can actually last more than six months?”

He showed me that the answer to that question is timeless, and actually has very little to do with AI.

Design from First Principles

Chris’s approach has always been to ask fundamental questions. “For me, my journey has always been about trying to understand the fundamentals of what makes something work,” he told me. “And when you do that, you start to realize that a lot of the existing systems are actually not that great.”

When Chris started LLVM over Christmas break in 2000, he was asking: what does a compiler infrastructure need to be, fundamentally, to support languages that don’t exist yet? When he came into the AI world he was eager to learn the problems I saw with TensorFlow and other systems. He then zoomed into what AI infrastructure should look like from the ground up. Chris explained:

“The reason that those systems were fundamental, scalable, successful, and didn’t crumble under their own weight is because the architecture of those systems actually worked well. They were well-designed, they were scalable. The people that worked on them had an engineering culture that they rallied behind because they wanted to make them technically excellent.

In the case of LLVM, for example, it was never designed to support the Rust programming language or Julia or even Swift. But because it was designed and architected for that, you could build programming languages, Snowflake could go build a database optimizer—which is really cool—and a whole bunch of other applications of the technology came out of that architecture.”

Chris pointed out that he and I have a certain interest in common: “We like to build things, and we like to build things from the fundamentals. We like to understand them. We like to ask questions.” He has found (as have I!) that this is critical if you want your work to matter, and to last.

Of course, building things from the fundamentals doesn’t always work. But as Chris said, “if we’re going to make a mistake, let’s make a new mistake.” Doing the same thing as everyone else in the same way as everyone else isn’t likely to do work that matters.

Craftsmanship and Architecture

Chris pointed out that software engineering isn’t just about an individual churning out code: “A lot of evolving a product is not just about getting the results; it’s about the team understanding the architecture of the code.” And in fact it’s not even just about understanding, but that he’s looking for something much more than that. “For people to actually give a damn. For people to care about what they’re doing, to be proud of their work.”

I’ve seen that it’s possible for teams that care and build thoughtfully to achieve something special. I pointed out to him that “software engineering has always been about trying to get a product that gets better and better, and your ability to work on that product gets better and better. Things get easier and faster because you’re building better and better abstractions and better and better understandings in your head.”

Chris agreed. He again stressed the importance of thinking longer term:

“Fundamentally, with most kinds of software projects, the software lives for more than six months or a year. The kinds of things I work on, and the kinds of systems you like to build, are things that you continue to evolve. Look at the Linux kernel. The Linux kernel has existed for decades with tons of different people working on it. That is made possible by an architect, Linus, who is driving consistency, abstractions, and improvement in lots of different directions. That longevity is made possible by that architectural focus.”

This kind of deep work doesn’t just benefit the organization, but benefits every individual too. Chris said:

“I think the question is really about progress. It’s about you as an engineer. What are you learning? How are you getting better? How much mastery do you develop? Why is it that you’re able to solve problems that other people can’t?… The people that I see doing really well in their careers, their lives, and their development are the people that are pushing. They’re not complacent. They’re not just doing what everybody tells them to do. They’re actually asking hard questions, and they want to get better. So investing in yourself, investing in your tools and techniques, and really pushing hard so that you can understand things at a deeper level—I think that’s really what enables people to grow and achieve things that they maybe didn’t think were possible a few years before.”

This is what I tell my team too. The thing I care most about is whether they’re always improving at their ability to solve those problems.

Dogfooding

But caring deeply and thinking architecturally isn’t enough if you’re building in a vacuum.

I’m not sure it’s really possible to create great software if you’re not using it yourself, or working right next to your users. When Chris and his team were building the Swift language, they had to build it in a vacuum of Apple secrecy. He shares:

“The using your own product piece is really important. One of the big things that caused the IDE features and many other things to be a problem with Swift is that we didn’t really have a user. We were building it, but before we launched, we had one test app that was kind of ‘dogfooded’ in air quotes, but not really. We weren’t actually using it in production at all. And by the time it launched, you could tell. The tools didn’t work, it was slow to compile, crashed all the time, lots of missing features.”

His new Mojo project is taking a very different direction:

“With Mojo, we consider ourselves to be the first customer. We have hundreds of thousands of lines of Mojo code, and it’s all open source… That approach is very different. It’s a product of experience, but it’s also a product of building Mojo to solve our own problems. We’re learning from the past, taking best principles in.”

The result is evident. Already at this early stage models built on Mojo are getting state of the art results. Most of Mojo is written in Mojo. So if something isn’t working well, they are the first ones to notice.

We had a similar goal at fast.ai with our Solveit platform: we wanted to reach a point where most of our staff chose to do most of their work in Solveit, because they preferred it. (Indeed, I’m writing this article in Solveit right now!) Before we reached that point, I often had to force myself to use Solveit in order to experience firsthand the shortcomings of those early versions, so that I could deeply understand the issues. Having done so, I now appreciate even more how smoothly everything works!

But this kind of deep, experiential understanding is exactly what we risk losing when we delegate too much to AI.

AI, Craftsmanship, and Learning

Chris uses AI: “I think it’s a very important tool. I feel like I get a 10 to 20% improvement—some really fancy code completion and autocomplete.” But with Chris’ focus on the importance of craftsmanship and continual learning and improvement, I wondered if heavy AI (and particularly agent) use (“vibe coding”) might negatively impact organizations and individuals.

Chris: When you’re vibe-coding things, suddenly… another thing I’ve seen is that people say, ‘Okay, well maybe it’ll work.’ It’s almost like a test. You go off and say, ‘Maybe the agentic thing will go crank out some code,’ and you spend all this time waiting on it and coaching it. Then, it doesn’t work.

Jeremy: It’s like a gambling machine, right? Pull the lever again, try again, just try again.

Chris: Exactly. And again, I’m not saying the tools are useless or bad, but when you take a step back and you look at where it’s adding value and how, I think there’s a little bit too much enthusiasm of, ‘Well, when AGI happens, it’s going to solve the problem. I’m just waiting and seeing…’ Here’s another aspect of it: the anxiety piece. I see a lot of junior engineers coming out of school, and they’re very worried about whether they’ll be able to get a job. A lot of things are changing, and I don’t really know what’s going to happen. But to your point earlier, a lot of them say, ‘Okay, well, I’m just going to vibe-code everything,’ because this is ‘productivity’ in air quotes. I think that’s also a significant problem.

Jeremy: Seems like a career killer to me.

Chris: …If you get sucked into, ‘Okay, well I need to figure out how to make this thing make me a 10x programmer,’ it may be a path that doesn’t bring you to developing at all. It may actually mean that you’re throwing away your own time, because we only have so much time to live on this earth. It can end up retarding your development and preventing you from growing and actually getting stuff done.

At its heart, Chris’s concern is that AI-heavy coding and craftsmanship just don’t appear to be compatible:

“Software craftsmanship is the thing that AI code threatens. Not because it’s impossible to use properly—again, I use it, and I feel like I’m doing it well because I care a lot about the quality of the code. But because it encourages folks to not take the craftsmanship, design, and architecture seriously. Instead, you just devolve to getting your bug queue to be shallower and making the symptoms go away. I think that’s the thing that I find concerning.”

“What you want to get to, particularly as your career evolves, is mastery. That’s how you kind of escape the thing that everybody can do and get more differentiation… The concern I have is this culture of, ‘Well, I’m not even going to try to understand what’s going on. I’m just going to spend some tokens, and maybe it’ll be great.’”

I asked if he had some specific examples where he’s seen things go awry.

“I’ve seen a senior engineer, when a bug gets reported, let the agentic loop rip, go spend some tokens, and maybe it’ll come up with a bug fix and create a PR. This PR, however, was completely wrong. It made the symptom go away, so it ‘fixed’ the bug in air quotes, but it was so wrong that if it had been merged, it would have just made the product way worse. You’re replacing one bug with a whole bunch of other bugs that are harder to understand, and a ton of code that’s just in the wrong place doing the wrong thing. That is deeply concerning. The actual concern is not this particular engineer because, fortunately, they’re a senior engineer and smart enough not to just say, ‘Okay, pass this test, merge.’ We also do code review, which is a very important thing. But the concern I have is this culture of, ‘Well, I’m not even going to try to understand what’s going on. I’m just going to spend some tokens, and maybe it’ll be great. Now I don’t have to think about it.’ This is a huge concern because a lot of evolving a product is not just about getting the results; it’s about the team understanding the architecture of the code. If you’re delegating knowledge to an AI, and you’re just reviewing the code without thinking about what you want to achieve, I think that’s very, very concerning.”

Some folks have told me they think that unit tests are a particularly good place to look at using AI more heavily. Chris urges caution, however:

“AI is really great at writing unit tests. This is one of the things that nobody likes to do. It feels super productive to say, ‘Just crank out a whole bunch of tests,’ and look, I’ve got all this code, amazing. But there’s a problem, because unit tests are their own potential tech debt. The test may not be testing the right thing, or they might be testing a detail of the thing rather than the real idea of the thing… And if you’re using mocking, now you get all these super tightly bound implementation details in your tests, which make it very difficult to change the architecture of your product as things evolve. Tests are just like the code in your main application—you should think about them. Also, lots of tests take a long time to run, and so they impact your future development velocity.”
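To make the mocking concern concrete, here is a small, hypothetical Python example (the class and tests are invented for illustration, not taken from Chris’s codebase). The first test pins an internal helper with a mock, so any refactor of that helper breaks the test even when behavior is unchanged; the second asserts only on observable behavior.

```python
# Hypothetical example: TaxCalculator and _lookup_rate are invented names.
from unittest.mock import patch

class TaxCalculator:
    def _lookup_rate(self, region: str) -> float:
        return {"CA": 0.0725, "TX": 0.0625}.get(region, 0.0)

    def total(self, amount: float, region: str) -> float:
        return round(amount * (1 + self._lookup_rate(region)), 2)

# Brittle: pins an internal helper, so renaming or inlining _lookup_rate
# breaks this test even if total() still behaves correctly.
def test_total_mocks_internals():
    with patch.object(TaxCalculator, "_lookup_rate", return_value=0.10):
        assert TaxCalculator().total(100.0, "CA") == 110.0

# More durable: asserts only on observable behavior, so the internals
# are free to change as the architecture evolves.
def test_total_observable_behavior():
    assert TaxCalculator().total(100.0, "CA") == 107.25
```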

Part of the problem, Chris noted, is that many people point to high lines-of-code counts as a statistic to support the idea that AI is making a positive impact.

“To me, the question is not how do you get the most code. I’m not a CEO bragging about the number of lines of code written by AI; I think that’s a completely useless metric. I don’t measure progress based on the number of lines of code written. In fact, I see verbose, redundant, not well-factored code as a huge liability… The question is: how productive are people at getting stuff done and making the product better? This is what I care about.”

Underlying many of these behaviors is the belief that AGI is imminent, and that traditional approaches to software development are therefore obsolete. Chris has seen this movie before. “In 2017, I was at Tesla working on self-driving cars, leading the Autopilot software team. I was convinced that in 2020, autonomous cars would be everywhere and would be solved. It was this desperate race to go solve autonomy… But at the time, nobody even knew how hard that was. But what was in the air was: trillions of dollars are at stake, job replacement, transforming transportation… I think today, exactly the same thing is happening. It’s not about self-driving, although that is making progress, just a little bit less gloriously and immediately than people thought. But now it’s about programming.”

Chris thinks that, like all previous technologies, AI progress isn’t actually exponential. “I believe that progress looks like S-curves. Pre-training was a big deal. It seemed exponential, but it actually S-curved out and got flat as things went on. I think that we have a number of piled-up S-curves that are all driving forward amazing progress, but I at least have not seen that spark.”

The danger isn’t just that people might be wrong about AGI’s timeline—it’s what happens to their careers and codebases while they’re waiting. “Technology waves cause massive hype cycles, overdrama, and overselling,” Chris noted. “Whether it be object-oriented programming in the ’80s where everything’s an object, or the internet wave in the 2000s where everything has to be online otherwise you can’t buy a shirt or dog food. There’s truth to the technology, but what ends up happening is things settle out, and it’s less dramatic than initially promised. The question is, when things settle out, where do you as a programmer stand? Have you lost years of your own development because you’ve been spending it the wrong way?”

Chris is careful to clarify that he’s not anti-AI—far from it. “I am a maximalist. I want AI in all of our lives,” he told me. “However, the thing I don’t like is the people that are making decisions as though AGI or ASI were here tomorrow… Being paranoid, being anxious, being afraid of living your life and of building a better world seems like a very silly and not very pragmatic thing to do.”

Software Craftsmanship with AI

Chris sees the key as understanding the difference between using AI as a crutch versus using it as a tool that enhances your craftsmanship. He finds AI particularly valuable for exploration and learning:

“It’s amazing for learning a codebase you’re not familiar with, so it’s great for discovery. The automation features of AI are super important. Getting us out of writing boilerplate, getting us out of memorizing APIs, getting us out of looking up that thing from Stack Overflow; I think this is really profound. This is a good use. The thing that I get concerned about is if you go so far as to not care about what you’re looking up on Stack Overflow and why it works that way and not learning from it.”

One principle Chris and I share is the critical importance of tight iteration loops. For Chris, working on systems programming, this means “edit the code, compile, run it, get a test that fails, and then debug it and iterate on that loop… Running tests should take less than a minute, ideally less than 30 seconds.” He told me that when working on Mojo, one of the first priorities was “building VS Code support early because without tools that let you create quick iterations, all of your work is going to be slower, more annoying, and more wrong.”

My background is different—I am a fan of the Smalltalk, Lisp, and APL tradition where you have a live workspace and every line of code manipulates objects in that environment. When Chris and I first worked together on Swift for TensorFlow, the first thing I told him was “I’m going to need a notebook.” Within a week, he had built me complete Swift support for Jupyter. I could type something, see the result immediately, and watch my data transform step-by-step through the process. This is the Bret Victor “Inventing on Principle” style of being close to what you’re crafting.

If you want to maintain craftsmanship while using AI, you need tight iteration loops so you can see what’s happening. You need a live workspace where you (and the AI) are manipulating actual state, not just writing text files.

At fast.ai, we’ve been working to put this philosophy into practice with our Solveit platform. We discovered a key principle: the AI should be able to see exactly what the human sees, and the human should be able to see exactly what the AI sees at all times. No separate instruction files, no context windows that don’t match your actual workspace—the AI is right there with you, supporting you as you work.

This creates what I think of as “a third participant in this dialogue”—previously I had a conversation with my computer through a REPL, typing commands and seeing results. Now the AI is in that conversation too, able to see my code, my data, my outputs, and my thought process as I work through problems. When I ask “does this align with what we discussed earlier” or “have we handled this edge case,” the AI doesn’t need me to copy-paste context—it’s already there.

One of our team members, Nate, built something called ShellSage that demonstrates this beautifully. He realized that tmux already shows everything that’s happened in your shell session, so he just added a command that talks to an LLM. That’s it—about 100 lines of code. The LLM can see all your previous commands, questions, and output. By the next day, all of us were using it constantly. Another team member, Eric, built our Discord Buddy bot using this same approach—he didn’t write code in an editor and deploy it. He typed commands one at a time in a live symbol table, manipulating state directly. When it worked, he wrapped those steps into functions. No deployment, no build process—just iterative refinement of a running system.
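As a rough sketch of the ShellSage idea (not the actual fast.ai implementation): capture the visible tmux scrollback with tmux capture-pane and hand it to a model along with your question. The ask_llm function below is a placeholder for whichever LLM client you actually use.

```python
# A minimal sketch of the ShellSage idea, not the real implementation:
# grab the current tmux pane contents and send them, plus a question,
# to a language model.
import subprocess
import sys

def capture_tmux_pane(history_lines: int = 2000) -> str:
    """Return the visible pane plus recent scrollback as plain text."""
    return subprocess.run(
        ["tmux", "capture-pane", "-p", "-S", f"-{history_lines}"],
        capture_output=True, text=True, check=True,
    ).stdout

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in your preferred model client here; this is the
    # only part that depends on a particular LLM provider.
    raise NotImplementedError("plug in your model API of choice")

def main() -> None:
    question = " ".join(sys.argv[1:]) or "What went wrong in my last command?"
    context = capture_tmux_pane()
    print(ask_llm(f"Terminal session:\n{context}\n\nQuestion: {question}"))

if __name__ == "__main__":
    main()
```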

Eric Ries has been writing his new book in Solveit and the AI can see exactly what he writes. He asks questions like “does this paragraph align with the mission we stated earlier?” or “have we discussed this case study before?” or “can you check my editor’s notes for comments on this?” The AI doesn’t need special instructions or context management—it’s in the trenches with him, watching the work unfold. (I’m writing this article in Solveit right now, for the same reasons.)

I asked Chris about how he thinks about the approach we’re taking with Solveit: “instead of bringing in a junior engineer that can just crank out code, you’re bringing in a senior expert, a senior engineer, an advisor—somebody that can actually help you make better code and teach you things.”

How Do We Do Something Meaningful?

Chris and I both see a bifurcation coming. “It feels like we’re going to have a bifurcation of skills,” I told him, “because people who use AI the wrong way are going to get worse and worse. And the people who use it to learn more and learn faster are going to outpace the speed of growth of AI capabilities because they’re human with the benefit of that… There’s going to be this group of people that have learned helplessness and this maybe smaller group of people that everybody’s like, ‘How does this person know everything? They’re so good.’”

The principles that allowed LLVM to last 25 years—architecture, understanding, craftsmanship—haven’t changed. “The question is, when things settle out, where do you as a programmer stand?” Chris asked. “Have you lost years of your own development because you’ve been spending it the wrong way? And now suddenly everybody else is much further ahead of you in terms of being able to create productive value for the world.”

His advice is clear, especially for those just starting out: “If I were coming out of school, my advice would be don’t pursue that path. Particularly if everybody is zigging, it’s time to zag. What you want to get to, particularly as your career evolves, is mastery. So you can be the senior engineer. So you can actually understand things to a depth that other people don’t. That’s how you escape the thing that everybody can do and get more differentiation.”

The hype will settle. The tools will improve. But the question Chris poses remains: “How do we actually add value to the world? How do we do something meaningful? How do we move the world forward?” For both of us, the answer involves caring deeply about our craft, understanding what we’re building, and using AI not as a replacement for thinking but as a tool to think more effectively. If the goal is to build things that last, you’re not going to be able to outsource that to AI. You’ll need to invest deeply in yourself.

The Trillion Dollar Problem

18 November 2025 at 07:10

Picture this: You’re a data analyst on day one at a midsize SaaS company. You’ve got the beginnings of a data warehouse—some structured, usable data and plenty of raw data you’re not quite sure what to do with yet. But that’s not the real problem. The real problem is that different teams are doing their own thing: Finance has Power BI models loaded with custom DAX and Excel connections. Sales is using Tableau connected to the central data lake. Marketing has some bespoke solution you haven’t figured out yet. If you’ve worked in data for any number of years, this scene probably feels familiar.

Then a finance director emails: Why does ARR show as $250M in my dashboard when Sales just reported $275M in their call?

No problem, you think. You’re a data analyst; this is what you do. You start digging. What you find isn’t a simple calculation error. Finance and sales are using different date dimensions, so they’re measuring different time periods. Their definitions of what counts as “revenue” don’t match. Their business unit hierarchies are built on completely different logic: one buried in a Power BI model, the other hardcoded in a Tableau calculation. You trace the problem through layers of custom notebooks, dashboard formulas, and Excel workbooks and realize that creating a single version of the truth that’s governable, stable, and maintainable isn’t going to be easy. It might not even be possible without rebuilding half the company’s data infrastructure and achieving a level of compliance from other data users that would be a full-time job in itself.

This is where the semantic layer comes in—what VentureBeat has called the “$1 trillion AI problem.” Think of it as a universal translator for your data: It’s a single place where you define what your metrics mean, how they’re calculated, and who can access them. The semantic layer is software that sits between your data sources and your analytics tools, pulling in data from wherever it lives, adding critical business context (relationships, calculations, descriptions), and serving it to any downstream tool in a consistent format. The result? Secure, performant access that enables genuinely practical self-service analytics.

Why does this matter now? As we’ll see when we return to the ARR problem, one force is driving the urgency: AI.

Legacy BI tools were never built with AI in mind, creating two critical gaps. First, all the logic and calculations scattered across your Power BI models, Tableau workbooks, and Excel spreadsheets aren’t accessible to AI tools in any meaningful way. Second, the data itself lacks the business context AI needs to use it accurately. An LLM looking at raw database tables doesn’t know that “revenue” means different things to finance and sales, or why certain records should be excluded from ARR calculations.

The semantic layer solves both problems. It makes data more trustworthy across traditional BI tools like Tableau, Power BI, and Excel while also giving AI tools the context they need to work accurately. Initial research shows near 100% accuracy across a wide range of queries when pairing a semantic layer with an LLM, compared to much lower performance when connecting AI directly to a data warehouse.

So how does this actually work? Let’s return to the ARR dilemma.

The core problem: multiple versions of the truth. Sales has one definition of ARR; finance has another. Analysts caught in the middle spend days investigating, only to end up with “it depends” as their answer. Decision making grinds to a halt because no one knows which number to trust.

This is where the semantic layer delivers its biggest value: a single source for defining and storing metrics. Think of it as the authoritative dictionary for your company’s data. ARR gets one definition, one calculation, one source of truth, all stored in the semantic layer and accessible to everyone who needs it.

You might be thinking, “Can’t I do this in my data warehouse or BI tool?” Technically, yes. But here’s what makes semantic layers different: modularity and context.

Once you define ARR in the semantic layer, it becomes a modular, reusable object—any tool that connects to it can use that metric: Tableau, Power BI, Excel, your new AI chatbot, whatever. The metric carries its business context with it: what it means, how it’s calculated, who can access it, and why certain records are included or excluded. You’re not rebuilding the logic in each tool; you’re referencing a single, governed definition.
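As a rough illustration of what “define once, reuse everywhere” can look like, here is a minimal Python sketch of a metric registry. The table, columns, SQL, and governance fields are invented for this example; real semantic-layer products have their own modeling languages, so this only shows the shape of the idea.

```python
# A toy illustration of "define the metric once, reuse it everywhere."
# The schema, SQL, and governance fields are invented for this example.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    description: str                    # business context travels with the metric
    sql: str                            # one governed calculation
    owners: list[str] = field(default_factory=list)
    allowed_roles: list[str] = field(default_factory=list)

SEMANTIC_LAYER = {
    "arr": Metric(
        name="arr",
        description="Annual recurring revenue, excluding trial customers.",
        sql="""
            SELECT SUM(monthly_contract_value_usd) * 12
            FROM subscriptions
            WHERE status = 'active' AND is_trial = FALSE
        """,
        owners=["finance-data-team"],
        allowed_roles=["finance", "sales", "executive"],
    ),
}

def get_metric_sql(metric_name: str, role: str) -> str:
    """Every consumer (BI tool, notebook, chatbot) resolves the metric the same way."""
    metric = SEMANTIC_LAYER[metric_name]
    if role not in metric.allowed_roles:
        raise PermissionError(f"{role} may not query {metric_name}")
    return metric.sql
```

When the definition changes, say to exclude trial customers or add a business unit, you edit the one governed entry and every consumer picks up the new calculation on its next query.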

This creates three immediate wins:

  • Single version of truth: Everyone uses the same ARR calculation, whether they’re in finance or sales or they’re pulling it into a machine learning model.
  • Effortless lineage: You can trace exactly where ARR is used across your organization and see its full calculation path.
  • Change management that actually works: When your CFO decides next quarter that ARR should exclude trial customers, you update the definition once in the semantic layer. Every dashboard, report, and AI tool that uses ARR gets the update automatically. No hunting through dozens of Tableau workbooks, Power BI models, and Python notebooks to find every hardcoded calculation.

Which brings us to the second key function of a semantic layer: interoperability.

Back to our finance director and that ARR question. With a semantic layer in place, here’s what changes. She opens Excel and pulls ARR directly from the semantic layer: $265M. The sales VP opens his Tableau dashboard, connects to the same semantic layer, and sees $265M. Your company’s new AI chatbot? Someone asks, “What’s our Q3 ARR?” and it queries the semantic layer: $265M. Same metric, same calculation, same answer, regardless of the tool.

This is what makes semantic layers transformative. They sit between your data sources and every tool that needs to consume that data. Power BI, Tableau, Excel, Python notebooks, LLMs: the semantic layer doesn’t care. You define the metric once, and every tool can access it through standard APIs or protocols. No rebuilding the logic in DAX for Power BI, then again in Tableau’s calculation language, then again in Excel formulas, then again for your AI chatbot.

Before semantic layers, interoperability meant compromise. You’d pick one tool as the “source of truth” and force everyone to use it, or you’d accept that different teams would have slightly different numbers. Neither option scales. With a semantic layer, your finance team keeps Excel, your sales team keeps Tableau, your data scientists keep Python, and your executives can ask questions in plain English to an AI assistant. They all get the same answer because they’re all pulling from the same governed definition.

Back to day one. You’re still a data analyst at that SaaS company, but this time there’s a semantic layer in place.

The finance director emails, but the question is different: “Can we update ARR to include our new business unit?”

Without a semantic layer, this request means days of work: updating Power BI models, Tableau dashboards, Excel reports, and AI integrations one by one. Coordinating with other analysts to understand their implementations. Testing everything. Hoping nothing breaks.

With a semantic layer? You log in to your semantic layer software and see the ARR definition: the calculation, the source tables, every tool using it. You update the logic once to include the new business unit. Test it. Deploy it. Every downstream tool—Power BI, Tableau, Excel, the AI chatbot—instantly reflects the change.

What used to take days now takes hours. What used to require careful coordination across teams now happens in one place. The finance director gets her answer, Sales sees the same number, and nobody’s reconciling spreadsheets at 5PM on Friday.

This is what analytics can be: consistent, flexible, and actually self-service. The semantic layer doesn’t just solve the ARR problem—it solves the fundamental challenge of turning data into trusted insights. One definition, any tool, every time.

Countering a Brutal Job Market with AI

17 November 2025 at 07:12

Headlines surfaced by a simple “job market” search describe it as “a humiliation ritual” or “hell” and “an emerging crisis for entry-level workers.” The unemployment rate in the US for recent graduates is at an “unusually high” 5.8%—even Harvard Business School graduates have been taking months to find work. Inextricable from this conversation is AI, both for its potential to automate entry-level jobs and as a tool for employers to evaluate applications. But the widespread availability of generative AI platforms raises an overlooked question: How are job seekers themselves using AI?

An interview study with upcoming master’s graduates at an elite UK university* sheds some light. In contrast to popular narratives about “laziness” or “shortcuts,” AI use comes from job seekers trying to strategically tackle the digitally saturated, competitive reality of today’s job market. Here are the main takeaways:

They Use AI to Play an Inevitable Numbers Game

Job seekers described feeling the need to apply to a high volume of jobs because of how rare it is to get a response amid the competition. They send out countless applications on online portals and rarely receive so much as an automated rejection email. As Franco, a 29-year-old communications student, put it, particularly with “LinkedIn and job portals” saturating the market, his CV is just one “in a spreadsheet of 2,000 applicants.”

This context underlies how job seekers use AI, which allows them to spend less time on any given application by helping to cater résumés or write cover letters and thus put out more applications. Seoyeon, a 24-year-old communications student, describes how she faced repeated rejections no matter how carefully she crafted the application or how qualified she was.

[Employers] themselves are going to use AI to screen through those applications….And after a few rejections, it really frustrates you because you put in so much effort and time and passion for this one application to learn that it’s just filtered through by some AI….After that, it makes you lean towards, you know what, I’m just gonna put less effort into one application but apply for as many jobs as possible.

Seoyeon went on to say later that she even asks AI to tell her what “keywords” she should have in her application in light of AI in hiring systems.

Her reflection reveals that AI use is not a shortcut, but that it feels like a necessity to deal with the inevitable rejection and AI scanners, especially in light of companies themselves using AI to read applications—making her “passion” feel like a waste.

AI as a Savior to Emotional Labor

Applying to jobs means dealing with constant rejection and little human interaction, a deeply emotional process that students describe as “draining” and “torturing.” This illuminates why AI appeals as a way to reduce not just the time the labor takes but its emotional weight.

Franco felt that having to portray himself as “passionate” for hundreds of jobs that he would not even hear back from was an “emotional toll” that AI helped him manage.

Repeating this process to a hundred job applications, a hundred job positions and having to rewrite a cover letter in a way that sounds like if it was your dream, well I don’t know if you can have a hundred dreams.…I would say that it does have an emotional toll….I think that AI actually helps a lot in terms of, okay, I’m going to help you do this cover letter so you don’t have to mentally feel you’re not going to get the shot.

Using AI thus acted as a buffer for the emotional difficulties of being a job seeker, allowing students to conserve mental energy in a grueling process while still applying to many jobs.

The More Passionate They Are, the Less AI They Use

AI use was not uniform by any means, even though the job application process often requires the same materials. Job seekers had “passion parameters” in place: they dialed down their AI use for jobs they were more passionate about.

Joseph, a 24-year-old psychology student, put this “human involvement” as “definitely more than 50%” for a role he truly desires, whereas for a less interesting role, it’s about “20%–30%.” He differentiates this by describing how, when passion is involved, he does deep research into the company as opposed to relying on AI’s “summarized, nuanced-lacking information,” and writes the cover letter from scratch—only using AI to be critical of it. In contrast, for less desirable jobs, AI plays a much more generative role in creating the initial draft that he then edits.

This points to the fact that while AI feels important for labor efficiency, students do not use it indiscriminately, especially when passion is involved and they want to put their best foot forward.

They Understand AI’s Flaws (and Work Around Them)

In their own words, students are not heedlessly “copying and pasting” AI-generated materials. They are critical of AI tools and navigate them with their concerns in mind.

Common flaws in AI-generated material include sounding “robotic” and “machine-like,” with telltale “AI-sounding” words such as “explore” and “delve into.” Joseph asserted that he can easily tell which one is written by a human, because AI-generated text lacks the “passion and zeal” of someone who is genuinely hungry for the job.

Nandita, a 23-year-old psychology student, shared how AI’s tendency to “put you on a pedestal” came through in misrepresenting facts. When she asked AI to tailor her résumé, it embellished her experience of “a week-long observation in a psychology clinic” into “community service,” which she strongly felt it wasn’t—she surmised this happened because community service was mentioned in the job description she fed AI, and she caught it and corrected it.

Consequently, using AI in the job hunt is not a passive endeavor but requires vigilance and a critical understanding to ensure its flaws do not hurt you as a job seeker.

They Grapple with AI’s Larger Implications

Using AI is not an unconditional endorsement of the technology; all the students were cognizant of (and worried about) its wider social implications.

John, a 24-year-old data science student, drew a distinction between using AI in impersonal processes versus human experiences. While he would use it for “a cover letter” for a job he suspects will be screened by AI anyway, he worries how it will be used in other parts of life.

I think it’s filling in parts of people’s lives that they don’t realize are very fundamental to who they are as humans. One example I’ve always thought of is, if you need it for things like cover letters, [that’s] OK just because it’s something where it’s not very personal.…But if you can’t write a birthday card without using ChatGPT, that’s a problem.

Nandita voiced a similar critique, drawing on her psychology background; while she could see AI helping tasks like “admin work,” she worries about how it would be used for therapy. She argues that an AI therapist would be “100% a Western…thing” and would fail to connect with someone “from the rural area in India.”

The understanding of AI shows that graduates differentiate using it for impersonal processes, like job searching in the digital age, from more human-to-human situations where it poses a threat.

Some Grads Are Opting Out of AI Use

Though most people interviewed were using AI, some rejected it entirely. They voiced similar qualms that AI users had, including sounding “robotic” and not “human.” Julia, a 23-year-old law student, specifically mentioned that her field requires “language and persuasiveness,” with “a human tone” that AI cannot replicate, and that not using it would “set you apart” in job applications.

Mark, a 24-year-old sociology student, acknowledged the same concerns as AI users about a saturated online arms race, but instead of using AI to send out as many applications as possible, had a different strategy in mind: “talking to people in real life.” He described how he once secured a research job through a connection in the smoking area of a pub.

Importantly, these job seekers had similar challenges with the job market as AI users, but they opted for different strategies to handle it that emphasize human connection and voice.

Conclusion

For graduate job seekers, AI use is a layered strategy that is a direct response to the difficulties of the job market. It is not about cutting corners but carefully adapting to current circumstances that require new forms of digital literacy.

Moving away from a dialogue that frames job seekers as lazy or unable to write their own materials forces us to look at how the system itself can be improved for applicants and companies alike. If employers don’t want AI use, how can they create a process that makes room for human authenticity as opposed to AI-generated materials that sustain the broken cycle of hiring?

*All participant names are pseudonyms.

AI Overviews Shouldn’t Be “One Size Fits All”

13 November 2025 at 07:16

The following originally appeared on Asimov’s Addendum and is being republished here with the author’s permission.

The other day, I was looking for parking information at Dulles International Airport, and was delighted with the conciseness and accuracy of Google’s AI overview. It was much more convenient than being told that the information could be found at the flydulles.com website, visiting it, perhaps landing on the wrong page, and finding the information I needed after a few clicks. It’s also a win from the provider side. Dulles isn’t trying to monetize its website (except to the extent that it helps people choose to fly from there). The website is purely an information utility, and if AI makes it easier for people to find the right information, everyone is happy.

An AI overview of an answer found by consulting or training on Wikipedia is more problematic. The AI answer may lack some of the nuance and neutrality Wikipedia strives for. And while Wikipedia does make the information free for all, it depends on visitors not only for donations but also for the engagement that might lead people to become Wikipedia contributors or editors. The same may be true of other information utilities like GitHub and YouTube. Individual creators are incentivized to provide useful content by the traffic that YouTube directs to them and monetizes on their behalf.

And of course, an AI answer provided by illicitly crawling content that’s behind a subscription paywall is the source of a great deal of contention, even lawsuits. So content runs a gamut from “no problem crawling” to “do not crawl.”

[Figure: a spectrum of crawling permissions, from “no problem” through “needs nuance” to “don’t do this.”]

There are a lot of efforts to stop unwanted crawling, including Really Simple Licensing (RSL) and Cloudflare’s Pay Per Crawl. But we need a more systemic solution. Both of these approaches put the burden of expressing intent onto the creator of the content. It’s as if every school had to put up its own traffic signs saying “School Zone: Speed Limit 15 mph.” Even making “Do Not Crawl” the default puts a burden on content providers, since they must now affirmatively figure out what content to exclude from the default in order to be visible to AI.

Why aren’t we putting more of the burden on AI companies instead of putting all of it on the content providers? What if we asked companies deploying crawlers to observe common sense distinctions such as those that I suggested above? Most drivers know not to tear through city streets at highway speeds even without speed signs. Alert drivers take care around children even without warning signs. There are some norms that are self-enforcing. Drive at high speed down the wrong side of the road and you will soon discover why it’s best to observe the national norm. But most norms aren’t that way. They work when there’s consensus and social pressure, which we don’t yet have in AI. And only when that doesn’t work do we rely on the safety net of laws and their enforcement.

As Larry Lessig pointed out at the beginning of the Internet era, starting with his book Code and Other Laws of Cyberspace, governance is the result of four forces: law, norms, markets, and architecture (which can refer either to physical or technical constraints).

So much of the thinking about the problems of AI seems to start with laws and regulations. What if instead, we started with an inquiry about what norms should be established? Rather than asking ourselves what should be legal, what if we asked ourselves what should be normal? What architecture would support those norms? And how might they enable a market, with laws and regulations mostly needed to restrain bad actors, rather than preemptively limiting those who are trying to do the right thing?

I think often of a quote from the Chinese philosopher Lao Tzu, who said something like:

Losing the way of life, men rely on goodness. 
Losing goodness, they rely on laws.

I like to think that “the way of life” is not just a metaphor for a state of spiritual alignment, but rather, an alignment with what works. I first thought about this back in the late ’90s as part of my open source advocacy. The Free Software Foundation started with a moral argument, which it tried to encode into a strong license (a kind of law) that mandated the availability of source code. Meanwhile, other projects like BSD and the X Window System relied on goodness, using a much weaker license that asked only for recognition of those who created the original code. But “the way of life” for open source was in its architecture.

Both Unix (the progenitor of Linux) and the World Wide Web have what I call an architecture of participation. They were made up of small pieces loosely joined by a communications protocol that allowed anyone to bring something to the table as long as they followed a few simple rules. Systems that were open source by license but had a monolithic architecture tended to fail despite their license and the availability of source code. Those with the right cooperative architecture (like Unix) flourished even under AT&T’s proprietary license, as long as it was loosely enforced. The right architecture enables a market with low barriers to entry, which also means low barriers to innovation, with flourishing widely distributed.

Architectures based on communication protocols tend to go hand in hand with self-enforcing norms, like driving on the same side of the street. The system literally doesn’t work unless you follow the rules. A protocol embodies both a set of self-enforcing norms and “code” as a kind of law.

What about markets? In a lot of ways, what we mean by “free markets” is not that they are free of government intervention. It is that they are free of the economic rents that accrue to some parties because of outsized market power, position, or entitlements bestowed on them by unfair laws and regulations. This is not only a more efficient market, but one that lowers the barriers for new entrants, typically making more room not only for widespread participation and shared prosperity but also for innovation.

Markets don’t exist in a vacuum. They are mediated by institutions. And when institutions change, markets change.

Consider the history of the early web. Free and open source web browsers, web servers, and a standardized protocol made it possible for anyone to build a website. There was a period of rapid experimentation, which led to the development of a number of successful business models: free content subsidized by advertising, subscription services, and ecommerce.

Nonetheless, the success of the open architecture of the web eventually led to a system of attention gatekeepers, notably Google, Amazon, and Meta. Each of them rose to prominence because it solved for what Herbert Simon called the scarcity of attention. Information had become so abundant that it defied manual curation. Instead, powerful, proprietary algorithmic systems were needed to match users with the answers, news, entertainment, products, applications, and services they seek. In short, the great internet gatekeepers each developed a proprietary algorithmic invisible hand to manage an information market. These companies became the institutions through which the market operates.

They initially succeeded because they followed “the way of life.” Consider Google. Its success began with insights about what made an authoritative site, understanding that every link to a site was a kind of vote, and that links from sites that were themselves authoritative should count more than others. Over time, the company found more and more factors that helped it to refine results so that those that appeared highest in the search results were in fact what their users thought were the best. Not only that, the people at Google thought hard about how to make advertising that worked as a complement to organic search, popularizing “pay per click” rather than “pay per view” advertising and refining its ad auction technology such that advertisers only paid for results, and users were more likely to see ads that they were actually interested in. This was a virtuous circle that made everyone—users, information providers, and Google itself—better off. In short, enabling an architecture of participation and a robust market is in everyone’s interest.

Amazon too enabled both sides of the market, creating value not only for its customers but for its suppliers. Jeff Bezos explicitly described the company strategy as the development of a flywheel: helping customers find the best products at the lowest price draws more customers, more customers draw more suppliers and more products, and that in turn draws in more customers.

Both Google and Amazon made the markets they participated in more efficient. Over time, though, they “enshittified” their services for their own benefit. That is, rather than continuing to make solving the problem of efficiently allocating the user’s scarce attention their primary goal, they began to manipulate user attention for their own benefit. Rather than giving users what they wanted, they looked to increase engagement, or showed results that were more profitable for them even though they might be worse for the user. For example, Google took control over more and more of the ad exchange technology and began to direct the most profitable advertising to its own sites and services, which increasingly competed with the web sites that it originally had helped users to find. Amazon supplanted the primacy of its organic search results with advertising, vastly increasing its own profits while the added cost of advertising gave suppliers the choice of reducing their own profits or increasing their prices. Our research in the Algorithmic Rents project at UCL found that Amazon’s top advertising recommendations are not only ranked far lower by its organic search algorithm, which looks for the best match to the user query, but are also significantly more expensive.

As I described in “Rising Tide Rents and Robber Baron Rents,” this process of replacing what is best for the user with what is best for the company is driven by the need to keep profits rising when the market for a company’s once-novel services stops growing and starts to flatten out. In economist Joseph Schumpeter’s theory, innovators can earn outsized profits as long as their innovations keep them ahead of the competition, but eventually these “Schumpeterian rents” get competed away through the diffusion of knowledge. In practice, though, if innovators get big enough, they can use their power and position to profit from more traditional extractive rents. Unfortunately, while this may deliver short term results, it ends up weakening not only the company but the market it controls, opening the door to new competitors at the same time as it breaks the virtuous circle in which not just attention but revenue and profits flow through the market as a whole.

Unfortunately, in many ways, because of its insatiable demand for capital and the lack of a viable business model to fuel its scaling, the AI industry has gone in hot pursuit of extractive economic rents right from the outset. Seeking unfettered access to content, unrestrained by laws or norms, model developers have ridden roughshod over the rights of content creators, training not only on freely available content but also ignoring good-faith signals like subscription paywalls, robots.txt, and “do not crawl.” During inference, they exploit loopholes such as the fact that a paywall that comes up for users on a human timeframe briefly leaves content exposed long enough for bots to retrieve it. As a result, the market they have enabled is one of third-party black- and gray-market crawlers that give them plausible deniability as to the sources of their training or inference data, rather than the far more sustainable market that would come from discovering “the way of life” that would balance the incentives of human creators and AI derivatives.

Here are some broad-brush norms that AI companies could follow, if they understand the need to support and create a participatory content economy.

  • For any query, use the intelligence of your AI to judge whether the information being sought is likely to come from a single canonical source, or from multiple competing sources. For example, for my query about parking at Dulles Airport, it’s pretty likely that flydulles.com is a canonical source. Note, however, that there may be alternative providers, such as additional off-airport parking, and if so, include them in the list of sources to consult.
  • Check for a subscription paywall, licensing technologies like RSL, “do not crawl” or other indication in robots.txt, and if any of these things exists, respect it. (A minimal sketch of a robots.txt check appears after this list.)
  • Ask yourself if you are substituting for a unique source of information. If so, responses should be context-dependent. For example, for long form articles, provide basic info but make clear there’s more depth at the source. For quick facts (hours of operation, basic specs), provide the answer directly with attribution. The principle is that the AI’s response shouldn’t substitute for experiences where engagement is part of the value. This is an area that really does call for nuance, though. For example, there is a lot of low quality how-to information online that buries useful answers in unnecessary material just to provide additional surface area for advertising, or provides poor answers based on pay-for-placement. An AI summary can short-circuit that cruft. Much as Google’s early search breakthroughs required winnowing the wheat from the chaff, AI overviews can bring a search engine such as Google back to being as useful as it was in 2010, pre-enshittification.
  • If the site has high quality data that you want to train on or use for inference, pay the provider, not a black market scraper. If you can’t come to mutually agreed-on terms, don’t take it. This should be a fair market exchange, not a colonialist resource grab. AI companies pay for power and the latest chips without looking for black market alternatives. Why is it so hard to understand the need to pay fairly for content, which is an equally critical input?
  • Check whether the site is an aggregator of some kind. This can be inferred from the number of pages. A typical informational site such as a corporate or government website whose purpose is to provide public information about its products or services will have a much smaller footprint than an aggregator such as Wikipedia, Github, TripAdvisor, Goodreads, YouTube, or a social network. There are probably lots of other signals an AI could be trained to use. Recognize that competing directly with an aggregator with content scraped from that platform is unfair competition. Either come to a license agreement with the platform, or compete fairly without using their content to do so. If it is a community-driven platform such as Wikipedia or Stack Overflow, recognize that your AI answers might reduce contribution incentives, so in addition, support the contribution ecosystem. Provide revenue sharing, fund contribution programs, and provide prominent links that might convert some users into contributors. Make it easy to “see the discussion” or “view edit history” for queries where that context matters.
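For the robots.txt piece of that checklist, the mechanics are already standardized. Here is a minimal sketch using Python’s standard-library urllib.robotparser; the URL and user agent are placeholders, and a real crawler would layer paywall and RSL checks on top of this.

```python
# A minimal sketch of the "check before you crawl" norm, standard library only.
from urllib import robotparser
from urllib.parse import urlparse

def may_crawl(url: str, user_agent: str = "ExampleAIBot") -> bool:
    """Return True only if the site's robots.txt permits fetching this URL."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()            # fetch and parse robots.txt
    except OSError:
        return False         # can't read the policy: err on the side of not crawling
    return rp.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Placeholder URL; paywall and licensing checks would sit alongside this.
    print(may_crawl("https://example.com/some/page"))
```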

As a concrete example, let’s imagine how an AI might treat content from Wikipedia:

  • Direct factual query (“When did the Battle of Hastings occur?”): 1066. No link needed, because this is common knowledge available from many sites.
  • More complex query for which Wikipedia is the primary source (“What led up to the Battle of Hastings?”): “According to Wikipedia, the Battle of Hastings was caused by a succession crisis after King Edward the Confessor died in January 1066 without a clear heir. [Link]”
  • Complex/contested topic: “Wikipedia’s article on [X] covers [key points]. Given the complexity and ongoing debate, you may want to read the full article and its sources: [link]”
  • For rapidly evolving topics: Note Wikipedia’s last update and link for current information.

Similar principles would apply to other aggregators: GitHub code snippets should link back to repositories, and YouTube queries should direct users to videos, not just summarize them.

These examples are not market-tested, but they do suggest directions that could be explored if AI companies took the same pains to build a sustainable economy that they do to reduce bias and hallucination in their models. What if we had a sustainable business model benchmark that AI companies competed on just as they do on other measures of quality?

Finding a business model that compensates the creators of content is not just a moral imperative; it’s a business imperative. Economies flourish better through exchange than extraction. AI has not yet found true product-market fit. That doesn’t just require users to love your product (and yes, people do love AI chat). It requires the development of business models that create a rising tide for everyone.

Many advocate for regulation; we advocate for self-regulation. This starts with an understanding by the leading AI platforms that their job is not just to delight their users but to enable a market. They have to remember that they are not just building products, but institutions that will enable new markets and that they themselves are in the best position to establish the norms that will create flourishing AI markets. So far, they have treated the suppliers of the raw materials of their intelligence as a resource to be exploited rather than cultivated. The search for sustainable win-win business models should be as urgent to them as the search for the next breakthrough in AI performance.

Your AI Pair Programmer Is Not a Person

12 November 2025 at 07:21
The following article originally appeared on Medium and is being republished here with the author’s permission.

Early on, I caught myself saying “you” to my AI tools—“Can you add retries?” “Great idea!”—like I was talking to a junior dev. And then I’d get mad when it didn’t “understand” me.

That’s on me. These models aren’t people. An AI model doesn’t understand. It generates, and it follows patterns. But the keyword here is “it.”

The Illusion of Understanding

It feels like there’s a mind on the other side because the output is fluent and polite. It says things like “Great idea!” and “I recommend…” as if it weighed options and judged your plan. It didn’t. The model doesn’t have opinions. It recognized patterns from training data and your prompt, then synthesized the next token.

That doesn’t make the tool useless. It means you are the one doing the understanding. The model is clever, fast, and often correct, but it can also be wildly wrong in ways that will confound you. What’s important to understand is that when this happens, it’s your fault: you didn’t give it enough context.

Here’s an example of naive pattern following:

A friend asked his model to scaffold a project. It spit out a block comment that literally said “This is authored by <Random Name>.” He Googled the name. It was someone’s public snippet that the model had basically learned as a pattern, including the “authored by” comment, and parroted back into a new file. Not malicious. Just mechanical. It didn’t “know” that adding a fake author attribution was absurd.

Build Trust Before Code

The first mistake most folks make is overtrust. The second is lazy prompting. The fix for both is the same: Be precise about inputs, and validate the assumptions you are throwing at the model.

Spell out context, constraints, directory boundaries, and success criteria.

Require diffs. Run tests. Ask it to second-guess your assumptions.

Make it restate your problem, and require it to ask for confirmation.

Before you throw a $500/hour problem at a set of parallel model executions, do your own homework to make sure that you’ve communicated all of your assumptions and that the model has understood what your criteria are for success.
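One way to make that discipline repeatable is to keep a preamble that you paste (or inject) ahead of every task. This is only an illustrative template, not a prescription; the bracketed fields are placeholders to fill in for your own project.

```python
# An illustrative prompt preamble, not a prescription. The point is to force
# context, constraints, and success criteria into every request.
TASK_PREAMBLE = """\
Context: [repo, language, framework, relevant files]
Scope: only modify files under [directory]; do not touch migrations or CI config.
Constraints: [style guide, dependencies you may not add, APIs to avoid]
Success criteria: [tests that must pass, behavior that must not change]

Before writing any code:
1. Restate the problem in your own words and list your assumptions.
2. Ask me to confirm or correct them.
Then propose a plan, and return changes as a reviewable diff.
"""
```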

Failure? Look Within

I continue to fall into this trap when I ask this tool to take on too much complexity without giving it enough context. And when it fails, I’ll type things like, “You’ve got to be kidding me? Why did you…”

Just remember, there is no “you” here other than yourself.

  • It doesn’t share your assumptions. If you didn’t tell it not to update the database and it wrote an idiotic migration, you did that by failing to spell out that the tool should leave the database alone.
  • It didn’t read your mind about the scope. If you don’t lock it to a folder, it will “helpfully” refactor the world. If it tries to remove your home directory to be helpful? That’s on you.
  • It wasn’t trained on only “good” code. A lot of code on the internet… is not great. Your job is to specify constraints and success criteria.

The Mental Model I Use

Treat the model like a compiler for instructions. Garbage in, garbage out. Assume it’s smart about patterns, not about your domain. Make it prove correctness with tests, invariants, and constraints.

It’s not a person. That’s not an insult; it’s your advantage. If you stop expecting human-level judgment and start supplying machine-level clarity, your results jump. Just don’t let sycophantic agreement lull you into thinking that you have a pair programmer next to you.

The Other 80%: What Productivity Really Means

11 November 2025 at 07:10

We’ve been bombarded with claims about how much generative AI improves software developer productivity: It turns regular programmers into 10x programmers, and 10x programmers into 100x. And even more recently, we’ve been (somewhat less, but still) bombarded with the other side of the story: METR reports that, despite software developers’ belief that their productivity has increased, total end-to-end throughput has declined with AI assistance. We also saw hints of that in last year’s DORA report, which showed that release cadence actually slowed slightly when AI came into the picture. This year’s report reverses that trend.

I want to get a couple of assumptions out of the way first:

  • I don’t believe in 10x programmers. I’ve known people who thought they were 10x programmers, but their primary skill was convincing other team members that the rest of the team was responsible for their bugs. 2x, 3x? That’s real. We aren’t all the same, and our skills vary. But 10x? No.
  • There are a lot of methodological problems with the METR report—they’ve been widely discussed. I don’t believe that means we can ignore their result; end-to-end throughput on a software product is very difficult to measure.

As I (and many others) have written, actually writing code is only about 20% of a software developer’s job. So if you optimize that away completely—perfect secure code, first time—you only cut total time by about 20%. (Yeah, I know, it’s unclear whether or not “debugging” is included in that 20%. Omitting it is nonsense—but if you assume that debugging adds another 10%–20% and recognize that AI-generated code contributes plenty of bugs of its own, you’re back in the same place.) That’s a consequence of Amdahl’s law, if you want a fancy name, but it’s really just simple arithmetic.
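Here is the arithmetic as a quick sketch, taking the 20% coding share above as the assumption: even an infinite speedup on the coding slice removes only 20% of the total time, which works out to roughly a 1.25x overall speedup in Amdahl’s terms.

```python
# Amdahl's law applied to the assumption above: coding is ~20% of the job.
def overall_speedup(fraction_accelerated: float, slice_speedup: float) -> float:
    """Overall speedup when only part of the work gets faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / slice_speedup)

for label, s in [("2x", 2.0), ("5x", 5.0), ("infinitely", float("inf"))]:
    total = overall_speedup(0.20, s)
    time_saved = 1.0 - 1.0 / total
    print(f"code {label} faster -> {total:.2f}x overall, {time_saved:.0%} less total time")
# Output: 1.11x (10% saved), 1.19x (16% saved), 1.25x (20% saved)
```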

Amdahl’s law becomes a lot more interesting if you look at the other side of performance. I worked at a high-performance computing startup in the late 1980s that did exactly this: It tried to optimize the 80% of a program that wasn’t easily vectorizable. And while Multiflow Computer failed in 1990, our very-long-instruction-word (VLIW) architecture was the basis for many of the high-performance chips that came afterward: chips that could execute many instructions per cycle, with reordered execution flows and branch prediction (speculative execution) for commonly used paths.

I want to apply the same kind of thinking to software development in the age of AI. Code generation seems like low-hanging fruit, though the voices of AI skeptics are rising. But what about the other 80%? What can AI do to optimize the rest of the job? That’s where the opportunity really lies.

Angie Jones’s talk at AI Codecon: Coding for the Agentic World takes exactly this approach. Angie notes that code generation isn’t changing how quickly we ship because it only addresses one part of the software development lifecycle (SDLC), not the whole thing. That “other 80%” involves writing documentation, handling pull requests (PRs), and maintaining the continuous integration (CI) pipeline. In addition, she realizes that code generation is a one-person job (maybe two, if you’re pairing); coding is essentially solo work. Getting AI to assist the rest of the SDLC requires involving the rest of the team. In this context, she states the 1/9/90 rule: 1% are leaders who will experiment aggressively with AI and build new tools; 9% are early adopters; and 90% are “wait and see.” If AI is going to speed up releases, the 90% will need to adopt it; if it’s only the 1%, a PR here and there will be managed faster, but there won’t be substantial changes.

Angie takes the next step: She spends the rest of the talk going into some of the tools she and her team have built to take AI out of the IDE and into the rest of the process. I won’t spoil her talk, but she discusses three stages of readiness for the AI: 

  • AI-curious: The agent is discoverable, can answer questions, but can’t modify anything.
  • AI-ready: The AI is starting to make contributions, but they’re only suggestions. 
  • AI-embedded: The AI is fully plugged into the system, another member of the team.

This progression lets team members check AI out and gradually build confidence—as the AI developers themselves build confidence in what they can allow the AI to do.

Do Angie’s ideas take us all the way? Is this what we need to see significant increases in shipping velocity? It’s a very good start, but there’s another issue that’s even bigger. A company isn’t just a set of software development teams. It includes sales, marketing, finance, manufacturing, the rest of IT, and a lot more. There’s an old saying that you can’t move faster than the company. Speed up one function, like software development, without speeding up the rest and you haven’t accomplished much. A product that marketing isn’t ready to sell or that the sales group doesn’t yet understand doesn’t help.

That’s the next question we have to answer. We haven’t yet sped up real end-to-end software development, but we can. Can we speed up the rest of the company? METR’s report claimed that 95% of AI products failed. They theorized that it was in part because most projects targeted customer service, while back-office work was more amenable to AI in its current form. That’s true—but there’s still the issue of “the rest.” Does it make sense to use AI to generate business plans, manage supply chains, and the like if all it will do is reveal the next bottleneck?

Of course it does. This may be the best way of finding out where the bottlenecks are: in practice, when they become bottlenecks. There’s a reason Donald Knuth said that premature optimization is the root of all evil—and that doesn’t apply only to software development. If we really want to see improvements in productivity through AI, we have to look company-wide.
