New Reports Reveal Years of Unaddressed Osprey Safety Risks



DEEP DIVE — It is one of the most lauded defense developments in recent decades, providing preeminent capability to U.S. military personnel worldwide, but that prowess evidently comes with a steep cost that military leadership allowed to grow for years.

Critics have long asserted that the military failed to adequately address a mounting series of safety issues with the V-22 Osprey aircraft, even as service members died in preventable crashes. A Naval Air Systems Command (NAVAIR) review and a Government Accountability Office (GAO) report paint a scathing portrait of systemic failures by the Joint Program Office that oversees V-22 variants for the Air Force, Marine Corps, and Navy.

The Marine Corps operates approximately 348 MV-22s, the Air Force 52 CV-22s, and the Navy 29 CMV-22s, with the program of record at around 464 total across services. Japan operates 17 MV-22s, with deliveries complete or near-complete.

The Deadly Track Record

Some 30 U.S. Marines lost their lives in three separate crashes during the testing and development phase between 1992 and 2000, earning the Osprey the nickname “The Widow Maker.” Since its introduction in 2007, at least 35 servicemembers have died in 10 fatal crashes.

“Initially, the V-22 suffered from Vortex Ring State, which produced crashes during development. The problem was diagnosed and remediated, and the loss rate went down dramatically,” John Pike, a leading defense, space and intelligence policy expert and Director of GlobalSecurity.org, tells The Cipher Brief. “Subsequent losses have been ‘normal accidents’ due to the usual mechanical and human failings.”

The GAO found that serious Osprey mishaps in 2023 and 2024 exceeded those of the previous eight years and generally surpassed the accident rates of other Navy and Air Force aircraft. In August 2023, three Marines died in Australia. In 2022, four U.S. Marines were killed during a NATO training exercise in Norway, and five Marines were killed in California.

Unresolved Problems

The NAVAIR report revealed that “the cumulative risk posture of the V-22 platform has been growing since initial fielding,” and the program office “has not promptly implemented fixes.” Of 12 Class A mishaps in the past four years, seven involved parts failures already identified as major problems but not addressed.

Issues with hard-clutch engagement (HCE) caused the June 2022 California crash that killed five. The problem occurs when the clutch connecting the engine to the proprotor gearbox slips and reengages abruptly, causing a power spike that can throw the aircraft into an uncontrolled roll.

Eight Air Force servicemembers were killed in the November 2023 crash off Yakushima Island, Japan, after the proprotor gearbox failed catastrophically due to cracks in a metal pinion gear; the pilot’s decision to continue flying despite multiple warnings contributed to the crash.

This manufacturing issue dates to 2006, but the Joint Program Office didn’t formally assess the risk until March 2024 – nearly two decades later. A NAVAIR logbook review found that over 40 safety-critical components were operating beyond their airworthiness limits, and that 81 percent of ground accidents were due to human error.

A Broken System: Poor Communication Between Services

The GAO also found that the three services don’t routinely share critical safety information. Aircrews haven’t met regularly to review aircraft knowledge and emergency procedures. The services operate with significantly different maintenance standards, with three parallel review processes and no common source of material.

The GAO identified 34 unresolved safety risks, including eight potentially catastrophic risks that have remained open for a median of 10 years. The V-22 has the oldest average age of unresolved catastrophic safety risks across the Navy’s aircraft inventory.

Fixes May Take a Decade

The Navy report indicated fixes won’t be complete until 2033-2034. Officials now say the fleet won’t return to unrestricted operations until 2026 – a year later than planned. The V-22 program plans to upgrade gearboxes with triple-melted steel, reducing inclusions by 90 percent.

Under current restrictions, overwater flights are prohibited unless the aircraft remains within 30 minutes of a safe landing spot, severely limiting its utility for Navy and Marine Corps missions.

Osprey's Unmatched Capabilities

The Osprey still offers a game-changing advantage for U.S. troops, despite its troubled past, according to its supporters.

The Osprey traces its origins to Operation Eagle Claw, the failed 1980 mission to rescue American hostages held in Iran. After mechanical failures left only five of the mission’s eight Navy helicopters operational at the Desert One staging site, one short of the minimum required, the mission was aborted, and the need for an aircraft capable of rapid troop movement in harsh environmental conditions became clear.

After development began in 1985, the Osprey entered service in 2007, replacing the Vietnam-era CH-46 Sea Knight.

“Compared to fixed-wing transports, the Osprey can land troops just where they are needed. Airdrops with parachutes tend to scatter paratroops all over the place; see ‘Saving Private Ryan,’” Pike explained. “And compared with other rotary-wing aircraft, the Osprey is much faster and has a much longer range.”

The Osprey shifts from helicopter to airplane mode in under 12 seconds, reaches speeds of 315 mph, has an operational range of 580 miles, and carries 10,000 pounds – or 24 troops. It’s used for missions ranging from combat operations to the occasional transport of White House staff. During a dust storm in Afghanistan in 2010, two CV-22s flew 800 miles to rescue 32 soldiers in under four hours.

Chronic Readiness Problems

Yet these performance advantages have been undercut by persistent readiness shortfalls.

The NAVAIR report noted that mission-capable rates between 2020 and 2024 averaged just 50 percent for the Navy and Air Force and 60 percent for the Marines. The Osprey requires 100 percent more unscheduled maintenance than the Navy average and 22 maintenance man-hours per flight hour, versus 12 for other aircraft.

In addition, Boeing settled a whistleblower lawsuit in 2023 for $8.1 million after employees accused the company of falsifying records for composite part testing. Boeing, in its defense, claimed that the parts were “non-critical” and did not impact flight safety.

Conflicting Views on Safety

“The Osprey does not have a troubled safety record. Per a recent press release, the V-22 mishap rate per 100,000 flight hours is 3.28, which is in line with helicopters with similar missions,” a government source who works closely with the Osprey fleet but is not authorized to speak on the record told The Cipher Brief. “Like anything measured statistically, there are periods above and below the mean. Just because humans tend to see apparent clusters doesn’t necessarily mean there is a pattern or connection – think of how some people say that ‘celebrities die in threes.’”

The source maintained that “the design issues, such as certain electrical wiring rubbing against hydraulic and oil lines, were fixed before fleet introduction.”

“The problems with the test plan were a product of pressure applied to accelerate a delayed and overbudget program and were not repeated when the aircraft was reintroduced,” the insider pointed out. “Those mishaps, combined with the distinctive nature of the V-22, mean that any subsequent incident, major or minor, is always viewed as part of the ‘dangerous V-22’ narrative. A U.S. Army Blackhawk crash in November killed five but barely made the news. A Japanese Blackhawk crash killed ten soldiers in April, but the Japanese didn’t ground their Blackhawks.”

That perception, however, has done little to quiet families who argue that known risks went unaddressed.

Amber Sax’s husband, Marine Corps Capt. John J. Sax, died in the 2022 California crash caused by hard-clutch engagement, a problem the Marine Corps had known about for over a decade. “Their findings confirm what we already know: More needs to be done,” Sax said. “It’s clear in the report that these risks were not properly assessed, and that failure cost my husband his life.”

An Uncertain Future

As the military confronts those findings, the future of the Osprey fleet is not completely clear. A 2018 Marine Corps aviation plan outlined sustainment of the Osprey through at least 2060.

“The quality of maintenance training curricula, maturation, and standardization has not kept pace with readiness requirements,” the report stated. “Current maintenance manning levels are unable to support demands for labor. The current V-22 sustainment system cannot realize improved and sustained aircraft readiness and availability without significant change. Depot-level maintenance cannot keep up with demand.”

Despite extensive recommendations – NAVAIR underscored 32 actions to improve safety – Vice Adm. John Dougherty reaffirmed commitment to the aircraft. Pike believes it’s a matter of when, not if, the Osprey returns to full operations.

“Once the issues are fixed, everyone will resume their regular programming,” he asserted.

Officials and insiders alike expect that process to translate into tangible fixes.

“I would expect that to lead to some type of corrective action, whether it’s a new procedure or replacing a defective part,” the insider added. “After that, I would expect a long career for the aircraft in the Marine Corps, Navy, and Air Force, as it’s an irreplaceable part of all three services now and gives a unique capability to the American military.”

Whether that optimism proves warranted depends on whether military leadership finally addresses the systemic failures the latest reports have laid bare – failures that cost 20 service members their lives in just the past five years.


Signals for 2026

We’re three years into a post-ChatGPT world, and AI remains the focal point of the tech industry. In 2025, several ongoing trends intensified: AI investment accelerated; enterprises integrated agents and workflow automation at a faster pace; and the toolscape for professionals seeking a career edge grew overwhelmingly expansive. But the jury’s still out on the ROI from the vast sums that have saturated the industry.

We anticipate that 2026 will be a year of increased accountability. Expect enterprises to shift focus from experimentation to measurable business outcomes and sustainable AI costs. There are promising productivity and efficiency gains to be had in software engineering and development, operations, security, and product design, but significant challenges also persist.  

Bigger picture, the industry is still grappling with what AI is and where we’re headed. Is AI a worker that will take all our jobs? Is AGI imminent? Is the bubble about to burst? Economic uncertainty, layoffs, and shifting AI hiring expectations have undeniably created stark career anxiety throughout the industry. But as Tim O’Reilly pointedly argues, “AI is not taking jobs: The decisions of people deploying it are.” No one has quite figured out how to make money yet, but the organizations that succeed will do so by creating solutions that “genuinely improve…customers’ lives.” That won’t happen by shoehorning AI into existing workflows but by first determining where AI can actually improve upon them, then taking an “AI first” approach to developing products around these insights.

As Tim O’Reilly and Mike Loukides recently explained, “At O’Reilly, we don’t believe in predicting the future. But we do believe you can see signs of the future in the present.” We’re watching a number of “possible futures taking shape.” AI will undoubtedly be integrated more deeply into industries, products, and the wider workforce in 2026 as use cases continue to be discovered and shared. Topics we’re keeping tabs on include context engineering for building more reliable, performant AI systems; LLM posttraining techniques, in particular fine-tuning as a means to build more specialized, domain-specific models; the growth of agents, as well as the protocols, like MCP, to support them; and computer vision and multimodal AI more generally to enable the development of physical/embodied AI and the creation of world models. 

Here are some of the other trends that are pointing the way forward.

Software Development

In 2025, AI was embedded in software developers’ everyday work, transforming their roles—in some cases dramatically. A multitude of AI tools are now available to create code, and workflows are undergoing a transformation shaped by new concepts including vibe coding, agentic development, context engineering, eval- and spec-driven development, and more.

In 2026, we’ll see an increased focus on agents and the protocols, like MCP, that support them; new coding workflows; and the impact of AI on assisting with legacy code. But even as software development practices evolve, fundamental skills such as code review, design patterns, debugging, testing, and documentation are as vital as ever.

And despite major disruption from GenAI, programming languages aren’t going anywhere. Type-safe languages like TypeScript, Java, and C# provide compile-time validation that catches AI errors before production, helping mitigate the risks of AI-generated code. Memory safety mandates will drive interest in Rust and Zig for systems programming: Major players such as Google, Microsoft, Amazon, and Meta have adopted Rust for critical systems, and Zig is behind Anthropic’s most recent acquisition, Bun. And Python is central to creating powerful AI and machine learning frameworks, driving complex intelligent automation that extends far beyond simple scripting. It’s also ideal for edge computing and robotics, two areas where AI is likely to make inroads in the coming year.

Takeaways

Which AI tools programmers use matters less than how they use them. With a wide choice of tools now available in the IDE and on the command line, and new options being introduced all the time, it’s useful to focus on the skills needed to produce good code rather than on the tool itself. After all, whatever tool they use, developers are ultimately responsible for the code it produces.

Effectively communicating with AI models is the key to doing good work. The more background AI tools are given about a project, the better the code they generate will be. Developers have to understand both how to manage what the AI knows about their project (context engineering) and how to communicate it (prompt engineering) to get useful outputs.

AI isn’t just a pair programmer; it’s an entire team of developers. Software engineers have moved beyond single coding assistants. They’re building and deploying custom agents, often within complex setups involving multi-agent scenarios, teams of coding agents, and agent swarms. But as the engineering workflow shifts from conducting AI to orchestrating AI, the fundamentals of building and maintaining good software—code review, design patterns, debugging, testing, and documentation—stay the same and will be what elevates purposeful AI-assisted code above the crowd.

Software Architecture

AI has progressed from being something architects might have to consider to something that is now essential to their work. They can use LLMs to accelerate or optimize architecture tasks; they can add AI to existing software systems or use it to modernize those systems; and they can design AI-native architectures, an approach that requires new considerations and patterns for system design. And even if they aren’t working with AI (yet), architects still need to understand how AI relates to other parts of their system and be able to communicate their decisions to stakeholders at all levels.

Takeaways

AI-enhanced and AI-native architectures bring new considerations and patterns for system design. Event-driven models can enable AI agents to act on incoming triggers rather than fixed prompts. In 2026, evolving architectures will become more important as architects look for ways to modernize existing systems for AI. And the rise of agentic AI means architects need to stay up-to-date on emerging protocols like MCP.

Many of the concerns from 2025 will carry over into the new year. Considerations such as incorporating LLMs and RAG into existing architectures, emerging architecture patterns and antipatterns specifically for AI systems, and the focus on API and data integrations elevated by MCP are critical.

The fundamentals still matter. Tools and frameworks are making it possible to automate more tasks. However, to successfully leverage these capabilities to design sustainable architecture, enterprise architects must have a full command of the principles behind them: when to add an agent or a microservice, how to consider cost, how to define boundaries, and how to act on the knowledge they already have.

Infrastructure and Operations

The InfraOps space is undergoing its most significant transformation since cloud computing, as AI evolves from a workload to be managed to an active participant in managing infrastructure itself. With infrastructure sprawling across multicloud environments, edge deployments, and specialized AI accelerators, manual management is becoming nearly impossible. In 2026, the industry will keep moving toward self-healing systems and predictive observability—infrastructure that continuously optimizes itself, shifting the human role from manual maintenance to system oversight, architecture, and long-term strategy.

Platform engineering makes this transformation operational, abstracting infrastructure complexity behind self-service interfaces, which lets developers deploy AI workloads, implement observability, and maintain security without deep infrastructure expertise. The best platforms will evolve into orchestration layers for autonomous systems. While fully autonomous systems remain on the horizon, the trajectory is clear.

Takeaways

AI is becoming a primary driver of infrastructure architecture. AI-native workloads demand GPU orchestration at scale, specialized networking protocols optimized for model training and inference, and frameworks like Ray on Kubernetes that can distribute compute intelligently. Organizations are redesigning infrastructure stacks to accommodate these demands and are increasingly considering hybrid environments and alternatives to hyperscalers to power their AI workloads—“neocloud” platforms like CoreWeave, Lambda, and Vultr.

AI is augmenting the work of operations teams with real-time intelligence. Organizations are turning to AIOps platforms to predict failures before they cascade, identify anomalies humans would miss, and surface optimization opportunities in telemetry data. These systems aim to amplify human judgment, giving operators superhuman pattern recognition across complex environments.

AI is evolving into an autonomous operator that makes its own infrastructure decisions. Companies will implement emerging “agentic SRE” practices: systems that reason about infrastructure problems, form hypotheses about root causes, and take independent corrective action, replicating the cognitive workload that SREs perform, not just following predetermined scripts.

Data

The big story of the back half of 2025 was agents. While the groundwork has been laid, in 2026 we expect focus on the development of agentic systems to persist—and this will necessitate new tools and techniques, particularly on the data side. AI and data platforms continue to converge, with vendors like Snowflake, Databricks, and Salesforce releasing products to help customers build and deploy agents. 

Beyond agents, AI is making its influence felt across the entire data stack, as data professionals target their workflows to support enterprise AI. Significant trends include real-time analytics, enhanced data privacy and security, and the increasing use of low-code/no-code tools to democratize data access. Sustainability also remains a concern, and data professionals need to consider ESG compliance, carbon-aware tooling, and resource-optimized architectures when designing for AI workloads.

Takeaways

Data infrastructure continues to consolidate. The consolidation trend has affected not only the modern data stack but also more traditional areas like the database space. In response, organizations are being more intentional about what kinds of databases they deploy. At the same time, modern data stacks have fragmented across cloud platforms and open ecosystems, so engineers must increasingly design for interoperability.

A multiple database approach is more important than ever. Vector databases like Pinecone, Milvus, Qdrant, and Weaviate help power agentic AI—while they’re a new technology, companies are beginning to adopt vector databases more widely. DuckDB’s popularity is growing for running analytical queries. And even though it’s been around for a while, ClickHouse, an open source distributed OLAP database used for real-time analytics, has finally broken through with data professionals.

The infrastructure to support autonomous agents is coming together. GitOps, observability, identity management, and zero-trust orchestration will all play key roles. And we’re following a number of new initiatives that facilitate agentic development, including AgentDB, a database designed specifically to work effectively with AI agents; Databricks’ recently announced Lakebase, a Postgres database/OLTP engine integrated within the data lakehouse; and Tiger Data’s Agentic Postgres, a database “designed from the ground up” to support agents.

Security

AI is a threat multiplier—59% of tech professionals cited AI-driven cyberthreats as their biggest concern in a recent survey. In response, the cybersecurity analyst role is shifting from low-level human-in-the-loop tasks to complex threat hunting, AI governance, advanced data analysis and coding, and human-AI teaming oversight. But addressing AI-generated threats will also require a fundamental transformation in defensive strategy and skill acquisition—and the sooner it happens, the better.

Takeaways

Security professionals now have to defend a broader attack surface. The proliferation of AI agents expands the attack surface. Security tools must evolve to protect it. Implementing zero trust for machine identities is a smart opening move to mitigate sprawl and nonhuman traffic. Security professionals must also harden their AI systems against common threats such as prompt injection and model manipulation.

Organizations are struggling with governance and compliance. Striking a balance between data utility and vulnerability requires adherence to data governance best practices (e.g., least privilege). Government agencies, industry and professional groups, and technology companies are developing a range of AI governance frameworks to help guide organizations, but it’s up to companies to translate these technical governance frameworks into board-level risk decisions and actionable policy controls.

The security operations center (SOC) is evolving. The velocity and scale of AI-driven attacks can overwhelm traditional SIEM/SOAR solutions. Expect increased adoption of agentic SOC—a system of specialized, coordinated AI agents for triage and response. This shifts the focus of the SOC analyst from reactive alert triage to proactive threat hunting, complex analysis, and AI system oversight.

Product Management and Design

Business focus in 2025 shifted from scattered AI experiments to the challenge of building defensible, AI-native businesses. Next year we’re likely to see product teams moving from proof of concept to proof of value.

One thing to look for: Design and product responsibilities may consolidate under a “product builder”—a full stack generalist in product, design, and engineering who can rapidly build, validate, and launch new products. Companies are currently hiring for this role, although few people actually possess the full skill set at the moment. But regardless of whether product builders become ascendant, product folks in 2026 and beyond will need the ability to combine product validation, good-enough engineering, and rapid design, all enabled by AI as a core accelerator. We’re already seeing the “product manager” role becoming more technical as AI spreads throughout the product development process. Nearly all PMs use AI, but they’ll increasingly employ purpose-built AI workflows for research, user-testing, data analysis, and prototyping.

Takeaways

Companies need to bridge the AI product strategy gap. Most companies have moved past simple AI experiments but are now facing a strategic crisis. Their existing product playbooks (how to size markets, roadmapping, UX) weren’t designed for AI-native products. Organizations must develop clear frameworks for building a portfolio of differentiated AI products, managing new risks, and creating sustainable value. 

AI product evaluation is now mission-critical. As AI becomes a core product component and strategy matures, rigorous evaluation is the key to turning products that are good on paper into those that are great in production. Teams should start by defining what “good” means for their specific context, then build reliable evals for models, agents, and conversational UIs to ensure they’re hitting that target.

Design’s new frontier is conversations and interactions. Generative AI has pushed user experience beyond static screens into probabilistic new multimodal territory. This means a harder shift toward designing nonlinear, conversational systems, including AI agents. In 2026, we’re likely to see increased demand for AI conversational designers and AI interaction designers to devise conversation flows for chatbots and even design a model’s behavior and personality.

What It All Means

While big questions about AI remain unanswered, the best way to plan for uncertainty is to consider the real value you can create for your users and for your teams themselves right now. The tools will improve, as they always do, and the strategies to use them will grow more complex. Being deeply versed in the core knowledge of your area of expertise gives you the foundation you’ll need to take advantage of these quickly evolving technologies—and ensure that whatever you create will be built on bedrock, not shaky ground.

The Architect’s Dilemma

The agentic AI landscape is exploding. Every new framework, demo, and announcement promises to let your AI assistant book flights, query databases, and manage calendars. This rapid advancement of capabilities is thrilling for users, but for the architects and engineers building these systems, it poses a fundamental question: When should a new capability be a simple, predictable tool (exposed via the Model Context Protocol, MCP) and when should it be a sophisticated, collaborative agent (exposed via the Agent2Agent Protocol, A2A)?

The common advice is often circular and unhelpful: “Use MCP for tools and A2A for agents.” This is like telling a traveler that cars use motorways and trains use tracks, without offering any guidance on which is better for a specific journey. This lack of a clear mental model leads to architectural guesswork. Teams build complex conversational interfaces for tasks that demand rigid predictability, or they expose rigid APIs to users who desperately need guidance. The outcome is often the same: a system that looks great in demos but falls apart in the real world.

In this article, I argue that the answer isn’t found by analyzing your service’s internal logic or technology stack. It’s found by looking outward and asking a single, fundamental question: Who is calling your product/service? By reframing the problem this way—as a user experience challenge first and a technical one second—the architect’s dilemma evaporates.

This essay draws a line where it matters for architects: the line between MCP tools and A2A agents. I will introduce a clear framework, built around the “Vending Machine Versus Concierge” model, to help you choose the right interface based on your consumer’s needs. I will also explore failure modes, testing, and the powerful Gatekeeper Pattern that shows how these two interfaces can work together to create systems that are not just clever but truly reliable.

Two Very Different Interfaces

MCP presents tools—named operations with declared inputs and outputs. The caller (a person, program, or agent) must already know what it wants, and provide a complete payload. The tool validates, executes once, and returns a result. If your mental image is a vending machine—insert a well-formed request, get a deterministic response—you’re close enough.

A2A presents agents—goal-first collaborators that converse, plan, and act across turns. The caller expresses an outcome (“book a refundable flight under $450”), not an argument list. The agent asks clarifying questions, calls tools as needed, and holds onto session state until the job is done. If you picture a concierge—interacting, negotiating trade-offs, and occasionally escalating—you’re in the right neighborhood.

Neither interface is “better.” They are optimized for different situations:

  • MCP is fast to reason about, easy to test, and strong on determinism and auditability.
  • A2A is built for ambiguity, long-running processes, and preference capture.

Bringing the Interfaces to Life: A Booking Example

To see the difference in practice, let’s imagine a simple task: booking a specific meeting room in an office.

The MCP “vending machine” expects a perfectly structured, machine-readable request for its book_room_tool. The caller must provide all necessary information in a single, valid payload:

{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "book_room_tool",
    "arguments": {
      "room_id": "CR-104B",
      "start_time": "2025-11-05T14:00:00Z",
      "end_time": "2025-11-05T15:00:00Z",
      "organizer": "user@example.com"
    }
  }
}

Any deviation—a missing field or incorrect data type—results in an immediate error. This is the vending machine: You provide the exact code of the item you want (e.g., “D4”) or you get nothing.
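
To make that strictness concrete, here is a minimal Python sketch of vending-machine-style validation. It assumes a hypothetical schema for the book_room_tool from the example above and uses the jsonschema library; none of this is mandated by the MCP spec itself.

# Minimal sketch of vending-machine-style validation.
# The schema is hypothetical, not part of the MCP spec.
# Requires: pip install jsonschema
from jsonschema import validate, ValidationError

BOOK_ROOM_SCHEMA = {
    "type": "object",
    "properties": {
        "room_id": {"type": "string"},
        "start_time": {"type": "string"},
        "end_time": {"type": "string"},
        "organizer": {"type": "string"},
    },
    "required": ["room_id", "start_time", "end_time", "organizer"],
    "additionalProperties": False,
}

def book_room_tool(arguments: dict) -> dict:
    """Validate strictly; any deviation is an immediate error, not a dialogue."""
    try:
        validate(instance=arguments, schema=BOOK_ROOM_SCHEMA)
    except ValidationError as err:
        # -32602 is the standard JSON-RPC "Invalid params" error code.
        return {"error": {"code": -32602, "message": f"Invalid params: {err.message}"}}
    # ... perform the booking exactly once and return the result ...
    return {"result": {"confirmation_id": "BK-0042"}}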

The A2A “concierge,” an “office assistant” agent, is approached with a high-level, ambiguous goal. It uses conversation to resolve ambiguity:

User: “Hey, can you book a room for my 1-on-1 with Alex tomorrow afternoon?”
Agent: “Of course. To make sure I get the right one, what time works best, and how long will you need it for?”

The agent’s job is to take the ambiguous goal, gather the necessary details, and then likely call the MCP tool behind the scenes once it has a complete, valid set of arguments.
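
A minimal Python sketch of that behavior, under the same assumptions (the book_room_tool sketched earlier, plus a hypothetical ask_user stand-in for one conversational turn): the agent loops until every required slot is filled, then makes a single well-formed tool call.

# Sketch of concierge-style slot filling; helper names are hypothetical.
REQUIRED_SLOTS = ["room_id", "start_time", "end_time", "organizer"]

def ask_user(question: str) -> str:
    """Stand-in for one conversational turn with the user."""
    return input(question + " ")

def office_assistant(goal: str, known: dict) -> dict:
    slots = dict(known)  # whatever was extracted from the initial request
    for slot in REQUIRED_SLOTS:
        while not slots.get(slot):
            # Resolve ambiguity through dialogue instead of rejecting the request.
            slots[slot] = ask_user(f"What {slot.replace('_', ' ')} should I use?")
    # The concierge calls the vending machine once the payload is complete.
    return book_room_tool(slots)

# e.g. office_assistant("book a room for my 1-on-1 with Alex tomorrow",
#                       {"organizer": "user@example.com"})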

With this clear dichotomy established—the predictable vending machine (MCP) versus the stateful concierge (A2A)—how do we choose? As I argued in the introduction, the answer isn’t found in your tech stack. It’s found by asking the most important architectural question of all: Who is calling your service?

Step 1: Identify your consumer

  1. The machine consumer: A need for predictability
    Is your service going to be called by another automated system, a script, or another agent acting in a purely deterministic capacity? This consumer requires absolute predictability. It needs a rigid, unambiguous contract that can be scripted and relied upon to behave the same way every single time. It cannot handle a clarifying question or an unexpected update; any deviation from the strict contract is a failure. This consumer doesn’t want a conversation; it needs a vending machine. This nonnegotiable requirement for a predictable, stateless, and transactional interface points directly to designing your service as a tool (MCP).
  2. The human (or agentic) consumer: A need for convenience
    Is your service being built for a human end user or for a sophisticated AI that’s trying to fulfill a complex, high-level goal? This consumer values convenience and the offloading of cognitive load. They don’t want to specify every step of a process; they want to delegate ownership of a goal and trust that it will be handled. They’re comfortable with ambiguity because they expect the service—the agent—to resolve it on their behalf. This consumer doesn’t want to follow a rigid script; they need a concierge. This requirement for a stateful, goal-oriented, and conversational interface points directly to designing your service as an agent (A2A).

By starting with the consumer, the architect’s dilemma often evaporates. Before you ever debate statefulness or determinism, you first define the user experience you are obligated to provide. In most cases, identifying your customer will give you your definitive answer.

Step 2: Validate with the four factors

Once you have identified who calls your service, you have a strong hypothesis for your design. A machine consumer points to a tool; a human or agentic consumer points to an agent. The next step is to validate this hypothesis with a technical litmus test. This framework gives you the vocabulary to justify your choice and ensure the underlying architecture matches the user experience you intend to create.

  1. Determinism versus ambiguity
    Does your service require a precise, unambiguous input, or is it designed to interpret and resolve ambiguous goals? A vending machine is deterministic. Its API is rigid: GET /item/D4. Any other request is an error. This is the world of MCP, where a strict schema ensures predictable interactions. A concierge handles ambiguity. “Find me a nice place for dinner” is a valid request that the agent is expected to clarify and execute. This is the world of A2A, where a conversational flow allows for clarification and negotiation.
  2. Simple execution versus complex process
    Is the interaction a single, one-shot execution, or a long-running, multistep process? A vending machine performs a short-lived execution. The entire operation—from payment to dispensing—is an atomic transaction that is over in seconds. This aligns with the synchronous-style, one-shot model of MCP. A concierge manages a process. Booking a full travel itinerary might take hours or even days, with multiple updates along the way. This requires the asynchronous, stateful nature of A2A, which can handle long-running tasks gracefully.
  3. Stateless versus stateful
    Does each request stand alone or does the service need to remember the context of previous interactions? A vending machine is stateless. It doesn’t remember that you bought a candy bar five minutes ago. Each transaction is a blank slate. MCP is designed for these self-contained, stateless calls. A concierge is stateful. It remembers your preferences, the details of your ongoing request, and the history of your conversation. A2A is built for this, using concepts like a session or thread ID to maintain context.
  4. Direct control versus delegated ownership
    Is the consumer orchestrating every step, or are they delegating the entire goal? When using a vending machine, the consumer is in direct control. You are the orchestrator, deciding which button to press and when. With MCP, the calling application retains full control, making a series of precise function calls to achieve its own goal. With a concierge, you delegate ownership. You hand over the high-level goal and trust the agent to manage the details. This is the core model of A2A, where the consumer offloads the cognitive load and trusts the agent to deliver the outcome.
Factor      | Tool (MCP)                         | Agent (A2A)                      | Key question
Determinism | Strict schema; errors on deviation | Clarifies ambiguity via dialogue | Can inputs be fully specified up front?
Process     | One-shot                           | Multistep/long-running           | Is this atomic or a workflow?
State       | Stateless                          | Stateful/sessionful              | Must we remember context/preferences?
Control     | Caller orchestrates                | Ownership delegated              | Who drives: the caller or callee?

Table 1: Four-question framework

These factors are not independent checkboxes; they are four facets of the same core principle. A service that is deterministic, transactional, stateless, and directly controlled is a tool. A service that handles ambiguity, manages a process, maintains state, and takes ownership is an agent. By using this framework, you can confidently validate that the technical architecture of your service aligns perfectly with the needs of your customer.

No framework, no matter how clear…

…can perfectly capture the messiness of the real world. While the “Vending Machine Versus Concierge” model provides a robust guide, architects will eventually encounter services that seem to blur the lines. The key is to remember the core principle we’ve established: The choice is dictated by the consumer’s experience, not the service’s internal complexity.

Let’s explore two common edge cases.

The complex tool: The iceberg
Consider a service that performs a highly complex, multistep internal process, like a video transcoding API. A consumer sends a video file and a desired output format. This is a simple, predictable request. But internally, this one call might kick off a massive, long-running workflow involving multiple machines, quality checks, and encoding steps. It’s a hugely complex process.

However, from the consumer’s perspective, none of that matters. They made a single, stateless, fire-and-forget call. They don’t need to manage the process; they just need a predictable result. This service is like an iceberg: 90% of its complexity is hidden beneath the surface. But because its external contract is that of a vending machine—a simple, deterministic, one-shot transaction—it is, and should be, implemented as a tool (MCP).

The simple agent: The scripted conversation
Now consider the opposite: a service with very simple internal logic that still requires a conversational interface. Imagine a chatbot for booking a dentist appointment. The internal logic might be a simple state machine: ask for a date, then a time, then a patient name. It’s not “intelligent” or particularly flexible.

However, it must remember the user’s previous answers to complete the booking. It’s an inherently stateful, multiturn interaction. The consumer cannot provide all the required information in a single, prevalidated call. They need to be guided through the process. Despite its internal simplicity, the need for a stateful dialogue makes it a concierge. It must be implemented as an agent (A2A) because its consumer-facing experience is that of a conversation, however scripted.

These gray areas reinforce the framework’s central lesson. Don’t get distracted by what your service does internally. Focus on the experience it provides externally. That contract with your customer is the ultimate arbiter in the architect’s dilemma.

Testing What Matters: Different Strategies for Different Interfaces

A service’s interface doesn’t just dictate its design; it dictates how you validate its correctness. Vending machines and concierges have fundamentally different failure modes and require different testing strategies.

Testing MCP tools (vending machines):

  • Contract testing: Validate that inputs and outputs strictly adhere to the defined schema.
  • Idempotency tests: Ensure that calling the tool multiple times with the same inputs produces the same result without side effects (both checks are sketched after this list).
  • Deterministic logic tests: Use standard unit and integration tests with fixed inputs and expected outputs.
  • Adversarial fuzzing: Test for security vulnerabilities by providing malformed or unexpected arguments.
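
A pytest-style sketch of the first two checks, again using the hypothetical book_room_tool from the booking example; adapt the invocation to however your tools are actually exposed.

# Contract and idempotency tests for the hypothetical book_room_tool.
import copy

VALID_ARGS = {
    "room_id": "CR-104B",
    "start_time": "2025-11-05T14:00:00Z",
    "end_time": "2025-11-05T15:00:00Z",
    "organizer": "user@example.com",
}

def test_contract_rejects_missing_field():
    bad = copy.deepcopy(VALID_ARGS)
    del bad["room_id"]
    # Strict schema: any deviation is an immediate error.
    assert "error" in book_room_tool(bad)

def test_contract_rejects_unknown_field():
    bad = dict(VALID_ARGS, seat_preference="window")
    assert "error" in book_room_tool(bad)

def test_idempotency():
    # Same inputs, same observable result, no accumulating side effects.
    first = book_room_tool(copy.deepcopy(VALID_ARGS))
    second = book_room_tool(copy.deepcopy(VALID_ARGS))
    assert first == second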

Testing A2A agents (concierges):

  • Goal completion rate (GCR): Measure the percentage of conversations where the agent successfully achieved the user’s high-level goal (a measurement sketch follows this list).
  • Conversational efficiency: Track the number of turns or clarifications required to complete a task.
  • Tool selection accuracy: For complex agents, verify that the right MCP tool was chosen for a given user request.
  • Conversation replay testing: Use logs of real user interactions as a regression suite to ensure updates don’t break existing conversational flows.
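
A minimal sketch of computing the first two metrics; the conversation-log format shown is hypothetical and would come from your own telemetry.

# Goal completion rate and conversational efficiency from (hypothetical) logs.
conversations = [
    {"goal_achieved": True, "turns": 4},
    {"goal_achieved": False, "turns": 9},
    {"goal_achieved": True, "turns": 3},
]

gcr = sum(c["goal_achieved"] for c in conversations) / len(conversations)
avg_turns = sum(c["turns"] for c in conversations) / len(conversations)
print(f"GCR: {gcr:.0%}, average turns per conversation: {avg_turns:.1f}")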

The Gatekeeper Pattern

Our journey so far has focused on a dichotomy: MCP or A2A, vending machine or concierge. But the most sophisticated and robust agentic systems do not force a choice. Instead, they recognize that these two protocols don’t compete with each other; they complement each other. The ultimate power lies in using them together, with each playing to its strengths.

The most effective way to achieve this is through a powerful architectural choice we can call the Gatekeeper Pattern.

In this pattern, a single, stateful A2A agent acts as the primary, user-facing entry point—the concierge. Behind this gatekeeper sits a collection of discrete, stateless MCP tools—the vending machines. The A2A agent takes on the complex, messy work of understanding a high-level goal, managing the conversation, and maintaining state. It then acts as an intelligent orchestrator, making precise, one-shot calls to the appropriate MCP tools to execute specific tasks.

Consider a travel agent. A user interacts with it via A2A, giving it a high-level goal: “Plan a business trip to London for next week.” (A Python sketch of the full flow follows the steps below.)

  • The travel agent (A2A) accepts this ambiguous request and starts a conversation to gather details (exact dates, budget, etc.).
  • Once it has the necessary information, it calls a flight_search_tool (MCP) with precise arguments like origin, destination, and date.
  • It then calls a hotel_booking_tool (MCP) with the required city, check_in_date, and room_type.
  • Finally, it might call a currency_converter_tool (MCP) to provide expense estimates.
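
Under the same assumptions (the hypothetical tool and agent names from this example, not a real MCP or A2A SDK), a compact Python sketch of that orchestration:

# Gatekeeper Pattern sketch: one stateful concierge orchestrating
# stateless vending-machine tools. All names are illustrative stubs.
def flight_search_tool(origin, destination, date): ...
def hotel_booking_tool(city, check_in_date, room_type): ...
def currency_converter_tool(amount, from_currency, to_currency): ...

class TravelAgent:
    def __init__(self):
        self.session = {}  # the concierge holds state; the tools never do

    def clarify(self, goal: str) -> dict:
        """Stand-in for the multiturn dialogue that fills required slots."""
        return {"origin": "JFK", "date": "2026-01-12", "budget": 2000}

    def handle(self, goal: str) -> dict:
        # 1. Conversational phase: resolve ambiguity into precise parameters.
        self.session.update(self.clarify(goal))
        # 2. Orchestration phase: precise, one-shot calls to stateless tools.
        flight = flight_search_tool(self.session["origin"], "LHR",
                                    self.session["date"])
        hotel = hotel_booking_tool("London", self.session["date"], "single")
        estimate = currency_converter_tool(self.session["budget"], "USD", "GBP")
        return {"flight": flight, "hotel": hotel, "estimate": estimate}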

Each tool is a simple, reliable, and stateless vending machine. The A2A agent is the smart concierge that knows which buttons to press and in what order. This pattern provides several significant architectural benefits:

  • Decoupling: It separates the complex, conversational logic (the “how”) from the simple, reusable business logic (the “what”). The tools can be developed, tested, and maintained independently.
  • Centralized governance: The A2A gatekeeper is the perfect place to implement cross-cutting concerns. It can handle authentication, enforce rate limits, manage user quotas, and log all activity before a single tool is ever invoked.
  • Simplified tool design: Because the tools are just simple MCP functions, they don’t need to worry about state or conversational context. Their job is to do one thing and do it well, making them incredibly robust.

Making the Gatekeeper Production-Ready

Beyond its design benefits, the Gatekeeper Pattern is the ideal place to implement the operational guardrails required to run a reliable agentic system in production.

  • Observability: Each A2A conversation generates a unique trace ID. This ID must be propagated to every downstream MCP tool call, allowing you to trace a single user request across the entire system. Structured logs for tool inputs and outputs (with PII redacted) are critical for debugging.
  • Guardrails and security: The A2A Gatekeeper acts as a single point of enforcement for critical policies. It handles authentication and authorization for the user, enforces rate limits and usage quotas, and can maintain a list of which tools a particular user or group is allowed to call.
  • Resilience and fallbacks: The Gatekeeper must gracefully manage failure. When it calls an MCP tool, it should implement patterns like timeouts, retries with exponential backoff, and circuit breakers. Critically, it is responsible for the final failure state—escalating to a human in the loop for review or clearly communicating the issue to the end user (a minimal retry sketch follows this list).
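
As a sketch of that resilience discipline, here is an illustrative wrapper that adds retries with exponential backoff and trace-ID propagation around a downstream tool call; jitter, per-call timeouts, and circuit breaking are left out for brevity.

# Illustrative resilience wrapper for downstream MCP tool calls.
import time
import uuid

def call_tool_with_resilience(tool, arguments, trace_id=None,
                              retries=3, base_delay=0.5):
    # One trace ID per A2A conversation, propagated to every tool call.
    trace_id = trace_id or str(uuid.uuid4())
    for attempt in range(retries):
        try:
            return tool(arguments, trace_id=trace_id)
        except TimeoutError:
            if attempt == retries - 1:
                raise  # final failure state: escalate to a human or the user
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff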

The Gatekeeper Pattern is the ultimate synthesis of our framework. It uses A2A for what it does best—managing a stateful, goal-oriented process—and MCP for what it was designed for—the reliable, deterministic execution of a task.

Conclusion

We began this journey with a simple but frustrating problem: the architect’s dilemma. Faced with the circular advice that “MCP is for tools and A2A is for agents,” we were left in the same position as a traveler trying to get to Edinburgh—knowing that cars use motorways and trains use tracks but with no intuition on which to choose for our specific journey.

The goal was to build that intuition. We did this not by accepting abstract labels, but by reasoning from first principles. We dissected the protocols themselves, revealing how their core mechanics inevitably lead to two distinct service profiles: the predictable, one-shot “vending machine” and the stateful, conversational “concierge.”

With that foundation, we established a clear, two-step framework for a confident design choice:

  1. Start with your customer. The most critical question is not a technical one but an experiential one. A machine consumer needs the predictability of a vending machine (MCP). A human or agentic consumer needs the convenience of a concierge (A2A).
  2. Validate with the four factors. Use the litmus test of determinism, process, state, and ownership to technically justify and solidify your choice.

Ultimately, the most robust systems will synthesize both, using the Gatekeeper Pattern to combine the strengths of a user-facing A2A agent with a suite of reliable MCP tools.

The choice is no longer a dilemma. By focusing on the consumer’s needs and understanding the fundamental nature of the protocols, architects can move from confusion to confidence, designing agentic ecosystems that are not just functional but also intuitive, scalable, and maintainable.
