Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

The Supreme Court’s dangerous double standard on independent agencies

23 January 2026 at 16:35

The Supreme Court appears poised to deliver a contradictory message to the American people: Some independent agencies deserve protection from presidential whim, while others do not. The logic is troubling, the implications profound and the damage to our civil service system could be irreparable.

In December, during oral arguments in Trump v. Slaughter, the court’s conservative majority signaled it would likely overturn or severely weaken Humphrey’s Executor v. United States, the 90-year-old precedent protecting independent agencies like the Federal Trade Commission from at-will presidential removal. Chief Justice John Roberts dismissed Humphrey’s Executor as “just a dried husk,” suggesting the FTC’s powers justify unlimited presidential control. Yet just weeks later, during arguments in Trump v. Cook, those same justices expressed grave concerns about protecting the “independence” of the Federal Reserve, calling it “a uniquely structured, quasi-private entity” deserving special constitutional consideration.

The message is clear: Wall Street’s interests warrant protection, but the rights of federal workers do not.

The MSPB: Guardian of civil service protections

This double standard becomes even more glaring when we consider Harris v. Bessent, where the D.C. Circuit Court of Appeals ruled in December 2025 that President Donald Trump could lawfully remove Merit Systems Protection Board Chairwoman Cathy Harris without cause. The MSPB is not some obscure bureaucratic backwater — it is the cornerstone of our merit-based civil service system, the institution that stands between federal workers and a return to the spoils system that once plagued American government with cronyism, inefficiency and partisan pay-to-play services.

The MSPB hears appeals from federal employees facing adverse actions including terminations, demotions and suspensions. It adjudicates claims of whistleblower retaliation, prohibited personnel practices and discrimination. In my and Harris’ tenure alone, the MSPB resolved thousands of cases protecting federal workers from arbitrary and unlawful treatment. In fact, we eliminated the nearly 4,000 backlogged appeals from the prior Trump administration due to a five-year lack of quorum. These are not abstract policy debates — these are cases about whether career professionals can be fired for refusing to break the law, for reporting waste and fraud or simply for holding the “wrong” political views.

The MSPB’s quasi-judicial function is precisely what Humphrey’s Executor was designed to protect. This is what Congress intended to follow in 1978 when it created the MSPB in order to strengthen the civil service workforce from the government weaponization under the Nixon administration. The 1935 Supreme Court recognized that certain agencies must be insulated from political pressure to function properly — agencies that adjudicate disputes, that apply law to fact, that require expertise and impartiality rather than ideological alignment with whoever currently occupies the White House. Why would today’s Supreme Court throw out that noble and constitutionally oriented mandate?

A specious distinction

The Supreme Court’s apparent willingness to treat the Federal Reserve as “special” while abandoning agencies like the MSPB rests on a distinction without a meaningful constitutional difference. Yes, the Federal Reserve sets monetary policy with profound economic consequences. But the MSPB’s work is no less vital to the functioning of our democracy.

Consider what happens when the MSPB loses its independence. Federal employees adjudicating veterans’ benefits claims, processing Social Security applications, inspecting food safety or enforcing environmental protections suddenly serve at the pleasure of the president. Career experts can be replaced by political loyalists. Decisions that should be based on law and evidence become subject to political calculation. The entire civil service — the apparatus that delivers services to millions of Americans — becomes a partisan weapon to be wielded by whichever party controls the White House.

This is not hypothetical. We have seen this movie before. The spoils system of the 19th century produced rampant corruption, incompetence and the wholesale replacement of experienced government workers after each election. The Pendleton Act of 1883 and subsequent civil service reforms were not partisan projects — they were recognition that effective governance requires a professional, merit-based workforce insulated from political pressure.

The real stakes

The Supreme Court’s willingness to carve out special protection for the Federal Reserve while abandoning the MSPB reveals a troubling hierarchy of values. Financial markets deserve stability and independence, but should the American public tolerate receiving partisan-based government services and protections?

Protecting the civil service is not some narrow special interest. It affects every American who depends on government services. It determines whether the Occupational Safety and Health Administration (OSHA) inspectors can enforce workplace safety rules without fear of being fired for citing politically connected companies. Whether Environmental Protection Agency scientists can publish findings inconvenient to the administration. Whether veterans’ benefits claims are decided on merit rather than political favor. Whether independent and oversight federal organizations can investigate law enforcement shootings in Minnesota without political interference.

Justice Brett Kavanaugh, during the Cook arguments, warned that allowing presidents to easily fire Federal Reserve governors based on “trivial or inconsequential or old allegations difficult to disprove” would “weaken if not shatter” the Fed’s independence. He’s right. But that logic applies with equal force to the MSPB. If presidents can fire MSPB members at will, they can install loyalists who will rubber-stamp politically motivated personnel actions, creating a chilling effect throughout the civil service.

What’s next

The Supreme Court has an opportunity to apply its principles consistently. If the Federal Reserve deserves independence to insulate monetary policy from short-term political pressure, then the MSPB deserves independence to insulate personnel decisions from political retaliation. If “for cause” removal protections serve an important constitutional function for financial regulators, they serve an equally important function for the guardians of civil service protections.

The court should reject the false distinction between agencies that protect Wall Street and agencies that protect workers. Both serve vital public functions. Both require independence to function properly. Both should be subject to the same constitutional analysis.

More fundamentally, the court must recognize that its removal cases are not merely abstract exercises in constitutional theory. They determine whether we will have a professional civil service or return to a patronage system. Whether government will be staffed by experts or political operatives. Whether the rule of law or the whim of the president will govern federal employment decisions.

A strong civil service is just as important to American democracy as an independent Federal Reserve. Both protect against the concentration of power. Both ensure that critical governmental functions are performed with expertise and integrity rather than political calculation. The Supreme Court’s jurisprudence should reflect that basic truth, not create an arbitrary hierarchy that privileges financial interests over the rights of workers and the integrity of government.

The court will issue its decisions over the next several months and when it does, it should remember that protecting democratic institutions is not a selective enterprise. The rule of law requires principles, not preferences. Because in the end, a government run on political loyalty instead of merit is far more dangerous than a fluctuating interest rate.

Raymond Limon retired after more than 30 years of federal service in 2025. He served in leadership roles at the Office of Personnel Management and the State Department and was the vice chairman of the Merit Systems Protections Board. He is now founder of Merit Services Advocates.

The post The Supreme Court’s dangerous double standard on independent agencies first appeared on Federal News Network.

© AP Photo/Julia Demaree Nikhinson

The Supreme Court is seen during oral arguments over state laws barring transgender girls and women from playing on school athletic teams, Tuesday, Jan. 13, 2026, in Washington. (AP Photo/Julia Demaree Nikhinson)

The Human Behind the Door

23 January 2026 at 07:14
The following article originally appeared on Mike Amundsen’s Substack Signals from Our Futures Past and is being republished here with the author’s permission.

There’s an old hotel on a windy corner in Chicago where the front doors shine like brass mirrors. Each morning, before guests even reach the step, a tall man in a gray coat swings one open with quiet precision. He greets them by name, gestures toward the elevator, and somehow makes every traveler feel like a regular. To a cost consultant, he is a line item. To the guests, he is part of the building’s atmosphere.

When management installed automatic doors a few years ago, the entrance became quieter and cheaper, but not better. Guests no longer lingered to chat, taxis stopped less often, and the lobby felt colder. The automation improved the hotel’s bottom line but not its character.

This story captures what British advertising executive Rory Sutherland calls “The Doorman Fallacy,” the habit of mistaking visible tasks for the entirety of a role. In this short video explanation, Sutherland points out that a doorman does more than open doors. He represents safety, care, and ceremony. His presence changes how people feel about a place. Remove him, and you save money but lose meaning.

The Lesson Behind the Metaphor

Sutherland expanded on the idea in his 2019 book Alchemy, arguing that logic alone can lead organizations astray. We typically undervalue the intangible parts of human work because they do not fit neatly into a spreadsheet. For example, the doorman seems redundant only if you assume his job is merely mechanical. In truth, he performs a social and symbolic function. He welcomes guests, conveys prestige, and creates a sense of safety.

Of course, this lesson extends well beyond hotels. In business after business, human behavior is treated as inefficiency. The result is thinner experiences, shallower relationships, and systems that look streamlined on paper but feel hollow in practice.

The Doorman in the Age of AI

In a recent article for The Conversation, Gediminas Lipnickas of the University of South Australia argues that many companies are repeating the same mistake with artificial intelligence. He warns people against the tendency to replace people because technology can imitate their simplest tasks while ignoring the judgment, empathy, and adaptability that define the job.

Lipnickas offers two examples.

The Commonwealth Bank of Australia laid off 45 customer service agents after rolling out a voice bot, then reversed the decision when it realized the employees were not redundant. They were context interpreters, not just phone operators.

Taco Bell introduced AI voice ordering at drive-throughs to speed up service, but customers complained of errors, confusion, and surreal exchanges with synthetic voices. The company paused the rollout and conceded that human improvisation worked better, especially during busy periods.

Both cases reveal the same pattern: Automation succeeds technically but fails experientially. It is the digital version of installing an automatic door and wondering why the lobby feels empty.

Measuring the Wrong Thing

The doorman fallacy persists because organizations keep measuring only what is visible. Performance dashboards reward tidy numbers, calls answered, tickets closed, customer contacts avoided, because they are easy to track. But they miss the essence of the work: problem-solving, reassurance, and quiet support.

When we optimize for visible throughput instead of invisible value, we teach everyone to chase efficiency at the expense of meaning. A skilled agent does not just resolve a complaint; they interpret tone and calm frustration. A nurse does not merely record vitals; they notice hesitation that no sensor can catch. A line cook does not just fill orders; they maintain the rhythm of a kitchen.

The answer is not to stop measuring; it is to do a better job of measuring. Key results should focus on interaction, problem-solving, and support, not just volume and speed. Otherwise, we risk automating away the very parts of work that make it valuable.

Efficiency versus empathy

Sutherland’s insight and Lipnickas’s warning meet at the same point: When efficiency ignores empathy, systems break down. Automation works well for bounded, rule-based tasks such as data entry, image processing, or predictive maintenance. But as soon as creativity, empathy, and creative problem-solving enter the picture, humans remain indispensable.

What looks like inefficiency on paper is often resilience in practice. A doorman who pauses to chat with a regular guest may appear unproductive, yet that moment strengthens loyalty and reputation in ways no metric can show.

Coaching, not replacing

That is why my own work has focused on using AI as a coach or mentor, not as a worker. A well-designed AI coach can prompt reflection, offer structure, and accelerate learning, but it still relies on human curiosity to drive the process. The machine can surface possibilities, but only the person can decide what matters.

When I design an AI coach, I think of it as a partner in thought, closer to Douglas Engelbart’s idea of human-computer partnership than to a substitute employee. The coach asks questions, provides scaffolding, and amplifies creativity. It does not replace the messy, interpretive work that defines human intelligence.

A More Human Kind of Intelligence

The deeper lesson of the doorman fallacy is that intelligence is not a property of isolated systems but of relationships. The doorman’s value emerges in the interplay between person and place, gesture and response. The same is true for AI. Detached from human context, it becomes thin and mechanical. Driven by human purpose, it becomes powerful and humane.

Every generation of innovation faces this tension. The industrial revolution promised to free us from labor but often stripped away craftsmanship. The digital revolution promises connection but frequently delivers distraction. Now the AI revolution promises efficiency, but unless we are careful, it may erode the very qualities that make work worth doing.

As we rush to install the next generation of technological “automatic doors,” let us remember the person who once stood beside them. Not out of nostalgia but because the future belongs to those who still know how to welcome others in.

You can find out just how Mike uses AI as an assistant by joining him on February 11 on the O’Reilly learning platform for his live course AI-Driven API Design. He’ll take you through integrating AI-assisted automation into human-driven API design and leveraging AI tools like ChatGPT to optimize the design, documentation, and testing of web APIs. It’s free for O’Reilly members; register here.

Not a member?
Sign up for a 10-day free trial before the event to attend—and explore all the other resources on O’Reilly.

Navigating insurance, maintaining careers and making smart money moves as a Gen Z military family

22 January 2026 at 15:38

For Gen Z military families, navigating life in their early-to-mid 20s means wading their way through unique challenges that can get overwhelming pretty quickly. Between frequent relocations, long deployments, unpredictable life schedules and limited early-career earnings, financial planning is more than a good idea — it’s essential for long-term stability.

According to the Congressional Research Service, 40% of active-duty military personnel are age 25 or younger, right within the Gen Z age group. Yet these same service members face the brunt of frequent moves, deployments and today’s rising cost of living.

This guide is designed specifically for Gen Z service members and their spouses, helping them understand their financial situations, insurance options, avoid common financial pitfalls and build stable careers, all while dealing with the real-world pressures of military life.

Financial pressures Gen Z military families face

While budgeting, insurance and retirement planning are critical, it’s also important to get a real sense of the actual financial stressors younger military families are grappling with:

  • Living paycheck to paycheck. Even with basic allowance for housing and basic allowance for subsistence, many junior enlisted families still find it hard to keep up with rising living costs. This becomes even more of a precarious situation when you add in dependents.
  • Delayed reimbursements during permanent change of station (PCS) moves, creating short-term cash crunches.
  • Limited emergency savings. The Military Family Advisory Network’s (MFAN) 2023 survey found 22.2% of military families had less than $500 in savings.
  • Predatory lending, with high-interest auto or payday lenders near bases disproportionately targeting young servicemembers.
  • Military spouse underemployment, leaving household income vulnerable when frequent moves disrupt career continuity.

MFAN also found that nearly 80% of respondents spend more on housing than they can comfortably afford, and 57% experienced a financial emergency in the past two years. These aren’t abstract concerns that most young servicemembers and their families can just ignore, hoping that they’ll never be impacted; these are everyday realities for Gen Z military families.

Insurance best practices

Adult life is just getting started in your 20s, and navigating insurance options can feel overwhelming. But taking the time to learn your choices will set your family up for a secure financial future.

  • Life insurance: Most servicemembers are automatically enrolled in Service Members’ Group Life Insurance (SGLI), with Family Servicemembers’ Group Life Insurance (FSGLI) extending coverage to spouses and children. Review coverage annually. Also, compare options across SGLI, FSGLI and trusted military nonprofits to find what fits your family best.
  • Disability insurance: Often overlooked, this protects your family if an injury prevents you from working, even off-duty. Supplemental private coverage can be wise if your lifestyle expenses exceed your military pay.
  • Renters insurance: Essential for families who move often; it protects your belongings through relocations.
  • Healthcare: TRICARE provides strong coverage, but learn the details on copays and referrals, especially when stationed overseas.

Common financial missteps and how to fix them

Mistake #1: Overbudgeting and lack of budgeting

BAH and BAS are designed to offset housing and food costs, not fund lifestyle inflation. Stick to a budget that keeps fixed expenses well below your income. Free tools from Military OneSource can help track spending.

Mistake #2: Not saving for retirement

Retirement may feel far away, but starting early has an outsized impact. Contribute at least 5% to your Thrift Savings Plan (TSP) — a military contribution retirement program similar to that of a 401k — to secure the full Defense Department match. Even small contributions now can grow into hundreds of thousands later.

Mistake #3: Misusing credit or loans

Predatory lenders near bases often target young servicemembers. Try to avoid any predatory or misleading lenders. Instead, consider a secured credit card or an on-base credit union to build credit responsibly. Always be sure to pay your balance in full.

Mistake #4: Skipping an emergency fund

PCS moves, car repairs or medical costs can’t always be predicted. Start small: Even $10 to $20 per week automatically transferred to savings helps to build a safety net. According to MFAN’s 2023 survey, enlisted families with children that have undergone recent PCS moves are most likely to face financial hardship, making an emergency cushion critical.

In addition to avoiding pitfalls, here are realistic strategies to strengthen your finances:

  • Tap military relief organizations like Army Emergency Relief (AER) or Navy-Marine Corps Relief Society (NMCRS) for interest-free loans or grants during emergencies.
  • Plan for post-military life: Keep in mind that SGLI and other benefits change once you leave active duty. Compare nonprofit alternatives early to avoid gaps.
  • Leverage nonprofits you can trust: Some offer competitive life insurance, savings products or financial counseling designed for servicemembers’ long-term interests.
  • Budget with inflation in mind: Rising costs are hitting Gen Z hard. Nearly 48% say they don’t feel financially secure, and over 40% say they’re struggling to make ends meet. Prioritize life’s essentials and be realistic about what you can afford outside of them.

Maintaining a career as a military spouse

Frequent relocations are undoubtedly disruptive, but they don’t have to end career growth. Military spouses may want to focus on careers that can easily move around with them, like healthcare, education, IT or freelancing.

Take advantage of programs like MyCAA, which offers $4,000 in tuition assistance for career training; Military OneSource, which offers resume assistance, free career coaching and financial counseling; and Hiring Our Heroes, which offers networking opportunities and job placement assistance for military spouses. These programs can help reduce underemployment and strengthen household stability, especially during tempestuous times like during and after a PCS move.

Putting it all together

Starting adulthood, a military career and a family all at once is an incredibly challenging undertaking. The financial pressures are real, but with the right knowledge and proactive steps, Gen Z military families can turn instability and uncertainty into long-term security.

By understanding insurance options, making smart money moves, tapping into military-specific resources and planning ahead for life after service, families can not only weather the unpredictability of military life, but also build strong financial foundations for the future.

Alejandra Cortes-Camargo is a brand marketing coordinator at Armed Forces Mutual.

The post Navigating insurance, maintaining careers and making smart money moves as a Gen Z military family first appeared on Federal News Network.

© Getty Images/wichayada suwanachun

A senior couple working together on financial planning, using documents and a calculator to manage family finances.

AI in the Office

22 January 2026 at 07:12

My father spent his career as an accountant for a major public utility. He didn’t talk about work much; when he engaged in shop talk, it was generally with other public utility accountants, and incomprehensible to those who weren’t. But I remember one story from work, and that story is relevant to our current engagement with AI.

He told me one evening about a problem at work. This was the late 1960s or early 1970s, and computers were relatively new. The operations division (the one that sends out trucks to fix things on poles) had acquired a number of “computerized” systems for analyzing engines—no doubt an early version of what your auto repair shop uses all the time. (And no doubt much larger and more expensive.) There was a question of how to account for these machines: Are they computing equipment? Or are they truck maintenance equipment? And it had turned into a kind of turf war between the operations people and the people we’d now call IT. (My father’s job was less about adding up long columns of figures than about making rulings on accounting policy issues like this; I used to call it “philosophy of accounting,” with my tongue not entirely in my cheek.)

My immediate thought was that this was a simple problem. The operations people probably want this to be considered computer equipment to keep it off their budget; nobody wants to overspend their budget. And the computing people probably don’t want all this extra equipment dumped onto their budget. It turned out that was exactly wrong. Politics is all about control, and the computer group wanted control of these strange machines with new capabilities. Did operations know how to maintain them? In the late ’60s, it’s likely that these machines were relatively fragile and contained components like vacuum tubes. Likewise, the operations group really didn’t want the computer group controlling how many of these machines they could buy and where to place them; the computer people would probably find something more fun to do with their money, like leasing a bigger mainframe, and leaving operations without the new technology. In the 1970s, computers were for getting the bills out, not mobilizing trucks to fix downed lines.

I don’t know how my father’s problem was resolved, but I do know how that relates to AI. We’ve all seen that AI is good at a lot of things—writing software, writing poems, doing research—we all know the stories. Human language may yet become a very-high-level, the highest-possible-level, programming language—the abstraction to end all abstractions. It may allow us to reach the holy grail: telling computers what we want them to do, not how (step-by-step) to do it. But there’s another part of enterprise programming, and that’s deciding what we want computers to do. That involves taking into account business practices, which are rarely as uniform as we’d like to think; hundreds of cross-cutting and possibly contradictory regulations; company culture; and even office politics. The best software in the world won’t be used, or will be used badly, if it doesn’t fit into its environment.

Politics? Yes, and that’s where my father’s story is important. The conflict between operations and computing was politics: power and control in the context of the dizzying regulations and standards that govern accounting at a public utility. One group stood to gain control; the other stood to lose it; and the regulators were standing by to make sure everything was done properly. It’s naive of software developers to think that’s somehow changed in the past 50 or 60 years, that somehow there’s a “right” solution that doesn’t take into account politics, cultural factors, regulation, and more.

Let’s look (briefly) at another situation. When I learned about domain-driven design (DDD), I was shocked to hear that a company could easily have a dozen or more different definitions of a “sale.” Sale? That’s simple. But to an accountant, it means entries in a ledger; to the warehouse, it means moving items from stock onto a truck, arranging for delivery, and recording the change in stocking levels; to sales, a “sale” means a certain kind of event that might even be hypothetical: something with a 75% chance of happening. Is it the programmer’s job to rationalize this, to say “let’s be adults, ‘sale’ can only mean one thing”? No, it isn’t. It is a software architect’s job to understand all the facets of a “sale” and find the best way (or, in Neal Ford and Mark Richards’s words, the “least worst way”) to satisfy the customer. Who is using the software, how are they using it, and how are they expecting it to behave?

Powerful as AI is, thought like this is beyond its capabilities. It might be possible with more “embodied” AI: AI that was capable of sensing and tracking its surroundings, AI that was capable of interviewing people, deciding who to interview, parsing the office politics and culture, and managing the conflicts and ambiguities. It’s clear that, at the level of code generation, AI is much more capable of dealing with ambiguity and incomplete instructions than earlier tools. You can tell Claude “Just write me a simple parser for this document type, I don’t care how you do it.” But it’s not yet capable of working with the ambiguity that’s part of any human office. It isn’t capable of making a reasoned decision about whether these new devices are computers or truck maintenance equipment.

How long will it be before AI can make decisions like those? How long before it can reason about fundamentally ambiguous situations and come up with the “least worst” solution? We will see.

Governing the future: A strategic framework for federal HR IT modernization

21 January 2026 at 15:27

The federal government is preparing to undertake one of the most ambitious IT transformations in decades: Modernizing and unifying human resources information technology across agencies. The technology itself is not the greatest challenge. Instead, success will hinge on the government’s ability to establish an effective, authoritative and disciplined governance structure capable of making informed, timely and sometimes difficult decisions.

The central tension is clear: Agencies legitimately need flexibility to execute mission-specific processes, yet the government must reduce fragmentation, redundancy and cost by standardizing and adopting commercial best practices. Historically, each agency has evolved idiosyncratic HR processes — even for identical functions — resulting in one of the most complex HR ecosystems in the world.

We need a governance framework that can break this cycle. It has to be a structured requirements-evaluation process, a systematic approach to modernizing outdated statutory constraints, and a rigorous mechanism to prevent “corner cases” from derailing modernization. The framework is based on a three-tiered governance structure to enable accountability, enforce standards, manage risk and accelerate decision making.

The governance imperative in HR IT modernization

Modernizing HR IT across the federal government requires rethinking more than just systems — it requires rethinking decision making. Technology will only succeed if governance promotes standardization, manages statutory and regulatory constraints intelligently, and prevents scope creep driven by individual agency preferences.

Absent strong governance, modernization will devolve into a high-cost, multi-point, agency-to-vendor negotiation where each agency advocates for its “unique” variations. Commercial vendors, who find arguing with or disappointing their customers to be fruitless and counterproductive, will ultimately optimize toward additional scope, higher complexity and extended timelines — that is, unless the government owns the decision framework.

Why governance is the central challenge

The root causes of this central challenge are structural. Agencies with different missions evolved different HR processes — even for identical tasks such as onboarding, payroll events or personnel actions. Many “requirements” cited today are actually legacy practices, outdated rules or agency preferences. And statutes and regulations are often more flexible than assumed, but in order to avoid any risk of perceived noncompliance or litigation.

Without centralized authority, modernization will replicate fragmentation in a new system rather than reduce it. Governance must therefore act as the strategic filter that determines what is truly required, what can be standardized and what needs legislative or policy reform.

A two-dimensional requirements evaluation framework

Regardless of the rigor associated with the requirements outlined at the outset of the program, implementers will encounter seemingly unique or unaccounted for “requirements” that appear to be critical to agencies as they begin seriously planning for implementation. Any federal HR modernization effort must implement a consistent, transparent and rigorous method for evaluating these new or additional requirements. The framework should classify every proposed “need” across two dimensions:

  • Applicability (breadth): Is this need specific to a single agency, a cluster of agencies, or the whole of government?
  • Codification (rigidity): Is the need explicitly required by law/regulation, or is it merely a policy preference or tradition?

This line of thinking leads to a decision matrix of sorts. For instance, identified needs that are found to be universal and well-codified are likely legitimate requirements and solid candidates for productization on the part of the HR IT vendor. For requirements that apply to a group of agencies or a single agency, or that are really based on practice or tradition, there may be a range of outcomes worth considering.

Prior to an engineering discussion, the applicable governance body must ask of any new requirement: Can this objective be achieved by conforming to a recognized commercial best practice? If the answer is yes, the governance process should strongly favor moving in that direction.

This disciplined approach is crucial to keeping modernization aligned with cost savings, simplification and future scalability.

Breaking the statutory chains: A modern exception and reform model

A common pitfall in federal IT is the tendency to view outdated laws and regulations as immutable engineering constraints. There are in fact many government “requirements” — often at a very granular and prescriptive level — embedded in written laws and regulations, that are either out-of-date or that simply do not make sense when viewed in a larger context of how HR gets done. The tendency is to look at these cases and say, “This is in the rule books, so we must build the software this way.”

But this is the wrong answer, for several reasons. And reform typically lags years behind technology. Changing laws or regulations is an arduous and lengthy process, but the government cannot afford to encode obsolete statutes into modern software. Treating every rule as a software requirement guarantees technical debt before launch.

The proposed mechanism: The business case exception

The Office of Management and Budget and the Office of Personnel Management have demonstrated the ability to manage simple, business-case-driven exception processes. This capability should be operationalized as a core component of HR IT modernization governance:

  • Immediate flexibility: OMB and OPM should grant agencies waivers to bypass outdated procedural requirements if adopting the standard best practice reduces administrative burden and cost.
  • Batch legislative updates: Rather than waiting for laws to change before modernizing, OPM and OMB can “batch up” these approved exceptions. On a periodic basis, these proven efficiencies through standard processes to modify laws and regulations to match the new, modernized reality.

This approach flips the traditional model. Instead of software lagging behind policy, the modernization effort drives policy evolution.

Avoiding the “corner case” trap: ROI-driven decision-making

In large-scale HR modernization, “corner cases” can become the silent destroyer of budgets and timelines. Every agency can cite dozens of rare events — special pay authorities, unusual personnel actions or unique workforce segments — that occur only infrequently.

The risk is that building system logic for rare events is extraordinarily expensive. These edge cases disproportionately consume design and engineering time. And any customization or productization can increase testing complexity and long-term maintenance cost.

Governance should enforce a strict return-on-investment rule: If a unique scenario occurs infrequently and costs more to automate than to handle manually, it should not be engineered into the system.

For instance, if a unique process occurs only 50 times a year across a 2-million-person workforce, it is cheaper to handle it manually outside the system than to spend millions customizing the software. If the government does not manage this evaluation itself, it will devolve into a “ping-pong” negotiation with vendors, leading to scope creep and vulnerability. The government must hold the reins, deciding what gets built based on value, not just request.

Recommended governance structure

To operationalize the ideas above, the government should implement a three-tiered governance structure designed to separate strategy from technical execution.

  1. The executive steering committee (ESC)
  • Composition: Senior leadership from OMB, OPM and select agency chief human capital officers and chief information officers (CHCOs/CIOs).
  • Role: Defines the “North Star.” They hold the authority to approve the “batch exceptions” for policy and regulation. They handle the highest-level escalations where an agency claims a mission-critical need to deviate from the standard.

The ESC establishes the foundation for policy, ensures accountability, and provides air cover for standardization decisions that may challenge entrenched agency preferences.

  1. The functional control board (FCB)
  • Composition: Functional experts (HR practitioners) and business analysts.
  • Role: The “gatekeepers.” They utilize the two-dimensional framework to triage requirements. Their primary mandate is to protect the standard commercial best practice. They determine if a request is a true “need” or just a preference.

The FCB prevents the “paving cow paths” phenomenon by rigorously protecting the standard process baseline.

  1. The architecture review board (ARB)
  • Composition: Technical architects and security experts.
  • Role: Ensures that even approved variations do not break the data model or introduce technical debt. They enforce the return on investment (ROI) rule on corner cases — if the technical cost of a request exceeds its business value, they reject it.

The ARB enforces discipline on engineering choices and protects the system from fragmentation.

Federal HR IT modernization presents a rare opportunity to reshape not just systems, but the business of human capital management across government. The technology exists. The challenge — and the opportunity — lies in governance.

The path to modernization will not be defined by the software implemented, but by the discipline, authority, and insight of the governance structure that guides it.

Steve Krauss is a principal with SLK Executive Advisory. He spent the last decade working for GSA and OPM, including as the Senior Executive Service (SES) director of the HR Quality Service Management Office (QSMO).

The post Governing the future: A strategic framework for federal HR IT modernization first appeared on Federal News Network.

© Getty Images/iStockphoto/metamorworks

People network concept. Group of person. Teamwork. Human resources.

Building AI-Powered SaaS Businesses

21 January 2026 at 07:16

In preparation for our upcoming Building SaaS Businesses with AI Superstream, I sat down with event chair Jason Gilmore to discuss the full lifecycle of an AI-powered SaaS product, from initial ideation all the way to a successful launch.

Jason Gilmore is CTO of Adalo, a popular no-code mobile app builder. A technologist and software product leader with over 25 years of industry experience, Jason’s spent 13 years building SaaS products at companies including Gatherit.co and the highly successful Nomorobo and as the CEO of the coding education platform Treehouse. He’s also a veteran of Xenon Partners, where he leads technical M&A due diligence and advises their portfolio of SaaS companies on AI adoption, and previously served as CTO of DreamFactory.

Here’s our interview, edited for clarity and length.

Ideation

Michelle Smith: As a SaaS developer, what are the first steps you take when beginning the ideation process for a new product?

Jason Gilmore: I always start by finding a name that I love, buying the domain, and then creating a logo. Once I’ve done this, I feel like the idea is becoming real. This used to be a torturous process, but thanks to AI, my process is now quite smooth. I generate product names by asking ChatGPT for 10 candidates, refining them until I have three preferred options, and then checking availability via Lean Domain Search. I usually use ChatGPT to help with logos, but interestingly, while I was using Cursor, the popular AI-powered coding editor, it automatically created a logo for ContributorIQ as it set up the landing page. I hadn’t even asked for one, but it looked great, so I went with it!

Once I nail down a name and logo, I’ll return to ChatGPT yet again and use it like a rubber duck. Of course, I’m not doing any coding or debugging at this point; instead, I’m just using ChatGPT as a sounding board, asking it to expand upon my idea, poke holes in it, and so forth.

Next, I’ll create a GitHub repository and start adding issues (basically feature requests). I’ve used the GitHub kanban board in the past and have also been a heavy Trello user at various times. However, these days I keep it simple and create GitHub issues until I feel I have enough to constitute an MVP. Then I’ll use the GitHub MCP server in conjunction with Claude Code or Cursor to pull and implement these issues.

Before committing resources to development, how do you approach initial validation to ensure the market opportunity exists for a new SaaS product?

The answer to this question is simple. I don’t. If the problem is sufficiently annoying that I eventually can’t resist building something to solve it, then that’s enough for me. That said, once I have an MVP, I’ll start telling everybody I know about it and really try to lower the barrier associated with getting started.

For instance, if someone expresses interest in using SecurityBot, I’ll proactively volunteer to help them validate their site via DNS. If someone wants to give ContributorIQ a try, I’ll ask to meet with the person running due diligence to ensure they can successfully connect to their GitHub organization. It’s in these early stages of customer acquisition that you can determine what users truly want rather than merely trying to replicate what competitors are doing.

Execution, Tools, and Code

When deciding to build a new SaaS product, what’s the most critical strategic question you seek to answer before writing any code?

Personally, the question I ask myself is whether I seriously believe I will use the product every day. If the answer is an adamant yes, then I proceed. If it’s anything but a “heck yes,” then I’ve learned that it’s best to sit on the idea for a few more weeks before investing any additional time.

Which tools do you recommend, and why?

I regularly use a number of different tools for building software, including Cursor and Claude Code for AI-assisted coding and development, Laravel Forge for deployment, Cloudflare and SecurityBot for security, and Google Analytics and Search Console for analytics. Check out my comprehensive list at the end of this article for more details.

How do you accurately measure the success and adoption of your product? What key metrics (KPIs) do you prioritize tracking immediately after launch?

Something I’ve learned the hard way is that being in such a hurry to launch a product means that you neglect to add an appropriate level of monitoring. I’m not necessarily referring to monitoring in the sense of Sentry or Datadog; rather I’m referring to simply knowing when somebody starts a trial.

At a minimum, you should add a restricted admin dashboard to your SaaS which displays various KPIs such as who started a trial and when. You should also be able to quickly determine when trialers reach a key milestone. For instance, at SecurityBot, that key milestone is connecting their Slack, because once that happens, trialers will periodically receive useful notifications right in the very place where they spend a large part of their day.

On build versus buy: What’s your critical decision framework for choosing to use prebuilt frameworks and third-party platforms?

I think it’s a tremendous mistake to try to reinvent the wheel. Frameworks and libraries such as Ruby on Rails, Laravel, Django, and others are what’s known as “batteries included,” meaning they provide everything 99% of what developers require to build a tremendously useful, scalable, and maintainable software product. If your intention is to build a successful SaaS product, then you should focus exclusively on building a quality product and acquiring customers, period. Anything else is just playing with computers. And there’s nothing wrong with playing with computers! It’s my favorite thing to do in the world. But it’s not the same thing as building a software business.

Quality and Security

What unique security and quality assurance (QA) protocols does an intelligent SaaS product require that a standard, non-AI application doesn’t?

The two most important are prompt management and output monitoring. To minimize response drift (the LLM’s tendency for creative, inconsistent interpretation), you should rigorously test and tightly define the LLM prompt. This must be repeatedly tested against diverse datasets to ensure consistent and desired behavior.

Developers should look beyond general OpenAI APIs and consider specialized custom models (like the 2.2 million available on Hugging Face) that are better suited for specific tasks.

To ensure quality and prevent harm, you’ll also need to proactively monitor and review the LLM’s output (particularly when it’s low-confidence or potentially sensitive) and continuously refine and tune the prompt. Keeping a human in the loop (HITL) is essential: At Nomorobo, for instance, we manually reviewed low-confidence robocall categorizations to improve the model. At Adalo, we’ve reviewed thousands of app-building prompt responses to ensure desired outcomes.

Critically, businesses must transparently communicate to users exactly how their data and intellectual property are being used, particularly before passing it to a third-party LLM service.

It’s also important to differentiate when AI is truly necessary. Sometimes, AI can be used most effectively to enhance non-AI tools—for instance, using an LLM to generate complex, difficult-to-write scripts or reviewing schemas for database optimization—rather than trying to solve the core problem with a large, general model.

Marketing, Launch, and Business Success

What are your top two strategies for launching a product?

For early-stage growth, founders should focus intently on two core strategies: prioritizing SEO and proactively promoting the product.

I recommend prioritizing SEO early and aggressively. Currently, the majority of organic traffic still comes from traditional search results, not AI-generated answers (GEO). We are however certainly seeing GEO being attributed to a larger share of visitors. So while you should focus on Google organic traffic, I also suggest spending time tuning your marketing pages for AI crawlers.

Implement a feature-to-landing page workflow: For SecurityBot, nearly all traffic was driven by creating a dedicated SEO-friendly landing page for every new feature. AI tools like Cursor can automate the creation of these pages, including generating necessary assets like screenshots and promotional tweets. Landing pages for features like Broken Link Checker and PageSpeed Insights were 100% created by Cursor and Sonnet 4.5.

Many technical founders hesitate to promote their work, but visibility is crucial. Overcome founder shyness: Be vocal about your product and get it out there. Share your product immediately with friends, colleagues, and former customers to start gaining early traction and feedback.

Mastering these two strategies is more than enough to keep your team busy and effectively drive initial growth.

On scaling: What’s the single biggest operational hurdle when trying to scale your business from a handful of users to a large, paying user base?

I’ve had the opportunity to see business scaling hurdles firsthand, not only at Xenon but also during the M&A process, as well as within my own projects. The biggest operational hurdle, by far, is maintaining focus on customer acquisition. It is so tempting to build “just one more feature” instead of creating another video or writing a blog post.

Conversely, for those companies that do reach a measure of product-market fit, my observation is they tend to focus far too much on customer acquisition at the cost of customer retention. There’s a concept in subscription-based businesses known as “max MRR,” which identifies the point at which your business will simply stop growing once revenue lost due to customer churn reaches an absolute dollar point that erases any revenue gains made through customer acquisition. In short, at a certain point, you need to focus on both, and that’s difficult to do.

We’ll end with monetization. What’s the most successful and reliable monetization strategy you’ve seen for a new AI-powered SaaS feature? Is it usage-based, feature-gated, or a premium tier?

We’re certainly seeing usage-based monetization models take off these days, and I think for certain types of businesses, that makes a lot of sense. However, my advice to those trying to build a new SaaS business is to keep your subscription model as simple and understandable as possible in order to maximize customer acquisition opportunities.

Thanks, Jason.

For more from Jason Gilmore on developing successful SaaS products, join us on February 10 for our AI Superstream: Building SaaS Businesses with AI. Jason and a lineup of AI specialists from Dynatrace, Sendspark, DBGorilla, Changebot, and more will examine every phase of building with AI, from initial ideation and hands-on coding to launch, security, and marketing—and share case studies and hard-won insights from production. Register here; it’s free and open to all.

Appendix: Recommended Tools

CategoryTool/servicePrimary useNotes
AI-assisted codingCursor (with Opus 4.5) and Claude CodeCoding and AI assistanceClaude Opus 4.5 highly valued
Code managementGitHubManaging code repositoriesStandard code management
DeploymentLaravel ForgeDeploying projects to Digital OceanHighly valued for simplifying deployment
API/SaaS interactionMCP serversInteracting with GitHub, Stripe, Chrome devtools, and TrelloCentralized interaction point
ArchitectureMermaidCreating architectural diagramsUsed for visualization
ResearchChatGPTRubber duck debugging and general AI assistanceDedicated tool for problem-solving
SecurityCloudflareSecurity services and blocking bad actorsPrimarily focused on protection
Marketing and SEOGoogle Search ConsoleTracking marketing page performanceFocuses on search visibility
AnalyticsGoogle Analytics 4 (GA4)Site metrics and reportingConsidered a “horrible” but necessary tool due to lack of better alternatives

Securing AI in federal and defense missions: A multi-level approach

20 January 2026 at 17:07

As the federal government accelerates artificial intelligence adoption under the national AI Action Plan, agencies are racing to bring AI into mission systems. The Defense Department, in particular, sees the potential of AI to help analysts manage overwhelming data volumes and maintain an advantage over adversaries.

Yet most AI projects never make it out of the lab — not because models are inadequate, but because the data foundations, traceability and governance around them are too weak. In mission environments, especially on-premises and air-gapped cloud regions, trustworthy AI is impossible without secure, transparent and well-governed data.

To deploy AI that reaches production and operates within classification, compliance and policy constraints, federal leaders must view AI security in layers.

Levels of security and governance

AI covers a wide variety of fields such as machine learning, robotics and computer vision. For this discussion, let’s focus on one of AI’s fastest-growing areas: natural language processing and generative AI used as decision-support tools.

Under the hood, these systems, based on large language models (LLMs), are complex “black boxes” trained on vast amounts of public data. On their own, they have no understanding of a specific mission, agency or theater of operations. To make them useful in government, teams typically combine a base model with proprietary mission data, often using retrieval-augmented generation (RAG), where relevant documents are retrieved and used as context for each answer.

That’s where the security and governance challenges begin.

Layer 1: Infrastructure — a familiar foundation

The good news is that the infrastructure layer for AI looks a lot like any other high-value system. Whether an agency is deploying a database, a web app or an AI service, the ATO processes, network isolation, security controls and continuous monitoring apply.

Layer 2: The challenge of securing AI augmented data

The data layer is where AI security diverges most sharply from commercial use. In RAG systems, mission documents are retrieved as context for model queries. If retrieval doesn’t enforce classification and access controls, the system can generate results that cause security incidents.

Imagine a single AI system indexing multiple levels of classified documents. Deep in the retrieval layer, the system pulls a highly relevant document to augment the query, but it’s beyond the analyst’s classification access levels. The analyst never sees the original document; only a neat, summarized answer that is also a data spill.

The next frontier for federal AI depends on granular, attribute-based access control.

Every document — and every vectorized chunk — must be tagged with classification, caveats, source system, compartments and existing access control lists. This is often addressed by building separate “bins” of classified data, but that approach leads to duplicated data, lost context and operational complexity. A safer and more scalable solution lies within a single semantic index with strong, attribute-based filtering.

Layer 3: Models and the AI supply chain

Agencies may use managed models, fine-tune their own, or import third-party or open-source models into air-gapped environments. In all cases, models should be treated as part of a software supply chain:

  • Keep models inside the enclave so prompts and outputs never cross uncontrolled boundaries.
  • Protect training pipelines from data poisoning, which can skew outputs or introduce hidden security risks.
  • Rigorously scan and test third-party models before use.

Without clear policy around how models are acquired, hosted, updated and retired, it’s easy for “one-off experiments” to become long-term risks.

The challenge at this level lies in the “parity gap” between commercial and government cloud regions. Commercial environments receive the latest AI services and their security enhancements much earlier. Until those capabilities are authorized and available in air-gapped regions, agencies may be forced to rely on older tools or build ad hoc workarounds.

Governance, logging and responsible AI

AI governance has to extend beyond the technical team. Policy, legal, compliance and mission leadership all have a stake in how AI is deployed.

Three themes matter most:

  1. Traceability and transparency. Analysts must be able to see which sources informed a result and verify the underlying documents.
  2. Deep logging and auditing. Each query should record who asked what, which model ran, what data was retrieved, and which filters were applied.
  3. Alignment with emerging frameworks. DoD’s responsible AI principles and the National Institute of Standards and Technology’s AI risk guidance offer structure, but only if policy owners understand AI well enough to apply them — making education as critical as technology.

Why so many pilots stall — and how to break through

Industry estimates suggest that up to 95% of AI projects never make it to full production. In federal environments, the stakes are higher, and the barriers are steeper. Common reasons include vague use cases, poor data curation, lack of evaluation to detect output drift, and assumptions that AI can simply be “dropped in.”

Data quality in air-gapped projects is also a factor. If your query is about “missiles,” but your system is mostly indexed with documents about “tanks”, analysts can expect poor results, also called “AI hallucinations.” They won’t trust the tool, and the project will quietly die. AI cannot invent high-quality mission data where none exists.

There are no “quick wins” for AI in classified missions, but there are smart starting points:

  • Build upon a focused decision-support problem.
  • Inventorying and tagging mission data.
  • Bringing security and policy teams in early.
  • Establishing an evaluation loop to test outputs.
  • Designing for traceability and explainability from day one.

Looking ahead

In the next three to five years, we can expect AI platforms, both commercial and government, to ship with stronger built-in security, richer monitoring, and more robust audit features. Agent-based AI pipelines with autonomous security accesses that can pre-filter queries and post-process answers (for example, to enforce sentiment policies or redact PII) will become more common. Yet even as these security requirements and improvements accelerate, national security environments face a unique challenge: The consequences of failure are too high to rely on blind automation.

Agencies that treat AI as a secure system — grounded in strong data governance, layered protections and educated leadership — will be the ones that move beyond pilots to real mission capability.

Ron Wilcom is the director of innovation for Clarity Business Solutions.

The post Securing AI in federal and defense missions: A multi-level approach first appeared on Federal News Network.

© Getty Images/ThinkNeo

Circuit board in shape electronic brain with gyrus. Artificial intelligence in neon cyberspace with glowing.

The Fork-It-and-Forget Decade

20 January 2026 at 07:20

The following article originally appeared on Medium and is being republished here with the author’s permission.

Open source has been evolving for half a century, but the last two decades have set the stage for what comes next. The 2000s were the “star stage”—when open source became mainstream, commercial, and visible. The 2010s decentralized it, breaking the hierarchy and making forking normal. Now, in the 2020s, it’s transforming again as generative AI enters the scene—as a participant.

This decade isn’t just faster. It’s a different kind of speed. AI is starting to write, refactor, and remix code and open source projects at a scale no human maintainer can match. GitHub isn’t just expanding; it’s mutating, filled with AI-generated derivatives of human work, on track to manage close to 1B repositories by the end of the decade.

If we want to understand what’s happening to open source now, it helps to look back at how it evolved. The story of open source isn’t a straight line—it’s a series of turning points. Each decade changed not just the technology but also the culture around it: from rebellion in the 1990s to recognition in the 2000s to decentralization in the 2010s. Those shifts built the foundation for what’s coming next—an era where code isn’t just written by developers but by the agents they are managing.

1990s: Setting the Stage

The late ’80s and early ’90s were defined by proprietary stacks—Windows, AIX, Solaris. By the mid-’90s, developers began to rebel. Open source wasn’t just an ideal; it was how the web got built. Most sites ran Apache on the frontend but relied on commercial engines such as Dynamo and Oracle on the backend. The first web was open at the edges and closed at the core.

In universities and research labs, the same pattern emerged. GNU tools like Emacs, GCC, and gdb were everywhere, but they ran on proprietary systems—SGI, Solaris, NeXT, AIX. Open source had taken root, even if the platforms weren’t open. Mike Loukides and Andy Oram’s Programming with GNU Software (1996) captured that world perfectly: a maze of UNIX variants where every system broke your scripts in a new way. Anyone who learned command-line syntax on AIX in the early ’90s still trips over it on macOS today.

That shift—Linux and FreeBSD meeting the web—set the foundation for the next decade of open infrastructure. Clearly, Tim Berners-Lee’s work at CERN was the pivotal event that defined the next century, but I think the most tactical win from the 1990s was Linux. Even though Linux didn’t become viable for large-scale use until 2.4 in the 2000s, it set the stage.

2000s: The Open Source Decade

The 2000s were when open source went mainstream. Companies that once sold closed systems started funding the foundations that challenged them—IBM, Sun, HP, Oracle, and even Microsoft. It wasn’t altruism; it was strategy. Open source had become a competitive weapon, and being a committer had become a form of social capital. The communities around Apache, Eclipse, and Mozilla weren’t just writing code; they built a sort of reputation game. “I’m a committer” could fund a startup or land you a job.

Chart of SourceForge-hosted projects 2000–2010 (proxy for OSS). It shows an increase from nearly zero in 2000 to almost 250,000 in 2010. Data sourced from Wikipedia.
Data sourced from SourceForge’s Wikipedia page.

As open source gained momentum, visibility became its own form of power. Being a committer was social capital, and fame within the community created hierarchy. The movement that had started as a rebellion against proprietary control began to build its own “high places.” Foundations became stages; conferences became politics. The centralized nature of CVS and Subversion reinforced this hierarchy—control over a single master repository meant control over the project itself. Forking wasn’t seen as collaboration; it was defiance. And so, even in a movement devoted to openness, authority began to concentrate.

By the end of the decade, open source had recreated the very structures it once tried to dismantle and there were power struggles around forking and control—until Git arrived and quietly made forking not just normal but encouraged.

In 2006, Linus Torvalds quietly dropped something that would reshape it all: Git. It was controversial, messy, and deeply decentralized—the right tool at the right time.

2010s: The Great Decentralization

The 2010s decentralized everything. Git unseated Subversion and CVS, making forking normal. GitHub turned version control into a social network, and suddenly open source wasn’t a handful of central projects—it was thousands of competing experiments. Git made a fork cheap and local: Anyone could branch off instantly, hack in isolation, and later decide whether to merge back. That one idea changed the psychology of collaboration. Experimentation became normal, not subversive.

The effect was explosive. SourceForge, home to the CVS/SVN era, hosted about 240,000 projects by 2010. Ten years later, GitHub counted roughly 190 million repositories. Even if half were toy projects, that’s a two-to-three-order-of-magnitude jump in project creation velocity—roughly one new repository every few seconds by the late 2010s. Git didn’t just speed up commits; it changed how open source worked.

But the same friction that disappeared also removed filters. Because Git made experimentation effortless, “throwaway projects” became viable—half-finished frameworks, prototypes, and personal experiments living side by side with production-grade code. By mid-decade, open source had entered its Cambrian phase: While the 2000s gave us five or six credible frontend frameworks, the 2010s produced 50 or 60. Git didn’t just decentralize code—it decentralized attention.

Chart tracking Git and Subversion usage over time, 2010–2022. Git shows increased usage, while Subversion usage fell. Data sourced from the Eclipse Community Survey (2011, 2013) and the Stack Overflow Dev Survey (2015–2022).
Data sourced from the Eclipse Community Survey (2011, 2013) and the Stack Overflow Dev Survey (2015–2022).

2020s: What Will We Call This Decade?

Now that we’re halfway through the 2020s, something new is happening. Generative AI has slipped quietly into the workflow, reshaping open source once again—not by killing it but by making forking even easier. It also forms one of the main training inputs for the output it generates.

Go back two decades to the 2000s, if a library didn’t do what you needed, you joined the mailing list, earned trust, and maybe became a committer. That was slow, political, and occasionally productive. But for you or the companies you work for to be able to influence a project and commit code, we’re talking months or years of investment.

Today, if a project is 90 percent right, you fork it, describe the fix to an AI, and you move on 5 minutes later. No review queues. No debates about brace styles. The pull-request culture that once defined open source starts to feel optional because you aren’t investing any time in it to begin with.

In fact, you might not even be aware that you forked and patched something. One of the 10 agents you launched in parallel to reimplement an API might have forked a library, patched it for your specific use case, and published it to your private GitHub npm repository while you were at lunch. And you might not even be paying attention to those details.

Trend prediction: We’re going to have a nickname for developers who use GenAI and are unable to read the code it generated very soon because that’s happening.

Is Open Source Done?

No. But it’s already changing. The big projects will continue—React, Next.js, and DuckDB will keep growing because AI models already prefer them. And I do think there are still communities or developers who want to collaborate with other humans.

But there is a surge of AI-generated open source contributions and projects that will start to affect the ecosystem. Smaller, more focused libraries will start to see more forks. That’s my prediction, and it might get to the point where it doesn’t make much sense anymore to track them.

Instead of half a dozen stable frameworks per category, we’ll see hundreds of small, AI-tuned frameworks and forks, each solving one developer’s problem perfectly and then fading away. The social glue that once bound open source—mentorship, debate, shared maintenance—gets thinner. Collaboration gives way to radical personalization. And I don’t know if that’s such a bad thing.

The Fork-It-and-Forget Decade

This is shaping up to be the “fork-it-and-forget” decade. Developers—and the agents they run—are moving at a new kind of velocity: forking, patching, and moving on. GitHub reports more than 420 million repositories as of early 2023, and it’s on pace to hit a billion by 2030.

We tore down the “high places” that defined the 2000s and replaced them with the frictionless innovation of the 2010s. Now the question is whether we’ll even recognize open source by the end of this decade. I still pay attention to the libraries I’m pulling in, but most developers using tools like Cursor to write complex code probably don’t—and maybe don’t need to. The agent already forked it and moved on.

Maybe that’s the new freedom: to fork, to forget, and to let the machines remember for us.

China hacked our mobile carriers. So why is the Pentagon still buying from them?

19 January 2026 at 14:01

A freshly belligerent China is flexing its muscles in ways not seen since the USSR during the Cold War, forging a new illiberal alliance with Russia and North Korea. But the latent battlefield is farther reaching and more dangerous in the information age.

As we now know, over “a years long, coordinated assault,” China has stolen personal data from nearly every single American. This data lets them read our text messages, listen to our phone calls, and track our movements anywhere in the United States and around the world — allowing China to build a nearly perfect intelligence picture of the American population, including our armed forces and elected officials.

This state of affairs leaves corporate leaders, democracy advocates and other private citizens vulnerable to blackmail, cyber attacks and other harassment. Even our national leaders are not immune.

Last year, China targeted the phones of President Donald Trump and Vice President JD Vance in the course of the presidential campaign, reminding us that vulnerabilities in the network can affect even those at the highest levels of government. The dangers were drawn into stark relief earlier this year when Secretary of Defense Pete Hegseth used his personal phone to pass sensitive war plans to his colleagues, along with a high-profile journalist. That incident underscored what we’ve seen in Russia’s invasion of Ukraine, Ukraine’s Operation Spiderweb drone attacks on Russia, and on front lines the world over: Modern wars are run on commercial cellular networks, despite their vulnerabilities.

Many Americans would be surprised to learn that there is no impenetrable, classified military cellular network guiding the top-flight soldiers and weapons we trust to keep us safe. The cellular networks that Lindsay Lohan and Billy Bob Thornton sell us during NFL games are the same networks our troops and national security professionals use to do their jobs. These carriers have a long, shockingly consistent history of losing our personal data via breaches and hacks — as well as selling it outright, including to foreign governments. So it’s no wonder that, when the Pentagon asked carriers to share their security audits, every single one of them refused.

This isn’t a new revelation. Twenty years ago, I served as a Special Forces communications sergeant in Iraq. There, U.S. soldiers regularly used commercial BlackBerries — not because the network was secure, but because they knew their calls would connect. It’s surreal that two decades later, our troops are still relying on commercial phones, even though the security posture has not meaningfully improved.

A big part of the reason why this challenge persists stems from an all-too-familiar issue in our government: a wall of red tape that keeps innovative answers from reaching public-sector problems.

In this case, a solution to the Pentagon’s cell network challenge already exists. The Army requested it, and our soldiers need it. But when they tried to acquire this technology, they were immediately thwarted. Not by China or Russia — but by the United States government’s own bureaucracy.

It turns out that the Defense Department is required to purchase cellular service on a blanket, ten-year contract called Spiral 4. The contract was last renewed in early 2024 to AT&T, Verizon, T-Mobile and a few others, about a year before a solution existed. Yet despite this, rigid procurement rules dictate that the Pentagon will have to wait … presumably another eight years until the contract re-opens for competition.

The FCC recently eliminated regulations calling on telecoms to meet minimum cybersecurity standards, noting that the focus should instead be collaboration with the private sector. I agree. But to harness the full ingenuity of our private sector, our government should not be locking out startups. From Palantir to Starlink to Oura, startups have proven that they can deliver critical national security technologies, out-innovating entrenched incumbents and offering people services they need.

The Pentagon has made real, top-level policy changes to encourage innovation. But it must do more to ensure that our soldiers are equipped with the very best of what they need and deserve, and find and root out these pockets of stalled bureaucratic inertia. Because America’s enemies are real enough – our own red tape should not be one of them.

John Doyle is the founder and CEO of Cape. He previously worked at Palantir and as a Staff Sergeant in the Special Forces.

The post China hacked our mobile carriers. So why is the Pentagon still buying from them? first appeared on Federal News Network.

© (Courtesy of Military OneSource)

Soldier uses the Military OneSource app on his cellphone. (Courtesy of Military OneSource)

Why Uncle Sam favors AI-forward government contractors — and how contractors can use that to their advantage

16 January 2026 at 17:03

Read between the lines of recent federal policies and a clear message to government contractors begins to emerge: The U.S. government isn’t just prioritizing deployment of artificial intelligence in 2026. It wants the contractors to whom it entrusts its project work to do likewise.

That message, gleaned from memoranda issued by the White House Office of Management and Budget, announcements out of the Defense Department’s Chief Digital and Artificial Intelligence Office, statements from the General Services Administration and other recent actions, suggests that when it comes to evaluating government contractors for potential contract awards, the U.S. government in many instances will favor firms that are more mature in their use and governance of AI.

That’s because, in the big picture, firms that are more AI-mature — that employ it with strong governance and oversight — will tend to use and share data to make better decisions and communicate more effectively, so their projects and business run more efficiently and cost-effectively. That in turn translates into lower risk and better value for the procuring agency. Agencies apparently are recognizing the link between AI and contractor value. Based on recent contracting trends along with my own conversations with contracting executives, firms that can demonstrate they use AI-driven tools and processes in key areas like project management, resource utilization, cost modeling and compliance are winning best-value assessments even when they aren’t the cheapest.

To simply dabble in AI is no longer enough. Federal agencies and their contracting officers are putting increased weight on the maturity of a contractor’s AI program, and the added value that contractor can deliver back to the government in specific projects. How, then, can contractors generate extra value using AI in order to be a more attractive partner to federal contracting decision-makers?

Laying the groundwork

Let’s dig deeper into the “why” behind AI. For contractors, it’s not just about winning more government business. Big picture: It’s about running projects and the overall business more efficiently and profitably.

What’s more, being an AI-forward firm isn’t about automating swaths of a workforce out of a job. Rather, AI is an enabler and multiplier of human innovation. It frees people to focus on higher-value work by performing tasks on their behalf. It harnesses the power of data to surface risks, opportunities, trends and potential issues before they escalate into larger problems. Its predictive power promotes anticipatory actions rather than reactive management. The insights it yields, when combined with the collective human experience, institutional knowledge and business acumen inside a firm, leads to better-informed human decision making.

For AI to provide benefits and value both internally and to customers, it requires a solid data foundation underneath it. Clean, connected and governed data is the lifeblood that AI models must have to deliver reliable outputs. If the data used to train those models is incomplete, siloed, flawed or otherwise suspect, the output from AI models will tend to be suspect, too. So in building a solid foundation for AI, a firm would be wise to ensure it has an integrated digital environment in place (with key business systems like enterprise resource planning [ERP], customer relationship management [CRM] and project portfolio management [PPM] connected) to enable data to flow unimpeded. Nowadays, federal contracting officers and primes are evaluating contractors based on the maturity of their AI programs, as well as on the maturity of their data-management programs in terms of hygiene, security and governance.

They’re also looking closely at the guardrails contractors have around their AI program: appropriate oversight, human-in-the-loop practices and governance structures. Transparency, auditability and explainability are paramount, particularly in light of regulations such as the Federal Acquisition Regulations, Defense Federal Acquisition Regulation Supplement, and Cybersecurity Maturity Model Certification. It’s worth considering developing (and keeping up-to-date) an AI capabilities and governance statement that details how and where your firm employs AI, and the structures it uses to oversee its AI capabilities. A firm then can include that statement in the proposals it submits.

AI use cases that create value

Having touched on the why and how behind AI, let’s explore some of the areas where contractors could be employing intelligent automation, predictive engines, autonomous agents, generative AI co-pilots and other capabilities to run their businesses and projects more efficiently. With these approaches in mind, contractors can deliver more value to their federal government customers.

  1. Project and program management: AI has a range of viable use cases that create value inside the project management office. On the process management front, for example, it can automate workflows and processes. Predictive scheduling, cost variance forecasting, automated estimate at completion (EAC) updates, and project triage alerts are also areas where AI is proving its value. For example, AI capabilities within an ERP system can alert decision-makers to cost trends and potential overruns, and offer suggestions for how to address them. They also can provide project managers with actionable, up-to-the-minute information on project status, delays, milestones, cost trends, potential budget variances and resource utilization.

Speaking of resources, predictive tools (skills graphs, staffing models, et cetera) can help contractors forecast talent needs and justify salary structures. They also support resource allocation and surge requirements. Ultimately, these tools help optimize the composition of project teams by analyzing project needs across the firm, changing circumstances and peoples’ skills, certifications and performance. It all adds up to better project outcomes and better value back to the government agency customer.

  1. Finance and accounting: From indirect rate modeling to anomaly detection in timesheets and cost allowability, AI tools can minimize the financial and accounting risk inside a contract. It can alert teams to issues related to missing, inconsistent or inaccurate data, helping firms avoid compliance issues. Using AI, contractors also can expedite invoicing on the accounts receivable side as well as processes on the accounts payable side to provide clarity to both the customer and internal decision-makers.
  2. Compliance: Contractors carry a heavy reporting and compliance burden and live under the constant shadow of an audit. AI is proving valuable as a compliance support tool, with its ability to interpret regulatory language and identify compliance risks like mismatched data or unallowable costs. AI also can create, then confirm compliance with, policies and procedures by analyzing and applying rules, monitoring time and expense entries, gathering and formatting data for specific contractual reporting requirements, and detecting and alerting project managers to data disparities.
  3. Business development and capture: AI can help firms uncover and win new business by identifying relevant and winnable opportunities, and through proposal development, harnessing business data tailored to solicitation requirements. Using AI-driven predictive analytics, companies can develop a scoring system and decision matrix to apply to their go or no-go decisions. Firms can also use AI to handle much of the heavy lifting with proposal creation, significantly reducing time-to-draft and proposal-generation costs, while boosting a firm’s proposal capacity substantially. Intelligent modeling capabilities can recommend optimal pricing and rate strategies for a proposal.

As much as the U.S. government is investing to become an AI-forward operation, logic suggests that it would prefer that its contractors be similarly AI-savvy in their use — and governance — of intelligent tools. In the world of government contracting, we’re approaching a point where winning business from the federal government could depend on how well a firm can leverage the AI tools at hand to demonstrate and deliver value.

 

Steve Karp is chief innovation officer for Unanet.

The post Why Uncle Sam favors AI-forward government contractors — and how contractors can use that to their advantage first appeared on Federal News Network.

© Federal News Network

Steve Karp headshot

21 Lessons from 14 Years at Google

16 January 2026 at 07:23

The following article originally appeared on Addy Osmani’s Substack newsletter, Elevate, and is being republished here with his permission.

When I joined Google ~14 years ago, I thought the job was about writing great code. I was partly right. But the longer I’ve stayed, the more I’ve realized that the engineers who thrive aren’t necessarily the best programmers. They’re the ones who’ve figured out how to navigate everything around the code: the people, the politics, the alignment, the ambiguity.

These lessons are what I wish I’d known earlier. Some would have saved me months of frustration. Others took years to fully understand. None of them are about specific technologies—those change too fast to matter. They’re about the patterns that keep showing up, project after project, team after team.

I’m sharing them because I’ve benefited enormously from engineers who did the same for me. Consider this my attempt to pay it forward.

1. The best engineers are obsessed with solving user problems.

It’s seductive to fall in love with a technology and go looking for places to apply it. I’ve done it. Everyone has. But the engineers who create the most value work backwards: They become obsessed with understanding user problems deeply and let solutions emerge from that understanding.

User obsession means spending time in support tickets, talking to users, watching users struggle, asking “why” until you hit bedrock. The engineer who truly understands the problem often finds that the elegant solution is simpler than anyone expected.

The engineer who starts with a solution tends to build complexity in search of a justification.

2. Being right is cheap. Getting to right together is the real work.

You can win every technical argument and lose the project. I’ve watched brilliant engineers accrue silent resentment by always being the smartest person in the room. The cost shows up later as “mysterious execution issues” and “strange resistance.”

The skill isn’t being right. It’s entering discussions to align on the problem, creating space for others, and remaining skeptical of your own certainty.

Strong opinions, weakly held—not because you lack conviction but because decisions made under uncertainty shouldn’t be welded to identity.

3. Bias towards action. Ship. You can edit a bad page, but you can’t edit a blank one.

The quest for perfection is paralyzing. I’ve watched engineers spend weeks debating the ideal architecture for something they’ve never built. The perfect solution rarely emerges from thought alone. It emerges from contact with reality. AI can in many ways help here.

First do it, then do it right, then do it better. Get the ugly prototype in front of users. Write the messy first draft of the design doc. Ship the MVP that embarrasses you slightly. You’ll learn more from one week of real feedback than a month of theoretical debate.

Momentum creates clarity. Analysis paralysis creates nothing.

4. Clarity is seniority. Cleverness is overhead.

The instinct to write clever code is almost universal among engineers. It feels like proof of competence.

But software engineering is what happens when you add time and other programmers. In that environment, clarity isn’t a style preference. It’s operational risk reduction.

Your code is a strategy memo to strangers who will maintain it at 2am during an outage. Optimize for their comprehension, not your elegance. The senior engineers I respect most have learned to trade cleverness for clarity, every time.

5. Novelty is a loan you repay in outages, hiring, and cognitive overhead.

Treat your technology choices like an organization with a small “innovation token” budget. Spend one each time you adopt something materially nonstandard. You can’t afford many.

The punchline isn’t “never innovate.” It’s “innovate only where you’re uniquely paid to innovate.” Everything else should default to boring, because boring has known failure modes.

The “best tool for the job” is often the “least-worst tool across many jobs”—because operating a zoo becomes the real tax.

6. Your code doesn’t advocate for you. People do.

Early in my career, I believed great work would speak for itself. I was wrong. Code sits silently in a repository. Your manager mentions you in a meeting, or they don’t. A peer recommends you for a project, or someone else.

In large organizations, decisions get made in meetings you’re not invited to, using summaries you didn’t write, by people who have five minutes and 12 priorities. If no one can articulate your impact when you’re not in the room, your impact is effectively optional.

This isn’t strictly about self-promotion. It’s about making the value chain legible to everyone—including yourself.

7. The best code is the code you never had to write.

We celebrate creation in engineering culture. Nobody gets promoted for deleting code, even though deletion often improves a system more than addition. Every line of code you don’t write is a line you never have to debug, maintain, or explain.

Before you build, exhaust the question: “What would happen if we just…didn’t?” Sometimes the answer is “nothing bad,” and that’s your solution.

The problem isn’t that engineers can’t write code or use AI to do so. It’s that we’re so good at writing it that we forget to ask whether we should.

8. At scale, even your bugs have users.

With enough users, every observable behavior becomes a dependency—regardless of what you promised. Someone is scraping your API, automating your quirks, caching your bugs.

This creates a career-level insight: You can’t treat compatibility work as “maintenance” and new features as “real work.” Compatibility is product.

Design your deprecations as migrations with time, tooling, and empathy. Most “API design” is actually “API retirement.”

9. Most “slow” teams are actually misaligned teams.

When a project drags, the instinct is to blame execution: People aren’t working hard enough; the technology is wrong; there aren’t enough engineers. Usually none of that is the real problem.

In large companies, teams are your unit of concurrency, but coordination costs grow geometrically as teams multiply. Most slowness is actually alignment failure—people building the wrong things, or the right things in incompatible ways.

Senior engineers spend more time clarifying direction, interfaces, and priorities than “writing code faster” because that’s where the actual bottleneck lives.

10. Focus on what you can control. Ignore what you can’t.

In a large company, countless variables are outside your control: organizational changes, management decisions, market shifts, product pivots. Dwelling on these creates anxiety without agency.

The engineers who stay sane and effective zero in on their sphere of influence. You can’t control whether a reorg happens. You can control the quality of your work, how you respond, and what you learn. When faced with uncertainty, break problems into pieces and identify the specific actions available to you.

This isn’t passive acceptance, but it is strategic focus. Energy spent on what you can’t change is energy stolen from what you can.

11. Abstractions don’t remove complexity. They move it to the day you’re on call.

Every abstraction is a bet that you won’t need to understand what’s underneath. Sometimes you win that bet. But something always leaks, and when it does, you need to know what you’re standing on.

Senior engineers keep learning “lower level” things even as stacks get higher. Not out of nostalgia but out of respect for the moment when the abstraction fails and you’re alone with the system at 3am. Use your stack.

But keep a working model of its underlying failure modes.

12. Writing forces clarity. The fastest way to learn something better is to try teaching it.

Writing forces clarity. When I explain a concept to others—in a doc, a talk, a code review comment, even just chatting with AI—I discover the gaps in my own understanding. The act of making something legible to someone else makes it more legible to me.

This doesn’t mean that you’re going to learn how to be a surgeon by teaching it, but the premise still holds largely true in the software engineering domain.

This isn’t just about being generous with knowledge. It’s a selfish learning hack. If you think you understand something, try to explain it simply. The places where you stumble are the places where your understanding is shallow.

Teaching is debugging your own mental models.

13. The work that makes other work possible is priceless—and invisible.

Glue work—documentation, onboarding, cross-team coordination, process improvement—is vital. But if you do it unconsciously, it can stall your technical trajectory and burn you out. The trap is doing it as “helpfulness” rather than treating it as deliberate, bounded, visible impact.

Timebox it. Rotate it. Turn it into artifacts: docs, templates, automation. And make it legible as impact, not as personality trait.

Priceless and invisible is a dangerous combination for your career.

14. If you win every debate, you’re probably accumulating silent resistance.

I’ve learned to be suspicious of my own certainty. When I “win” too easily, something is usually wrong. People stop fighting you not because you’ve convinced them but because they’ve given up trying—and they’ll express that disagreement in execution, not meetings.

Real alignment takes longer. You have to actually understand other perspectives, incorporate feedback, and sometimes change your mind publicly.

The short-term feeling of being right is worth much less than the long-term reality of building things with willing collaborators.

15. When a measure becomes a target, it stops measuring.

Every metric you expose to management will eventually be gamed. Not through malice but because humans optimize for what’s measured.

If you track lines of code, you’ll get more lines. If you track velocity, you’ll get inflated estimates.

The senior move: Respond to every metric request with a pair: one for speed; one for quality or risk. Then insist on interpreting trends, not worshiping thresholds. The goal is insight, not surveillance.

16. Admitting what you don’t know creates more safety than pretending you do.

Senior engineers who say “I don’t know” aren’t showing weakness. They’re creating permission. When a leader admits uncertainty, it signals that the room is safe for others to do the same. The alternative is a culture where everyone pretends to understand and problems stay hidden until they explode.

I’ve seen teams where the most senior person never admitted confusion, and I’ve seen the damage. Questions don’t get asked. Assumptions don’t get challenged. Junior engineers stay silent because they assume everyone else gets it.

Model curiosity, and you get a team that actually learns.

17. Your network outlasts every job you’ll ever have.

Early in my career, I focused on the work and neglected networking. In hindsight, this was a mistake. Colleagues who invested in relationships—inside and outside the company—reaped benefits for decades.

They heard about opportunities first, could build bridges faster, got recommended for roles, and cofounded ventures with people they’d built trust with over years.

Your job isn’t forever, but your network is. Approach it with curiosity and generosity, not transactional hustle.

When the time comes to move on, it’s often relationships that open the door.

18. Most performance wins come from removing work, not adding cleverness.

When systems get slow, the instinct is to add: caching layers, parallel processing, smarter algorithms. Sometimes that’s right. But I’ve seen more performance wins from asking, “What are we computing that we don’t need?”

Deleting unnecessary work is almost always more impactful than doing necessary work faster. The fastest code is code that never runs.

Before you optimize, question whether the work should exist at all.

19. Process exists to reduce uncertainty, not to create paper trails.

The best process makes coordination easier and failures cheaper. The worst process is bureaucratic theater. It exists not to help but to assign blame when things go wrong.

If you can’t explain how a process reduces risk or increases clarity, it’s probably just overhead. And if people are spending more time documenting their work than doing it, something has gone deeply wrong.

20. Eventually, time becomes worth more than money. Act accordingly.

Early in your career, you trade time for money—and that’s fine. But at some point, the calculus inverts. You start to realize that time is the nonrenewable resource.

I’ve watched senior engineers burn out chasing the next promo level, optimizing for a few more percentage points of compensation. Some of them got it. Most of them wondered, afterward, if it was worth what they gave up.

The answer isn’t “don’t work hard.” It’s “know what you’re trading, and make the trade deliberately.”

21. There are no shortcuts, but there is compounding.

Expertise comes from deliberate practice—pushing slightly beyond your current skill, reflecting, repeating. For years. There’s no condensed version.

But here’s the hopeful part: Learning compounds when it creates new options, not just new trivia. Write—not for engagement but for clarity. Build reusable primitives. Collect scar tissue into playbooks.

The engineer who treats their career as compound interest, not lottery tickets, tends to end up much further ahead.

A final thought

Twenty-one lessons sounds like a lot, but they really come down to a few core ideas: Stay curious, stay humble, and remember that the work is always about people—the users you’re building for and the teammates you’re building with.

Addy Osmani at Google

A career in engineering is long enough to make plenty of mistakes and still come out ahead. The engineers I admire most aren’t the ones who got everything right. They’re the ones who learned from what went wrong, shared what they discovered, and kept showing up.

If you’re early in your journey, know that it gets richer with time. If you’re deep into it, I hope some of these resonate.

Addy will be joining Tim O’Reilly on February 12 for an hour-long deep dive into the lessons he’s learned over his career. They’ll also chat about the progress being made in agentic coding workflows, in a conversation guided by questions from the audience. Save your seat. It’s free. 

Then on March 26, Addy and Tim will be hosting the next event in our AI Codecon series: Software Craftsmanship in the Age of AI. Over four hours, they and a lineup of expert practitioners will explore what it takes to build excellent software in the age of AI that creates value for all participants. It’s also free and open to all. Register here.

If you have a story to share about how you’re using agents to build innovative and effective AI-powered experiences, we want to hear it—and possibly feature it at AI Codecon. Get the details at our call for proposals and send us your proposal by February 17.

Securing the spotlight: Inside the investigations that protect America’s largest events

15 January 2026 at 15:07

At large-scale events like World Cup matches, a Super Bowl or the LA 2028 Olympics, viewers around the world will turn their attention to the athletes. At these types of large-scale events, often categorized as National Special Security Events (NSSEs), a tightly choreographed collaboration of federal, state and local agencies is required to manage logistics, intelligence and operational response. The spotlight may be on the athletes and fans, but the unsung work happens behind the scenes, where security teams and support personnel operate before, during and after each event to ensure the safety of participants and spectators. While much of that focus is outward, securing perimeters, screening crowds and scanning for external threats, some of the most significant risks can come from inside the event itself.

In an era of multi-dimensional threats, the line between external adversaries and internal vulnerabilities has grown increasingly blurred. Contractors, vendors, temporary staff and employees are vital to the success of major events; however, they also introduce complex risk considerations. Managing those risks requires more than background checks and credentialing; it calls for investigative awareness rooted in federal risk management frameworks and duty-of-care principles. Agencies and partners must align with established standards such as the National Insider Threat Task Force (NITTF) guidelines and the Department of Homeland Security’s National Infrastructure Protection Plan, emphasizing collaboration, transparency and early intervention. By fostering information-sharing and cross-functional coordination, investigative teams can recognize behavioral and contextual warning signs in ways that strengthen both security and trust.

The inside threats that don’t make headlines

When we talk about insider threats in the context of NSSEs, many think of espionage or deliberate sabotage. But the reality is often more subtle, and therefore more dangerous.

Consider this real-world example: A contracted former employee of the San Jose Earthquakes’ home stadium admitted to logging into the concession vendor’s administrative system and deleting menus and payment selections. His unauthorized access, triggered from home after his termination, interrupted operations on opening day and resulted in more than $268,000 in losses.

These kinds of incidents highlight a fundamental truth: Insider risk isn’t just about malicious intent; it’s about exposure. And exposure multiplies with scale. When thousands of people have physical or digital access to a high-profile venue, especially when celebrities, politicians and global audiences are involved, the likelihood of insider-related incidents grows exponentially.

Investigations as the backbone of event security

At their core, investigations are about collecting and connecting the dots between people, data and threats. For NSSEs, this investigative function becomes the connective tissue that binds disparate security disciplines together.

Consider the investigation of a potential insider risk within a major international summit.

  • A social media monitoring team flags an insider — in this case, a contractor — expressing frustration about working conditions.
  • The venue security team reports a missing equipment case from the same contractor’s storage area.
  • A public records check reveals the individual was previously charged with theft.

Individually, none of these signals confirms a threat. But when unified under a connected investigative workflow, the risk becomes clearer and more actionable. This is the type of cross-functional insight that defines modern event protection. It’s not about reacting to threats; it’s about uncovering the threads before they unravel.

Large-scale events generate intelligence at an unprecedented scale: everything from credentialing data to behavioral reports, cybersecurity logs and social media feeds. Yet these systems rarely connect to and communicate with one another. The result is fragmented visibility and slow investigative response.

For example:

  • A fusion center monitoring social media identifies a user threatening to “disrupt the opening ceremony.”
  • A local police investigation logs a similar username associated with a harassment complaint.
  • A corporate security team managing sponsor operations notices suspicious activity during credential pickup with the same surname and the same location.

If these datasets live in silos, that pattern may never be connected. But within a connected framework, analysts can correlate these intelligence signals in seconds, surfacing a person of concern who may have both motive and proximity to the event.

Building a connected investigations framework

Establishing an effective investigations framework for NSSEs and other high-profile events requires three key capabilities:

  1. Pre-event inside vetting and behavioral baselines

Agencies and private partners must move beyond one-time background checks toward continuous, risk-informed vetting that emphasizes awareness and accountability. For example, a Defense Department–affiliated recreation facility on Walt Disney World property uncovered that an accounting technician had exploited her system access over 18 months to issue unauthorized refunds totaling more than $183,000. In a large-scale event environment, similar credential misuse could go unnoticed without behavioral baselines and cross-functional coordination. Establishing clear patterns of access and communication among HR, security and operations helps detect anomalies early and address them before they evolve into costly or reputation-damaging breaches.

  1. Case linkage and pattern recognition

Event-related investigations should never exist in isolation. When analysts apply connected data analysis and link mapping, patterns begin to emerge: recurring individuals, behaviors or affiliations that might otherwise appear unrelated. Each isolated incident may sit on the margins of concern, but when viewed collectively, they can reveal a broader narrative: an insider demonstrating escalating behavior or progressing along the pathway to violence. By aggregating and analyzing these small signals, investigative teams can shift from reacting to incidents to identifying intent, uncovering risks long before they cross into active threats.

  1. Real-time collaboration and feedback loops

Investigative insight loses its power when it’s buried in inboxes or trapped in spreadsheets. The true value of intelligence emerges only when it reaches the right people at the right moment. Breaking down silos between intelligence analysts, investigators and operational teams ensures that findings translate into timely, informed action on the ground. Establishing an event-specific security operations center — one that unites state, local and federal agencies with venue security and event officials under a shared framework — creates a single hub for intelligence sharing and rapid coordination. This collaborative model transforms investigations from static reports into dynamic, real-time decision support, ensuring that every partner has the visibility and context needed to anticipate and neutralize risks before they escalate.

Even as artificial intelligence becomes more integrated into the investigative process, the human element remains indispensable. Technology can accelerate analysis and detection, but it’s human intuition, context and judgment that transform data into decisions — capabilities that AI has yet to replicate and replace.

Securing from the inside out

As the U.S. prepares for a decade of NSSEs, the success of each operation will depend on one foundational principle: Security starts from within. The most sophisticated perimeter protection and threat detection systems cannot compensate for insider risk that goes unexamined.

By operationalizing investigations within a connected framework that unites intelligence data, event security teams and tradecraft, federal agencies can transform insider threats from unknown liabilities into known risks, enabling the implementation of mitigation actions. In doing so, they not only safeguard events, but also set the new standard for how public and private sectors can work together to protect what matters most when the world is watching.

Tim Kirkham is vice president of the investigations practice at Ontic.

The post Securing the spotlight: Inside the investigations that protect America’s largest events first appeared on Federal News Network.

© Federal News Network

Tim Kirkham headshot

8 federal agency data trends for 2026

14 January 2026 at 14:54

If 2025 was the year federal agencies began experimenting with AI at-scale, then 2026 will be the year they rethink their entire data foundations to support it. What’s coming next is not another incremental upgrade. Instead, it’s a shift toward connected intelligence, where data is governed, discoverable and ready for mission-driven AI from the start.

Federal leaders increasingly recognize that data is no longer just an IT asset. It is the operational backbone for everything from citizen services to national security. And the trends emerging now will define how agencies modernize, secure and activate that data through 2026 and beyond.

Trend 1: Governance moves from manual to machine-assisted

Agencies will accelerate the move toward AI-driven governance. Expect automated metadata generation, AI-powered lineage tracking, and policy enforcement that adjusts dynamically as data moves, changes and scales. Governance will finally become continuous, not episodic, allowing agencies to maintain compliance without slowing innovation.

Trend 2: Data collaboration platforms replace tool sprawl

2026 will mark a turning point as agencies consolidate scattered data tools into unified data collaboration platforms. These platforms integrate cataloging, observability and pipeline management into a single environment, reducing friction between data engineers, analysts and emerging AI teams. This consolidation will be essential for agencies implementing enterprise-wide AI strategies.

Trend 3: Federated architectures become the federal standard

Centralized data architectures will continue to give way to federated models that balance autonomy and interoperability across large agencies. A hybrid data fabric — one that links but doesn’t force consolidation — will become the dominant design pattern. Agencies with diverse missions and legacy environments will increasingly rely on this approach to scale AI responsibly.

Trend 4: Integration becomes AI-first

Application programming interfaces (APIs), semantic layers and data products will increasingly be designed for machine consumption, not just human analysis. Integration will be about preparing data for real-time analytics, large language models (LLMs) and mission systems, not just moving it from point A to point B.

Trend 5: Data storage goes AI-native

Traditional data lakes will evolve into AI-native environments that blend object storage with vector databases, enabling embedding search and retrieval-augmented generation. Federal agencies advancing their AI capabilities will turn to these storage architectures to support multimodal data and generative AI securely.

Trend 6: Real-time data quality becomes non-negotiable

Expect a major shift from reactive data cleansing to proactive, automated data quality monitoring. AI-based anomaly detection will become standard in data pipelines, ensuring the accuracy and reliability of data feeding AI systems and mission applications. The new rule: If it’s not high-quality in real time, it won’t support AI at-scale.

Trend 7: Zero trust expands into data access and auditing

As agencies mature their zero trust programs, 2026 will bring deeper automation in data permissions, access patterns and continuous auditing. Policy-as-code approaches will replace static permission models, ensuring data is both secure and available for AI-driven workloads.

Trend 8: Workforce roles evolve toward human-AI collaboration

The rise of generative AI will reshape federal data roles. The most in-demand professionals won’t necessarily be deep coders. They will be connectors who understand prompt engineering, data ethics, semantic modeling and AI-optimized workflows. Agencies will need talent that can design systems where humans and machines jointly manage data assets.

The bottom line: 2026 is the year of AI-ready data

In the year ahead, the agencies that win will build data ecosystems designed for adaptability, interoperability and human–AI collaboration. The outdated mindset of “collect and store” will be replaced by “integrate and activate.”

For federal leaders, the mission imperative is clear: Make data trustworthy by default, usable by design, and ready for AI from the start. Agencies that embrace this shift will move faster, innovate safely, and deliver more resilient mission outcomes in 2026 and beyond.

Seth Eaton is vice president of technology & innovation at Amentum.

The post 8 federal agency data trends for 2026 first appeared on Federal News Network.

© Getty Images/iStockphoto/ipopba

AI, Machine learning, Hands of robot and human touching on big data network connection background, Science and artificial intelligence technology, innovation and futuristic.

The Problem with AI “Artists”

14 January 2026 at 07:07

A performance reel. Instagram, TikTok, and Facebook accounts. A separate contact email for enquiries. All staples of an actor’s website.

Except these all belong to Tilly Norwood, an AI “actor.”

This creation represents one of the newer AI trends, which is AI “artists” that eerily represent real humans (which, according to their creators, is the goal). Eline Van der Velden, the creator of Tilly Norwood, has said that she is focused on making the creation “a big star” in the “AI genre,” a distinction that has been used to justify the existence of AI created artists as not taking away jobs from real actors. Van der Velden has explicitly said that Tilly Norwood was made to be photorealistic to provoke a reaction, and it’s working, as reportedly talent agencies are looking to represent it.

And it’s not just Hollywood. Major producer Timbaland has created his own AI entertainment company and launched his first “artist,” TaTa, with the music created by uploading demos of his own to the platform Suno, reworking it with AI, and adding lyrics afterward.

But while technologically impressive, the emergence of AI “artists” risks devaluing creativity as a fundamentally human act, and in the process, dehumanizing and “slopifying” creative labor.

Heightening Industry at the Expense of Creativity

The generative AI boom is deeply tied to creative industries, with profit-hungry machines monetizing every movie, song, and TV show as much as they possibly can. This, of course, predates AI “artists,” but AI is making the agenda even clearer. One of the motivations behind the Writer’s Guild Strike of 2023 was countering the threat of studios replacing writers with AI.

For industry power players, employing AI “artists” means less reliance on human labor—cutting costs and making it possible to churn out products at a much higher rate. And in an industry already known for poor working conditions, there’s significant appeal in dealing with a creation they do not “need” to treat humanely.

Technological innovation has always posed a risk to eliminating certain jobs, but AI “artists” are a whole new monster in industry. It isn’t just about speeding up processes or certain tasks but about excising human labor from the product. This means in an industry that is already notoriously hard to make money in as a creative, the demand will become even more scarce—and that’s not even looking at the consequences on the art itself.

The AI “Slop” Takeover

The interest of making money over quality has always prevailed in industry; Netflix and Hallmark aren’t making all those Christmas romantic comedies with the same plot because they’re original stories, nor are studios embracing endless amount of reboots and remakes based on successful art because it would be visionary to remake a ’90s movie with a 20-something Hollywood star. But they still have their audiences, and in the end, require creative output and labor to be made.

Now, imagine that instead of these rom-coms cluttering Netflix, we have AI-generated movies and TV shows, starring creations like Tilly Norwood, and the soundtrack comes from a voice, lyrics, and production that was generated by AI.

The whole model of generative AI is dependent on regurgitating and recycling existing data. Admittedly, it’s a technological feat that Suno can generate a song and Sora can convert text to video images; what it is NOT is a creative renaissance. AI-generated writing is already taking over, from essays in the classroom to motivational LinkedIn posts, and in addition to ruining the em dash, it consistently puts out material of low and robotic quality. AI “artists” “singing” and “acting” is the next uncanny destroyer of quality and likely will alienate audiences, who turn to art to feel connection.

Art has a long tradition of being used as resistance and a way of challenging the status quo; protest music has been a staple of culture—look no further than civil rights and antiwar movements in the United States in the 1960s. It is so powerful that there are attempts by political actors to suppress it and punish artists. Iranian filmmaker Jafar Panahi, who won the Palme d’Or at the Cannes Film Festival for It Was Just an Accident, was sentenced to prison in absentia in Iran for making the film, and this is not the first punishment he has received for his films. Will studios like Sony or Warner Bros. release songs or movies like these if they can just order marketing-compliant content from a bot?

A sign during the writer’s strike famously said “ChatGPT doesn’t have childhood trauma.” An AI “artist” may be able to carry out a creator’s agenda to a limited extent, but what value does it have coming from a generated creation that has no lived experiences and emotions—especially when this drives motivation to make art in the first place?

To top it off, generative AI is not a neutral entity by any means; we’re in for a lot of stereotypical and harmful material, especially without the input of real artists. The fact most AI “artists” are portrayed as young women with specific physical features is not a coincidence. It’s an intensification of the longstanding trend of making virtual assistants—from ELIZA to Siri to Alexa to AI “artists” like Tilly Norwood or Timbaland’s TaTa—“female,” which reinforces the trope of relegating women to “helper” roles that are designed to cater to the needs of the user, a clear manifestation of human biases.

Privacy and Plagiarism

Ensuring that “actors” and “singers” look and sound as human as possible in films, commercials, and songs requires that they be trained on real-world data. Tilly Norwood creator Van der Welden has defended herself by claiming that she only used licensed data and went through an extensive research process, looking at thousands of images for her creation. But “licensed data” does not make taking the data automatically ethical; look at Reddit, which signed a multimillion dollar contract to allow Google to train its AI models on Reddit data. The vast data of Reddit users is not protected, just monetized by the organization.

AI expert Ed Newton-Rex has discussed how generative AI is consistently stealing from artists, and has proposed measures in place to make sure data is licensed and trained in the public domain to be used in creating. There are ways for individual artists to protect their online work: including watermarks, opting out of data collection, and taking measures to block AI bots. While these strategies can keep data more secure, considering how vast generative AI is, they’re probably more a safeguard than a solution.

Jennifer King from Stanford’s Human-Centered Artificial Intelligence has provided some ways to protect data and personal information more generally, such as making “opt out” the default option for data sharing, and for legislation that focuses not just on transparency of AI use but on its regulation—likely an uphill battle with the Trump administration trying to take away state AI regulations.

This is the ethical home that AI “artists” are living in. Think of all the faces of real people that went into making Tilly Norwood. A company may have licensed that data for use, but the artists whose “data” is their likeness and creativity likely didn’t (at least directly). In this light, AI “artists” are a form of plagiarism.

Undermining Creativity as Fundamentally Human

Looking at how art has been transformed by technology before generative AI, it could be argued that this is simply the next step in the process of change rather than something to be concerned about. But photography and animation and typewriters and all the other inventions used to justify the onslaught of AI “artists” were not eliminations of human creativity. Photography was not a replacement to painting but a new art form, even if it did concern painters. There’s a difference between having a new, experimental way of doing something and extensively using data (particularly data that is taken without consent) to make creations that blur the lines of what is and isn’t human.  For instance, Rebecca Xu, a professor of computer art and animation at Syracuse who teaches an “AI in Creative Practice” course, argues that artists can incorporate AI into their creative process. But as she warns, “AI offers useful tools, but you still need to produce your own original work instead of using something generated by AI.”

It’s hard to understand exactly how AI “artists” benefit human creativity, which is a fundamental part of our expression and intellectual development. Just look at the cave art from the Paleolithic era. Even humans 30,000 years ago who didn’t have secure food and shelter were making art. Unlike other industries, art did not come into existence purely for profit.

The arts are already undervalued economically, as is evident from the lack of funding in schools. Today, a kid who may want to be a writer will likely be bombarded with marketing from generative AI platforms like ChatGPT to use these tools to “write” a story. The result may resemble a narrative, but there’s not necessarily any creativity or emotional depth that comes from being human, and more importantly, the kid didn’t actually write. Still, the very fact that this AI-generated story is now possible curbs the industrial need for human artists.

How Do We Move Forward?

Though profit-hungry power players may be embracing AI “artists,” the same cannot be said for public opinion. The vast majority of artists and audiences alike are not interested in AI-generated art, much less AI “artists.” The power of public opinion shouldn’t be underestimated; the writer’s strike is probably the best example of that.

Collective mobilization thus will likely be key in the future when it comes to challenging AI “artists” against the interest of studios, record labels, and other members of the creative industry’s ruling class. There have been wins already, such as the Writer’s Guild of America Strike in 2023, which resulted in a contract stipulating that studios can’t use AI as a credited writer. And because music and film and television are full of stars, often with financial and cultural power, the resistance being voiced in the media could benefit from more actionable steps; for example, maybe a prominent production company run by an A-list actor pledges not to have any “artists” generated by AI in their work.

Beyond industry and labor, the devaluing of art as unimportant unless you’re a “star” can also play a significant role in changing conversations around it. This means funding art programs in schools and libraries so that young people know that art is something they can do, something that is fun and that brings joy—not necessarily to make money or a living but to express themselves and engage with the world.

The fundamental risk of AI “artists” is that they will become so commonplace that it will feel pointless to pursue art, and that much of the art we consume will lose its fundamentally human qualities. But human-made art and human artists will never become obsolete—that would require fundamentally eliminating human impulses and the existence of human-made art. The challenge is making sure that artistic creation is not relegated to the margins of life.

A data mesh approach: Helping DoD meet 2027 zero trust needs

13 January 2026 at 16:54

As the Defense Department moves to meet its 2027 deadline for completing a zero trust strategy, it’s critical that the military can ingest data from disparate sources while also being able to observe and secure systems that span all layers of data operations.

Gone are the days of secure moats. Interconnected cloud, edge, hybrid and services-based architectures have created new levels of complexity — and more avenues for bad actors to introduce threats.

The ultimate vision of zero trust can’t be accomplished through one-off integrations between systems or layers. For critical cybersecurity operations to succeed, zero trust must be based on fast, well-informed risk scoring and decision making that consider a myriad of indicators that are continually flowing from all pillars.

Short of rewriting every application, protocol and API schema to support new zero trust communication specifications, agencies must look to the one commonality across the pillars: They all produce data in the form of logs, metrics, traces and alerts. When brought together into an actionable speed layer, the data flowing from and between each pillar can become the basis for making better-informed zero trust decisions.

The data challenge

According to the DoD, achieving its zero trust strategy results in several benefits, including “the ability of a user to access required data from anywhere, from any authorized and authenticated user and device, fully secured.”

Every day, defense agencies are generating enormous quantities of data. Things get even more tricky when the data is spread across cloud platforms, on-prem systems, or specialized environments like satellites and emergency response centers.

It’s hard to find information, let alone use it efficiently. And with different teams working with many different apps and data formats, the interoperability challenge increases. The mountain of data is growing. While it’s impossible to calculate the amount of data the DoD generates per day, a single Air Force unmanned aerial vehicle can generate up to 70 terabytes of data within a span of 14 hours, according to a Deloitte report. That’s about seven times more data output than the Hubble Space Telescope generates over an entire year.

Access to that information is bottlenecking.

Data mesh is the foundation for modern DoD zero trust strategies

Data mesh offers an alternative answer to organizing data effectively. Put simply, a data mesh overcomes silos, providing a unified and distributed layer that simplifies and standardizes data operations. Data collected from across the entire network can be retrieved and analyzed at any or all points of the ecosystem — so long as the user has permission to access it.

Instead of relying on a central IT team to manage all data, data ownership is distributed across government agencies and departments. The Cybersecurity and Infrastructure Security Agency uses a data mesh approach to gain visibility into security data from hundreds of federal agencies, while allowing each agency to retain control of its data.

Data mesh is a natural fit for government and defense sectors, where vast, distributed datasets have to be securely accessed and analyzed in real time.

Utilizing a scalable, flexible data platform for zero trust networking decisions

One of the biggest hurdles with current approaches to zero trust is that most zero trust implementations attempt to glue together existing systems through point-to-point integrations. While it might seem like the most straightforward way to step into the zero trust world, those direct connections can quickly become bottlenecks and even single points of failure.

Each system speaks its own language for querying, security and data format; the systems were also likely not designed to support the additional scale and loads that a zero trust security architecture brings. Collecting all data into a common platform where it can be correlated and analyzed together, using the same operations, is a key solution to this challenge.

When implementing a platform that fits these needs, agencies should look for a few capabilities, including the ability to monitor and analyze all of the infrastructure, applications and networks involved.

In addition, agencies must have the ability to ingest all events, alerts, logs, metrics, traces, hosts, devices and network data into a common search platform that includes built-in solutions for observability and security on the same data without needing to duplicate it to support multiple use cases.

This latter capability allows the monitoring of performance and security not only for the pillar systems and data, but also for the infrastructure and applications performing zero trust operations.

The zero trust security paradigm is necessary; we can no longer rely on simplistic, perimeter-based security. But the requirements demanded by the zero trust principles are too complex to accomplish with point-to-point integrations between systems or layers.

Zero trust requires integration across all pillars at the data level –– in short, the government needs a data mesh platform to orchestrate these implementations. By following the guidance outlined above, organizations will not just meet requirements, but truly get the most out of zero trust.

Chris Townsend is global vice president of public sector at Elastic.

The post A data mesh approach: Helping DoD meet 2027 zero trust needs first appeared on Federal News Network.

© AP Illustration/Peter Hamlin)

(AP Illustration/Peter Hamlin)US--Insider Q&A-Pentagon AI Chief

GPUs: Enterprise AI’s New Architectural Control Point

13 January 2026 at 11:34

Over the past two years, enterprises have moved rapidly to integrate large language models into core products and internal workflows. What began as experimentation has evolved into production systems that support customer interactions, decision-making, and operational automation.

As these systems scale, a structural shift is becoming apparent. The limiting factor is no longer model capability or prompt design but infrastructure. In particular, GPUs have emerged as a defining constraint that shapes how enterprise AI systems must be designed, operated, and governed.

This represents a departure from the assumptions that guided cloud native architectures over the past decade: Compute was treated as elastic, capacity could be provisioned on demand, and architectural complexity was largely decoupled from hardware availability. GPU-bound AI systems don’t behave this way. Scarcity, cost volatility, and scheduling constraints propagate upward, influencing system behavior at every layer.

As a result, architectural decisions that once seemed secondary—how much context to include, how deeply to reason, and how consistently results must be reproduced—are now tightly coupled to physical infrastructure limits. These constraints affect not only performance and cost but also reliability, auditability, and trust.

Understanding GPUs as an architectural control point rather than a background accelerator is becoming essential for building enterprise AI systems that can operate predictably at scale.

The Hidden Constraints of GPU-Bound AI Systems

GPUs break the assumption of elastic compute

Traditional enterprise systems scale by adding CPUs and relying on elastic, on-demand compute capacity. GPUs introduce a fundamentally different set of constraints: limited supply, high acquisition costs, and long provisioning timelines. Even large enterprises increasingly encounter situations where GPU-accelerated capacity must be reserved in advance or planned explicitly rather than assumed to be instantly available under load.

This scarcity places a hard ceiling on how much inference, embedding, and retrieval work an organization can perform—regardless of demand. Unlike CPU-centric workloads, GPU-bound systems cannot rely on elasticity to absorb variability or defer capacity decisions until later. Consequently, GPU-bound inference pipelines impose capacity limits that must be addressed through deliberate architectural and optimization choices. Decisions about how much work is performed per request, how pipelines are structured, and which stages justify GPU execution are no longer implementation details that can be hidden behind autoscaling. They’re first-order concerns.

Why GPU efficiency gains don’t translate into lower production costs

While GPUs continue to improve in raw performance, enterprise AI workloads are growing faster than efficiency gains. Production systems increasingly rely on layered inference pipelines that include preprocessing, representation generation, multistage reasoning, ranking, and postprocessing.

Each additional stage introduces incremental GPU consumption, and these costs compound as systems scale. What appears efficient when measured in isolation often becomes expensive once deployed across thousands or millions of requests.

In practice, teams frequently discover that real-world AI pipelines consume materially more GPU capacity than early estimates anticipated. As workloads stabilize and usage patterns become clearer, the effective cost per request rises—not because individual models become less efficient but because GPU utilization accumulates across pipeline stages. GPU capacity thus becomes a primary architectural constraint rather than an operational tuning problem.

When AI systems become GPU-bound, infrastructure constraints extend beyond performance and cost into reliability and governance. As AI workloads expand, many enterprises encounter growing infrastructure spending pressures and increased difficulty forecasting long-term budgets. These concerns are now surfacing publicly at the executive level: Microsoft AI CEO Mustafa Suleyman has warned that remaining competitive in AI could require investments in the hundreds of billions of dollars over the next decade. The energy demands of AI data centers are also increasing rapidly, with electricity use expected to rise sharply as deployments scale. In regulated environments, these pressures directly impact predictable latency guarantees, service-level enforcement, and deterministic auditability.

In this sense, GPU constraints directly influence governance outcomes.

When GPU Limits Surface in Production

Consider a platform team building an internal AI assistant to support operations and compliance workflows. The initial design was straightforward: retrieve relevant policy documents, run a large language model to reason over them, and produce a traceable explanation for each recommendation. Early prototypes worked well. Latency was acceptable, costs were manageable, and the system handled a modest number of daily requests without issue.

As usage grew, the team incrementally expanded the pipeline. They added reranking to improve retrieval quality, tool calls to fetch live data, and a second reasoning pass to validate answers before returning them to users. Each change improved quality in isolation. But each also added another GPU-backed inference step.

Within a few months, the assistant’s architecture had evolved into a multistage pipeline: embedding generation, retrieval, reranking, first-pass reasoning, tool-augmented enrichment, and final synthesis. Under peak load, latency spiked unpredictably. Requests that once completed in under a second now took several seconds—or timed out entirely. GPU utilization hovered near saturation even though overall request volume was well below initial capacity projections.

The team initially treated this as a scaling problem. They added more GPUs, adjusted batch sizes, and experimented with scheduling. Costs climbed rapidly, but behavior remained erratic. The real issue was not throughput alone—it was amplification. Each user query triggered multiple dependent GPU calls, and small increases in reasoning depth translated into disproportionate increases in GPU consumption.

Eventually, the team was forced to make architectural trade-offs that had not been part of the original design. Certain reasoning paths were capped. Context freshness was selectively reduced for lower-risk workflows. Deterministic checks were routed to smaller, faster models, reserving the larger model only for exceptional cases. What began as an optimization exercise became a redesign driven entirely by GPU constraints.

The system still worked—but its final shape was dictated less by model capability than by the physical and economic limits of inference infrastructure.

This pattern—GPU amplification—is increasingly common in GPU-bound AI systems. As teams incrementally add retrieval stages, tool calls, and validation passes to improve quality, each request triggers a growing number of dependent GPU operations. Small increases in reasoning depth compound across the pipeline, pushing utilization toward saturation long before request volumes reach expected limits. The result is not a simple scaling problem but an architectural amplification effect in which cost and latency grow faster than throughput.

Reliability Failure Modes in Production AI Systems

Many enterprise AI systems are designed with the expectation that access to external knowledge and multistage inference will improve accuracy and robustness. In practice, these designs introduce reliability risks that tend to surface only after systems reach sustained production usage.

Several failure modes appear repeatedly across large-scale deployments.

Temporal drift in knowledge and context

Enterprise knowledge is not static. Policies change, workflows evolve, and documentation ages. Most AI systems refresh external representations on a scheduled basis rather than continuously, creating an inevitable gap between current reality and what the system reasons over.

Because model outputs remain fluent and confident, this drift is difficult to detect. Errors often emerge downstream in decision-making, compliance checks, or customer-facing interactions, long after the original response was generated.

Pipeline amplification under GPU constraints

Production AI queries rarely correspond to a single inference call. They typically pass through layered pipelines involving embedding generation, ranking, multistep reasoning, and postprocessing, each stage consuming additional GPU resources. Systems research on transformer inference highlights how compute and memory trade-offs shape practical deployment decisions for large models. In production systems, these constraints are often compounded by layered inference pipelines—where additional stages amplify cost and latency as systems scale.

Each stage consumes GPU resources. As systems scale, this amplification effect turns pipeline depth into a dominant cost and latency factor. What appears efficient during development can become prohibitively expensive when multiplied across real-world traffic.

Limited observability and auditability

Many AI pipelines provide only coarse visibility into how responses are produced. It’s often difficult to determine which data influenced a result, which version of an external representation was used, or how intermediate decisions shaped the final output.

In regulated environments, this lack of observability undermines trust. Without clear lineage from input to output, reproducibility and auditability become operational challenges rather than design guarantees.

Inconsistent behavior over time

Identical queries issued at different points in time can yield materially different results. Changes in underlying data, representation updates, or model versions introduce variability that’s difficult to reason about or control.

For exploratory use cases, this variability may be acceptable. For decision-support and operational workflows, temporal inconsistency erodes confidence and limits adoption.

Why GPUs Are Becoming the Control Point

Three trends converge to elevate GPUs from infrastructure detail to architectural control point.

GPUs determine context freshness. Storage is inexpensive, but embedding isn’t. Maintaining fresh vector representations of large knowledge bases requires continuous GPU investment. As a result, enterprises are forced to prioritize which knowledge remains current. Context freshness becomes a budgeting decision.

GPUs constrain reasoning depth. Advanced reasoning patterns—multistep analysis, tool-augmented workflows, or agentic systems—multiply inference calls. GPU limits therefore cap not only throughput but also the complexity of reasoning an enterprise can afford.

GPUs influence model strategy. As GPU costs rise, many organizations are reevaluating their reliance on large models. Small language models (SLMs) offer predictable latency, lower operational costs, and greater control, particularly for deterministic workflows.
This has led to hybrid architectures in which SLMs handle structured, governed tasks, with larger models reserved for exceptional or exploratory scenarios.

What Architects Should Do

Recognizing GPUs as an architectural control point requires a shift in how enterprise AI systems are designed and evaluated. The goal isn’t to eliminate GPU constraints; it’s to design systems that make those constraints explicit and manageable.

Several design principles emerge repeatedly in production systems that scale successfully:

Treat context freshness as a budgeted resource. Not all knowledge needs to remain equally fresh. Continuous reembedding of large knowledge bases is expensive and often unnecessary. Architects should explicitly decide which data must be kept current in near real time, which can tolerate staleness, and which should be retrieved or computed on demand. Context freshness becomes a cost and reliability decision, not an implementation detail.

Cap reasoning depth deliberately. Multistep reasoning, tool calls, and agentic workflows quickly multiply GPU consumption. Rather than allowing pipelines to grow organically, architects should impose explicit limits on reasoning depth under production service-level objectives. Complex reasoning paths can be reserved for exceptional or offline workflows, while fast paths handle the majority of requests predictably.

Separate deterministic paths from exploratory ones. Many enterprise workflows require consistency more than creativity. Smaller, task-specific models can handle deterministic checks, classification, and validation with predictable latency and cost. Larger models should be used selectively, where ambiguity or exploration justifies their overhead. Hybrid model strategies are often more governable than uniform reliance on large models.

Measure pipeline amplification, not just token counts. Traditional metrics such as tokens per request obscure the true cost of production AI systems. Architects should track how many GPU-backed operations a single user request triggers end to end. This amplification factor often explains why systems behave well in testing but degrade under sustained load.

Design for observability and reproducibility from the start. As pipelines become GPU-bound, tracing which data, model versions, and intermediate steps contributed to a decision becomes harder—but more critical. Systems intended for regulated or operational use should capture lineage information as a first-class concern, not as a post hoc addition.

These practices don’t eliminate GPU constraints. They acknowledge them—and design around them—so that AI systems remain predictable, auditable, and economically viable as they scale.

Why This Shift Matters

Enterprise AI is entering a phase where infrastructure constraints matter as much as model capability. GPU availability, cost, and scheduling are no longer operational details—they’re shaping what kinds of AI systems can be deployed reliably at scale.

This shift is already influencing architectural decisions across large organizations. Teams are rethinking how much context they can afford to keep fresh, how deep their reasoning pipelines can go, and whether large models are appropriate for every task. In many cases, smaller, task-specific models and more selective use of retrieval are emerging as practical responses to GPU pressure.

The implications extend beyond cost optimization. GPU-bound systems struggle to guarantee consistent latency, reproducible behavior, and auditable decision paths—all of which are critical in regulated environments. In consequence, AI governance is increasingly constrained by infrastructure realities rather than policy intent alone.

Organizations that fail to account for these limits risk building systems that are expensive, inconsistent, and difficult to trust. Those that succeed will be the ones that design explicitly around GPU constraints, treating them as first-class architectural inputs rather than invisible accelerators.

The next phase of enterprise AI won’t be defined solely by larger models or more data. It will be defined by how effectively teams design systems within the physical and economic limits imposed by GPUs—which have become both the engine and the bottleneck of modern AI.

Author’s note: This article is based on the author’s personal views based on independent technical research and does not reflect the architecture of any specific organization.


Join us at the upcoming Infrastructure & Ops Superstream on January 20 for expert insights on how to manage GPU workloads—and tips on how to address other orchestration challenges presented by modern AI and machine learning infrastructure. In this half-day event, you’ll learn how to secure GPU capacity, reduce costs, and eliminate vendor lock-in while maintaining ML engineer productivity. Save your seat now to get actionable strategies for building AI-ready infrastructure that meets unprecedented demands for scale, performance, and resilience at the enterprise level.

O’Reilly members can
register here. Not a member? Sign up for a 10-day free trial before the event to attend—and explore all the other resources on O’Reilly.

The Kennedy Center ‘Kennedy head:’ What it must be thinking!

By: Tom Temin
12 January 2026 at 15:48

Henry Lee Higginson would be aghast at an opera company leaving its opera house. But it’s true. For murky reasons the reporting has not clarified, the Washington National Opera said it would leave the John F. Kennedy Center for the Performing Arts after operating there since 1971.

Higginson was a Civil War brevet colonel who, after investment success, founded the Boston Symphony Orchestra in 1881. You could get into a performance for 25¢. By 1900 the orchestra had its own home, the Boston Symphony Hall. Higginson died in 1919, but the orchestra occupies that historic building to this day. When in the late ’60s and early ’70s, my friends bought Doors albums and experimented with marijuana, I attended the Wednesday night BSO open rehearsals to hear (and watch) Erich Leinsdorf and Seiji Ozawa ply their arts. I was thrilled when the conductor would stop and order a passage replayed, maybe with a little exasperated scold.

In latter years, my wife and I have had season tickets to the Washington opera. Few locals thought the Trump administration’s appetite for change would affect, of all things, the opera. But it has, like lightning zigzagging through a thicket of branches to nail a squirrel.

Unlike the BSO, most opera and orchestral organizations don’t own their facilities, but instead have long-term, sometimes complex, arrangements with the governing bodies of places like Lincoln Center or Dallas’s Morton H. Meyerson Symphony Center. Mostly, they’re public-private partnerships in one form or another.

For artistic organizations operating out of the Kennedy Center, there’s the added twist — not of municipal government, but of federal.

And Washington, D.C.’s federal landscape has been changing fast lately, mainly psychically but also physically. The most visible manifestation of the latter: The White House getting a convention-sized ballroom, and maybe a story added atop the West Wing.

Psychic changes we’re more accustomed to. The metronome of policy swings back and forth on everything from car mileage to vaccinations.

The roiling of the Kennedy Center embodies both. Physically, the building now has the name “Donald J. Trump” added to its external signage. The letters are big; you can see them driving west on Route 66 en route to Virginia. I keep expecting a bust of Trump to pop up next to that busy selfie spot, the “Kennedy head” — a sculpture so big and ugly it’s become sort of lovable over the decades. Psychically, the center has undergone a wrenching change in its governing board members and its apparent approach to programming.

The announced departure of the Washington National Opera has drawn enormous press coverage. The departure is all wrapped up in the ongoing turmoil of Kennedy Center leadership, programming-slash-culture wars, and — frankly — artists and ticket-buyers perhaps cutting their own noses to spite their faces in reaction to what they see as Trump depredations. If you cancel a performance or stop buying tickets, who are you really hurting?

You can’t put on top-tier opera just anywhere. It requires a pit for the orchestra, a large stage with roomy rear and side areas for props and scenery. I’ve seen the behind-the-stage rooms at the Kennedy Center. They’re like caverns.

More than that, opera needs a dignified, uplifting place. The Kennedy Center fits the bill, or it did. Its concert hall interiors and gigantic hallways elevate the experience, just like the ornate Boston Symphony Hall with its statues along the sides and “Beethoven” inscribed over the stage add to the orchestral presentations. Despite its lackluster cafeteria and fluctuating water pressure, the Kennedy Center adds a certain distinction and elegance to a city that, 50 years ago, felt slightly backwater.

Big corporate benefactors have kept the Washington National Opera afloat. I often muse that gifts from Northrop Grumman and American Airlines plus individuals like investor David Rubenstein and candy heiress Jacqueline Mars mean I can buy a seat at the opera for $50 or $75. I often buy a Snickers at intermission.

I plan to keep supporting the opera regardless of where it ends up, and I’ll buy a Snickers bar at intermission. The departure from the marble temple on the Potomac is a loss for the city and an unfortunate reflection on the Kennedy Center’s leadership.

The post The Kennedy Center ‘Kennedy head:’ What it must be thinking! first appeared on Federal News Network.

© AP Photo/Mark Schiefelbein

A worker on a forklift stands near the letters "The Donald" above the signage on the Kennedy Center on Friday, Dec. 19, 2025, in Washington. (AP Photo/Mark Schiefelbein)

MCP Sampling: When Your Tools Need to Think

12 January 2026 at 07:14

The following article originally appeared on Block’s blog and is being republished here with the author’s permission.

If you’ve been following MCP, you’ve probably heard about tools which are functions that let AI assistants do things like read files, query databases, or call APIs. But there’s another MCP feature that’s less talked about and arguably more interesting: sampling.

Sampling flips the script. Instead of the AI calling your tool, your tool calls the AI.

Let’s say you’re building an MCP server that needs to do something intelligent like summarize a document, translate text, or generate creative content. You have three options:

Option 1: Hardcode the logic. Write traditional code to handle it. This works for deterministic tasks, but falls apart when you need flexibility or creativity.

Option 2: Bake in your own LLM. Your MCP server makes its own calls to OpenAI, Anthropic, or whatever. This works, but now you’ve got API keys to manage and costs to track, and you’ve locked users into your model choice.

Option 3: Use sampling. Ask the AI that’s already connected to do the thinking for you. No extra API keys. No model lock-in. The user’s existing AI setup handles it.

How Sampling Works

When an MCP client like goose connects to an MCP server, it establishes a two-way channel. The server can expose tools for the AI to call, but it can also request that the AI generate text on its behalf.

Here’s what that looks like in code (using Python with FastMCP):

Using Python with FastMCP sampling

The ctx.sample() call sends a prompt back to the connected AI and waits for a response. From the user’s perspective, they just called a “summarize” tool. But under the hood, that tool delegated the hard part to the AI itself.

A Real Example: Council of Mine

Council of Mine is an MCP server that takes sampling to an extreme. It simulates a council of nine AI personas who debate topics and vote on each other’s opinions.

But there’s no LLM running inside the server. Every opinion, every vote, every bit of reasoning comes from sampling requests back to the user’s connected LLM.

The council has nine members, each with a distinct personality:

  • 🔧 The Pragmatist – “Will this actually work?”
  • 🌟 The Visionary – “What could this become?”
  • 🔗 The Systems Thinker – “How does this affect the broader system?”
  • 😊 The Optimist – “What’s the upside?”
  • 😈 The Devil’s Advocate – “What if we’re completely wrong?”
  • 🤝 The Mediator – “How can we integrate these perspectives?”
  • 👥 The User Advocate – “How will real people interact with this?”
  • 📜 The Traditionalist – “What has worked historically?”
  • 📊 The Analyst – “What does the data show?”

Each personality is defined as a system prompt that gets prepended to sampling requests.

When you start a debate, the server makes nine sampling calls, one for each council member:

Council of members 1

That temperature=0.8 setting encourages diverse, creative responses. Each council member “thinks” independently because each is a separate LLM call with a different personality prompt.

After opinions are collected, the server runs another round of sampling. Each member reviews everyone else’s opinions and votes for the one that resonates most with their values:

The council has voted

The server parses the structured response to extract votes and reasoning.

One more sampling call generates a balanced summary that incorporates all perspectives and acknowledges the winning viewpoint.

Total LLM calls per debate: 19

  • 9 for opinions
  • 9 for voting
  • 1 for synthesis

All of those calls go through the user’s existing LLM connection. The MCP server itself has zero LLM dependencies.

Benefits of Sampling

Sampling enables a new category of MCP servers that orchestrate intelligent behavior without managing their own LLM infrastructure.

No API key management: The MCP server doesn’t need its own credentials. Users bring their own AI, and sampling uses whatever they’ve already configured.

Model flexibility: If a user switches from GPT to Claude to a local Llama model, the server automatically uses the new model.

Simpler architecture: MCP server developers can focus on building a tool, not an AI application. They can let the AI be the AI, while the server focuses on orchestration, data access, and domain logic.

When to Use Sampling

Sampling makes sense when a tool needs to:

  • Generate creative content (summaries, translations, rewrites)
  • Make judgment calls (sentiment analysis, categorization)
  • Process unstructured data (extract info from messy text)

It’s less useful for:

  • Deterministic operations (math, data transformation, API calls)
  • Latency-critical paths (each sample adds round-trip time)
  • High-volume processing (costs add up quickly)

The Mechanics

If you’re implementing sampling, here are the key parameters:

Sampling parameters

The response object contains the generated text, which you’ll need to parse. Council of Mine includes robust extraction logic because different LLM providers return slightly different response formats:

Council of Mine robust extraction logic

Security Considerations

When you’re passing user input into sampling prompts, you’re creating a potential prompt injection vector. Council of Mine handles this with clear delimiters and explicit instructions:

Council of Mine delimiters and instructions

This isn’t bulletproof, but it raises the bar significantly.

Try It Yourself

If you want to see sampling in action, Council of Mine is a great playground. Ask goose to start a council debate on any topic and watch as nine distinct perspectives emerge, vote on each other, and synthesize into a conclusion all powered by sampling.

Operational Readiness and Resiliency Index: A new model to assess talent, performance

You just left a high-level meeting with agency leadership. You and your colleagues have been informed that Congress passed new legislation, and your agency is expected to implement the new law with your existing budget and staff. The lead program office replied, “We can make this work.” The agency head is pleased to hear this, but has reservations. How?

Another situation: The president just announced a new priority and has assigned it to your agency. Again, there is no new funding for the effort. Your agency head assigns the priority to your program with the expectation for success. How do you proceed?

Today, given the recent reductions in force (RIFs), people voluntarily leaving government, and structural reorganizations that have taken place and will likely continue, answering the question “How to proceed?” is even more difficult. There is a real need to “know” with a level of certainty whether there are sufficient resources to effectively deliver and sustain new programs or in some cases even the larger agency mission.

Members of the Management Advisory Group — a voluntary group of former appointees under Presidents George W. Bush and Donald Trump — and I believe the answer to these and other questions around an organization’s capabilities and capacity to perform can be found by employing the Operational Readiness and Resiliency Index (ORRI). ORRI is a domestic equivalent of the military readiness model. It is structured into four categories:

  • Workforce
  • Performance
  • Culture
  • Health

Composed of approximately 50 data elements and populated by existing systems of record, including payroll, learning management systems, finance, budget and programmatic/functional work systems, ORRI links capabilities/capacity with performance, informed by culture and health to provide agency heads and executives with an objective assessment of their organization’s current and future performance.

In the past, dynamic budgeting and incrementalism meant that risk was low and performance at some levels predictable. We have all managed some increases or cuts to budgets. Those days are gone. Government is changing now at a speed and degree of transformation that has not been witnessed before. Relying on traditional budgeting methods and employee surveys cannot provide insights needed to assess whether current resources provide the capabilities or capacity for future performance of an agency — at any level.

So how does it work?

As is evident with the illustration above, ORRI pulls mainly from existing systems of record. Many of these systems are outside of traditional human resources (HR) departments to include budget, finance and work-systems. Traditionally, HR departments relied on personnel data alone. These systems told you what staff were paid to do, not what they could do. It is focused on classification and pay, not skills, capacity or performance.

Over the years, we have made many efforts to better measure performance. The Government Performance and Results Act (GPRA) as amended, the Performance Assessment Rating Tool (PART), category management and other efforts have attempted to better account for performance. These tools — along with improvements in budgeting to include zero-based budgeting, planning, programming and budgeting systems, and enterprise risk management — have continued to advance our thinking along systems lines. These past efforts, however, failed to produce an integrated model that runs in near real-time or sets objective performance targets using best-in-class benchmarks. Linking capabilities/capacity to performance provides the ability to ask new questions and conduct comparative performance assessments. ORRI can answer such questions as:

  • Are our staffing plans ready for the next mission priority? Can we adapt? Are we resilient?
  • Do we have the right numbers with the right skills assigned to our top priorities?
  • Are we over-staffed in uncritical areas?
  • Given related functions, where are the performance outliers — good and bad?
  • Given our skill shortages, where do I have those skills that are at the right level available now? Should we recruit, train or reassign to make sure we have the right skills? What is in the best interest of the agency/taxpayer?
  • Is our performance comparable — in named activity, to the best — regardless of sector?
  • What does our data/evidence tell us about our culture? Do we represent excellence in whatever we do? Compared to whom?
  • Where are we excelling and why?
  • Where can we invest to demonstrate impact faster?

Focusing on workforce and performance are critical. However, if you believe that culture eats strategy every time, workforce and performance needs to be informed by culture. Hence ORRI includes culture as a category. Culture in this model concentrates on having a team of executives that drive and sustain the culture, evidenced by cycles of learning, change management success and employee engagement. Health is also a key driver for sustained higher performance. In this regard, ORRI tracks both positive and negative indicators of health, as is evident in the illustration. Again, targets are set and measured to drive performance and increase organizational health. Targets are set by industry best in class standards and strategic performance targets necessary for mission achievement.

Governmentwide, ORRI can provide the Office of Management and Budget with real-time comparative performance around key legislative and presidential priorities and cross-agency thematic initiatives. For the Office of Personnel Management, it can provide strategic intelligence on talent, such as enterprise risk management based on an objective assessment: data driven, on critical skills, numbers, competitive environment and performance.

ORRI represents the first phase of an expanded notion of talent assessment. It concentrates on human talent: federal employees.

Phase two of this model will expand the notion of operating capabilities to include AI agents and robotics. As the AI revolution gains speed and acceptance, we can see that agencies will move toward increased use of these tools to increase productivity and reduce transactional cost of government services. Government customer service and adjudication processes will be assigned to AI agents. Like Amazon, more and more warehouse functions will be assigned to physical robots. Talent will need to include machine capabilities, and the total capabilities/capacity reflect the new performance curve — optimizing talent from various sources. This new reality will require a reset in the way government plans, budgets, deploys talent, and assesses overall performance. Phase three will encompass the government’s formalized external supply chains which represent the non-governmental delivery systems — essentially government by other means. For example, the rise of public/private partnerships is fundamentally changing the nature of federated government; think of NASA and its dependence on Space X, Boeing, Lockheed Martin and others. ORRI will need to expand to accurately capture these alternative delivery systems to overall government performance. As the role of the federal government continues to evolve, so too do our models for planning, managing talent and assessing performance. ORRI provides that framework.

John Mullins served on the Trump 45 Transition Team and later as the senior advisor to the director at OPM. Most recently Mullins served as strategy and business development executive for IBM supporting NASA, the General Services Administration and OPM.

Mark Forman was the first administrator for E-Government and Information Technology (Federal CIO). He most recently served as chief strategy officer at Amida Technology Solutions.

The post Operational Readiness and Resiliency Index: A new model to assess talent, performance first appeared on Federal News Network.

© Getty Images/iStockphoto/ipopba

Businessman hold circle of network structure HR - Human resources. Business leadership concept. Management and recruitment. Social network. Different people.

The End of the Sync Script: Infrastructure as Intent

There’s an open secret in the world of DevOps: Nobody trusts the CMDB. The Configuration Management Database (CMDB) is supposed to be the “source of truth”—the central map of every server, service, and application in your enterprise. In theory, it’s the foundation for security audits, cost analysis, and incident response. In practice, it’s a work of fiction. The moment you populate a CMDB, it begins to rot. Engineers deploy a new microservice but forget to register it. An autoscaling group spins up 20 new nodes, but the database only records the original three. . . 

We call this configuration drift, and for decades, our industry’s solution has been to throw more scripts at the problem. We write massive, brittle ETL (Extract-Transform-Load) pipelines that attempt to scrape the world and shove it into a relational database. It never works. The “world”—especially the modern cloud native world—moves too fast.

We realized we couldn’t solve this problem by writing better scripts. We had to change the fundamental architecture of how we sync data. We stopped trying to boil the ocean and fix the entire enterprise at once. Instead, we focused on one notoriously difficult environment: Kubernetes. If we could build an autonomous agent capable of reasoning about the complex, ephemeral state of a Kubernetes cluster, we could prove a pattern that works everywhere else. This article explores how we used the newly open-sourced Codex CLI and theModel Context Protocol (MCP) to build that agent. In the process, we moved from passive code generation to active infrastructure operation, transforming the “stale CMDB” problem from a data entry task into a logic puzzle.

The Shift: From Code Generation to Infrastructure Operation with Codex CLI and MCP

The reason most CMDB initiatives fail is ambition. They try to track every switch port, virtual machine, and SaaS license simultaneously. The result is a data swamp—too much noise, not enough signal. We took a different approach. We drew a small circle around a specific domain: Kubernetes workloads. Kubernetes is the perfect testing ground for AI agents because it’s high-velocity and declarative. Things change constantly. Pods die; deployments roll over; services change selectors. A static script struggles to distinguish between a CrashLoopBackOff (a temporary error state) and a purposeful scale-down. We hypothesized that a large language model (LLM), acting as an operator, could understand this nuance. It wouldn’t just copy data; it would interpret it.

The Codex CLI turned this hypothesis into a tangible architecture by enabling a shift from “code generation” to “infrastructure operation.” Instead of treating the LLM as a junior programmer that writes scripts for humans to review and run, Codex empowers the model to execute code itself. We provide it with tools—executable functions that act as its hands and eyes—via the Model Context Protocol. MCP defines a clear interface between the AI model and the outside world, allowing us to expose high-level capabilities like cmdb_stage_transaction without teaching the model the complex internal API of our CMDB. The model learns to use the tool, not the underlying API.

The architecture of agency

Our system, which we call k8s-agent, consists of three distinct layers. This isn’t a single script running top to bottom; it’s a cognitive architecture.

The cognitive layer (Codex + contextual instructions): This is the Codex CLI running a specific system prompt. We don’t fine-tune the model weights. Infrastructure moves too fast for fine-tuning: A model trained on Kubernetes v1.25 would be hallucinating by v1.30. Instead, we use context engineering—the art of designing the environment in which the AI operates. This involves tool design (creating atomic, deterministic functions), prompt architecture (structuring the system prompt), and information architecture (deciding what information to hide or expose). We feed the model a persistent context file (AGENTS.md) that defines its persona: “You are a meticulous infrastructure auditor. Your goal is to ensure the CMDB accurately reflects the state of the Kubernetes cluster. You must prioritize safety: Do not delete records unless you have positive confirmation; they are orphans.”

The tool layer: Using MCP, we expose deterministic Python functions to the agent.

  • Sensorsk8s_list_workloadscmdb_query_servicek8s_get_deployment_spec
  • Actuatorscmdb_stage_createcmdb_stage_updatecmdb_stage_delete

Note that we track workloads (Deployments, StatefulSets), not Pods. Pods are ephemeral; tracking them in a CMDB is an antipattern that creates noise. The agent understands this distinction—a semantic rule that is hard to enforce in a rigid script.

The state layer (the safety net): LLMs are probabilistic; infrastructure must be deterministic. We bridge this gap with a staging pattern. The agent never writes directly to the production database. It writes to a staged diff. This allows a human (or a policy engine) to review the proposed changes before they are committed.

The OODA Loop in Action

How does this differ from a standard sync script? A script follows a linear path: Connect → Fetch → Write. If any step fails or returns unexpected data, the script crashes or corrupts data. Our agent follows the Observe-Orient-Decide-Act (OODA) loop, popularized by military strategists. Unlike a linear script that executes blindly, the OODA loop forces the agent to pause and synthesize information before taking action. This cycle allows it to handle incomplete data, verify assumptions, and adapt to changing conditions—traits essential for operating in a distributed system.

Let’s walk through a real scenario we encountered during our pilot, the Ghost Deployment, to explore the benefits of using an OODA loop. A developer had deleted a deployment named payment-processor-v1 from the cluster but forgot to remove the record from the CMDB. A standard script might pull the list of deployments, see payment-processor-v1 is missing, and immediately issue a DELETE to the database. The risk is obvious: What if the API server was just timing out? What if the script had a bug in its pagination logic? The script blindly destroys data based on the absence of evidence. 

The agent approach is fundamentally different. First, it observes: Calling k8s_list_workloads and cmdb_query_service, noticing the discrepancy. Second, it orients: Checking its context instructions to “verify orphans before deletion” and deciding to call k8s_get_event_history. Third, it decides: Seeing a “delete” event in the logs, it reasons that the resource is missing and that there’s been a deletion event. Finally, it acts: Calling cmdb_stage_delete with a comment confirming the deletion. The agent didn’t just sync data; it investigated. It handled the ambiguity that usually breaks automation.

Solving the “Semantic Gap”

This specific Kubernetes use case highlights a broader problem in IT operations: the “semantic gap.” The data in our infrastructure (JSON, YAML, logs) is full of implicit meaning. A label “env: production” changes the criticality of a resource. A status CrashLoopBackOff means “broken,” but Completed means “finished successfully.” Traditional scripts require us to hardcode every permutation of this logic, resulting in thousands of lines of unmaintainable if/else statements. With the Codex CLI, we replace those thousands of lines of code with a few sentences of English in the system prompt: “Ignore jobs that have completed successfully. Sync failing Jobs so we can track instability.” The LLM bridges the semantic gap. It understands what “instability” implies in the context of a job status. We’re describing our intent, and the agent is handling the implementation.

Scaling Beyond Kubernetes

We started with Kubernetes because it’s the “hard mode” of configuration management. In a production environment with thousands of workloads, things change constantly. A standard script sees a snapshot and often gets it wrong. An agent, however, can work through the complexity. It might run its OODA loop multiple times to solve a single issue—by checking logs, verifying dependencies, and confirming rules before it ever makes a change. This ability to connect reasoning steps allows it to handle the scale and uncertainty that breaks traditional automation.

But the pattern we established, agentic OODA Loops via MCP, is universal. Once we proved the model worked for Pods and Services, we realized we could extend it. For legacy infrastructure, we can give the agent tools to SSH into Linux VMs. For SaaS management, we can give it access to Salesforce or GitHub APIs. For cloud governance, we can ask it to audit AWS Security Groups. The beauty of this architecture is that the “brain” (the Codex CLI) stays the same. To support a new environment, we don’t need to rewrite the engine; we just hand it a new set of tools. However, shifting to an agentic model forces us to confront new trade-offs. The most immediate is cost versus context. We learned the hard way that you shouldn’t give the AI the raw YAML of a Kubernetes deployment—it consumes too many tokens and distracts the model with irrelevant details. Instead, you create a tool that returns a digest—a simplified JSON object with only the fields that matter. This is context optimization, and it is the key to running agents cost-effectively.

Conclusion: The Human in the Cockpit

There’s a fear that AI will replace the DevOps engineer. Our experience with the Codex CLI suggests the opposite. This technology does not remove the human; it elevates them. It promotes the engineer from a “script writer” to a “mission commander.” The stale CMDB was never really a data problem; it was a labor problem. It was simply too much work for humans to manually track and too complex for simple scripts to automate. By introducing an agent that can reason, we finally have a mechanism capable of keeping up with the cloud. 

We started with a small Kubernetes cluster. But the destination is an infrastructure that is self-documenting, self-healing, and fundamentally intelligible. The era of the brittle sync script is over. The era of infrastructure as intent has begun!

❌
❌