The fast-growing neo-cloud market: can AI-focused clouds become a new option?

What is a neo-cloud, and why is explosive growth expected?

A neo-cloud is a cloud platform specializing in GPU-centric, high-performance infrastructure. Its core offerings are GPUaaS (GPU as a Service), GenAI (generative AI) platform services, and high-capacity data centers.

The neo-cloud market is growing at a remarkable pace. According to research firm Synergy Research Group, revenue in the second quarter of 2025 (April to June) grew 205% year over year and broke through the $5 billion mark, and full-year 2025 revenue is expected to reach $23 billion.

Behind this rapid growth is strong demand for AI infrastructure.

Enterprise demand for AI is surging, but cloud providers such as the hyperscalers are "struggling to match supply to the enormous AI demand," comments Jeremy Duke, Synergy's founder and chief analyst.

Traditionally, enterprises had essentially two options for running AI workloads: on-premises or the public cloud. Each has significant drawbacks. On-premises, GPUs are expensive and power-hungry, and securing specialized staff and physically deploying the hardware is difficult. Public clouds from hyperscalers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud offer a broad range of services, but costs can be hard to predict. For organizations constrained by industry, use case, or internal rules, data sovereignty concerns also remain.

Neo-clouds emerged as a third option. Where hyperscalers provide a broad portfolio of cloud services, neo-clouds differentiate themselves by specializing in GPUs and AI workloads.

Cynthia Ho, associate research director in IDC's Asia/Pacific enterprise server and data center research group, says: "Neo-cloud providers have an edge over the hyperscalers in that their agreements with NVIDIA let them secure resources quickly and deliver high-performance services. They are capturing share in the growing AI market."

The neo-cloud's benefit: a safe place to experiment

Major players in the neo-cloud market include CoreWeave (founded in 2017), Crusoe (2018), Lambda (2012), Nebius (spun out of Yandex in 2024), and OpenAI (2015). OpenAI, best known for ChatGPT, is expected to become a significant player through the "Stargate" AI infrastructure initiative it announced in early 2025. New entrants such as Applied Digital, DataRobot, and Together AI continue to join them. Interestingly, many of these companies, CoreWeave among them, are former crypto-mining operators that have pivoted into high-performance computing service providers.

The neo-cloud providers named so far are large operators centered on the US and Europe, but there are also locally focused neo-clouds. One is Sharon AI, which operates in Australia. Founded in 2024, it announced a capacity expansion agreement of up to 50MW this November.

Dan Mons, CTO of Sharon AI, discussed the topic with Stefan Leitl, Cisco's vice president and general manager for Australia and New Zealand, at an event Cisco held in Melbourne, Australia, in November. There, they cited the following advantages of the neo-cloud.

The first is cost. Compared with hyperscalers, neo-clouds offer predictable pricing and avoid the risk of unexpected charges. For companies wary of surprise bills under pay-as-you-go pricing, that is a major benefit.

The second is depth of expertise. Many neo-cloud providers come from HPC (high-performance computing) and supercomputing backgrounds. Indeed, Mons shares, as "a secret not often talked about in AI," that "much of the knowledge needed for AI infrastructure is what the HPC and supercomputing world has been doing for more than 40 years." For people like him with an HPC background, "none of it is new." It is this expertise, he says, that lets them handle complex AI workloads.

The third is rapid resource provisioning. Through their agreements with Nvidia, neo-cloud providers can secure GPU resources faster than hyperscalers. With GPU shortages persisting, that is a decisive advantage.

Beyond these, an intriguing advantage Mons raised is being "a safe place to fail." It is said that 95% of generative AI projects never make it from proof of concept to production (per an MIT report, among other sources). "What organizations need is a safe place to fail; they need to fail fast and learn. Trying new technology matters, and we can provide the place to do it," Mons says.

The final point was data sovereignty. "Data sovereignty is a multidimensional problem," Mons notes. There is no AI without data, and while data is often called a currency, depending on the type of data "you have to think about where to put it, which compliance regulations to follow, how to find skills in the region, and how to find vendors you can trust," he says. Locally operated neo-clouds such as Sharon AI are well versed in these region-specific requirements and offer value that the global hyperscalers do not.

Why enterprises should pay attention to neo-clouds

What kinds of companies actually use neo-clouds?

Mons pointed to the Victor Chang Cardiac Research Institute as one early customer. The institute has moved existing research onto GPUs and is putting AI to work. Access to GPUs was not the only reason it chose Sharon AI: the data a medical research institute handles is highly sensitive, so data sovereignty is a critical issue. Sharon AI runs its infrastructure at two sites in Australia, and the assurance that data never leaves the country was apparently a key draw.

Alongside research institutes and universities like the Victor Chang Cardiac Research Institute, startups building AI services also appear to be neo-cloud customers. "At this point, customers tend to be organizations with relatively deep technical expertise," Ho says. In some regions, there are also said to be neo-clouds whose customers are Chinese companies facing restrictions on GPU access.

Growth looks set to continue. Synergy forecasts that the neo-cloud market will reach roughly $180 billion by 2030, expanding at a compound annual growth rate of 69%. The GPUaaS/GenAI platform services market is currently sustaining annual growth of 165%, and neo-clouds are expected to take a substantial share of it.

There are challenges, however. Ho points to utilization: "Huge infrastructure investment is required, but what is actual utilization really like? Is usage keeping pace?" For the neo-cloud option to take root, providers will likely also need to articulate use cases and benefits more clearly.

What about Japan? Shinya Kato, senior research manager at IDC covering Japan's enterprise infrastructure market, acknowledges that no provider in Japan has yet branded itself prominently as a "neo-cloud," but notes that large-scale GPU investment is under way, driven in part by the Ministry of Economy, Trade and Industry's "cloud program," and suggests GPU cloud services could take off. "These environments are optimized to make full use of GPUs, so the performance benefit is significant," he says.

For now the most obvious use cases are those that previously relied on HPC, such as R&D and academic research, but enterprises should still watch the trend. While there is a supply-side question of whether neo-cloud services will gain real traction in Japan, Kato says that "for companies in the HPC and AI domains that already use GPUs or have large-scale demand, the cloud could become an option for efficient, effective infrastructure." He adds: "Cost and speed are key characteristics of the neo-cloud. In fields that put AI to work, it is worth watching as one of the options for competing on development capability."

IBM to buy Confluent to extend its data and automation portfolio

IBM has agreed to acquire cloud-native enterprise data streaming platform Confluent in a move designed to expand its portfolio of tools for building AI applications.

The company said Monday in a release that it sees Confluent as a natural fit for its hybrid cloud and AI strategy, adding that the acquisition is expected to “drive substantial product synergies” across its portfolio.

Confluent connects data sources and cleans up data. It built its service on Apache Kafka, an open-source distributed event streaming platform, sparing its customers the hassle of buying and managing their own server clusters in return for a monthly fee per cluster, plus additional fees for data stored and data moved in or out. 
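
As a rough illustration of what event streaming looks like in practice, here is a minimal sketch using the open-source `confluent-kafka` Python client (not IBM's or Confluent's own tooling; the broker address, topic, and group id are placeholders): a producer appends events to a topic, and independent consumer groups read them at their own pace.

```python
# Minimal Kafka produce/consume sketch with the confluent-kafka Python client.
# Broker address, topic, and group id are placeholders for illustration only.
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("orders", key="order-1001", value='{"status": "created"}')
producer.flush()  # block until the event is delivered to the broker

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-detection",    # each consumer group keeps its own cursor
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

msg = consumer.poll(timeout=5.0)      # fetch the next event, if any
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())     # react to the event as it arrives
consumer.close()
```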

IBM expects the deal, which it valued at $11 billion, to close by the middle of next year.

Confluent CEO and co-founder Jay Kreps stated in an email sent internally to staff about the acquisition, “IBM sees the same future we do: one in which enterprises run on continuous, event-driven intelligence, with data moving freely and reliably across every part of the business.”

It’s a good move for IBM, noted Scott Bickley, an advisory fellow at Info-Tech Research Group. “[Confluent] fills a critical gap within the watsonx platform, IBM’s next-gen AI platform, by providing the ability to monitor real-time data,” he said, and is based on the industry standard for managing and processing real-time data streams. 

He added, “IBM already has the pieces of the puzzle required to build and train AI models; Confluent provides the connective tissue to saturate those models with continuous live data from across an organization’s entire operation, regardless of the source. This capability should pave the road ahead for more complex AI agents and applications that will be able to react to data in real time.”

He also pointed out that the company is playing the long game with this acquisition, which is its largest in recent history. “IBM effectively positions itself proactively to compete against the AI-native big data companies like Snowflake and Databricks, who are all racing towards the same ‘holy grail’ of realizing AI agents that can consume, process, and react to real-time data within the context of their clients’ trained models and operating parameters,” he said, adding that IBM is betting that a full-stack vertical AI platform, watsonx, will be more appealing to enterprise buyers than a composable solution comprised of various independent components.

The move, he noted, also complements previous acquisitions such as the $34.5 billion acquisition of Red Hat and the more recent $6.4 billion acquisition of Hashicorp, all of which are built upon dominant open source standards including Linux, Terraform/Vault, and Kafka. This allows IBM to offer a stand-alone vertical, hybrid cloud strategy with full-stack AI capabilities apart from the ERP vendor space and the point solutions currently available.

In addition, he said, the timing was right; Confluent has been experiencing a slowing of revenue growth and was reportedly shopping itself already.

“At the end of the day, this deal works for both parties. IBM is now playing a high-stakes game and has placed its bet that having the best AI models is not enough; it is the control of the data flow that will matter,” he said.

Tech marketplaces: Solving the last-mile billing barrier to global growth

According to an IoT Analytics report from early 2024, 1.8% of global enterprise software was sold via marketplaces in 2023, a share forecast to grow to nearly 10% by 2030. Although this is a minority share today, the segment is growing far faster than any other IT sales channel.

The concept of a technology marketplace as a central hub for software distribution predates the cloud, but I believe its current surge is driven by a fundamentally new dynamic. Cloud giants, or hyperscalers, have reinvented the model by transforming independent software vendors (ISVs) into a motivated army of sales channels. What are the keys to this accelerated growth? And what is the role of the principal actors in this new era of technology commercialization?

The new hyperscaler-ISV economic symbiosis

This new wave of marketplaces is spearheaded by hyperscalers, whose strategy I see as centered on an economic symbiosis with ISVs. The logic is straightforward: an ISV’s software runs on the hyperscaler’s infrastructure. Consequently, every time an ISV sells its solution, it directly drives increased consumption of cloud services, generating a dual revenue stream for the platform.

This pull-through effect, where the ISV’s success translates directly into the platform’s success, is the core incentive that has motivated hyperscalers to invest heavily in developing their marketplaces as a strategic sales channel.

The five players in the marketplace ecosystem

The marketplace ecosystem involves and impacts five key players: the ISV, the hyperscaler, the end customer, the distributor and the reseller or local hyperscaler partner. Let’s examine the role of each.

The ISV as the innovative specialist

In essence, I see the ISV as the entity that transforms the hyperscaler’s infrastructure into a tangible, high-value business solution for the end customer. For ISVs, the marketplace is a strategic channel that dramatically accelerates their time-to-market. It allows them to simplify transactional complexities, leverage the hyperscaler’s global reach and tap into the budgets of customers already under contract with the platform. This can even extend to mobilizing the hyperscaler’s own sales teams as an indirect channel through co-selling programs.

However, in my view, this model presents challenges for the ISV, primarily in managing customer relationships and navigating channel complexity. By operating through one or two intermediaries (the hyperscaler or a local partner), the ISV inevitably cedes some control over and proximity to the end customer.

Furthermore, while partner-involved arrangements simplify the transaction for the customer, they introduce a new layer of complexity for the ISV, who must now manage margin agreements, potential channel conflicts and the tax implications of an indirect sales structure, especially in international transactions.

The hyperscaler as the ecosystem enabler

As the ecosystem enabler, the hyperscaler provides the foundational infrastructure upon which ISVs operate. By leveraging their massive global customer base, I see hyperscalers strategically promoting the marketplace with a dual objective: to increase customer loyalty and retention (stickiness) and to drive the cloud consumption generated by these ISVs.

In doing so, the hyperscaler transcends its original role to become the central operator of the ecosystem, assuming what I believe is a new, influential function as a financial and commercial intermediary.

The end customer as the center of gravity

In this ecosystem, the end customer acts as the center of gravity. Their influence stems from their business needs and, most critically, their budget. Both hyperscalers and ISVs align their strategies to meet the customer’s primary demand: transforming a traditionally complex procurement process into a centralized and efficient experience.

However, this appeal can be diminished by operational constraints. A primary limitation arises in territories where the customer cannot pay for purchases in the local currency. This entails managing payments in foreign currencies, reintroducing a level of fiscal and exchange-rate complexity that counteracts the very simplicity that drew them to the marketplace.

The partner as the local reseller

The partner acts as a local reseller in the customer’s procurement process, particularly in countries where the hyperscaler does not have a direct billing entity. In this model, the reseller manages the contractual relationship and invoices the end customer in the local currency, simplifying the transaction for the customer.

This arrangement, however, challenges the marketplace model, which was designed for direct transactions between the hyperscaler and the customer. When a local reseller becomes the billing intermediary, the standard model becomes complicated as it does not natively account for the elements the partner introduces:

  • Partner margin: The payment flow must accommodate the reseller’s commission.
  • Credit risk: The partner, not the hyperscaler, assumes the risk if the end customer defaults on payment.
  • Tax implications: The partner must manage the complexities of international invoicing and related withholding taxes (WHT).

This disconnect has been, in my analysis, a significant barrier to the global expansion of ISV sales through marketplaces in regions where the hyperscaler lacks a legal entity.

The distributor as an aggregator being replaced

Historically, distributors have been the major aggregators in the technology ecosystem, managing relationships and contracts with thousands of ISVs and leading the initial wave of software commercialization. In the new era of digital distribution, however, hyperscaler marketplaces have emerged as a formidable competitor.

In my opinion, the marketplace model strikes at the core of the software distribution business by offering a more efficient platform for transacting digital assets. This leaves distributors to compete primarily on their advantage in handling tangible technology assets.

Key trends: Two noteworthy cases in marketplaces

The strategic use of cloud consumption commitments: A key driver accelerating marketplace adoption is its integration with annual and multiyear cloud consumption contracts. These agreements, in which a customer commits to a minimum expenditure, can often be used to purchase ISV solutions from the marketplace. This creates what I see as a threefold benefit:

  1. The customer can leverage a pre-approved budget to acquire new technology, expediting procurement.
  2. The ISV can close sales faster by overcoming budget hurdles.
  3. The hyperscaler ensures the customer fulfills their consumption commitment, thereby increasing retention.

The integration of professional services is the missing piece: A traditional limitation of marketplaces was their focus solely on software transactions, excluding the professional services (e.g., consulting, migration, implementation) required to deploy them. This created a process gap, forcing customers to manage a separate services contract.

While I have seen the inclusion of some professional services packages directly in marketplaces, this is not universally available for all ISVs. As a result, professional services remain the key missing link needed to complete the sale and offer the customer a comprehensive solution (software + services) in a single transaction.

Key actions for the ecosystem

This new wave of marketplaces is expected to continue its accelerated growth and capture a significant share of the technology distribution market. Assuming this transition is inevitable, I offer the following strategic recommendations for the ecosystem’s key players.

ISVs: Adapt the commercial model to the channel

I believe ISVs must incorporate the costs associated with the partner channel into their marketplace pricing strategy. When a sale requires a local reseller, the ISV’s commercial model must account for a clear partner margin and the impact of withholding taxes.

I’ve seen that failure to do so will disincentivize the partner from promoting the solution, potentially blocking the sale or, more likely, leading them to offer a competing solution that protects their profitability.

Hyperscalers: Resolve global billing friction

To realize the full global growth potential of the marketplace, hyperscalers must overcome the obstacle of international billing. The solution lies in one of two paths:

  1. Direct investment: Establish local subsidiaries in strategic countries to enable local currency invoicing and ensure compliance with regional tax regulations.
  2. Channel enablement: Design a financially viable model that empowers and compensates local partners to manage billing, assume credit risk and handle administrative complexity in exchange for a clear margin.

Customers: Establish governance and clarity in the billing model

The very simplicity that makes the marketplace attractive is also its greatest risk. The ease of procurement can lead to uncontrolled spending or the acquisition of redundant solutions if clear governance policies are not implemented.

It is essential to establish centralized controls to manage who can purchase and what can be purchased, thereby preventing agility from turning into a budgetary liability.

Customers must also verify whether a transaction will be billed directly by the hyperscaler (potentially involving an international payment in a foreign currency) or through a local partner. This distinction is critical as it determines the vendor of record and has direct implications for managing local taxes and withholding.

Partners: Proactively protect your profitability

From my analysis, the primary risk for a partner is financial; specifically, a loss of profitability when a managed client purchases directly from the marketplace, as this eliminates the partner’s margin and creates tax uncertainty. Attempting to resolve this retroactively with a penalty clause is often contentious and difficult to enforce.

The solution must be preventative and contractual. A partner of record agreement should be established with the client at the outset of the relationship. This agreement must clearly stipulate that, in exchange for the value the partner provides (e.g., consulting, support, local management), they will be the designated channel for all marketplace transactions.

This protects the partner’s profitability, prevents losses from unmanaged transactions and aligns the interests of the client and the partner, ensuring the partner’s value is recognized and compensated with every purchase.

Distributors: Differentiate your value

Faced with diminishing relevance due to hyperscaler marketplaces, distributors must redefine their value proposition. Their strategy should focus on developing an ecosystem of value-added services on their own platform to encourage direct customer purchases and compete more effectively.

The final frontier of frictionless growth

The shift to marketplace distribution is an undeniable force that will reshape how enterprise technology is bought and sold globally. However, the true promise of this model (frictionless, one-stop procurement for the end customer) remains constrained by the very complexities it seeks to eliminate: international billing, channel compensation and tax adherence.

The transition from a domestic (US-centric), direct-sale mindset to a truly global, indirect channel model is the final frontier. Those who solve the “last mile” of global channel and billing complexity will be the ones to truly own the future of enterprise software distribution.

This article is published as part of the Foundry Expert Contributor Network.

Meet the MAESTRO: AI agents are ending multi-cloud vendor lock-in

For today’s CIO, the multi-cloud landscape, extending across hyperscalers, enterprise platforms, and AI-native cloud providers, is a non-negotiable strategy for business resilience and innovation velocity. Yet, this very flexibility can become a liability, often leading to fragmented automation, vendor sprawl, and costly data silos. The next frontier in cloud optimization isn’t better scripting—it’s Agentic AI systems.

These autonomous, goal-driven systems, deployed as coordinated multi-agent ecosystems, act as an enterprise’s “MAESTRO.” They don’t just follow instructions; they observe, plan, and execute tasks across cloud boundaries in real-time, effectively transforming vendor sprawl from a complexity tax into a strategic asset.

The architecture of cross-cloud agent interoperability

The core challenge in a multi-cloud environment is not the platforms themselves, but the lack of seamless interoperability between the automation layers running on them. The MAESTRO architecture (referencing the Cloud Security Alliance’s MAESTRO agentic AI threat modeling framework; MAESTRO stands for multi-agent environment, security, threat, risk and outcome) solves this by standardizing the language and deployment of these autonomous agents:

1. The open standards bridge: A2A protocol

For agents to coordinate effectively—to enable a FinOps agent on one cloud to negotiate compute resources with an AIOps agent on another cloud—they must speak a common, vendor-agnostic language. This is where the emerging Agent2Agent (A2A) protocol becomes crucial.

The A2A protocol is an open, universal standard that enables intelligent agents, regardless of vendor or underlying model, to discover, communicate, and collaborate. It provides the technical foundation for:

  • Dynamic capability discovery: Agents can publish their identity and skills, allowing others to discover and connect without hard-coded integrations.
  • Context sharing: Secure exchange of context, intent, and status, enabling long-running, multi-step workflows like cross-cloud workload migration or coordinated threat response.

To fully appreciate the power of the Maestro architecture, consider a critical cross-cloud workflow: strategic capacity arbitrage and failover. A FinOps agent on a general-purpose cloud is continuously monitoring an AI inference workload’s service level objectives (SLOs) and cost-per-inference. When a sudden regional outage is detected by an AIOps agent on the same cloud, the AIOps agent broadcasts a high-priority “capacity sourcing” intent using the A2A protocol. The Maestro orchestrates an immediate response, allowing the FinOps agent to automatically negotiate and provision the required GPU capacity with a specialized neocloud agent. Simultaneously, a security agent ensures the new data pipeline adheres to the required data sovereignty rules before the workload migration agent seamlessly shifts the portable Kubernetes container to the new, available capacity, all in under a minute to maintain continuous model performance. This complex, real-time coordination is impossible without the standardized language and interoperability provided by the A2A protocol and the Kubernetes-native deployment foundation.
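
To make that coordination concrete, the sketch below walks through a drastically simplified version of the capacity-sourcing flow in Python. It does not use a real A2A SDK; the intent format, agent classes, prices, and negotiation logic are illustrative assumptions standing in for protocol-level discovery and context sharing.

```python
# Illustrative sketch of A2A-style coordination; message shape and agents are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Intent:
    kind: str                      # e.g. "capacity_sourcing"
    payload: dict = field(default_factory=dict)

class NeocloudAgent:
    """Stands in for a specialized GPU provider's agent."""
    def quote_gpu_capacity(self, gpus: int) -> dict:
        return {"provider": "neocloud-a", "gpus": gpus, "usd_per_hour": 2.10}

class SecurityAgent:
    def approves_region(self, region: str, allowed: set[str]) -> bool:
        return region in allowed   # data-sovereignty check before migration

class FinOpsAgent:
    def __init__(self, neocloud: NeocloudAgent, security: SecurityAgent):
        self.neocloud, self.security = neocloud, security

    def handle(self, intent: Intent) -> dict:
        if intent.kind != "capacity_sourcing":
            return {"status": "ignored"}
        offer = self.neocloud.quote_gpu_capacity(intent.payload["gpus"])
        region_ok = self.security.approves_region(
            intent.payload["region"], allowed={"eu-west", "eu-central"}
        )
        if not region_ok:
            return {"status": "rejected", "reason": "data sovereignty"}
        # In a real system, the workload-migration agent would now shift the
        # portable Kubernetes container onto the provisioned capacity.
        return {"status": "provisioned", **offer}

# An AIOps agent detecting an outage would broadcast this intent:
intent = Intent("capacity_sourcing", {"gpus": 8, "region": "eu-west"})
print(FinOpsAgent(NeocloudAgent(), SecurityAgent()).handle(intent))
```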

2. The deployment foundation: Kubernetes-native frameworks

To ensure agents can be deployed, scaled, and managed consistently across clouds, we must leverage a Kubernetes-native approach. Kubernetes is already the de facto orchestration layer for enterprise cloud-native applications. New Kubernetes-native agent frameworks, like kagent, are emerging to extend this capability directly to multi-agent systems.

This approach allows the Maestro to:

  • Zero-downtime agent portability: Package agents as standard containers, making it trivial to move a high-value security agent from one cloud to another for resilience or cost arbitrage.
  • Observability and auditability: Leverage Kubernetes’ built-in tools for monitoring, logging, and security to gain visibility into the agent’s actions and decision-making process, a non-negotiable requirement for autonomous systems.

Strategic value: Resilience and zero lock-in

The Maestro architecture fundamentally shifts the economics and risk profile of a multi-cloud strategy.

  • Reduces vendor lock-in: By enforcing open standards like A2A, the enterprise retains control over its core AI logic and data models. The Maestro’s FinOps agents are now capable of dynamic cost and performance arbitrage across a more diverse compute landscape that includes specialized providers. Neoclouds are purpose-built for AI, offering GPU-as-a-Service (GPUaaS) and unique performance advantages for training and inference. By packaging AI workloads as portable Kubernetes containers, the Maestro can seamlessly shift them to the most performant or cost-effective platform—whether it’s an enterprise cloud for regulated workloads, or a specialized AI-native cloud for massive, high-throughput training. As BCG emphasizes, managing the evolving dynamics of digital platform lock-in requires disciplined sourcing and modular, loosely coupled architectures. The agent architecture makes it dramatically easier to port or coordinate high-value AI services, providing true strategic flexibility.
  • Enhances business resilience (AIOps): AIOps agents, orchestrated by the Maestro, can perform dynamic failover, automatically redirecting traffic or data pipelines between regions or providers during an outage. Furthermore, the Maestro can orchestrate strategic capacity sourcing, instantly rerouting critical AI inference workloads to available, high-performance GPU capacity offered by specialized neoclouds to ensure continuous model performance during a regional outage on a general-purpose cloud. They can also ensure compliance by dynamically placing data or compute in the “greenest” (most energy-efficient) cloud or the required sovereign region to meet data sovereignty rules.

The future trajectory

The shift to the Maestro architecture represents more than just a technological upgrade; it signals the true democratization of the multi-cloud ecosystem. By leveraging open standards like A2A, the enterprise is moving away from monolithic vendor platforms and toward a vibrant, decentralized marketplace of agentic services. In this future state, enterprises will gain access to specialized, hyper-optimized capabilities from a wide array of providers, treating every compute, data, or AI service as a modular, plug-and-play component. This level of strategic flexibility fundamentally alters the competitive landscape, transforming the IT organization from a consumer of platform-centric services to a strategic orchestrator of autonomous, best-of-breed intelligence. This approach delivers the “strategic freedom from vendor lock-in” necessary to continuously adapt to market changes and accelerate innovation velocity, effectively turning multi-cloud complexity into a decisive competitive advantage.

Governance: Managing the autonomous agent sprawl

The power of autonomous agents comes with the risk of “misaligned autonomy”—agents doing what they were optimized to do, but without the constraints and guardrails the enterprise forgot to encode. Success requires a robust governance framework to manage the burgeoning population of agents.

  • Human-in-the-loop (HITL) for critical decisions: While agents execute most tasks autonomously, the architecture must enforce clear human intervention points for high-risk decisions, such as a major cost optimization that impacts a business-critical service or an automated incident response that involves deleting a core data store. Gartner emphasizes the importance of transparency, clear audit trails, and the ability for humans to intervene or override agent behavior. In fact, Gartner predicts that by 2028, loss of control—where AI agents pursue misaligned goals—will be the top concern for 40% of Fortune 1000 companies.
  • The 4 pillars of agent governance: A strong framework must cover the full agent lifecycle:
    1. Lifecycle management: Enforcing separation of duties for development, staging, and production.
    2. Risk management: Implementing behavioral guardrails and compliance checks.
    3. Security: Applying least privilege access to tools and APIs.
    4. Observability: Auditing every action to maintain a complete chain of reasoning for compliance and debugging.

By embracing this Maestro architecture, CIOs can transform their multi-cloud complexity into a competitive advantage, achieving unprecedented levels of resilience, cost optimization, and, most importantly, strategic freedom from vendor lock-in.

This article is published as part of the Foundry Expert Contributor Network.

Why cyber resilience must be strategic, not a side project

As one of the world’s foremost voices on cybersecurity and crisis leadership, Sarah Armstrong-Smith has spent her career at the intersection of technology, resilience and human decision-making. Formerly chief security advisor at Microsoft Europe, and now a member of the UK Government Cyber Advisory Board, she is widely recognized for her ability to translate complex technical challenges into actionable business strategy.

In this exclusive interview with The Cyber Security Speakers Agency, Sarah explores how today’s CIOs must evolve from technology enablers to resilience architects — embedding cyber preparedness into the core of business strategy. Drawing on decades of experience leading crisis management and resilience functions at global organizations, she offers a masterclass in how technology leaders can balance innovation with security, manage disruption with clarity and build cultures of trust in an era defined by volatility and digital interdependence.

For business and technology leaders navigating the next wave of transformation, Sarah’s insights offer a rare blend of strategic depth and practical foresight — a roadmap for leadership in the age of perpetual disruption.

1. As digital transformation accelerates, how can CIOs embed cyber resilience into the very fabric of business strategy rather than treating it as a separate function?

Cyber resilience should be recognised as a strategic enabler, not merely a technical safeguard. CIOs must champion a holistic approach where resilience is woven into every stage of digital transformation — from initial design through to deployment and ongoing operations.

This requires close collaboration with business leaders to ensure risk management and security controls are embedded from the outset, rather than being an afterthought. By aligning cyber resilience objectives with business outcomes, CIOs can work alongside CISOs to help their organizations anticipate threats, adapt rapidly to disruptions and maintain stakeholder trust.

Embedding resilience also demands a shift in organizational mindset. CIOs should help to foster a culture where every employee understands their role in protecting digital assets and maintaining operational service.

This involves education and cross-functional exercises that simulate real-world incidents, aligned to current threats. By making resilience a shared responsibility and a key performance metric, CIOs can ensure their organizations are not only prepared to withstand a range of threats but are also positioned to recover quickly and thrive in the face of adversity.

2. CIOs and CISOs often face tension between innovation and security. What’s your advice for maintaining that balance while still driving progress?

Balancing innovation and security is a constant challenge that requires CIOs to act as both risk managers and business catalysts. The key is to embed security and resilience considerations early into the innovation lifecycle, ensuring new technologies and processes are assessed for risk early and often.

CIOs should promote agile governance frameworks that allow for rapid experimentation while maintaining clear guardrails around information protection, compliance and operational integrity. By involving security teams from the outset, organizations can identify potential vulnerabilities before they become systemic issues.

At the same time, CISOs must avoid creating a culture of fear that stifles creativity. Instead, they should encourage responsible risk-taking by providing teams with the tools, guidance and autonomy to innovate securely.

This includes leveraging automation, zero-trust architectures and continuous monitoring to reduce vulnerabilities and enable faster, safer deployment of solutions. Ultimately, the goal is to create an environment where innovation and security are mutually reinforcing, driving competitive advantage and organizational resilience.

3. You’ve led crisis management and resilience teams across major organizations. What leadership lessons can CIOs take from managing incidents under pressure?

Effective crisis leadership is built on preparation, decisiveness and transparent communication. CIOs must ensure their teams are well-versed in incident response and empowered to act swiftly when an incident occurs.

This means investing in due diligence, having clear escalation paths and robust playbooks that outline the critical path, and designated roles and responsibilities. During a crisis, leaders must remain calm, protect critical assets and make informed decisions based on real-time intelligence.

Equally important is the ability to communicate clearly with both internal and external stakeholders. CIOs and CISOs should work in unison to provide timely updates to the board, regulators and customers, balancing transparency with the need to protect vulnerable people and sensitive data.

Demonstrating accountability and empathy during a crisis can help preserve trust and minimise reputational damage. After the incident, leaders should be thoroughly committed to post-mortems to identify ‘no blame’ lessons learned and drive continuous improvement, ensuring the organization emerges stronger and more resilient.

4. With AI transforming both security threats and defences, what role should CIOs play in governing ethical and responsible AI adoption?

CIOs are uniquely positioned to guide the ethical deployment of AI and emerging tech, balancing innovation with risk management and societal responsibility. They should contribute to governance frameworks that address data privacy, algorithmic bias and transparency, ensuring AI systems are designed and operated in accordance with core organizational policies and regulatory requirements. This involves collaborating with legal, compliance and HR teams to develop policies that safeguard against unintended consequences and consequential impact.

Additionally, CIOs should champion ongoing education and awareness around AI ethics, both within IT and across the wider organization. By fostering a culture of accountability and continuous learning, CIOs can help teams identify and mitigate risks associated with AI through the implementation of rigorous engineering principles.

Regular technical and security assessments and stakeholder engagement are essential to maintaining trust and ensuring AI adoption delivers positive outcomes for those most impacted by it.

5. In your experience, what distinguishes organizations that recover stronger from a cyber incident from those that struggle to regain trust?

Organizations that recover stronger from cyber incidents typically demonstrate resilience through proactive planning, transparent communication and a commitment to continuous improvement. They invest in proactive and reactive capabilities and a positive culture driven by empathetic leadership, empowerment and accountability.

When an incident occurs, these organizations respond swiftly, contain the threat and communicate transparently with stakeholders about the actions being taken to remediate and reduce future occurrences.

Conversely, organizations that struggle often lack preparedness and fail to engage stakeholders effectively. Delayed or inconsistent communication can erode trust and amplify reputational damage.

The most resilient organizations treat incidents and near-misses as learning opportunities, conducting thorough post-incident reviews and implementing changes to strengthen their defences. By prioritising transparency, accountability and a culture of resilience, CIOs can help their organizations not only recover but also enhance their reputation and stakeholder confidence.

6. How can CIOs cultivate a security-first culture across non-technical teams — especially in remote or hybrid work environments?

Cultivating a security-first culture requires CIOs and CISOs to make cybersecurity relevant and accessible to all employees, regardless of technical expertise. This starts with tailored training programmes that address the specific risks faced by different stakeholders, rather than a one-size-fits-all approach.

This should leverage engaging formats – like interactive workshops, gamified learning and real-world simulations – to reinforce positive behaviors and outcomes.

Beyond training, CIOs and CISOs must embed security into everyday workflows by providing user-friendly tools and clear guidance. Regular communication, visible leadership and recognition of positive security behaviors can help sustain momentum.

In hybrid environments, CIOs should ensure policies are dynamic and adaptive to evolving threats, enabling employees to work securely without sacrificing productivity. By fostering a sense of shared responsibility and empowering non-technical teams, CIOs can build a resilient culture that extends beyond the IT department.

7. Boards are increasingly holding CIOs accountable for resilience and risk. How can technology leaders communicate complex security risks in business language?

To effectively engage boards, CIOs must translate technical issues into enterprise risks, framing cybersecurity and resilience as a strategic imperative rather than a technical challenge. This involves articulating how exposure to specific threats could affect safety, revenue, reputation, regulatory compliance and operational services. CIOs and CISOs should use clear, non-technical language, supported by real-world scenarios, to illustrate the potential consequences of ineffective controls and the value of resilience investments.

Regular, structured and diligent reporting — such as dashboards, heat maps and risk registers — can help boards visualise enterprise risk exposure and track progress over time. CIOs should foster open dialogue, encouraging board members to ask questions and participate in scenario planning.

By aligning security discussions with business objectives and demonstrating the ROI of resilience initiatives, technology and security leaders can build trust and secure the support needed to drive meaningful change.

8. What emerging risks or trends should CIOs be preparing for in 2025 and beyond?

CIOs must stay ahead of a rapidly evolving threat landscape, characterised by the proliferation of AI-enabled attacks, supply chain vulnerabilities and targeted campaigns. The rise of quantum computing poses long-term risks to traditional encryption methods, necessitating understanding and early exploration of quantum-safe solutions.

Additionally, regulatory scrutiny around data sovereignty and ethical AI is intensifying, requiring codes of conduct and governance strategies.

Beyond technology, CIOs should anticipate continuous shifts in workforce dynamics, such as the increase in human-related threats. Societal risks, geopolitical instability and the convergence of physical and cyber threats are also shaping the resilience agenda. By maintaining a forward-looking perspective and investing in adaptive capabilities, leaders can position their organizations to navigate uncertainty and capitalize on emerging opportunities.

9. How important is collaboration between CIOs and other business leaders, such as CFOs and CHROs, in building organizational resilience?

Collaboration across the entire C-suite is essential for building holistic resilience that encompasses people, technology, finance and processes. CIOs must work closely with CFOs to align resilience investments with business priorities, and with CROs to ensure risk management strategies are financially sustainable. Engaging CHROs is equally important, as workforce readiness and culture play a critical role in responding to and recovering from disruptions.

Joint initiatives such as cross-functional crisis simulations, integrated risk assessments and shared accountability frameworks can help break down silos and foster a unified approach to resilience.

By leveraging diverse perspectives and expertise, CIOs can drive more effective decision-making and ensure resilience is embedded throughout the organization. Ultimately, strong collaboration enables organizations to reduce assumptions, anticipate challenges, respond cohesively and emerge stronger in times of adversity.

10. Finally, what personal qualities do you believe future-ready CIOs must develop to lead effectively through constant disruption?

Future-ready CIOs must embody adaptability, strategic vision and emotional intelligence. The pace of technological change and the frequency of disruptive events demand leaders who can pivot quickly, embrace uncertainty and inspire confidence in their teams. CIOs should cultivate an inquisitive mindset, continuously seeking new knowledge and challenging conventional wisdom to stay ahead of emerging trends.

Equally important are communication and collaboration skills. CIOs must be able to articulate complex ideas clearly, build consensus across diverse stakeholders and foster a culture of trust and accountability.

Resilience, empathy and a commitment to ethical leadership will enable CIOs to navigate challenges with integrity and guide their organizations through periods of uncertainty and transformation. By developing these qualities, CIOs can lead with purpose and drive sustainable success in an ever-changing landscape.

This article is published as part of the Foundry Expert Contributor Network.

An engineer's introduction to implementing LLM agents: from framework selection to building a prototype

Grasp the overall architecture

The important first step is being able to picture the basic architecture of an LLM agent system. In most cases, an LLM inference API sits at the core, surrounded by prompt templates, a set of tools, a memory store, a vector database for RAG, and logging and monitoring. The agent itself is implemented as an orchestration layer that ties these pieces together and manages the observe-think-act loop.

A request from the client first passes through the application server to the agent. The agent builds a prompt from the current context and memory and calls the LLM API. Any tool calls contained in the LLM's output are parsed, and the corresponding tool functions or external APIs are executed. Their results come back to the agent, are folded into the prompt for the next step, and the loop continues.
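
To make the loop concrete, here is a minimal sketch in Python. The `call_llm` client, the tool registry, and the message format are illustrative assumptions rather than any particular framework's API.

```python
# Minimal observe-think-act loop; call_llm and the tools are illustrative stand-ins.
import json

def web_search(query: str) -> str:
    return f"(search results for: {query})"      # placeholder tool

TOOLS = {"web_search": web_search}

def call_llm(messages: list[dict]) -> dict:
    """Stand-in for the LLM inference API. Assume it returns either a final
    answer or a tool call shaped like {"tool": name, "arguments": {...}}."""
    raise NotImplementedError

def run_agent(user_input: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        output = call_llm(messages)              # think
        if "tool" in output:                     # the model asked to act
            result = TOOLS[output["tool"]](**output["arguments"])
            messages.append({"role": "tool",     # observe: feed the result back
                             "content": json.dumps({"result": result})})
            continue
        return output["content"]                 # final answer
    return "Stopped: step limit reached."
```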

When RAG is added, the agent calls a retrieval tool as needed and fetches documents relevant to the user's question or task from the vector database. The retrieved text is placed into the LLM's context and supports fact-based answers and decisions. The memory store holds long-term, per-user information and intermediate task state, and is reused in later interactions.
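
A sketch of the retrieval side, assuming a hypothetical `embed` function and a plain in-memory document list standing in for a real embedding model and vector database:

```python
# Toy retrieval tool: embed() and the document list stand in for a real
# embedding model and vector database.
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError  # in practice, an embeddings API call

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, documents: list[str]) -> str:
    context = "\n\n".join(retrieve(question, documents))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```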

Keeping this structure in mind makes it easier to decide which parts to build first and which to keep swappable. For example, you can take a staged approach: start with a plain RDBMS as the memory store, then add a dedicated vector database or a caching layer later.
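
One way to keep the memory store swappable is to code against a small interface from the start. The sketch below is an assumption for illustration, using SQLite as the initial backing store:

```python
# A thin memory-store interface so the backing store (here SQLite) can later be
# swapped for a vector database or cache without touching the agent code.
import sqlite3
from typing import Protocol

class MemoryStore(Protocol):
    def save(self, user_id: str, key: str, value: str) -> None: ...
    def load(self, user_id: str, key: str) -> str | None: ...

class SqliteMemoryStore:
    def __init__(self, path: str = "memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
        )

    def save(self, user_id: str, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO memory VALUES (?, ?, ?)", (user_id, key, value)
        )
        self.conn.commit()

    def load(self, user_id: str, key: str) -> str | None:
        row = self.conn.execute(
            "SELECT value FROM memory WHERE user_id = ? AND key = ?", (user_id, key)
        ).fetchone()
        return row[0] if row else None
```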

Framework selection and a small prototype

For implementation, you can either use an agent framework or workflow engine provided by a vendor or the community, or write a thin orchestration layer yourself. Whichever you choose, the key to success is not trying to build the perfect foundation from day one.

When choosing a framework, check which LLM providers it supports, how easy tool integration is, how it manages state, and what logging and monitoring it offers. Readability and extensibility of the code also matter. The moment will inevitably come when you want fine-grained control over the agent's behavior, so in the long run it is safer to pick something whose internals you can understand than something that looks like a black box.

For a first prototype, build an agent focused on a single, clear use case: for example, a research agent that combines web search with internal RAG to draft reports, or a help desk agent that answers employee questions by consulting the internal FAQ. At this stage, keep authentication, complex access control, and scaling strategy to a minimum; the goal is simply to give the team a shared feel for what the agent is like to work with.

Within the prototype, limiting yourself to two or three tools and keeping memory to something simple and session-scoped makes implementation easier. In exchange, log carefully and put in place a way to see which prompts produced which outputs and whether tool calls succeeded or failed; this will pay off during later improvements.
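
For example, wrapping every tool call in a small logging decorator makes it easy to see afterwards which calls failed and with what arguments. The sketch below uses Python's standard logging module and a placeholder tool:

```python
# Log every tool call's arguments, outcome, and latency for later analysis.
import functools, logging, time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

def logged_tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            log.info("tool=%s args=%s kwargs=%s ok=True elapsed=%.2fs",
                     fn.__name__, args, kwargs, time.perf_counter() - start)
            return result
        except Exception:
            log.exception("tool=%s args=%s kwargs=%s ok=False elapsed=%.2fs",
                          fn.__name__, args, kwargs, time.perf_counter() - start)
            raise
    return wrapper

@logged_tool
def search_faq(query: str) -> str:
    return f"(FAQ results for: {query})"   # placeholder tool body
```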

Development process, testing, and evaluation

What tends to trip engineers up in LLM agent development is how hard testing is. The same input often does not produce the same response, so conventional unit tests and snapshot tests cannot be applied as they are. What matters instead is scenario-based evaluation, combining automated checks with human review.

Concretely, prepare several typical task scenarios and define the expected behavior for each, for example: "for this inquiry, quote the relevant section of the internal regulations and present three options." Run the agent against these scenarios regularly, and judge pass or fail with LLM-based automated evaluation or rule-based checkers. In addition, have humans review the most important scenarios to confirm subjective quality.
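
A minimal sketch of such a scenario check is shown below. The scenarios, pass criteria, and the `run_agent` entry point are illustrative assumptions; an LLM-based judge could sit alongside the rule-based checks.

```python
# Scenario-based regression check: run each scenario and apply simple rule-based
# pass criteria. run_agent() is the agent entry point defined elsewhere.
SCENARIOS = [
    {
        "name": "leave-policy inquiry",
        "input": "How many days of paid leave do new employees get?",
        "must_contain": ["paid leave"],   # e.g. must quote the relevant rule
        "min_options": 0,
    },
    {
        "name": "procurement request",
        "input": "I need a new laptop. What are my options?",
        "must_contain": ["option"],
        "min_options": 3,                 # "present three options"
    },
]

def evaluate(run_agent) -> dict:
    results = {}
    for scenario in SCENARIOS:
        answer = run_agent(scenario["input"]).lower()
        passed = all(term in answer for term in scenario["must_contain"])
        passed = passed and answer.count("option") >= scenario["min_options"]
        results[scenario["name"]] = passed
    return results

if __name__ == "__main__":
    fake_agent = lambda q: "Here are three options: option A, option B, option C."
    print(evaluate(fake_agent))   # quick smoke test with a canned answer
```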

As for process, make it easy to change prompts and tool configurations frequently, while building an evaluation job into CI so you can see the blast radius of each change. Every time the agent's configuration changes, run the scenario evaluation and visualize how the key metrics moved. This lets you catch early the situation where improving one use case quietly degraded another.

Finally, in the operations phase, user feedback and log analysis become the main sources of information. Provide an interface that lets users easily report whether an answer was helpful and what was wrong, and analyze that feedback linked to the logs to set improvement priorities. Engineers end up continuously improving the system as a whole: not just tuning models and prompts, but adding and removing tools, revisiting the memory strategy, and strengthening error handling.

Implementing an LLM agent is not just writing a wrapper around API calls; it is an all-round discipline where inference systems, workflows, data infrastructure, and UX intersect. But if you start with a small prototype and expand gradually while keeping the skeleton of the architecture in mind, you can grow an agent that stands up to production use at a realistic cost.

Risks and governance for building safe LLM agents: hallucinations, security, and legal liability

The overall picture of LLM-agent-specific risks

The first thing to understand is that the risks of LLM agents are not a single technical problem; they span multiple layers. One is the hallucination problem inherent in LLMs themselves. The tendency to state plausible but wrong information with complete confidence is well known, but when an agent has access to external tools, such errors can turn into concrete actions. Trying to call an API endpoint that does not exist, or extracting data under the wrong conditions, has a direct impact on business processes.

Next come security and privacy risks. Agents often access not only user input but also a variety of internal systems and documents, handling confidential information along the way. If that information is sent outside the organization via the model provider or logging systems, it creates an information-management risk. The possibility of attackers abusing the agent also cannot be ignored; for example, a prompt injection attack could rewrite the agent's policy and cause unintended data transmission or operations.

Then there is the question of legal liability. If content the agent generates or actions it executes lead to a violation of law or contract, who bears responsibility: the model provider, the service provider that embedded the agent, or the end user? In many areas there is still no clear answer, which makes governance design harder.

How to think about guardrail design and permission management

Addressing these risks requires technical and operational guardrails designed in multiple layers. At their center is permission management. As a rule, grant the agent only the minimum permissions it needs; starting read-only is the safe approach. In a CRM integration, for example, first restrict the agent to viewing customer records, and only after confirming it runs without problems for a while, gradually grant limited permission to update records.
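
As an illustration, permissions of this kind can be enforced at the tool layer so the agent never even sees tools beyond its current level. The CRM functions and tool names below are hypothetical:

```python
# Enforce least privilege at the tool layer: the agent only receives the tools
# its current permission level allows. CRM functions here are placeholders.
READ_ONLY_TOOLS = {"crm_get_customer"}
WRITE_TOOLS = {"crm_update_customer"}

def crm_get_customer(customer_id: str) -> dict:
    return {"id": customer_id, "name": "Example Corp"}   # placeholder lookup

def crm_update_customer(customer_id: str, fields: dict) -> None:
    raise RuntimeError("not wired up in this sketch")

ALL_TOOLS = {
    "crm_get_customer": crm_get_customer,
    "crm_update_customer": crm_update_customer,
}

def allowed_tools(permission_level: str) -> dict:
    allowed = set(READ_ONLY_TOOLS)
    if permission_level == "read_write":
        allowed |= WRITE_TOOLS
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}

# Phase 1: construct the agent with allowed_tools("read_only") only;
# write tools are granted later, after a trouble-free review period.
```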

For high-risk actions, it is important to build workflows that always include human approval. For large payment instructions, changes to contract terms, or sending important documents to outside parties, the agent may draft or propose, but a human should perform the final execution. Explicitly building this human approval step into the agent's flow limits the impact of any misbehavior.
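
A sketch of making that approval step explicit in the flow; the action names, threshold, and approval queue are illustrative assumptions:

```python
# High-risk actions are never executed directly: the agent drafts them, a human
# approves, and only then does execution proceed. Thresholds are illustrative.
from dataclasses import dataclass

HIGH_RISK_ACTIONS = {"send_payment", "amend_contract", "send_external_document"}

@dataclass
class ProposedAction:
    name: str
    params: dict
    approved: bool = False

pending_approvals: list[ProposedAction] = []

def execute(name: str, params: dict) -> str:
    return f"executed {name} with {params}"     # placeholder executor

def request_action(name: str, params: dict) -> str:
    if name in HIGH_RISK_ACTIONS or params.get("amount", 0) > 10_000:
        pending_approvals.append(ProposedAction(name, params))
        return "queued for human approval"
    return execute(name, params)

def approve_and_run(action: ProposedAction, approver: str) -> str:
    action.approved = True                      # an audit trail would record the approver
    return execute(action.name, action.params)

print(request_action("send_payment", {"amount": 250_000, "to": "ACME"}))
```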

As countermeasures against prompt injection and data leakage, input and output filtering is also essential: do not feed user input or text fetched from external sites directly into the system prompt; check whether the output contains information that must not leave the organization; and stop processing and raise an alert when specific keywords or patterns are detected. Much of this can be implemented in the application layer outside the model, and it forms an important part of the guardrails.
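
A minimal sketch of such application-layer filtering; the patterns are illustrative only and far from exhaustive:

```python
# Simple application-layer guardrails: screen inbound text for injection-style
# phrases and outbound text for data that must not leave the organization.
import re

INJECTION_PATTERNS = [
    r"ignore (all|any) previous instructions",
    r"reveal (your )?system prompt",
]
SENSITIVE_PATTERNS = [
    r"\b\d{16}\b",                 # 16-digit numbers (e.g. card-like strings)
    r"(?i)internal use only",
]

def screen_input(text: str) -> str:
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            raise ValueError(f"blocked input: matched {pattern!r}")
    return text

def screen_output(text: str) -> str:
    for pattern in SENSITIVE_PATTERNS:
        if re.search(pattern, text):
            # in production: stop processing, raise an alert, log the near miss
            raise ValueError(f"blocked output: matched {pattern!r}")
    return text
```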

Governance through monitoring and clear accountability

Even with guardrails in place, you cannot simply deploy an agent and leave it alone. Although the agent is built on top of a trained model, its behavior changes with context and environment, so continuous monitoring and improvement are needed after launch.

Things to monitor include the ratio of successful to failed tasks, how often users have to correct the agent, patterns of errors and exceptions, and behavior that looks suspicious from a security standpoint. Especially important is detecting, early, the near misses that stop just short of a serious incident. A log showing that the agent attempted to access a prohibited external domain but was blocked by a guardrail, for example, is a valuable signal pointing to where the design can be improved.

Clarifying responsibility is also part of governance. Internally, the organization needs to designate an owner with final responsibility for the agent's design and operation, and define change management and incident response processes. Externally, the terms of service and privacy policy should explain clearly what the agent can and cannot do and what checks are expected of users.

A safe LLM agent is not an agent with zero risk; it is an agent whose risks are visible and kept under control in operation. Since hallucinations and bad judgments cannot be eliminated entirely, the governance framework built on that premise (deciding where to stop, where to hand off to a human, and how to detect problems and turn them into learning when they occur) becomes just as important as the design itself.

CIOs shift from ‘cloud-first’ to ‘cloud-smart’

Common wisdom has long held that a cloud-first approach will gain CIOs benefits such as agility, scalability, and cost-efficiency for their applications and workloads. While cloud remains most IT leaders’ preferred infrastructure platform, many are rethinking their cloud strategies, pivoting from cloud-first to “cloud-smart” by choosing the best approach for specific workloads rather than just moving everything off-premises and prioritizing cloud over other considerations for new initiatives.

Cloud cost optimization is one factor motivating this rethink, with organizations struggling to control escalating cloud expenses amid rapid growth. An estimated 21% of enterprise cloud infrastructure spend, equivalent to $44.5 billion in 2025, is wasted on underutilized resources — with 31% of CIOs wasting half of their cloud spend, according to a recent survey from VMware.

The full rush to the cloud is over, says Ryan McElroy, vice president of technology at tech consultancy Hylaine. Cloud-smart organizations have a well-defined and proven process for determining which workloads are best suited for the cloud.

For example, “something that must be delivered very quickly and support massive scale in the future should be built in the cloud,” McElroy says. “Solutions with legacy technology that must be hosted on virtual machines or have very predictable workloads that will last for years should be deployed to well-managed data centers.”

The cloud-smart trend is being influenced by better on-prem technology, longer hardware cycles, ultra-high margins with hyperscale cloud providers, and the typical hype cycles of the industry, according to McElroy. All favor hybrid infrastructure approaches.

However, “AI has added another major wrinkle with siloed data and compute,” he adds. “Many organizations aren’t interested in or able to build high-performance GPU datacenters, and need to use the cloud. But if they’ve been conservative or cost-averse, their data may be in the on-prem component of their hybrid infrastructure.”

These variables have led to complexity or unanticipated costs, either through migration or data egress charges, McElroy says.

He estimates that “only 10% of the industry has openly admitted they’re moving” toward being cloud-smart. While that number may seem low, McElroy says it is significant.

“There are a lot of prerequisites to moderate on your cloud stance,” he explains. “First, you generally have to be a new CIO or CTO. Anyone who moved to the cloud is going to have a lot of trouble backtracking.”

Further, organizations need to have retained and upskilled the talent who manage the datacenter they own or at the co-location facility. They must also have infrastructure needs that outweigh the benefits the cloud provides in terms of raw agility and fractional compute, McElroy says.

Selecting and reassessing the right hyper-scaler

Procter & Gamble embraced a cloud-first strategy when it began migrating workloads about eight years ago, says Paola Lucetti, CTO and senior vice president. At that time, the mandate was that all new applications would be deployed in the public cloud, and existing workloads would migrate from traditional hosting environments to hyperscalers, Lucetti says.

“This approach allowed us to modernize quickly, reduce dependency on legacy infrastructure, and tap into the scalability and resilience that cloud platforms offer,” she says.

Today, nearly all P&G’s workloads run on cloud. “We choose to keep selected workloads outside of the public cloud because of latency or performance needs that we regularly reassess,” Lucetti says. “This foundation gave us speed and flexibility during a critical phase of digital transformation.”

As the company’s cloud ecosystem has matured, so have its business priorities. “Cost optimization, sustainability, and agility became front and center,” she says. “Cloud-smart for P&G means selecting and regularly reassessing the right hyperscaler for the right workload, embedding FinOps practices for transparency and governance, and leveraging hybrid architectures to support specific use cases.”

This approach empowers developers through automation, AI, and agentic capabilities to drive value faster, Lucetti says. “This approach isn’t just technical — it’s cultural. It reflects a mindset of strategic flexibility, where technology decisions align with business outcomes.”

AI is reshaping cloud decisions

AI represents a huge potential spend requirement and raises the stakes for infrastructure strategy, says McElroy.

“Renting servers packed with expensive Nvidia GPUs all day every day for three years will be financially ruinous compared to buying them outright,” he says, “but the flexibility to use next year’s models seamlessly may represent a strategic advantage.”

Cisco, for one, has become far more deliberate about what truly belongs in the public cloud, says Nik Kale, principal engineer and product architect. Cost is one factor, but the main driver is AI data governance.

“Being cloud-smart isn’t about repatriation — it’s about aligning AI’s data gravity with the right control plane,” he says.

IT has parsed out what should be in a private cloud and what goes into a public cloud. “Training and fine-tuning large models requires strong control over customer and telemetry data,” Kale explains. “So we increasingly favor hybrid architectures where inference and data processing happen within secure, private environments, while orchestration and non-sensitive services stay in the public cloud.”

Cisco’s cloud-smart strategy starts with data classification and workload profiling. Anything with customer-identifiable information, diagnostic traces, or model feedback loops is processed within regionally compliant private clouds, he says.

Then there are “stateless services, content delivery, and telemetry aggregation that benefit from public-cloud elasticity for scale and efficiency,” Kale says.

Cisco’s approach also involves “packaging previously cloud-resident capabilities for secure deployment within customer environments — offering the same AI-driven insights and automation locally, without exposing data to shared infrastructure,” he says. “This gives customers the flexibility to adopt AI capabilities without compromising on data residency, privacy, or cost.”

These practices have improved Cisco’s compliance posture, reduced inference latency, and yielded measurable double-digit reductions in cloud spend, Kale says.

One area where AI has fundamentally changed their approach to cloud is in large-scale threat detection. “Early versions of our models ran entirely in the public cloud, but once we began fine-tuning on customer-specific telemetry, the sensitivity and volume of that data made cloud egress both costly and difficult to govern,” he says. “Moving the training and feedback loops into regional private clouds gave us full auditability and significantly reduced transfer costs, while keeping inference hybrid so customers in regulated regions received sub-second response times.”

IT saw a similar issue with its generative AI support assistant. “Initially, case transcripts and diagnostic logs were processed in public cloud LLMs,” Kale says. “As customers in finance and healthcare raised legitimate concerns about data leaving their environments, we re-architected the capability to run directly within their [virtual private clouds] or on-prem clusters.”

The orchestration layer remains in the public cloud, but the sensitive data never leaves their control plane, Kale adds.

AI has also reshaped how telemetry analytics is handled across Cisco’s CX portfolio. IT collects petabyte-scale operational data from more than 140,000 customer environments.

“When we transitioned to real-time predictive AI, the cost and latency of shipping raw time-series data to the cloud became a bottleneck,” Kale says. “By shifting feature extraction and anomaly detection to the customer’s local collector and sending only high-level risk signals to the cloud, we reduced egress dramatically while improving model fidelity.”

In all instances, “AI made the architectural trade-offs clear: Specific workloads benefit from public-cloud elasticity, but the most sensitive, data-intensive, and latency-critical AI functions need to run closer to the data,” Kale says. “For us, cloud-smart has become less about repatriation and more about aligning data gravity, privacy boundaries, and inference economics with the right control plane.”

A less expensive execution path

Like P&G, World Insurance Associates believes cloud-smart translates to implementing a FinOps framework. CIO Michael Corrigan says that means having an optimized, consistent build for virtual machines based on the business use case, and understanding how much storage and compute is required.

Those are the main drivers to determine costs, “so we have a consistent set of standards of what will size our different environments based off of the use case,” Corrigan says. This gives World Insurance what Corrigan says is an automated architecture.

“Then we optimize the build to make sure we have things turned on like elasticity. So when services aren’t used typically overnight, they shut down and they reduce the amount of storage to turn off the amount of compute” so the company isn’t paying for it, he says. “It starts with the foundation of optimization or standards.”

World Insurance works with its cloud providers on different levels of commitment. With Microsoft, for example, the insurance company has the option to use virtual machines, or what Corrigan says is a “reserved instance.” By telling the provider how many machines they plan to consume or how much they intend to spend, he can try to negotiate discounts.

“That’s where the FinOps framework has to really be in place 
 because obviously, you don’t want to commit to a level of spend that you wouldn’t consume otherwise,” Corrigan says. “It’s a good way for the consumer or us as the organization utilizing those cloud services, to get really significant discounts upfront.”

World Insurance is using AI for automation and alerts. AI tools are typically charged on a compute processing model, “and what you can do is design your query so that if it is something that’s less complicated, it’s going to hit a less expensive execution path” and go to a small language model (SLM), which doesn’t use as much processing power, Corrigan says.

The user gets a satisfactory result, and “there is less of a cost because you’re not consuming as much,” he says.

That’s the tactic the company is taking — routing AI queries to the less expensive model. If there is a more complicated workflow or process, it will be routed to the SLM first “and see if it checks the box,” Corrigan says. If its needs are more complex, it is moved to the next stage, which is more expensive, and generally involves an LLM that requires going through more data to give the end user what they’re looking for.

“So we try to manage the costs that way as well so we’re only consuming what’s really needed to be consumed based on the complexity of the process,” he says.
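
A minimal sketch of that kind of cost-aware routing is shown below. The model clients, the complexity heuristic, and the sufficiency check are placeholders; in practice the escalation decision might come from a classifier or from the small model's own confidence.

```python
# Route simple queries to a cheaper small model first and escalate to a larger
# model only when the query (or the small model's answer) looks insufficient.
# Heuristics and model clients are placeholders for illustration.
def call_slm(query: str) -> str:
    raise NotImplementedError   # cheaper small-language-model endpoint

def call_llm(query: str) -> str:
    raise NotImplementedError   # larger, more expensive model endpoint

def looks_complex(query: str) -> bool:
    return len(query.split()) > 60 or "compare" in query.lower()

def answer_is_sufficient(answer: str) -> bool:
    return len(answer) > 0 and "i don't know" not in answer.lower()

def route(query: str) -> str:
    if not looks_complex(query):
        answer = call_slm(query)          # inexpensive execution path
        if answer_is_sufficient(answer):  # "see if it checks the box"
            return answer
    return call_llm(query)                # escalate to the costlier model
```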

Cloud is ‘a living framework’

Hylaine’s McElroy says CIOs and CTOs need to be more open to discussing the benefits of hybrid infrastructure setups, and how the state of the art has changed in the past few years.

“Many organizations are wrestling with cloud costs they know instinctively are too high, but there are few incentives to take on the risky work of repatriation when a CFO doesn’t know what savings they’re missing out on,” he says.

Lucetti characterizes P&G’s cloud strategy as “a living framework,” and says that over the next few years, the company will continue to leverage the right cloud capabilities to enable AI, including agentic AI, for business value.

“The goal is simple: Keep technology aligned with business growth, while staying agile in a rapidly changing digital landscape,” she says. “Cloud transformation isn’t a destination — it’s a journey. At P&G, we know that success comes from aligning technology decisions with business outcomes and by embracing flexibility.”

Get data, and the data culture, ready for AI

When it comes to AI adoption, the gap between ambition and execution can be impossible to bridge. Companies are trying to weave the tech into products, workflows, and strategies, but good intentions often collapse under the weight of day-to-day realities: messy data and the lack of a clear plan.

“That’s the challenge we see most often across the global manufacturers we work with,” says Rob McAveney, CTO at software developer Aras. “Many organizations assume they need AI, when the real starting point should be defining the decision you want AI to support, and making sure you have the right data behind it.”

Nearly two-thirds of leaders say their organizations have struggled to scale AI across the business, according to a recent McKinsey global survey. Often, they can’t move beyond tests and pilot programs, a challenge that’s even more pronounced among smaller organizations. When pilots fail to mature, investment decisions become harder to justify.

A typical issue is the data simply isn’t ready for AI. Teams try to build sophisticated models on top of fragmented sources or messy data, hoping the technology will smooth over the cracks.

“From our perspective, the biggest barriers to meaningful AI outcomes are data quality, data consistency, and data context,” McAveney says. “When data lives in silos or isn’t governed with shared standards, AI will simply reflect those inconsistencies, leading to unreliable or misleading outcomes.”

It’s an issue that impacts almost every sector. Before organizations double down on new AI tools, they must first build stronger data governance, enforce quality standards, and clarify who actually owns the data meant to fuel these systems.

Making sure AI doesn’t take the wheel

In the rush to adopt AI, many organizations forget to ask the fundamental question of what problem actually needs to be solved. Without that clarity, it’s difficult to achieve meaningful results.

Anurag Sharma, CTO of VyStar Credit Union, believes AI is just another tool that’s available to help solve a given business problem, and says every initiative should begin with a clear, simple statement of the business outcome it’s meant to deliver. He encourages his team to isolate issues AI could fix, and urges executives to understand what will change and who will be affected before anything moves forward.

“CIOs and CTOs can keep initiatives grounded by insisting on this discipline, and by slowing down the conversation just long enough to separate the shiny from the strategic,” Sharma says.

This distinction becomes much easier when an organization has an AI center of excellence (COE) or a dedicated working group focused on identifying real opportunities. These teams help sift through ideas, set priorities, and ensure initiatives are grounded in business needs rather than buzz.

The group should also include the people whose work will be affected by AI, along with business leaders, legal and compliance specialists, and security teams. Together, they can define baseline requirements that AI initiatives must meet.

“When those requirements are clear up front, teams can avoid pursuing AI projects that look exciting but lack a real business anchor,” says Kayla Underkoffler, director of AI security and policy advocacy at security and governance platform Zenity.

She adds that someone in the COE should have a solid grasp of the current AI risk landscape. That person should be ready to answer critical questions, knowing what concerns need to be addressed before every initiative goes live.

“A plan could have gaping cracks the team isn’t even aware of,” Underkoffler says. “It’s critical that security be included from the beginning to ensure the guardrails and risk assessment can be added from the beginning and not bolted on after the initiative is up and running.”

In addition, there should be clear, measurable business outcomes to make sure the effort is worthwhile. “Every proposal must define success metrics upfront,” says Akash Agrawal, VP of DevOps and DevSecOps at cloud-based quality engineering platform LambdaTest, Inc. “AI is never explored, it’s applied.”

He recommends companies build in regular 30- or 45-day checkpoints to ensure the work continues to align with business objectives. And if the results don’t meet expectations, organizations shouldn’t hesitate to reassess and make honest decisions, he says, even if that means walking away from the initiative altogether.

Yet even when the technology looks promising, humans still need to remain in the loop. “In an early pilot of our AI-based lead qualification, removing human review led to ineffective lead categorization,” says Shridhar Karale, CIO at sustainable waste solutions company Reworld. “We quickly retuned the model to include human feedback, so it continually refines and becomes more accurate over time.”

When decisions are made without human validation, organizations risk acting on faulty assumptions or misinterpreted patterns. The aim isn’t to replace people, but to build a partnership in which humans and machines strengthen one another.

Data, a strategic asset

Ensuring data is managed effectively is an often overlooked prerequisite for making AI work as intended. Creating the right conditions means treating data as a strategic asset: organizing it, cleaning it, and having the right policies in place so it stays reliable over time.

“CIOs should focus on data quality, integrity, and relevance,” says Paul Smith, CIO at Amnesty International. His organization works with unstructured data every day, often coming from external sources. Given the nature of the work, the quality of that data can be variable. Analysts sift through documents, videos, images, and reports, each produced in different formats and conditions. Managing such a high volume of messy, inconsistent, and often incomplete information has taught them the importance of rigor.

“There’s no such thing as unstructured data, only data that hasn’t yet had structure applied to it,” Smith says. He also urges organizations to start with the basics of strong, everyday data-governance habits. That means checking whether the data is relevant and ensuring it’s complete, accurate, and consistent, since outdated information can skew results.

Smith also emphasizes the importance of verifying data lineage. That includes establishing provenance — knowing where the data came from and whether its use meets legal and ethical standards — and reviewing any available documentation that details how it was collected or transformed.

In many organizations, messy data comes from legacy systems or manual entry workflows. “We strengthen reliability by standardizing schemas, enforcing data contracts, automating quality checks at ingestion, and consolidating observability across engineering,” says Agrawal.
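
A data contract of the kind Agrawal mentions can be as small as a schema plus a handful of validity checks run at ingestion. The sketch below is illustrative; the field names, types, and rules are assumptions rather than any vendor’s actual contract.

    # Minimal data-contract check enforced at ingestion: expected fields, types,
    # and a few validity rules. Field names and rules are illustrative assumptions.
    from datetime import datetime

    CONTRACT = {
        "record_id": str,
        "amount_usd": float,
        "effective_date": str,   # ISO 8601
    }

    def contract_violations(record: dict) -> list[str]:
        errors = []
        for field, expected_type in CONTRACT.items():
            if field not in record:
                errors.append(f"missing field: {field}")
            elif not isinstance(record[field], expected_type):
                errors.append(f"{field}: expected {expected_type.__name__}")
        if isinstance(record.get("amount_usd"), float) and record["amount_usd"] < 0:
            errors.append("amount_usd must be non-negative")
        if isinstance(record.get("effective_date"), str):
            try:
                datetime.fromisoformat(record["effective_date"])
            except ValueError:
                errors.append("effective_date is not ISO 8601")
        return errors

    # Records that violate the contract are rejected or quarantined before they
    # ever reach a model or a dashboard.
    print(contract_violations({"record_id": "R-100", "amount_usd": -5.0, "effective_date": "2025-13-01"}))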

When teams trust the data, their AI outcomes improve. “If you can’t clearly answer where the data came from and how trustworthy it is, then you aren’t ready,” Sharma adds. “It’s better to slow down upfront than chase insights that are directionally wrong or operationally harmful, especially in the financial industry where trust is our currency.”

Karale says that at Reworld, they’ve created a single source of truth data fabric, and assigned data stewards to each domain. They also maintain a living data dictionary that makes definitions and access policies easy to find with a simple search. “Each entry includes lineage and ownership details so every team knows who’s responsible, and they can trust the data they use,” Karale adds.
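
An entry of that kind can be modeled as a small, searchable record. The following is a hypothetical sketch of the shape such an entry might take, not Reworld’s actual schema.

    # Hypothetical shape of a data-dictionary entry carrying definition, ownership,
    # lineage, and access policy; not Reworld's actual schema.
    from dataclasses import dataclass, field

    @dataclass
    class DictionaryEntry:
        name: str
        definition: str
        domain: str
        steward: str                                        # accountable owner for the domain
        lineage: list[str] = field(default_factory=list)    # upstream sources, in order
        access_policy: str = "internal"

    entry = DictionaryEntry(
        name="tons_diverted",
        definition="Waste tonnage diverted from landfill, per facility per month.",
        domain="sustainability",
        steward="sustainability-data-steward@example.com",
        lineage=["scale_tickets_raw", "facility_master", "tons_diverted_curated"],
    )
    print(entry.steward, entry.lineage)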

A hard look in the organizational mirror

AI has a way of amplifying whatever patterns it finds in the data — the helpful ones, but also the old biases organizations would rather leave behind. Avoiding that trap starts with recognizing that bias is often a structural issue.

CIOs can do a couple of things to prevent problems from taking root. “Vet all data used for training or pilot runs and confirm foundational controls are in place before AI enters the workflow,” says Underkoffler.

Also, try to understand in detail how agentic AI changes the risk model. “These systems introduce new forms of autonomy, dependency, and interaction,” she says. “Controls must evolve accordingly.”

Underkoffler also adds that strong governance frameworks can guide organizations on monitoring, managing risks, and setting guardrails. These frameworks outline who’s responsible for overseeing AI systems, how decisions are documented, and when human judgment must step in, providing structure in an environment where the technology is evolving faster than most policies can keep up.

And Karale says that fairness metrics, such as disparate impact, play an important role in that oversight. These measures help teams understand whether an AI system is treating different groups equitably or unintentionally favoring one over another, and they can be incorporated into the model validation pipeline.
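
Disparate impact is commonly computed as the ratio of favorable-outcome rates between groups, with ratios below roughly 0.8 flagged for review under the familiar four-fifths rule of thumb. A minimal sketch of such a check, using illustrative group data:

    # Disparate impact check: ratio of favorable-outcome rates between an
    # unprivileged and a privileged group. The 0.8 cutoff follows the common
    # "four-fifths" rule of thumb; the group data here is illustrative.

    def favorable_rate(outcomes: list[int]) -> float:
        """Share of favorable outcomes (1 = favorable, 0 = unfavorable)."""
        return sum(outcomes) / len(outcomes)

    def disparate_impact(unprivileged: list[int], privileged: list[int]) -> float:
        return favorable_rate(unprivileged) / favorable_rate(privileged)

    group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # 50% favorable
    group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # 80% favorable

    ratio = disparate_impact(group_a, group_b)
    print(f"disparate impact ratio: {ratio:.2f}")
    if ratio < 0.8:
        print("flag for review: the model may be favoring one group over another")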

Domain experts can also play a key role in spotting and retraining models that produce biased or off-target outputs. They understand the context behind the data, so they’re often the first to notice when something doesn’t look right. “Continuous learning is just as important for machines as it is for people,” says Karale.

Amnesty International’s Smith agrees, saying organizations need to train their people continuously to help them pick out potential biases. “Raise awareness of risks and harms,” he says. “The first line of defense or risk mitigation is human.”
