Large enterprises will significantly miscalculate their AI infrastructure costs over the next couple of years, prompting more CIOs to expand the scope of their FinOps teams, IDC predicts.
Enterprise AI users are headed for an "AI infrastructure reckoning," as CIOs and finance leaders realize that standard budget forecasting doesn't work for compute-heavy AI projects, says Jevin Jensen, IDC's vice president of infrastructure and operations research. Global 1,000 companies will underestimate their AI infrastructure costs by 30% through 2027, IDC predicts.
The cost of ramping up AI projects is fundamentally different from launching a new ERP solution or the other IT systems enterprises have been deploying for decades, Jensen says. Calculating the cost of GPUs, inference, networking, and tokens can be much more complicated than planning a budget for a traditional IT system, and CIOs also need to account for security, governance, and employee training costs.
"AI is expensive, unpredictable, dramatically different than traditional IT projects, and growing faster than most budgets can track," he writes in a blog post. "AI-enabled applications are often resource-intensive, coupled with opaque consumption models, and have outpaced the traditional IT budgeting playbook."
IT leaders often underestimate the pricing complexity associated with scaling AI, Jensen writes.
From the blog post: "Models that double in size can consume 10 times the compute. Inference workloads run continuously, consuming GPU cycles long after training ends. What once looked like a contained line item now behaves like a living organism: growing, adapting, and draining resources unpredictably."
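Jensen's point about inference as a recurring cost can be made concrete with a back-of-the-envelope calculation. All prices and traffic volumes below are illustrative assumptions, not figures from the article or the blog post:

```python
# Illustrative sketch of why inference, not training, dominates long-run
# AI spend. Every number here is an assumption for demonstration only.

training_cost = 500_000      # one-time training/fine-tuning spend (assumed)
requests_per_day = 200_000   # assumed steady production traffic
cost_per_request = 0.002     # assumed blended inference cost per request

# Unlike training, inference recurs every month the workload stays live.
monthly_inference = requests_per_day * 30 * cost_per_request
print(f"Monthly inference spend: ${monthly_inference:,.0f}")
# The one-time training line item is fixed; the inference line item
# scales with adoption and never stops accruing.
```

Under these assumed numbers, a workload that looked like a contained $500,000 project keeps adding $12,000 a month indefinitely, and grows further as usage spreads across the business.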
Point of no return
As CIOs struggle to estimate AI costs, a spending frenzy by large AI vendors such as OpenAI and Anthropic adds pressure to recover the investments, some critics say. In a recent appearance on the Decoder podcast, IBM CEO Arvind Krishna warned about the cost of building 100 gigawatts of data center capacity, at a price of about $8 trillion, as the projected fuel needed to power large vendors' AI ambitions.
"There's no way you're going to get a return on that, in my view, because $8 trillion of capex means you need roughly $800 billion of profit just to pay for the interest," Krishna says.
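Krishna's figure implies roughly a 10% annual cost of capital, which is an inference from his numbers rather than a rate he states. A quick sanity check of the arithmetic:

```python
# Back-of-the-envelope check of Krishna's interest math. The ~10% cost
# of capital is an assumption inferred from his figures, not stated.
capex = 8e12            # $8 trillion in projected data center capex
cost_of_capital = 0.10  # assumed annual rate implied by his estimate

annual_interest = capex * cost_of_capital
print(f"Annual interest burden: ${annual_interest / 1e9:.0f}B")
# Roughly $800B of profit per year is needed just to service the
# capital, before generating any actual return on the investment.
```

At $800 billion a year, the capital-service cost alone would exceed the current annual revenue of most of the companies making these investments.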
The math doesn't add up, adds Barry Baker, COO of IBM Infrastructure and general manager of IBM Systems. In the short term, a single gigawatt data center will likely cost more than $75 billion, he says, echoing the concerns of his boss.
"Much of this investment is occurring in parallel, resulting in demand outstripping supply and dramatically raising prices for every element of the cost equation, from people, to concrete, to the silicon," Baker says.
At the same time, the shelf life of the hardware at AI data centers is limited, he adds. "Adding to these staggering figures is the reality that the actual compute will need to be replaced every few years, creating an ongoing reinvestment cycle that many organizations have failed to fully account for in their long-term planning," Baker says.
IDC's Jensen agrees that massive AI spending by vendors and hyperscalers, such as AWS, Microsoft Azure, and Google Cloud, may keep prices high in the near term. "They're trying to recoup their hundreds of billions of costs by trying to sell it to you for $150 billion," he says.
Past 2027, however, AI infrastructure prices should fall, he predicts. GPU prices from manufacturers such as Nvidia are likely to come down, and the hyperscalers and AI vendors could eventually cut their prices to stimulate demand and recover their costs.
Struggling to estimate costs
Beyond the discussion about massive spending on data centers and GPUs, many IT leaders at enterprises consuming AI infrastructure services find it difficult to estimate costs, some experts say.
The IDC prediction about underestimated costs is plausible, if not conservative, says Nik Kale, principal engineer for CX engineering, cloud security, and AI platforms at Cisco. Many organizations project AI infrastructure costs as if they were predictable cloud workloads, he adds.
"Usage expands quickly once models are introduced into the business," he says. "A workflow designed for a single team often becomes a shared service across the company, which leads to a significant increase in demand that was not captured in the original cost model."
Systems required to reduce the risks of running AI, including monitoring, drift detection, logging, and validation checks, can consume more computing power than expected, Kale adds.
"In several enterprise environments, these supporting systems have grown to cost as much as, or even more than, the model inference itself," he says.
The case for FinOps
CIOs need to take precautions when attempting to determine their AI infrastructure costs, experts say, and IDC's Jensen sees FinOps adoption becoming mandatory rather than optional. CIOs will own the function, with FinOps teams most commonly reporting into their offices, he notes.
FinOps practices are essential to understanding the best fits for AI projects at specific enterprises, he says. Good FinOps practices will force IT leaders to focus on AI projects with the best ROI probabilities, to understand infrastructure costs, and to adjust as conditions change, he adds.
"AI has moved technology spending from predictable consumption to probabilistic behavior," he says. "That means financial visibility must become continuous, not periodic."
IT leaders should focus first on the AI projects that are easy wins, but those are different at every organization, Jensen says; a relatively simple AI project at one enterprise may be impossible at another.
"If you have an idea for a project, but your competitor is losing money on it, let them continue to lose money," he says. "If it doesn't work, you have to change things."
Adopting FinOps practices is a good start, but IT leaders will need to go deeper, says Cisco's Kale. FinOps traditionally tracks spending and allocates costs based on resources used, but with AI, cost-control teams will also need to understand how models perform and identify where their organizations are unnecessarily consuming computing resources, he says.
FinOps teams should use operational analytics that show not only how money is being spent but also how workloads operate, he says.
"A viable strategy to limit unnecessary resource usage is to guide teams to utilize the minimum-sized models available for each specific task," he adds. "Frequently, requests can be rerouted to smaller or distilled models without impacting user experiences."
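A minimal sketch of the routing pattern Kale describes: send each request to the cheapest model tier that can handle it. The model names, tiers, and the `classify_complexity` heuristic below are illustrative assumptions, not a real vendor API:

```python
# Hypothetical model router: pick the smallest model able to serve a
# request. Tiers, names, and the complexity heuristic are assumptions.

MODEL_TIERS = [
    ("small-distilled", 1),  # (model name, max complexity it handles)
    ("medium", 2),
    ("large-frontier", 3),
]

def classify_complexity(prompt: str) -> int:
    """Toy heuristic: treat longer prompts as more complex requests."""
    if len(prompt) < 200:
        return 1
    if len(prompt) < 2000:
        return 2
    return 3

def route(prompt: str) -> str:
    """Return the cheapest model tier whose capacity meets the need."""
    needed = classify_complexity(prompt)
    for model, capacity in MODEL_TIERS:
        if capacity >= needed:
            return model
    return MODEL_TIERS[-1][0]

print(route("Summarize this sentence."))  # short prompt, small model
```

In practice the complexity classifier would be a learned or rules-based component rather than a length check, but the cost logic is the same: the expensive frontier model only sees traffic that the cheaper tiers cannot serve.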
FinOps teams should also evaluate the design of their AI retrieval systems, validation pipelines, and policy checks to ensure they are operating independently and not more frequently than required, Kale recommends.
CIOs should also pay attention to GPU use, he adds. "Frequently, GPU nodes are operating at a fraction of their total capacity due to poor scheduling and lack of consolidated workload management," he says. "Improved orchestration and workload placement can result in substantial cost savings."
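The savings from lifting utilization can be estimated directly. The utilization rates, workload size, and GPU pricing below are illustrative assumptions, not figures from Kale or the article:

```python
# Rough estimate of savings from better GPU scheduling. All numbers
# are illustrative assumptions for demonstration only.

def gpus_needed(gpu_hours: float, utilization: float,
                hours_per_month: float = 730) -> float:
    """GPU count required to deliver a workload at a given utilization."""
    return gpu_hours / (hours_per_month * utilization)

workload = 10_000            # useful GPU-hours per month (assumed)
price_per_gpu_month = 2_500  # assumed price per GPU per month

before = gpus_needed(workload, utilization=0.25)  # poorly scheduled
after = gpus_needed(workload, utilization=0.70)   # consolidated workloads

savings = (before - after) * price_per_gpu_month
print(f"{before:.1f} -> {after:.1f} GPUs, saving ~${savings:,.0f}/month")
```

Under these assumptions, raising average utilization from 25% to 70% cuts the required fleet by roughly two thirds, which is why orchestration and workload placement show up so prominently in FinOps reviews.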
Avoid vendor lock-in
IBM's Baker recommends that organizations adopt hybrid architectures to avoid overcommitting to a single AI infrastructure provider. In addition, CIOs should always pay attention to the computing resources needed to operate their AI workloads, he says.
"Right-sizing AI technology investment offers significant savings opportunities," he adds. "Not every problem requires the largest model or the fastest response time."
Organizations should consider quantization and compression techniques and deploy smaller models tuned for specific tasks, rather than general-purpose large language models, Baker says. "Use appropriate compute resources rather than defaulting to the most powerful option available."
Many organizations can also benefit from strategic patience, he adds. "Avoiding investments in capabilities not yet needed allows organizations to learn from early adopters who absorb the penalties of being too early," he says.