AI ROI: How to measure the true value of AI
For all the buzz about AI’s potential to transform business, many organizations struggle to determine whether their AI implementations are actually working.
Part of this is because AI doesn’t just replace a task or automate a process — rather, it changes how work itself happens, often in ways that are hard to quantify. Measuring that impact means deciding what return really means, and how to connect new forms of digital labor to traditional business outcomes.
“Like everyone else in the world right now, we’re figuring it out as we go,” says Agustina Branz, senior marketing manager at Source86.
That trial-and-error approach is what defines the current conversation about AI ROI.
To help shed light on measuring the value of AI, we spoke to several tech leaders about how their organizations are learning to gauge performance in this area — from simple benchmarks against human work to complex frameworks that track cultural change, cost models, and the hard math of value realization.
The simplest benchmark: Can AI do better than you?
There’s a fundamental question all organizations are starting to ask, one that underlies nearly every AI metric in use today: How well does AI perform a task relative to a human? For Source86’s Branz, that means applying the same yardstick to AI that she uses for human output.
“AI can definitely make work faster, but faster doesn’t mean ROI,” she says. “We try to measure it the same way we do with human output: by whether it drives real results like traffic, qualified leads, and conversions. One KPI that has been useful for us has been cost per qualified outcome, which basically means how much less it costs to get a real result like the ones we were getting before.”
The key is to compare against what humans delivered in the same context. “We try to isolate the impact of AI by running A/B tests between content that uses AI and those that don’t,” she says.
“For instance, when testing AI-generated copy or keyword clusters, we track the same KPIs — traffic, engagement, and conversions — and compare the outcome to human-only outputs,” Branz explains. “Also, we treat AI performance as a directional metric rather than an absolute one. It is super useful for optimization, but definitely not the final judgment.”
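To make that comparison concrete, here is a minimal sketch in Python, using hypothetical campaign figures rather than Source86’s numbers, of how cost per qualified outcome might be computed for the two arms of such a test:

```python
# Minimal sketch: cost per qualified outcome for each arm of an A/B test.
# All figures are hypothetical; a real pipeline would pull them from
# analytics and finance systems.

def cost_per_qualified_outcome(total_cost: float, qualified_outcomes: int) -> float:
    """Total spend (labor plus tooling) divided by qualified results (e.g., qualified leads)."""
    if qualified_outcomes == 0:
        return float("inf")  # no results yet; avoid dividing by zero
    return total_cost / qualified_outcomes

# Hypothetical arms of the same campaign over the same period.
human_only = cost_per_qualified_outcome(total_cost=12_000, qualified_outcomes=40)   # $300 per outcome
ai_assisted = cost_per_qualified_outcome(total_cost=7_500, qualified_outcomes=50)   # $150 per outcome

print(f"Human-only:  ${human_only:,.2f} per qualified outcome")
print(f"AI-assisted: ${ai_assisted:,.2f} per qualified outcome")
print(f"Change: {100 * (ai_assisted - human_only) / human_only:+.0f}%")
```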
Marc‑Aurele Legoux, founder of an organic digital marketing agency, is even more blunt. “Can AI do this better than a human can? If yes, then good. If not, there’s no point to waste money and effort on it,” he says. “As an example, we implemented an AI agent chatbot for one of my luxury travel clients, and it brought in an extra €70,000 [$81,252] in revenue through a single booking.”
The KPIs, he says, were simple: “Did the lead come from the chatbot? Yes. Did this lead convert? Yes. Thank you, AI chatbot. We would compare AI-generated outcomes — leads, conversions, booked calls — against human-handled equivalents over a fixed period. If the AI matches or outperforms human benchmarks, then it’s a success.”
But this sort of benchmark, while straightforward in theory, becomes much harder in practice. Setting up valid comparisons, controlling for external factors, and attributing results solely to AI is easier said than done.
Hard money: Time, accuracy, and value
The most tangible form of AI ROI involves time and productivity. John Atalla, managing director at Transformativ, calls this “productivity uplift”: “time saved and capacity released,” measured by how long it takes to complete a process or task.
But even clear metrics can miss the full picture. “In early projects, we found our initial KPIs were quite narrow,” he says. “As delivery progressed, we saw improvements in decision quality, customer experience, and even staff engagement that had measurable financial impact.”
That realization led Atalla’s team to create a framework with three lenses: productivity, accuracy, and what he calls “value-realization speed” — “how quickly benefits show up in the business,” whether measured by payback period or by the share of benefits captured in the first 90 days.
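A rough sketch of how that third lens might be calculated, assuming simple upfront-cost and monthly-benefit figures that are purely illustrative:

```python
# Sketch of two "value-realization speed" measures: payback period and the
# share of projected benefit captured in the first 90 days.
# Figures are illustrative, not from any specific engagement.

def payback_period_months(upfront_cost: float, monthly_net_benefit: float) -> float:
    """Months until cumulative net benefit covers the upfront investment."""
    return upfront_cost / monthly_net_benefit

def benefit_share_first_90_days(benefit_first_90_days: float, projected_annual_benefit: float) -> float:
    """Fraction of the projected annual benefit realized in the first quarter."""
    return benefit_first_90_days / projected_annual_benefit

print(payback_period_months(upfront_cost=150_000, monthly_net_benefit=30_000))   # 5.0 months
print(benefit_share_first_90_days(benefit_first_90_days=90_000,
                                  projected_annual_benefit=360_000))             # 0.25
```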
The same logic applies at Wolters Kluwer, where Aoife May, product management associate director, says her teams help customers compare manual and AI-assisted work for concrete time and cost differences.
“We attribute estimated times to doing tasks such as legal research manually and include an average attorney cost per hour to identify the costs of manual effort. We then estimate the same, but with the assistance of AI.” Customers, she says, “reduce the time they spend on obligation research by up to 60%.”
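The underlying arithmetic is simple enough to sketch; the hourly rate, hours, and savings below are hypothetical stand-ins, not Wolters Kluwer’s actual estimates:

```python
# Illustrative manual-vs-AI cost comparison for a single research task.
# Hours, rate, and savings are hypothetical.

attorney_rate = 300.0        # average attorney cost per hour ($)
manual_hours = 10.0          # estimated hours to do the research manually
time_saved_fraction = 0.60   # e.g., an "up to 60%" reduction in research time

manual_cost = manual_hours * attorney_rate
ai_assisted_cost = manual_hours * (1 - time_saved_fraction) * attorney_rate

print(f"Manual: ${manual_cost:,.0f}  AI-assisted: ${ai_assisted_cost:,.0f}  "
      f"Saved: ${manual_cost - ai_assisted_cost:,.0f} per matter")
```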
But time isn’t everything. Atalla’s second lens — decision accuracy — captures gains from fewer errors, rework, and exceptions, which translate directly into lower costs and better customer experiences.
Adrian Dunkley, CEO of StarApple AI, takes the financial view higher up the value chain. “There are three categories of metrics that always matter: efficiency gains, customer spend, and overall ROI,” he says, adding that he tracks “how much money you were able to save using AI, and how much more you were able to get out of your business without spending more.”
Dunkley’s research lab, Section 9, also tackles a subtler question: how to trace AI’s specific contribution when multiple systems interact. He relies on a process known as “impact chaining,” which he “borrowed from my climate research days.” Impact chaining maps each process to its downstream business value to create a “pre-AI expectation of ROI.”
Tom Poutasse, content management director at Wolters Kluwer, also uses impact chaining, and describes it as “tracing how one change or output can influence a series of downstream effects.” In practice, that means showing where automation accelerates value and where human judgment still adds essential accuracy.
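Neither company describes an implementation, but the idea can be sketched as a chain from a process step to its expected downstream value; every name and figure below is invented for illustration:

```python
# Hypothetical sketch of "impact chaining": each process step is linked to the
# downstream business effects it influences, with an estimated annual value,
# so a pre-AI expectation of ROI can be summed before anything is deployed.

from dataclasses import dataclass, field

@dataclass
class ImpactLink:
    effect: str                   # downstream business effect
    expected_annual_value: float  # estimated $ contribution

@dataclass
class ProcessStep:
    name: str
    links: list[ImpactLink] = field(default_factory=list)

    def expected_value(self) -> float:
        return sum(link.expected_annual_value for link in self.links)

triage = ProcessStep("ticket triage", [
    ImpactLink("faster first response", 40_000),
    ImpactLink("fewer escalations", 25_000),
])

pre_ai_expectation = triage.expected_value()  # baseline to judge the AI against later
print(f"Pre-AI ROI expectation for '{triage.name}': ${pre_ai_expectation:,.0f}/year")
```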
Still, even the best metrics matter only if they’re measured correctly. Establishing baselines, attributing results, and accounting for real costs are what turn numbers into ROI — which is where the math starts to get tricky.
Getting the math right: Baselines, attribution, and cost
The math behind the metrics starts with setting clean baselines and ends with understanding how AI reshapes the cost of doing business.
Salome Mikadze, co-founder of Movadex, advises rethinking what you’re measuring: “I tell executives to stop asking ‘what is the model’s accuracy’ and start with ‘what changed in the business once this shipped.’”
Mikadze’s team builds those comparisons into every rollout. “We baseline the pre-AI process, then run controlled rollouts so every metric has a clean counterfactual,” she says. Depending on the organization, that might mean tracking first-response and resolution times in customer support, lead time for code changes in engineering, or win rates and content cycle times in sales. But she says all these metrics include “time-to-value, adoption by active users, and task completion without human rescue, because an unused model has zero ROI.”
But baselines can blur when people and AI share the same workflow, something that spurred Poutasse’s team at Wolters Kluwer to rethink attribution entirely. “We knew from the start that the AI and the human SMEs were both adding value, but in different ways — so just saying ‘the AI did this’ or ‘the humans did that’ wasn’t accurate.”
Their solution was a tagging framework that marks each stage as machine-generated, human-verified, or human-enhanced. That makes it easier to show where automation adds efficiency and where human judgment adds context, creating a truer picture of blended performance.
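A minimal sketch of what such a tagging scheme could look like; the three stage labels come from the article, and everything else is an assumption:

```python
# Sketch: tag each stage of a blended human/AI workflow so efficiency and
# accuracy gains can later be attributed to the right contributor.

from enum import Enum

class Provenance(Enum):
    MACHINE_GENERATED = "machine-generated"
    HUMAN_VERIFIED = "human-verified"
    HUMAN_ENHANCED = "human-enhanced"

workflow = [
    ("draft summary", Provenance.MACHINE_GENERATED),
    ("citation check", Provenance.HUMAN_VERIFIED),
    ("jurisdiction-specific guidance", Provenance.HUMAN_ENHANCED),
]

for stage, tag in workflow:
    print(f"{stage}: {tag.value}")
```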
At a broader level, measuring ROI also means grappling with what AI actually costs. Michael Mansard, principal director at Zuora’s Subscribed Institute, notes that AI upends the economic model that IT has taken for granted since the dawn of the SaaS era.
“Traditional SaaS is expensive to build but has near-zero marginal costs,” Mansard says, “while AI is inexpensive to develop but incurs high, variable operational costs. These shifts challenge seat-based or feature-based models, since they fail when value is tied to what an AI agent accomplishes, not how many people log in.”
Mansard sees some companies experimenting with outcome-based pricing — paying for a percentage of savings or gains, or for specific deliverables such as Zendesk’s $1.50-per-case-resolution model. It’s a moving target: “There isn’t and won’t be one ‘right’ pricing model,” he says. “Many are shifting toward usage-based or outcome-based pricing, where value is tied directly to impact.”
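To see how the two models diverge, compare a hypothetical month of support volume; the per-resolution rate follows the Zendesk example, while the seat price and volumes are assumptions:

```python
# Hypothetical comparison of seat-based vs. outcome-based pricing for one month.

seats, price_per_seat = 20, 80.00                     # assumed seat count and monthly seat price
ai_resolutions, price_per_resolution = 3_000, 1.50    # per-resolution rate, as in the Zendesk example

seat_based_cost = seats * price_per_seat                     # fixed, regardless of outcomes
outcome_based_cost = ai_resolutions * price_per_resolution   # scales with what the agent resolves

print(f"Seat-based:    ${seat_based_cost:,.0f}")
print(f"Outcome-based: ${outcome_based_cost:,.0f}")
```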
As companies mature in their use of AI, they’re facing a challenge that goes beyond defining ROI once: They’ve got to keep those returns consistent as systems evolve and scale.
Scaling and sustaining ROI
For Movadex’s Mikadze, measurement doesn’t end when an AI system launches. Her framework treats ROI as an ongoing calculation rather than a one-time success metric. “On the cost side we model total cost of ownership, not just inference,” she says. That includes “integration work, evaluation harnesses, data labeling, prompt and retrieval spend, infra and vendor fees, monitoring, and the people running change management.”
Mikadze folds all that into a clear formula: “We report risk-adjusted ROI: gross benefit minus TCO, discounted by safety and reliability signals like hallucination rate, guardrail intervention rate, override rate in human-in-the-loop reviews, data-leak incidents, and model drift that forces retraining.”
Most companies, Mikadze adds, accept a simple benchmark: ROI = (Δ revenue + Δ gross margin + avoided cost) − TCO, with a payback target of less than two quarters for operations use cases and under a year for developer-productivity platforms.
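Put together, the benchmark formula and the risk adjustment can be sketched roughly as follows; the discount weights and all figures are invented for illustration, not Movadex’s actual weighting:

```python
# Sketch of the benchmark ROI formula and a simple risk adjustment.
# ROI = (Δ revenue + Δ gross margin + avoided cost) − TCO, then discounted
# by reliability signals. Weights and figures are assumptions.

def simple_roi(delta_revenue: float, delta_gross_margin: float,
               avoided_cost: float, tco: float) -> float:
    return delta_revenue + delta_gross_margin + avoided_cost - tco

def risk_adjusted_roi(gross_benefit: float, tco: float,
                      hallucination_rate: float, override_rate: float) -> float:
    """Discount the net benefit by observed reliability problems (toy weighting)."""
    discount = 1.0 - min(1.0, 2.0 * hallucination_rate + 1.0 * override_rate)
    return (gross_benefit - tco) * discount

roi = simple_roi(delta_revenue=200_000, delta_gross_margin=50_000,
                 avoided_cost=120_000, tco=250_000)                    # 120,000
adj = risk_adjusted_roi(gross_benefit=370_000, tco=250_000,
                        hallucination_rate=0.03, override_rate=0.10)   # ~100,800
print(roi, round(adj))
```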
But even a perfect formula can fail in practice if the model isn’t built to scale. “A local, motivated pilot team can generate impressive early wins, but scaling often breaks things,” Mikadze says. Data quality, workflow design, and team incentives rarely grow in sync, and “AI ROI almost never scales cleanly.”
She says she sees the same mistake repeatedly: A tool built for one team gets rebranded as a company-wide initiative without revisiting its assumptions. “If sales expects efficiency gains, product wants insights, and ops hopes for automation, but the model was only ever tuned for one of those, friction is inevitable.”
Her advice is to treat AI as a living product, not a one-off rollout. “Successful teams set very tight success criteria at the experiment stage, then revalidate those goals before scaling,” she says, defining ownership, retraining cadence, and evaluation loops early on to keep the system relevant as it expands.
That kind of long-term discipline depends on infrastructure for measurement itself. StarApple AI’s Dunkley warns that “most companies aren’t even thinking about the cost of doing the actual measuring.” Sustaining ROI, he says, “requires people and systems to track outputs and how those outputs affect business performance. Without that layer, businesses are managing impressions, not measurable impact.”
The soft side of ROI: Culture, adoption, and belief
Even the best metrics fall apart without buy-in. Once you’ve built the spreadsheets and have the dashboards up and running, the long-term success of AI depends on the extent to which people adopt it, trust it, and see its value.
Michael Domanic, head of AI at UserTesting, draws a distinction between “hard” and “squishy” ROI.
“Hard ROI is what most executives are familiar with,” he says. “It refers to measurable business outcomes that can be directly traced back to specific AI deployments.” Those might be improvements in conversion rates, revenue growth, customer retention, or faster feature delivery. “These are tangible business results that can and should be measured with rigor.”
But squishy ROI, Domanic says, is about the human side — the cultural and behavioral shifts that make lasting impact possible. “It reflects the cultural and behavioral shift that happens when employees begin experimenting, discovering new efficiencies, and developing an intuition for how AI can transform their work.” Those outcomes are harder to quantify but, he adds, “they are essential for companies to maintain a competitive edge.” As AI becomes foundational infrastructure, “the boundary between the two will blur. The squishy becomes measurable and the measurable becomes transformative.”
Promevo’s Pettit argues that self-reported KPIs in the “squishy” category — things like employee sentiment and usage rates — can be powerful leading indicators. “In the initial stages of an AI rollout, self-reported data is one of the most important leading indicators of success,” he says.
When 73% of employees say a new tool improves their productivity, as they did at one client company he worked with, that perception helps drive adoption, even if that productivity boost hasn’t been objectively measured. “Word of mouth based on perception creates a virtuous cycle of adoption,” he says. “Effectiveness of any tool grows over time, mainly by people sharing their successes and others following suit.”
Still, belief doesn’t come automatically. StarApple AI and Section 9’s Dunkley warns that employees often fear AI will erase their credit for success. At one of the companies where Section 9 has been conducting a long-term study, “staff were hesitant to have their work partially attributed to AI; they felt they were being undermined.”
Overcoming that resistance, he says, requires champions who “put in the work to get them comfortable and excited for the AI benefits.” Measuring ROI, in other words, isn’t just about proving that AI works — it’s about proving that people and AI can win together.
