8 tips for rebuilding an AI-ready data strategy
Any organization that wants to have a leading AI strategy must first have a winning data strategy.
That’s the message from Ed Lovely, vice president and chief data officer for IBM.
“When you think about scaling AI, data is the foundation,” he says.
However, few organizations have a data architecture aligned to their AI ambitions, he says. Instead, they have siloed data that’s not governed by consistent data standards — the result of longstanding enterprise data strategies that created IT environments application by application to deliver point-in-time decisions rather than to support enterprise-wide artificial intelligence deployments.
The 2025 IBM study AI Ambitions Are Surging, But Is Enterprise Data Ready? shows just how many are struggling with their data. It found that only 26% of 1,700 CDOs worldwide feel confident their data can support new AI-enabled revenue streams.
What’s needed, Lovely says, is an integrated enterprise data architecture, where the same standards, governance, and metadata are applied “regardless of where data is born.”
Lovely is not alone in seeing a need for organizations to update their data strategies.
“Most organizations need to modernize their data strategies because AI changes not just how data is used, but why it’s used and where value is created,” says Adam Wright, research manager for IDC’s Global DataSphere and Global StorageSphere research programs and co-author of the 2025 report Content Creation in the Age of Generative AI.
“Traditional data strategies were built for reporting, BI, and automation, but AI requires far more dynamic, granular, and real-time data pipelines that can fuel iterative, model-driven workflows. This means shifting from static data governance to continuous data quality monitoring, stronger metadata and lineage tracking, and retention policies that reflect AI’s blend of ephemeral, cached, and saved data,” he says. “The AI era demands that organizations evolve from a collect/store everything mentality toward intentional, value-driven data strategies that balance cost, risk, and the specific AI outcomes they want to achieve.”
High-maturity data foundations
Most organizations are far from that objective.
“Many organizations continue to struggle with having the ‘right’ data, whether that means sufficient volume, appropriate quality, or the necessary contextual metadata to support AI use cases,” Wright says. “In IDC research and industry conversations, data readiness consistently emerges as one of the top barriers to realizing AI value, often outranking compute cost or model selection. Most enterprises are still dealing with fragmented systems, inconsistent governance, and limited visibility into what data they actually have and how trustworthy it is.”
Lovely says IBM had faced many such challenges but spent the past three years tackling them to make its data AI ready.
IBM’s data strategy for the AI era included multiple changes to longstanding approaches, enabling it to build what Lovely calls an integrated enterprise data architecture. For example, the company retained the concept of data owners but “helped them understand that the data is an IBM asset, and if we’re able to democratize it in a controlled, secure way, we can run the business in a better, more productive way,” Lovely says.
As a result, IBM moved from multiple teams managing siloed data to a common team using common standards and common architectures. Enterprise leaders also consolidated 300 terabytes of data, selecting needed data based on the outcomes the company seeks and the workflows that drive those outcomes.
“We were deliberate,” Lovely says, adding that IBM’s data platform now covers about 80% of the company’s workflows. “One of the greatest productivity unlocks for an enterprise today is to create an integrated enterprise data architecture. We’re rapidly deploying AI at our company because of our investment in data.”
8 tips for building a better data strategy
To build high maturity in data foundations and data consumption capabilities, organizations need a data strategy for the AI era — one that enforces data quality, breaks down data siloes, and aligns data capabilities with the AI use cases prioritized by the business.
Experts offer steps to take:
1. Rethink data ownership
“Traditional models that treat data ownership as a purely IT issue no longer work when business units, product teams, and AI platforms are all generating and transforming data continuously,” Wright explains. “Ideally, clear accountability should sit with a senior data leader such as a CDO, but organizations without a CDO must ensure that data governance responsibilities are explicitly distributed across IT, security, and the business.”
It’s critical to have “a single point of authority for defining policies and a federated model for execution, so that business units remain empowered but not unchecked,” he adds.
Manjeet Rege, professor and chair of the Department of Software Engineering and Data Science and director of the Center for Applied Artificial Intelligence at the University of St. Thomas, advises organizations to reframe data owners as data stewards, who don’t own the data but rather own the meaning and quality of the data based on standards, governance, security, and interoperability set by a central data function.
2. Break down siloes
To do this, “CIOs need to align business units around shared AI and data outcomes, because gen AI only delivers value when workflows, processes, and data sources are connected across the enterprise,” Wright says.
“This means establishing cross-functional governance, standardizing taxonomies and policies, and creating incentives for teams to share data rather than protect it,” he adds. “Technology helps through unified platforms, metadata layers, and common security frameworks, but the real unlock comes from coordinated leadership across the C-suite and business stakeholders.”
3. Invest in data technologies for the AI era
These technologies include modern data lakes and data lakehouses, vector databases, and scalable object storage, all of which “can handle high-volume, multimodal data with strong governance,” Wright says.
Organizations also need orchestration and pipeline tools that automate ingestion, cleansing, transformation, and movement so that AI workflows can run reliably end-to-end. Metadata engines and governance layers are essential to enable models to understand context, track lineage, and safely and reliably use both structured and unstructured data.
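The ingestion-cleansing-transformation flow described above can be sketched as a minimal pipeline with lineage tracking. This is an illustrative toy, not a reference to any specific orchestration tool; the stage names and record fields are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    data: dict
    lineage: list = field(default_factory=list)  # records which stages touched this row

def ingest(raw_rows):
    """Wrap raw rows so every downstream stage can append lineage."""
    return [Record(data=dict(row), lineage=["ingest"]) for row in raw_rows]

def cleanse(records):
    """Drop rows missing required fields; normalize string whitespace."""
    out = []
    for r in records:
        if not r.data.get("customer_id"):
            continue  # reject incomplete rows rather than guessing values
        r.data = {k: v.strip() if isinstance(v, str) else v for k, v in r.data.items()}
        r.lineage.append("cleanse")
        out.append(r)
    return out

def transform(records):
    """Derive a simple feature from the cleansed fields."""
    for r in records:
        r.data["name_length"] = len(r.data.get("name", ""))
        r.lineage.append("transform")
    return records

raw = [{"customer_id": "C1", "name": "  Ada "}, {"customer_id": "", "name": "???"}]
result = transform(cleanse(ingest(raw)))
print(len(result), result[0].data["name"], result[0].lineage)
```

Production tools add scheduling, retries, and monitoring on top of this shape, but the core idea — every record carries its lineage end-to-end — is what makes AI workflows auditable.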
Build a data platform layer that is “modular, governed, and able to evolve,” Rege advises. “You need architecture that can treat data as a reusable product, and not just for a single pipeline, and can be used for both batch and real-time needs.”
Rege also endorses data lakes and data lakehouses, saying they’re “becoming the backbones of AI because they can handle structured and unstructured data.”
Additionally, Shayan Mohanty, chief AI and data officer at Thoughtworks, advises CIOs to build a composable enterprise, with modular technologies and flexible structures that enable humans and AI to access data and work across multiple layers.
Experts also advise CIOs to invest in technologies that address emerging data lifecycle needs.
“Generative AI is fundamentally reshaping the data lifecycle, creating a far more dynamic mix of ephemeral, cached, and persistently stored content. Most gen AI outputs are short-lived and used only for seconds, minutes, or hours, which increases the need for high-performance infrastructure like DRAM and SSDs to handle rapid iteration, caching, and volatile workflows,” Wright says.
“But at the same time, a meaningful subset of gen AI outputs does persist, such as finalized documents, approved media assets, synthetic training datasets, and compliance-relevant content, and these still rely heavily on cost-efficient, high-capacity HDDs for long-term storage,” he adds. “As gen AI adoption grows, organizations will need data strategies that accommodate this full lifecycle from ultra-fast memory for transient content to robust HDD-based systems for durable archives, because the storage burden is shifting.”
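The ephemeral/cached/persistent split Wright describes can be expressed as a simple retention policy that routes each output to a storage tier. The category names and thresholds below are illustrative assumptions, not an industry standard.

```python
def storage_tier(artifact):
    """Route a gen AI output to a storage tier based on an assumed retention policy.

    `persist` and `ttl_seconds` are hypothetical fields a pipeline might attach
    to each output; the tier names are illustrative.
    """
    if artifact["persist"]:             # finalized docs, approved assets, compliance records
        return "hdd-archive"
    if artifact["ttl_seconds"] <= 60:   # draft completions used for seconds
        return "dram-cache"
    return "ssd-working-set"            # intermediates reused within a session

print(storage_tier({"persist": False, "ttl_seconds": 10}))
print(storage_tier({"persist": False, "ttl_seconds": 3600}))
print(storage_tier({"persist": True, "ttl_seconds": 0}))
```

The design point is that tiering is a policy decision made at write time, driven by the AI workflow, rather than something inferred later by scanning storage.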
4. Automate and add intelligence to the data architecture
Mohanty blames the poor state of enterprise data on “a rift between data producers and data consumers,” with the data being produced going into a “giant pile somewhere, in what’s called data warehouses” with analytics layers then created to make use of it. This approach, he notes, requires a lot of human knowledge and manual effort to make work.
He advises organizations to adopt a data product mindset “to bring data producers and data consumers closer together” and to add automation and intelligence to their enterprise architecture so that AI can identify and access the right data when needed.
CIOs can use Model Context Protocol (MCP) to wrap data and provide that protocol-level access, Mohanty says, noting that such access requires organizations to encode information in their catalogs and tools to ensure data discoverability.
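The pattern Mohanty describes — a catalog that makes data discoverable plus a governed access path — can be sketched without the MCP SDK itself. This is a simplified, dependency-free stand-in for the idea, not MCP’s actual API; the catalog fields (description, owner, schema) and the sample dataset are assumptions.

```python
# Simplified stand-in for an MCP-style data wrapper: one endpoint for
# discovery (what data exists and what it means) and one for access
# (governed reads of cataloged resources only). All names are illustrative.

CATALOG = {
    "orders": {
        "description": "Customer orders, one row per line item",
        "owner": "sales-data-team",
        "schema": {"order_id": "str", "amount": "float"},
    }
}

DATA = {"orders": [{"order_id": "O1", "amount": 42.0}]}

def list_resources():
    """Discovery: lets an AI agent find available data and its meaning."""
    return [{"name": name, **meta} for name, meta in CATALOG.items()]

def read_resource(name):
    """Access: returns data only for resources registered in the catalog."""
    if name not in CATALOG:
        raise KeyError(f"{name} is not in the catalog")
    return DATA[name]

print(list_resources()[0]["name"])
print(read_resource("orders")[0]["amount"])
```

The useful property is that an agent never touches raw storage: everything it can see or read flows through the catalog, which is where governance and ownership metadata live.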
5. Ensure structured and unstructured data is AI-ready
“Structured data is AI-ready when it is consistently formatted, well-governed, and enriched with accurate metadata, making it easy for models to understand and use,” Wright says. “Organizations should prioritize strong data quality controls, master data management, and clear ownership so structured datasets remain reliable, interoperable, and aligned to specific AI use cases.”
Experts stress the need to bring that same discipline to unstructured data, ensuring that unstructured data is also properly tagged, classified, and enriched with metadata so AI systems can understand and retrieve it effectively.
“You need to treat unstructured data as a first-class data asset,” Rege says. “Most of the most interesting AI use cases live in unstructured data like customer service audio calls, messages, and documents, but for many organizations unstructured data remains a blind spot.”
Rege advises storing it in vector databases where information is searchable.
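The tag-and-retrieve approach the experts describe can be sketched in miniature. A real deployment would use an embedding model and a vector database; the bag-of-words “embedding,” the sample documents, and their metadata fields below are stand-ins for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts stand in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Unstructured content tagged with metadata so results stay governable.
documents = [
    {"text": "refund request for damaged order", "source": "support-email", "pii": False},
    {"text": "quarterly revenue summary draft", "source": "finance-doc", "pii": False},
]
index = [(embed(d["text"]), d) for d in documents]

def search(query):
    """Return the document whose vector is closest to the query vector."""
    scored = [(cosine(embed(query), vec), doc) for vec, doc in index]
    return max(scored, key=lambda s: s[0])[1]

hit = search("customer asking about a refund")
print(hit["source"])
```

The point of the metadata tags is that retrieval returns not just text but its provenance, so downstream AI systems can apply access and compliance rules to what they surface.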
6. Consider external data sources and synthetic data
“Organizations should absolutely evaluate whether external or synthetic data is needed when their existing data is incomplete, biased, too small, or poorly aligned with the AI use case they’re trying to pursue,” Wright says, noting that “synthetic data becomes especially useful when real data is sensitive, costly to collect, or limited by privacy, regulatory, or operational constraints.”
7. Implement a high-maturity data foundation incrementally
Don’t wait until data is in a perfect place to start, says Shibani Ahuja, senior vice president of enterprise IT strategy at Salesforce.
“There are organizations that feel they have to get all their data right before they can pull the trigger, but they’re also getting pressure to start on the journey,” she says.
As is the case when maturing most enterprise programs, CIOs and their executive colleagues can — and should — take an incremental approach to building a data program for the AI era.
Ahuja recommends maturing a data program by working outcome to outcome, creating a data strategy and architecture to support one AI-driven outcome and then moving onto subsequent ones.
“It’s a way of thinking: reverse engineering from what you need,” Ahuja says. “Put something in production, make sure you have the right guardrails, observe it, and tweak it so it scales, then put in the next one.”
8. Take a cross-functional approach to data team building
“Data should be supported by a cross-functional ecosystem that includes IT, data governance, security, and the business units that actually use the data to drive decisions,” Wright says. “AI-era data strategy works best when these teams share ownership, where IT teams enable the infrastructure, governance teams ensure trust and quality, and business teams define the context and value.”