When it comes to AI, not all data is created equal
Gen AI is becoming a disruptive influence on nearly every industry, but using the best AI models and tools isn’t enough. Everybody’s using the same ones. What really creates competitive advantage is being able to train and fine-tune your own models, or provide unique context to them, and that requires data.
Your company’s extensive code base, documentation, and change logs? That’s data for your coding agents. Your library of past proposals and contracts? Data for your writing assistants. Your customer databases and support tickets? Data for your customer service chatbot.
But just because all this data exists doesn’t mean it’s good.
“It’s so easy to point your models to any data that’s available,” says Manju Naglapur, SVP and GM of cloud, applications, and infrastructure solutions at Unisys. “For the past three years, we’ve seen this mistake made over and over again. The old adage, garbage in, garbage out, still holds true.”
According to a Boston Consulting Group survey released in September, 68% of 1,250 senior AI decision makers said the lack of access to high-quality data was a key challenge when it came to adopting AI. Other recent research confirms this. In an October Cisco survey of over 8,000 AI leaders, only 35% of companies had clean, centralized data with real-time integration for AI agents. And by 2027, according to IDC, companies that don’t prioritize high-quality, AI-ready data will struggle to scale gen AI and agentic solutions, resulting in a 15% productivity loss.
Losing track of the semantics
Another problem with data that’s all lumped together is that the semantic layer gets confused. When data comes from multiple sources, the same type of information can be defined and structured in many ways. And as the number of data sources proliferates due to new projects or new acquisitions, the challenge increases. Even keeping track of customers, the most critical data type, and handling basic data issues is difficult for many companies.
Dun & Bradstreet reported last year that more than half of organizations surveyed have concerns about the trustworthiness and quality of the data they’re leveraging for AI. For example, in the financial services sector, 52% of companies say AI projects have failed because of poor data. And for 44%, data quality is their biggest concern for 2026, second only to cybersecurity, based on a survey of over 2,000 industry professionals released in December.
Having multiple conflicting data standards is a challenge for everybody, says Eamonn O’Neill, CTO at Lemongrass, a cloud consultancy.
“Every mismatch is a risk,” he says. “But humans figure out ways around it.”
AI can also be configured to do something similar, he adds, if you understand what the challenge is and dedicate time and effort to address it. Even if the data is clean, a company should still go through a semantic mapping exercise. And if the data isn’t perfect, it’ll take time to tidy it up.
“Take a use case with a small amount of data and get it right,” he says. “That’s feasible. And then you expand. That’s what successful adoption looks like.”
Unmanaged and unstructured
Another mistake companies make when connecting AI to company information is to point AI at unstructured data sources, says O’Neill. And, yes, LLMs are very good at reading unstructured data and making sense of text and images. The problem is that not all documents are worthy of the AI’s attention.
Documents could be out of date, for example. Or they could be early versions of documents that haven’t been edited yet, or that have mistakes in them.
“People see this all the time,” he says. “We connect your OneDrive or your file storage to a chatbot, and suddenly it can’t tell the difference between ‘version 2’ and ‘version 2 final.’”
It’s very difficult for human users to maintain proper version control, he adds. “Microsoft can handle the different versions for you, but people still do ‘save as’ and you end up with a plethora of unstructured data,” O’Neill says.
Losing track of security
When CIOs think about security as it relates to AI systems, they typically consider guardrails on the models, or protections around the training data and the data used for RAG embeddings. But as chatbot-based AI evolves into agentic AI, the security problems get more complex.
Say, for example, there’s a database of employee salaries. If an employee has a question about their salary and asks an AI chatbot embedded in their AI portal, the RAG embedding approach would be to collect only the relevant data from the database using traditional code, embed it into the prompt, then send the query off to the AI. The AI only sees the information it’s allowed to see, and the traditional, deterministic software stack handles the problem of keeping the rest of the employee data secure.
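The scoped-retrieval pattern described above can be sketched in a few lines. This is an illustrative outline, not any vendor’s implementation: the salary table, function names, and the stubbed `send_to_llm` call are all hypothetical, standing in for a real database query and model API.

```python
# Hypothetical sketch of scoped RAG: deterministic code fetches only the
# requesting employee's record, so the model never sees anyone else's data.
SALARIES = {
    "alice": {"base": 95000, "bonus": 5000},
    "bob": {"base": 88000, "bonus": 4000},
}

def send_to_llm(prompt: str) -> str:
    # Stub standing in for an actual LLM request.
    return f"[model response based on: {prompt}]"

def answer_salary_question(employee_id: str, question: str) -> str:
    # Traditional, deterministic access control: look up one row only.
    record = SALARIES.get(employee_id)
    if record is None:
        return "No salary record found."
    # Embed only that employee's data into the prompt context.
    prompt = (
        f"Context (visible to the model): {record}\n"
        f"Question from {employee_id}: {question}"
    )
    return send_to_llm(prompt)
```

The key property is that the filtering happens in ordinary code before the model is ever invoked, so the security guarantee does not depend on the model’s behavior.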
But when the system evolves into an agentic one, the AI agents can query the databases autonomously via MCP servers. Since they need to be able to answer questions from any employee, they require access to all employee data, and keeping that data from getting into the wrong hands becomes a big task.
According to the Cisco survey, only 27% of companies have dynamic and detailed access controls for AI systems, and fewer than half feel confident in safeguarding sensitive data or preventing unauthorized access.
And the situation gets even more complicated if all the data is collected into a data lake, says O’Neill.
“If you’ve put in data from lots of different sources, each of those individual sources might have its own security model,” he says. “When you pile it all into block storage, you lose that granularity of control.”
Trying to add the security layer back in after the fact can be difficult. The solution, he says, is to go directly to the original data sources and skip the data lake entirely. The data lake’s original appeal, after all, was something else.
“It was about keeping history forever because storage was so cheap, and machine learning could see patterns over time and trends,” he says. “Plus, cross-disciplinary patterns could be spotted if you mix data from different sources.”
In general, data access changes dramatically when AI agents, rather than humans, are involved, says Doug Gilbert, CIO and CDO at Sutherland Global, a digital transformation consultancy.
“With humans, there’s a tremendous amount of security that lives around the human,” he says. “For example, most user interfaces have been written so if it’s a number-only field, you can’t put a letter in there. But once you put in an AI, all that’s gone. It’s a raw back door into your systems.”
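One way to think about Gilbert’s point is that the deterministic checks a UI once enforced can be re-imposed at the agent’s tool boundary instead. The sketch below is a hypothetical illustration, not a pattern from the article: the field names (`account_id`, `note`) and rules are invented for the example.

```python
# Illustrative sketch: re-imposing the deterministic input checks a UI
# would normally enforce, but at the AI-agent tool boundary instead.
# Field names and validation rules here are hypothetical.
def validate_tool_args(args: dict) -> dict:
    """Reject malformed agent input before it reaches back-end systems."""
    errors = []
    # A "number-only field": must be digits, just as the UI would enforce.
    if not str(args.get("account_id", "")).isdigit():
        errors.append("account_id must be numeric")
    # Bound free-text fields so an agent can't pass oversized payloads.
    note = args.get("note", "")
    if not isinstance(note, str) or len(note) > 500:
        errors.append("note must be a string of at most 500 characters")
    if errors:
        raise ValueError("; ".join(errors))
    return args
```

Because the check runs in plain code between the agent and the database, a letter in a number-only field is rejected no matter what the model generates.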
The speed trap
But the number-one mistake Gilbert sees CIOs making is they simply move too fast. “This is why most projects fail,” he says. “There’s such a race for speed.”
Too often, CIOs look at data issues as slowdowns, but all those things are massive risks, he adds. “A lot of people doing AI projects are going to get audited and they’ll have to stop and redo everything,” he says.
So getting the data right isn’t a slowdown. “When you put the proper infrastructure in place, then you speed through your innovation, you pass audits, and you have compliance,” he says.
Another area that might feel like a waste of time is testing. It’s not always a good strategy to move fast, break things, and then fix them after deployment.
“What’s the cost of a mistake that moves at the speed of light?” he asks. “I would always go to testing first. It’s amazing how many products we see that are pushed to market without any testing.”
Putting AI to work to fix the data
The lack of quality data might feel like a hopeless problem that’s only going to get worse as AI use cases expand.
In an October AvePoint report based on a survey of 775 global business leaders, 81% of organizations have already delayed deployment of AI assistants due to data management or data security issues, with an average delay of six months.
Meanwhile, not only does the number of AI projects continue to grow, but so does the amount of data. Nearly 52% of respondents also said their companies were managing more than 500 petabytes of data, up from just 41% a year ago.
But Unisys’ Naglapur says it’s going to become easier to get a 360-degree view of a customer, and to clean up and reconcile other data sources, because of AI.
“This is the paradox,” he says. “AI will help with everything. If you think about a digital transformation that would take three years, you can do it now in 12 to 18 months with AI.” The tools are getting closer to reality, and they’ll accelerate the pace of change, he says.
