Managing world-class data for world-class AI

Presented at the Gartner Data & Analytics Summit 2025 by Nima Vadiee, Managing Vice President, Customer Engineering & Success, Capital One Software.

As more companies race to adopt AI throughout their organizations, the quality of their data will be a critical factor in success. AI systems rely on massive amounts of data to learn and make accurate predictions. Instant decision making in areas like financial fraud detection and healthcare monitoring requires real-time access to data. Organizations with the right data management strategies in place are primed to pull ahead in the AI revolution.

Despite the importance of high-quality data to AI, organizations are experiencing challenges in managing their data ecosystems. According to a Capital One survey, 87% of business leaders see their data ecosystem as ready to build and deploy AI at scale, yet 70% of technical practitioners spend hours daily fixing data issues.

The following tips can help organizations effectively manage high-quality data that is well governed and accessible so they can gain an AI advantage.

Relationship between high-quality data and AI

AI strategies build on a strong data foundation. At Capital One, AI and ML use cases are in production across the organization. Our ability to leverage AI is a result of our modern tech stack, our data and our talent. 

Data and AI are closely intertwined, working together in a cycle that leads to iterative improvements over time. For example, AI helps build customer experiences that are more personalized and lead to better engagement. Stronger engagement fuels better data insights. These insights then feed better AI that makes more accurate predictions for improved customer experiences. This cyclical relationship means well-managed, high-quality data is essential to the effective use of AI.

[Graphic: Role of AI]

Data challenges hindering great AI

Today’s data ecosystems are big, diverse and fast, adding to the complexity of managing data for AI. 

Data volume and complexity

Organizations must navigate the complexities of managing data along three dimensions: volume, variety and velocity. From exponentially growing amounts of data to the variety of formats and the incredible speed at which data is generated, enterprises must overcome these obstacles to meet the demands of AI.   

  • Volume: The amount of data created every day is growing at an unprecedented pace and is expected to double in five years, according to Statista. This rapid generation of data, driven by factors like the increased use of connected devices, the adoption of cloud computing and now the training of AI models, requires enterprises to prepare the data infrastructure and compute resources necessary to store and manage massive amounts of data.

  • Variety: Working only with well-defined structured data is a thing of the past. Data today arrives from many different sources in various formats, including PDFs, emails, videos, social media, streaming services and Internet of Things devices such as smartwatches and connected cars. Most of this data is unstructured, and unstructured data accounts for 80 to 90% of all data, according to MIT. Failing to tap into that data for use in AI leaves companies at a significant disadvantage.

  • Velocity: Data today is generated and moves through systems at an incredible pace, making it imperative for enterprises to capture and process data efficiently in real time. At the same time, customers and employees expect experiences that rely on instant delivery of data and quick decision making. The need for speed makes applying the right governance to data challenging.

The AI revolution is raising the stakes on how well companies handle these challenges in their data ecosystems. Poorly managed data ecosystems lead to bad data, which in turn becomes a significant obstacle to trustworthy, efficient AI.

Data quality

When data professionals are unable to rely on data for its consistency and accuracy, trust in using that data for important use cases like AI and analytics erodes. As a result, professionals spend more time checking, validating and fixing data than focusing on innovation. Data quality is the top data integrity challenge, according to 64% of data and analytics professionals surveyed. Additionally, data organization is a key struggle in implementing AI, with roughly 1 in 2 professionals citing challenges in organizing structured data for machine learning and unstructured data for retrieval-augmented generation, a technique used to improve the accuracy of large language models. A strong data governance program is necessary to ensure data, both structured and unstructured, is well organized and adheres to established standards.

64% of data and analytics professionals identified data quality as their top data integrity challenge. (Precisely, 2025 Outlook: Data Integrity Trends and Insights)

Data access

The ability to access and process data in real time is increasingly a requirement for AI success. To be effective, AI systems need the most up-to-date data to make instant decisions and adjust predictions as circumstances change, particularly in situations like autonomous driving and anomaly detection in manufacturing. Yet 62% of IT professionals cite real-time data access as the area requiring the most attention for AI success. Also undermining efforts is the lost opportunity in untapped data, with 68% of the data available to enterprises remaining unused, according to recent research.

Data issues can stand in the way of unlocking AI’s full potential for enterprises. But with the right data fundamentals in place, organizations can establish a data foundation for AI acceleration.

How to establish a strong data foundation for AI

How can enterprises build a strong data foundation? The fundamentals come down to establishing data standards, automating governance and committing to a culture that treats data with reverence. 

  • Data standards drive consistency, interoperability and ease of use.

  • Automated governance enforces standards universally.

  • Reverence for data leads an organization to make foundational investments in a great data ecosystem that attracts and trains talent and instills a data-first culture.

Let’s explore the principles we at Capital One have learned on our own data journey that enable us to take advantage of AI.

Data standards that achieve consistency

The data lifecycle can be complex and unforgiving, involving many steps and tools. Uncertainty exists around data lineage including where data came from, when it will be available and whether it is trustworthy. The latest technologies and tools bring new capabilities and benefits, but can introduce data sprawl, silos and inconsistencies. A commitment to data standards is necessary to address these complexities.

Definitions and formats, such as naming conventions and field data definitions, allow for a common understanding and interpretation of an organization’s data. Up-front standardizations, such as an established DateTime format and a set number of decimal places for currency values, save data professionals from time-consuming data cleansing tasks.
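
To make this concrete, here is a minimal, hypothetical sketch of up-front standardization at ingestion. The ISO 8601 DateTime format and the two-decimal currency rule are illustrative assumptions for the example, not Capital One's actual conventions.

```python
# Hypothetical sketch: normalizing timestamps and currency values at ingestion
# so downstream consumers never have to re-clean them.
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

# Assumed organization-wide standards (illustrative only)
STANDARD_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # ISO 8601, UTC
CURRENCY_DECIMAL_PLACES = Decimal("0.01")          # two decimal places

def standardize_datetime(raw: str, source_format: str) -> str:
    """Parse a source-specific timestamp and emit the standard UTC format."""
    parsed = datetime.strptime(raw, source_format).replace(tzinfo=timezone.utc)
    return parsed.strftime(STANDARD_DATETIME_FORMAT)

def standardize_currency(raw: str) -> Decimal:
    """Round a currency value to the agreed number of decimal places."""
    return Decimal(raw).quantize(CURRENCY_DECIMAL_PLACES, rounding=ROUND_HALF_UP)

print(standardize_datetime("03/15/2025 14:30", "%m/%d/%Y %H:%M"))  # 2025-03-15T14:30:00Z
print(standardize_currency("19.999"))                              # 20.00
```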

Additionally, organizations need to take the time to define and codify the requirements of their standardizations. Basic questions need answering, such as when metadata is complete, whether all data needs to be registered, when data retention requirements change, who can access data and what approvals are necessary. Applying these requirements consistently is necessary for democratizing data and building trust and usability across an organization.
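
As an illustration of codifying one such requirement, the hypothetical check below treats a dataset's metadata as complete only when a defined set of fields is populated. The field names are assumptions for the sketch.

```python
# Hypothetical codification of one standard: what "complete metadata" means
# before a dataset may be registered. Field names are illustrative.
REQUIRED_METADATA_FIELDS = {"owner", "description", "retention_days", "sensitivity", "source_system"}

def metadata_is_complete(metadata: dict) -> bool:
    """Registration is accepted only when every required field is populated."""
    return all(metadata.get(field) not in (None, "") for field in REQUIRED_METADATA_FIELDS)

print(metadata_is_complete({
    "owner": "card-analytics",
    "description": "Daily card transactions",
    "retention_days": 365,
    "sensitivity": "high",
    "source_system": "ledger",
}))  # True
print(metadata_is_complete({"owner": "card-analytics"}))  # False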

Lastly, controls provide a standard, testable framework for a data ecosystem. Controls ensure that data arrived at the right place and on time, that it is accurate and complete, and that data quality checks were performed.
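
A minimal sketch of what such controls might look like as code follows, with hypothetical dataset names, deadlines and thresholds; the point is that each control is codified and can be run automatically against every delivery.

```python
# Illustrative sketch of controls as testable checks on a dataset delivery.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Delivery:
    dataset: str
    landed_at: datetime        # when the data actually arrived
    expected_by: datetime      # agreed delivery deadline
    row_count: int
    expected_min_rows: int
    null_key_rows: int         # rows missing a required key field

def timeliness_control(d: Delivery) -> bool:
    """Did the data arrive at the right place on time?"""
    return d.landed_at <= d.expected_by

def completeness_control(d: Delivery) -> bool:
    """Is the delivery at least as large as expected, with no missing keys?"""
    return d.row_count >= d.expected_min_rows and d.null_key_rows == 0

delivery = Delivery(
    dataset="transactions_daily",
    landed_at=datetime(2025, 3, 15, 6, 10, tzinfo=timezone.utc),
    expected_by=datetime(2025, 3, 15, 7, 0, tzinfo=timezone.utc),
    row_count=1_204_331,
    expected_min_rows=1_000_000,
    null_key_rows=0,
)
print(timeliness_control(delivery), completeness_control(delivery))  # True True
```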

Automated governance

While data standards are a good starting point, automation is where enterprises can reach economies of scale. Through automation of standards, organizations take the guesswork out of achieving consistency across the enterprise and its various teams. 

Scaling a data ecosystem requires equipping teams to work together. Key roles, or data personas, within an enterprise include those in publishing, consumption, governance and infrastructure management. In publishing, application developers may stream events from their applications, and data engineers may develop pipelines that combine that data with other sources to create new datasets. Consumption could include business analysts looking to make data-driven decisions or an ML engineer training a model. A risk manager is responsible for governance, helping to define and enforce policies. And in infrastructure management, a DevOps or platform engineer oversees platform management.

Asking each of these stakeholders to drive standardization on their own can lead to inconsistencies across hundreds or thousands of individuals and many point solutions. Organizations need a means to scale their data standards to apply AI effectively. This is where automated governance comes in. An organization can form the strongest best practices and standards, but none of the standards will be adopted unless they can be baked into the data experience for all users. 

Scaling governance through unified platforms

A key piece of the strategy for driving adoption of standards across an organization is meeting users where they already work. This type of scale can be achieved through automation. Data should run through unified platforms with a common UI. A data producer should be given a common set of APIs, libraries, SDKs and platforms that provide a way to publish data to the right place, at the right time and in the right format. For data consumers, data should be trustworthy and easy to discover, access and explore. At the same time, standardized governance requirements should be embedded in each step of the process and built into each tool or platform in use. Lastly, the business data platform owner needs common tools, frameworks, languages and monitoring mechanisms for infrastructure management. A unified platform for automated governance allows each stakeholder to operate in their role quickly, with the assurance that they are adhering to common standards and policies.
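
As a rough sketch of the idea (the function, field names and checks are hypothetical, not an actual Capital One Software API), a unified publish call might embed governance checks directly in the path, so a producer cannot publish data that skips the standards:

```python
# Hypothetical sketch of a unified-platform publish call with governance
# embedded in the call itself rather than left to each team to remember.
REQUIRED_METADATA = {"owner", "description", "retention_days", "sensitivity"}

def publish_dataset(name: str, records: list[dict], metadata: dict) -> None:
    """Publish records to a governed dataset, enforcing standards on the way in."""
    missing = REQUIRED_METADATA - {k for k, v in metadata.items() if v not in (None, "")}
    if missing:
        raise ValueError(f"{name}: metadata incomplete, missing {sorted(missing)}")
    if any("transaction_id" not in record for record in records):  # illustrative required key
        raise ValueError(f"{name}: records missing required key 'transaction_id'")
    # A real platform would write to governed storage, register lineage and
    # emit audit events here; this sketch only acknowledges the publish.
    print(f"Published {len(records)} records to governed dataset '{name}'")

publish_dataset(
    "transactions_daily",
    [{"transaction_id": "t-001", "amount": "20.00"}],
    {"owner": "card-analytics", "description": "Daily card transactions",
     "retention_days": 365, "sensitivity": "high"},
)
```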

[Graphic: Self-service portal]

Self-service with automation

An important benefit of automation is the self-service capabilities made available to data users. Self service can help democratize data access across an enterprise, as shown in the example below for a day-to-day data consumer and in the sketch that follows the list.

Data consumer workflow
  • Discover: A self-service portal, for example, provides a consistent way to search for and find relevant data that users can trust. This is where strong metadata requirements can be helpful, with the platform providing recommendations based on user needs and roles.

  • Understand: Data consumers also gain important information about a dataset such as the lineage, ownership, usage and quality of the data.

  • Request and approve: Through self service, a data consumer can also access data easily within an automated request and approval process that ensures data is available to the right user for the right purposes.
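
Here is a minimal sketch of that workflow against a hypothetical in-memory catalog. The dataset names, fields and purposes are assumptions for illustration; a real self-service portal would back each step with its metadata store and approval system.

```python
# Hypothetical discover / understand / request-and-approve workflow
# against a tiny in-memory catalog.
CATALOG = {
    "transactions_daily": {
        "description": "Daily card transactions",
        "owner": "card-analytics",
        "lineage": ["ledger", "transactions_raw"],
        "quality_score": 0.98,
        "allowed_purposes": {"fraud-modeling", "reporting"},
    },
}

def discover(keyword: str) -> list[str]:
    """Discover: search the catalog for datasets matching a keyword."""
    return [name for name, meta in CATALOG.items()
            if keyword.lower() in name or keyword.lower() in meta["description"].lower()]

def understand(name: str) -> dict:
    """Understand: surface lineage, ownership and quality for a dataset."""
    meta = CATALOG[name]
    return {"owner": meta["owner"], "lineage": meta["lineage"], "quality_score": meta["quality_score"]}

def request_access(name: str, purpose: str) -> bool:
    """Request and approve: grant access automatically only for approved purposes."""
    return purpose in CATALOG[name]["allowed_purposes"]

matches = discover("transactions")
print(matches)                                       # ['transactions_daily']
print(understand(matches[0]))                        # lineage, ownership and quality details
print(request_access(matches[0], "fraud-modeling"))  # True
```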

Reverence for data

A business may standardize and automate its data management, but that can only go so far without a data-driven culture that empowers talent to make use of that data. In 2024, a Capital One survey found that business leaders consider data culture a top indicator of AI success, but only 35% of respondents said they have a strong data culture. Gartner predicts that 30% of generative AI projects will be abandoned this year due to issues like poor data quality.

Leaders within an organization must reinforce the value of data, hire the right talent and educate all users. Reinforcing the value of data means cultivating an environment that prioritizes data literacy and data-driven decisions. Data should be embedded at every level of decision making, and teams should produce valuable data with the expectation that it could be used by other parts of the business.

A strong culture that values data also means hiring the right talent. Given the diversity of data roles, we recommend that businesses start with a strategy around the roles and talent profiles that are right for their organization. Even in non-data roles, a foundation in data and a curiosity about it are important for moving forward on innovations and new ideas.

A commitment to educating all users in an organization is another important way to invest in a vibrant data culture. Enterprises can build training and tools to support development and growth in data. At Capital One, we help our associates level up their data skills by offering an in-house tech college and data-focused learning events. 

Establishing a data advantage

Following these data principles at Capital One has enabled us to build a strong, scalable data ecosystem that produces high-quality, well-managed data. The stakes are high for companies looking to take advantage of AI, and starting with and maintaining a solid data foundation is instrumental to success. Investing in a scalable data ecosystem through data standards, automated governance and a strong data culture will elevate your ability to build and deploy AI.


Nima Vadiee, Managing Vice President and Head of Customer Engineering & Success

Nima Vadiee is the Managing Vice President and Head of Customer Engineering & Success at Capital One Software, an enterprise B2B software business of Capital One. Nima and his team are responsible for ensuring that Capital One Software customers are maximizing the value of their products and services. Nima also leads partnership strategy and relationship management for Capital One Software. With 20 years of experience in engineering, product management and go-to-market, Nima is an expert in developing strategies for companies that are operating in emerging tech markets.
