Tips for managing your Snowflake Data Cloud at scale

Learn how Capital One adopted Snowflake Data Cloud for real-time data analysis at scale with three best practices.

Data has been at the heart of Capital One since its founding. We believe in the power of data to drive insights and empower people to deliver real-time solutions to our millions of customers. Of course, the amount of data we analyze has skyrocketed over the last thirty years, making it more difficult to share data across the company and derive insights in real time. That’s where Snowflake comes in with its cloud data platform.

Snowflake separated data storage from compute for relational data warehouses—and for customers like Capital One, that means our hardware no longer limits us. Instead of racking up technical debt, we can focus on our data and what we do best: build personalized customer experiences that transform people’s relationship with their money.

Our unique journey with Snowflake

Capital One is the first U.S. bank to exit its on-premises data centers and go all in on the cloud, and we’ve written a great deal about our cloud journey and our learnings. We exited our data centers because we worked hard not to be burdened by legacy technologies, technical debt and silos.

As we worked to modernize our data operations in the cloud, we adopted Snowflake to enable our more than 6,000 analysts to run millions of queries with no degradation in performance. We needed performance that could scale infinitely and instantly for any workload, and would allow multiple lines of business to seamlessly share data with proper fine-grained access control.
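
To make that fine-grained sharing concrete, here is a minimal sketch using Snowflake’s standard role-based access control. The role, warehouse and object names are hypothetical, not our production setup.

```sql
-- Hypothetical example: give a line of business read-only access to a
-- shared schema through its own role and its own virtual warehouse.
CREATE ROLE IF NOT EXISTS card_analyst;

-- Read-only access to shared data, scoped to a single schema.
GRANT USAGE ON DATABASE shared_data TO ROLE card_analyst;
GRANT USAGE ON SCHEMA shared_data.transactions TO ROLE card_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA shared_data.transactions TO ROLE card_analyst;

-- Compute stays isolated: the team queries through its own warehouse.
GRANT USAGE ON WAREHOUSE card_analyst_wh TO ROLE card_analyst;
GRANT ROLE card_analyst TO USER some_analyst;
```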

What is Snowflake Data Cloud?

Snowflake Data Cloud is a cloud-based data platform that provides a wide range of data management and analytics services. It's designed to solve common data management challenges in the cloud, such as accessibility, availability and performance.

What sets Snowflake apart is its data sharing capabilities. With Snowflake, multiple analysts can access the same data without affecting each other’s performance. In concrete terms, Snowflake allows our credit card team to run intensive queries without affecting the performance of other teams querying that same data. At the same time, we can have ETL jobs running different compute tasks on the same data without impacting anyone else.
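
A minimal sketch of that isolation, with hypothetical warehouse and table names: each team queries the same stored data through its own virtual warehouse, so one team’s heavy workload never contends with another’s compute.

```sql
-- Hypothetical example: two teams, two independent warehouses, one dataset.
CREATE WAREHOUSE IF NOT EXISTS card_team_wh WAREHOUSE_SIZE = 'MEDIUM';
CREATE WAREHOUSE IF NOT EXISTS etl_wh WAREHOUSE_SIZE = 'LARGE';

-- The card team runs an intensive analytical query...
USE WAREHOUSE card_team_wh;
SELECT customer_id, SUM(amount)
FROM shared_data.transactions.card_activity
GROUP BY customer_id;

-- ...while an ETL job scans the same table on separate compute,
-- with no performance impact on the card team.
USE WAREHOUSE etl_wh;
INSERT INTO analytics.daily_summary  -- hypothetical target table
SELECT CURRENT_DATE, COUNT(*) FROM shared_data.transactions.card_activity;
```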

Implementing strategic controls and creative integration with Snowflake for optimal data utilization

Snowflake is so flexible and efficient that you can quickly go from “data starved” to “data drunk.” To avoid that data avalanche and its associated costs, we worked to put some controls in place before our users migrated to Snowflake. For example, users cannot select a larger cluster than their workload requires or run workloads in a manner that never allows a Snowflake warehouse to suspend.
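
One concrete shape such guardrails can take is enforcing size, auto-suspend and budget settings at provisioning time. The sketch below is illustrative rather than our actual implementation; the warehouse name, quota and thresholds are hypothetical.

```sql
-- Hypothetical guardrails applied when a warehouse is provisioned: cap the
-- size and force the warehouse to suspend when idle so credits stop accruing.
CREATE WAREHOUSE IF NOT EXISTS analyst_wh
  WAREHOUSE_SIZE = 'SMALL'   -- sized to the workload, not the maximum
  AUTO_SUSPEND   = 60        -- suspend after 60 seconds of inactivity
  AUTO_RESUME    = TRUE;     -- resume transparently on the next query

-- A resource monitor backs the guardrail with a hard budget
-- (creating one requires the ACCOUNTADMIN role).
CREATE RESOURCE MONITOR IF NOT EXISTS analyst_wh_budget
  WITH CREDIT_QUOTA = 100
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE analyst_wh SET RESOURCE_MONITOR = analyst_wh_budget;
```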

Also, as a technology company in financial services, we operate in a regulated environment. Our model is unique, but our journey with Snowflake applies to any company that operates within a regulated industry, and in many ways to any company that must get value from its data.

To generate the most value, organizations need to integrate tools like Snowflake thoughtfully and, at times, creatively. We figured out how to take advantage of Snowflake’s speed and flexibility—while providing the kind of traceability a heavily regulated company like ours requires. Also, being a bank, we understand a thing or two about budgets. So we devised a way to ensure that usage levels were reasonable and on budget.

3 tips for harnessing Snowflake Data Cloud

1. Streamline onboarding with self-service processes and solutions

To provision and manage compute or storage resources, Capital One created an online self-service portal that gives teams the data resources they need on demand. But our tools also fit into existing processes and organizational structures to control costs and ensure best practices are followed.
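
As an illustration of what such a portal might issue under the hood, the sketch below provisions a warehouse and labels it with Snowflake’s object tagging so cost and ownership can roll up by team. All names and values are hypothetical.

```sql
-- Hypothetical provisioning step a self-service portal might run:
-- create the warehouse, then tag it so spend is attributable to a team.
CREATE TAG IF NOT EXISTS cost_center;  -- created in the current schema

CREATE WAREHOUSE IF NOT EXISTS fraud_team_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

ALTER WAREHOUSE fraud_team_wh SET TAG cost_center = 'fraud-analytics';
```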

2. Ensure you track and optimize resources to control cost

With Snowflake, your company unlocks access to data—the data flow is the difference between a garden hose and a fire hose. It’s important to manage and track usage, as costs can rise due to faulty configurations or inefficient queries. While it’s possible to centralize Snowflake access and provisioning through a department head, that method can reintroduce the bottlenecks you were trying to get rid of when you opted for Snowflake in the first place.

Capital One developed a dashboard interface that puts performance and cost management into the hands of key decision-makers without slowing down the overall process. It generates alerts when there is a sudden increase in cost and automatically recommends a way to remediate. In short, you find out right away if something goes wrong.
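
As a hedged sketch of the kind of check such a dashboard might run, the query below compares each warehouse’s credit burn for the current day against its trailing daily average, using Snowflake’s standard ACCOUNT_USAGE views. The 1.5x threshold and eight-day window are arbitrary illustrations.

```sql
-- Hypothetical spike detection: flag warehouses whose credits today
-- exceed 1.5x their average daily burn over the prior week.
WITH daily AS (
  SELECT
    warehouse_name,
    DATE_TRUNC('day', start_time) AS usage_day,
    SUM(credits_used)             AS credits
  FROM snowflake.account_usage.warehouse_metering_history
  WHERE start_time >= DATEADD('day', -8, CURRENT_TIMESTAMP())
  GROUP BY 1, 2
)
SELECT
  today.warehouse_name,
  today.credits      AS credits_today,
  AVG(hist.credits)  AS trailing_avg_credits
FROM daily AS today
JOIN daily AS hist
  ON hist.warehouse_name = today.warehouse_name
 AND hist.usage_day < today.usage_day
WHERE today.usage_day = DATE_TRUNC('day', CURRENT_TIMESTAMP())
GROUP BY today.warehouse_name, today.credits
HAVING today.credits > 1.5 * AVG(hist.credits);
```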

3. Govern securely and transparently

As data becomes pervasive, ensuring it’s managed responsibly grows increasingly critical. As a heavily regulated company, Capital One has built a traceability solution into our Snowflake system that enables approval workflows and data logging to support data remediation and retention use cases.
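
Snowflake’s built-in ACCESS_HISTORY view (an Enterprise Edition feature) is one natural starting point for that kind of traceability. The query below is illustrative rather than our production solution; the table name is hypothetical.

```sql
-- Hypothetical audit query: who read a sensitive table in the last 30 days?
SELECT
  ah.query_start_time,
  ah.user_name,
  obj.value:"objectName"::STRING AS object_name
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.direct_objects_accessed) AS obj
WHERE obj.value:"objectName"::STRING = 'SHARED_DATA.TRANSACTIONS.CARD_ACTIVITY'
  AND ah.query_start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY ah.query_start_time DESC;
```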

Using Snowflake for data-driven decision making

At Capital One, we’re believers in Snowflake because it enables us to harness data and put it to work. But as with any technology, organizations must take a 360-degree look at what’s required to integrate any solution. Technology on its own is just a resource; as our use case demonstrates, we must also think creatively.

At Snowflake Summit 2023, Capital One Software was named the Powered by Snowflake Innovation Partner of the Year.


Salim Syed, Vice President and Head of Engineering, Capital One Software

Salim Syed is Vice President and Head of Engineering for Capital One Software. He led Capital One’s data warehouse migration to AWS and is a specialist in deploying Snowflake to a large enterprise. Salim’s expertise lies in developing Big Data (Lake) and Data Warehouse strategy on the public cloud. Salim has more than 25 years of experience in the data ecosystem. His career started in data engineering where he built data pipelines and then moved into maintenance and administration of large database servers using multi-tier replication architecture in various remote locations. He then worked at CodeRye as a database architect and at 3M Health Information Systems as an enterprise data architect. He has a bachelor’s degree in math and computer science from Lewis & Clark College and a master’s degree from George Washington University.
