Data warehouse optimization

Learn data warehouse optimization techniques to help reduce cloud spend and improve performance at scale.

What is data warehouse optimization?

As businesses migrate to the cloud and establish data warehouses to manage their data at scale, optimizing these warehouses becomes a highly effective way to reduce cloud spend and improve performance.

Data warehouse optimization is the continuous process of improving the performance, efficiency and effectiveness of a business’s data warehouse. This includes adjusting warehouse sizing, optimizing query performance and managing resource utilization to balance cost and performance. An optimized warehouse can lead to efficient provisioning of computing capacity, helping businesses minimize waste while improving the overall performance of their systems.

Techniques for optimizing data warehouses

One of the major challenges Capital One faced when moving to the cloud was proactively right-sizing data warehouses without pulling our data teams away from driving business value to manage warehouses. We came away with the following best practices to best manage our warehouses in the cloud.  

Meet Capital One Slingshot

A solution to help businesses scale their Snowflake Data Cloud

1. One warehouse size does not fit all

By scheduling your warehouse to change sizes dynamically depending on the workload size, time of day and day of the week, you pay based on your company’s usage patterns.

2. Set the right auto-suspend time

Reducing the auto-suspend time for a warehouse from 15 minutes to 2 minutes, for example, can greatly decrease idle times and save on costs. Paying attention to warehouse trends will allow you to set an auto-suspend time that helps improve usage and cost efficiency.

3. Configure multi-cluster correctly

At Capital One, we adopted Snowflake as our data cloud because of the near-infinite scalability. When configuring a multi-cluster warehouse, you need to consider how fast a response time is necessary. Significant cost savings are possible if you can afford to wait a few minutes of high load before spinning up a new cluster.

4. Keep different sizes and types of workloads separate

Each team or group has its own requirements and the data warehouse should be rightsized for that team’s needs. Keeping workloads that are different sizes and types separated leads to greater efficiency and cost savings.

How to right-size your warehouses

We covered how the warehouse size, scaling policy and auto-suspend time can affect the cost of running workloads. But how do we determine the rightsize for a warehouse? There are four key metrics to consider:

1. Query size

Understanding the distribution of typical query sizes that run in a warehouse will help determine the best size. You may find almost all queries are small and you do not actually need that XL warehouse you created by default.

2. Data spillage

When a data warehouse runs out of memory while executing a query, the data spillage that occurs significantly slows down query execution and degrades performance. Viewing the spillage patterns can help you identify queries that tend to lead to spillage and how much disk storage is necessary to complete the queries.

3. Query load

This metric helps organizations understand the time required for a given query to run in a data warehouse. If you decide to scale up the size to accommodate, there will be a tradeoff with query times decreasing while costs go up.

4. Queued queries

Reviewing the total number of queries that entered a queued state, usually due to the load on the warehouse, will help with understanding the minimum and maximum cluster settings that are ideal to scale.

Balancing warehouse cost and performance with Slingshot

There are many factors to consider when right-sizing your warehouses and managing these decisions effectively takes time. We built Capital One Slingshot to help businesses optimize their Snowflake investments by finding the right balance between warehouse cost and performance. 

Warehouse provisioning

Slingshot’s warehouse scheduling tool allows users to set dynamic schedules using templates to automatically provision warehouses. The ability to tune warehouse sizes and scaling policies, apply schedules and manage inactivity controls to rightsize warehouses based on need means you only pay for the compute you need, when you need it.

Warehouse recommendations

Slingshot’s warehouse recommendations identify when a warehouse is under or overutilized and provide optimized size and cluster settings on a schedule. The recommendation also includes an estimated cost savings where applicable to apply tangible business value to the change. For example, Slingshot may recommend a schedule for a warehouse that estimates a 20.5% per month savings with just a 15-second increase in query execution time.

Screenshot of Capital One Slingshot’s UI showing the warehouse scheduling tool

Slingshot’s warehouse scheduling tool

Optimizing your data warehouse for cost efficiency and scalability

The near-infinite scalability of data in the cloud brings incredible opportunities for new data insights and use cases, but managing all that data can present challenges without proper oversight. Following best practices and leveraging tools to help proactively manage and rightsize warehouses will help you optimize costs, improve efficiency and scale faster.


John Conrad, Solutions Architect, Capital One Software

John Conrad is a Solutions Architect at Capital One. He has worked with containers and Kubernetes since 2016 where he helped deliver a K8s workshop for 300 partners. Prior to joining Capital One, John was a WorldWide Technical Sales Leader for IBM Collaboration Solutions where he had the privilege to travel to 20+ countries. John has 25 years experience in software and technical sales and holds a Computer Science degree from the University of Kentucky. You can connect with him on LinkedIn (https://www.linkedin.com/in/johnmartinconrad/).

Related Content

Article | April 10, 2024
Understanding & Confidently Applying Slingshot’s Warehouse Recommendations
Article | May 28, 2024
govern responsibly with capital one slingshot
Article | September 25, 2023