Applying Slingshot’s warehouse recommendations

Understanding and applying Slingshot’s warehouse recommendations to optimize Snowflake spend and performance.

One of the common problems we hear from our customers is that there is a lot of "guesswork" involved to optimize their warehouses and they do not have the time to dive deep into every job and warehouse. 

In this blog post, we’ll explore how Capital One Slingshot removes the guesswork and enables our customers to optimize their Snowflake spend and performance by identifying, alerting and remediating warehouse issues via warehouse recommendations.

Why we built recommendations as a core component of Slingshot

Slingshot's recommendations are at the core of how Slingshot can optimize your spending inside of your Snowflake data cloud. When we developed Slingshot internally, our teams needed to pinpoint how to dynamically schedule warehouse resizing and other enhancements to save on Snowflake credits. Since implementing Slingshot functionality, we have estimated savings of over 50,000 hours of manual work, successfully onboarded over 450 new use cases, and increased our Snowflake footprint to 60PB of data.

Over time, we've expanded these capabilities to cover more parameters within Snowflake to provide even more significant savings. In 2024, we've made some additional enhancements to how our recommendations appear in Slingshot.

How Slingshot recommendations help you save time and Snowflake credits

Identifying and applying optimal warehouse settings can be time-consuming, particularly if your workloads are increasing or changing over time. I hear all too often from Database Administrators that they make static changes to optimize for cost or performance but need more insights or time to monitor the impact continually. 

At Capital One, we lacked the resources to manually optimize our Snowflake investment across all our warehouses. Rather than making static changes to warehouses, we dynamically resize them based on historical data and a recommended schedule. 

For example, Slingshot recommends resizing this warehouse at the very beginning of the day and the very end of the day.

Schedule Block Details view showing low query load and short query as the issues indicating for a decrease in warehouse size.

By right-sizing your warehouses through Slingshot, you can eliminate the guesswork involved in balancing cost and performance and allow your team to focus on other priorities.

Warehouse schedule view in the recommended schedule view tab showing the new schedule with adjustments between medium and large.

Factors behind warehouse recommendations

Slingshot provides a recommendation engine that simplifies Snowflake administration by optimizing cost via warehouse resizing and performance improvement recommendations. 

Behind the scenes, Slingshot examines several factors when calculating recommendations and explains why a recommendation was created.

Query Size classification graph showing the distribution of small queries, medium queries, and large queries.

1. Query Size Classifications - We classify the total number of queries by run time based on the warehouse size. This will allow you to see what percentage of queries take significant time to execute due to complexity versus queries that perform quickly, leaving your warehouse underutilized.

Query Size classification graph showing the distribution of small queries, medium queries, and large queries.

2. Data Spillage  - The total number of queries with spillage data is based on the ratio of bytes spilled to remote storage. Since performance degrades when a warehouse runs out of local memory, this is a key factor when determining when to increase size.

Queued queries graph showing the total number of queries that entered into a queued state over time.

3. Queued Queries - The total number of queries that entered a queued state, usually due to the warehouse's load. If a larger number of queries are waiting in line to process, this will impact overall warehouse performance.

Query load graph showing the total query execution time over the total amount of time in the interval.

4. Query Load - Is the total query execution time of your queries during different intervals of time. The Query Load will help determine when compute resources are constrained and impacting performance or underutilized.

Idleness graph showing the total number of suspension events and amount of idleness on the warehouse.

5. Idleness - Total number of idleness and suspension events across your warehouses over time. An idle warehouse is currently running but is not actively running workloads. Ideally, you want to shut down additional charges unless you expect a steady workload of queries in the same warehouse, where you can take advantage of reusing cache.

Slingshot detects issues and provides recommendations

Slingshot recommendations are tailored to your Snowflake usage patterns. The recommendation engine analyzes your historical data and recommends warehouse size, cluster, and auto-suspend changes that save your organization money or lower query execution time.

Common scenarios that are addressed with manual optimization:

  • Are auto-suspend times adjusted based on the idleness of the warehouses?

  • Are queries relying on remote storage because the warehouse is undersized?

  • Did workloads change at different times of the day or week, and how did that impact our performance?

  • Are any of my warehouses overutilized?

  • What is the impact of query performance for thousands of queries if warehouse sizes are increased?  

  • How can I reduce costs for warehouses that are underutilized?

Since workloads and query patterns change frequently, Slingshot will continually monitor each warehouse and alert your teams when your recommendations can be applied each month.

Every warehouse recommendation includes a rationale

Slingshot provides context into the rationale behind a particular recommendation and the issues that Slingshot has identified so that users can make informed decisions before applying any changes. Recommendations are precisely that: recommendations. Your teams will still have final oversight over any changes to your Snowflake resources. 

Top view of recommendations detail screen that includes the issues found section and the current and projected results of the change.

Slingshot allows warehouse owners to quickly revert to a previously applied warehouse schedule and build custom schedules that can be applied ahead of new use cases.

Understand the effects of warehouse recommendations on query performance

Right-sizing recommendations aren't just about cost saving. It's also about maximizing performance. Slingshot uses query execution times to gauge performance and provide visibility into how a recommendation will affect the run time of queries that execute in each warehouse.

Queries by size before vs. after a given recommendation is implemented by the user.

Need to know which queries will exceed an SLA or what impact a warehouse size change will have on the longest-running queries? Before applying a recommendation, you can check which queries may exceed your statement timeout or SLA.

Impacted queries widget - showing the impact to the top 1000 queries on this warehouse.

Identifying and explaining key issues

Slingshot analyzes warehouse trends to find areas of optimization for your warehouses based on idleness, query load, queueing, spillage, and utilization. Slingshot identifies the following key issues when justifying recommendations:

 

Potential Warehouse Issue

Why is this Important ?

High auto-suspend

When warehouses run for long periods of inactivity before suspending activity

High idleness

When a warehouse consumes credits without executing any queries for long periods of time

High query load

Several queries run at once, reduces performance and strains compute resources

High average (avg.) queue

A large number of queries waiting to be processed impacts overall warehouse performance

High spill

Queries take a substantial amount of time to complete due to complexity, size, or a variety of factors

Long query

Queries take a substantial amount of time to complete due to complexity, size, or a variety of factors

Low query load

An underutilized warehouse that runs minimal queries

Low average (avg.) queue

An underutilized warehouse with only a few tasks waiting in the queue

Low spill

A warehouse with the capacity to run more queries, it's executing without spilling to remote storage

No auto-suspend

During periods of inactivity, the warehouse continues to run and consume credits

No warehouse load

An inactive warehouse, underutilized

Short query

Queries execute quickly, underutilized

 

Hourly analysis of each warehouse looks for a single issue or a combination of potential problems to initiate a recommendation. There are 17 possible issue combinations. Combinations consist of a single issue or two or more issues together.

Top view of recommendations detail screen that includes the issues found section and the current and projected results of the change.

Optimize your warehouses with Capital One Slingshot

As we demonstrated above, Capital One Slingshot is a powerful tool that will empower you to manage and optimize your Snowflake. If you want to limit warehouse idleness, optimize performance, and right-size your warehouses with automated recommendations, request a demo today.


John Conrad, Solutions Architect, Capital One Software

John Conrad is a Solutions Architect at Capital One. He has worked with containers and Kubernetes since 2016 where he helped deliver a K8s workshop for 300 partners. Prior to joining Capital One, John was a WorldWide Technical Sales Leader for IBM Collaboration Solutions where he had the privilege to travel to 20+ countries. John has 25 years experience in software and technical sales and holds a Computer Science degree from the University of Kentucky. You can connect with him on LinkedIn (https://www.linkedin.com/in/johnmartinconrad/).

Meet Capital One Slingshot

A solution to help businesses scale their Snowflake Data Cloud

Related Content

3 pixelated clouds on navy background
Article | December 18, 2024
Article | April 10, 2024
optimizing data warehouses
Article | June 25, 2024