Applying Slingshot’s warehouse recommendations
Understanding and applying Slingshot’s warehouse recommendations to optimize Snowflake spend and performance.
One of the common problems we hear from our customers is that there is a lot of "guesswork" involved to optimize their warehouses and they do not have the time to dive deep into every job and warehouse.
In this blog post, we’ll explore how Capital One Slingshot removes the guesswork and enables our customers to optimize their Snowflake spend and performance by identifying, alerting and remediating warehouse issues via warehouse recommendations.
Why we built recommendations as a core component of Slingshot
Slingshot's recommendations are at the core of how Slingshot can optimize your spending inside of your Snowflake data cloud. When we developed Slingshot internally, our teams needed to pinpoint how to dynamically schedule warehouse resizing and other enhancements to save on Snowflake credits. Since implementing Slingshot functionality, we have estimated savings of over 50,000 hours of manual work, successfully onboarded over 450 new use cases, and increased our Snowflake footprint to 60PB of data.
Over time, we've expanded these capabilities to cover more parameters within Snowflake to provide even more significant savings. In 2024, we've made some additional enhancements to how our recommendations appear in Slingshot.
How Slingshot recommendations help you save time and Snowflake credits
Identifying and applying optimal warehouse settings can be time-consuming, particularly if your workloads are increasing or changing over time. I hear all too often from Database Administrators that they make static changes to optimize for cost or performance but need more insights or time to monitor the impact continually.
At Capital One, we lacked the resources to manually optimize our Snowflake investment across all our warehouses. Rather than making static changes to warehouses, we dynamically resize them based on historical data and a recommended schedule.
For example, Slingshot recommends resizing this warehouse at the very beginning of the day and the very end of the day.
By right-sizing your warehouses through Slingshot, you can eliminate the guesswork involved in balancing cost and performance and allow your team to focus on other priorities.
Factors behind warehouse recommendations
Slingshot provides a recommendation engine that simplifies Snowflake administration by optimizing cost via warehouse resizing and performance improvement recommendations.
Behind the scenes, Slingshot examines several factors when calculating recommendations and explains why a recommendation was created.
1. Query Size Classifications - We classify the total number of queries by run time based on the warehouse size. This will allow you to see what percentage of queries take significant time to execute due to complexity versus queries that perform quickly, leaving your warehouse underutilized.
2. Data Spillage - The total number of queries with spillage data is based on the ratio of bytes spilled to remote storage. Since performance degrades when a warehouse runs out of local memory, this is a key factor when determining when to increase size.
3. Queued Queries - The total number of queries that entered a queued state, usually due to the warehouse's load. If a larger number of queries are waiting in line to process, this will impact overall warehouse performance.
4. Query Load - Is the total query execution time of your queries during different intervals of time. The Query Load will help determine when compute resources are constrained and impacting performance or underutilized.
5. Idleness - Total number of idleness and suspension events across your warehouses over time. An idle warehouse is currently running but is not actively running workloads. Ideally, you want to shut down additional charges unless you expect a steady workload of queries in the same warehouse, where you can take advantage of reusing cache.
Slingshot detects issues and provides recommendations
Slingshot recommendations are tailored to your Snowflake usage patterns. The recommendation engine analyzes your historical data and recommends warehouse size, cluster, and auto-suspend changes that save your organization money or lower query execution time.
Common scenarios that are addressed with manual optimization:
-
Are auto-suspend times adjusted based on the idleness of the warehouses?
-
Are queries relying on remote storage because the warehouse is undersized?
-
Did workloads change at different times of the day or week, and how did that impact our performance?
-
Are any of my warehouses overutilized?
-
What is the impact of query performance for thousands of queries if warehouse sizes are increased?
-
How can I reduce costs for warehouses that are underutilized?
Since workloads and query patterns change frequently, Slingshot will continually monitor each warehouse and alert your teams when your recommendations can be applied each month.
Every warehouse recommendation includes a rationale
Slingshot provides context into the rationale behind a particular recommendation and the issues that Slingshot has identified so that users can make informed decisions before applying any changes. Recommendations are precisely that: recommendations. Your teams will still have final oversight over any changes to your Snowflake resources.
Understand the effects of warehouse recommendations on query performance
Right-sizing recommendations aren't just about cost saving. It's also about maximizing performance. Slingshot uses query execution times to gauge performance and provide visibility into how a recommendation will affect the run time of queries that execute in each warehouse.
Identifying and explaining key issues
Slingshot analyzes warehouse trends to find areas of optimization for your warehouses based on idleness, query load, queueing, spillage, and utilization. Slingshot identifies the following key issues when justifying recommendations:
Potential Warehouse Issue |
Why is this Important ? |
High auto-suspend |
When warehouses run for long periods of inactivity before suspending activity |
High idleness |
When a warehouse consumes credits without executing any queries for long periods of time |
High query load |
Several queries run at once, reduces performance and strains compute resources |
High average (avg.) queue |
A large number of queries waiting to be processed impacts overall warehouse performance |
High spill |
Queries take a substantial amount of time to complete due to complexity, size, or a variety of factors |
Long query |
Queries take a substantial amount of time to complete due to complexity, size, or a variety of factors |
Low query load |
An underutilized warehouse that runs minimal queries |
Low average (avg.) queue |
An underutilized warehouse with only a few tasks waiting in the queue |
Low spill |
A warehouse with the capacity to run more queries, it's executing without spilling to remote storage |
No auto-suspend |
During periods of inactivity, the warehouse continues to run and consume credits |
No warehouse load |
An inactive warehouse, underutilized |
Short query |
Queries execute quickly, underutilized |
Hourly analysis of each warehouse looks for a single issue or a combination of potential problems to initiate a recommendation. There are 17 possible issue combinations. Combinations consist of a single issue or two or more issues together.
Optimize your warehouses with Capital One Slingshot
As we demonstrated above, Capital One Slingshot is a powerful tool that will empower you to manage and optimize your Snowflake. If you want to limit warehouse idleness, optimize performance, and right-size your warehouses with automated recommendations, request a demo today.