Choosing the right Snowflake warehouse size
Determining the right Snowflake warehouse size is a key administrator decision. It can feel like a guessing game, but fortunately, there are guidelines for setting up a Snowflake warehouse that fits an organization's requirements.
With the right warehouse size, admins avoid committing time and resources to resizing down the line. Proper sizing also allows admins to optimize their warehouses' performance.
What is a Snowflake warehouse?
A Snowflake warehouse, also called a virtual warehouse, is a cluster of compute resources (CPU, memory and temporary storage) that users draw on to run SQL and data manipulation language (DML) operations. Two types of warehouses are available: standard and Snowpark-optimized. The former is ideal for most workloads that require fast query performance, while the latter is intended for workloads with large memory requirements.
In Snowflake, compute and storage are separate. A virtual warehouse consists of several Snowflake-managed virtual machines that don't share resources with other warehouses. Because storage scales independently, admins have the flexibility to increase or decrease compute as needed without making any storage changes.
Overview of Snowflake warehouse sizes
In terms of Snowflake cost optimization strategies, it helps to consider warehouse size. Here are common warehouse sizes and the credits per hour associated with them:
- X-Small: 1 credit
- Small: 2 credits
- Medium: 4 credits
- Large: 8 credits
- X-Large: 16 credits
- 2X-Large: 32 credits
- 3X-Large: 64 credits
- 4X-Large: 128 credits
- 5X-Large: 256 credits
- 6X-Large: 512 credits
The size and number of clusters of a Snowflake warehouse determine its compute resources and number of nodes (each with eight threads). The number of nodes doubles with each increase in warehouse size: an X-Small warehouse has one node, a Small warehouse has two and a Large warehouse has eight. The number of servers a warehouse has corresponds to its size in the same way.
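As a minimal sketch, warehouse size is set when a warehouse is created and can be changed later. The warehouse name below is illustrative:

```sql
-- Create a Medium warehouse (4 credits per hour while running).
-- The name reporting_wh is hypothetical.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'MEDIUM';
```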
Factors to consider when choosing the right warehouse size
Setting up a Snowflake warehouse is a bit like Goldilocks; a too-small warehouse won't be able to handle the data, while a too-large one will be an inefficient use of resources. You need one that's just right. Avoid over-provisioning and under-provisioning issues by considering these factors when identifying the right warehouse size.
Query complexity and workload type
Typically, query performance scales with size. A large warehouse has more compute resources to process queries faster and more efficiently than a small one. Of course, the complexity of a query itself can dictate processing speed and efficiency.
For example, it may take less time to process a SELECT statement than it would for complex joins and aggregations. In a large warehouse with plenty of compute resources available, admins can process simple and complex queries without delay.
Snowflake warehouses can also support a variety of workloads, including:
- Applications
- Ad-hoc analytics
- Batches
- Loaders
- Machine learning (ML) training
ETL workloads and resource-intensive analytics generally require extensive compute resources, making it beneficial or even necessary to choose a large Snowflake warehouse.
Concurrency requirements
Concurrency describes how a warehouse performs when multiple users access and analyze data at the same time. In Snowflake, each query runs independently, and if there aren't enough compute resources available for a query, it's placed in a queue until resources free up.
With Snowflake, admins can take advantage of automatic concurrency scaling. This helps admins manage query concurrency based on the number of users and query volume and minimizes the risk of running out of compute resources.
Snowflake's multi-cluster architecture offers concurrency flexibility. It gives admins the ability to add clusters rather than scaling up in accordance with their concurrency requirements.
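A multi-cluster warehouse of this kind might be defined as follows; the warehouse name and cluster counts are illustrative assumptions:

```sql
-- Multi-cluster warehouse: Snowflake adds clusters up to
-- MAX_CLUSTER_COUNT as concurrency rises, rather than requiring
-- a move to a larger size.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD';
```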
Performance vs. cost trade-off
For many admins, performance and cost are primary Snowflake warehouse considerations. A large warehouse provides ample compute resources, but the costs to manage it can be substantial.
To find the right balance between performance and cost, start small. Follow Snowflake optimization best practices, as these can help admins build warehouses that meet their initial expectations. Track warehouse performance and costs and make sure compute resources are being provisioned properly. Make changes as warranted and continue to monitor KPIs to get the most value out of these resources.
Leveraging Snowflake's scaling features
Snowflake's scaling features help administrators maximize their warehouse's performance and keep costs to a minimum. Here are Snowflake scaling features to consider.
Auto-suspend and auto-resume
With auto-suspend, Snowflake temporarily shuts down a warehouse that has been inactive for a set period of time. Once the warehouse (or, in a multi-cluster warehouse, its minimum number of clusters) shows no activity for the designated period, auto-suspend kicks in.
Comparatively, with auto-resume, a suspended warehouse restarts automatically when a statement that requires a warehouse is submitted and that warehouse is the session's current warehouse.
When configuring auto-suspend, match the timeout to the typical gap between queries. For example, if queries run every 30 minutes, set the auto-suspend period accordingly so the warehouse isn't left running idle; this helps admins avoid wasting credits.
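A sketch of these two settings, assuming a hypothetical warehouse named reporting_wh and the 30-minute cadence above:

```sql
-- AUTO_SUSPEND is specified in seconds; 1800 = 30 minutes.
ALTER WAREHOUSE reporting_wh SET
  AUTO_SUSPEND = 1800
  AUTO_RESUME  = TRUE;
```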
Capital One Slingshot can help with auto-suspension settings. This solution can evaluate warehouse usage and provide personalized tips and recommendations to reduce idle time.
Auto-scaling with multi-cluster warehouses
As a Snowflake workload increases, auto-scaling automatically adds or resumes clusters, up to a maximum number defined by admins. When the workload decreases, auto-scaling shuts down or pauses the extra clusters.
Because auto-scaling works across multi-cluster warehouses, it can eliminate the need for manual resizing. If there is a sudden workload increase, queries can still be processed, helping ensure a warehouse can handle a temporary rise in demand without having to permanently increase its size.
Auto-scaling can come in handy with end-of-month reporting or other high concurrency periods. During these times, many queries can be processed, albeit for just a short amount of time. The ability to automatically scale to match demands at these points helps admins meet deadlines and keep their warehouse operations on track.
Monitoring and adjusting warehouse size
Warehouse size is an ongoing consideration for administrators. Here are factors to consider to help admins assess warehouse size and determine if a change in size is necessary.
Monitoring performance and usage
Snowflake offers tools admins can use to monitor warehouse performance and resource utilization, including:
- Resource monitors: Track the use of virtual warehouses and cloud services. Snowflake's resource monitors can alert on credit usage and suspend warehouses to stop overspending, but they don't provide a solution for reducing waste.
- Warehouse_cluster_utilization: Provides warehouse and cluster utilization metrics.
- Account usage: Exposes object metadata and historical account usage data.
- Query history: Details past queries, including execution time, user information, SQL text and errors. By default, Snowflake retains query history for 14 days, though enterprise accounts can access longer retention with additional options.
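For example, the account usage data above can show which warehouses are consuming the most credits. This query is a sketch against Snowflake's ACCOUNT_USAGE schema (note that these views can lag real-time activity by up to a few hours):

```sql
-- Credits consumed per warehouse over the last 30 days.
SELECT warehouse_name,
       SUM(credits_used) AS credits_30d
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_30d DESC;
```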
When it comes to monitoring warehouse load and other factors, establish benchmarks. Reviewing warehouse size monthly or every other month can be time consuming, but is greatly beneficial. This can give admins a glimpse into how well their warehouse is performing and the costs associated with it over an extended period. The data they obtain can help these admins decide if now is the ideal time to resize.
Slingshot’s warehouse management tools empower admins to make informed decisions about warehouse sizing with more control and flexibility. By creating multiple dynamic schedules for your warehouses, you’ll be able to automate settings and inactivity controls, reducing the need for frequent manual review of warehouses. With Slingshot’s warehouse recommendations, users reduce some of the guesswork involved with warehouse sizing, allowing them to confidently optimize warehouse configurations for cost-efficiency and performance.
Adjusting warehouse size over time
Changes in workload, concurrency and query performance are three vital factors to evaluate. Without oversight, warehouse performance issues can result in inefficiencies and drive up costs. Proactively monitoring and optimizing allows admins to ensure warehouse sizing aligns with their organization’s evolving needs and that resources are being efficiently utilized.
It may make sense to scale up if warehouse performance is slow. If the slowdown is temporary, it may be better to rely on an auto-scaling solution. However, if admins notice consistent performance delays over months, it may be time to move to a larger warehouse.
On the other hand, admins may notice many credits are not being used, indicating underutilization. In this case, it may be time to scale down their warehouse. Keep in mind, though—credits being unused during one period doesn't mean they will be next period. Track trends and patterns with unused credits to determine if it's time to scale down.
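One way to spot both signals is warehouse load history. This query is a sketch, assuming a hypothetical warehouse named REPORTING_WH; sustained queuing suggests scaling up or adding clusters, while consistently low load may indicate over-provisioning:

```sql
-- Average running and queued query load per hour over the last week.
SELECT DATE_TRUNC('hour', start_time) AS hour,
       AVG(avg_running)     AS avg_running,
       AVG(avg_queued_load) AS avg_queued
FROM snowflake.account_usage.warehouse_load_history
WHERE warehouse_name = 'REPORTING_WH'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```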
Best practices for warehouse size optimization
Snowflake warehouse optimization can be difficult. These best practices can help administrators determine the right size for their warehouse from the get-go.
Start small and scale based on demand
Begin with a small warehouse and scale gradually. Track warehouse performance and identify warehouse expansion or reduction opportunities. This helps admins avoid overspending on resources they won't necessarily need. It also allows admins to save money that they can invest in other areas of their operations.
Use separate warehouses for different workloads
Set up warehouses based on workload. Remember, workloads differ, and some require more resources than others. For instance, using a warehouse for BI reporting and another for data loading can simplify Snowflake warehouse management. On top of that, separate warehouses can be scaled up and down when necessary for optimal performance.
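This separation might look like the following sketch, with illustrative names and settings; each warehouse can then be sized and suspended independently:

```sql
-- One warehouse per workload: BI reporting and data loading.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 300;   -- seconds

CREATE WAREHOUSE IF NOT EXISTS load_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60;
```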
Test warehouse sizes for performance tuning
Conduct performance tests to see how well a warehouse manages a particular workload. If a warehouse falls short, it may be time to expand. Conversely, there may be times when a warehouse performance levels off. If a test highlights this, it may be possible to decrease the warehouse's size and obtain the same results while using fewer resources.
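Because resizing takes effect for new queries immediately, such a test can be as simple as the following sketch (warehouse name hypothetical):

```sql
-- Temporarily resize, run the same benchmark workload at each
-- size, then compare runtimes and credits consumed.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';
-- ...run benchmark queries...
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```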
Leverage tools for automated warehouse size optimization
Utilize Slingshot to automatically scale warehouse size based on performance. Slingshot takes the guesswork out of warehouse data analysis. It provides admins with insights that they can use to determine how well their warehouse is performing. Using Slingshot regularly can help admins save time and money as they manage their warehouse operations.
Simplifying Snowflake warehouse sizing
Determining Snowflake warehouse size can be challenging, but is an important part of managing Snowflake and cloud operations overall. By taking the time to account for warehouse performance, cost and other factors, admins can set up warehouses that are appropriate for an organization's needs.
With Slingshot, admins can streamline this process using dynamic scheduling, setting inactivity controls and leveraging data-driven recommendations to proactively optimize warehouse sizes. Detailed performance metrics and cost insights provide the tools needed to monitor and fine-tune operations on an ongoing basis, improving efficiency and reducing costs.
Explore how Slingshot simplifies Snowflake warehouse optimization and empowers teams to make informed decisions that balance Data Cloud cost and performance.