Advantages to PostgreSQL table partitioning: How-to guide
In this tutorial, we explain how to use table partitioning in PostgreSQL to split large tables into smaller, more manageable pieces.
PostgreSQL table partitioning: How to partition tables
PostgreSQL is a relational database management system (RDBMS) using structured query language (SQL). It offers features for storing and scaling complex data workloads. This data is stored in "tables" consisting of columns and rows.
As SQL databases grow, the sheer number of tables and the amount of data stored in each table can make it difficult to manage information efficiently. Table partitioning in PostgreSQL refers to splitting a large table into smaller pieces, which can provide benefits such as improved query performance and easier data management.
What is table partitioning in PostgreSQL?
Table partitioning in PostgreSQL is the process of organizing a large table into smaller subsets based on certain criteria.
Here’s a real-world example: Imagine you’re building a large Lego set, and you decide to organize the bricks before building. You might want to sort the bricks into piles by color so that you can find the correct brick more quickly. Each pile represents a partition, and all the partitions together make up the entire Lego set.
In PostgreSQL, partitioning a table means taking all the data stored in that table and dividing it into smaller pieces of related information, like in the toy example above. Each partition stores a subset of the data based on specific criteria, such as a range of values or a certain attribute.
This process can be useful when your PostgreSQL tables become large and unwieldy, allowing for better performance and manageability.
For instance, if you have a table with millions of records, partitioning can help speed up queries by only searching through the relevant partitions instead of the entire table.
When you’re deciding whether or not to employ this technique, a good rule of thumb is to use partitioning when the size of the table is greater than the memory installed in the database server. You might also consider partitioning when you’re working with tables containing historical data. As new data comes in, you could use partitioning to separate new data from historical data while keeping all the data contained in the same table.
Benefits of partitioning in PostgreSQL
Partitioning tables in PostgreSQL offers a number of significant advantages. These include the following:
Improved query performance
By dividing a large PostgreSQL table into smaller partitions, table partitioning can reduce the amount of data read during a single query. When a query filters on the partitioning column, PostgreSQL can skip partitions that cannot contain matching rows (known as partition pruning), which often shortens query time considerably. Partitioning also allows indexes to be rebuilt or reorganized one partition at a time instead of across the entire table.
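To see which partitions a query actually touches, you can inspect its plan with EXPLAIN. The following is a minimal sketch; the purchases_demo table and its columns are hypothetical names chosen for illustration, not part of the examples later in this guide.

```sql
-- Hypothetical demo table, partitioned by purchase date.
CREATE TABLE purchases_demo (
    purchase_id   bigint,
    purchase_date date NOT NULL,
    amount        numeric(10, 2)
) PARTITION BY RANGE (purchase_date);

CREATE TABLE purchases_demo_2023 PARTITION OF purchases_demo
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE purchases_demo_2024 PARTITION OF purchases_demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- The WHERE clause filters on the partitioning column, so the planner
-- can leave purchases_demo_2023 out of the plan entirely.
EXPLAIN (COSTS OFF)
SELECT sum(amount)
FROM purchases_demo
WHERE purchase_date >= DATE '2024-03-01'
  AND purchase_date <  DATE '2024-04-01';
```

If the plan lists only purchases_demo_2024, pruning is working. A query without a filter on purchase_date would list both partitions.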
Faster bulk loads and deletes
Table partitioning can lead to faster bulk loads and deletes because it allows for the addition or removal of individual partitions, which can be accomplished more quickly than loading or deleting data from a single large table.
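As a rough sketch of what this looks like in practice, the example below loads a year of data into a standalone table and attaches it as a partition, then removes an older year by dropping its partition. The sales_log table, its columns and the generated rows are all hypothetical.

```sql
-- Hypothetical partitioned table with one existing yearly partition.
CREATE TABLE sales_log (
    sale_id   bigint,
    sale_date date NOT NULL,
    amount    numeric(10, 2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_log_2023 PARTITION OF sales_log
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Bulk load: fill a standalone table with the same structure first...
CREATE TABLE sales_log_2024
    (LIKE sales_log INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

INSERT INTO sales_log_2024 (sale_id, sale_date, amount)
SELECT g, DATE '2024-01-01' + (g % 366), 10.00
FROM generate_series(1, 1000) AS g;

-- ...then attach it to the parent as the 2024 partition.
ALTER TABLE sales_log ATTACH PARTITION sales_log_2024
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Bulk delete: dropping the partition removes every 2023 row at once,
-- without a row-by-row DELETE.
DROP TABLE sales_log_2023;
```

PostgreSQL checks that every row in the attached table fits the declared bounds, so the ATTACH step fails if the loaded data contains dates outside 2024.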
Optimized storage of seldom-used data
With table partitioning, seldom-used data can easily be partitioned off and migrated to a cheaper storage medium.
Types of partitioning
PostgreSQL offers three types of table partitioning: Range, list and hash.
Range partitioning
Range partitioning involves dividing a table's rows into segments based on ranges of values in a specific column. Each partition represents a distinct range of values, and there is no overlap between the ranges assigned to different partitions.
Range partitioning is often used to sort information by date. The bounds of each range are inclusive at the lower end and exclusive at the upper end.
For instance, if a table is being partitioned by purchase date, one partition might include purchases made from January 1st up to, but not including, April 1st (meaning the last date included would be March 31st). The next partition might include purchases from April 1st up to, but not including, July 1st (so its last included date would be June 30th).
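The inclusive lower bound and exclusive upper bound can be illustrated with two quarterly partitions. This is a sketch using a hypothetical purchases_by_quarter table; the tableoid::regclass expression simply reports which partition each row is stored in.

```sql
-- Hypothetical table partitioned into calendar quarters.
CREATE TABLE purchases_by_quarter (
    purchase_id   bigint,
    purchase_date date NOT NULL
) PARTITION BY RANGE (purchase_date);

-- Q1 covers January 1 through March 31 (April 1 is excluded).
CREATE TABLE purchases_2024_q1 PARTITION OF purchases_by_quarter
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

-- Q2 starts exactly where Q1 stops and covers April 1 through June 30.
CREATE TABLE purchases_2024_q2 PARTITION OF purchases_by_quarter
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

INSERT INTO purchases_by_quarter (purchase_id, purchase_date)
VALUES (1, '2024-03-31'),
       (2, '2024-04-01');

-- Row 1 is stored in purchases_2024_q1 and row 2 in purchases_2024_q2.
SELECT tableoid::regclass AS partition_name, purchase_id, purchase_date
FROM purchases_by_quarter
ORDER BY purchase_id;
```

Because one partition's upper bound equals the next partition's lower bound, there is neither a gap nor an overlap between the two.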
List partitioning
List partitioning in PostgreSQL refers to the process of dividing a table into smaller subsets based on discrete categories that have been specified.
For instance, a table holding insurance data might be partitioned by region or state, or by the specific type of claim (e.g., home, auto, life).
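A list partition can hold more than one value. The sketch below groups several states into regional partitions; the claims_by_region table, its columns and the state groupings are hypothetical.

```sql
-- Hypothetical claims table partitioned by state code.
CREATE TABLE claims_by_region (
    claim_id   bigint,
    state      text NOT NULL,
    claim_type text
) PARTITION BY LIST (state);

-- Each partition lists the discrete values it accepts.
CREATE TABLE claims_west PARTITION OF claims_by_region
    FOR VALUES IN ('CA', 'OR', 'WA');
CREATE TABLE claims_northeast PARTITION OF claims_by_region
    FOR VALUES IN ('NY', 'NJ', 'MA');

INSERT INTO claims_by_region (claim_id, state, claim_type)
VALUES (1, 'CA', 'auto'),
       (2, 'NY', 'home');

-- Claim 1 is stored in claims_west and claim 2 in claims_northeast.
SELECT tableoid::regclass AS partition_name, claim_id, state
FROM claims_by_region
ORDER BY claim_id;
```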
Hash partitioning
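Hash partitioning is a technique used to partition a table based on a hash function. PostgreSQL hashes the value of the partitioning column, and each partition is defined by a modulus and a remainder that determine which hash values it holds. With enough distinct values in the column, the data is distributed roughly evenly across the partitions.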
This method is often used when there is no natural way to partition the data or when the goal is to achieve even distribution.
How to create a partition table in PostgreSQL
By following a few steps, even a beginner can start partitioning tables in PostgreSQL. Here's what you need:
-
PostgreSQL installed: First, make sure PostgreSQL is installed on your system. You can download and install it for free from the official website.
-
Access rights: To partition tables, you need sufficient permissions enabled to make database changes, or you need access to a user who has these permissions.
-
Basic SQL knowledge: You don't need to be a SQL wizard to learn to partition tables, but it helps to have a solid knowledge base when it comes to the basic commands.
Now, here's a step-by-step guide to creating a partitioned table in PostgreSQL:
-
Define your partitioning strategy. Decide what criteria you want to use to partition your data. From there, you can determine if range, list or hash partitioning makes the most sense.
-
Create the parent table. Use the CREATE TABLE statement with a PARTITION BY clause that names the strategy and the partitioning column. The parent table holds no rows itself; it serves as the template for its partitions.
-
Create your child tables. These are your partitions, and you can create them using CREATE TABLE ... PARTITION OF along with a FOR VALUES clause that specifies which values each partition holds.
-
Add data through the parent table. As you insert rows into the parent table, PostgreSQL will use your partitioning key to route each row to the appropriate partition.
-
Create indexes and constraints. Define indexes and constraints on the parent and child tables to ensure data integrity and improve query performance.
-
Test and optimize. Once your partitioned table is set up, perform tests to ensure that data is correctly routed to partitions. Monitor and optimize query performance as needed.
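The last two steps can be sketched as follows, assuming PostgreSQL 11 or later, where an index created on the parent is also created on each partition. The orders_demo table, its columns and the index name are hypothetical.

```sql
-- Hypothetical partitioned table with two yearly partitions.
CREATE TABLE orders_demo (
    order_id   bigint,
    order_date date NOT NULL,
    customer   text
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_demo_2024 PARTITION OF orders_demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_demo_2025 PARTITION OF orders_demo
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Index defined once on the parent; each partition gets a matching index.
CREATE INDEX orders_demo_customer_idx ON orders_demo (customer);

-- Insert through the parent and let PostgreSQL route the rows.
INSERT INTO orders_demo (order_id, order_date, customer)
VALUES (1, '2024-05-10', 'alice'),
       (2, '2025-02-20', 'bob'),
       (3, '2025-11-03', 'carol');

-- Routing check: count the rows stored in each partition.
SELECT tableoid::regclass AS partition_name, count(*) AS row_count
FROM orders_demo
GROUP BY tableoid
ORDER BY tableoid;
```

With the three rows above, the check should report one row in orders_demo_2024 and two in orders_demo_2025.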
Range partitioning example
Let's say you want to create a table holding home, auto and life insurance data for the years 2018 through 2020, and you want to partition it by date.
The first step is to create your partitioned table:
CREATE TABLE insurance_policies_p (
    agency text,
    policy_date date NOT NULL,
    policy_type text,
    description text,
    location text
)
PARTITION BY RANGE (policy_date);
Now you have your partitioned table, but since you haven't defined the partitions themselves, you won't be able to insert data.
To fix this, you now need to define the partitions:
-- The upper bound is exclusive, so each partition ends at January 1 of the following year.
CREATE TABLE insurance_policies_p_2018 PARTITION OF insurance_policies_p FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE insurance_policies_p_2019 PARTITION OF insurance_policies_p FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
CREATE TABLE insurance_policies_p_2020 PARTITION OF insurance_policies_p FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
Now you can insert data into your table insurance_policies_p, and PostgreSQL will route each row to the correct partition based on the policy date.
If you attempt to insert a new row and it doesn’t match the bounds of any partition, PostgreSQL will raise an error. To avoid this, make sure your partitions cover every value you expect to insert, or add a default partition to catch rows that match no other partition.
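One way to avoid such errors is a default partition, available in PostgreSQL 11 and later. The sketch below uses a hypothetical policies_demo table rather than the one above, so it can be run on its own.

```sql
-- Hypothetical table with a single yearly partition.
CREATE TABLE policies_demo (
    policy_id   bigint,
    policy_date date NOT NULL
) PARTITION BY RANGE (policy_date);

CREATE TABLE policies_demo_2020 PARTITION OF policies_demo
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Catch-all partition for rows that fit no other partition.
-- Without it, the 2023 row inserted below would be rejected with an error.
CREATE TABLE policies_demo_default PARTITION OF policies_demo DEFAULT;

INSERT INTO policies_demo (policy_id, policy_date)
VALUES (1, '2020-06-15'),
       (2, '2023-02-01');

-- Policy 1 is stored in policies_demo_2020, policy 2 in policies_demo_default.
SELECT tableoid::regclass AS partition_name, policy_id, policy_date
FROM policies_demo
ORDER BY policy_id;
```

A default partition is a safety net rather than a substitute for planning: rows that pile up there no longer benefit from pruning on the partitioning column.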
List partitioning example
The process for creating a PostgreSQL table partitioned by list is similar to creating one partitioned by range.
It starts the same way: By defining your partitioned table. Let's use the same columns as in the range example. (If you already created insurance_policies_p while following the range example, drop it first or choose a different table name.)
CREATE TABLE insurance_policies_p (
    agency text,
    policy_date date NOT NULL,
    policy_type text,
    description text,
    location text
)
PARTITION BY LIST (policy_type);
As you'll notice, there are only two differences: One, we specify directly that we're partitioning by list rather than by range. Two, we specify policy_type as the criterion for partitioning.
From there, the process to build the partitioned tables is also similar:
CREATE TABLE insurance_policies_p_auto PARTITION OF insurance_policies_p FOR VALUES IN ('auto');
CREATE TABLE insurance_policies_p_home PARTITION OF insurance_policies_p FOR VALUES IN ('home');
CREATE TABLE insurance_policies_p_life PARTITION OF insurance_policies_p FOR VALUES IN ('life');
Hash partitioning example
We’ll build the initial table the same way as before:
CREATE TABLE insurance_policies_p_hash (
    agency text,
    policy_date date NOT NULL,
    policy_type text,
    description text,
    location text
)
PARTITION BY HASH (location);
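Then you use modulus and remainder values to divide the table into three partitions. The modulus is the number of partitions, and each partition takes one of the possible remainders (0, 1 or 2):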
CREATE TABLE insurance_policies_p_hash_p1 PARTITION OF insurance_policies_p_hash FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE insurance_policies_p_hash_p2 PARTITION OF insurance_policies_p_hash FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE insurance_policies_p_hash_p3 PARTITION OF insurance_policies_p_hash FOR VALUES WITH (MODULUS 3, REMAINDER 2);
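To check how evenly a hash-partitioned table spreads its rows, you can count the rows per partition. This sketch uses a hypothetical visits_demo table and generated location names; the exact counts depend on the hash function, so none are shown here.

```sql
-- Hypothetical table hash-partitioned on location into three partitions.
CREATE TABLE visits_demo (
    visit_id bigint,
    location text NOT NULL
) PARTITION BY HASH (location);

CREATE TABLE visits_demo_p0 PARTITION OF visits_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE visits_demo_p1 PARTITION OF visits_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE visits_demo_p2 PARTITION OF visits_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

-- Generate 3,000 rows with distinct location values.
INSERT INTO visits_demo (visit_id, location)
SELECT g, 'city_' || g::text
FROM generate_series(1, 3000) AS g;

-- Rows per partition; expect roughly, not exactly, 1,000 in each.
SELECT tableoid::regclass AS partition_name, count(*) AS row_count
FROM visits_demo
GROUP BY tableoid
ORDER BY tableoid;
```

If the partitioning column has only a handful of distinct values, the split can be much less even, because every row with the same value lands in the same partition.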
Partition maintenance
Table partitioning in PostgreSQL provides significant value for data maintenance. It efficiently manages large datasets by dividing them into smaller, more manageable partitions. This partitioning allows for faster and more efficient operations such as data loading, backup and index maintenance.
Additionally, partitioning enables easier data archiving and purging, simplifying data lifecycle management. It can also improve query performance, which makes it a useful technique for handling large and frequently accessed tables.
How to drop partitions
Use the DROP TABLE command to drop a partition. Keep in mind that this permanently deletes the partition and all the rows stored in it. Here's an example using one of the partitions we built in the range partitioning example above:
DROP TABLE insurance_policies_p_2018;
How to detach partitions
To detach a partition from its parent table, use the ALTER TABLE command with a DETACH PARTITION clause. Unlike dropping, detaching keeps the data: the partition becomes a standalone table.
For example:
ALTER TABLE insurance_policies_p DETACH PARTITION insurance_policies_p_2018;
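The sketch below shows what happens to the data around a detach, and how a detached table can be reattached later. It uses a hypothetical metrics_demo table so that it can be run independently of the earlier examples.

```sql
-- Hypothetical partitioned table with two yearly partitions.
CREATE TABLE metrics_demo (
    metric_id   bigint,
    recorded_on date NOT NULL
) PARTITION BY RANGE (recorded_on);

CREATE TABLE metrics_demo_2018 PARTITION OF metrics_demo
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE metrics_demo_2019 PARTITION OF metrics_demo
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

INSERT INTO metrics_demo (metric_id, recorded_on)
VALUES (1, '2018-03-01'),
       (2, '2019-07-15');

-- Detach the 2018 partition; it becomes an ordinary standalone table.
ALTER TABLE metrics_demo DETACH PARTITION metrics_demo_2018;

-- The parent no longer sees the 2018 row...
SELECT count(*) AS rows_in_parent FROM metrics_demo;
-- ...but the row still exists in the detached table.
SELECT count(*) AS rows_in_detached FROM metrics_demo_2018;

-- Optional: reattach the table with the same bounds.
ALTER TABLE metrics_demo ATTACH PARTITION metrics_demo_2018
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
```

Detaching is a common first step before archiving old data, since the detached table can be backed up or exported before it is dropped.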
Limitations of partition maintenance
Table partitioning is a powerful tool in PostgreSQL that can help you manage your databases more efficiently. That said, it does have a few limitations.
-
Overcomplication: Table partitioning can simplify queries and make them run faster. But you can sometimes reach a point where your partitioning gets so complex that it makes troubleshooting more difficult.
-
Data imbalances: Table partitioning doesn't automatically create equal subsets with similar data loads. If you aren't careful, you could end up with some partitions that are significantly larger and more complex.
-
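Query constraints: Queries against a partitioned table don't have to reference the partitioning column, but those that don't filter on it must scan every partition and lose most of the performance benefit. In addition, primary key and unique constraints on a partitioned table must include the partitioning column, which can limit your flexibility.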
-
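Lack of compatibility: Some partitioning features aren't available in older PostgreSQL versions. Declarative partitioning arrived in PostgreSQL 10, and hash partitioning and default partitions were added in PostgreSQL 11, so be mindful if you're not using a recent release.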
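One concrete constraint worth knowing: a primary key on a partitioned table must include the partitioning column. The sketch below assumes PostgreSQL 11 or later and uses a hypothetical accounts_demo table.

```sql
-- Hypothetical table partitioned by opening date.
-- The primary key includes opened_on because it is the partitioning column;
-- PRIMARY KEY (account_id) on its own would be rejected.
CREATE TABLE accounts_demo (
    account_id bigint NOT NULL,
    opened_on  date   NOT NULL,
    PRIMARY KEY (account_id, opened_on)
) PARTITION BY RANGE (opened_on);

CREATE TABLE accounts_demo_2024 PARTITION OF accounts_demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

INSERT INTO accounts_demo (account_id, opened_on)
VALUES (1, '2024-02-01');
```

The practical consequence is that uniqueness of account_id alone is not enforced across partitions, so it has to be guaranteed some other way, such as by generating the IDs from a sequence.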
Resources, examples and more PostgreSQL info
For more detailed information and examples on creating partitioned tables in PostgreSQL, you can refer to the official PostgreSQL documentation, which provides comprehensive and up-to-date information about partitioning and other PostgreSQL features.
Make large tables more manageable with partitioning in PostgreSQL
Table partitioning in PostgreSQL has many benefits. These include:
-
It can help you optimize your relational database performance.
-
It can significantly improve the efficiency of your queries, particularly when heavily accessed rows are concentrated in a single partition or a small number of partitions.
-
It also helps organize data by splitting a large table into smaller subsets based on a common attribute.
-
Finally, it can help you optimize your storage and even save you money.