Understanding the basics of using DynamoDB
Part 1 of 3: An introduction to using and cost optimizing DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service that lets you offload the administrative burdens of operating and scaling a distributed database. In this three-part series, I will walk you through the basics of DynamoDB and share best practices that can reduce your operational costs when using the service.
This first article is an introductory refresher on the basic components of DynamoDB. In this post I will cover:
- The basic components of DynamoDB
- The basic structure of DynamoDB
- The importance of keys
- Indexes
- Capacity modes (on-demand and provisioned)
If you want to jump deeper into the series, use the links below:
- Part 2: 10 DynamoDB choices that will impact your costs
- Part 3: 9 recommendations to minimize DynamoDB operational costs
Core components of DynamoDB: tables, items and attributes
In DynamoDB, tables, items and attributes are the core components that you work with. Simply put, a table is a collection of items and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
Tables
Similar to other database systems, DynamoDB stores data in tables. A table is a collection of data.
Items
Each table contains zero or more items. An item is a group of attributes that is uniquely identifiable among all of the other items. In DynamoDB, there is no limit to the number of items you can store in a table. Items are like rows in a relational database.
Attributes
Each item is composed of one or more attributes. An attribute is a fundamental data element, something that does not need to be broken down any further. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.
Basic structure of DynamoDB
Here are some things to understand about the basic structure of DynamoDB.
- Each item in the table has a unique identifier, or primary key, that distinguishes the item from all of the others in the table.
- Other than the primary key, a table is schemaless, which means that neither the attributes nor their data types need to be defined beforehand. Each item can have its own distinct attributes.
- Most attributes are scalar, which means that they can have only one value. Strings and numbers are common examples of scalars.
- Attributes can also be nested. For example, an item can have an Address attribute that contains its own sub-attributes. DynamoDB supports nested attributes up to 32 levels deep.
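To make the schemaless, scalar, and nested ideas concrete, here is a minimal Python sketch that converts a plain dictionary into DynamoDB's typed attribute-value format (S for strings, N for numbers, BOOL, NULL, M for maps, L for lists), the shape the low-level API expects. The helper names and the sample attributes (PersonID, LastName, Address, and so on) are hypothetical. boto3 ships its own `TypeSerializer` in `boto3.dynamodb.types`, so treat this as an illustration of the format rather than a replacement; like boto3, it rejects Python floats in favor of `Decimal`.

```python
from decimal import Decimal
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Convert a plain Python value into DynamoDB's typed attribute-value form."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}  # scalar string
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # scalar number, transmitted as a string
    if isinstance(value, dict):
        # Map (M): a nested attribute such as Address
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}  # list (L)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(attributes: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """An item is simply a collection of named attributes."""
    return {name: to_attribute_value(v) for name, v in attributes.items()}


# Two items in the same table with different attributes (schemaless beyond the key).
person = to_item({"PersonID": 101, "LastName": "Smith",
                  "Address": {"Street": "123 Main", "City": "Anytown"}})
other = to_item({"PersonID": 102, "FirstName": "Mary", "Phone": "555-0100"})
```

Both items share only the key attribute (PersonID); everything else varies per item, and Address is stored as a nested map.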
Significance of keys in DynamoDB
When you create a table, in addition to the table name, you must specify the primary key of the table. The primary key uniquely identifies each item in the table, so that no two items can have the same key.
DynamoDB supports two kinds of primary keys: a simple primary key (partition key only) and a composite primary key (partition key and sort key).
Partition key
A simple primary key is composed of one attribute, known as the partition key. DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. In a table that has only a partition key, no two items can have the same partition key value.
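DynamoDB's hash function and partition layout are internal and undocumented, so the following is only a toy model, assuming SHA-256 and a fixed, hypothetical number of partitions. It shows the property that matters: the same partition key value always maps to the same partition, while different values spread across partitions.

```python
import hashlib


def toy_partition_for(partition_key: str, num_partitions: int) -> int:
    """Toy model only: NOT DynamoDB's real hash function or partitioning scheme."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and map it to a partition.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because placement depends only on the key value, every request for one very popular partition key lands on the same partition, which is why well-distributed partition key values matter.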
Composite primary key
A composite primary key is composed of two attributes: the first is the partition key, and the second is the sort key. As with a simple primary key, DynamoDB uses the partition key value as input to an internal hash function, and the output determines the partition (physical storage internal to DynamoDB) in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value. In a table that has a partition key and a sort key, multiple items can share the same partition key value, but those items must have different sort key values.
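The grouping and ordering behavior can be modeled with a small in-memory class. This is a hypothetical illustration (the class and method names are invented, and string sort keys are assumed), not how DynamoDB stores data internally: items that share a partition key are kept together in sort-key order, and writing the same partition key and sort key pair again replaces the earlier item, much as a PutItem with an existing key does.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyCompositeKeyTable:
    """In-memory illustration of a partition key + sort key table (not DynamoDB itself)."""

    def __init__(self) -> None:
        # partition key value -> list of (sort key, item), kept in sort-key order
        self._partitions: Dict[str, List[Tuple[str, Dict[str, Any]]]] = defaultdict(list)

    def put(self, pk: str, sk: str, item: Dict[str, Any]) -> None:
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)  # same (pk, sk) pair: replace the existing item
        else:
            rows.insert(i, (sk, item))  # new sort key: insert in sorted position

    def query(self, pk: str) -> List[Tuple[str, Dict[str, Any]]]:
        """Return every item for one partition key, ordered by sort key."""
        return list(self._partitions.get(pk, []))


table = ToyCompositeKeyTable()
table.put("artist#1", "song#b", {"title": "B"})
table.put("artist#1", "song#a", {"title": "A"})
table.put("artist#2", "song#a", {"title": "X"})
table.put("artist#1", "song#a", {"title": "A (remastered)"})
```

After these writes, artist#1 holds two items (song#a, then song#b), and the second write to song#a replaced the first.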
Indexes in DynamoDB
DynamoDB supports two kinds of indexes: global secondary indexes (GSIs) and local secondary indexes (LSIs).
Global secondary index (GSI)
A GSI is an index with a partition key and sort key that can be different from those on the table.
Local secondary index (LSI)
An LSI is an index that has the same partition key as the table, but a different sort key.
By default, each table can have up to 20 global secondary indexes (a default quota) and up to 5 local secondary indexes.
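Keys and indexes are declared when you create a table. The sketch below builds request parameters in the shape of the boto3 `create_table` call for a hypothetical Music table with a composite primary key (Artist, SongTitle), one LSI that keeps Artist as its partition key, and one GSI keyed on Genre. It uses `BillingMode="PAY_PER_REQUEST"`, so no throughput settings are required. A small helper then checks the index rules and default quotas described above. All table, attribute, and index names are assumptions for illustration.

```python
from typing import Any, Dict, List


def build_music_table_params() -> Dict[str, Any]:
    """Request parameters for a hypothetical Music table (boto3 create_table shape)."""
    all_attributes = {"ProjectionType": "ALL"}  # copy every attribute into each index
    return {
        "TableName": "Music",
        # Only attributes used in a table or index key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "Artist", "AttributeType": "S"},
            {"AttributeName": "SongTitle", "AttributeType": "S"},
            {"AttributeName": "AlbumTitle", "AttributeType": "S"},
            {"AttributeName": "Genre", "AttributeType": "S"},
        ],
        # Composite primary key: HASH = partition key, RANGE = sort key.
        "KeySchema": [
            {"AttributeName": "Artist", "KeyType": "HASH"},
            {"AttributeName": "SongTitle", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "ArtistAlbumIndex",
            "KeySchema": [
                {"AttributeName": "Artist", "KeyType": "HASH"},
                {"AttributeName": "AlbumTitle", "KeyType": "RANGE"},
            ],
            "Projection": all_attributes,
        }],
        # GSI: its partition key (and sort key) can differ from the table's.
        "GlobalSecondaryIndexes": [{
            "IndexName": "GenreIndex",
            "KeySchema": [
                {"AttributeName": "Genre", "KeyType": "HASH"},
                {"AttributeName": "SongTitle", "KeyType": "RANGE"},
            ],
            "Projection": all_attributes,
        }],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: no ProvisionedThroughput needed
    }


def partition_key_of(key_schema: List[Dict[str, str]]) -> str:
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def index_problems(params: Dict[str, Any]) -> List[str]:
    """Check the index rules and default quotas described in this section."""
    problems = []
    lsis = params.get("LocalSecondaryIndexes", [])
    gsis = params.get("GlobalSecondaryIndexes", [])
    if len(lsis) > 5:
        problems.append("more than 5 local secondary indexes")
    if len(gsis) > 20:
        problems.append("more than 20 global secondary indexes (default quota)")
    table_pk = partition_key_of(params["KeySchema"])
    for lsi in lsis:
        if partition_key_of(lsi["KeySchema"]) != table_pk:
            problems.append(f"{lsi['IndexName']} must use partition key {table_pk}")
    return problems
```

If the checks pass, the dictionary could be passed to `boto3.client("dynamodb").create_table(**params)`.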
Capacity modes: on-demand and provisioned
On-demand mode
When should you choose on-demand mode? It is a good option if any of the following are true:
- You are creating new tables with unknown workloads.
- You have unpredictable application traffic.
- You prefer the ease of paying for only what you use.
For on-demand mode tables, you don’t need to specify how much read and write throughput you expect your application to perform. DynamoDB charges you for the reads and writes that your application performs on your tables in terms of read request units and write request units.
Key information about on-demand request units
Read request unit
- One read request unit represents one strongly consistent read request, or two eventually consistent read requests, for an item up to 4 KB in size.
- Two read request units represent one transactional read for items up to 4 KB.
- If you need to read an item that is larger than 4 KB, DynamoDB needs additional read request units. The total number of read request units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
- For example, if your item size is 8 KB, you require two read request units to sustain one strongly consistent read, one read request unit if you choose eventually consistent reads, or four read request units for a transactional read request.
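The read arithmetic above (round the item size up to the next 4 KB, then scale by the read type) fits in a small Python function. The function name and the read-type labels are hypothetical. It models a single read of a single item; operations such as Query and Scan are metered on the combined size of the items they read, which this sketch does not cover.

```python
import math

READ_UNIT_KB = 4  # one read request unit covers an item of up to 4 KB


def read_request_units(item_size_kb: float, read_type: str = "strong") -> float:
    """Read request units consumed by one read of one item (on-demand mode)."""
    if item_size_kb <= 0:
        raise ValueError("item size must be positive")
    # Round the item size up to the next 4 KB boundary.
    blocks = math.ceil(item_size_kb / READ_UNIT_KB)
    if read_type == "strong":
        return float(blocks)
    if read_type == "eventual":
        return blocks / 2  # one unit covers two eventually consistent reads
    if read_type == "transactional":
        return float(blocks * 2)  # two units per 4 KB block
    raise ValueError(f"unknown read type: {read_type}")


# The 8 KB example from the text; prints "strong 2.0", "eventual 1.0",
# "transactional 4.0" on separate lines.
for read_type in ("strong", "eventual", "transactional"):
    print(read_type, read_request_units(8, read_type))
```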
Write request unit
- One write request unit represents one write for an item up to 1 KB in size.
- If you need to write an item that is larger than 1 KB, DynamoDB needs to consume additional write request units.
- Transactional write requests require two write request units to perform one write for items up to 1 KB.
- The total number of write request units required depends on the item size.
- For example, if your item size is 2 KB, you require two write request units to sustain one write request or four write request units for a transactional write request.
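The write rules reduce to rounding up to the next 1 KB and doubling for transactions. Here is the same kind of minimal sketch for writes; the function name is hypothetical.

```python
import math

WRITE_UNIT_KB = 1  # one write request unit covers an item of up to 1 KB


def write_request_units(item_size_kb: float, transactional: bool = False) -> int:
    """Write request units consumed by one write of one item (on-demand mode)."""
    if item_size_kb <= 0:
        raise ValueError("item size must be positive")
    units = math.ceil(item_size_kb / WRITE_UNIT_KB)  # round up to the next 1 KB
    return units * 2 if transactional else units


# The 2 KB example from the text: prints "2 4".
print(write_request_units(2), write_request_units(2, transactional=True))
```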
Initial throughput for on-demand capacity mode
- Newly created table with on-demand capacity mode:
  - DynamoDB treats the table's previous peak as 2,000 write request units or 6,000 read request units.
  - You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
- Existing table switched to on-demand capacity mode:
  - The previous peak is half the maximum write capacity units and read capacity units provisioned since the table was created, or the settings for a newly created table with on-demand capacity mode, whichever is higher.
  - In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
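The previous-peak rules above can be expressed as a short calculation. This is a sketch under stated assumptions: the names are hypothetical, the constants come from the text, and `fits_immediately` encodes one reading of "any linear combination of the two", namely that the fraction of the write limit used plus the fraction of the read limit used must not exceed 1.

```python
from dataclasses import dataclass

# Previous-peak values for a newly created on-demand table (from the text).
NEW_TABLE_PEAK_WRITES = 2_000  # write request units
NEW_TABLE_PEAK_READS = 6_000   # read request units


@dataclass(frozen=True)
class Throughput:
    writes: float
    reads: float


def previous_peak(max_provisioned_wcu: float = 0, max_provisioned_rcu: float = 0) -> Throughput:
    """Previous peak; pass the highest provisioned WCU/RCU if the table was switched."""
    # Half the maximum provisioned capacity, or the new-table settings, whichever is higher.
    return Throughput(
        writes=max(max_provisioned_wcu / 2, NEW_TABLE_PEAK_WRITES),
        reads=max(max_provisioned_rcu / 2, NEW_TABLE_PEAK_READS),
    )


def immediate_capacity(peak: Throughput) -> Throughput:
    """On-demand tables can immediately serve up to double the previous peak."""
    return Throughput(writes=2 * peak.writes, reads=2 * peak.reads)


def fits_immediately(writes: float, reads: float, capacity: Throughput) -> bool:
    """Linear combination: the fractions of each limit must sum to at most 1."""
    return writes / capacity.writes + reads / capacity.reads <= 1


new_table = immediate_capacity(previous_peak())
switched = immediate_capacity(previous_peak(max_provisioned_wcu=10_000, max_provisioned_rcu=8_000))
```

For a new table this gives 4,000 writes or 12,000 reads; 2,000 writes plus 6,000 reads uses exactly half of each and still fits. For the hypothetical switched table, the write limit equals the 10,000 WCUs it once had provisioned, consistent with "at least as much throughput as it did prior to switching".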
Capacity switching impacts
- When you switch a table from provisioned capacity mode to on-demand capacity mode, DynamoDB makes several changes to the structure of your table and partitions. This process can take several minutes.
- During the switching period, your table delivers throughput that is consistent with the previously provisioned write capacity unit and read capacity unit amounts.
- When switching from on-demand capacity mode back to provisioned capacity mode, your table delivers throughput consistent with the previous peak reached when the table was set to on-demand capacity mode.
Provisioned mode
If you choose provisioned mode, you must specify the number of reads and writes per second that you require for your application. You can use auto scaling to adjust your table’s provisioned capacity automatically in response to traffic changes. This helps you govern your DynamoDB use to stay at or below a defined request rate in order to obtain cost predictability. You can use provisioned capacity when:
- You have predictable application traffic.
- You run applications whose traffic is consistent or ramps gradually.
- You can forecast capacity requirements to control costs.
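Forecasting capacity comes down to the same unit arithmetic applied per second. In provisioned mode, one read capacity unit covers one strongly consistent read per second (or two eventually consistent reads per second) for an item up to 4 KB, and one write capacity unit covers one write per second for an item up to 1 KB. The minimal sketch below, with a hypothetical function name, estimates steady-state capacity and returns it in the shape of the `ProvisionedThroughput` parameter used by boto3's `create_table` and `update_table`. It ignores transactional requests and traffic spikes, which you would cover with headroom or auto scaling.

```python
import math
from typing import Dict


def provisioned_throughput(reads_per_sec: float, read_item_kb: float,
                           writes_per_sec: float, write_item_kb: float,
                           eventually_consistent: bool = False) -> Dict[str, int]:
    """Estimate RCUs/WCUs for steady traffic; returns the ProvisionedThroughput shape."""
    # Each read consumes one RCU per 4 KB block (half that if eventually consistent).
    rcu = reads_per_sec * math.ceil(read_item_kb / 4)
    if eventually_consistent:
        rcu /= 2
    # Each write consumes one WCU per 1 KB block.
    wcu = writes_per_sec * math.ceil(write_item_kb / 1)
    return {"ReadCapacityUnits": math.ceil(rcu), "WriteCapacityUnits": math.ceil(wcu)}


# Hypothetical workload: 80 strongly consistent reads/s of 6 KB items,
# 100 writes/s of 1.5 KB items.
estimate = provisioned_throughput(80, 6, 100, 1.5)
```

Each 6 KB read rounds up to two 4 KB blocks and each 1.5 KB write to two 1 KB blocks, so this workload needs 160 RCUs and 200 WCUs. The result would go into request parameters alongside `"BillingMode": "PROVISIONED"`.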
Request throttling
Provisioned throughput is the maximum amount of capacity that an application can consume from a table or index. If your application exceeds your provisioned throughput capacity on a table or index, it is subject to request throttling.
Throttling prevents your application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.
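A common way to handle throttled requests is to retry them with exponential backoff. The AWS SDKs already retry throttled requests automatically, so this generic Python sketch only illustrates the technique (exponential backoff with "full jitter"). The function and parameter names are hypothetical, and the throttling check is passed in as a predicate so the code does not depend on any SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(operation: Callable[[], T],
                      is_throttled: Callable[[Exception], bool],
                      max_attempts: int = 5,
                      base_delay: float = 0.05,
                      max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a throttled operation with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise anything that is not throttling, or when attempts run out.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # The cap doubles each attempt (0.05 s, 0.1 s, 0.2 s, ..., up to max_delay);
            # sleep for a random duration below it to spread out retries.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With boto3, a predicate such as `lambda e: isinstance(e, botocore.exceptions.ClientError) and e.response["Error"]["Code"] == "ProvisionedThroughputExceededException"` would identify throttled requests.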
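DynamoDB auto scaling
When you create a table with provisioned capacity in the AWS Management Console, auto scaling is enabled by default. Make sure you configure its settings, such as the minimum and maximum capacity and the target utilization, to match your workload.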
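Outside the console, DynamoDB auto scaling is configured through Application Auto Scaling: you register the table's read or write capacity as a scalable target with a minimum and maximum, then attach a target-tracking policy. The sketch below only builds the two request dictionaries, in the shape expected by the boto3 `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy`. The table name, policy name, capacity limits, and the 70 percent target are assumptions to adjust for your workload.

```python
from typing import Any, Dict, Tuple


def autoscaling_requests(table_name: str, dimension: str, min_capacity: int,
                         max_capacity: int, target_utilization: float = 70.0
                         ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build Application Auto Scaling requests for one table capacity dimension."""
    if dimension not in ("Read", "Write"):
        raise ValueError("dimension must be 'Read' or 'Write'")
    if not 1 <= min_capacity <= max_capacity:
        raise ValueError("need 1 <= min_capacity <= max_capacity")
    resource_id = f"table/{table_name}"
    scalable_dimension = f"dynamodb:table:{dimension}CapacityUnits"
    # register_scalable_target: the capacity range auto scaling may use.
    target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": scalable_dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    # put_scaling_policy: keep consumed/provisioned capacity near the target percentage.
    policy = {
        "PolicyName": f"{table_name}-{dimension.lower()}-target-tracking",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": scalable_dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": f"DynamoDB{dimension}CapacityUtilization",
            },
        },
    }
    return target, policy
```

To apply them, you could call `client = boto3.client("application-autoscaling")`, then `client.register_scalable_target(**target)` followed by `client.put_scaling_policy(**policy)`, once for reads and once for writes.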
***
Continue to Part 2: 10 DynamoDB choices that will impact your costs