DynamoDB is a NoSQL database from AWS. It offers a flexible pricing model and a stateless connection model, which make it well suited to persistent CRUD operations from serverless applications regardless of data size. DynamoDB differs from a relational database in many ways, but its biggest quirk is AWS's recommendation to store all of your records in a single table. To understand single-table design, along with its advantages and disadvantages, we first need to look at how relational databases model data.
Challenge
Traditionally, applications have been built on relational (SQL) databases, which have handled data management for a long time. Advances in technology and the rise of IoT, however, created a need to process large volumes of data in real time. Relational databases can scale, but in some scenarios they do not process data as efficiently as Big Data applications require. This gap led to NoSQL databases such as DynamoDB.
NoSQL databases made it easier to store and process large volumes of data, but real-time processing remained a concern. Single-table design emerged as a response. It is not a separate DynamoDB feature but a data-modeling pattern that builds on DynamoDB's NoSQL features to store and retrieve large amounts of data efficiently. DynamoDB already serves individual requests with single-digit millisecond latency; single-table design helps keep whole operations in the millisecond range by reducing how many requests each operation needs.
Another challenge for single-table design is that most developers are used to relational databases and a multi-table approach. In a relational database, data is normalized and stored in separate tables for different entities. When a request needs information from more than one table, joins combine rows from those tables at read time using foreign keys. Joins are convenient, but they are also expensive: their cost grows with the size of the data, which makes scaling harder. In DynamoDB, which has no joins, developers can still create one table per entity and issue follow-up queries to fetch related data, but each follow-up is another round trip, so this approach gets slower as the application and its data grow.
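To make the follow-up-query approach concrete, here is a minimal Python sketch of a multi-table layout in a store without joins. It assumes two hypothetical in-memory tables, `USERS` and `PROFILES`, and assumes each user record lists its profile IDs; the `RoundTripCounter` helper is purely illustrative and not a DynamoDB API. Each `get()` stands in for one network request, so loading a user with N profiles one at a time costs 1 + N requests.

```python
from __future__ import annotations

from typing import Any

# Hypothetical multi-table layout in a store without joins: one dict per "table".
# Storing ProfileIds on the user record is an assumption made for this sketch.
USERS: dict[str, dict[str, Any]] = {
    "johndoe": {"FirstName": "John", "LastName": "Doe", "ProfileIds": [1]},
    "janedoe": {"FirstName": "Jane", "LastName": "Doe", "ProfileIds": [2, 3]},
}
PROFILES: dict[int, dict[str, Any]] = {
    1: {"UserId": "johndoe", "Active": 0, "Name": "John"},
    2: {"UserId": "janedoe", "Active": 1, "Name": "Jane"},
    3: {"UserId": "janedoe", "Active": 1, "Name": "Jenna"},
}


class RoundTripCounter:
    """Counts simulated requests; each get() stands in for one network call."""

    def __init__(self) -> None:
        self.requests = 0

    def get(self, table: dict, key: Any) -> dict[str, Any]:
        self.requests += 1
        return table[key]


def load_user_with_profiles(client: RoundTripCounter, user_id: str) -> dict[str, Any]:
    # Round trip 1: read the user record.
    user = client.get(USERS, user_id)
    # Round trips 2..N+1: one follow-up read per profile (an application-side "join").
    profiles = [client.get(PROFILES, pid) for pid in user["ProfileIds"]]
    return {"user": user, "profiles": profiles}


client = RoundTripCounter()
result = load_user_with_profiles(client, "janedoe")
print(client.requests)  # 3: one for the user, one for each of her two profiles
```

The request count grows with the number of related items. Batching reads (for example with DynamoDB's `BatchGetItem`) can cut the number of round trips, but the application still has to know which keys to ask for and assemble the result itself.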
Solution
The core idea of single-table design is to structure data so that most of it can be fetched with as few queries as possible. This reduces the number of calls to the database, and because related data lives in the same table, it removes the need for expensive, time-consuming joins.
Fewer requests also means lower operating cost, because DynamoDB bills for the read and write capacity an application uses. It offers two capacity modes. In on-demand mode, you pay for each read and write request, and DynamoDB adapts automatically to changes in traffic, such as spikes during peak hours. In provisioned mode, you specify the read and write capacity your application needs and can enable auto scaling, so capacity rises during peaks and falls afterward, with billing following the capacity provisioned. In either mode, a single-table design that minimizes the number of requests to the database also reduces the associated cost.
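The cost argument can be made concrete with DynamoDB's read-unit arithmetic. One read unit covers a strongly consistent read of up to 4 KB, and an eventually consistent read costs half as much. `GetItem` rounds each item up to the next 4 KB, while a `Query` adds up the sizes of all items it returns and rounds the total up once. The sketch below compares reading a user's three related items with three `GetItem` calls versus one `Query`; the item sizes and function names are hypothetical, and it ignores write costs and per-request pricing.

```python
from __future__ import annotations

import math

KB = 1024
READ_UNIT_BYTES = 4 * KB  # one read unit covers up to 4 KB of data


def read_units(size_bytes: int, strongly_consistent: bool = True) -> float:
    """Read units for one read of size_bytes, rounded up to the next 4 KB."""
    blocks = math.ceil(size_bytes / READ_UNIT_BYTES)
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return float(blocks) if strongly_consistent else blocks * 0.5


def get_item_cost(item_sizes: list[int], strongly_consistent: bool = True) -> float:
    # Each GetItem rounds its own item up to 4 KB.
    return sum(read_units(size, strongly_consistent) for size in item_sizes)


def query_cost(item_sizes: list[int], strongly_consistent: bool = True) -> float:
    # A Query sums the sizes of all returned items and rounds up once.
    return read_units(sum(item_sizes), strongly_consistent)


# Hypothetical sizes: a user record and two profiles of about 1 KB each.
items = [1 * KB, 1 * KB, 1 * KB]
print(get_item_cost(items), query_cost(items))  # 3.0 1.0
print(get_item_cost(items, False), query_cost(items, False))  # 1.5 0.5
```

The saving comes from rounding up once per request instead of once per item, so it is largest for many small related items and shrinks as individual items approach or exceed 4 KB.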
Two Approaches to Table Design
Let’s compare the two approaches, multiple tables in a relational database and an item collection in DynamoDB, to see how they differ and how each retrieves related data. As an example, suppose you need to store information for a video streaming application in which each user can create multiple profiles.
Relational Database Approach
In this scenario, a relational database would be structured something like this:

User table:

| UserId (PK) | FirstName | LastName |
|---|---|---|
| johndoe | John | Doe |
| janedoe | Jane | Doe |

Profile table:

| ProfileId (PK) | UserId (FK) | Active | Name |
|---|---|---|---|
| 1 | johndoe | 0 | John |
| 2 | janedoe | 1 | Jane |
| 3 | janedoe | 1 | Jenna |
With a relational database, we can retrieve a user together with that user's profiles in a single query:

```sql
SELECT *
FROM User u
INNER JOIN Profile p ON u.UserId = p.UserId
WHERE u.UserId = 'janedoe';
```

This query reads from both tables and combines the matching rows at read time. As the tables grow, joins like this one tend to get slower.
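To see such a join run without a database server, the sketch below loads the two example tables into an in-memory SQLite database with Python's standard `sqlite3` module and joins them for one user. SQLite is only a stand-in for whichever relational engine you use; the `User` table name is quoted because `USER` is a reserved word in some SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (
        UserId    TEXT PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT
    );
    CREATE TABLE Profile (
        ProfileId INTEGER PRIMARY KEY,
        UserId    TEXT REFERENCES "User"(UserId),  -- foreign key to User
        Active    INTEGER,
        Name      TEXT
    );
    INSERT INTO "User" VALUES ('johndoe', 'John', 'Doe'), ('janedoe', 'Jane', 'Doe');
    INSERT INTO Profile VALUES
        (1, 'johndoe', 0, 'John'),
        (2, 'janedoe', 1, 'Jane'),
        (3, 'janedoe', 1, 'Jenna');
""")

# The join combines the two tables at read time using the UserId foreign key.
rows = conn.execute(
    """
    SELECT u.UserId, u.FirstName, u.LastName, p.Name, p.Active
    FROM "User" u
    INNER JOIN Profile p ON u.UserId = p.UserId
    WHERE u.UserId = ?
    ORDER BY p.ProfileId
    """,
    ("janedoe",),
).fetchall()

for row in rows:
    print(row)
# ('janedoe', 'Jane', 'Doe', 'Jane', 1)
# ('janedoe', 'Jane', 'Doe', 'Jenna', 1)
```

Note that the user's columns are repeated on every joined row: the database assembles this combined result at query time, which is exactly the work single-table design avoids by storing related items together.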
DynamoDB Single Table Approach
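The same data can be stored in DynamoDB as a collection of items. Items that share a partition key form an item collection, so Jane Doe's two profiles are stored together:

| User (partition key) | ProfileName (sort key) | Active |
|---|---|---|
| John Doe | john | 0 |
| Jane Doe | jane | 1 |
| Jane Doe | jenna | 1 |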
With this approach, a single query on the partition key returns all of a user's profiles without a join. This is the crux of single-table design: we shape the table and its data so that most, if not all, of the data an operation needs can be fetched in as few requests as possible.
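This access pattern can be sketched in Python under a few assumptions. The in-memory `SingleTable` class is a hypothetical stand-in that only mimics DynamoDB's `Query` semantics (exact partition key match, an optional `begins_with`-style sort key prefix, results in ascending sort key order); it is not the boto3 API. The generic `PK`/`SK` attribute names and the `USER#`/`PROFILE#` prefixes are a common community convention, not something DynamoDB requires. Unlike the table above, the sketch also stores each user's name in a `METADATA` item, so one request returns the user and the profiles together.

```python
from __future__ import annotations

from typing import Any


class SingleTable:
    """Hypothetical in-memory stand-in for one DynamoDB table (not the boto3 API)."""

    def __init__(self) -> None:
        # Items are keyed by their full primary key: (partition key, sort key).
        self._items: dict[tuple[str, str], dict[str, Any]] = {}

    def put_item(self, item: dict[str, Any]) -> None:
        # Writing an item with an existing primary key replaces it, as PutItem does.
        self._items[(item["PK"], item["SK"])] = item

    def query(self, pk: str, sk_prefix: str = "") -> list[dict[str, Any]]:
        # Mimics Query: exact partition key match plus an optional
        # prefix condition on the sort key, returned in sort key order.
        matches = [
            item
            for (item_pk, item_sk), item in self._items.items()
            if item_pk == pk and item_sk.startswith(sk_prefix)
        ]
        return sorted(matches, key=lambda item: item["SK"])


table = SingleTable()
# Different entity types share the table; key prefixes tell them apart.
table.put_item({"PK": "USER#johndoe", "SK": "METADATA", "FirstName": "John", "LastName": "Doe"})
table.put_item({"PK": "USER#johndoe", "SK": "PROFILE#john", "Active": 0})
table.put_item({"PK": "USER#janedoe", "SK": "METADATA", "FirstName": "Jane", "LastName": "Doe"})
table.put_item({"PK": "USER#janedoe", "SK": "PROFILE#jane", "Active": 1})
table.put_item({"PK": "USER#janedoe", "SK": "PROFILE#jenna", "Active": 1})

# One request returns Jane's whole item collection: her user record and both profiles.
jane = table.query("USER#janedoe")
print([item["SK"] for item in jane])  # ['METADATA', 'PROFILE#jane', 'PROFILE#jenna']

# Narrowing by sort key prefix returns only the profiles.
profiles = table.query("USER#janedoe", sk_prefix="PROFILE#")
print(len(profiles))  # 2
```

Against a real table with boto3, the second request would look roughly like `table.query(KeyConditionExpression=Key("PK").eq("USER#janedoe") & Key("SK").begins_with("PROFILE#"))`, where `Key` comes from `boto3.dynamodb.conditions` and `table` is a `boto3.resource("dynamodb").Table(...)` object.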
Pros of Single Table Design
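- Less operational overhead: there is only one table to configure, monitor, and manage capacity for, instead of one per entity.
- Fewer requests per operation: related items can be retrieved together in one query instead of through several calls, which can lower both cost and latency compared with designs that rely on joins or follow-up queries.
- Flexible keys: partition and sort key values can be generic strings, so different kinds of items can use different key values in the same table, chosen to match the data the application needs to store and fetch.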
Cons of Single Table Design
- Steep learning curve: designing a single-table data model has its own challenges. Choosing the right combination of partition and sort keys requires a good understanding of the data and of the operations that will be performed on it.
- Inflexibility for new access patterns: because all data lives in one table and is organized by partition and sort keys chosen for the original access patterns, supporting a new query pattern can be tricky. It often requires adding a secondary index or reshaping existing items.
When to Use a Multi-table Approach
When designing the architecture for a new application, it is important to choose the type of database that meets all of the application's requirements efficiently. So when should you not use single-table design? At a basic level, whenever its benefits don't outweigh its costs. More specifically, a multi-table approach can be a better fit when you want the following:
- More flexibility with our queries
- Easier analytics of the tables
- Retrieval speed is not the most important aspect
- New applications where developer agility is more important
- Applications that use GraphQL
Conclusion
In this post, we reviewed the concept of single-table design in DynamoDB. We compared how data is structured in a relational database using multiple tables with how the same data is structured in DynamoDB using a single table. Finally, we looked at the advantages and drawbacks of single-table design and highlighted scenarios in which to use it or avoid it. Single-table design comes with a learning curve, and a solid understanding of your data and access patterns is essential to implementing it effectively.