DynamoDB For System Design Interviews

By the Co-founder of www.hellointerview.com

6 min readNov 19, 2024

Intro

DynamoDB is a fully-managed, highly scalable, key-value service provided by AWS. Cool, buzz-words. But what the hell does that mean and why does it matter?

Fully-Managed — This means that AWS takes care of all the operational aspects of the database. The fully-managed nature allows AWS to handle all operational tasks — hardware provisioning, configuration, patching, and scaling — freeing developers to concentrate on application development.
Highly Scalable — DynamoDB can handle massive amounts of data and traffic. It automatically scales up or down to adjust to your application’s needs, without any downtime or performance degradation.
Key-value — DynamoDB is a NoSQL database, which means it doesn’t use the traditional relational database model. Instead, it uses a key-value model that allows for flexible data storage and retrieval.

The moral of the story is that DynamoDB is a super easy to use and can scale to support a wide variety of applications. For system design interviews in particular, it has just about everything you’d ever need from a database. It even supports transactions now! Which neutralizes one of the biggest criticisms of DynamoDB in the past.

Importantly, DynamoDB is not open-source, so we can’t as easily describe its internals like we did with breakdowns of open source technologies like Kafka and Redis. Instead, we’ll focus more on you interact with it. In order to look under the hood, we’ll rely on the limited information AWS provides via documentation and the DynamoDB Whitepaper.

In this deep dive, we’ll break down exactly what you need to know about DynamoDB in order to field any question about it in a system design interview. Along the way, you’ll also acquire practical learning that you can later apply in your own projects. Let’s break it down!

Candidates often ask me, “am I even allowed to use DynamoDB in an interview?”
The answer is simple, ask your interviewer! Many will say yes, and just expect that you know how to use it. Others may say no, expecting open-source alternatives that avoid any vendor lock-in. As is always the case, just ask 😊

The Data Model

In DynamoDB, data is organized into tables, where each table has multiple items that represent individual records. This is just like a relational database, but with some distinct differences tailored for scalability and flexibility.

Tables — Serve as the top-level data structure in DynamoDB, each defined by a mandatory primary key that uniquely identifies its items. Tables support secondary indexes, enabling queries on non-primary key attributes for more versatile data retrieval.

Items — Correspond to rows in a relational database and contain a collection of attributes. Each item must have a primary key and can contain up to 400KB of data, including all its attributes.

Attributes — Key-value pairs that constitute the data within an item. They can vary in type, including scalar types (strings, numbers, booleans) and set types (string sets, number sets). Attributes can also be nested, allowing for complex data structures within a single item.

Setting up DynamoDB is straightforward: you can create tables directly in the AWS console, and start inserting data immediately. Unlike traditional RDBMS, DynamoDB is schema-less, meaning you don’t need to define a schema before inserting data. This means items in the same table can have different sets of attributes, and new attributes can be added to items at any point without affecting existing items. This schema-less design provides high flexibility but requires careful data validation at the application level, as DynamoDB does not enforce attribute uniformity across items.

Consider a users table in DynamoDB, structured as follows:

{
  "PersonID": 101,
  "LastName": "Smith",
  "FirstName": "Fred",
  "Phone": "555-4321"
},
{
  "PersonID": 102,
  "LastName": "Jones",
  "FirstName": "Mary",
  "Address": {
    "Street": "123 Main",
    "City": "Anytown",
    "State": "OH",
    "ZIPCode": 12345
  }
},
{
  "PersonID": 103,
  "LastName": "Stephens",
  "FirstName": "Howard",
  "Address": {
    "Street": "123 Main",
    "City": "London",
    "PostalCode": "ER3 5K8"
  },
  "FavoriteColor": "Blue"
}

Each item represents a user with various attributes. Notice how some users have attributes not shared by others, like FavoriteColor, showing DynamoDB’s flexibility in attribute management.

Although DynamoDB uses JSON for data transmission, it’s merely a transport format. The actual storage format of DynamoDB is proprietary, allowing users to focus on data modeling without delving into the complexities of physical data storage.

Partition Key and Sort Key

DynamoDB tables are defined by a primary key, which can consist of one or two attributes:

Partition Key — A single attribute that uniquely identifies each item in the table. DynamoDB uses the partition key’s value to determine the physical location of the item within the database. This value is hashed to determine the partition where the item is stored.
Sort Key (Optional) — An additional attribute that, when combined with the partition key, forms a composite primary key. The sort key is used to order items with the same partition key value, enabling efficient range queries and sorting within a partition.

In an interview, you’ll want to be sure to specify the partition key and, optionally, a sort key when introducing DynamoDB. This choice is important for optimizing query performance and data retrieval efficiency. Just like with any other database, you’ll choose a partition key that optimizes for the most common query patterns in your application and keeping data evenly distributed across partitions. In the case you need to perform range queries or sorting, you’ll want to also specify the Sort Key.

For example, if you’re building a simple group chat application, it would make sense to use the chat_id as the partition key and timestamp as the sort key. This way, you can efficiently query all messages for a specific chat group and sort them by timestamp before displaying them to users.

But what is actually happening under the hood?

DynamoDB uses a combination of consistent hashing and B-trees to efficiently manage data distribution and retrieval:

Consistent Hashing for Partition Keys:

The partition key is hashed using a consistent hashing function.
This hash value determines which physical node (or partition) in the DynamoDB cluster will store the item.
Consistent hashing ensures even distribution of data across nodes and facilitates easy scaling.

2. B-trees for Sort Keys:

Within each partition, items are organized using a B-tree data structure, which is a self-balancing tree stored on disk.
The B-tree is indexed on the sort key, allowing for efficient range queries and sorted retrieval.
This structure enables quick lookups, insertions, and deletions within a partition.

3. Composite Key Operations:

When querying with both partition and sort keys, DynamoDB first uses the partition key’s hash to locate the correct node.
It then traverses the B-tree on that node using the sort key to find the specific item or range of items.

This two-tier approach allows DynamoDB to achieve both horizontal scalability (through partitioning) and efficient querying within partitions (through B-tree indexing). It’s this combination that enables DynamoDB to handle massive amounts of data while still providing fast, predictable performance for queries using both partition and sort keys.

Head over to Hello Interview to continue reading!

(don’t worry, it’s totally free)

DynamoDB For System Design Interviews

By the Co-founder of www.hellointerview.com

Intro

The Data Model

Partition Key and Sort Key

Written by Evan King

No responses yet