Key Concepts

In this section, we'll cover the key concepts you need to know about DynamoDB. At the end of this section, you will understand:

tables, items, and attributes;
primary keys;
secondary indexes;
read and write capacity.

Tables, Items, and Attributes

Tables, items, and attributes are the core building blocks of DynamoDB.

A table is a grouping of data records. For example, you might have a Users table to store data about your users, and an Orders table to store data about your users' orders. This concept is similar to a table in a relational database or a collection in MongoDB.

An item is a single data record in a table. Each item in a table is uniquely identified by the stated primary key of the table. In your Users table, an item would be a particular User. An item is similar to a row in a relational database or a document in MongoDB.

Attributes are pieces of data attached to a single item. This could be a simple Age attribute that stores the age of a user. An attribute is comparable to a column in a relational database or a field in MongoDB. DynamoDB does not require attributes on items except for attributes that make up your primary key.

Primary Key

Each item in a table is uniquely identified by a primary key. The primary key must be defined when the table is created, and the primary key must be provided when inserting a new item.

There are two types of primary key: a simple primary key made up of just a partition key, and a composite primary key made up of a partition key and a sort key.

Using a simple primary key is similar to standard key-value stores like Memcached or accessing rows in a SQL table by a primary key. One example would be a Users table with a Username primary key.

The composite primary key is more complex. With a composite primary key, you specify both a partition key and a sort key. The sort key is used to (wait for it) sort items with the same partition key. One example could be an Orders table for recording customer orders on an e-commerce site. The partition key would be the CustomerId, and the sort key would be the OrderId.

Remember: each item in a table is uniquely identified by a primary key, even with the composite key. When using a table with a composite primary key, you may have multiple items with the same partition key but different sort keys. You can only have one item with a particular combination of partition key and sort key.

The composite primary key enables sophisticated query patterns, including grabbing all items with the given partition key or using the sort key to narrow the relevant items for a particular query.
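
As a rough mental model only (not how DynamoDB is implemented), a table with a composite primary key can be pictured as a dictionary of partitions, each holding items ordered by sort key. The sketch below assumes a hypothetical OrderId format that starts with the order date, so that a prefix on the sort key narrows the results; the names `put_order` and `query_orders` are made up for illustration.

```python
from collections import defaultdict

# Toy in-memory model: outer key = partition key (CustomerId),
# inner key = sort key (OrderId). Not a real DynamoDB client.
orders = defaultdict(dict)

def put_order(customer_id, order_id, attributes):
    # Writing the same (partition key, sort key) pair again replaces the item.
    orders[customer_id][order_id] = attributes

def query_orders(customer_id, order_id_prefix=""):
    # Return one customer's orders in sort-key order, optionally narrowed by prefix.
    partition = orders.get(customer_id, {})
    return [
        (order_id, partition[order_id])
        for order_id in sorted(partition)
        if order_id.startswith(order_id_prefix)
    ]

put_order("cust-1", "2017-07-15#a1", {"Amount": 25})
put_order("cust-1", "2018-01-03#b2", {"Amount": 40})
put_order("cust-2", "2017-07-15#c3", {"Amount": 10})
put_order("cust-1", "2017-07-15#a1", {"Amount": 30})  # same key pair: overwrite

print(query_orders("cust-1"))          # all of cust-1's orders, sorted by OrderId
print(query_orders("cust-1", "2018"))  # narrowed by a sort-key prefix
```

The same partition key appears on several items, but each combination of partition key and sort key maps to exactly one item.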

For more on interacting with items, start with the lesson on the anatomy of an item.

Secondary Indexes

The primary key uniquely identifies an item in a table, and you may make queries against the table using the primary key. However, sometimes you have additional access patterns that would be inefficient with your primary key. DynamoDB has the notion of secondary indexes to enable these additional access patterns.

The first kind of secondary index is a local secondary index. A local secondary index uses the same partition key as the underlying table but a different sort key. To take our Orders table example from the previous section, imagine you wanted to quickly access a customer's orders in descending order of the amount they spent on the order. You could add a local secondary index with a partition key of CustomerId and a sort key of Amount, allowing for efficient queries on a customer's orders by amount.

The second kind of secondary index is a global secondary index. A global secondary index can define an entirely different primary key for a table. This could mean setting an index with just a partition key for a table with a composite primary key. It could also mean using completely different attributes to populate a partition key and sort key. With the Orders example above, we could have a global secondary index with a partition key of OrderId so we could retrieve a particular order without knowing the CustomerId that placed the order.

Secondary indexes are a complex topic but are extremely useful in getting the most out of DynamoDB. Check out the section on secondary indexes for a deeper dive.
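
To make the two index shapes concrete, here is a small pure-Python sketch. It only imitates the access patterns: in DynamoDB the service maintains the indexes for you, whereas this code recomputes them from a list. The sample orders and the names `orders_by_amount` and `order_id_index` are assumptions for illustration.

```python
orders = [
    {"CustomerId": "cust-1", "OrderId": "o-100", "Amount": 25},
    {"CustomerId": "cust-1", "OrderId": "o-101", "Amount": 90},
    {"CustomerId": "cust-2", "OrderId": "o-102", "Amount": 40},
]

def orders_by_amount(customer_id):
    # Local secondary index idea: same partition key (CustomerId),
    # but sorted by a different attribute (Amount), largest first.
    matches = [o for o in orders if o["CustomerId"] == customer_id]
    return sorted(matches, key=lambda o: o["Amount"], reverse=True)

# Global secondary index idea: the same items, reachable through a
# different partition key (OrderId) without knowing the CustomerId.
order_id_index = {o["OrderId"]: o for o in orders}

print([o["OrderId"] for o in orders_by_amount("cust-1")])
print(order_id_index["o-102"]["CustomerId"])
```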

Read and Write Capacity

When you use a database like MySQL, Postgres, or MongoDB, you provision a particular server to run your database. You'll need to choose your instance size -- how many CPUs do you need, how much RAM, how many GBs of storage, etc.

Not so with DynamoDB. Instead, you provision read and write capacity units. These units allow a given number of operations per second. This is a fundamentally different pricing paradigm than the instance-based world -- pricing can more closely reflect actual usage.

DynamoDB also has autoscaling of your read and write capacity units. This makes it much easier to scale your application up during peak times while saving money by scaling down when your users are asleep.
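
For a feel of how capacity units translate into operations per second, here is a back-of-the-envelope sketch. The unit sizes are not stated in the text above; they come from AWS's documented definitions for provisioned capacity (one read unit covers one strongly consistent read per second of an item up to 4 KB, an eventually consistent read costs half as much, and one write unit covers one write per second of an item up to 1 KB). The function names are made up.

```python
import math

def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    # Each read is billed in 4 KB steps, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half; round up when provisioning.
    return total if strongly_consistent else math.ceil(total / 2)

def write_capacity_units(item_size_kb, writes_per_second):
    # Each write is billed in 1 KB steps, rounded up.
    return math.ceil(item_size_kb) * writes_per_second

print(read_capacity_units(6, 10))                             # 6 KB items, 10 reads/s
print(read_capacity_units(6, 10, strongly_consistent=False))
print(write_capacity_units(2.5, 4))                           # 2.5 KB items, 4 writes/s
```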

Anatomy of an Item

An item is the core unit of data in DynamoDB. It is comparable to a row in a relational database, a document in MongoDB, or a simple object in a programming language. Each item is uniquely identifiable by a primary key that is set on a table level.

An item is composed of attributes, which are bits of data on the item. This could be the "Name" for a User, or the "Year" for a Car. Attributes have types -- e.g., strings, numbers, lists, sets, etc -- which must be provided when writing and querying Items.

In this section, we'll cover the core aspects of Items, including primary keys, attributes, and attribute types.

Primary Keys

Every item in a table is uniquely identified by its primary key.

When creating a new table, you will need to specify the primary key of that table. Because the primary key is what identifies an item, it must be included with every item that is written to a DynamoDB table.

There are two types of primary keys. A simple primary key uses a single attribute to identify an item, such as a Username or an OrderId. Using a DynamoDB table with a simple primary key is similar to using most simple key-value stores, such as Memcached.

A composite primary key uses a combination of two attributes to identify a particular item. The first attribute is a partition key (also known as a "hash key") which is used to segment and distribute items across partitions. The second attribute is a sort key (also known as a "range key") which is used to order items with the same partition key. A DynamoDB table with a composite primary key can use interesting query patterns beyond simple get / set operations.

Understanding the primary key is a crucial part of planning your data model for a DynamoDB table. The primary key will be your main method of inserting and updating items in your table.

Attributes

An item is made up of attributes, which are different elements of data on a particular item. For example, an item in the User table might have a Name attribute, an Age attribute, an Address attribute, and more. They are comparable to columns in a relational database.

Most attributes in a DynamoDB table are not required for every item. DynamoDB is a NoSQL database which allows for a more flexible data model than the standard relational databases. You could store entirely different kinds of objects in a single DynamoDB table, such as a Car object with Make, Model, and Year attributes, and a Pet object with Type, Breed, Age, and Color attributes. This isn't usually a best practice, but the flexible model of DynamoDB allows it if desired.

There is one exception to the flexible model of DynamoDB items -- each item must have the attribute(s) for the defined primary key of the table.

Attribute types

When setting an attribute for a DynamoDB item, you must specify the type of the attribute. Available types include simple types like strings and numbers as well as composite types like lists, maps, or sets.

When manipulating an item, you'll provide each attribute you're trying to set with the type of the attribute. This will be provided in a map where the keys are the names of the attributes to set. The value for each attribute is a map with a single key indicating the type of the attribute and a single value holding the actual value of the attribute.

For example, if you want to store a User object with three attributes of Name, Age, and Roles, your API call to store the User would look like:

{
    "Name": { "S": "Alex DeBrie" },
    "Age": { "N": "29" },
    "Roles": { "L": [{ "S": "Admin" }, { "S": "User" }] }
}

In this example, we've stored a string attribute of "Name" with the value "Alex DeBrie" using the string indicator of "S". There's also a number attribute of "Age" with the value "29" with the number indicator of "N". Finally, there's a list attribute of "Roles" using the list indicator of "L". Its value contains two elements, "Admin" and "User", and each element carries its own type indicator ("S" here).
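
If you are building these typed maps by hand, a small helper can save some typing. The following is a minimal sketch, not an official SDK function: `to_attribute_value` is a made-up name, and it only handles the handful of Python types shown. Note that every element of a list is itself wrapped in a typed map.

```python
def to_attribute_value(value):
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        # Numbers travel as strings in the typed format.
        return {"N": str(value)}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")

user = {"Name": "Alex DeBrie", "Age": 29, "Roles": ["Admin", "User"]}
item = {name: to_attribute_value(value) for name, value in user.items()}
print(item)
```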

Similarly, when you retrieve an item from DynamoDB, it will return the attributes in a map with the attribute names as the keys of the map. The values of the map will be a map containing a single key indicating the type of the attribute and the value containing the actual value of the attribute.

For example, if you're using the GetItem API call to retrieve the User from above, your response would look like:

{
    "Item": {
        "Name": {
            "S": "Alex DeBrie"
        },
        "Age": {
            "N": "29"
        },
        "Roles": {
            "L": [{ "S": "Admin" }, { "S": "User" }]
        }
    }
}

Note that the value for the Age attribute is a string -- "29" -- rather than the actual number 29. In your application, you'll need to do conversions from a string type to a number type if needed.
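
One way to do that conversion is sketched below. It is an illustration under assumptions rather than any SDK's behavior: `from_attribute_value` is a hypothetical helper that covers only a few type indicators, and it turns "N" values into `Decimal` so no precision is lost.

```python
from decimal import Decimal

def from_attribute_value(attr):
    # Each typed value is a map with exactly one key: the type indicator.
    (type_code, raw), = attr.items()
    if type_code == "N":
        return Decimal(raw)
    if type_code in ("S", "BOOL"):
        return raw
    if type_code == "L":
        return [from_attribute_value(v) for v in raw]
    if type_code == "M":
        return {k: from_attribute_value(v) for k, v in raw.items()}
    raise ValueError(f"unsupported type indicator: {type_code}")

response = {
    "Item": {
        "Name": {"S": "Alex DeBrie"},
        "Age": {"N": "29"},
        "Roles": {"L": [{"S": "Admin"}, {"S": "User"}]},
    }
}
user = {name: from_attribute_value(v) for name, v in response["Item"].items()}
print(user["Age"] + 1)  # arithmetic works once the string is converted
```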

With the basics of attribute types in mind, let's look at the different types of attributes. Each type starts with the identifier used (e.g. S for strings, N for numbers, etc) as well as an example usage.

String type

Identifier: "S"

Example Usage:

"Name": { "S": "Alex DeBrie" }

The string type is the most basic data type, representing a Unicode string with UTF-8 encoding.

DynamoDB allows sorting and comparisons of string types according to the UTF-8 encoding. This can be helpful when sorting last names ("Give me all last names, sorted alphabetically") or when filtering ISO timestamps ("Give me all orders between '2017-07-01' and '2018-01-01'").
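
The following sketch imitates that byte-wise ordering in plain Python to show why ISO-style dates work well as strings, and one thing to watch for with names. The sample values are made up, and the range check uses inclusive bounds purely for illustration.

```python
def utf8_key(text):
    # Compare strings by the bytes of their UTF-8 encoding.
    return text.encode("utf-8")

last_names = ["Turner", "Murray", "DeBrie", "adams"]
sorted_names = sorted(last_names, key=utf8_key)
print(sorted_names)  # uppercase letters sort before lowercase in byte order

order_dates = ["2017-06-30", "2017-07-01", "2017-11-15", "2018-01-01", "2018-02-10"]
low, high = utf8_key("2017-07-01"), utf8_key("2018-01-01")
in_range = [d for d in order_dates if low <= utf8_key(d) <= high]
print(in_range)
```

Because the most significant part of an ISO date comes first, string order and chronological order agree. Mixed-case names do not sort the way a phone book would, so you may want to store a normalized copy if that matters.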

Number type

Identifier: "N"

Example Usage:

"Age": { "N": "29" }

The number type represents positive and negative numbers, or zero. It can be used for precision up to 38 digits.

Note that you will send your number up as a string value. However, you may do numerical operations on your number attributes when working with condition expressions.

Binary type

Identifier: "B"

Example Usage:

"SecretMessage": { "B": "bXkgc3VwZXIgc2VjcmV0IHRleHQh" }

You can use DynamoDB to store Binary data directly, such as an image or compressed data. Generally, larger binary blobs should be stored in something like Amazon S3 rather than DynamoDB to enable greater throughput, but you may use DynamoDB if you like.

When using Binary data types, you must base64 encode your data before sending to DynamoDB.
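
As a quick illustration of that encoding step, the sketch below uses Python's standard `base64` module to produce the value shown in the example usage above. Some SDKs may perform this encoding for you, so check the one you use before encoding by hand.

```python
import base64

raw = b"my super secret text!"

# Encode the raw bytes, then decode the result to text so it fits in JSON.
encoded = base64.b64encode(raw).decode("ascii")
attribute = {"SecretMessage": {"B": encoded}}
print(attribute)

# Reading the value back is the reverse operation.
decoded = base64.b64decode(attribute["SecretMessage"]["B"])
```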

Boolean type

Identifier: "BOOL"

Example Usage:

"IsActive": { "BOOL": false }

The Boolean type stores either true or false. Note that the value is a JSON boolean rather than a quoted string.

Null type

Identifier: "NULL"

Example Usage:

"OrderId": { "NULL": true }

The Null type represents an attribute with an unknown or undefined state, and its value is written as the boolean true. I would generally recommend against using it.

List type

Identifier: "L"

Example Usage:

"Roles": { "L": [ { "S": "Admin" }, { "S": "User" } ] }

The List type allows you to store a collection of values in a single attribute. The values are ordered and do not have to be of the same type (e.g. string or number).

You can operate directly on list elements using expressions.

Map type

Identifier: "M"

Example Usage:

"FamilyMembers": {
    "M": {
        "Bill Murray": {
            "M": {
                "Relationship": { "S": "Spouse" },
                "Age": { "N": "65" }
            }
        },
        "Tina Turner": {
            "M": {
                "Relationship": { "S": "Daughter" },
                "Age": { "N": "78" },
                "Occupation": { "S": "Singer" }
            }
        }
    }
}

Like the List type, the Map type allows you to store a collection of values in a single attribute. For a Map attribute, these values are stored in key-value pairs, similar to the map or dictionary objects in most programming languages.

Also like the List type, you can operate directly on map elements using expressions.

String Set type

Identifier: "SS"

Example Usage:

"Roles": { "SS": [ "Admin", "User" ] }

DynamoDB includes three different Set types which allow you to maintain a collection of unique items of the same type. The String Set is used to hold a set of strings.

Sets can be particularly useful with expressions. You can run update commands to add & remove elements to a set without fetching & inserting the whole object. You may also check for the existence of an element within a set when updating or retrieving items.

Number Set type

Identifier: "NS"

Example Usage:

"RelatedUsers": { "NS": [ "123", "456", "789" ] }

DynamoDB includes three different Set types which allow you to maintain a collection of unique items of the same type. The Number Set is used to hold a set of numbers.

Binary Set type

Identifier: "BS"

Example Usage:

"SecretCodes": { "BS": [ 
 "c2VjcmV0IG1lc3NhZ2UgMQ==", 
 "YW5vdGhlciBzZWNyZXQ=", 
 "dGhpcmQgc2VjcmV0" 
] }

DynamoDB includes three different Set types which allow you to maintain a collection of unique items of the same type. The Binary Set is used to hold a set of binary values.

With the basics of Items in mind, let's insert and retrieve our first items.

Expression Basics

In this lesson, we will cover using expressions with DynamoDB. Expressions are an integral part of using DynamoDB, and they are used in a few different areas:

Condition expressions are used when manipulating individual items to only change an item when certain conditions are true.
Projection expressions are used to specify a subset of attributes you want to receive when reading Items. We used these in our GetItem calls in the previous lesson.
Update expressions are used to update a particular attribute in an existing Item.
Key condition expressions are used when querying a table with a composite primary key to limit the items selected.
Filter expressions allow you to filter the results of queries and scans to allow for more efficient responses.

Understanding these expressions is key to getting the full value from DynamoDB. In this section, we'll look at the basics of expressions, including the use of expression attribute names and values. Then, we'll see how to use condition expressions in the context of our PutItem calls from the previous lesson.

Basics of Expressions

Expressions are strings that use DynamoDB's domain-specific expression logic to check for the validity of a described statement. With expressions, you can use comparator symbols, such as "=" (equals), ">" (greater than), or ">=" (greater than or equal to).

For example, a comparator symbol could be used as follows:

"Age >= 21"

to ensure that the Item being manipulated has an Age greater than or equal to 21.

Note: this example wouldn't work as written, because DynamoDB wouldn't know the type of the value "21". You would need to use the expression attribute values discussed below.

In addition to comparators, you can also use certain functions in your expressions. This includes checking whether a particular attribute exists (attribute_exists() function) or does not exist (attribute_not_exists() function), or that an attribute begins with a particular substring (begins_with() function).

For example, you could use the attribute_not_exists() function as follows to ensure you're not manipulating an Order that already has a DateShipped attribute:

"attribute_not_exists(DateShipped)"

The full list of available functions is:

attribute_exists(): Check for existence of an attribute;
attribute_not_exists(): Check for non-existence of an attribute;
attribute_type(): Check if an attribute is of a certain type;
begins_with(): Check if an attribute begins with a particular substring;
contains(): Check if a String attribute contains a particular substring or a Set attribute contains a particular element; and
size(): Returns a number indicating the size of an attribute.
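
To get a feel for what these functions check, here is a toy Python model that evaluates a few of them against an item held as a plain dictionary. It is only a sketch of the semantics described in the list above: the real functions are evaluated by DynamoDB inside an expression string, attribute_type() is left out, and the sample item is made up.

```python
def attribute_exists(item, name):
    return name in item

def attribute_not_exists(item, name):
    return name not in item

def begins_with(item, name, prefix):
    value = item.get(name)
    return isinstance(value, str) and value.startswith(prefix)

def contains(item, name, operand):
    # Substring check for strings, membership check for sets.
    value = item.get(name)
    return isinstance(value, (str, set, frozenset)) and operand in value

def size(item, name):
    # Length of a string, or number of elements in a collection.
    return len(item[name])

order = {"OrderId": "2017-07-15#a1", "Tags": {"gift", "rush"}}
print(attribute_not_exists(order, "DateShipped"))
print(begins_with(order, "OrderId", "2017"))
print(contains(order, "Tags", "rush"), size(order, "Tags"))
```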

Expression Placeholders

From the previous section, we learned that expressions are strings that check for the validity of a particular statement. However, there are times when you cannot accurately represent your desired statement due to DynamoDB syntax limitations or when it's easier to use variable substitution to create your statement rather than building a string.

DynamoDB's expression language lets you use expression attribute names and expression attribute values to get around these limitations. These allow you to define expression variables outside of the expression string itself, then use replacement syntax to use these variables in your expression.

Expression Attribute Names

Let's start with understanding expression attribute names. There are times when you want to write an expression for a particular attribute, but you can't properly represent that attribute name due to DynamoDB limitations. This could be because:

Your attribute is a reserved word. DynamoDB has a huge list of reserved words, including words like "Date", "Year", and "Name". If you want to use those as attribute names, you'll need to use expression attribute name placeholders.
Your attribute name contains a dot. DynamoDB uses dot syntax to access nested items in a document. If you used a dot in your top-level attribute name, you'll need to use a placeholder.
Your attribute name begins with a number. DynamoDB won't let you use attribute names that begin with a number in your expression syntax.

To use expression attribute names, you pass in a map where the key is the placeholder to use in your expression and the value is the attribute name you want to expand to. For example, if I wanted to use a placeholder of "#a" for my attribute name of "Age", my expression attribute names map would look like:

  --expression-attribute-names '{
    "#a": "Age"
  }'

Then, I could use "#a" in my expression when I wanted to refer to the Age attribute.

When using expression attribute names, the placeholder must begin with a pound sign ("#").

In the "GetItem" example from the previous lesson, we used the --projection-expression flag to return a subset of the item attributes. To alter it to use expression attribute names, the API call would look like:

$ aws dynamodb get-item \
    --table-name UsersTable \
    --projection-expression "#a, #u" \
    --expression-attribute-names '{
      "#a": "Age",
      "#u": "Username"
    }' \
    --key '{
      "Username": {"S": "daffyduck"}
    }' \
    $LOCAL

{
    "Item": {
        "Username": {
            "S": "daffyduck"
        },
        "Age": {
            "N": "81"
        }
    }
}

Notice that we've replaced the "Age" and "Username" attributes with the expression attribute names of "#a" and "#u", respectively.

One final note: if you're modifying a nested attribute in a map, you'll need to use expression attribute names for each element in the attribute name.

For example, imagine you had an "Address" map attribute with keys of "Street", "City", and "State". You have a use case where you want to check if the "State" is equal to a particular value. To get the nested "Address.State" attribute, you would need to write it as:

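    --condition-expression "#a.#st = :state" \
    --expression-attribute-names '{
      "#a": "Address",
      "#st": "State"
    }' \
    --expression-attribute-values '{
      ":state": {"S": "Nebraska"}
    }'

Notice that both Address and State have been replaced with expression attribute names. The value to compare against is supplied through an expression attribute value (":state"), which is covered in the next section.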
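
One way to picture what the placeholders do is as a lookup performed before the expression is evaluated. The sketch below is a purely textual illustration with a made-up helper, `expand_names`, and a hypothetical ":state" value placeholder. DynamoDB itself resolves each placeholder as a single path element rather than pasting text, which is why a placeholder can stand for a name that contains a dot.

```python
import re

def expand_names(expression, names):
    # Swap every "#placeholder" token for the attribute name it stands for.
    return re.sub(r"#\w+", lambda match: names[match.group(0)], expression)

projection = expand_names("#a, #u", {"#a": "Age", "#u": "Username"})
condition = expand_names("#a.#st = :state", {"#a": "Address", "#st": "State"})
print(projection)
print(condition)
```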

Expression Attribute Values

Expression attribute values are similar to expression attribute names except that they are used for the values used to compare to values of attributes on an Item, rather than the name of the attribute.

The syntax for expression attribute values is the same as expression attribute names with two changes:

expression attribute values must start with a colon (":") rather than a pound sign; and
expression attribute values must specify the type for the value they are referencing, e.g.: {":agelimit": {"N": "21"} }

For examples of using expression attribute values, look at the next lesson where we use Update Expressions.
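
In the meantime, here is a minimal sketch of assembling such a map in Python and serializing it to the JSON you would pass on the command line. The `validate_values` helper is made up for illustration and only checks the two rules listed above.

```python
import json

def validate_values(values):
    for placeholder, typed_value in values.items():
        # Rule 1: value placeholders start with a colon.
        if not placeholder.startswith(":"):
            raise ValueError(f"{placeholder!r} must start with ':'")
        # Rule 2: each value is a single-key map of type indicator to value.
        if not (isinstance(typed_value, dict) and len(typed_value) == 1):
            raise ValueError(f"{placeholder!r} must map to one typed value")
    return values

age_limit = 21
values = validate_values({":agelimit": {"N": str(age_limit)}})
condition_expression = "Age >= :agelimit"

cli_argument = json.dumps(values)
print(cli_argument)
```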

Condition Expressions

Let's close out this lesson by using an expression on one of our previous examples.

Recall in the last lesson that we used the PutItem operation to insert Items into our table. The PutItem call overwrites any existing Item with the provided primary key.

This can be a destructive operation. With our Users example, imagine a new user tries to use a Username which has already been claimed by another user. If we did a simple PutItem operation, it would overwrite the existing User with that Username -- not a great experience!

We can use a condition expression to ensure there isn't a User with the requested Username before creating the new Item. We would adjust our call to look as follows:

$ aws dynamodb put-item \
    --table-name UsersTable \
    --item '{
      "Username": {"S": "yosemitesam"},
      "Name": {"S": "Yosemite Sam"},
      "Age": {"N": "73"}
    }' \
    --condition-expression "attribute_not_exists(#u)" \
    --expression-attribute-names '{
      "#u": "Username"
    }' \
    $LOCAL

Note that we've added a condition expression using the attribute_not_exists() function on the primary key of the table.

On first run, this Item is inserted successfully. If you try inserting the same Item again, you'll get an error:

An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed

The operation failed because the Username was already taken. We can return an error to the user and ask them to choose a different Username.
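
The behavior can be imitated with an in-memory dictionary, which may help when reasoning about application code. This is a toy model, not a DynamoDB client: `put_user` and the `ConditionalCheckFailed` class are stand-ins invented for the sketch, and the second user's details are made up.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

users = {}

def put_user(item, only_if_new=False):
    username = item["Username"]
    # Mirrors attribute_not_exists(Username): refuse to overwrite an existing item.
    if only_if_new and username in users:
        raise ConditionalCheckFailed("The conditional request failed")
    users[username] = item

put_user({"Username": "yosemitesam", "Name": "Yosemite Sam", "Age": 73}, only_if_new=True)

try:
    put_user({"Username": "yosemitesam", "Name": "Someone Else", "Age": 30}, only_if_new=True)
    overwritten = True
except ConditionalCheckFailed:
    overwritten = False  # ask the user to pick a different Username

print(overwritten, users["yosemitesam"]["Name"])
```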

In the next lesson, we'll cover updating and deleting items, which will include a look at some additional expression examples.

Updating & Deleting Items

In this lesson, we'll learn about updating and deleting Items. This is the final lesson on Single-Item Actions. The next chapter is on Multi-Item Actions where we'll use Queries & Scans to operate on multiple Items at a time.

Updating Items

Previously, we used the PutItem operation to insert Items into our DynamoDB table. We saw that this operation completely overwrites any existing Item in the table. To counteract this, we used a condition expression to only insert the Item if an Item with that primary key did not exist previously.

At other times, it is useful to update an existing Item by modifying one or two attributes but leaving the other attributes unchanged. DynamoDB has an UpdateItem operation which allows you to update an Item directly without first retrieving the Item, manipulating it as desired, then saving it back with a PutItem operation.

When using the UpdateItem action, you need to specify an update expression. This describes the update actions you want to take and uses the expression syntax.

An update expression must include at least one of four update clauses. These clauses are:

SET: Used to add an attribute to an Item or modify an existing attribute;
REMOVE: Used to delete attributes from an Item;
ADD: Used to increment/decrement a Number or insert elements into a Set; and
DELETE: Used to remove elements from a Set.

Let's check a few of these by example.
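
Before the CLI examples, here is a rough model of what each of the four clauses does, applied to an item held as a plain Python dictionary. The `apply_update` function and the Roles attribute are assumptions made for this sketch. It does not parse real update expressions, and it does not model what DynamoDB does when a set becomes empty.

```python
def apply_update(item, set_=None, remove=(), add=None, delete=None):
    """Return a copy of item with the four clause types applied (toy model)."""
    updated = dict(item)
    # SET: add the attribute, or overwrite it if it already exists.
    for name, value in (set_ or {}).items():
        updated[name] = value
    # REMOVE: drop the attribute if it is present.
    for name in remove:
        updated.pop(name, None)
    # ADD: increment a number (a missing one counts as 0) or union into a set.
    for name, operand in (add or {}).items():
        if isinstance(operand, (set, frozenset)):
            updated[name] = set(updated.get(name, ())) | set(operand)
        else:
            updated[name] = updated.get(name, 0) + operand
    # DELETE: take elements out of an existing set.
    for name, operand in (delete or {}).items():
        if name in updated:
            updated[name] = set(updated[name]) - set(operand)
    return updated

user = {"Username": "daffyduck", "Age": 81, "Roles": {"User"}}
step1 = apply_update(user, set_={"DateOfBirth": "1937-04-17"},
                     add={"Age": 1, "Roles": {"Admin"}})
step2 = apply_update(step1, remove=["DateOfBirth"], delete={"Roles": {"Admin"}})
print(step1)
print(step2)
```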

Using the SET update clause

The most common update clause is "SET", which is used to add an attribute to an Item if the attribute does not exist or to overwrite the existing attribute value if it does exist.

Returning to our initial PutItem examples, perhaps we want to have a DateOfBirth attribute for our Users. Without the UpdateItem action, we would need to retrieve the Item with a GetItem call and then reinsert the Item with a DateOfBirth attribute via a PutItem call. With the UpdateItem call, we can just insert the DateOfBirth directly:

$ aws dynamodb update-item \
    --table-name UsersTable \
    --key '{
      "Username": {"S": "daffyduck"}
    }' \
    --update-expression 'SET #dob = :dob' \
    --expression-attribute-names '{
      "#dob": "DateOfBirth"
    }' \
    --expression-attribute-values '{
      ":dob": {"S": "1937-04-17"}
    }' \
    $LOCAL

Note that we used the expression attribute names and expression attribute values from the previous lesson.

If we then retrieve our Item, we can see that the DateOfBirth attribute has been added but our previous attributes are still there:

$ aws dynamodb get-item \
    --table-name UsersTable \
    --key '{
      "Username": {"S": "daffyduck"}
    }' \
    $LOCAL

{
    "Item": {
        "Username": {
            "S": "daffyduck"
        },
        "DateOfBirth": {
            "S": "1937-04-17"
        },
        "Age": {
            "N": "81"
        },
        "Name": {
            "S": "Daffy Duck"
        }
    }
}

Using the REMOVE update clause

The REMOVE clause is the opposite of the SET clause -- it deletes an attribute from an Item if it exists.

Let's use it here to remove the "DateOfBirth" attribute we just added. We're also going to add a --return-values option to have DynamoDB return the Item to us after the update so we don't have to make a separate GetItem call. The --return-values option accepts several settings, including returning the old Item as it was before the changes or returning only the updated attributes as they were before the change. Here, we'll just use the "ALL_NEW" option to show the Item as it exists after the operation:

$ aws dynamodb update-item \
    --table-name UsersTable \
    --key '{
      "Username": {"S": "daffyduck"}
    }' \
    --update-expression 'REMOVE #dob' \
    --expression-attribute-names '{
      "#dob": "DateOfBirth"
    }' \
    --return-values 'ALL_NEW' \
    $LOCAL

{
    "Attributes": {
        "Username": {
            "S": "daffyduck"
        },
        "Age": {
            "N": "81"
        },
        "Name": {
            "S": "Daffy Duck"
        }
    }
}

We can see that our Item no longer contains a DateOfBirth attribute.

Deleting Items

The final single-item action to cover is DeleteItem. There will be times when you want to delete Items from your tables, and this is the action you'll use.

The DeleteItem action is pretty simple -- just provide the key of the Item you'd like to delete:

$ aws dynamodb delete-item \
    --table-name UsersTable \
    --key '{
      "Username": {"S": "daffyduck"}
    }' \
    $LOCAL

Your Item is deleted! If you try to retrieve your Item with a GetItem, you'll get an empty response.

Similar to the PutItem call, you can add a --condition-expression to only delete your Item under certain conditions. Let's say we want to delete Yosemite Sam, but only if he's younger than 21 years old:

$ aws dynamodb delete-item \
    --table-name UsersTable \
    --key '{
      "Username": {"S": "yosemitesam"}
    }' \
    --condition-expression "Age < :a" \
    --expression-attribute-values '{
      ":a": {"N": "21"}
    }' \
    $LOCAL

An error occurred (ConditionalCheckFailedException) when calling the DeleteItem operation: The conditional request failed

Because Yosemite Sam is 73 years old, the conditional check failed and the delete did not go through.
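
The same check can be modeled in a few lines of Python. As before, this is a toy in-memory stand-in rather than a DynamoDB client, and `delete_user` and `ConditionalCheckFailed` are invented names. The condition is evaluated against the stored item (or an empty item if none exists), and the delete only proceeds if it holds.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

users = {"yosemitesam": {"Username": "yosemitesam", "Age": 73}}

def delete_user(username, condition=None):
    item = users.get(username, {})
    # Evaluate the condition before touching the stored item.
    if condition is not None and not condition(item):
        raise ConditionalCheckFailed("The conditional request failed")
    users.pop(username, None)

def younger_than_21(item):
    # Mirrors "Age < :a" with :a = 21; a missing Age fails the comparison.
    return "Age" in item and item["Age"] < 21

try:
    delete_user("yosemitesam", condition=younger_than_21)
    deleted = True
except ConditionalCheckFailed:
    deleted = False

print(deleted, "yosemitesam" in users)
```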

Conclusion

This concludes the single-item actions chapter of the DynamoDB guide. We learned about the basics of Items, including primary keys, attributes, and attribute types. We then covered inserting and retrieving Items. Then we looked at expression syntax, including expression attribute names and values. We did some conditional expressions, and wrapped up with update and delete actions.

The next chapter covers actions that operate on multiple items. This includes queries and scans, as well as how to use filter expressions. These actions take advantage of tables with composite primary keys and increase the utility of DynamoDB.

UCONN Stamford Foundations of Engineering
