DynamoDB provides a simple but effective API to work with data in tables. In this blog, we will learn how to query data from the DynamoDB table using Python and boto3.
Table of contents
Open Table of contents
Prerequisites
We will use the same DynamoDB table created in our last blog about how to get a single item from the DynamoDB table. If you do not have a DynamoDB table ready, you can follow the steps mentioned in that blog to create a DynamoDB table with the required access.
Query Data From DynamoDB
We can use the query
The function provided by boto3 is to get data from the DynamoDB table. The query function accepts the item key as an input parameter and returns data for that item, as shown in the code below.
import boto3
from pprint import pprint
from boto3.dynamodb.conditions import Key
boto3.setup_default_session(profile_name="ah")
table_name = "orders"
dynamodb = boto3.resource("dynamodb")
orders_table = dynamodb.Table(table_name)
def query_data():
response = orders_table.query(
KeyConditionExpression="order_id = :id and order_date = :dt",
ExpressionAttributeValues={
":dt": "2022-06-28",
":id": "ord100",
},
)
pprint(response["Items"])
print(f"ScannedCount: {response['ScannedCount']}")
# output
"""
[{'item_count': Decimal('50'),
'order_date': '2022-06-28',
'order_id': 'ord100'
}]
ScannedCount: 1
"""
In the code, we are creating a DynamoDB table resource for the orders
table. Using this table resource, we are querying for data. Please note that we are using the KeyConditionExpression
parameter to pass item key information.
You must be wondering if we could get data for a single item using the get_item
function; what is the use of the query
function? Unlike the get_item
function, the query function for DynamoDB can accept only the primary key of an item and can return multiple items matching that primary key. Let us look at the example below.
def query_data_with_primary():
response = orders_table.query(
KeyConditionExpression="order_id = :id",
ExpressionAttributeValues={
":id": "ord100",
},
)
print(f"total items: {len(response['Items'])}")
print(f"ScannedCount: {response['ScannedCount']}")
# output
"""
total items: 45
ScannedCount: 45
"""
In the above code, we are trying to get all items with the primary key of ord100
. The output shows we have 45 such items in the table.
Query DynamoDB data using between clause
So, we can fetch multiple items from the DynamoDB table using the query
function. What else can it do? The query function is much more powerful, and we can use it to filter out more items based on the sort key of the item. Let us look at the example code below.
def query_data_with_between():
response = orders_table.query(
KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
ExpressionAttributeValues={
":id": "ord100",
":dt1": "2022-07-01",
":dt2": "2022-07-31",
},
)
print(f"total items: {len(response['Items'])}")
print(f"ScannedCount: {response['ScannedCount']}")
# output
"""
total items: 31
ScannedCount: 31
"""
In the above code, we use the between
operation on the item’s sort key. The above query will return us items that have order_id (which is the primary key) as ord100
and an order date (which is the sort key of the item) between 2022-07-01 and 2022-07-31.
Apart from between
We can use multiple comparison operators on the item’s sort key in the query operation.
- Equal (EQ) and Not Equal (NE)
- Less Than(LT), Greater Than(GT), Less Than or Equal (LE), Greater Than or Equal (GE)
- NOT_NULL
- NULL
- CONTAINS
- NOT_CONTAINS
- BEGINS_WITH
- IN
The above operators are self-explanatory and can be used to write flexible queries for the DynamoDB table. You can read more about them in boto3 docs.
Sorting with DynamoDB Query
With the query function, we get multiple items from the DynamoDB table. Often you will need to sort those items as we do in SQL queries with the ORDER BY
clause. We can also sort our result set by passing the ScanIndexForward
to DynamoDB query function.
def query_data_with_sort():
# sorts items in descending order based on sort key when ScanIndexForward=False
# sorts items in ascending order based on sort key when ScanIndexForward=True which is default
response = orders_table.query(
KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
ExpressionAttributeValues={
":id": "ord100",
":dt1": "2022-06-28",
":dt2": "2022-06-30",
},
ScanIndexForward=False,
)
pprint(response["Items"])
# output
"""
[
{'item_count': Decimal('48'),
'order_date': '2022-06-30',
'order_id': 'ord100'},
{'item_count': Decimal('49'),
'order_date': '2022-06-29',
'order_id': 'ord100'},
{'item_count': Decimal('50'),
'order_date': '2022-06-28',
'order_id': 'ord100'}
]
"""
In the above code, we have sorted items in descending order based on order_date by passing the ScanIndexForward
parameter with the value False
.
In SQL, we can order our data based on any table column. In DynamoDB, the query function can only sort data based on the sort key of an item.
Limit the number of items returned with the DynamoDB Query
Another popular use case is limiting the number of items the Query function returns. We can do that as shown below.
def query_data_with_limit():
response = orders_table.query(
KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
ExpressionAttributeValues={
":id": "ord100",
":dt1": "2022-06-28",
":dt2": "2022-06-30",
},
Limit=1,
)
pprint(response["Items"])
# output
"""
[{'item_count': Decimal('50'),
'order_date': '2022-06-28',
'order_id': 'ord100'}]
"""
Pagination with DynamoDB query
We can also use pagination with our DynamoDB queries. Let us look at the example below.
def query_data_with_pagination():
response = orders_table.query(
KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
ExpressionAttributeValues={
":id": "ord100",
":dt1": "2022-06-28",
":dt2": "2022-06-30",
},
Limit=1,
)
pprint(response["Items"])
while response.get("LastEvaluatedKey"):
response = orders_table.query(
ExclusiveStartKey=response.get("LastEvaluatedKey"),
KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
ExpressionAttributeValues={
":id": "ord100",
":dt1": "2022-06-28",
":dt2": "2022-06-30",
},
Limit=1,
)
print("### New Call ###")
pprint(response["Items"])
# output
"""
[{'item_count': Decimal('50'),
'order_date': '2022-06-28',
'order_id': 'ord100'}]
### New Call ###
[{'item_count': Decimal('49'),
'order_date': '2022-06-29',
'order_id': 'ord100'}]
### New Call ###
[{'item_count': Decimal('48'),
'order_date': '2022-06-30',
'order_id': 'ord100'}]
"""
In the above code,
- First, we query and get 1 item back ( as we have set a limit of 1)
- If there are more items we can fetch using the same query, a response will have the
LastEvaluatedKey
parameter. - We can use a value from
LastEvaluatedKey
, to start our query from an item we have already received. - We will keep querying for data until there is no
LastEvaluatedKey
in response, signaling that all items matching the current query have been fetched.
Conclusion
In this blog, we have learned how to query data from the DynamoDB table. We have also learned how to filter data based on sort keys and how to sort and use limit and pagination with DynamoDB query. In the next blog, we will learn how to update items in DynamoDB. See you there.