DynamoDB provides us simple but effective API to work with data in tables. In this blog, we will learn how to query data from DynamoDB table using Python and boto3.
Prerequisites
We are going to use the same DynamoDB table created in our last blog about how to get a single item from the DynamoDB table. If you do not have a DynamoDB table ready, you can follow the steps mentioned in that blog to create a DynamoDB table with the required access.
Query Data From DynamoDB
We can use the `query` function provided by boto3 to get data from the DynamoDB table. The query function accepts the item key as an input parameter and returns data for that item, as shown below the code.
import boto3 from pprint import pprint from boto3.dynamodb.conditions import Key boto3.setup_default_session(profile_name="ah") table_name = "orders" dynamodb = boto3.resource("dynamodb") orders_table = dynamodb.Table(table_name) def query_data(): response = orders_table.query( KeyConditionExpression="order_id = :id and order_date = :dt", ExpressionAttributeValues={ ":dt": "2022-06-28", ":id": "ord100", }, ) pprint(response["Items"]) print(f"ScannedCount: {response['ScannedCount']}") # output """ [{'item_count': Decimal('50'), 'order_date': '2022-06-28', 'order_id': 'ord100' }] ScannedCount: 1 """
In the code, we are creating a DynamoDB table resource for the `orders` table. Using this table resource, we are querying for data. Please note that we are using the `KeyConditionExpression` parameter to pass item key information.
You must be wondering if we could get data for a single item using the `get_item` function what is the use of the `query` function? Unlike the `get_item` function, the query function for DynamoDB can accept only the primary key of an item and can return multiple items matching that primary key. Let us look at the example below.
def query_data_with_primary(): response = orders_table.query( KeyConditionExpression="order_id = :id", ExpressionAttributeValues={ ":id": "ord100", }, ) print(f"total items: {len(response['Items'])}") print(f"ScannedCount: {response['ScannedCount']}") # output """ total items: 45 ScannedCount: 45 """
In the above code, we are trying to get all items with the primary key of `ord100`. The output shows we have 45 such items in the table.
Query DynamoDB data using between clause
So we can fetch multiple items from the DynamoDB table using the `query` function. What else it can do? The query function is much more powerful, and we can use it to filter out more items based on the sort key of the item. Let us look at the example code below.
def query_data_with_between(): response = orders_table.query( KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2", ExpressionAttributeValues={ ":id": "ord100", ":dt1": "2022-07-01", ":dt2": "2022-07-31", }, ) print(f"total items: {len(response['Items'])}") print(f"ScannedCount: {response['ScannedCount']}") # output """ total items: 31 ScannedCount: 31 """
In the above code, we are using the `between` operation on the item’s sort key. The above query will return us items that have order_id (which is the primary key) as `ord100` and an order date (which is the sort key of the item) between 2022-07-01 and 2022-07-31.
Apart from `between`, there are multiple comparison operators that we can use on the item’s sort key in the query operation.
- Equal (EQ) and Not Equal (NE)
- Less Than(LT), Greater Than(GT), Less Than or Equal (LE), Greater Than or Equal (GE)
- NOT_NULL
- NULL
- CONTAINS
- NOT_CONTAINS
- BEGINS_WITH
- IN
The above operators are self-explanatory and can be used to write flexible queries for the DynamoDB table. You can read more about them in boto3 docs.
Sorting with DynamoDB Query
With the query function, we get multiple items from the DynamoDB table. Often you will need to sort those items as we do in SQL queries with the `ORDER BY`clause. We can also sort our result set by passing the `ScanIndexForward`to DynamoDB query function.
def query_data_with_sort(): # sorts items in descending order based on sort key when ScanIndexForward=False # sorts items in ascending order based on sort key when ScanIndexForward=True which is default response = orders_table.query( KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2", ExpressionAttributeValues={ ":id": "ord100", ":dt1": "2022-06-28", ":dt2": "2022-06-30", }, ScanIndexForward=False, ) pprint(response["Items"]) # output """ [ {'item_count': Decimal('48'), 'order_date': '2022-06-30', 'order_id': 'ord100'}, {'item_count': Decimal('49'), 'order_date': '2022-06-29', 'order_id': 'ord100'}, {'item_count': Decimal('50'), 'order_date': '2022-06-28', 'order_id': 'ord100'} ] """
In the above code, we have sorted items in descending order based on order_date by passing the `ScanIndexForward` parameter with the value `False`.
In SQL, we can order our data based on any table column. In DynamoDB, the query function can only sort data based on the sort key of an item.
Limit the number of items returned with the DynamoDB Query
Another popular use case is to limit the number of items returned by the Query function. We can do that as shown below.
def query_data_with_limit(): response = orders_table.query( KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2", ExpressionAttributeValues={ ":id": "ord100", ":dt1": "2022-06-28", ":dt2": "2022-06-30", }, Limit=1, ) pprint(response["Items"]) # output """ [{'item_count': Decimal('50'), 'order_date': '2022-06-28', 'order_id': 'ord100'}] """
Pagination with DynamoDB query
We can also use pagination with our DynamoDB queries. Let us look at the example below.
def query_data_with_pagination(): response = orders_table.query( KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2", ExpressionAttributeValues={ ":id": "ord100", ":dt1": "2022-06-28", ":dt2": "2022-06-30", }, Limit=1, ) pprint(response["Items"]) while response.get("LastEvaluatedKey"): response = orders_table.query( ExclusiveStartKey=response.get("LastEvaluatedKey"), KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2", ExpressionAttributeValues={ ":id": "ord100", ":dt1": "2022-06-28", ":dt2": "2022-06-30", }, Limit=1, ) print("### New Call ###") pprint(response["Items"]) # output """ [{'item_count': Decimal('50'), 'order_date': '2022-06-28', 'order_id': 'ord100'}] ### New Call ### [{'item_count': Decimal('49'), 'order_date': '2022-06-29', 'order_id': 'ord100'}] ### New Call ### [{'item_count': Decimal('48'), 'order_date': '2022-06-30', 'order_id': 'ord100'}] """
In the above code,
- First, we query and get 1 item back ( as we have set a limit of 1)
- If there are more items we can fetch using the same query, a response will have the `LastEvaluatedKey` parameter.
- We can use a value from `LastEvaluatedKey`, to start our query from an item we have already received.
- We will keep querying for data until there is no `LastEvaluatedKey` in response, signaling that all items matching the current query have been fetched.
Conclusion
In this blog, we have learned how to query data from the DynamoDB table. We have also learned how to filter data based on sort keys, and how to sort and use limit and pagination with DynamoDB query. In the next blog, we will learn how to update items in DynamoDB. See you there.