Skip to content

Query Data From DynamoDB Table With Python

Posted on:August 5, 2023 at 05:17 AM

DynamoDB provides a simple but effective API to work with data in tables. In this blog, we will learn how to query data from the DynamoDB table using Python and boto3.

Table of contents

Open Table of contents

Prerequisites

We will use the same DynamoDB table created in our last blog about how to get a single item from the DynamoDB table. If you do not have a DynamoDB table ready, you can follow the steps mentioned in that blog to create a DynamoDB table with the required access.

Query Data From DynamoDB

We can use the query The function provided by boto3 is to get data from the DynamoDB table. The query function accepts the item key as an input parameter and returns data for that item, as shown in the code below.

import boto3
from pprint import pprint
from boto3.dynamodb.conditions import Key

boto3.setup_default_session(profile_name="ah")

table_name = "orders"
dynamodb = boto3.resource("dynamodb")
orders_table = dynamodb.Table(table_name)


def query_data():
    response = orders_table.query(
        KeyConditionExpression="order_id = :id and order_date = :dt",
        ExpressionAttributeValues={
            ":dt": "2022-06-28",
            ":id": "ord100",
        },
    )

    pprint(response["Items"])
    print(f"ScannedCount: {response['ScannedCount']}")

    # output
    """
    [{'item_count': Decimal('50'),
      'order_date': '2022-06-28',
      'order_id': 'ord100'
    }]
    ScannedCount: 1
    """

In the code, we are creating a DynamoDB table resource for the orders table. Using this table resource, we are querying for data. Please note that we are using the KeyConditionExpression parameter to pass item key information.

You must be wondering if we could get data for a single item using the get_item function; what is the use of the query function? Unlike the get_item function, the query function for DynamoDB can accept only the primary key of an item and can return multiple items matching that primary key. Let us look at the example below.

def query_data_with_primary():
    response = orders_table.query(
        KeyConditionExpression="order_id = :id",
        ExpressionAttributeValues={
            ":id": "ord100",
        },
    )

    print(f"total items: {len(response['Items'])}")
    print(f"ScannedCount: {response['ScannedCount']}")

    # output
    """
    total items: 45
    ScannedCount: 45
    """

In the above code, we are trying to get all items with the primary key of ord100. The output shows we have 45 such items in the table.

Query DynamoDB data using between clause

So, we can fetch multiple items from the DynamoDB table using the query function. What else can it do? The query function is much more powerful, and we can use it to filter out more items based on the sort key of the item. Let us look at the example code below.

def query_data_with_between():
    response = orders_table.query(
        KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
        ExpressionAttributeValues={
            ":id": "ord100",
            ":dt1": "2022-07-01",
            ":dt2": "2022-07-31",
        },
    )

    print(f"total items: {len(response['Items'])}")
    print(f"ScannedCount: {response['ScannedCount']}")

    # output
    """
    total items: 31
    ScannedCount: 31
    """

In the above code, we use the between operation on the item’s sort key. The above query will return us items that have order_id (which is the primary key) as ord100 and an order date (which is the sort key of the item) between 2022-07-01 and 2022-07-31.

Apart from betweenWe can use multiple comparison operators on the item’s sort key in the query operation.

The above operators are self-explanatory and can be used to write flexible queries for the DynamoDB table. You can read more about them in boto3 docs.

Sorting with DynamoDB Query

With the query function, we get multiple items from the DynamoDB table. Often you will need to sort those items as we do in SQL queries with the ORDER BYclause. We can also sort our result set by passing the ScanIndexForwardto DynamoDB query function.

def query_data_with_sort():
    # sorts items in descending order based on sort key when ScanIndexForward=False
    # sorts items in ascending order based on sort key when ScanIndexForward=True which is default
    response = orders_table.query(
        KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
        ExpressionAttributeValues={
            ":id": "ord100",
            ":dt1": "2022-06-28",
            ":dt2": "2022-06-30",
        },
        ScanIndexForward=False,
    )

    pprint(response["Items"])

    # output
    """
    [
     {'item_count': Decimal('48'),
      'order_date': '2022-06-30',
      'order_id': 'ord100'},
     {'item_count': Decimal('49'),
      'order_date': '2022-06-29',
      'order_id': 'ord100'},
     {'item_count': Decimal('50'),
      'order_date': '2022-06-28',
      'order_id': 'ord100'}
    ]
    """

In the above code, we have sorted items in descending order based on order_date by passing the ScanIndexForward parameter with the value False.

In SQL, we can order our data based on any table column. In DynamoDB, the query function can only sort data based on the sort key of an item.

Limit the number of items returned with the DynamoDB Query

Another popular use case is limiting the number of items the Query function returns. We can do that as shown below.

def query_data_with_limit():
    response = orders_table.query(
        KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
        ExpressionAttributeValues={
            ":id": "ord100",
            ":dt1": "2022-06-28",
            ":dt2": "2022-06-30",
        },
        Limit=1,
    )

    pprint(response["Items"])

    # output
    """
    [{'item_count': Decimal('50'),
      'order_date': '2022-06-28',
      'order_id': 'ord100'}]
    """

Pagination with DynamoDB query

We can also use pagination with our DynamoDB queries. Let us look at the example below.

def query_data_with_pagination():
    response = orders_table.query(
        KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
        ExpressionAttributeValues={
            ":id": "ord100",
            ":dt1": "2022-06-28",
            ":dt2": "2022-06-30",
        },
        Limit=1,
    )
    pprint(response["Items"])
    while response.get("LastEvaluatedKey"):
        response = orders_table.query(
            ExclusiveStartKey=response.get("LastEvaluatedKey"),
            KeyConditionExpression="order_id = :id and order_date between :dt1 and :dt2",
            ExpressionAttributeValues={
                ":id": "ord100",
                ":dt1": "2022-06-28",
                ":dt2": "2022-06-30",
            },
            Limit=1,
        )
        print("### New Call ###")
        pprint(response["Items"])

    # output
    """
    [{'item_count': Decimal('50'),
      'order_date': '2022-06-28',
      'order_id': 'ord100'}]
    ### New Call ###
    [{'item_count': Decimal('49'),
      'order_date': '2022-06-29',
      'order_id': 'ord100'}]
    ### New Call ###
    [{'item_count': Decimal('48'),
      'order_date': '2022-06-30',
      'order_id': 'ord100'}]
    """

In the above code,

  1. First, we query and get 1 item back ( as we have set a limit of 1)
  2. If there are more items we can fetch using the same query, a response will have the LastEvaluatedKey parameter.
  3. We can use a value from LastEvaluatedKey, to start our query from an item we have already received.
  4. We will keep querying for data until there is no LastEvaluatedKey in response, signaling that all items matching the current query have been fetched.

Conclusion

In this blog, we have learned how to query data from the DynamoDB table. We have also learned how to filter data based on sort keys and how to sort and use limit and pagination with DynamoDB query. In the next blog, we will learn how to update items in DynamoDB. See you there.