Quickest Ways to List Files in S3 Bucket

Posted on: October 1, 2023 at 05:17 AM

In this tutorial, we will learn how to list files in an S3 bucket. In this blog series, we use Python to work with AWS S3.

Python with boto3 offers the list_objects_v2 function and its paginator to efficiently list files in an S3 bucket. Let’s learn how to use this function and write our code.

Setting up permissions for S3

For this tutorial to work, we will need an IAM user who has access to list files in S3. We can configure this user on our local machine using the AWS CLI or use its credentials directly in a Python script. We have already covered how to create an IAM user with S3 access. If you do not have this user set up, please follow that blog first and then continue with this one.
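Before moving on, you can optionally confirm that boto3 can find credentials for this user. Here is a minimal sanity check, assuming the AWS CLI profile from that blog is configured on your machine:

import boto3

# Optional sanity check: ask STS which identity we are authenticated as.
# If credentials are missing or misconfigured, this call raises an error.
sts_client = boto3.client("sts")
print(sts_client.get_caller_identity()["Arn"])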

List files in S3 using client

First, we will list files in S3 using the S3 client provided by boto3. In S3, files are also called objects. Hence, the function that lists files is named list_objects_v2.

There is also the function list_objects, but AWS recommends using list_objects_v2; the older function is kept only for backward compatibility.

Before we list our files from the S3 bucket using Python, let us check what we have in the bucket. In my case, the bucket “testbucket-frompython-2” contains a couple of folders and a few files in the root path. The folders also contain a few files.

[Screenshot: files in the S3 bucket, viewed from the AWS Console]

Now, let us write code that will list all files in an S3 bucket using Python.

import boto3


def list_s3_files_using_client():
    """
    List all files in an S3 bucket using the boto3 client.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    # "Contents" is absent when the bucket is empty, so default to []
    files = response.get("Contents", [])
    for file in files:
        print(f"file_name: {file['Key']}, size: {file['Size']}")

When we run this code, we will see the output below. We can see that this function has listed all files from our S3 bucket.

[Screenshot: output listing all files in the S3 bucket]

In the above code, we have not specified any user credentials. In such cases, boto3 uses your local machine’s default AWS CLI profile. You can also tell boto3 which profile to use if you have multiple profiles on your machine. All you need to do is add the line below to your code.

# setting up default profile for session
boto3.setup_default_session(profile_name='PROFILE_NAME_FROM_YOUR_MACHINE')

Another option is to specify the access key ID and secret access key directly in the code. This is not a recommended approach; using IAM credentials directly in code should be avoided in most cases. If you must do it, you can pass the access key ID and secret access key to the client.

s3 = boto3.client("s3",
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

Listing files from a folder in the S3 bucket

Often, we will not need to list all files from the S3 bucket, but only the files from one folder. In that case, we can use list_objects_v2 and pass the folder name as the Prefix parameter. Let us list all files from the images folder and see how it works.

def list_s3_files_in_folder_using_client():
    """
    List all files in a folder of an S3 bucket using the boto3 client.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    # Prefix restricts the listing to keys that start with "images"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="images")
    files = response.get("Contents", [])
    for file in files:
        print(f"file_name: {file['Key']}, size: {file['Size']}")

As you can see, it is easy to list files from one folder using the “Prefix” parameter.
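Note that S3 has no real folders; a “folder” is just a shared key prefix. If you only want the top-level folder names rather than every nested object, list_objects_v2 also accepts a Delimiter parameter. Here is a minimal sketch, reusing the same example bucket:

import boto3


def list_s3_folders_using_client():
    """
    List top-level "folders" in an S3 bucket using Delimiter.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    # With Delimiter="/", keys are grouped at the first "/" and the
    # groups are returned in CommonPrefixes instead of Contents
    response = s3_client.list_objects_v2(Bucket=bucket_name, Delimiter="/")
    for prefix in response.get("CommonPrefixes", []):
        print(f"folder: {prefix['Prefix']}")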

List files in S3 using paginator

S3 buckets can hold thousands of files/objects. If your bucket has more than 1000 objects, a single list_objects_v2 call will not be enough: the function returns at most 1000 objects at a time. So, how do we list all files in the S3 bucket when we have more than 1000 objects?

In such cases, we can use a paginator with the list_objects_v2 function. The paginator fetches up to n objects per request, then the next n, and so on until every object in the S3 bucket has been listed. Let us see how we can use it.

def list_s3_files_using_paginator():
    """
    List all files in an S3 bucket using a paginator.
    A paginator is useful when you have 1000s of files in S3, since
    list_objects_v2 can return at most 1000 files per call.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    paginator = s3_client.get_paginator("list_objects_v2")
    response = paginator.paginate(Bucket=bucket_name, PaginationConfig={"PageSize": 2})
    for page in response:
        print("getting 2 files from S3")
        files = page.get("Contents", [])
        for file in files:
            print(f"file_name: {file['Key']}, size: {file['Size']}")
        print("#" * 10)

When you run the above function, the paginator will fetch 2 files (as our PageSize is 2) in each run until all files are listed from the bucket. You can set PageSize anywhere from 1 to 1000.
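Under the hood, the paginator keeps calling list_objects_v2 with a continuation token until the response is no longer truncated. If you ever need that control yourself, here is a minimal sketch of paging manually, using the same example bucket:

import boto3


def list_s3_files_using_continuation_token():
    """
    List all files in an S3 bucket by paging manually.
    :return: None
    """
    s3_client = boto3.client("s3")
    kwargs = {"Bucket": "testbucket-frompython-2", "MaxKeys": 2}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for file in response.get("Contents", []):
            print(f"file_name: {file['Key']}, size: {file['Size']}")
        # IsTruncated is True while more pages remain
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]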

List files from the S3 bucket using Resource

Apart from the S3 client, we can also use the S3 resource object from boto3 to list files. With the S3 resource, we first create a bucket object and then use it to list the files in that bucket.

def list_s3_files_using_resource():
    """
    List files from an S3 bucket using the boto3 resource object.
    :return: None
    """
    s3_resource = boto3.resource("s3")
    s3_bucket = s3_resource.Bucket("testbucket-frompython-2")
    # objects.all() yields ObjectSummary items and pages automatically
    files = s3_bucket.objects.all()
    for file in files:
        print(file)

You can also use Prefix to list files from a single folder with the resource class; its collections page through results automatically, so you can list 1000s of S3 objects without an explicit paginator.
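For example, here is a minimal sketch of listing the images folder with the resource class, using objects.filter() instead of objects.all():

import boto3


def list_s3_files_in_folder_using_resource():
    """
    List files in a folder of an S3 bucket using the resource object.
    :return: None
    """
    s3_resource = boto3.resource("s3")
    s3_bucket = s3_resource.Bucket("testbucket-frompython-2")
    # filter(Prefix=...) restricts the collection to the "images" folder
    for file in s3_bucket.objects.filter(Prefix="images"):
        print(f"file_name: {file.key}, size: {file.size}")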

Conclusion

In this blog, we have written code to list files/objects from an S3 bucket using Python and boto3. I hope you have found this helpful. You can find the code from this blog in the GitHub repo. In the next blog, we will learn about object access control lists (ACLs) in AWS S3. See you there 🙂