In this tutorial, we are going to learn a few ways to list files in an S3 bucket. In this series of blogs, we are using Python to work with AWS S3.

Python with boto3 offers the list_objects_v2 function, along with its paginator, to list files in an S3 bucket efficiently. Let us learn how we can use this function and write our code.

Setting up permissions for S3

For this tutorial to work, we will need an IAM user who has access to list files in S3. We can configure this user on our local machine using the AWS CLI, or we can use its credentials directly in a Python script. We have already covered how to create an IAM user with S3 access. If you do not have this user set up, please follow that blog first and then continue with this one.
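
Once the user is configured, a quick optional sanity check is to ask AWS which identity boto3 has picked up. The helper below is a small sketch (not part of the listing code) that uses the STS get_caller_identity call, which works with whatever credentials boto3 resolves:

import boto3

def whoami():
    """Print the identity boto3 will use for S3 calls (optional sanity check)."""
    sts_client = boto3.client("sts")
    identity = sts_client.get_caller_identity()
    print(f"account: {identity['Account']}, arn: {identity['Arn']}")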

List files in S3 using client

First, we will list files in S3 using the S3 client provided by boto3. In S3, files are also called objects. Hence, the function that lists files is named list_objects_v2.

There is also a list_objects function, but AWS recommends using list_objects_v2; the old function is kept only for backward compatibility.

Before we list our files from the S3 bucket using Python, let us check what we have in the bucket. In my case, the bucket “testbucket-frompython-2” contains a couple of folders and a few files in the root path. The folders also have a few files in them.

[Image: list of files in the S3 bucket as seen in the AWS console]

Now, let us write code that will list all files in an S3 bucket using python.

import boto3

def list_s3_files_using_client():
    """
    This function lists all files in the S3 bucket.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    # "Contents" is missing from the response when the bucket is empty
    files = response.get("Contents", [])
    for file in files:
        print(f"file_name: {file['Key']}, size: {file['Size']}")

When we run this code, we will see the below output. We can see that this function has listed all files from our S3 bucket.

[Image: output when we run the s3 list files function]

In the above code, we have not specified any user credentials. In such cases, boto3 uses the default AWS CLI profile set up on your local machine. You can also tell boto3 which profile to use if you have multiple profiles on your machine. All you need to do is add the line below to your code.

# setting up default profile for session
boto3.setup_default_session(profile_name='PROFILE_NAME_FROM_YOUR_MACHINE')

Another option is to specify the access key ID and secret access key in the code itself. This is not a recommended approach, and I strongly believe IAM credentials should not be hard-coded in most cases. But if you have to do it, you can pass them as shown below.

s3 = boto3.client("s3",
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

List files in a folder of the S3 bucket

Often we will not want to list all files from the S3 bucket but just the files from one folder. In that case, we can use list_objects_v2 and pass the folder name as the Prefix parameter. Let us list all files from the images folder and see how it works.

def list_s3_files_in_folder_using_client():
    """
    This function lists all files in a folder of the S3 bucket.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="images")
    # "Contents" is missing from the response when no keys match the prefix
    files = response.get("Contents", [])
    for file in files:
        print(f"file_name: {file['Key']}, size: {file['Size']}")

As you can see, it is easy to list files from one folder by using the “Prefix” parameter.
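
A related trick: list_objects_v2 also accepts a Delimiter parameter, which makes S3 group keys by "/" and return the top-level "folders" under CommonPrefixes. Here is a minimal sketch of that idea, assuming the same bucket as above:

def list_s3_folders_using_client():
    """List top-level "folders" in the bucket using a delimiter."""
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Delimiter="/")
    # Keys grouped by the delimiter show up under "CommonPrefixes"
    for folder in response.get("CommonPrefixes", []):
        print(f"folder: {folder['Prefix']}")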

List files in S3 using paginator

S3 buckets can have thousands of files/objects. If your bucket has that many objects, a single list_objects_v2 call will not be enough: it returns at most 1000 objects at a time. So how do we list all files in the S3 bucket if we have more than 1000 objects?
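
For reference, here is roughly what handling that limit by hand looks like: each response carries an IsTruncated flag and a NextContinuationToken that you feed into the next call. This is only a sketch of the manual loop, which the paginator below does for us:

def list_all_s3_files_manually():
    """List all objects by following continuation tokens by hand."""
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    kwargs = {"Bucket": bucket_name}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for file in response.get("Contents", []):
            print(f"file_name: {file['Key']}, size: {file['Size']}")
        # IsTruncated is True while there are more pages to fetch
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]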

In such cases, we can use the paginator with the list_objects_v2 function. It fetches a page of objects on each run and keeps fetching the next page until it has listed all the objects from the S3 bucket. Let us see how we can use the paginator.

def list_s3_files_using_paginator():
    """
    This function lists all files in S3 using a paginator.
    A paginator is useful when you have 1000s of files in S3,
    since list_objects_v2 can list at most 1000 files in one go.
    :return: None
    """
    s3_client = boto3.client("s3")
    bucket_name = "testbucket-frompython-2"
    paginator = s3_client.get_paginator("list_objects_v2")
    response = paginator.paginate(Bucket=bucket_name, PaginationConfig={"PageSize": 2})
    for page in response:
        print("getting 2 files from S3")
        files = page.get("Contents", [])
        for file in files:
            print(f"file_name: {file['Key']}, size: {file['Size']}")
        print("#" * 10)

When you run the above function, the paginator will fetch 2 files (as our PageSize is 2) in each run until all files are listed from the bucket. You can set PageSize anywhere from 1 to 1000.
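
PaginationConfig also accepts a MaxItems key if you want to cap the total number of objects listed rather than the page size. A small sketch, assuming the same bucket as above:

def list_first_n_s3_files_using_paginator(n=10):
    """List at most n objects from the bucket using MaxItems."""
    s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket="testbucket-frompython-2",
        PaginationConfig={"MaxItems": n},  # stop after n objects in total
    )
    for page in pages:
        for file in page.get("Contents", []):
            print(f"file_name: {file['Key']}, size: {file['Size']}")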

List files from S3 bucket using resource

Apart from the S3 client, we can also use the S3 resource object from boto3 to list files. With the S3 resource, we first create a bucket object and then use it to list files from that bucket.

def list_s3_files_using_resource():
    """
    This function lists files from the S3 bucket using the s3 resource object.
    :return: None
    """
    s3_resource = boto3.resource("s3")
    s3_bucket = s3_resource.Bucket("testbucket-frompython-2")
    files = s3_bucket.objects.all()
    for file in files:
        print(file)

You can also filter by Prefix to list files from a single folder with the resource class, as shown in the sketch below. The resource's object collections paginate automatically under the hood, so you can list 1000s of S3 objects without setting up a separate paginator.
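
Here is a minimal sketch of that filter, assuming the same bucket and images folder as before:

def list_s3_files_in_folder_using_resource():
    """List files under the images prefix using the s3 resource."""
    s3_resource = boto3.resource("s3")
    s3_bucket = s3_resource.Bucket("testbucket-frompython-2")
    # objects.filter paginates lazily, so this works for 1000s of keys
    for file in s3_bucket.objects.filter(Prefix="images"):
        print(f"file_name: {file.key}, size: {file.size}")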

Conclusion

In this blog, we have written code to list files/objects from the S3 bucket using Python and boto3. I hope you have found this useful. You can find the code from this blog in the GitHub repo. In the next blog, we will learn about object access control lists (ACLs) in AWS S3. See you there 🙂
