In Amazon Web Services you can enable logging for the whole account (CloudTrail, which records API calls) or for individual services such as S3. These logs are stored in S3 buckets. CloudTrail writes a new file every few minutes, S3 logging every few seconds. CloudTrail logs are somewhat easier to manage because they are grouped by day, but S3 logs all land in one “folder”, so after a while there can be tens of thousands of files, making it hard to find anything unless you know exactly what you’re looking for.

Mostly a learning project, this script provides functions to “collapse” many files into one. It is written in Python, tested on FreeBSD and Linux, and uses boto, Amazon’s Python SDK, which must be installed and configured with the proper credentials. The script downloads the files for a given period, concatenates them into one file, uploads the new file to S3, then deletes the original files from S3 and the local copies. The idea is to get all logs for a given day or hour into one file, making it easier to find something in them later. The code might need some adapting to your setup.
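To make the steps described above concrete, here is a minimal sketch of the download, concatenate, upload, delete sequence. It is not the script's actual code: the function signature, the temporary file names and the error handling are assumptions made for illustration. The `bucket` argument is assumed to behave like a boto 2 `Bucket` object, and only `get_key()`, `new_key()`, `delete_keys()`, `get_contents_to_filename()` and `set_contents_from_filename()` are used.

```python
import os
import shutil
import tempfile


def collapse(bucket, key_names, target_name):
    """Merge the objects named in key_names into one object, target_name.

    Illustrative sketch only. `bucket` is assumed to behave like a boto 2
    Bucket. Returns the number of source objects that were merged.
    """
    key_names = sorted(key_names)
    if not key_names:
        return 0
    if target_name in key_names:
        # Otherwise the merged file would be deleted together with its sources.
        raise ValueError("target_name must differ from the source keys")

    workdir = tempfile.mkdtemp(prefix="collapse-")
    try:
        merged_path = os.path.join(workdir, "merged.log")
        with open(merged_path, "wb") as merged:
            for index, name in enumerate(key_names):
                # 1. Download one source object to a local file.
                part_path = os.path.join(workdir, "part-%d" % index)
                bucket.get_key(name).get_contents_to_filename(part_path)
                # 2. Append it to the merged file.
                with open(part_path, "rb") as part:
                    shutil.copyfileobj(part, merged)

        # 3. Upload the merged file under the new name.
        bucket.new_key(target_name).set_contents_from_filename(merged_path)
        # 4. Only after the upload has returned, delete the originals from S3.
        bucket.delete_keys(key_names)
    finally:
        # 5. Remove the local copies whether or not the upload succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
    return len(key_names)
```

Deleting the originals only after the upload has returned means a failed run leaves the bucket as it was, at the cost of briefly storing the data twice. A more careful version would also inspect the result of `delete_keys()` for per-key errors.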

The code is on GitHub; this page might not have the latest version.

The main function is collapse(), which does the whole download -> concatenate -> upload process; the rest are helpers. Hopefully the comments explain what’s going on. These functions expect the connection to AWS and to the bucket to be established already, and receive it as a parameter.

Here’s one example of how to use it:

This opens a connection to the S3 bucket named “s3logs_bucket” using the specified profile, previously configured for boto, then calls collapse_s3_backlog(), which groups the logs by day between 31 May 2014 and 31 October 2014, inclusive.
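The grouping by day can be sketched as follows. This is an illustration and not the code of collapse_s3_backlog(): the helper names `days_between()` and `group_by_day()` are hypothetical, and the sketch assumes the log keys follow the standard S3 access log naming, where each key ends in `YYYY-MM-DD-hh-mm-ss-UniqueString`.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# S3 access log keys end in YYYY-MM-DD-hh-mm-ss-<unique string>.
_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})-\d{2}-\d{2}-\d{2}-[0-9A-Za-z]+$")


def days_between(first, last):
    """Yield every date from first to last, both ends included."""
    for offset in range((last - first).days + 1):
        yield first + timedelta(days=offset)


def group_by_day(key_names, first, last):
    """Map each day in [first, last] to the sorted log keys written that day.

    Hypothetical helper: days without any log keys are left out of the result.
    """
    wanted = set(days_between(first, last))
    groups = defaultdict(list)
    for name in key_names:
        match = _STAMP.search(name)
        if not match:
            continue  # not a raw log file, for example an already merged one
        day = date(*map(int, match.groups()))
        if day in wanted:
            groups[day].append(name)
    return {day: sorted(names) for day, names in groups.items()}
```

With a boto 2 bucket, the key names could come from iterating over `bucket.list(prefix=...)` and reading each key's `name` attribute. Each group returned here would then be merged into one file per day.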

