Jun 032016

Upgraded Ansible to version 2.1 on OS X El Capitan. First run, I get this error:
AttributeError: 'EntryPoint' object has no attribute 'resolve'

Googling for it, it seems like the cause is setuptools (!?) version that is too old. I did install Ansible by running pip2 install --upgrade --user ansible, which installed it in my home directory and also upgraded the setuptools package in my home directory, but that’s not the version that Python is picking up.

I’m using system’s Python 2.7, I didn’t install another one. And the system Python is looking for modules using a path that starts with /System/Library/Frameworks/Python.framework/Versions/2.7, so it’s going to pick up the system setuptools instead of the one in my home directory. This package can’t be upgraded because it’s protected by SIP and disabling that and updating the package might cause the system to misbehave.

One solution is to export PYTHONPATH in my environment. Problem is, I’m using both Python2 and Python3 and that variable applies to both. Setting PYTHONPATH to point to Python2’s modules would likely cause a bigger mess.

Better solution:
Create a file under the site-packages directory in my home folder ($HOME/Library/Python/2.7/lib/python/site-packages) that has a .pth extension. It’s going to be picked up before anything else runs and lines starting with import are going to be executed. This file will contain code that will insert our directory before everything else in sys.path, so it will be searched first:
import sys; sys.path = ["/Users/USERNAME/Library/Python/2.7/lib/python/site-packages"] + sys.path

Here’s a one-liner that will create the file:

Apr 282016

Emrer is a Python script that reads a YAML file and starts an EMR cluster as specified in that file.

The main advantage over other EMR automation solutions is that it will take care of uploading the bootstrap/step scripts to S3, so everything can be stored locally, both the cluster config and the scripts it’s going to execute. Which basically means that a cluster created with this script can be stored in a versioning system like Git and basically treated as code all the way.

The configuration file is YAML, easier to read and understand than JSON. The example configuration is commented.

It’s not using CloudFormation at all, when this script was initially written CloudFormation didn’t yet know how to create EMR clusters. At the time I didn’t find anything else that could do it out there either.

It could be enhanced with a kind of “plugin” system where custom procedures are executed when certain things are set up. For example, a procedure that would add required security groups to the list if they are missing from the configuration file, making sure that the cluster is compliant with company regulations.

Jan 032015

In Amazon Web Services it’s possible to enable logging for the whole VPC (CloudTrail) or for various services provided by Amazon, like S3. These logs get stored in S3 buckets. They are generated every few minutes in the case of CloudTrail, or every few seconds in the case of S3. CloudTrail logs are somewhat easier to manage because they are grouped by day, but S3 logs are all stored in one “folder”, so after some time there can be tens of thousands of files making it hard to find something unless you know exactly what you’re looking for.

Mostly a learning project, this script provides functions to “collapse” many files into one. Written in Python, tested on FreeBSD and Linux, it uses Amazon’s Python SDK, boto. Obviously, boto must be installed and configured with the proper credentials. It downloads the files for a certain period of time, concatenates them into one file, uploads the new file to S3, then deletes the concatenated files from S3 and local. The basic idea would be to get all logs for a certain day/hour into one file, making it easier to find something in those logs later. Might need some code adapting.

The code is on Github, this page might not have the latest version.

Continue reading »

Aug 262014

Short demo script in Python that monitors the VPN tunnels in Amazon Web Services. It queries the current state every 1.5 seconds in a loop and if the state changes it writes the new state to a log file. Needs the boto library.