Apr 282016
 

Emrer is a Python script that reads a YAML file and starts an EMR cluster as specified in that file.

The main advantage over other EMR automation solutions is that it will take care of uploading the bootstrap/step scripts to S3, so everything can be stored locally, both the cluster config and the scripts it’s going to execute. Which basically means that a cluster created with this script can be stored in a versioning system like Git and basically treated as code all the way.

The configuration file is YAML, easier to read and understand than JSON. The example configuration is commented.

It’s not using CloudFormation at all, when this script was initially written CloudFormation didn’t yet know how to create EMR clusters. At the time I didn’t find anything else that could do it out there either.

It could be enhanced with a kind of “plugin” system where custom procedures are executed when certain things are set up. For example, a procedure that would add required security groups to the list if they are missing from the configuration file, making sure that the cluster is compliant with company regulations.