Jun 03 2016

Upgraded Ansible to version 2.1 on OS X El Capitan. On the first run, I got this error:
AttributeError: 'EntryPoint' object has no attribute 'resolve'

Googling around, it seems the cause is a setuptools version that is too old (!?). I had installed Ansible by running pip2 install --upgrade --user ansible, which installed it in my home directory and also upgraded the setuptools package there, but that's not the version Python picks up.

I'm using the system's Python 2.7; I didn't install another one. The system Python looks for modules using a path that starts with /System/Library/Frameworks/Python.framework/Versions/2.7, so it picks up the system setuptools instead of the one in my home directory. That package can't be upgraded because it's protected by SIP, and disabling SIP to update it might cause the system to misbehave.

One solution is to export PYTHONPATH in my environment. The problem is that I'm using both Python 2 and Python 3, and that variable applies to both. Setting PYTHONPATH to point to Python 2's modules would likely cause a bigger mess.

Better solution:
Create a file with a .pth extension under the site-packages directory in my home folder ($HOME/Library/Python/2.7/lib/python/site-packages). It's picked up at interpreter startup, before anything else runs, and lines starting with import are executed. This file will contain code that inserts our directory before everything else in sys.path, so it will be searched first:
import sys; sys.path = ["/Users/USERNAME/Library/Python/2.7/lib/python/site-packages"] + sys.path

Here’s a one-liner that will create the file:
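Something along these lines should do it (fix_path.pth is an arbitrary name I chose; any .pth file in that directory works):

```shell
# write a .pth file that puts the user site-packages first on sys.path
# (fix_path.pth is an arbitrary name; any .pth extension works)
SITE="$HOME/Library/Python/2.7/lib/python/site-packages"
mkdir -p "$SITE"
echo "import sys; sys.path = [\"$SITE\"] + sys.path" > "$SITE/fix_path.pth"
```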

Apr 28 2016

Emrer is a Python script that reads a YAML file and starts an EMR cluster as specified in that file.

The main advantage over other EMR automation solutions is that it takes care of uploading the bootstrap/step scripts to S3, so everything can be stored locally: both the cluster config and the scripts it executes. This means a cluster created with this script can be kept in a version control system like Git and treated as code all the way.

The configuration file is YAML, which is easier to read and understand than JSON. The example configuration is commented.

It's not using CloudFormation at all; when this script was initially written, CloudFormation didn't yet know how to create EMR clusters. At the time I couldn't find anything else out there that could do it either.

It could be enhanced with a kind of “plugin” system where custom procedures are executed when certain things are set up. For example, a procedure could add required security groups to the list if they are missing from the configuration file, making sure the cluster is compliant with company regulations.

Sep 18 2015

The following will create a dump of raw network packets to a file, while continuously reading that file and displaying the packets on screen in human-readable format:


  • /bin/sh -c "tcpdump -i any -w /tmp/dumpfile.cap host &" : runs tcpdump in the background, dumping raw packets to /tmp/dumpfile.cap
  • sleep 1 : waits a second for the file to be created and the header to be written to it. Without waiting, you'll probably get a “bad dump file format” error
  • tail -n 1000 -f /tmp/dumpfile.cap : tails the dump file. The point of -n is to get the whole file, from the start, including the header. This avoids the “bad dump file format” error
  • tcpdump -r - : reads from stdin, which is actually the contents of /tmp/dumpfile.cap, and displays the packets on stdout in human-readable format
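Put together, the pieces above amount to something like this one-liner (the argument to the host filter is a placeholder here; substitute the address you want to capture):

```shell
# capture in the background, wait for the file header, then read and
# pretty-print the growing dump file (192.0.2.1 is a placeholder host)
/bin/sh -c "tcpdump -i any -w /tmp/dumpfile.cap host 192.0.2.1 &" \
  && sleep 1 \
  && tail -n 1000 -f /tmp/dumpfile.cap | tcpdump -r -
```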

IMPORTANT: Interrupting with CTRL+C will NOT kill the backgrounded tcpdump. Don't forget to kill it too if it's not limited somehow, otherwise it will fill up the disk.

Feb 06 2015

A bucket policy that denies access to anyone not coming from the specified IP addresses. Used in combination with IAM groups that allow access to S3, the net result is that users get the access granted to the group they belong to, but only when coming from one of the IP addresses specified in this policy, which is attached to the bucket. Using an “Allow” policy, like in Amazon's example, would give anyone coming from those IPs full access, effectively defeating the purpose of group-based policies.
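A minimal sketch of such a policy (the bucket name and the IP ranges are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyIfNotFromAllowedIPs",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ["192.0.2.0/24", "198.51.100.17/32"]
        }
      }
    }
  ]
}
```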

Feb 06 2015

AWS permissions intended for a group of users who will monitor the environment, but should not have access to data and are not allowed to make any changes. It should allow members to check the health of services or run periodic reviews. It is basically a modified version of Amazon's Read-Only policy template. In order to cut access to potentially dangerous information, some permissions were removed:

    • DynamoDB and Kinesis:Get*, because those would reveal data
    • ElasticBeanstalk and OpsWorks, because the information there is potentially dangerous
    • S3 objects (it does, however, still give permission to read the S3 bucket policy)
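For illustration, an abbreviated sketch of the resulting policy's shape (the real Read-Only template lists many more Describe/List/Get actions; these names are only examples of what remains after the removals above):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "s3:ListAllMyBuckets",
        "s3:GetBucketPolicy"
      ],
      "Resource": "*"
    }
  ]
}
```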

Jan 03 2015

In Amazon Web Services it's possible to enable logging for the whole account (CloudTrail) or for various services provided by Amazon, like S3. These logs get stored in S3 buckets. They are generated every few minutes in the case of CloudTrail, or every few seconds in the case of S3. CloudTrail logs are somewhat easier to manage because they are grouped by day, but S3 logs are all stored in one “folder”, so after some time there can be tens of thousands of files, making it hard to find anything unless you know exactly what you're looking for.

Mostly a learning project, this script provides functions to “collapse” many files into one. Written in Python and tested on FreeBSD and Linux, it uses Amazon's Python SDK, boto, which must be installed and configured with the proper credentials. It downloads the files for a certain period of time, concatenates them into one file, uploads the new file to S3, then deletes the concatenated files both from S3 and locally. The basic idea is to get all logs for a certain day/hour into one file, making it easier to search those logs later. It might need some code adaptation.
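The same idea can be sketched with the AWS CLI (the bucket name and prefix are placeholders; the script itself does this in Python with boto):

```shell
# collapse one day's worth of S3 log files into a single object
BUCKET=example-logs-bucket          # placeholder bucket name
PREFIX=logs/2015-01-03              # placeholder: one day's worth of logs
WORKDIR=$(mktemp -d)

# download all log files for that day and concatenate them locally
aws s3 cp --recursive "s3://$BUCKET/$PREFIX/" "$WORKDIR/"
cat "$WORKDIR"/* > "$WORKDIR/collapsed.log"

# upload the single file, then delete the originals from S3 and locally
aws s3 cp "$WORKDIR/collapsed.log" "s3://$BUCKET/$PREFIX-collapsed.log"
aws s3 rm --recursive "s3://$BUCKET/$PREFIX/"
rm -r "$WORKDIR"
```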

The code is on GitHub; this page might not have the latest version.


Sep 19 2014

Had a situation at $WORK where network connections were just hanging there, open, with no activity. I needed to send something, anything, over the open connection, just to see how it behaved.

TL;DR: attach to the process using the debugger, gdb(1), then call send(2) on the network socket.
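A sketch of the trick (12345 is a placeholder PID and 7 a placeholder socket file descriptor; the FD of the hung socket can be found with lsof -p PID):

```shell
# attach to the hung process and push a few bytes into its open socket;
# PID and FD below are placeholders, adjust to the actual process
sudo gdb -p 12345 <<'EOF'
call (ssize_t)send(7, "ping\n", 5, 0)
detach
quit
EOF
```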

Sep 15 2014

In order to take advantage of the shared clipboard, seamless integration and drag-and-drop features of VirtualBox, two things are needed. First, the Guest Additions need to be installed, which is pretty straightforward in most distributions. Second, VBoxClient needs to be started. This depends on the desktop environment and distribution: some start it from the get-go, others don't. If it's started, it will show in the list of processes. If it isn't, there's a script called VBoxClient-all that starts it with all features enabled. Just add that one to the startup list of the DE.
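A quick way to check, and to start it only if missing (suitable for a DE autostart entry; assumes VBoxClient-all is in the PATH, as it is on most distributions with the Guest Additions installed):

```shell
# start the VBoxClient services unless they are already running
pgrep -x VBoxClient >/dev/null || VBoxClient-all
```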