Mar 12 2017


There’s also information on generating self-signed multi domain or subject alternative name (SAN) certificates below.

Feb 15 2017

This post is mainly about SSE – Server-Side Encryption. I found it hard to understand and got quite confused along the way; the fact that the documentation on it is spread over several services didn’t help either. I’m trying to put it in simpler terms here.

In order to understand how Server Side Encryption works in AWS, first we should understand the envelope encryption process.

I. Envelope encryption.
Assume we want to encrypt several files, but “files” can mean any kind of data. The process goes through a few stages.

First, an encryption key must be created, to be used as the master_key. Amazon also refer to this as a “customer master key”, or CMK. Once the master key is in place, encryption follows these steps:


  1. Create another encryption key, called a data_key. This is the key used to encrypt the file.
  2. Encrypt file with the data_key, resulting in encrypted_file.
  3. Encrypt the data_key with the master_key, resulting in an encrypted_data_key.
  4. Discard the data_key. This is important: after this step we’re left with the master_key, the encrypted_file and the encrypted_data_key. The only unencrypted bit is the master key, so in order to decrypt, access to the master key is required.
  5. Store the encrypted_file together with the encrypted_data_key.

Repeat steps 1 to 5 for each file.

To decrypt:

  1. Retrieve encrypted_file and encrypted_data_key.
  2. Decrypt encrypted_data_key with master_key, resulting in data_key.
  3. Decrypt encrypted_file with data_key, resulting in file.
  4. Discard data_key.
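The two lists above can be sketched in a few lines of Python. The XOR keystream cipher below is purely illustrative (NOT real cryptography); it just stands in for whatever symmetric cipher is actually used:

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a SHA-256-derived keystream.
    # Illustration only -- do not use for real data.
    out = bytearray()
    for offset in range(0, len(data), 32):
        ks = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)

master_key = os.urandom(32)  # the CMK equivalent

def envelope_encrypt(data: bytes, master_key: bytes):
    data_key = os.urandom(32)                                 # 1. create data_key
    encrypted_file = keystream_xor(data_key, data)            # 2. encrypt file
    encrypted_data_key = keystream_xor(master_key, data_key)  # 3. wrap data_key
    # 4. data_key is discarded (goes out of scope here)
    return encrypted_file, encrypted_data_key                 # 5. stored together

def envelope_decrypt(encrypted_file: bytes, encrypted_data_key: bytes,
                     master_key: bytes) -> bytes:
    data_key = keystream_xor(master_key, encrypted_data_key)  # unwrap data_key
    return keystream_xor(data_key, encrypted_file)            # decrypt the file
```

Note that only the master key ever needs long-term protection; the per-file data keys exist in the clear only for the duration of a single operation.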

II. Server side encryption
Amazon offer three types of Server-Side Encryption (SSE): SSE-C, SSE-KMS and SSE-S3. In all cases the encryption itself is done by Amazon; the main difference is where the encryption key comes from. All SSE variants send unencrypted data to Amazon for encryption, so Amazon has to be trusted in this regard.

SSE-C stands for Server-Side Encryption with Customer-Provided Keys:

  • The client provides the key in the encryption request. It doesn’t involve envelope encryption; the provided encryption key is used directly. Similarly, the key has to be provided again in order to decrypt.
  • AWS don’t store the key, only a randomly salted HMAC value of it, used to validate future requests.
  • Provides the most control, but requires the most effort.
  • CloudHSM can be used to manage encryption keys.
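The salted-HMAC validation mentioned above can be sketched with Python’s standard library: only a random salt and an HMAC of the key are kept, which is enough to recognize the key later without being able to recover it. This is an illustration of the idea, not AWS’s actual implementation:

```python
import hashlib
import hmac
import os

def fingerprint(key: bytes, salt: bytes) -> bytes:
    # HMAC of the customer key, keyed with the random salt.
    return hmac.new(salt, key, hashlib.sha256).digest()

def store(key: bytes):
    # Keep only (salt, mac) -- the key itself is discarded.
    salt = os.urandom(16)
    return salt, fingerprint(key, salt)

def validate(key: bytes, salt: bytes, stored_mac: bytes) -> bool:
    # Constant-time comparison against the stored fingerprint.
    return hmac.compare_digest(fingerprint(key, salt), stored_mac)
```

A request that arrives with the wrong key simply fails validation; the stored salt and MAC reveal nothing useful about the key itself.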

SSE-KMS stands for Server-Side Encryption with AWS KMS-Managed Keys:

  • Goes through the envelope encryption process.
  • Encryption key is managed by KMS.
  • Multiple master keys can be used and stored in KMS.
  • Access to a KMS key is subject to IAM policies and can be used as an additional layer of control. For example, in order to read the data contained in an object stored in an S3 bucket using KMS encryption, the user would have to (1) have access to the object and (2) have access to the KMS key in order to decrypt the object contents.
  • Master keys never leave KMS and are never exposed. So, for example, there is no need to rotate keys when an employee leaves the business, since there is no way for the employee to have obtained the key.
  • All access to keys is logged in CloudTrail, so an audit log can be provided.
  • It’s a compromise between level of control and ease of use.
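As a hypothetical sketch of that second layer of control, an IAM statement like the following (account ID and key ID are made up) is what grants a user the ability to use a specific KMS key; without it, access to the S3 object alone is not enough to read the data:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
    }
  ]
}
```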

SSE-S3 stands for Server-Side Encryption with Amazon S3-Managed Keys:

  • Goes through the envelope encryption process.
  • Master key is managed by Amazon’s S3 service, with no involvement from the client.
  • Data is encrypted and decrypted transparently for users that are provided access to that data via IAM policies.
  • Easiest to use, but offers the least amount of control.

III. Client side encryption
As detailed in the AWS docs, there are basically two cases.

  1. The encryption key is stored in KMS. Easier to manage, but it also means that Amazon have access to the key in theory, hence they have access to the data, even though it was encrypted on the client side. Arguably, the only advantage over SSE-KMS is that data is sent encrypted over the wire.
  2. Customer provided key. The key (client-side key) is managed by the customer and never sent to Amazon. Data is encrypted on the client side, so unencrypted data is never sent to Amazon. Maximum protection, but it is managed exclusively by the customer.

The AWS SDKs, or at least the Java one, provide functions for easy access to envelope encryption when using the client-side options.

Jun 03 2016

Upgraded Ansible to version 2.1 on OS X El Capitan. First run, I get this error:
AttributeError: 'EntryPoint' object has no attribute 'resolve'

Googling for it, it seems the cause is a setuptools (!?) version that is too old. I installed Ansible by running pip2 install --upgrade --user ansible, which installed it in my home directory and also upgraded the setuptools package there, but that’s not the version that Python is picking up.

I’m using the system’s Python 2.7; I didn’t install another one. The system Python looks for modules using a path that starts with /System/Library/Frameworks/Python.framework/Versions/2.7, so it picks up the system setuptools instead of the one in my home directory. That package can’t be upgraded because it’s protected by SIP, and disabling SIP to update the package might cause the system to misbehave.

One solution is to export PYTHONPATH in my environment. Problem is, I’m using both Python 2 and Python 3, and that variable applies to both. Setting PYTHONPATH to point to Python 2’s modules would likely cause a bigger mess.

Better solution:
Create a file with a .pth extension under the site-packages directory in my home folder ($HOME/Library/Python/2.7/lib/python/site-packages). Such files are picked up before anything else runs, and lines starting with import are executed. This file will contain code that inserts our directory before everything else in sys.path, so it will be searched first:
import sys; sys.path = ["/Users/USERNAME/Library/Python/2.7/lib/python/site-packages"] + sys.path

Here’s a one-liner that will create the file:
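A minimal version, assuming an arbitrary file name (and with $HOME standing in for /Users/USERNAME):

```shell
# Write the sys.path override into a .pth file under the per-user
# site-packages directory. The file name is an arbitrary choice.
SITE="$HOME/Library/Python/2.7/lib/python/site-packages"
mkdir -p "$SITE"
echo "import sys; sys.path = [\"$SITE\"] + sys.path" > "$SITE/00-prepend-home.pth"
```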

Apr 28 2016

Emrer is a Python script that reads a YAML file and starts an EMR cluster as specified in that file.

The main advantage over other EMR automation solutions is that it takes care of uploading the bootstrap/step scripts to S3, so everything can be stored locally: both the cluster config and the scripts it’s going to execute. This means that a cluster created with this script can be kept in a version control system like Git and treated as code all the way.

The configuration file is YAML, which is easier to read and understand than JSON. The example configuration is commented.

It’s not using CloudFormation at all; when this script was initially written, CloudFormation didn’t yet know how to create EMR clusters, and at the time I didn’t find anything else out there that could do it either.

It could be enhanced with a kind of “plugin” system where custom procedures are executed when certain things are set up. For example, a procedure that would add required security groups to the list if they are missing from the configuration file, making sure that the cluster is compliant with company regulations.

Sep 18 2015

The following will create a dump of raw network packets to a file, while continuously reading that file and displaying the packets on screen in human-readable format:


  • /bin/sh -c "tcpdump -i any -w /tmp/dumpfile.cap host &" : run tcpdump in the background, dumping raw packets to /tmp/dumpfile.cap
  • sleep 1 : wait a second for the file to be created and the header to be written to it. Without waiting, you’ll probably get a “bad dump file format” error.
  • tail -n 1000 -f /tmp/dumpfile.cap : tail the dump file. The point of -n is to get the whole file, from the start, including the header, which avoids the “bad dump file format” error.
  • tcpdump -r - : reads from stdin, which is the contents of /tmp/dumpfile.cap, and displays it on stdout in human-readable format.

IMPORTANT: Interrupting with CTRL+C will NOT kill the backgrounded tcpdump. Don’t forget to kill that too if it’s not limited somehow, otherwise it will fill up the disk.

Feb 06 2015

A bucket policy that will deny access to anyone not coming from the specified IP addresses. Used in combination with IAM groups that allow access to S3, the net result is that users get the access granted to the group they belong to, but only if they come from one of the IP addresses specified in this policy, which is attached to the bucket. Using an “Allow” policy, as in Amazon’s example, would give anyone coming from those IPs full access, effectively defeating the purpose of group-based policies.
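A sketch of such a policy; the bucket name and IP ranges are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyIfNotFromAllowedIPs",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ["203.0.113.0/24", "198.51.100.7/32"]
        }
      }
    }
  ]
}
```

Because an explicit Deny always wins over any Allow, the group policies still decide what each user may do; this just adds the “only from these IPs” restriction on top.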

Feb 06 2015

AWS permissions intended for a group of users who will monitor the environment, but should not have access to data and are not allowed to make any changes. It should allow members to check the health of services or run periodic reviews. Basically a modified version of Amazon’s Read-Only policy template. In order to cut access to potentially dangerous information, some access was removed:

  • DynamoDB and Kinesis Get*, because those would reveal data
  • ElasticBeanstalk and OpsWorks, because the information there is potentially dangerous
  • S3 objects, but it does give permissions to access the S3 bucket policy
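As a rough illustration only (the action list is abbreviated, not the full template), the resulting Allow statement keeps Describe/List-style actions while dropping the data-revealing ones:

```json
{
  "Effect": "Allow",
  "Action": [
    "cloudwatch:Describe*", "cloudwatch:Get*", "cloudwatch:List*",
    "ec2:Describe*",
    "dynamodb:ListTables", "dynamodb:DescribeTable",
    "kinesis:Describe*", "kinesis:List*",
    "s3:GetBucketPolicy", "s3:List*"
  ],
  "Resource": "*"
}
```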

Jan 03 2015

In Amazon Web Services it’s possible to enable logging for the whole account (CloudTrail) or for various services provided by Amazon, like S3. These logs get stored in S3 buckets. They are generated every few minutes in the case of CloudTrail, or every few seconds in the case of S3. CloudTrail logs are somewhat easier to manage because they are grouped by day, but S3 logs are all stored in one “folder”, so after some time there can be tens of thousands of files, making it hard to find something unless you know exactly what you’re looking for.

Mostly a learning project, this script provides functions to “collapse” many files into one. Written in Python and tested on FreeBSD and Linux, it uses Amazon’s Python SDK, boto. Obviously, boto must be installed and configured with the proper credentials. The script downloads the files for a certain period of time, concatenates them into one file, uploads the new file to S3, then deletes the concatenated files both from S3 and locally. The basic idea is to get all logs for a certain day/hour into one file, making it easier to find something in those logs later. It might need some code adapting.

The code is on Github; this page might not have the latest version.
