Feb 15 2017
 

This post is mainly about SSE (Server Side Encryption). I found it hard to understand and got quite confused along the way, and the fact that the documentation on it is spread over several services didn’t help either. I’m trying to put it in simpler terms here.

In order to understand how Server Side Encryption works in AWS, first we should understand the envelope encryption process.

I. Envelope encryption.
Assume we want to encrypt several files, where “files” can mean any kind of data. The process goes through a few stages.

First, an encryption key must be created, to be used as the master_key. Amazon also refer to this as a “customer master key”, or CMK. Once the master key is in place, encryption and decryption go through the following steps:

Encryption:

  1. Create another encryption key, called a data_key. This is the key used to encrypt the file.
  2. Encrypt file with the data_key, resulting in encrypted_file.
  3. Encrypt the data_key with the master_key, resulting in an encrypted_data_key.
  4. Discard the data_key. This is important. After this step we’re left with the master_key, the encrypted_file and the encrypted_data_key. The only unencrypted bit is the master key, so access to the master key is required in order to decrypt.
  5. Store the encrypted_file together with the encrypted_data_key.

Repeat steps 1 to 5 for each file.

Decryption:

  1. Retrieve encrypted_file and encrypted_data_key.
  2. Decrypt encrypted_data_key with master_key, resulting in data_key.
  3. Decrypt encrypted_file with data_key, resulting in file.
  4. Discard data_key.
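
The steps above can be sketched in a few lines of Python. This is only an illustration of the key handling, not of real cryptography: toy_cipher is a made-up stand-in (a SHA-256 keystream XORed over the data) so that the example runs with the standard library alone, and it must not be used to protect anything. The function names are hypothetical; the variable names follow the steps above.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Toy stand-in for a real cipher such as AES. NOT secure.
    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, file: bytes):
    data_key = secrets.token_bytes(32)                     # step 1
    encrypted_file = toy_cipher(data_key, file)            # step 2
    encrypted_data_key = toy_cipher(master_key, data_key)  # step 3
    del data_key                                           # step 4
    return encrypted_file, encrypted_data_key              # step 5: store both


def envelope_decrypt(master_key: bytes, encrypted_file: bytes,
                     encrypted_data_key: bytes) -> bytes:
    data_key = toy_cipher(master_key, encrypted_data_key)  # step 2
    file = toy_cipher(data_key, encrypted_file)            # step 3
    del data_key                                           # step 4
    return file


master_key = secrets.token_bytes(32)
encrypted_file, encrypted_data_key = envelope_encrypt(master_key, b"some file contents")
print(envelope_decrypt(master_key, encrypted_file, encrypted_data_key))
```

Every file gets its own data_key, so the master key only ever encrypts small keys, never the bulk data.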

II. SSE-C vs SSE-KMS vs SSE-S3
Amazon offer three types of Server Side Encryption (SSE): SSE-C, SSE-KMS and SSE-S3. The encryption of data is done by Amazon in all cases; the main difference is where the encryption key comes from. All SSE variants send unencrypted data to Amazon for encryption and assume Amazon is trusted in this regard.

SSE-C stands for Server-Side Encryption with Customer-Provided Keys:

  • The client provides the key in the encryption request. It doesn’t involve envelope encryption; the provided encryption key is used directly. Similarly, the key has to be provided again in order to decrypt.
  • AWS don’t store the key, only a randomly salted HMAC value of the key, which is used to validate future requests.
  • Provides the most control, but requires the most effort.
  • CloudHSM can be used to manage encryption keys.
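
To make the “salted HMAC instead of the key” idea more concrete, here is a minimal sketch. The exact construction S3 uses is not something I can state; the 16-byte salt, SHA-256 and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def fingerprint_key(key: bytes):
    """Return (salt, digest) to store in place of the key itself."""
    salt = secrets.token_bytes(16)  # assumed salt size
    digest = hmac.new(salt, key, hashlib.sha256).digest()
    return salt, digest


def key_matches(key: bytes, salt: bytes, digest: bytes) -> bool:
    """Check a key supplied on a later request against the stored values."""
    candidate = hmac.new(salt, key, hashlib.sha256).digest()
    return hmac.compare_digest(candidate, digest)


customer_key = secrets.token_bytes(32)
salt, digest = fingerprint_key(customer_key)
print(key_matches(customer_key, salt, digest))
print(key_matches(secrets.token_bytes(32), salt, digest))
```

The stored pair is enough to reject a wrong key, but it cannot be used to decrypt anything, which is why losing an SSE-C key means losing the data.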

SSE-KMS stands for Server-Side Encryption with AWS KMS-Managed Keys:

  • Goes through the envelope encryption process.
  • Encryption key is managed by KMS.
  • Multiple master keys can be used and stored in KMS.
  • Access to a KMS key is subject to IAM policies and can be used as an additional layer of control. For example, in order to read the data contained in an object stored in an S3 bucket using KMS encryption, the user would have to (1) have access to the object and (2) have access to the KMS key in order to decrypt the object contents.
  • Master keys never leave KMS, so they are never seen. For example, there is no need to rotate keys when an employee leaves the business, since there is no way for the employee to get the key.
  • All access to keys is logged in CloudTrail, so an audit log can be provided.
  • It’s a compromise between level of control and ease of use.

SSE-S3 stands for Server-Side Encryption with Amazon S3-Managed Keys:

  • Goes through the envelope encryption process.
  • Master key is managed by Amazon’s S3 service, with no involvement from the client.
  • Data is encrypted and decrypted transparently for users that are provided access to that data via IAM policies.
  • Easiest to use, but the least amount of control.
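
From the client’s point of view, the three options mostly differ in which headers go along with the upload request. The sketch below builds those headers for the S3 REST API; the header names are the S3 ones, while the sse_headers function and its arguments are hypothetical helpers for illustration. SDKs usually expose the same settings as request parameters.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Build the S3 request headers that select a type of SSE."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # without it, S3 falls back to the default KMS key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(key_md5).decode(),
        }
    raise ValueError(f"unknown SSE mode: {mode}")


print(sse_headers("SSE-S3"))
print(sse_headers("SSE-KMS", kms_key_id="hypothetical-key-id"))
```

With SSE-C the same three headers have to be sent again on every read, since Amazon don’t keep the key.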

III. Client side encryption
Detailed in the AWS docs, there are basically two cases.

  1. The encryption key is stored in KMS. Easier to manage, but it also means that Amazon have access to the key in theory, hence they have access to the data, even though it was encrypted on the client side. Arguably, the only advantage over SSE-KMS is that data is sent encrypted over the wire.
  2. Customer provided key. The key (client-side key) is managed by the customer and never sent to Amazon. Data is encrypted on the client side, so unencrypted data is never sent to Amazon. Maximum protection, but it is managed exclusively by the customer.

The AWS SDKs, or at least the Java one, provide functions for easy access to envelope encryption when using the client-side options.

Apr 28 2016
 

Emrer is a Python script that reads a YAML file and starts an EMR cluster as specified in that file.

The main advantage over other EMR automation solutions is that it will take care of uploading the bootstrap/step scripts to S3, so everything can be stored locally, both the cluster config and the scripts it’s going to execute. This means that a cluster created with this script can be stored in a version control system like Git and treated as code all the way.

The configuration file is YAML, easier to read and understand than JSON. The example configuration is commented.

It’s not using CloudFormation at all: when this script was initially written, CloudFormation didn’t yet know how to create EMR clusters. At the time I didn’t find anything else out there that could do it either.

It could be enhanced with a kind of “plugin” system where custom procedures are executed when certain things are set up. For example, a procedure that would add required security groups to the list if they are missing from the configuration file, making sure that the cluster is compliant with company regulations.

Feb 06 2015
 

A bucket policy that will deny access to anyone not coming from the specified IP addresses. Used in combination with IAM groups that allow access to S3, the net result is that users get the access given to the group they belong to, but only if they are coming from one of the IP addresses specified in this policy, which is attached to the bucket. Using an “Allow” policy, like in Amazon’s example, would allow anyone coming from those IPs full access, effectively defeating the purpose of group-based policies.
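
As a rough sketch of what such a “Deny” policy looks like, the snippet below generates one as JSON. It is not the exact policy from the full post: the function name, the Sid, the bucket name and the address range are placeholders, and the statement denies every S3 action, which may be broader than needed.

```python
import json


def deny_outside_ips_policy(bucket, allowed_cidrs):
    """Bucket policy denying all S3 actions unless the caller's IP is listed."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedIPs",  # placeholder name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # The Deny applies only when the source IP is NOT in the list.
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            }
        ],
    }


policy = deny_outside_ips_policy("example-bucket", ["203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```

Because an explicit Deny always wins over an Allow, the policy grants nothing by itself; it only narrows down whatever the IAM groups already allow.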

Jan 03 2015
 

In Amazon Web Services it’s possible to enable logging for the whole account (CloudTrail) or for various services provided by Amazon, like S3. These logs get stored in S3 buckets. They are generated every few minutes in the case of CloudTrail, or every few seconds in the case of S3. CloudTrail logs are somewhat easier to manage because they are grouped by day, but S3 logs are all stored in one “folder”, so after some time there can be tens of thousands of files, making it hard to find something unless you know exactly what you’re looking for.

Mostly a learning project, this script provides functions to “collapse” many files into one. Written in Python, tested on FreeBSD and Linux, it uses Amazon’s Python SDK, boto. Obviously, boto must be installed and configured with the proper credentials. It downloads the files for a certain period of time, concatenates them into one file, uploads the new file to S3, then deletes the original files both from S3 and from the local disk. The basic idea is to get all logs for a certain day/hour into one file, making it easier to find something in those logs later. The code might need some adapting.

The code is on GitHub, so this page might not have the latest version.
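
The grouping part of the idea can be sketched without touching S3 at all. This is not the script’s actual code: the function names are made up, and the sketch assumes the usual S3 access log key format, a prefix followed by YYYY-MM-DD-HH-MM-SS and a unique string. Downloading, uploading and deleting are left out.

```python
import re
from collections import defaultdict

# Assumed key format: <prefix>YYYY-MM-DD-HH-MM-SS-<unique string>
LOG_KEY = re.compile(r"(\d{4}-\d{2}-\d{2})-(\d{2})-\d{2}-\d{2}-\w+$")


def group_keys(keys, by="hour"):
    """Map each day (or day-hour) to the sorted log keys that belong to it."""
    groups = defaultdict(list)
    for key in keys:
        match = LOG_KEY.search(key)
        if not match:
            continue  # not a log file, leave it alone
        day, hour = match.groups()
        groups[day if by == "day" else f"{day}-{hour}"].append(key)
    return {name: sorted(members) for name, members in groups.items()}


def collapse(chunks):
    """Concatenate file contents, making sure each one ends with a newline."""
    return b"".join(c if c.endswith(b"\n") else c + b"\n" for c in chunks if c)


keys = [
    "logs/2015-01-03-10-15-22-ABCDEF",
    "logs/2015-01-03-10-45-01-123456",
    "logs/2015-01-03-11-00-00-AAAAAA",
]
print(group_keys(keys))
```

Each group would then be downloaded, passed through collapse, uploaded as a single object, and only after that would the originals be deleted.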


Aug 26 2014
 

Short demo script in Python that monitors the VPN tunnels in Amazon Web Services. It queries the current state every 1.5 seconds in a loop and if the state changes it writes the new state to a log file. Needs the boto library.
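
The polling loop itself is simple enough to sketch without AWS. This is not the original script: watch, fetch_states and log are hypothetical names, and fetch_states stands for whatever call returns the current tunnel states (with boto3, for instance, that could be built on the EC2 client’s describe_vpn_connections and the Status field of each VgwTelemetry entry, which is an assumption about how one might wire it up).

```python
import time


def watch(fetch_states, log, interval=1.5, max_polls=None, sleep=time.sleep):
    """Poll fetch_states() and call log() whenever the result changes.

    max_polls=None loops forever; a number is handy for demos and tests.
    """
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = fetch_states()
        if current != previous:  # the first poll always counts as a change
            log(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {current}")
            previous = current
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)


# Demo with canned states instead of real AWS calls.
fake_states = iter([{"tunnel-1": "UP"}, {"tunnel-1": "UP"}, {"tunnel-1": "DOWN"}])
watch(lambda: next(fake_states), print, interval=0, max_polls=3)
```

To write to a log file as the script does, log can be any function that appends a line to a file instead of print.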