A practical look at basic AWS Networking with Terraform

Hey,

I noticed that recently a bunch of people have been getting to the website through queries about HAProxy in the context of AWS - mainly how to receive traffic through a set of instances and forward it to another set of machines.

As I aim to make this blog as practical as I can, I feel that if I teach the basics of AWS networking first, I’ll have the ground set for further discussions about HAProxy, NGINX or other options out there.

The client point of view
The components
Creating an AWS VPC using Terraform
Creating an Internet Gateway using Terraform
Creating AWS VPC Subnets
Refactoring - making networking a Terraform module
Accessing generated subnet IDs via maps using Terraform
Creating EC2 Instances in AWS Subnets
Accessing an AWS EC2 Instance - Setting the security groups
Closing thoughts

The client point of view

From the client’s point of view, be it on AWS or anywhere else, the lifecycle is the same:

it makes requests to an endpoint (under a given domain that resolves to an assigned IP);
somehow someone processes its request;
it receives a response back.

Client and server communication over the network with hidden application logic

That endpoint that takes the request from the client and then calls the application magic behind the scenes could either be a Content Delivery Network (CDN) point of presence (PoP) that would forward the request to another region or be a machine that we own.

For simplicity, let’s assume it’s one of our machines.

The components

A very common scenario where many AWS networking concepts can be exercised is the following:

Very simplified AWS network infrastructure

three machines are meant not to be touched directly by other applications on the internet;
each of those machines (srv1, srv2 and srv3) has application services (maybe just the same application spread across three machines) which are meant to be accessed publicly via a well-defined URL;
lb (our load-balancer machine) has a public interface that is meant to receive traffic from the internet and another interface that can reach the machines in the internal network - this is the machine that should be touched when traffic comes to api.cirocosta.com.

To have these machines communicating with each other, they’re put into a common virtual network that we can define - in AWS terms, this big network is the AWS VPC (Virtual Private Cloud).

These virtual networks (you can have more than one if you wish) are totally separate from each other (so you can have many networks using the same IP address range, for instance) and can be created in a minute.

Another property of them is that they naturally span across multiple availability zones in a given region.

AWS VPC spanning across multiple availability zones

When it comes to provisioning EC2 instances, having only a VPC is not enough. From that big range of IPs that we allocated for our network, we also need to carve out subnets.

By separating the VPC into subnets we’re able to partition that network across the availability zones and configure further properties, including whether each subnet should:

be totally private (i.e., have no default internet gateway assigned to it);
assign public IPs (even though ephemeral) to the instances when they get into the subnet;
assign an IPv6 address to the instance.

I usually think of subnets as a further specification of details about the network that I want to put instances in.

In the end, we usually have something like this:

AWS architecture composed of a load-balancer and a set of servers in a private network

With the ground set, let’s terraform it!

Creating an AWS VPC using Terraform

The first resource I start creating is aws_vpc:

# The VPC that spans across multiple availability zones.
#
# Given the CIDR 10.0.0.0/16, we can have IPs from 10.0.0.1
# up to 10.0.255.254. Essentially we can host 65k IPs in
# that range. 
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

The VPC by itself gives us nothing more than the CIDR block that we specified. After creating this resource (terraform apply), we can see it in describe-vpcs:

aws ec2 describe-vpcs \
        --profile=$PROFILE

{
    "Vpcs": [
        {
            "CidrBlock": "10.0.0.0/16",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-896491e1",
                    "CidrBlock": "10.0.0.0/16",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "DhcpOptionsId": "dopt-a20007c7",
            "InstanceTenancy": "default",
            "IsDefault": false,
            "State": "available",
            "VpcId": "vpc-76d00d11"
        }
    ]
}

Something I didn’t mention before is that there’s an extra resource that gets attached to the VPC when you create it: a routing table.

aws ec2 describe-route-tables \
        --profile=$PROFILE

{
    "RouteTables": [
        {
            "Associations": [
                {
                    "Main": true,
                    "RouteTableAssociationId": "rtbassoc-0f885c69",
                    "RouteTableId": "rtb-852df2e2"
                }
            ],
            "PropagatingVgws": [],
            "RouteTableId": "rtb-852df2e2",
            "Routes": [
                {
                    "DestinationCidrBlock": "10.0.0.0/16",
                    "GatewayId": "local",
                    "Origin": "CreateRouteTable",
                    "State": "active"
                }
            ],
            "Tags": [],
            "VpcId": "vpc-76d00d11"
        }
    ]
}

Each VPC created carries with it an implicit router, coming with a main route table that we can modify. By default, it only comes with a route that determines where traffic should go when packets are sent to internal instances.
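If you’d rather not dig through the CLI output, the same IDs can be surfaced from Terraform itself. A minimal sketch, assuming the aws_vpc.main resource from above; the output names are made up for illustration:

```hcl
# Hypothetical output names; `id` and `main_route_table_id` are
# attributes exported by the `aws_vpc` resource.
output "main-vpc-id" {
  value = "${aws_vpc.main.id}"
}

output "main-route-table-id" {
  value = "${aws_vpc.main.main_route_table_id}"
}
```

After terraform apply, running terraform output main-route-table-id should print the same rtb-... ID that describe-route-tables shows.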

This might not look like much, but it’s enough to allow us to create subnets, add instances to them and have these instances communicating with each other. Nevertheless, if we want to download software from the internet to update our instances, we must be able to get traffic sent out to the internet as well.

To do so, an internet gateway is required.

The internet gateway is a piece of infrastructure that can be seen as a router that will take packets from instances inside the network and forward them to the outside (as well as receive back packets from the established connection).

Request going from an EC2 instance inside a VPC to the internet passing by the internet gateway

Instances created in subnets within this VPC have this component as a target in their subnet’s route table, so when one of them establishes a connection to an outside resource, the gateway takes care of getting that traffic out and back.

Fortunately, we don’t need to take care of this component - you only attach it to your VPC and then AWS takes care of scaling, making it redundant and highly available to all the instances within the VPC.

Let’s now create this internet gateway and add the route to our main route table using Terraform.

Creating an Internet Gateway using Terraform

To grant internet access to the VPC we make use of the aws_internet_gateway resource and aws_route:

# Internet gateway to give our VPC access to the outside world.
# Named `main` so that it matches the reference in the route below.
resource "aws_internet_gateway" "main" {
  vpc_id = "${aws_vpc.main.id}"
}

# Grant the VPC internet access by creating a very generic
# destination CIDR ("catch all" - the least specific possible)
# such that we route traffic to the outside as a last resort for
# any destination that the table doesn't know about.
resource "aws_route" "internet_access" {
  route_table_id         = "${aws_vpc.main.main_route_table_id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${aws_internet_gateway.main.id}"
}

Once the creation finishes, check out the route table:

aws ec2 describe-route-tables \
        --profile=$PROFILE

 {
     "RouteTables": [
         {
             "Associations": [
                 {
                     "Main": true,
                     "RouteTableAssociationId": "rtbassoc-0f885c69",
                     "RouteTableId": "rtb-852df2e2"
                 }
             ],
             "PropagatingVgws": [],
             "RouteTableId": "rtb-852df2e2",
             "Routes": [
                 {
                     "DestinationCidrBlock": "10.0.0.0/16",
                     "GatewayId": "local",
                     "Origin": "CreateRouteTable",
                     "State": "active"
-                }
+                },
+                {
+                    "DestinationCidrBlock": "0.0.0.0/0",
+                    "GatewayId": "igw-caa366ae",
+                    "Origin": "CreateRoute",
+                    "State": "active"
+                }
             ],
             "Tags": [],
             "VpcId": "vpc-76d00d11"
         }
     ]
 }

It’s there!
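One consequence worth keeping in mind: because that route went into the main route table, every subnet that isn’t explicitly associated with another table inherits it. If some subnets should stay private (like the ones holding srv1, srv2 and srv3), one option is to give them a route table of their own. A rough sketch, not part of the sample repository; the `private` names, the 10.0.2.0/24 range and the availability zone are all assumptions for illustration (subnets themselves are covered in the next section):

```hcl
# Hypothetical private subnet - name, CIDR and AZ are illustrative.
resource "aws_subnet" "private" {
  vpc_id                  = "${aws_vpc.main.id}"
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "sa-east-1a"
  map_public_ip_on_launch = false
}

# A dedicated route table with no extra routes declared.
resource "aws_route_table" "private" {
  vpc_id = "${aws_vpc.main.id}"
}

# Explicitly associating the subnet with this table makes it
# stop using the VPC's main route table.
resource "aws_route_table_association" "private" {
  subnet_id      = "${aws_subnet.private.id}"
  route_table_id = "${aws_route_table.private.id}"
}
```

A route table created this way only carries the implicit local route, so instances in that subnet can talk to the rest of the VPC but have no path to the internet gateway.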

Creating AWS VPC Subnets

Unlike a VPC, each subnet is assigned to a specific availability zone, although each availability zone can have more than one subnet (belonging to the same VPC).

This means that if we want to cover two availability zones with instances, we must create two subnets at least.

With Terraform, I find that the most elegant way of doing so is by using the aws_subnet resource paired with a list of maps:

# Creates N subnets according to the subnet mapping described in
# the `az-subnet-mapping` variable.
#
# The variable is a list of maps in the following form:
#
#   [  { name: "crazydog", az: "name-of-the-az", cidr: "cidr-range" } , ... ]
#
# For instance:
#
#   [ { name =  "sub1", az = "us-east-1a", cidr = "192.168.0.0/24"  } ]
#
resource "aws_subnet" "main" {
  count = "${length(var.az-subnet-mapping)}"

  cidr_block              = "${lookup(var.az-subnet-mapping[count.index], "cidr")}"
  vpc_id                  = "${aws_vpc.main.id}"
  map_public_ip_on_launch = true
  availability_zone       = "${lookup(var.az-subnet-mapping[count.index], "az")}"

  tags = {
    Name = "${lookup(var.az-subnet-mapping[count.index], "name")}"
  }
}

This way we’re able to be very explicit about which subnets we want in each availability zone and what CIDR they take (we can even vary the size of these subnets without relying on some weird CIDR math).

To make use of it, just declare the variable:


variable "az-subnet-mapping" {
  type        = "list"
  description = "Lists the subnets to be created in their respective AZ."

  default = [
    {
      name = "subnet1"
      az   = "sa-east-1a"
      cidr = "10.0.0.0/24"
    },
    {
      name = "subnet2"
      az   = "sa-east-1c"
      cidr = "10.0.1.0/24"
    },
  ]
}

and then, after applying it, see which subnets we have around:

aws ec2 describe-subnets \
        --profile=$PROFILE

{
    "Subnets": [
        {
            "AssignIpv6AddressOnCreation": false,
            "AvailabilityZone": "sa-east-1a",
            "AvailableIpAddressCount": 251,
            "CidrBlock": "10.0.0.0/24",
            "DefaultForAz": false,
            "Ipv6CidrBlockAssociationSet": [],
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-936734f4",
            "VpcId": "vpc-76d00d11"
        },
        {
            "AssignIpv6AddressOnCreation": false,
            "AvailabilityZone": "sa-east-1c",
            "AvailableIpAddressCount": 251,
            "CidrBlock": "10.0.1.0/24",
            "DefaultForAz": false,
            "Ipv6CidrBlockAssociationSet": [],
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-b20122ea",
            "VpcId": "vpc-76d00d11"
        }
    ]
}

There we go! With the subnets set, we can start creating our instances. Before that, though, let’s take some time to refactor our code a bit.

Refactoring - making networking a Terraform module

So far all the code has been thrown into a single main.tf.

That’s perfect if you’re dealing with just a pair or two of resources. However, when the infrastructure starts growing, it becomes harder and harder to maintain.

Here we face one of those situations. We’re passing our subnet definitions via a list of maps, but after the subnets get created, it becomes hard for us to retrieve the subnet IDs without passing this complexity to the resource that will need to pick that ID.

One way of making that whole process easier is to abstract the whole networking set up into a Terraform module.

To do so, I started by picking all the resources related to networking and putting them in files under a separate directory called networking. The file structure looks like this:

.
├── keys                # contains pub and private keys (for this example)
│   ├── key.rsa 
│   └── key.rsa.pub
├── keys.tf             # manages `key` resource
├── main.tf             # top-level tf file - configures providers and vars
├── networking          # networking module - sets up the whole networking
│   │ 
│   ├── inputs.tf       # declares the variables taken as input
│   ├── main.tf         # main resource creation (vpc, igw, routes, subnets ...)
│   └── outputs.tf      # results from the execution of the module
│     
└── terraform.tfstate

The rationale behind setting this kind of structure is that we can think of Terraform modules as functions that take some inputs, create some resources and then produce some outputs.
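Seen that way, the inputs of our networking “function” are the VPC CIDR and the subnet mapping. A sketch of what networking/inputs.tf could declare and how networking/main.tf would consume it; the description strings are my own, and the actual files in the repository may differ:

```hcl
# networking/inputs.tf (sketch)
variable "cidr" {
  description = "CIDR block to assign to the VPC."
}

variable "az-subnet-mapping" {
  type        = "list"
  description = "Lists the subnets to be created in their respective AZ."
}

# networking/main.tf (excerpt)
#
# Instead of hardcoding the range, the VPC now takes it
# from the module's input.
resource "aws_vpc" "main" {
  cidr_block = "${var.cidr}"
}
```

With no default set on either variable, whoever calls the module has to pass both values explicitly.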

Accessing generated subnet IDs via maps using Terraform

Once the subnets have been generated, we’ll have their IDs, which we can then use to place instances in them (or to tell an autoscaling group which subnets to place instances in).

The problem is that without doing some interpolation trick, we can’t access those IDs by name.

Once aws_subnet is executed with the count set, we end up with a plain list of aws_subnet resources.

Terraform networking resource transformation using a custom module

Given that at this point we already separated our networking configuration from the rest, in outputs.tf we can make use of zipmap to have access to those subnet IDs using plain key-value lookup:

# Creates a mapping between subnet name and generated subnet ID.
#
# Given an `aws_subnet` resource created with `count`, we can access
# properties from the list of resources created as a list by using
# the wildcard syntax:
# 
#     <resource>.*.<property>
#
# By making use of `zipmap` we can take two lists and create a map
# that uses the values from the first list as keys and the values
# from the second list as values.
#
# Example output:
#
#     az-subnet-id-mapping = {
#       subnet1 = subnet-3b92c15c
#       subnet2 = subnet-a90023f1
#     }
#
output "az-subnet-id-mapping" {
  value = "${zipmap(aws_subnet.main.*.tags.Name, aws_subnet.main.*.id)}"
}
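To get a feeling for what zipmap does in isolation, here’s a throwaway sketch with hardcoded lists (the output name and the subnet IDs are made up):

```hcl
# Pairs the first list (keys) with the second list (values),
# position by position.
output "zipmap-example" {
  value = "${zipmap(list("subnet1", "subnet2"), list("subnet-aaa", "subnet-bbb"))}"
}
```

This should yield a map where subnet1 points at subnet-aaa and subnet2 at subnet-bbb. As map keys are unique, two subnets sharing the same name would collapse into a single entry, so the names in az-subnet-mapping need to be distinct.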

Having easy access to the subnet IDs, it’s time to create the instances.

Creating EC2 Instances in AWS Subnets

As we need an AMI to create the instances, we start by picking the latest one released by Canonical.

# Pick the latest ubuntu artful (17.10) ami released by the
# Canonical team.
data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-artful-17.10-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  # 099720109477 is the id of the Canonical account
  # that releases the official AMIs.
  owners = ["099720109477"]
}

Usually, you’d pick an AMI that you prepared yourself already containing the software needed to run your application (maybe it’s just docker and some agent to send metrics / whatever), but for testing, picking a simple base Ubuntu image suffices.

With the AMI in hand, let’s wrap the whole thing up with all that’s left: making use of the networking module that we specified, setting the desired subnets and specifying some instances:

variable "profile" {
  description = "Profile with permissions to provision the AWS resources."
}

variable "region" {
  description = "Region to provision the resources into."
  default     = "sa-east-1"
}

provider "aws" {
  region  = "${var.region}"
  profile = "${var.profile}"
}

# By specifying the submodule here we'll have
# as an end result 2 subnets created in those
# two availability zones.
#
# Given that we defined the module's output, 
# wherever we interpolate with that output,
# terraform will establish the dependency and
# make sure that the subnets have been properly
# created.
module "networking" {
  source = "./networking"
  cidr   = "10.0.0.0/16"

  "az-subnet-mapping" = [
    {
      name = "subnet1"
      az   = "sa-east-1a"
      cidr = "10.0.0.0/24"
    },
    {
      name = "subnet2"
      az   = "sa-east-1c"
      cidr = "10.0.1.0/24"
    },
  ]
}

# Create the first instance in `subnet1`.
#
# Here we get the subnet ID to put this instance
# by performing a simple key-value lookup over
# the map that we created in the output of the
# networking module.
resource "aws_instance" "inst1" {
  instance_type = "t2.micro"
  ami           = "${data.aws_ami.ubuntu.id}"
  key_name      = "${aws_key_pair.main.id}"
  subnet_id     = "${module.networking.az-subnet-id-mapping["subnet1"]}"
}

resource "aws_instance" "inst2" {
  instance_type = "t2.micro"
  ami           = "${data.aws_ami.ubuntu.id}"
  key_name      = "${aws_key_pair.main.id}"
  subnet_id     = "${module.networking.az-subnet-id-mapping["subnet2"]}"
}
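Both instances reference aws_key_pair.main, which lives in keys.tf and isn’t shown here. A minimal sketch of what that file might contain, assuming the public key sits at keys/key.rsa.pub as in the file tree from the refactoring section; the key_name value is a placeholder of mine:

```hcl
# keys.tf (sketch)
#
# Registers the local public key with EC2 so that instances
# launched with this key pair accept the matching private key.
resource "aws_key_pair" "main" {
  key_name   = "main" # placeholder name
  public_key = "${file("${path.module}/keys/key.rsa.pub")}"
}
```

The ID of an aws_key_pair is the key name itself, which is why aws_key_pair.main.id can be handed to key_name on the instances.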

Run terraform apply, and you’re good to go!

aws ec2 describe-instances \
        --profile=$PROFILE

{
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0394b9700680faf8b",
                    "InstanceType": "t2.micro",
                    "Placement": {
                        "AvailabilityZone": "sa-east-1a"
                    },
                    "PublicIpAddress": "18.231.174.76",
                    "PrivateIpAddress": "10.0.0.191",
                    "VpcId": "vpc-4ff52828"
                }
            ]
        },
        {
            "Instances": [
                {
                    "InstanceId": "i-04714fbbd5df26497",
                    "InstanceType": "t2.micro",
                    "Placement": {
                        "AvailabilityZone": "sa-east-1c"
                    },
                    "PrivateIpAddress": "10.0.1.140",
                    "PublicIpAddress": "18.231.163.99",
                    "VpcId": "vpc-4ff52828"
                }
            ]
        }
    ]
}
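Instead of fishing the addresses out of that JSON, we could also have Terraform print them. A small sketch (the output names are my own; public_ip is an attribute exported by aws_instance):

```hcl
# Public addresses assigned to the instances at launch
# (ephemeral, given `map_public_ip_on_launch` on the subnets).
output "inst1-public-ip" {
  value = "${aws_instance.inst1.public_ip}"
}

output "inst2-public-ip" {
  value = "${aws_instance.inst2.public_ip}"
}
```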

Naturally, we could now try to access the instances via SSH.

Unfortunately, that would not work.

Accessing an AWS EC2 Instance - Setting the security groups

Whenever traffic comes to our instances, those packets pass through EC2’s virtual firewall, which filters traffic before it gets to our instances.

AWS Security groups filtering traffic from and to the internet

While that’s a good analogy that makes sense in the most traditional way of thinking of a centralized firewall, security groups are a bit different.

They’re tied directly to the instances and can also be used to filter internal traffic - you can even use them to decide whether two instances within the same network can communicate or not. Even more specifically, these security groups get tied to the virtual network interfaces of the instances.

A more accurate representation would be one that puts that “AWS Firewall” more tied to the instance:

Egress and Ingress traffic from EC2 instance being filtered by the AWS security group attached to the instance

The way AWS EC2 security groups work is:

a security group is created (at this moment, no inbound traffic is allowed);
rules are specified and attached to a security group (these rules specify which kind of traffic is allowed - either inbound or outbound);
an instance is raised with a given security group specified.
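In Terraform those steps can be expressed quite literally. A hedged sketch, assuming the networking module exposes the VPC ID as vpc-id; the `example` and `allow-ssh-in` names are made up, and further down I use the more compact inline ingress / egress blocks instead:

```hcl
# 1. Create the (empty) security group.
resource "aws_security_group" "example" {
  name   = "example"
  vpc_id = "${module.networking.vpc-id}"
}

# 2. Attach a rule to it: allow inbound TCP on port 22 from anywhere.
resource "aws_security_group_rule" "allow-ssh-in" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.example.id}"
}

# 3. Raise an instance with that group by listing its ID under
#    `vpc_security_group_ids` on the `aws_instance` resource.
```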

When we created the instances in the Terraform file above, we didn’t specify a security group. In that case, AWS takes the lead and assigns the VPC’s default security group to our instances.

Let’s check:

aws ec2 describe-instances \
        --profile=$PROFILE | \
        jq '.[] | .[] | .Instances | .[] | .SecurityGroups | .[]' | \
        jq -s '.'

[
  {
    "GroupName": "default",
    "GroupId": "sg-948f3ef2"
  },
  {
    "GroupName": "default",
    "GroupId": "sg-948f3ef2"
  }
]

So, as we see, both EC2 instances are making use of the default security group.

We can also check what’s defined for this group:

aws ec2 describe-security-groups \
        --group-ids sg-948f3ef2 \
        --profile=$PROFILE

{
    "SecurityGroups": [
        {
            "Description": "default VPC security group",
            "GroupId": "sg-948f3ef2",
            "GroupName": "default",
            "IpPermissions": [
                {
                    "IpProtocol": "-1",
                    "IpRanges": [],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "UserIdGroupPairs": [
                        {
                            "GroupId": "sg-948f3ef2",
                            "UserId": "461528445026"
                        }
                    ]
                }
            ],
            "IpPermissionsEgress": [
                {
                    "IpProtocol": "-1",
                    "IpRanges": [
                        {
                            "CidrIp": "0.0.0.0/0"
                        }
                    ],
                    "Ipv6Ranges": [],
                    "PrefixListIds": [],
                    "UserIdGroupPairs": []
                }
            ],
            "OwnerId": "461528445026",
            "VpcId": "vpc-bd4e93da"
        }
    ]
}

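Notice that the only inbound rule (IpPermissions) allows traffic coming from members of that very same group, which is why SSH from the outside doesn’t get through, while outbound traffic (IpPermissionsEgress) is allowed to anywhere. Knowing that, we can adapt our Terraform file to fit our needs - have all egress available and the SSH port open.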

# Create a security group that will allow us to SSH
# into the instances while letting them reach anything
# on the way out (note.: you'd not leave SSH open to
# 0.0.0.0/0 in prod - restrict it to the ranges you trust).
resource "aws_security_group" "allow-ssh-and-egress" {
  name = "main"

  description = "Allows SSH traffic into instances as well as all egress."
  vpc_id      = "${module.networking.vpc-id}"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "allow_ssh-all"
  }
}
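Note that vpc-id is read from the networking module, so the module has to export it. Assuming it’s declared next to az-subnet-id-mapping in networking/outputs.tf, a minimal version could look like:

```hcl
# Exposes the ID of the VPC created by the module so that
# resources outside of it (like security groups) can reference it.
output "vpc-id" {
  value = "${aws_vpc.main.id}"
}
```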

and then reference this security group from the instance:

resource "aws_instance" "inst1" {
  instance_type = "t2.micro"
  ami           = "${data.aws_ami.ubuntu.id}"
  key_name      = "${aws_key_pair.main.id}"
  subnet_id     = "${module.networking.az-subnet-id-mapping["subnet1"]}"

  vpc_security_group_ids = [
    "${aws_security_group.allow-ssh-and-egress.id}",
  ]
}
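The same applies to inst2; a sketch mirroring the block above, assuming we want both instances reachable over SSH:

```hcl
resource "aws_instance" "inst2" {
  instance_type = "t2.micro"
  ami           = "${data.aws_ami.ubuntu.id}"
  key_name      = "${aws_key_pair.main.id}"
  subnet_id     = "${module.networking.az-subnet-id-mapping["subnet2"]}"

  # Without this, the instance would keep using the VPC's
  # default security group, which blocks SSH from the outside.
  vpc_security_group_ids = [
    "${aws_security_group.allow-ssh-and-egress.id}",
  ]
}
```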

Closing thoughts

With our EC2 instances running, being able to communicate with each other and granting us access to do whatever we want, we’re able to move forward with more interesting things.

To get traffic from the Internet and provide a real service we could either assign an Elastic IP to an instance (or one for each) and then configure our DNS settings or put a network load-balancer in front of them.

I wanted to share this configuration as this is one of the pieces that felt quite hard to understand when I got started with AWS - there’s a bunch of terminology around it and a few gotchas.

All the code cited here can be found in the cirocosta/sample-aws-networking repository.

If you have any questions, make sure to ask me at @cirowrc on Twitter. Maybe I wrote something completely wrong here too! I’d really appreciate it if you let me know.

Have a good one!

finis