Multi-Region deployments with Terraform
In my previous post, Deploying Multiple Environments with Terraform, I described how I used Terraform to deploy to multiple environments within a single project. Since then, new requirements were assigned to my project and my team needed to implement multi-region deployments. Being multi-region is the same concept as having data centers in multiple parts of the country or world; you’re safeguarding yourself against disaster.
This didn’t come as a big surprise to the team, since anyone who wants to build a resilient application and implement disaster recovery should build in more than one region. Additionally, being multi-region has the added benefit of enabling global load balancing: if you are running in both us-east-1 and us-west-2, users hit the instance of the application that is geographically closest to them.
With some design work we realized multi-region deployment could easily be implemented since we’d made the code extensible and kept the configuration separate. It turned out that we could even implement multi-region without introducing a breaking change!
Extending the design
The first thing added was a newly exposed input called region. Previously this input was not exposed to the end user and was hardcoded in the configuration. This input specifies which AWS region the Kubernetes cluster will be deployed to. Instead of abstracting away the actual region names and accepting simple values such as east or west, we opted to have the literal region names be the values: us-east-1 or us-west-2. This design decision took a fair amount of debate, but ultimately it came down to users knowing exactly where the cluster was being built.
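Purely as an illustration (not our actual configuration), selecting a region then looks like any other input; the user supplies the literal AWS region name, for example in a tfvars file:
# Example terraform.tfvars entry: the literal region name makes it
# obvious where the cluster will be built
region = "us-west-2"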
The larger change was to the variable configuration. Originally all of the variables were maps where each key was the environment name: dev, qa, etc. This worked well for single-region deployment, but for multi-region many of the values differed from region to region. The easy fix was to extend the list of known workspaces, but that was not a scalable option. Instead, we decided to convert every map whose values differed by region into a nested map, with the region name as the top-level key and the environments with their respective values underneath.
With the change to the variable structure, changes also had to be made to the inputs to the variables module, as well as to any variable lookups in the code that referenced a nested map. The former was simple: add a region input to the module and pass it the region variable. The latter was more involved. Not only did it take some time to find all of the lookups in the code that needed to be modified, but figuring out how to do the lookup properly took some trial and error.
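The pattern we landed on is the one shown in the output example later in this post; as a minimal sketch with a hypothetical variable name, the nested lookup indexes the map by region first and then looks up the environment within that region:
# Hypothetical sketch: var.example_nested_map[var.region] returns the
# per-region map, and lookup() pulls the environment-specific value from it
output "example_value" {
  value = "${lookup(var.example_nested_map[var.region], var.environment)}"
}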
Keeping it backwards compatible
Previous versions of the code did not support multi-region deployments; they only knew about the default east region and had lookups configured for the now-outdated variable design. Everything here points to this being a breaking change in the code base. Luckily, since the configuration is completely separate from the code, and we always had the region variable configured, just not exposed, we were able to keep this change backwards compatible.
The implementation set the region variable to us-east-1 by default, keeping all previous versions of the code working. Once the region variable was exposed to the user, that value could be overridden, allowing the newly available region-specific parameters to be found and used in provisioning.
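A minimal sketch of that variable (the description text is illustrative, not copied from our code):
# Defaulting to us-east-1 means existing single-region configurations
# keep working without any changes
variable "region" {
  description = "AWS region to deploy the Kubernetes cluster to"
  type        = "string"
  default     = "us-east-1"
}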
Changes in the code
Even though this wasn’t so much a code change as a configuration change, I think it is still valuable to show some before-and-after examples. Below are code snippets showing what the refactors looked like:
Pre multi-region support variable:
variable "worker_elb_sg_map" {
  description = "A map from environment to a comma-delimited list of build worker ELB security groups"
  type        = "map"

  default = {
    dev      = "sg-9f59278yreuhifbf,sg-be2t43erfce,sg-434fedf2b"
    qa       = "sg-e945ygrthdrg,sg-e55tgr54hd,sg-7d34trfwe7"
    staging  = "sg-255yg45hedr5,sg-6234tth6,sg-9834tfery4e5t"
    training = "sg-255yerd6h,sg-625t5rqrgy5,sg-98gr54w5g"
    prod     = "sg-4c5y65re5,sg-3b35tg4wg,sg-3e3tgrtw4y6"
  }
}

output "worker_elb_security_groups" {
  value = ["${split(",", var.worker_elb_sg_map[var.environment])}"]
}
Post multi-region support variable:
variable "worker_elb_sg_map" {
  description = "A map from environment to a comma-delimited list of build worker ELB security groups"
  type        = "map"

  default = {
    us-east-1 = {
      dev      = "sg-9fhjsdf76ef,sg-bksdajfhiece,sg-4487heff0b"
      qa       = "sg-e29834hfisb99,sg-e398hfu95,sg-7d398hdsaf7"
      staging  = "sg-239uhibwf942,sg-983huh939,sg-99834hh94f"
      training = "sg-250ba552,sg-62b39719,sg-9983h9384hf"
      prod     = "sg-4c98hf93uc,sg-3938fh3hb,sg-3e083eiuf"
    }

    us-west-2 = {
      qa   = "sg-6f390f15,sg-1b645761,sg-93484j80e9"
      prod = "sg-0c9384hf973hf,sg-ad2983hudh37,sg-4283h93498j"
    }
  }
}

output "worker_elb_security_groups" {
  value = ["${split(",", lookup(var.worker_elb_sg_map[var.region], var.environment))}"]
}
Pre multi-region support variables module:
module "variables" {
  source      = "git::https://////variables"
  environment = "${local.environment}"
  size        = "${local.size}"
}
Post multi-region support variables module:
module "variables" {
  source      = "git::https://////variables"
  environment = "${local.environment}"
  size        = "${local.size}"
  region      = "${var.region}"
}
The outcome
To manage expectations, our first implementation of multi-region support was meant to be active/passive, meaning that only one region receives traffic and the other is strictly a failover in case of disaster.
After implementing these changes and making some significant additions to the way our build pipeline functions and verifies our code base, we have been able to reliably deploy to both us-east-1 and us-west-2 for almost four months. Since the difference between deploying to different regions and different environments comes down to a few input variables, maintaining everything has added little to no overhead.
I’m very happy with the way things have turned out for my team so far, but we are not even close to finished. The next iterations will focus on automated failover and, eventually, moving to an active/active implementation. I’m excited to see where we go next in our development journey.