Building feature toggles into Terraform
As you know from my two previous posts, Deploying Multiple Environments with Terraform and Multi-Region Deployments with Terraform, the ultimate goal of our project is to deploy a Kubernetes cluster using Terraform to any environment and/or region from a single code base. So far in this series, we have solved deploying to multiple environments and to multiple regions. As our journey continues, we encounter a more complex challenge: not all environments require all of the same features.
Enter feature toggles …
What requires a toggle?
Traditionally, feature toggles have been used in software development for toggling software features on and off. Since we are deploying a Kubernetes cluster and running a platform for our users, our feature toggles have a slightly different use case. We maintain a set of Helm charts that are deployed with each Kubernetes cluster. However, not all of these Helm charts are required in every environment we support. So, based on where we are deploying Kubernetes, we want the ability to either deploy, or not deploy, a given Helm chart.
In addition to the Helm charts, we have some additional infrastructure (it’s more of a special use case) that needs to be deployed to a few select environments. Since this infrastructure is not required for the majority of the Kubernetes clusters we deploy, we want the ability to either deploy, or not deploy, the given infrastructure.
The Terraform count parameter
As you might have guessed from the title, Terraform as of 0.11.x does not ship with a native way to write a toggle around a particular resource or group of resources. Terraform does ship with a count parameter, which can be used on all resources. The count parameter tells Terraform how many of a resource to build; zero is accepted, in which case it won’t build anything. As a first pass, you might decide to just add the count parameter and flip the value between 0 and 1 based on what is needed.
This will not create the S3 bucket object:
resource "aws_s3_bucket_object" "some_s3_objct" {
  count                  = 0
  bucket                 = "${aws_s3_bucket.bucket.id}"
  key                    = "some_s3_object_path/some_s3_object"
  content                = "This is an S3 object"
  server_side_encryption = "AES256"
}
This will create the S3 bucket object:
resource "aws_s3_bucket_object" "some_s3_objct" {
  count                  = 1
  bucket                 = "${aws_s3_bucket.bucket.id}"
  key                    = "some_s3_object_path/some_s3_object"
  content                = "This is an S3 object"
  server_side_encryption = "AES256"
}
By simply adding the count parameter we get the behavior we want, but not in a form we can manage: for this pattern to work, the code would need to change every time the existence of a resource changed. Thankfully, the count parameter allows interpolations, which opens up a few more robust options for deciding whether or not a resource should be created.
Interpolating the count parameter
By interpolating the count parameter, you can link its value to some other variable in your code base, environment for example. This allows you to decide dynamically whether or not a resource is built based on which environment you are building in. The caveat is that the interpolation is a little more complex than normal string interpolation and doesn’t lend itself well to multiple use cases. To make this pattern work, you must use a ternary expression to make the decision.
This will create the S3 bucket object if var.environment equals some_env:
resource "aws_s3_bucket_object" "some_s3_objct" {
  count                  = "${var.environment == "some_env" ? 1 : 0}"
  bucket                 = "${aws_s3_bucket.bucket.id}"
  key                    = "some_s3_object_path/some_s3_object"
  content                = "This is an S3 object"
  server_side_encryption = "AES256"
}
This pattern seems to give us both the behavior and the functionality we want by allowing us to dynamically create or not create the S3 bucket object without changing the code, but only if the logic isn’t too complex. For example, if there was a need to create some_s3_object in multiple environments, or if the decision depended on more than one variable, then this ternary could still do the job, but not cleanly.
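To illustrate why the ternary gets unwieldy, here is a minimal sketch of the same resource enabled in two environments. The environment names dev and qa are hypothetical, and the syntax assumes the Terraform 0.11.x interpolation language, which supports the || operator inside a conditional.

```hcl
# Sketch only: "dev" and "qa" are hypothetical environment names.
resource "aws_s3_bucket_object" "some_s3_objct" {
  # Every additional environment adds another comparison to this line.
  count = "${var.environment == "dev" || var.environment == "qa" ? 1 : 0}"

  bucket                 = "${aws_s3_bucket.bucket.id}"
  key                    = "some_s3_object_path/some_s3_object"
  content                = "This is an S3 object"
  server_side_encryption = "AES256"
}
```

The list of environments now lives inside the resource itself, so enabling the object somewhere new still means editing code.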
Building the toggle
Now that we have seen the limitations of the simpler implementations, we can focus on building a complete feature toggle that is managed externally through inputs. The count parameter and ternary are still in use, but we are adding an input variable and an optional output. The input variable is essentially an “enable” variable that takes a “true” or “false” string. The output is optional and tells the user whether or not something has been created. We can keep the code for the individual resource mostly the same, but we do need to add some code around it.
# Input toggle: a string, "true" or "false"
variable "create_s3_object" {
  description = "Whether or not to create the S3 object"
  default     = "true"
}

# Optional: tells the user what the toggle decided
output "s3_object_created" {
  value = "${var.create_s3_object == "true" ? "S3 object created" : "S3 object not created"}"
}

resource "aws_s3_bucket_object" "some_s3_objct" {
  # 1 when the toggle is "true", otherwise 0 (nothing is built)
  count                  = "${var.create_s3_object == "true" ? 1 : 0}"
  bucket                 = "${aws_s3_bucket.bucket.id}"
  key                    = "some_s3_object_path/some_s3_object"
  content                = "This is an S3 object"
  server_side_encryption = "AES256"
}
The code block above now has the two things necessary to create a feature toggle for a resource, the input variable and the ternary on count, as well as an optional output to give the user some context around what was deployed. The input variable ships with a default, in this case “true”, but it can be set to “false” if needed.
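One caveat follows from using count, sketched here under the assumption of Terraform 0.11.x behavior: once a resource has a count, other code generally refers to it through the splat syntax, and with the toggle off that list is empty. The output name s3_object_key below is hypothetical; join() collapses the list to a single string, or to an empty string when nothing was created.

```hcl
# Hypothetical output; assumes the toggled resource defined above.
output "s3_object_key" {
  # The splat (.*.id) yields a list with zero or one element.
  # join("", ...) turns that into the key, or "" when the toggle is off.
  value = "${join("", aws_s3_bucket_object.some_s3_objct.*.id)}"
}
```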
Using the toggle
Now that we have built a proper feature toggle for Terraform, we can use it as part of our deployments in any environment. Even though the input variable has a default value, we can easily override it as needed without changing the underlying code. We can pass the input variable and its value to Terraform either directly at the command line or by using an auto.tfvars file.
Either could be preferred depending on your use case. The important thing is that this pattern fits nicely into both local deployments and automated deployments through a CI/CD tool.
The example below illustrates how to pass an input variable to Terraform at the command line. With our simple example this seems reasonable; however, as the number of toggles increases, so does the number of -var arguments.
terraform apply -var create_s3_object=true
Creating an auto.tfvars file is simple. You can name it whatever you want, as long as the name ends in .auto.tfvars (in this example, toggles.auto.tfvars), and Terraform will automatically read it in at run time. The content is also straightforward: it is just a list of key/value pairs, where the key is the variable name and the value is the value you want to pass.
# toggles.auto.tfvars
create_s3_object = "false"
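As the number of toggles grows, the file stays readable in a way that a long string of -var arguments does not. A sketch with a few hypothetical toggle names follows; only create_s3_object comes from the example above, and each of the others would need its own variable block.

```hcl
# toggles.auto.tfvars (sketch; every name except create_s3_object is hypothetical)
create_s3_object     = "false"
enable_ingress_chart = "true"
enable_logging_chart = "true"
enable_special_infra = "false"
```

The values are quoted here so that they arrive as the “true” and “false” strings the ternary compares against.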
The verdict
Using this pattern, my team is currently deploying 31 separate feature toggles to our Kubernetes cluster across nine environments and two regions. Individual platform developers can toggle features on and off as needed for local development, and our CI/CD makes full use of a rendered auto.tfvars file to manage the toggles for deployments.
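For a sense of what one of those toggles might look like when it guards a Helm chart rather than an S3 object, here is a minimal sketch. It assumes the Helm provider’s helm_release resource is available and configured; the variable name, release name, and chart are all hypothetical and not taken from our code base.

```hcl
# Hypothetical toggle for one Helm chart.
variable "enable_metrics_chart" {
  description = "Whether or not to deploy the metrics Helm chart"
  default     = "true"
}

resource "helm_release" "metrics" {
  # Same pattern as the S3 object: 1 deploys the chart, 0 skips it.
  count = "${var.enable_metrics_chart == "true" ? 1 : 0}"

  name  = "metrics"
  chart = "stable/metrics-server"
}
```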
This is not the end of our journey though; simply the first stop. Future development will implement dependent toggles which will allow us to create a dependency tree for all of the capabilities our platform offers as well as the ability to toggle a feature off that was previously toggled on.