Optimizing Blue-Green deployment with AWS CodeDeploy
Streamlining deployment processes with AWS CodeDeploy: A closer look at seamless Blue-Green deployment
Updated January 9, 2024
Before we go deep into the trenches of how to use Blue-Green deployment, let's try to understand what it is:
Blue-Green deployment is a deployment pattern for releasing a new version of an application without downtime and with minimal risk. It works by bringing up a second stack identical to the current one and deploying the new version of the application on it. Traffic is then moved from the current stack (called the Blue stack) to the new stack (called the Green stack).
Now that we’ve covered what Blue-Green deployment is, let's discuss its key benefits:
- No downtime: Traffic is moved from the Blue stack to the already-running Green stack, so the application is never taken offline.
- Easy rollback: If the Green stack isn’t healthy, you can reverse the process and move the traffic back to the Blue stack.
- Reduced risk: You can validate the Green stack by running functional tests before you migrate live production traffic.
In this blog, we explain in detail how to perform a Blue-Green deployment of Amazon ECS tasks using the AWS CodeDeploy service.
When AWS launched ECS in April 2015, there was no out-of-the-box support for Blue-Green deployment. Engineers tried various workarounds, including:
- Swapping auto-scaling groups behind ELB.
- Updating the auto-scaling group launch configurations.
- DNS routing update using Route53.
In August 2016, AWS launched the Application Load Balancer (ALB), but it still did not support Blue-Green deployment for ECS. To work around this, our Capital One team implemented a homegrown solution that used a beta ALB for test traffic and then swapped the ALB listener rules to cut traffic over between the Blue and Green tasks.
In Nov 2018, Amazon ECS added official support for Blue-Green deployments using CodeDeploy.
Prerequisites for successful ECS Blue-Green deployment
- The ECS service's deployment type should be “Blue/green deployment (powered by AWS CodeDeploy),” which sets the service's deployment controller to CODE_DEPLOY.
- The ECS service must use an Application Load Balancer or a Network Load Balancer. We will be using an ALB in this blog.
- The ALB should have a listener that takes production traffic.
- An optional test listener can be added to the load balancer to route test traffic. When you specify a test listener, CodeDeploy routes your test traffic to the replacement/Green task set during deployment.
- Two ALB target groups should be created, one for the Blue tasks and another for the Green tasks.
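To make the prerequisites concrete, here is a minimal sketch of the keyword arguments for creating such a service with Boto3's ECS `create_service` call. It assumes Fargate with awsvpc networking; the subnet, security group, and target group values in the usage example are hypothetical placeholders. The key setting is `deploymentController` of type `CODE_DEPLOY`. As we understand it, only the Blue target group is attached to the service itself; the second target group is supplied later in the CodeDeploy deployment group.

```python
def build_bluegreen_service_request(
    cluster: str,
    service_name: str,
    task_definition: str,
    blue_target_group_arn: str,
    container_name: str,
    container_port: int,
    subnets: list,
    security_groups: list,
    desired_count: int = 1,
) -> dict:
    """Return kwargs for boto3.client("ecs").create_service (a sketch)."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",  # assumption: Fargate, as in the walkthrough
        # Hand deployments of this service over to CodeDeploy.
        "deploymentController": {"type": "CODE_DEPLOY"},
        # Only the Blue (production) target group is attached here; CodeDeploy
        # attaches the second target group during each deployment.
        "loadBalancers": [
            {
                "targetGroupArn": blue_target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
        # Fargate tasks require awsvpc networking (hypothetical network values).
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }
```

Usage: `request = build_bluegreen_service_request("my-test-cluster", "sample-springboot-app-qa", "sample-springboot-app-qa:132", tg1_arn, "my-container", 8080, ["subnet-aaaa"], ["sg-bbbb"])`, then `boto3.client("ecs").create_service(**request)`. Keeping the builder separate from the API call makes it easy to review or unit-test the request before anything touches AWS.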
CodeDeploy Blue-Green deployment flow: From initial setup to testing phase
We assume the development team has already created the ECS service, ALB, and target groups, either programmatically through deployment automation or manually in the AWS Console.
The diagram below shows the initial deployment where only Blue tasks are run and take 100% of production traffic.
Now, let’s look at a CodeDeploy-based Blue-Green deployment. In the diagram below, the Green tasks (the new version of the code) start and are attached to Target Group 2. The ALB test listener on port 8443 is now ready, and test traffic is sent to the Green tasks through Target Group 2. Once test traffic is flowing through the test listener, we can add a hook (a Lambda function). The Lambda function can perform functional testing against the ALB test listener on port 8443 and report either “succeeded” or “failed.”
Assuming the test traffic Lambda hook reported “succeeded,” production traffic is routed to Target Group 2, which is served by the Green tasks (new code version). The ALB prod listener on port 443 and test listener on port 8443 now both point to Target Group 2. CodeDeploy keeps the Blue tasks for a pre-configured period so that a rollback is still possible from the CodeDeploy console or through a CLI/API call.
Once the pre-configured period has elapsed, CodeDeploy will terminate the Blue tasks, and after this point, rollback won’t be possible.
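The flow above boils down to a sequence of listener-to-target-group mappings. The following toy model is a simplification for illustration, not CodeDeploy's implementation: it records which target group each listener (443 for PROD, 8443 for TEST) forwards to at each phase, including the case where the test hook fails. The phase labels and the `test_hook_passed` flag are hypothetical names, and the wait period before the Blue tasks are terminated is not modeled.

```python
PROD_PORT, TEST_PORT = 443, 8443


def simulate_listener_routing(blue_tg: str, green_tg: str, test_hook_passed: bool) -> list:
    """Return (phase, {listener_port: target_group}) snapshots of one deployment."""
    # Before the deployment, both listeners forward to the Blue target group.
    snapshots = [("initial", {PROD_PORT: blue_tg, TEST_PORT: blue_tg})]
    # Green tasks register in the second target group; only TEST traffic moves.
    snapshots.append(("test traffic on Green", {PROD_PORT: blue_tg, TEST_PORT: green_tg}))
    if not test_hook_passed:
        # A failed test hook rolls back before PROD traffic ever leaves Blue.
        snapshots.append(("rolled back", {PROD_PORT: blue_tg, TEST_PORT: blue_tg}))
        return snapshots
    # The hook succeeded: both listeners now forward to the Green target group.
    snapshots.append(("prod traffic on Green", {PROD_PORT: green_tg, TEST_PORT: green_tg}))
    return snapshots
```

For example, `simulate_listener_routing("tg1", "tg2", test_hook_passed=True)` ends with both ports mapped to `"tg2"`, while a failing hook leaves port 443 on `"tg1"` in every snapshot.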
Required CodeDeploy resources supporting ECS Blue-Green deployment
Let’s dive into the implementation details. The following AWS CodeDeploy resources are required to support ECS Blue-Green deployment:
- CodeDeploy Application
- CodeDeploy Deployment Group
- CodeDeploy Deployment
You can create these CodeDeploy resources from the AWS CodeDeploy console, or programmatically using the CLI or language-specific SDKs such as Python/Boto3. Let’s walk through what that looks like:
Create a CodeDeploy application using Python/Boto3:
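import boto3

cd_client = boto3.client('codedeploy')
# Same application name as used in the deployment group and deployment examples
application_name = 'AppECS-sample-springboot-app-qa-bg'
response = cd_client.create_application(
    applicationName=application_name,
    computePlatform='ECS'
)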
Create a CodeDeploy deployment group using Python/Boto3:
response = cd_client.create_deployment_group(
    applicationName='AppECS-sample-springboot-app-qa-bg',
    deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
    deploymentConfigName='CodeDeployDefault.ECSAllAtOnce',
    serviceRoleArn='arn:aws:iam::123456789123:role/ecs_service_role',
    triggerConfigurations=[
        {
            'triggerName': 'sample-springboot-app-qa-code-deploy-bg-trigger',
            'triggerTargetArn': 'arn:aws:sns:us-east-1:123456789123:my_sns_topic',
            'triggerEvents': [
                "DeploymentStart",
                "DeploymentSuccess",
                "DeploymentFailure",
                "DeploymentStop",
                "DeploymentRollback",
                "DeploymentReady"
            ]
        },
    ],
    autoRollbackConfiguration={
        'enabled': True,
        'events': [
            'DEPLOYMENT_FAILURE', 'DEPLOYMENT_STOP_ON_ALARM', 'DEPLOYMENT_STOP_ON_REQUEST',
        ]
    },
    deploymentStyle={
        'deploymentType': 'BLUE_GREEN',
        'deploymentOption': 'WITH_TRAFFIC_CONTROL'
    },
    blueGreenDeploymentConfiguration={
        # Keep the Blue tasks for 15 minutes after traffic shifts (rollback window)
        'terminateBlueInstancesOnDeploymentSuccess': {
            'action': 'TERMINATE',
            'terminationWaitTimeInMinutes': 15
        },
        'deploymentReadyOption': {
            'actionOnTimeout': 'CONTINUE_DEPLOYMENT'
        }
    },
    loadBalancerInfo={
        'targetGroupPairInfoList': [
            {
                'targetGroups': [
                    {'name': 'sample-springboot-app-qa-tg1'},
                    {'name': 'sample-springboot-app-qa-tg2'}
                ],
                # listenerArns must be a list, even for a single listener
                'prodTrafficRoute': {
                    'listenerArns': [
                        'arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ab60f9c7e43/97b643f12d4fa8a4'
                    ]
                },
                'testTrafficRoute': {
                    'listenerArns': [
                        'arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ed64f9c7e43/09261df9b5476d39'
                    ]
                }
            },
        ]
    },
    ecsServices=[
        {
            'serviceName': 'sample-springboot-app-qa',
            'clusterName': 'my-test-cluster'
        }
    ]
)
Create a CodeDeploy deployment using Python/Boto3 (the AppSpec content is passed inline):
response = cd_client.create_deployment(
    applicationName='AppECS-sample-springboot-app-qa-bg',
    deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
    revision={
        'revisionType': 'AppSpecContent',
        'appSpecContent': {
            'content': "APPSPEC FILE CONTENT",
            'sha256': "xxxxxx"
        }
    },
    ignoreApplicationStopFailures=False,
    autoRollbackConfiguration={
        'enabled': True,
        'events': [
            'DEPLOYMENT_FAILURE',
            'DEPLOYMENT_STOP_ON_ALARM',
            'DEPLOYMENT_STOP_ON_REQUEST'
        ]
    }
)
The CodeDeploy “Deployment” uses an AppSpec file, which is a YAML or JSON file that provides resource information and lifecycle hook information; the example below is in JSON. The Resources section contains the task definition and container info. The Hooks section lets you add Lambda functions that are triggered at various points in the lifecycle of the CodeDeploy Blue-Green deployment (more details in a later section).
{
  "version": 0.0,
  "Resources": [
    {
      "TargetService": {
        "Type": "AWS::ECS::Service",
        "Properties": {
          "TaskDefinition": "ECS-TASK-DEFINITION-ARN",
          "LoadBalancerInfo": {
            "ContainerName": "my-container",
            "ContainerPort": 8080
          }
        }
      }
    }
  ],
  "Hooks": [
    {
      "AfterAllowTestTraffic": "arn:aws:lambda:us-east-1:1234567890:function:my-green-ready-hook"
    }
  ]
}
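The `create_deployment` call above takes the AppSpec as an inline string through `appSpecContent`. A minimal sketch of building that revision from Python data, so the task definition ARN and hook ARNs are filled in programmatically rather than pasted: it serializes the AppSpec structure shown above to JSON and fills the optional `sha256` field. That the field expects a hex-encoded SHA-256 of the exact content string is our assumption; the field can also be omitted. The function name `build_appspec_revision` is hypothetical.

```python
import hashlib
import json


def build_appspec_revision(task_definition_arn: str, container_name: str,
                           container_port: int, hook_arns: dict) -> dict:
    """Return a `revision` dict for cd_client.create_deployment (a sketch)."""
    appspec = {
        "version": 0.0,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            }
        }],
        # One {lifecycle_event: lambda_arn} entry per hook, as in the AppSpec above.
        "Hooks": [{event: arn} for event, arn in hook_arns.items()],
    }
    content = json.dumps(appspec)
    return {
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": content,
            # Assumption: hex-encoded SHA-256 of the exact content string.
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        },
    }
```

The result can be passed as `revision=build_appspec_revision(new_task_def_arn, "my-container", 8080, {"AfterAllowTestTraffic": hook_arn})`, where `new_task_def_arn` and `hook_arn` are placeholders for your own ARNs.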
Blue-Green Deployment in action: The happy path
Initial setup of Blue-Green deployment:
Here we have an ECS service using Blue-Green deployment (powered by CodeDeploy) running a Fargate task.
Note that the ECS service is running task definition version 132 and one Fargate task:
The ALB and the Target Groups attached to this ECS service:
The listener on port 443 is the PROD traffic listener and the one on port 8443 is the TEST traffic listener. At this point, both listeners are attached to the same target group, the one ending with “-tg2”.
These are the two target groups created for this ECS service to support Blue-Green deployment with CodeDeploy. At this point, only target group “-tg2” has a registered target/task and is attached to the ALB listeners above.
Initiating the Blue-Green deployment
First, we start the Blue-Green deployment for the ECS service. The deployment can be started programmatically through the ECS API or from the ECS console's “Update service” feature, followed by creating a new deployment from the CodeDeploy console or CodeDeploy API. The updated code version is referenced in a new ECS task definition revision, which is used to update the ECS service.
Because the ECS service’s deployment controller is CodeDeploy, a deployment is triggered in CodeDeploy and a new ECS task is started.
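When the deployment is started programmatically, a pipeline usually needs to wait for it to finish. A minimal polling sketch follows. `get_status` is any callable returning the deployment's status string; with Boto3 that could be `lambda: cd_client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]["status"]`. The treatment of `Succeeded`, `Failed`, and `Stopped` as the terminal statuses, and the default poll interval and timeout, are assumptions of this sketch. Note that with a 15-minute termination wait, the deployment is not finished until the Blue task has been terminated.

```python
import time
from typing import Callable

# Assumption: these CodeDeploy statuses mean the deployment will not progress further.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(get_status: Callable[[], str],
                        poll_seconds: float = 15.0,
                        timeout_seconds: float = 3600.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until a terminal status appears; return that status."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"deployment still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the function testable without waiting in real time; in production the default `time.sleep` is used.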
As shown below, a new ECS task has started and the running count is now 2. Under the Deployments tab, the PRIMARY entry is for the current Blue task and the ACTIVE entry is for the new, upcoming Green task.
Green tasks are now up and running. The PROD traffic is still being served by the Blue task in the original stack. Rollback is still possible at this stage, but keep in mind that rollback will terminate the Green task.
The ALB listeners have now been updated. The PROD listener (Port-443) is still attached to the target group “-tg2” and serving live traffic. The TEST listener (Port-8443) is now attached to the target group “-tg1”, which previously was not attached to the ALB at all. You can therefore send requests to the ALB DNS name on port 8443 to test the Green stack, which is not taking any PROD traffic.
Now, both “-tg1” (taking TEST traffic) and “-tg2” (taking PROD traffic) are attached to the ALB serving TEST and PROD listeners.
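The same listener-to-target-group check can be scripted instead of read from the console. A minimal sketch, assuming the response shape of Boto3's `elbv2` `describe_listeners` call: for each listener port, it lists the target groups its default `forward` action sends traffic to, handling both the plain `TargetGroupArn` form and the weighted `ForwardConfig` form (groups with weight 0 are skipped). Only default actions are inspected; listener rules are out of scope for this sketch.

```python
def listener_routes(describe_listeners_response: dict) -> dict:
    """Map each listener port to the target group ARNs its default forward action uses."""
    routes = {}
    for listener in describe_listeners_response.get("Listeners", []):
        target_groups = []
        for action in listener.get("DefaultActions", []):
            if action.get("Type") != "forward":
                continue
            weighted = action.get("ForwardConfig", {}).get("TargetGroups")
            if weighted:
                # Weighted form: keep only groups that actually receive traffic.
                target_groups += [tg["TargetGroupArn"] for tg in weighted
                                  if tg.get("Weight", 1) > 0]
            elif "TargetGroupArn" in action:
                target_groups.append(action["TargetGroupArn"])
        routes[listener["Port"]] = target_groups
    return routes
```

Usage: `listener_routes(boto3.client("elbv2").describe_listeners(LoadBalancerArn=alb_arn))`, where `alb_arn` is a placeholder. At this stage of the walkthrough, you would expect port 443 to map to the “-tg2” ARN and port 8443 to the “-tg1” ARN.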
PROD traffic moved from Blue to Green task
Traffic rerouting is complete, and the new Green task (or Replacement task) is now serving the PROD traffic. But how did the traffic rerouting happen, and do we have any control over it? Good questions, and I will answer them in a separate section :)
Rollback is still possible at this stage because the Blue task is still around, sitting idle. It will remain available for the next 15 minutes (this duration is configurable through the DeploymentGroup attribute “terminationWaitTimeInMinutes”, as shown in the previous section). But how does the rollback work? Again, a good question that I will explain in another section.
ECS is now running the task using the updated task definition version 133 (previously 132). Both the original and new tasks are still running, which is why rollback is still possible. The Deployments tab now shows that 100% of traffic is being served by the Replacement task, which is the new Green task.
Now, both the ALB listeners PROD (Port-443) and TEST (Port-8443) are pointing to the target group “-tg1” which has the Green task as the target.
Fast forward 15 minutes — Blue-Green deployment has been completed
Based on the CodeDeploy “DeploymentGroup” setting “terminationWaitTimeInMinutes,” CodeDeploy terminates the Blue task after 15 minutes. Rollback is no longer possible because the Blue task is gone!
The ECS running task count is back to 1 as the Blue/original task is gone. The Task definition version is “133,” meaning the new version serves the PROD traffic. Under the Deployments tab, the ACTIVE entry is now gone.
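The rollback window is simple arithmetic: it closes `terminationWaitTimeInMinutes` after traffic has shifted to Green. A small sketch for scripts that need to know whether a rollback is still possible. It is an approximation that assumes you know when the traffic shift completed; CodeDeploy tracks the real termination time itself, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rollback_deadline(traffic_shifted_at: datetime, termination_wait_minutes: int) -> datetime:
    """When the Blue tasks are due for termination, closing the rollback window."""
    return traffic_shifted_at + timedelta(minutes=termination_wait_minutes)


def rollback_still_possible(traffic_shifted_at: datetime, termination_wait_minutes: int,
                            now: Optional[datetime] = None) -> bool:
    """True while the Blue tasks should still exist (timezone-aware datetimes expected)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < rollback_deadline(traffic_shifted_at, termination_wait_minutes)
```

With the 15-minute setting used in this walkthrough, a shift that completes at 12:00 UTC leaves rollback possible until 12:15 UTC.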
Blue-Green deployment in action: Rollback flow
Let's recap our current stable PROD state:
- ECS service is running 1 task using task definition version 133.
- ALB listeners, both PROD (Port-443) and TEST (Port-8443), are pointing to target group “-tg1”.
- Target group “-tg1” is active and serving PROD traffic and “-tg2” is not attached to the ALB.
We now want to deploy a new version and then roll back after the traffic is rerouted, to demonstrate the rollback flow:
Let's initiate the Blue-Green deployment (again):
Let’s fast forward and review the state where the new Blue-Green deployment reroutes the traffic to the Green stack and waits for 15 minutes before terminating the Original/Blue task.
We have CodeDeploy Blue-Green deployment waiting to terminate the Blue task and PROD traffic being served by the new Green/Replacement task.
The ECS service is now running a new task using the new ECS task definition version 134 (we started this deployment with 133) and our ECS running task count is 2 - one with ECS task def version 133 and another with new ECS task def version 134. ECS Deployments should be showing 100% traffic being served by the new PRIMARY task.
ALB listeners PROD (Port-443) and TEST (Port-8443) are both pointing to the target group “-tg2”. Previously they were attached to “-tg1”.
Rollback strategy: Handling deployment failures and reverting changes
Let's initiate the rollback from the CodeDeploy console by clicking the “Stop and rollback deployment” button:
The rollback works as follows: CodeDeploy stops the current deployment and skips the step of terminating the Original/Blue task. It then creates a new deployment that rolls back the previous one, reroutes the traffic from the Replacement/Green task back to the Original/Blue task, and terminates the Replacement/Green task.
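The same stop-and-rollback can be requested through the API. A minimal sketch using the CodeDeploy `stop_deployment` call with `autoRollbackEnabled=True`; that this is equivalent to the console's “Stop and rollback deployment” button is our assumption. The client is passed in, so in practice it would be `boto3.client("codedeploy")`, and `deployment_id` is the ID of the in-flight deployment.

```python
def stop_and_roll_back(codedeploy_client, deployment_id: str) -> str:
    """Ask CodeDeploy to stop a deployment and roll it back; return the stop status."""
    response = codedeploy_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,  # roll back to the Original/Blue task set
    )
    # The status of the stop request itself, e.g. "Pending" or "Succeeded".
    return response["status"]
```

Passing the client in as a parameter keeps the function free of credentials and easy to exercise with a stand-in object in tests.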
ECS and ALB configuration after rollback
The ECS service is now running a task using the original task def version 133 and the running task count is back to 1. The ECS Deployments are showing 100% PROD traffic through the PRIMARY task.
The ALB listeners both PROD and TEST are back to the target group “-tg1”.
So this completes the successful rollback!
Managing traffic routing between Blue-Green tasks
CodeDeploy allows you to attach hooks to the Blue-Green deployment pipeline. Hooks are Lambda functions that you implement.
A few example scenarios are:
- Running functional tests on the Green stack before routing PROD traffic.
- Environment setup, such as downloading or uploading files to S3, before PROD traffic migration.
Amazon ECS deployment: Guide to lifecycle event hooks
Reference: AWS CodeDeploy documentation on AppSpec hooks for Amazon ECS deployments.
- BeforeInstall — Used to run tasks before the replacement task set is created.
- AfterInstall — Used to run tasks after the replacement task set is created and one of the target groups is associated with it.
- AfterAllowTestTraffic — Used to run tasks after the test listener serves traffic to the replacement task set. The results of a hook function at this point can trigger a rollback.
- BeforeAllowTraffic — Used to run tasks after the second target group is associated with the replacement task set, but before traffic is shifted to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
- AfterAllowTraffic — Used to run tasks after the second target group serves traffic to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
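Because hooks are listed by name in the AppSpec, a typo in an event name is easy to make. The sketch below validates an AppSpec `Hooks` list against the five hookable events above and returns the hooks in the order they run. The full event order, including the `Install`, `AllowTestTraffic`, and `AllowTraffic` steps that CodeDeploy runs itself and that cannot carry a hook, is taken from our reading of the AWS documentation rather than from this post; the function name is hypothetical.

```python
# Lifecycle events of an ECS deployment, in execution order (per AWS docs, as we read them).
ECS_LIFECYCLE_EVENTS = [
    "BeforeInstall", "Install", "AfterInstall",
    "AllowTestTraffic", "AfterAllowTestTraffic",
    "BeforeAllowTraffic", "AllowTraffic", "AfterAllowTraffic",
]
# Install, AllowTestTraffic and AllowTraffic are run by CodeDeploy and take no hook.
HOOKABLE_EVENTS = {
    "BeforeInstall", "AfterInstall", "AfterAllowTestTraffic",
    "BeforeAllowTraffic", "AfterAllowTraffic",
}


def ordered_hooks(hooks: list) -> list:
    """Validate an AppSpec 'Hooks' list; return (event, lambda_arn) pairs in run order."""
    pairs = []
    for entry in hooks:
        for event, arn in entry.items():
            if event not in HOOKABLE_EVENTS:
                raise ValueError(f"{event!r} is not a hookable ECS lifecycle event")
            pairs.append((event, arn))
    return sorted(pairs, key=lambda pair: ECS_LIFECYCLE_EVENTS.index(pair[0]))
```

For the AppSpec shown earlier, `ordered_hooks([{"AfterAllowTestTraffic": hook_arn}])` returns a single pair; a misspelled or non-hookable name such as `"AllowTraffic"` raises `ValueError` before the deployment is created.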
How can I add a lifecycle hook?
Lifecycle hook Lambda functions are added through the AppSpec file that you create and attach to the CodeDeploy Deployment object; refer to the AppSpec example in the earlier section.
My implementation for automatic rollback involves:
- Adding a hook for the lifecycle event “AfterAllowTestTraffic.”
- The Lambda function runs functional tests on the Green task using the ALB DNS name and the TEST listener port (8443).
- If the functional tests pass (i.e., the function reports “Succeeded” to CodeDeploy through the PutLifecycleEventHookExecutionStatus API), the Blue-Green deployment continues and traffic shifts to the Green stack.
- If the tests fail (i.e., the function reports “Failed”), CodeDeploy automatically rolls back the deployment, and PROD traffic stays on the Blue stack.
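The steps above can be sketched as a Lambda handler. This is a minimal illustration, not the author's implementation: it reads `DeploymentId` and `LifecycleEventHookExecutionId` from the hook event and reports the result with `put_lifecycle_event_hook_execution_status`. The “functional test” here is a single hypothetical health check (`endpoint_is_healthy`) against a placeholder URL; real tests would cover more. A crashing test is treated as a failure so that the hook always reports back instead of leaving CodeDeploy waiting until the hook times out.

```python
import urllib.request
from typing import Callable


def endpoint_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 (hypothetical health check)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def make_hook_handler(codedeploy_client, run_tests: Callable[[], bool]):
    """Build a Lambda handler that reports run_tests() to CodeDeploy."""
    def handler(event, context):
        try:
            passed = run_tests()
        except Exception:
            passed = False  # a crashing test counts as a failed test
        status = "Succeeded" if passed else "Failed"
        codedeploy_client.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status=status,
        )
        return {"status": status}
    return handler
```

In the Lambda module, the handler could be wired up as `handler = make_hook_handler(boto3.client("codedeploy"), lambda: endpoint_is_healthy("https://my-alb-dns-name:8443/health"))`, where the URL is a placeholder for the ALB DNS name and TEST listener port. Injecting the client and the test function keeps the reporting logic testable without AWS.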
TL;DR
ECS service deployment using AWS CodeDeploy is a powerful combination providing straightforward and robust Blue-Green deployment support.
The deployment lifecycle hooks let you gate each traffic shift on your own checks, while the deployment configuration (for example, CodeDeployDefault.ECSAllAtOnce) controls how traffic is shifted.
If you are already extensively using AWS services, and ECS is your container deployment platform, consider this a go-to architecture over homegrown solutions.
If you want to learn about Canary deployment patterns using existing AWS services, take a look at this excellent post on the topic!