Optimizing Blue-Green deployment with AWS CodeDeploy

Streamlining deployment processes with AWS CodeDeploy: A closer look at seamless Blue-Green deployment

Updated January 9, 2024

Before we go deep into the trenches of how to use Blue-Green deployment, let's try to understand what it is:

Blue-Green deployment is a deployment pattern for releasing a new version of an application without downtime and with minimal risk. It works by bringing up a second, similar stack and deploying the new version of the application on it. Traffic is then moved from the current stack (called the Blue stack) to the new stack (called the Green stack).

Now that we’ve covered what Blue-Green deployment is, let's discuss the key benefits when using Blue-Green deployment.

  • No downtime: Traffic moves from the Blue stack to the Green stack only after the Green stack is up, so the application stays available throughout.
  • Easy rollback: If the Green stack isn’t healthy, you can follow the reverse process and move the traffic back to the Blue stack.
  • Reduced risk: You can validate the Green stack by running functional tests before you migrate live production traffic.

In this blog, we thoroughly explain how to perform a Blue-Green deployment using the AWS CodeDeploy service for ECS container tasks.

When AWS launched ECS in April 2015, there was no out-of-the-box support for Blue-Green deployment. Engineers tried various options to work around this, including:

  • Swapping auto-scaling groups behind ELB.
  • Updating the auto-scaling group launch configurations.
  • DNS routing update using Route53.

In August 2016, AWS launched the Application Load Balancer (ALB), but ECS still did not support Blue-Green deployment. To work around this, our Capital One team implemented a homegrown solution for Blue-Green deployment that used a beta ALB for test traffic and then swapped the ALB listener rules to cut traffic over between Blue and Green tasks.

In Nov 2018, Amazon ECS added official support for Blue-Green deployments using CodeDeploy.

Prerequisites for successful ECS Blue-Green deployment

  • The ECS deployment type should be “blue-green.”
  • The ECS service must use either an Application Load Balancer or a Network Load Balancer. We will be using the ALB in this blog.
  • The ALB should have a listener that will take prod traffic.
  • An optional test listener can be added to the load balancer to route test traffic. When you specify a test listener, CodeDeploy routes your test traffic to the replacement/Green task set during deployment.
  • Two ALB target groups should be created, one for the Blue tasks and another for the Green tasks.
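The first prerequisite can be sketched in Boto3 terms. The helper below is only an illustration: the function name and its parameters are hypothetical, and it assumes a Fargate service in awsvpc networking mode. It builds the arguments for the ECS `create_service` call with the deployment controller set to `CODE_DEPLOY`, which is what selects the Blue-Green deployment type.

```python
def build_blue_green_service_request(cluster, service_name, task_definition,
                                     blue_target_group_arn, container_name,
                                     container_port, subnets, security_groups):
    """Build keyword arguments for the ECS create_service call (hypothetical helper)."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": 1,
        "launchType": "FARGATE",  # assumption: the service runs on Fargate
        # Hands deployments over to CodeDeploy (the Blue-Green deployment type).
        "deploymentController": {"type": "CODE_DEPLOY"},
        # Only the Blue target group is attached when the service is created.
        "loadBalancers": [{
            "targetGroupArn": blue_target_group_arn,
            "containerName": container_name,
            "containerPort": container_port,
        }],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }
```

With real ARNs and subnet IDs in place, the result could be passed on as `boto3.client("ecs").create_service(**request)`.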

CodeDeploy Blue-Green deployment flow: From initial setup to testing phase

The assumption is that the dev team either already uses deployment automation to create the ECS service, ALB, and target groups programmatically, or performs this setup through the AWS Console.

The diagram below shows the initial deployment where only Blue tasks are run and take 100% of production traffic.

Initial Blue stack

Now, let’s look at a CodeDeploy-based Blue-Green deployment. You’ll notice from the diagram below that Green tasks (the new version of the code) start and are attached to Target Group 2. The ALB test traffic listener is now ready for test traffic on port 8443, and that test traffic is sent to the Green tasks through Target Group 2. We can add a hook (a Lambda function) that runs once test traffic is flowing through the test listener. The Lambda function can perform functional testing against the ALB test listener on port 8443 and returns either “succeeded” or “failed.”

Green traffic flowing through Test listener

Assuming the test traffic Lambda hook returned “succeeded,” the production traffic is routed to Target Group 2, which is in turn served by Green tasks (new code version). The ALB prod listener on port 443 and the test listener on port 8443 now both point to Target Group 2. CodeDeploy will keep the Blue tasks for a pre-configured period so that a rollback remains possible from the CodeDeploy console or through a CLI/API call.

Prod traffic flowing through Green tasks

Once the pre-configured period has elapsed, CodeDeploy will terminate the Blue tasks, and after this point, rollback won’t be possible.

Blue tasks terminated

Required CodeDeploy resources supporting ECS Blue-Green deployment

Let’s dive deep into the implementation details now. The below AWS CodeDeploy artifacts are required to support the ECS Blue-Green deployment:

  • CodeDeploy Application
  • CodeDeploy Deployment Group
  • CodeDeploy Deployment

AWS CodeDeploy artifacts

You can create the above CodeDeploy resources either from the AWS CodeDeploy console or programmatically, using the CLI or language-specific SDKs such as Python/Boto3. Let’s walk through what that looks like:

 

Create a CodeDeploy application using Python/Boto3:

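    import boto3

    cd_client = boto3.client('codedeploy')
    application_name = 'AppECS-sample-springboot-app-qa-bg'  # sample name used below

    # Register an application on the ECS compute platform.
    response = cd_client.create_application(
        applicationName=application_name,
        computePlatform='ECS'
    )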
  

Create a CodeDeploy deployment group using Python/Boto3:

    response = cd_client.create_deployment_group(
        applicationName='AppECS-sample-springboot-app-qa-bg',
        deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
        # Shift all production traffic to the Green task set in one step.
        deploymentConfigName='CodeDeployDefault.ECSAllAtOnce',
        serviceRoleArn='arn:aws:iam::123456789123:role/ecs_service_role',
        # Publish deployment lifecycle notifications to an SNS topic.
        triggerConfigurations=[
            {
                'triggerName': 'sample-springboot-app-qa-code-deploy-bg-trigger',
                'triggerTargetArn': 'arn:aws:sns:us-east-1:123456789123:my_sns_topic',
                'triggerEvents': [
                    'DeploymentStart',
                    'DeploymentSuccess',
                    'DeploymentFailure',
                    'DeploymentStop',
                    'DeploymentRollback',
                    'DeploymentReady'
                ]
            },
        ],
        autoRollbackConfiguration={
            'enabled': True,
            'events': [
                'DEPLOYMENT_FAILURE',
                'DEPLOYMENT_STOP_ON_ALARM',
                'DEPLOYMENT_STOP_ON_REQUEST'
            ]
        },
        deploymentStyle={
            'deploymentType': 'BLUE_GREEN',
            'deploymentOption': 'WITH_TRAFFIC_CONTROL'
        },
        blueGreenDeploymentConfiguration={
            # Keep the Blue tasks for 15 minutes after success so rollback stays possible.
            'terminateBlueInstancesOnDeploymentSuccess': {
                'action': 'TERMINATE',
                'terminationWaitTimeInMinutes': 15
            },
            'deploymentReadyOption': {
                'actionOnTimeout': 'CONTINUE_DEPLOYMENT'
            }
        },
        loadBalancerInfo={
            'targetGroupPairInfoList': [
                {
                    # One target group for the Blue tasks, one for the Green tasks.
                    'targetGroups': [
                        {
                            'name': 'sample-springboot-app-qa-tg1'
                        },
                        {
                            'name': 'sample-springboot-app-qa-tg2'
                        }
                    ],
                    # listenerArns takes a list of ARNs, not a single string.
                    'prodTrafficRoute': {
                        'listenerArns': [
                            'arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ab60f9c7e43/97b643f12d4fa8a4'
                        ]
                    },
                    'testTrafficRoute': {
                        'listenerArns': [
                            'arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ed64f9c7e43/09261df9b5476d39'
                        ]
                    }
                },
            ]
        },
        ecsServices=[
            {
                'serviceName': 'sample-springboot-app-qa',
                'clusterName': 'my-test-cluster'
            }
        ]
    )
  

Create a CodeDeploy deployment using Python/Boto3 and an AppSpec file:

    response = cd_client.create_deployment(
        applicationName='AppECS-sample-springboot-app-qa-bg',
        deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
        revision={
            # The AppSpec document is passed inline as a string.
            'revisionType': 'AppSpecContent',
            'appSpecContent': {
                'content': "APPSPEC FILE CONTENT",
                'sha256': "xxxxxx"
            }
        },
        ignoreApplicationStopFailures=False,
        autoRollbackConfiguration={
            'enabled': True,
            'events': [
                'DEPLOYMENT_FAILURE',
                'DEPLOYMENT_STOP_ON_ALARM',
                'DEPLOYMENT_STOP_ON_REQUEST'
            ]
        }
    )
  

The CodeDeploy “Deployment” uses an AppSpec file, a YAML or JSON file that provides the resource information and the execution hooks. The Resources section, as shown below in JSON form, contains the Task Definition and container info. The Hooks section lets you add Lambda functions to be triggered at various points in the lifecycle of the CodeDeploy Blue-Green deployment (more details in a later section).

    {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": "ECS-TASK-DEFINITION-ARN",
                        "LoadBalancerInfo": {
                            "ContainerName": "my-container",
                            "ContainerPort": 8080
                        }
                    }
                }
            }
        ],
        "Hooks": [
            {
                "AfterAllowTestTraffic": "arn:aws:lambda:us-east-1:123456789123:function:my-green-ready-hook"
            }
        ]
    }
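The deployment call above uses placeholders for the AppSpec content and its hash. As a rough sketch of how those two values could be produced, the hypothetical helpers below build the AppSpec as a JSON string and compute a SHA-256 digest for it. The helper names are ours, and we assume that the hex digest of the UTF-8 encoded content is the format expected in the `sha256` field.

```python
import hashlib
import json


def build_appspec(task_definition_arn, container_name, container_port, hooks=None):
    """Return the AppSpec document as a JSON string (hypothetical helper)."""
    appspec = {
        "version": 0.0,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            }
        }],
    }
    if hooks:
        # e.g. [{"AfterAllowTestTraffic": "<lambda function ARN>"}]
        appspec["Hooks"] = hooks
    return json.dumps(appspec)


def build_revision(appspec_content):
    """Wrap AppSpec content in the 'revision' structure used by create_deployment."""
    # Assumption: the hash is the hex digest of the UTF-8 encoded content.
    digest = hashlib.sha256(appspec_content.encode("utf-8")).hexdigest()
    return {
        "revisionType": "AppSpecContent",
        "appSpecContent": {"content": appspec_content, "sha256": digest},
    }
```

The dictionary returned by `build_revision` would then take the place of the `revision` argument in the `create_deployment` call.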
  

Blue-Green Deployment in action: The happy path

Initial setup of Blue-Green deployment:

Here we have an ECS service using Blue-Green deployment (powered by CodeDeploy) running a Fargate task.

Note that the ECS service is running based on ECS task definition version 132 and is running one Fargate task:

ECS - Before Blue-Green deployment

The ALB and the Target Groups attached to this ECS service:

ALB — Before Blue-Green deployment

The listener on port 443 is the PROD traffic listener, and the listener on port 8443 is the TEST traffic listener. At this point, both listeners are attached to the same target group ending with “-tg2”.

Target Groups — Before Blue-Green deployment

Two target groups were created for this ECS service to support Blue-Green deployment using CodeDeploy. At this point, only the target group “-tg2” has a running target/task and is attached to the above ALB listeners.

Initiating the Blue-Green deployment

First, we start the Blue-Green deployment for the ECS service. The deployment can be started programmatically using the ECS API or through the “update service” feature of the AWS ECS console, followed by creating a new deployment from the CodeDeploy console or CodeDeploy API. The updated code version is specified in the new ECS task definition that will be used to update the ECS service.

As the ECS service’s deployment controller is CodeDeploy, a deployment gets triggered in CodeDeploy and a new ECS task is started.
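If you start the deployment through the API rather than the console, you will probably want to follow its progress from code as well. The sketch below is one hedged way to do that: `wait_for_deployment` is a hypothetical helper that polls the CodeDeploy `get_deployment` call until the deployment reaches a status we treat as terminal. The polling interval and the limit on the number of polls are arbitrary choices.

```python
import time

# Statuses after which a deployment no longer changes (our assumption for this sketch).
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(cd_client, deployment_id, poll_seconds=15, max_polls=240):
    """Poll CodeDeploy until the deployment reaches a terminal status."""
    for _ in range(max_polls):
        info = cd_client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        status = info["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Deployment {deployment_id} did not finish in time")
```

Here `cd_client` would be the `boto3.client('codedeploy')` object created earlier, and `deployment_id` the `deploymentId` value returned by `create_deployment`.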

CodeDeploy — Initiate Deployment

As you can see below, a new ECS task has started and the running count is now 2. Under the Deployments tab, the PRIMARY entry is for the current Blue task. The ACTIVE entry is for the new upcoming Green task.

ECS — Blue-Green deployment in progress in CodeDeploy

Green tasks are now up and running. The PROD traffic is still being served by the Blue task in the original stack. Rollback is still possible at this stage, but keep in mind that rollback will terminate the Green task.

Code Deploy — Post creation of Green task

Now the ALB listeners have been updated. The PROD listener (Port-443) is still attached to the target group “-tg2” and serving live traffic. The TEST listener (Port-8443) is attached to the target group “-tg1”, which was previously not attached to the ALB at all. So you can hit the ALB DNS name on port 8443 and test the Green stack, which is not taking any PROD traffic.

ALB — Now listeners are connected to PROD and TEST target groups

Now, both “-tg1” (taking TEST traffic) and “-tg2” (taking PROD traffic) are attached to the ALB serving TEST and PROD listeners.
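The same listener-to-target-group wiring can be checked from code instead of the console. The helper below is a small sketch, not part of the original setup: `listener_target_groups` is a hypothetical function that takes the response of the ELBv2 `describe_listeners` call and maps each listener port to the target groups its default action forwards to. It assumes the listeners use plain default forward actions.

```python
def listener_target_groups(describe_listeners_response):
    """Map each listener port to the target group ARNs its default action forwards to."""
    mapping = {}
    for listener in describe_listeners_response["Listeners"]:
        arns = []
        for action in listener.get("DefaultActions", []):
            if action.get("Type") != "forward":
                continue  # skip redirects, fixed responses, and so on
            if "TargetGroupArn" in action:
                arns.append(action["TargetGroupArn"])
            else:
                # Forward actions may also list their targets under ForwardConfig.
                groups = action.get("ForwardConfig", {}).get("TargetGroups", [])
                arns.extend(group["TargetGroupArn"] for group in groups)
        mapping[listener["Port"]] = arns
    return mapping
```

The input would come from something like `boto3.client("elbv2").describe_listeners(LoadBalancerArn=alb_arn)`, where `alb_arn` is the ARN of your load balancer. During the test phase described above, port 443 should map to the “-tg2” target group and port 8443 to “-tg1”.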

Target Groups — Both target groups now active and attached to ALB

PROD traffic moved from Blue to Green task

Traffic routing is completed and the new Green task (or Replacement task) is now serving the PROD traffic. But how did traffic routing happen? Do we have any control over this? — Good questions and I will answer them in a separate section :)

Rollback is still possible at this stage because the Blue task is still around, sitting idle. It will remain available for the next 15 minutes (this duration is configurable through the deployment group attribute “terminationWaitTimeInMinutes,” as shown in the previous section). But how does the rollback work? Again, a good question that I will explain in another section.

CodeDeploy — Traffic rerouting done

ECS is now running the task using the updated task definition version 133 (previously 132). Both the original and new tasks are still running, which is why rollback is still possible. The Deployments tab now shows that 100% of traffic is being served by the Replacement task, which is the new Green task.

ECS — Post traffic rerouting from blue to green

Now, both the ALB listeners PROD (Port-443) and TEST (Port-8443) are pointing to the target group “-tg1” which has the Green task as the target.

ALB — Post traffic rerouting

Fast forward 15 minutes — Blue-Green deployment has been completed

Based on the CodeDeploy deployment group configuration “terminationWaitTimeInMinutes,” CodeDeploy terminates the Blue task after 15 minutes. Rollback is no longer possible now, as the Blue task is gone!

CodeDeploy — Post termination of Blue/original task

The ECS running task count is back to 1 as the Blue/original task is gone. The Task definition version is “133,” meaning the new version serves the PROD traffic. Under the Deployments tab, the ACTIVE entry is now gone.

ECS - Task definition, task ID, and running count after Blue-Green deployment

Blue-Green deployment in action: Rollback flow

Let's recap our current stable PROD state:

  • ECS service is running 1 task using task definition version 133.

  • Both ALB listeners, PROD (Port-443) and TEST (Port-8443), point to target group “-tg1”.

  • Target group “-tg1” is active and serving PROD traffic and “-tg2” is not attached to the ALB.

We now want to deploy a new version and then roll back after traffic rerouting to demo the rollback flow:

Let's initiate the Blue-Green deployment (again):

Let’s fast forward and review the state where the new Blue-Green deployment reroutes the traffic to the Green stack and waits for 15 minutes before terminating the Original/Blue task.

We have CodeDeploy Blue-Green deployment waiting to terminate the Blue task and PROD traffic being served by the new Green/Replacement task.

CodeDeploy — Post rerouting traffic

The ECS service is now running a new task using the new ECS task definition version 134 (we started this deployment with 133) and our ECS running task count is 2 - one with ECS task def version 133 and another with new ECS task def version 134. ECS Deployments should be showing 100% traffic being served by the new PRIMARY task.

ECS Service - Deployment View

ALB listeners PROD (Port-443) and TEST (Port-8443) are both pointing to the target group “-tg2”. Previously they were attached to “-tg1”.

ALB - Listeners

Rollback strategy: Handling deployment failures and reverting changes

Let's initiate the rollback from the CodeDeploy console by clicking the “Stop and rollback deployment” button:

CodeDeploy - Stop and Rollback Message

The rollback works because CodeDeploy stops the current deployment and skips the step of deleting the Original/Blue task. It then creates a new deployment to roll back the previous one, reroutes the traffic from the Replacement/Green task back to the Original/Blue task, and terminates the Replacement/Green task.
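The same stop-and-rollback action is available through the API. As a minimal sketch, the hypothetical helper below calls the CodeDeploy `stop_deployment` operation with `autoRollbackEnabled=True`, which asks CodeDeploy to roll back to the previous version after stopping. The helper name is ours, and the deployment ID is whatever `create_deployment` returned for the deployment in progress.

```python
def stop_and_roll_back(cd_client, deployment_id):
    """Stop an in-flight deployment and request a rollback to the previous version."""
    response = cd_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,
    )
    # The response reports whether the stop request is pending or has succeeded.
    return response.get("status"), response.get("statusMessage")
```

As with the console button, this only helps while the Blue task is still around, that is, before the termination wait time has elapsed.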

CodeDeploy - Rollback Deployment

ECS and ALB configuration after rollback

The ECS service is now running a task using the original task def version 133 and the running task count is back to 1. The ECS Deployments are showing 100% PROD traffic through the PRIMARY task.

ECS Service - Post Rollback

Both ALB listeners, PROD and TEST, are back on the target group “-tg1”.

ALB - Listeners Post Rollback

So this completes the successful rollback!

 

Managing traffic routing between Blue-Green tasks

CodeDeploy allows you to attach hooks to the Blue-Green deployment pipeline. The hooks are simply Lambda functions that you implement.

A few example scenarios are:

  • Running functional tests on the Green stack before routing PROD traffic.
  • Performing environment setup, such as downloading/uploading files to S3, before PROD traffic migration.

Amazon ECS deployment: Guide to lifecycle event hooks

Ref: Guide to ECS found here

  • BeforeInstall — Used to run tasks before the replacement task set is created.
  • AfterInstall — Used to run tasks after the replacement task set is created and one of the target groups is associated with it.
  • AfterAllowTestTraffic — Used to run tasks after the test listener serves traffic to the replacement task set. The results of a hook function at this point can trigger a rollback.
  • BeforeAllowTraffic — Used to run tasks after the second target group is associated with the replacement task set, but before traffic is shifted to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
  • AfterAllowTraffic — Used to run tasks after the second target group serves traffic to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
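To make the list above concrete, here is a small sketch of how the “Hooks” section of an AppSpec could be assembled from it. `build_hooks` is a hypothetical helper: it accepts a mapping from lifecycle event name to Lambda function ARN, rejects names that are not among the five events listed above, and emits the entries in lifecycle order. The ordering is only for readability.

```python
# The five ECS lifecycle events listed above, in the order they occur.
ECS_LIFECYCLE_EVENTS = (
    "BeforeInstall",
    "AfterInstall",
    "AfterAllowTestTraffic",
    "BeforeAllowTraffic",
    "AfterAllowTraffic",
)


def build_hooks(hook_functions):
    """Turn {event name: Lambda ARN} into the list used in the AppSpec "Hooks" section."""
    unknown = set(hook_functions) - set(ECS_LIFECYCLE_EVENTS)
    if unknown:
        raise ValueError(f"Unknown lifecycle events: {sorted(unknown)}")
    return [
        {event: hook_functions[event]}
        for event in ECS_LIFECYCLE_EVENTS
        if event in hook_functions
    ]
```

Catching a misspelled event name this way is cheaper than finding out during a deployment.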

How can I add a lifecycle hook?

Lifecycle hook Lambda functions can be added through the AppSpec file that you create and attach to the CodeDeploy Deployment object. Please refer to the AppSpec example in the section above.

My implementation for the automatic rollback involves:

Adding a hook for the lifecycle event “AfterAllowTestTraffic.”

The lambda function runs functional tests on the Green task using the ALB DNS + TEST listener Port (8443).

If the functional test passes (i.e., the function returns “Succeeded” to CodeDeploy), the Blue-Green deployment will continue and traffic will shift to the Green stack.

If the test fails (i.e., the function returns “Failed” to CodeDeploy), the deployment is automatically rolled back and PROD traffic remains on the Blue stack.
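A hook along these lines could look roughly like the sketch below. It is an illustration under assumptions rather than the exact function used here: the test URL is a made-up placeholder, the “functional test” is reduced to a single HTTP 200 check, and `green_stack_is_healthy` and `run_hook` are hypothetical helper names. The parts that come from CodeDeploy are the `DeploymentId` and `LifecycleEventHookExecutionId` fields in the event and the `put_lifecycle_event_hook_execution_status` call used to report the result.

```python
import urllib.request


def green_stack_is_healthy(url, timeout=5):
    """Return True when the Green stack answers the test URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


def run_hook(event, cd_client, check=green_stack_is_healthy,
             test_url="https://my-alb.example.com:8443/health"):  # placeholder URL
    """Run the check against the TEST listener and report the result to CodeDeploy."""
    status = "Succeeded" if check(test_url) else "Failed"
    cd_client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status


def lambda_handler(event, context):
    """Lambda entry point; boto3 is imported here so run_hook is testable without it."""
    import boto3
    return run_hook(event, boto3.client("codedeploy"))
```

Keeping the check and the CodeDeploy client as parameters of `run_hook` makes it easy to exercise the hook locally with stand-ins before wiring it into the AppSpec.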

TL;DR

ECS service deployment using AWS CodeDeploy is a powerful combination providing straightforward and robust Blue-Green deployment support.

The additional deployment lifecycle hooks allow you to control the traffic routing policy per your requirements.

If you are already extensively using AWS services, and ECS is your container deployment platform, consider this a go-to architecture over homegrown solutions.

If you want to learn about Canary deployment patterns using existing AWS services, please look here; it’s an excellent post on this topic!


Avijit Sarkar, Lead Software Engineer

Competent and dynamic IT professional enriched with the latest trends and techniques and a wide range of skill in Project Management, Quality Initiatives, Technology, Critical Thinking, Troubleshooting, Problem Analysis and Resolution.
