Go: A serverless journey into Credit Offers API development
Evolution of the Capital One Credit Offers API.
As a tech-driven financial company, Capital One aims to use technology to engage easily, quickly, and directly with our customers. One important way a potential Capital One customer can engage with us is through our affiliate channel. Affiliate partners like Credit Sesame, CreditCards.com, and Bankrate have a special partnership with Capital One where they display available credit card options and guide potential customers to the credit cards that are right for them. For our affiliate partners, this is done through the Credit Offers API. We like to think that it helps people make smarter credit card choices. We also like to think it shows off our cutting-edge use of technology.
What is the Credit Offers API?
How does it work? The Credit Offers API exposes the full list of Capital One credit card offers that our affiliate partners can display to their customers, along with details such as rewards information and product reviews. It has a pre-qualification feature that personalizes card offerings, matching a customer to the right cards for them (without affecting their credit score!). It also has a pre-fill feature that returns a complete, pre-populated application, making applying for a Capital One credit card a smooth experience.
When version 1.0 was built in 2015, Credit Offers was written in Java. The current version of the API, version 3.0, has been rebuilt to be fully serverless and written in Go. These are two really cool technologies, and combined they're a powerful way to expose and showcase our credit card products.
Why Go?
In mid-2016 we added an endpoint to the API to acknowledge that an offer had been displayed to a customer. Go was starting to gain serious momentum at Capital One, and based on a POC that my team put together, we saw a huge performance improvement over Java. The results were clear, so we decided to use Go for this new endpoint.
At the time, not a single team member knew Go, but within a month everyone was writing in Go and we were building out the endpoints. Its flexibility, its ease of use, and the really cool concepts behind it (how Go handles native concurrency, garbage collection, and of course safety and speed) helped keep us engaged during the build. Also, who can beat that cute mascot!
The whole process of rewriting the Credit Offers API in Go was much simpler than expected. In my role as Tech Lead, I'm the last one to approve a piece of code before it's merged and ready for release, so I had to get deeper into Go to understand not only the code but the business logic as well. One of the more satisfying surprises was that expressing business logic in a simple language like Go made it easy to step into this role without becoming a bottleneck for releases.
That's because one of Go's design goals is readable code, which it achieves through a deliberately simple syntax, a strong standard library, and built-in testing support.
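As a small illustration of that testing support, here's a sketch of a table-driven test using only the standard testing package. The function and its score threshold are hypothetical, not our production logic.

```go
package offers

import "testing"

// eligibleForPrequal is a stand-in for real business logic: it reports
// whether a hypothetical applicant score clears a pre-qualification floor.
func eligibleForPrequal(score int) bool {
	return score >= 640
}

func TestEligibleForPrequal(t *testing.T) {
	cases := []struct {
		name  string
		score int
		want  bool
	}{
		{"above floor", 700, true},
		{"at floor", 640, true},
		{"below floor", 600, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := eligibleForPrequal(tc.score); got != tc.want {
				t.Errorf("eligibleForPrequal(%d) = %v, want %v", tc.score, got, tc.want)
			}
		})
	}
}
```

Table-driven tests like this read almost like a spec, which is a big part of why code reviews stayed fast.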
The next technology we wanted to explore was serverless. Go is fast and simple by design, which is also how you would describe a lambda. In fact, we didn't want to go serverless without Go. Which brings us to...
Why serverless?
During another round of analysis, we realized our use case was perfect for a serverless approach. I had attended AWS re:Invent last year, mainly focusing on the serverless sessions, and came away with all these ideas about how the Credit Offers API could work with serverless. As soon as Lambda support for Go was released in early January 2018, we started the migration: analyzing the tools, encapsulating the code in lambdas, and adding additional alerting so we could go fully serverless. We completed the migration in early October of this year.
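For readers who haven't seen one, the skeleton of a Go lambda is genuinely small. Here's a minimal sketch using the official github.com/aws/aws-lambda-go runtime library, with a placeholder response body rather than our actual handler logic.

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler responds to an API Gateway proxy event. The body here is a
// placeholder; the real Credit Offers handlers are not public.
func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	return events.APIGatewayProxyResponse{
		StatusCode: 200,
		Body:       `{"status":"ok"}`,
	}, nil
}

func main() {
	lambda.Start(handler)
}
```

The runtime library handles the event loop; lambda.Start takes care of decoding the incoming event into the handler's argument type.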
We were drawn to serverless because of the combination of its four main pillars: no server management, flexible scaling, pay-for-value pricing, and automated high availability.
Because of the affiliates channel and the way the Credit Offers API impacts our partners and potential customers, it was really important to be able to seamlessly scale our API without interruptions in service. Going serverless has not only allowed us to move away from servers, but it’s allowed us to be faster and more resilient. One of the biggest “wins” has been that our developers do not need to stop and worry about infrastructure and can dedicate more time to creating and delivering business-related features.
In the architecture diagram you can see how simple the design is, and how we integrate with Amazon CloudWatch to monitor the health of the system. We selected DynamoDB for its scaling capabilities and flexible schema; for long-term storage we use an S3 bucket. We also use Amazon SNS for alerting when there's an issue. You can also see how we incorporated an external monitoring system, New Relic.
And again, a lambda sounds simple, but we do have some complex processes, including calls to external APIs as part of the pre-qualification flow. Go's goroutines and Lambda are a match made in heaven.
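As a rough sketch of that pattern, a handler can fan out to several upstream services with goroutines and a sync.WaitGroup; the URLs and error handling here are illustrative, not our actual pre-qualification dependencies.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetchAll calls each upstream URL concurrently and returns the bodies
// in order, surfacing the first error any goroutine reported.
func fetchAll(ctx context.Context, urls []string) ([]string, error) {
	results := make([]string, len(urls))
	errs := make(chan error, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
			if err != nil {
				errs <- err
				return
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				errs <- err
				return
			}
			defer resp.Body.Close()
			body, err := io.ReadAll(resp.Body)
			if err != nil {
				errs <- err
				return
			}
			results[i] = string(body)
		}(i, u)
	}
	wg.Wait()
	close(errs)
	// A closed, empty channel yields the zero value (nil), so this
	// returns the first error only if a goroutine reported one.
	if err := <-errs; err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	// Placeholder endpoints; the real pre-qualification APIs are internal.
	bodies, err := fetchAll(context.Background(), []string{
		"https://example.com/a",
		"https://example.com/b",
	})
	fmt.Println(len(bodies), err)
}
```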
Another key piece of our serverless solution is how we do deployments. Canary deployments allow us to deploy an update to 5% of a lambda's traffic (using versioning), set specific canary-deployment alerts, and, if something fails, roll it back automatically without any intervention. After the first 5% release succeeds, we decide how long we want to test and incrementally release the new version of the lambda from there. Canary deployments have been really powerful for us. Like I mentioned, we have dependencies with our affiliates, and as we scale and grow, our new releases cannot take the service down. With a solid pipeline built around canary deployments, they don't.
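To give a flavor of how this can be wired up, here's a sketch of a SAM template fragment that routes a lambda alias through CodeDeploy for gradual traffic shifting with alarm-based rollback. The resource names are placeholders, and the predefined CodeDeploy canary configurations start at 10%, so a first step of 5% like ours would need a custom deployment configuration.

```yaml
CreditOffersFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: main
    Runtime: go1.x
    AutoPublishAlias: live            # publish a new version on every deploy
    DeploymentPreference:
      Type: Canary10Percent5Minutes   # predefined option; use a custom config for 5%
      Alarms:
        - !Ref CanaryErrorsAlarm      # if this alarm fires, CodeDeploy rolls back
```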
What did we learn from this serverless and Go journey?
#1 Serverless doesn’t mean you lose control.
It's the opposite: you have more control. Yes, scaling is seamless and performance is better, but there are limits, and you have to check them. You cannot just throw billions of requests at the service and assume nothing will be disrupted; if you don't plan properly, it will be. To compensate, you can expand to different regions and availability zones, and plan for scaling your application.
#2 Testing is fundamental.
One of the biggest concerns we hear from the community is how to test a lambda, since a lambda depends on an event. We explored SAM Local, the CLI for the AWS Serverless Application Model, which lets developers simulate events on their own laptops and fully test a lambda.
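For example, running `sam local invoke -e event.json` replays a captured event against the function in a local container. The same synthetic-event idea carries over to plain Go unit tests; this sketch exercises the hypothetical handler shown earlier with a hand-rolled API Gateway event.

```go
package main

import (
	"context"
	"testing"

	"github.com/aws/aws-lambda-go/events"
)

func TestHandler(t *testing.T) {
	// A hand-rolled stand-in for the event SAM Local would deliver.
	req := events.APIGatewayProxyRequest{
		HTTPMethod: "GET",
		Path:       "/credit-offers",
	}
	resp, err := handler(context.Background(), req)
	if err != nil {
		t.Fatalf("handler returned error: %v", err)
	}
	if resp.StatusCode != 200 {
		t.Errorf("status = %d, want 200", resp.StatusCode)
	}
}
```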
#3 Mind your data and your databases.
There are several options for serverless databases depending on your use case: what data you need to keep, how long you need to keep it, and so on. In our case, DynamoDB was the straightforward choice, giving us the flexibility and capabilities we needed for this API.
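As a sketch of how small the write path can be, here's what recording an offer-displayed event in DynamoDB might look like with the AWS SDK for Go; the table name and attributes are made up for the example.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	// In a real lambda, the session and client are created once at cold
	// start and reused across invocations, not rebuilt on every call.
	sess := session.Must(session.NewSession())
	svc := dynamodb.New(sess)

	_, err := svc.PutItem(&dynamodb.PutItemInput{
		TableName: aws.String("offer-acknowledgements"), // illustrative name
		Item: map[string]*dynamodb.AttributeValue{
			"offerId":     {S: aws.String("abc-123")},
			"displayedAt": {S: aws.String("2018-10-01T12:00:00Z")},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```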
#4 Keep it simple.
Lambdas shouldn't be huge pieces of code. Right-size your functions and keep each one to a simple yes/no.
#5 Enable distributed tracing.
We used AWS X-Ray. It's the natural choice in the AWS environment and gives you a full picture of how your system is performing. And it's not just for the lambdas; it covers the databases, Route 53, the API gateway, and so on. Through X-Ray you can see every point of contact for the application, including other microservices.
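Wiring a Go service into X-Ray is mostly a wrapping exercise. Here's a minimal sketch using the aws-xray-sdk-go package; the clients would normally be created once at cold start, and the specifics of our setup are simplified away.

```go
package main

import (
	"net/http"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
	"github.com/aws/aws-xray-sdk-go/xray"
)

func newClients() (*dynamodb.DynamoDB, *http.Client) {
	sess := session.Must(session.NewSession())
	svc := dynamodb.New(sess)

	// Every DynamoDB call through this client now shows up as a
	// subsegment in the trace.
	xray.AWS(svc.Client)

	// Outbound calls to external APIs get traced the same way.
	httpClient := xray.Client(http.DefaultClient)
	return svc, httpClient
}

func main() {
	newClients()
}
```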
#6 Design your serverless pipeline with canary deployments in mind.
Well tested, incremental deployments have made our API more resilient. By using canary deployments we can release 24x7, with no service interruptions to our affiliates.
#7 The microservices mindset.
When designing your solution, try to keep it simple, identify dependencies in and out, and don't leave the whiteboard until the pieces couple smoothly within your ecosystem. Finally, fully control your event sources.
Results
We saw some great results from serverless in terms of performance gains, cost savings, and team velocity. A 70% performance gain, measured from the time the lambda gets a request to the time it replies, is pretty impressive. Even more impressive is our big achievement with cost: 90% savings by removing EC2, ELB, and RDS. And as Tech Lead, I can say the 30% increase in team velocity we gained by not spending time patching, fixing, and taking care of servers is time we can now spend innovating, creating, and expanding on business requirements.
Now that the API is 100% serverless, we have a solid roadmap of what else we want to do with the technology. We want to incorporate AWS CodeBuild to create a simple pipeline for multiple build processes, we want to use AWS Step Functions to add a better retry mechanism, and we want to incorporate the ELK stack into our logging so we can add some business dashboards to our solution. But we'll save that for another post.