Enhancing serverless observability with Python

Key strategies and tools to leverage Python for effective serverless observability.

The serverless space continues to be powered by Python’s vibrant ecosystem. Across the major cloud providers, and from serverless functions to big data, Python is the preferred language. In this article, we're going to focus exclusively on serverless functions and an oft-cited challenge: observability.

What it means to be serverless

Serverless applications involve a trade-off, where developers give up control of the compute resources that underlie their code executions and hand that control to the cloud infrastructure provider. This shift allows developers to focus on the value their code delivers, rather than on the operational requirements associated with running servers as virtual machines. It is also appealing because it enables on-demand billing based on compute time, which may be more cost effective. Meanwhile, the provider manages all hypervisor, operating system and runtime upgrades to ensure that the developers’ code remains secure, compliant and isolated.

Given that all major providers support Python as a native runtime, developing serverless applications powered by Python just makes sense!

How to observe serverless applications

A common concern when deciding to adopt a serverless architecture is observability. Without the ability to use secure shell (SSH) to access the underlying operating system, how do you:

  • Read your log files on the server
  • View top data about your application performance
  • Aggregate business context about what your application is doing
  • Interrogate your application to see what happened in any specific transaction

It may be surprising to know that not only are all of these tasks accessible within the serverless stack, but implementing observability in a serverless manner is also a better practice compared to logging into an instance or container interactively.

Serverless observability starts with thoughtful application design. An observable system requires as much planning and design as test-driven development or ensuring that the business logic is handled correctly. Unlike most design considerations, however, observability in serverless applications has evolved to a point where very little code needs to be written to achieve it. In fact, in many cases, existing code only needs to be instrumented.
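To make "instrumenting" concrete, here is a minimal sketch using only the standard library. It assumes a hypothetical `instrumented` decorator and an in-memory `RECORDS` list standing in for whatever sink a real observability library would provide; neither name comes from a specific tool. The handler body stays untouched while the wrapper captures duration and outcome.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; a real system would ship these records elsewhere.
RECORDS: list = []


def instrumented(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the duration and outcome of every call to `func`."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        outcome = "error"  # assume failure until the call returns normally
        try:
            result = func(*args, **kwargs)
            outcome = "success"
            return result
        finally:
            RECORDS.append(
                {
                    "function": func.__name__,
                    "outcome": outcome,
                    "duration_seconds": time.perf_counter() - start,
                }
            )

    return wrapper


@instrumented
def handler(event: dict, context: Any = None) -> dict:
    # Business logic only; no observability code lives in the function body.
    return {"status": "scheduled", "account_id": event["account_id"]}
```

Calling `handler({"account_id": "acct-123"})` returns the normal response and appends one record to `RECORDS`; a call that raises still appends a record, with the outcome marked as an error.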

Understanding logs, metrics and telemetry in serverless observability

Just as behavior-driven development (BDD) and test-driven development (TDD) shape how we write code today, observability-driven development (ODD) shapes how we build modern distributed applications, including those based on serverless technology. This means that we need to instrument code in a meaningful way and intentionally design the windows into our application’s behavior.

This can be broken down into three basic components: logs, metrics and telemetry.

Let’s consider a simple API for scheduling payments. The application should be able to:

  • Log an identifying attribute of the account.
  • Emit a metric to indicate a successful payment was scheduled.
  • Send telemetry to identify which parts of the scheduling process are being invoked in real-time (e.g., a commit to the database and a message to queue up the payment when able).

[Figure: Observability-driven development: instrument the code, measure and analyze findings, improve with changes]

Logs

Logs are the most familiar part of this trio to most software engineers. For almost 25 years, standard logging formats have been published in documents such as RFC 3164 (since obsoleted by RFC 5424) for the Syslog standard, and all modern languages have logging libraries that incorporate these conventions, including log levels. The Python standard library’s `logging` module is no exception. Logs are statements of fact, such as the line of code where an error was encountered, what happened to the customer experience or simply that a specific part of the application fired.
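As an illustration of the first requirement of the payment-scheduling API (logging an identifying attribute of the account), the sketch below uses only the standard library `logging` module. The `JsonFormatter` class, the `build_logger` helper, the `account_id` field and the `schedule_payment` function are assumptions made for this example, not part of any particular library.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        account_id = getattr(record, "account_id", None)
        if account_id is not None:
            payload["account_id"] = account_id
        return json.dumps(payload)


def build_logger(name: str = "payments") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers if called repeatedly
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger()


def schedule_payment(account_id: str, amount_cents: int) -> None:
    # Hypothetical business function; only the logging call matters here.
    logger.info("payment scheduled", extra={"account_id": account_id})
```

Emitting one JSON object per line keeps the account attribute searchable without parsing free-form text, which is one common reason to prefer structured logs in a serverless setting.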

Logs have become such a standard tool in our development toolbox that they are often the first answer to any observability question. This reliance on logs can be likened to the adage "when you’re comfortable with the hammer, everything starts looking like a nail." However, while logs are crucial, relying solely on them for aggregation can lead to problems that are difficult to solve.

Metrics

Metrics are numeric measurements taken over time. Examples include the duration of function execution, the number of executions that timed out or business metrics like the outcome of the execution. Metrics are powerful because they allow you to summarize and analyze data using simple sums and averages as well as descriptive statistical measurements such as percentile values.
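The summarizing described above can be sketched with the standard library `statistics` module. The `MetricSummary` dataclass and `summarize` function are hypothetical names for this example, and the choice of the 50th and 99th percentiles is an assumption; in practice a metrics backend usually performs this aggregation for you.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MetricSummary:
    count: int
    total: float
    average: float
    p50: float
    p99: float


def summarize(values: list) -> MetricSummary:
    """Summarize numeric measurements, e.g. function durations in milliseconds."""
    if len(values) < 2:
        raise ValueError("need at least two data points to compute percentiles")
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return MetricSummary(
        count=len(values),
        total=sum(values),
        average=statistics.fmean(values),
        p50=cuts[49],  # 50th percentile (median)
        p99=cuts[98],  # 99th percentile
    )
```

Percentiles are often more useful than averages for durations, because a small number of slow executions can hide behind a healthy-looking mean.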

Telemetry

Telemetry, on the other hand, provides detailed insights into code execution for each API call made or method invoked, depending on how the code is instrumented. Its scale sets it apart. While telemetry measurements are valuable for any single serverless function, they become more insightful when they are collected as distributed telemetry. This approach brings together a full user flow with the same level of visibility and the ability to drill down into each specific area.
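To show the idea of per-step visibility tied together by a shared identifier, here is a small standard-library sketch. The `Tracer` and `Span` classes are hypothetical and keep everything in memory; they are not the API of any tracing library, and a real tracer would export spans to a backend and propagate the trace ID between functions.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    trace_id: str
    name: str
    parent: Optional[str]
    start: float = field(default_factory=time.perf_counter)
    duration: Optional[float] = None


class Tracer:
    """Hypothetical in-memory tracer used only for illustration."""

    def __init__(self, trace_id: Optional[str] = None) -> None:
        # Reusing an incoming trace ID is what links spans across functions.
        self.trace_id = trace_id or uuid.uuid4().hex
        self.finished: list = []
        self._stack: list = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1].name if self._stack else None
        current = Span(self.trace_id, name, parent)
        self._stack.append(current)
        try:
            yield current
        finally:
            current.duration = time.perf_counter() - current.start
            self._stack.pop()
            self.finished.append(current)


def schedule_payment(tracer: Tracer) -> None:
    with tracer.span("schedule_payment"):
        with tracer.span("commit_to_database"):
            pass  # placeholder for the database write
        with tracer.span("enqueue_payment"):
            pass  # placeholder for the queue message
```

After `schedule_payment(tracer)` runs, `tracer.finished` holds one span per step, each carrying the same trace ID and the name of its parent, which is the information needed to drill into a single transaction.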

Adopting observability-driven development as a preferred methodology

Serverless observability is not just a possibility; it's an essential practice for modern application development. By leveraging Python's robust ecosystem and the inherent capabilities of serverless infrastructure, developers can create highly observable systems that offer deep insights into application performance and behavior.

Designing for observability is achievable and highly beneficial. By ensuring your serverless applications are observable from the start, you can save time and add significant value to your projects. This approach enhances both quality and functionality. 

Embracing observability-driven development (ODD) allows you to design your applications with built-in monitoring, logging and telemetry, ensuring you can proactively identify and address issues. This approach not only improves the reliability and performance of your serverless functions, but also enhances your ability to deliver high-quality software rapidly.

As the serverless landscape continues to evolve, integrating observability from the start will become increasingly important. The strategies and practices discussed here will help you navigate the complexities of serverless observability, ensuring your applications are resilient, efficient, and ready for future challenges.

Check out our full PyCon 2024 presentation: “Python Powered Serverless Observability”

For an even deeper dive into Python and how it powers the majority of the serverless world, check out the PyCon 2024 session I presented with my colleague Brian McNamara, distinguished engineer. In our session, you’ll explore the community libraries that exist to improve application observability, including a step-by-step instrumentation of code.

After watching this session recording, you’ll walk away with a clear understanding of how to design observability into your serverless development, as well as some fundamental tools that will enable you to effectively scale your services.

Dan Furman, Distinguished Engineer

Dan is a solutions architect, open source enthusiast, and cloud native advocate. With 15 years of experience, he is inspired by the innovation, speed, and trends that become best practices across programming languages. Dan's on a mission to make software delivery approachable, strategic, cost effective, and timely by thoughtfully building our toolbox.