Dockerization: A practical guide to Docker containerization

Package, deploy and run apps using containers. Find out how containerization in Docker works in this tutorial.

Developers rely on containerization to help create, package and deploy applications and development assets independently — application code, plus all its dependencies, for example. Having dependencies and libraries packaged alongside the application allows developers and dev ops teams to optimize and streamline application development, and to run and scale applications as needed.

Docker is a cloud-native platform that lets you build, run and share containerized applications through dockerization — the process of creating Dockerfiles that can package all the software needed to run an application into a portable and executable Docker image. 

Cloud containerization

Cloud containers act as a virtual homebase for software. These containers run on top of an operating system (OS) and on the back of a comprehensive setup that includes defining the OS and the number of containers needed. They abstract the underlying OS away and allow you to focus solely on the application. This is what makes containers so attractive.

In most cases, cloud containers are attached to the OS from a native cloud environment — Microsoft Azure or AWS, for example. When large enterprises find themselves bogged down by too many containers, they may use containerization alongside tools designed to orchestrate the containers themselves. Recent innovations have helped mitigate the risk attached to performing common and necessary tasks like scanning and the actual containerizing process.

Containerization use cases

Potential use cases for cloud containerization include:

  • The need for rapid deployment

  • Desire for application portability, such as moving applications from one environment to another environment that uses the same OS

  • Use of agile methodologies that include CI/CD versus old-school waterfall dev methods

  • Enterprises that require scalability of containers

  • The need to reduce IT costs that might increase if developers were to use virtual machines

  • Efficiency of building in an isolated environment

  • Goal of creating standardization during the development process

Docker architecture

Docker uses a fairly standard client-server model with multiple layers. This infrastructure is a great way to implement continuous integration and continuous delivery (CI/CD) methodologies while providing easy mobility across dev environments — as long as the operating system is the same.

Docker client

This is how users interact with Docker, using the provided command line interface (CLI) to build, run and stop applications. The client may share a host with the Docker daemon (the background program) or connect to a daemon on a remote host.

Docker host

This environment encompasses the Docker daemon (the background program), plus the images, containers, networks and storage needed to execute and run applications.

Based on this setup and the components below, Docker is engineered to speed up the deployment and runtime of applications, as containers can run either on demand or continuously.

Main Docker components

These are the main components of Docker.

Server: Docker daemon

The Docker daemon is the server-side background program. It listens for requests from the Docker client and manages Docker objects such as images, containers, networks and volumes.


Docker images

Docker images are read-only templates built from the instructions in a Dockerfile. These images are built in layers, with tags used to identify which version of the image to reference when it's used to run the application from the container.

Dockerfile

Dockerfiles contain all the commands and dependencies applications need to run while also helping to create a static image the container uses to run the application. Later on, we explore the anatomy of a Dockerfile.
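As a quick preview (the full anatomy comes later), a minimal sketch of a Dockerfile for a small Python app might look like the following; app.py and requirements.txt are placeholder file names, not requirements of Docker itself:

    # Start from an official Python base image
    FROM python:3.9-slim

    # Set the working directory inside the image
    WORKDIR /app

    # Install dependencies first so this layer can be cached
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy the application code
    COPY . .

    # Default command to run when a container starts
    CMD ["python", "app.py"]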

Docker registries

Users can leverage Docker registries to store, manage and distribute images built from Dockerfiles via commands like docker push, docker pull and docker run. Docker Hub, a public registry, has free and paid plans, but private and third-party registries also exist.
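As a quick sketch, pushing to and pulling from a registry typically looks like this; the image name my-app and the your-username namespace are placeholders:

    # Tag a locally built image with your registry namespace
    docker tag my-app:latest your-username/my-app:1.0

    # Push it to Docker Hub (after docker login)
    docker push your-username/my-app:1.0

    # Pull and run it from any machine with Docker installed
    docker pull your-username/my-app:1.0
    docker run your-username/my-app:1.0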

Docker containers

A Docker container is a runnable instance of an image and is where the application actually runs. The Docker client and daemon communicate over a REST API, and the image delivers the application and its dependencies to the host server.

Tutorial: How to dockerize Python applications

Developers interested in using Python to build scalable applications may find that Docker makes deployment faster, easier and more secure. Here’s how to get started, including step-by-step instructions.

1. Install Docker

Start by downloading the latest version of Docker. There are options for both Windows and macOS. You may also need to update your code editor and download the corresponding Docker extension for that app.

2. Choose a base image

The first layer of your infrastructure is a base image. There are tech-specific options you can use depending on your coding environment — for Python, the official python image already includes the interpreter and pip.

Otherwise, you'll need to install everything independently, building off a base OS image.
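As a rough sketch, the two approaches look like this in a Dockerfile (the version tags are just examples):

    # Tech-specific base image: Python already installed
    FROM python:3.9-slim

    # Alternative: build up from a base OS instead
    # FROM ubuntu:22.04
    # RUN apt-get update && apt-get install -y python3 python3-pip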

3. Install the necessary packages

Depending on what you need, this step could be redundant. Focus on installing the things that are absolutely needed rather than hedging bets and installing everything that’s available just because it’s there.

Generally speaking, the most common packages include:

  • Docker Engine: This is the core package.

  • Docker Compose: For multi-container Docker apps.

  • Docker CLI: The interface for interacting with Docker Engine.

  • Containerd: The runtime for running containers.

To dockerize Python, you’ll also need pip and virtualenv. Depending on the OS you’re using, you can use the following commands for installation.

  • Windows and macOS: Use the Docker Desktop app from the Docker site

  • Linux: The command below installs the Docker SDK for Python; Docker Engine itself is installed per distribution, as shown next.

    pip install docker

    Ubuntu:

    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io

    CentOS:

    sudo yum update
    sudo yum install docker-ce docker-ce-cli containerd.io

4. Add custom files

You can add custom files using Dockerfile or the docker cp command. 

For Dockerfile, use the COPY or ADD instruction to add files from your local system. COPY is preferred for plain local files, while ADD can also fetch files from a remote URL and automatically extracts local tarballs.

Example: 

    FROM python:3.9
    # or any preferred Python version
    ADD example.txt /
  

The docker cp command can also be used to add files from your local file system to a Docker container. To do this, add the file and container name after the docker cp command. In the example below, the ‘example.txt’ following the docker cp command is the file from your machine. The right side is the location in the Docker container where it will be placed.

Example: 

    docker cp example.txt examplecontainer:/
  

5. Define which user can run your container (optional)

Is it necessary to create a specific user ID to run a container? Only if the application requires access to user or group tables. Otherwise, you can skip this step — even though many popular applications are built with specific IDs included. If you do have a need to set the user ID, the example below will help. This code snippet obtains the current user's user and group IDs, stores them in environment variables and passes them to docker run so the container runs as that user and group. (Bash reserves UID as a read-only variable, so the snippet uses HOST_UID and HOST_GID.)

Example:

    # Capture the current user's IDs (bash treats UID as read-only, hence the new names)
    export HOST_UID=$(id -u)
    export HOST_GID=$(id -g)
    docker run -it \
        --user $HOST_UID:$HOST_GID \
        --workdir="/home/$USER" \
        --volume="/etc/group:/etc/group:ro" \
        --volume="/etc/passwd:/etc/passwd:ro" \
        --volume="/etc/shadow:/etc/shadow:ro" \
        python:3.9 /bin/bash  # an image name is required here; python:3.9 is just an example
  

6. Define the exposed ports

Exposed ports are basically metadata for the container application: they document which ports the application listens on, but on their own they don't open any ports or make the application reachable from outside.

Here’s an example of how to expose a port (8080, for example) in Docker:

    EXPOSE 8080
  

Note that in the Docker environment, there's a difference between an exposed port and a published port. An exposed port is documentation used mostly by the internal dev ops team, while a published port maps a container port to a port on the Docker host, making the application accessible to the rest of the world.
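For instance, publishing happens at run time with the -p flag on docker run (my-app is a placeholder image name):

    # EXPOSE 8080 in the Dockerfile only documents the port;
    # -p 80:8080 actually publishes it, mapping host port 80 to container port 8080
    docker run -d -p 80:8080 my-app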

7. Define the entrypoint or use the CMD declaration

A Docker image needs either an ENTRYPOINT or a CMD declaration; without one, a container built from it won't start. Defining the entrypoint is an important step as it determines how a Docker image behaves when it's first started.

There are two ways to define the entrypoint for your application:

  • Exec: Runs the command without using a shell — environment variables and special characters are interpreted literally. Of the two, exec is the more efficient but less flexible entrypoint.
    ENTRYPOINT ["python", "app.py"]
  
  • Shell: Environment variables and special characters are interpreted by the shell’s rules. 
    ENTRYPOINT ["/bin/sh", "-c", "python app.py"]
  

The CMD instruction sets default parameters that can be overridden when the container is run. Rather than describing how to build the image, it tells Docker what to run by default when a container starts. There are two ways you can write the CMD instruction.

  • Exec:
    CMD ["executable", "parameter1", "parameter2"]
  
  • Shell:
    CMD command parameter1 parameter2
  
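ENTRYPOINT and CMD also work well together: ENTRYPOINT fixes the executable, while CMD supplies a default argument that can be overridden at run time. A minimal sketch for the Python app used earlier (app.py and debug.py are placeholder file names):

    # python always runs; app.py is only the default argument
    ENTRYPOINT ["python"]
    CMD ["app.py"]

With this setup, docker run my-app starts python app.py, while docker run my-app debug.py overrides only the CMD portion.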

8. Define a configuration method

Applications need a way to receive their configuration parameters. One option is to use an application-specific configuration file, which means keeping track of essentials like format, fields and location — fairly cumbersome when working within a complex environment encompassing multiple technologies.

The other option is to follow the Twelve-Factor App approach: read configuration from environment variables and use a simple envsubst command in an entrypoint script (commonly named docker-entrypoint.sh) to fill in a configuration template when the container starts.
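As a hedged sketch of that second option, assuming envsubst (from the gettext package) is available in the image and that config.template and config.ini are your own file names, the entrypoint script could look like this:

    #!/bin/sh
    # Substitute environment variables (such as $DB_HOST) into the config template
    envsubst < /app/config.template > /app/config.ini

    # Hand off to the container's main command so it receives signals correctly
    exec "$@"

The Dockerfile would then point at the script, for example with ENTRYPOINT ["./docker-entrypoint.sh"] and CMD ["python", "app.py"].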

9. Externalize your data

Industry standard is to avoid saving persistent data inside a container, as containers are created and destroyed quickly, and all data, including persistent data, is lost with them. Instead, consider sending data to an external location on the base OS — mounted volumes and bind mounts are both solid options. This doesn't remove the risk entirely, though, as this can still be problematic in a cloud environment, where the underlying EC2 instance or server may be ephemeral as well.
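For example, both approaches can be attached at run time; app-data, /data, /srv/my-app/data and my-app below are placeholder names and paths:

    # Named volume managed by Docker
    docker run -d -v app-data:/data my-app

    # Bind mount: map a host directory directly into the container
    docker run -d -v /srv/my-app/data:/data my-app

Named volumes are managed by Docker itself, while bind mounts expose an existing host directory directly.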

Docker can also be used alongside CI and version control tools like Jenkins or GitHub. You can share publicly through the use of published ports, and you can potentially reduce the cost of serving data requests by programming containers to run on demand rather than continuously.

10. Handle the logs

So, what is persistent data exactly, and how should these data logs be handled? There's a lot we could say about handling logs. For now, we'll note that the conventional approach is to avoid log files altogether. Instead, you can use stdout and stderr as an event stream — Docker automatically captures everything sent to stdout and makes it available via a simple docker logs command.
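For instance (mycontainer is a placeholder container name):

    # Show everything the container has written to stdout/stderr
    docker logs mycontainer

    # Follow the stream live, similar to tail -f
    docker logs -f mycontainer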

It’s also important to mention that for applications that write log files automatically, volume can become an issue. The best way to avoid depleting server space is to manage log rotations using a tool like logrotate that automatically rotates and compresses log files.
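A minimal logrotate sketch, assuming the application writes its files under /var/log/myapp (the path and retention values are placeholders, not requirements):

    # /etc/logrotate.d/myapp
    /var/log/myapp/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
    }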

Dockerize apps and streamline your development

Docker is commonplace in the industry, and knowing how to dockerize can help developers in big ways, improving dev ops and making applications more portable. Learn more about how we’re using cloud technologies like Docker to build cutting-edge tech in our blog. 

