Docker Learnings

I'm working on a job for my dad that involves taking two audio files, extracting various insights from them, and uploading the results to Google's BigQuery. The pipeline makes several calls to Google's APIs, such as Speech-to-Text, and integrates with services like GCS and BigQuery. Based on my time spent looking at job postings, it's clear to me that companies highly value experience building in the cloud, specifically with providers such as GCP, AWS, or Microsoft Azure, and that they value experience containerizing and deploying applications with Docker. I figured this would be a good opportunity to practice just that. I learned a lot about Docker through this project, so here is my attempt, as an amateur, to explain it.

What is Docker?

Docker is a tool designed to make it easy to run applications across different environments. It does this by creating isolated environments, called containers, that contain everything needed to run the application: dependencies, environment variables, and anything else the application requires.

What makes containers so special is that they share the host system's OS kernel while still keeping the application environment isolated. This is what distinguishes them from virtual machines, which each bundle a full guest operating system. Sharing the kernel makes containers lightweight and fast to start, while still avoiding the "it works on my machine" problem.

It's useful because it lets developers create and deploy applications without worrying about whether they will run in whatever environment they're deployed to. Docker has become somewhat of a standard, and most major cloud providers make it easy for developers to deploy and run containers on their platforms.

How Do You Use Docker?

Before learning how to use Docker, it's important to understand the basics of how it works. Before running a container, you first have to build a Docker image. The image is what is used to create containers, and one image can create multiple container instances. Think of the image as the recipe and the containers as the meals: with just one recipe, many meals can be made.
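
To make the recipe analogy concrete, here's a quick sketch using the Docker CLI (the image and container names are just placeholders):

# Build one image, the "recipe"
docker build -t my-app .

# Start two independent containers, the "meals", from that one image
docker run --name meal-one my-app
docker run --name meal-two my-app

Each container gets its own isolated file system and processes, even though both came from the same image.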

The first step is creating a Dockerfile in the root of your project. The Dockerfile contains the instructions for how to create the Docker image. Let's look at an example Dockerfile, which looks suspiciously similar to the one I wrote for this project.

FROM python:3.9-slim

# Set working directory
WORKDIR /code

# Install system dependencies including FFmpeg
RUN apt-get update && apt-get install -y \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONPATH=/code

# Expose port
EXPOSE 8080

# Command to run the application
CMD uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}

There's a lot going on here, so let's break it down step by step.

1. Choosing the Base Image

FROM python:3.9-slim

The first line chooses the base image that your own image will build on. Think of it as a template that provides the OS and pre-installed packages your app needs. Here it's a slim Debian image with Python 3.9 already installed.
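
For reference, here are a few other common base images you might see (which one fits depends on your app):

# Debian-based with Python 3.9 preinstalled, trimmed down (what I used)
FROM python:3.9-slim

# The full image: bigger, but with more build tools already present
# FROM python:3.9

# Alpine-based: smaller still, though some packages are harder to install
# FROM python:3.9-alpine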

2. Set Working Directory

WORKDIR /code

Docker containers are built on Linux, so they have a file system just like any other computer. WORKDIR sets the working directory for every instruction that follows, similar to cd in bash (and Docker creates the directory if it doesn't exist). This keeps your app's files in one predictable place instead of scattering them across the container's file system.
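
A small sketch of how WORKDIR affects the instructions after it (script.py is a hypothetical file):

WORKDIR /code            # creates /code if needed, then switches into it
COPY script.py .         # lands at /code/script.py
RUN python script.py     # runs with /code as the current directory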

3. Install System Dependencies

RUN apt-get update && apt-get install -y \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

The RUN instruction is virtually the same as typing a command into your own terminal: it executes during the image build. Here it installs the system dependencies the app needs (FFmpeg, for audio processing) and then clears the apt package lists to keep the image small.

4. Copy And Install Requirements

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

The COPY instruction unsurprisingly copies the given file or directory into the target path. In this case it copies my requirements.txt file into the current working directory, denoted by ".". Since I earlier set the working directory to /code, there is now a requirements.txt at /code/requirements.txt inside the container. Then RUN is used again to install the Python dependencies. You do not need to install Python itself because it is included in the base image.

5. Copy Application Code

COPY . .

This copies everything from your local build context, which is usually the same directory as your Dockerfile, into the current working directory inside the image (previously defined as /code). You usually won't want everything from your project copied into the container: __pycache__ directories, logs, a virtual environment if you have one, and so on. This is why you create a .dockerignore file in the root of your application, which works much like a .gitignore file: when copying files into the image, Docker skips whatever it lists.
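
As a rough example, a .dockerignore for a Python project like this one might look something like this (adjust the entries to your own layout):

__pycache__/
*.pyc
.venv/
logs/
.git/
.env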

6. Set Environment Variables

ENV PYTHONPATH=/code

This is how you set environment variables inside the image. Be careful about storing sensitive information here, such as API keys, because ENV values are baked into the image and visible to anyone who can access it (docker history will show them). Secrets should instead be passed at runtime, using .env files with Docker Compose or a secure secret manager.
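
For example, with the plain Docker CLI you can inject values when the container starts, so they never get baked into the image (the key and names here are placeholders):

# Pass a single variable at runtime
docker run -e API_KEY=your-key-here your-image-name

# Or load a whole file of KEY=VALUE pairs
docker run --env-file .env your-image-name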

7. Exposing a Port

EXPOSE 8080

This is what "exposes" your port to internet traffic, but it's really more of a documentation instruction than a functional one. You actually allow your application to receive traffic when you run the container.

8. Run Your App

CMD uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}

This is the command that actually starts your application when the container runs. The difference between CMD and RUN is that RUN executes while the image is being built, whereas CMD defines what runs when a container starts. If CMD were replaced by RUN, Docker would try to run the server during the build instead, and the container would be left with no startup command. Here I use it to start my app's server with uvicorn.
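
One detail worth knowing: CMD comes in two forms. The line above is the shell form, which runs through /bin/sh and is what lets ${PORT:-8080} expand to a default. The exec form skips the shell, so variables like that are not expanded:

# Shell form: /bin/sh expands ${PORT:-8080}
CMD uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}

# Exec form: no shell, so the port would have to be fixed
# CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]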

Important Note About Images

Each instruction in the Dockerfile creates what's known as an image layer. Docker works through the layers from top to bottom (first instruction to last) and decides for each one whether it must be rebuilt or whether the cache can be reused. If nothing has changed, it uses the cache, dramatically speeding up the build. However, once it hits a layer that does need to be rebuilt, every layer after it is rebuilt as well. This means it's a good idea to place instructions that change infrequently (installing system or Python dependencies) earlier in your Dockerfile, and instructions that change often (copying code) later. For example, in my file I install all of the dependencies before copying my code, so even if I change my code, the image won't reinstall the dependencies unless requirements.txt itself changed. Done the other way around, any code change would force Docker to reinstall the dependencies, even if they hadn't changed.
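
To make the ordering concrete, here are the two arrangements side by side, using the same instructions as my Dockerfile:

# Cache-friendly: the dependency layers sit above the frequently-changing code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Cache-hostile: any code change invalidates COPY . ., so pip install reruns too
# COPY . .
# RUN pip install --no-cache-dir -r requirements.txt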

Building the Image

Once you have completed your Dockerfile, the next step is to build your image. You can do this with the following command:

docker build -t your-image-name .

The -t flag is optional but recommended: it tags your image with a name so you can refer to it later. The . tells Docker to use your current directory, where your Dockerfile and the rest of your application live, as the build context. Make sure you have Docker installed and running on your computer when you do this, or the command will fail. You can install Docker from the official website.
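
A couple of build variations I found handy (the image name is a placeholder):

# Tag the same build with both a version and "latest"
docker build -t my-app:1.0 -t my-app:latest .

# List your local images to confirm the build worked
docker images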

Running the Container

The last step is to actually use the image to run the container. This is done with the following command:

docker run -p 8080:8080 your-image-name

This uses your image to start a container. The -p 8080:8080 part publishes a port in host:container form, mapping port 8080 on your local machine to port 8080 inside the container. This is necessary for web apps and APIs because it's what actually allows the container to receive traffic.
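
A few extra run flags worth knowing (the names are placeholders):

# Run in the background and give the container a name
docker run -d -p 8080:8080 --name my-app your-image-name

# Check its output, then stop it
docker logs my-app
docker stop my-app

# Or have Docker remove the container automatically when it exits
docker run --rm -p 8080:8080 your-image-name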

My Takeaways

While it may seem simple to just follow a tutorial, in reality things almost never work without issues. I ran into many unforeseen problems when building and deploying my container, which taught me that there is more depth to Docker than meets the eye: you have to think about how to handle sensitive information, or how permissions work inside the container. I'm just an amateur, so I can only imagine what else is important to know, but I do know there's a lot I don't know, and the only way to learn it is through experience.