Continuous integration/Docker/Dockerfiles

From mediawiki.org

Dockerfile syntax is simple, but that simplicity hides many pitfalls for the unwary. This page outlines some best practices for writing Dockerfiles and building Docker images.

Other useful guides

Keep images lean

Don't add packages to an image that are not needed to run a container. If you do need to troubleshoot a container, troubleshooting tools can be added at container runtime. For instance, adding a text editor to a base image would be a bad idea.
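As a minimal sketch of the runtime approach, a troubleshooting tool can be installed into a running container's own writable layer instead of into the image. This assumes a Debian-based container that is already running, that docker exec may run as root inside it, and that it has network access; the function name and the choice of vim are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: add an editor to a *running* container for debugging.
# Assumes a Debian-based image (apt-get present) and network access.
# The package lands in the container's writable layer only; the image,
# and every other container started from it, stays lean.
debug_with_editor() {
    container="$1"
    docker exec --user root "$container" sh -c \
        'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes vim'
    # Open an interactive shell in the same container to troubleshoot.
    docker exec --interactive --tty "$container" /bin/sh
}
```

For example, debug_with_editor my-container (a hypothetical container name). When that container is removed, the editor goes with it.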

Minimize image layers

In general, each instruction in your Dockerfile creates a new layer in the image, which increases the VirtualSize of your image and of the containers created from it.

Later layers cannot reclaim space added by earlier layers: deleting a file in a subsequent layer only hides it, while its bytes remain in the layer that created it. It is therefore best practice to keep the number of layers to a minimum and to clean up resources within the same layer that creates them.

Consider the following examples:

Warning: Don't do this:
FROM docker-registry.wikimedia.org/wikimedia-jessie:latest
# Layer: installs git; the apt package lists stay in this layer.
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --yes git
# Layer: adds the cloned repository.
RUN git clone --depth 1 https://gerrit.wikimedia.org/r/integration/composer.git /srv/composer
# Layer: deleting the apt lists here cannot shrink the first RUN layer.
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --yes ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

vs

FROM docker-registry.wikimedia.org/wikimedia-jessie:latest
# A single layer: install, clean up and clone before the layer is committed,
# so the apt package lists never persist in the image.
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --yes git ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    git clone --depth 1 https://gerrit.wikimedia.org/r/integration/composer.git /srv/composer

When you inspect the VirtualSize of the images built from these Dockerfiles, you can see that the image with more layers is larger by about 12MB:

$ printf "Fewer layers:\t%sMB\nMore layers:\t%sMB\n" \
    $(units -t $(docker inspect --format '{{.VirtualSize}}bytes' example/good:layers) megabytes) \
    $(units -t $(docker inspect --format '{{.VirtualSize}}bytes' example/bad:layers) megabytes)
Fewer layers:   169.92271MB
More layers:    181.51308MB
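The command above depends on GNU units being installed. A minimal sketch of an alternative that only needs awk, assuming decimal megabytes (1 MB = 1,000,000 bytes, the unit units uses for "megabytes"), could look like this. The helper names are hypothetical, and reading the .Size field is an assumption: on recent Docker releases VirtualSize is deprecated, and .Size is expected to hold the same total, but check this against your Docker version.

```shell
#!/bin/sh
# Convert a byte count to decimal megabytes (1 MB = 1,000,000 bytes),
# printed with two decimals and an "MB" suffix.
bytes_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.2fMB\n", b / 1000000 }'
}

# Print the total size of an image in MB. Assumes Docker is installed and
# that .Size holds the full image size (as VirtualSize did).
image_size_mb() {
    bytes_to_mb "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Usage (requires the example images to exist locally):
#   printf 'Fewer layers:\t%s\nMore layers:\t%s\n' \
#       "$(image_size_mb example/good:layers)" "$(image_size_mb example/bad:layers)"
```

For instance, bytes_to_mb 169922710 prints 169.92MB.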

The size discrepancy arises because example/bad:layers has an intermediate layer in which all of the apt information still exists; the cleanup in a later RUN instruction cannot remove it from that layer. This is easy to see with the dockviz tool:

$ dockviz images -n -t docker-registry.wikimedia.org/wikimedia-jessie:latest
└─sha256:a81cc7ec7998d634eb89e76caa53aad876bf4cfc92bd3953e8c57ed0350cf322 Virtual Size: 80.4 MB Tags: docker-registry.wikimedia.org/wikimedia-jessie:latest
  ├─sha256:41eb6982c569e7da13bab1896f312ef8e17503bb96390a56153dc7b48a93a588 Virtual Size: 169.9 MB Tags: example/good:layers
  └─sha256:71db73ed96ed52fd3ba1ed3e66a8b25fc83e8c5b36326d4d7d950538e779a48e Virtual Size: 171.6 MB
    └─sha256:a5e177cacca18ea56dd5e24db6de2df57df850c002dd8a062c0804cdcdfbdcf2 Virtual Size: 181.5 MB
      └─sha256:b49b5667a3bd49cd6642e3f77f8cf2b8f301bab7da251144e0d7dc6fbf2334af Virtual Size: 181.5 MB Tags: example/bad:layers
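If dockviz is not available, Docker's own docker history command gives a similar per-layer view. A minimal sketch with a hypothetical helper name, assuming Docker is installed, that prints each layer's size next to the instruction that created it:

```shell
#!/bin/sh
# Print one line per layer of an image: its size, then the Dockerfile
# instruction that created it (newest layer first). Assumes Docker is installed.
show_layers() {
    docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}
```

Running show_layers example/bad:layers should attribute most of the extra size to the first RUN layer, while the final RUN containing rm -rf /var/lib/apt/lists/* cannot make that earlier layer any smaller.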

The image layer cache is not your friend

The layer cache produces a lot of unintuitive behavior. In general, Docker steps through each instruction in your Dockerfile and searches the layer cache for a layer that was created by the same instruction on top of the same parent layer. For most instructions, such as RUN, only the text of the instruction is compared. For COPY (and ADD) instructions, Docker also compares the contents of the files being copied with the files in the cached layer.
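A small experiment makes the COPY check visible. This sketch assumes Docker is installed and that the base image used elsewhere on this page can be pulled; the function name, file name and tag are hypothetical. It builds a throwaway context three times: the second build should reuse the cached COPY layer, and the third, after the file's contents change, should not, even though the instruction text is identical.

```shell
#!/bin/sh
# Hypothetical demo: COPY is cached by file contents, not just by instruction text.
demo_copy_cache() {
    dir=$(mktemp -d)
    printf 'one\n' > "$dir/config.txt"
    cat > "$dir/Dockerfile" <<'EOF'
FROM docker-registry.wikimedia.org/wikimedia-jessie:latest
COPY config.txt /srv/config.txt
EOF
    # 1st build: nothing is cached yet, so every step runs.
    docker build --tag example/cache-demo "$dir"
    # 2nd build: same instruction, same file contents, so the COPY step is cached.
    docker build --tag example/cache-demo "$dir"
    # 3rd build: same instruction, different contents, so COPY runs again.
    printf 'two\n' > "$dir/config.txt"
    docker build --tag example/cache-demo "$dir"
    rm -rf "$dir"
}
```

The build output marks reused steps as cached; the exact wording differs between the classic builder and BuildKit.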

The consequences of this behavior are not always immediately evident. For instance, suppose Debian released a security fix for the latest version of git. Rebuilding an image from a Dockerfile that includes RUN apt-get update && apt-get install git would not be sufficient to ensure that the fixed git package ends up in the resulting image: the instruction text has not changed, so Docker reuses the cached layer and apt-get never runs again. Running docker build with the --no-cache option is the easiest way to ensure that the layer cache on a machine does not cause unintended consequences.
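A minimal sketch of such a rebuild, with a hypothetical helper name, tag and build-context path. --no-cache forces every instruction, including RUN apt-get update, to execute again. --pull is an optional extra that Docker documents as always attempting to pull a newer version of the base image.

```shell
#!/bin/sh
# Rebuild an image without reusing any cached layers.
#   $1: tag for the resulting image (e.g. a hypothetical example/good:layers)
#   $2: build context directory (defaults to the current directory)
rebuild_fresh() {
    # --no-cache: re-run every instruction, so apt-get update fetches current lists.
    # --pull: also try to pull a newer version of the FROM image.
    docker build --no-cache --pull --tag "$1" "${2:-.}"
}
```

For example, rebuild_fresh example/good:layers . rebuilds the image in the current directory from scratch. Expect it to be slower than a cached build, since nothing is reused.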

Prefer COPY to ADD

From the Dockerfile Best Practices guide:

Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /.
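To illustrate the guideline, here is a minimal sketch, assuming a POSIX shell with tar and gzip available, that prepares a hypothetical build context. It brings in a plain configuration file with COPY, and a local tarball with ADD so that it is unpacked into the image. It uses a gzip-compressed tarball rather than the .tar.xz in the quote, since ADD auto-extracts gzip, bzip2 and xz archives alike. All file names and paths are illustrative.

```shell
#!/bin/sh
# Prepare a build context in directory $1 that uses COPY for a plain file
# and ADD only for local tar auto-extraction. All file names are hypothetical.
make_add_copy_context() {
    dir="$1"
    mkdir -p "$dir/rootfs/srv"
    printf 'hello\n' > "$dir/rootfs/srv/hello.txt"
    # Pack the directory tree; ADD will unpack it at / in the image.
    tar -C "$dir/rootfs" -czf "$dir/rootfs.tar.gz" srv
    printf 'setting=1\n' > "$dir/app.conf"
    cat > "$dir/Dockerfile" <<'EOF'
FROM docker-registry.wikimedia.org/wikimedia-jessie:latest
# Plain file: COPY is the transparent choice.
COPY app.conf /etc/app.conf
# Local tarball: ADD extracts it, so /srv/hello.txt exists in the image.
ADD rootfs.tar.gz /
EOF
}
```

For example, make_add_copy_context /tmp/add-vs-copy && docker build --tag example/add-vs-copy /tmp/add-vs-copy. Had the tarball been brought in with COPY, the image would contain the archive itself rather than its extracted contents.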