At Structural, we use Docker to automate the deployment and scaling of our infrastructure. From that experience, we’ve learned a few things about building rad applications in the cloud.

Ship Your Dockerfiles With Your Project

Dockerfiles fundamentally have a hard dependency on your source code. As such, they should ship with that source code, not be stored somewhere else like a separate “infrastructure” GitHub repository. Preferably, they should even be in the root directory of your project, but that’s a less interesting recommendation.
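
For example, a typical layout might look like this (the names are just placeholders):

my-app/
  Dockerfile
  package.json
  src/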

Your Basic Dockerfile Format

Most Dockerfiles you’ll write will follow this general format:

FROM a version-tagged image…
# Copy your source code in…
# Start your app.

I’ve seen Dockerfiles that clone their source code from GitHub during the build process. This is usually unnecessary and complicates things. If your Dockerfile ships with your source code, then logically any machine that has the Dockerfile already has your source code.
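
As a concrete sketch, here’s what that format might look like for a simple Node app, ignoring dependency installation for now (the image tag and file paths are placeholders, not a prescription):

# Pin a specific version tag, not latest
FROM node:8
WORKDIR /app
# Copy your source code in
COPY . /app
# Start your app
CMD ["node", "src/index.js"]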

We’ll get more complicated as we go on. I promise.

What Doesn’t Go In A Dockerfile?

  • Secrets. Any super special secrets should be provided by the environment (as environment variables or otherwise). You don’t commit secrets to GitHub, and you shouldn’t put them in your Dockerfile.

  • Stage-Specific Config. Don’t include anything like ENV NODE_ENV production. Your Dockerfiles should always produce an image that can be shared between any environment or stage. Anything stage-specific should be provided by the environment; as a simple example, you can provide it as an argument to docker run -e (there’s an example after this list).
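
For instance, the same image can run in any stage by injecting both kinds of config at runtime (the variable names and image tag here are made up for illustration):

docker run -e NODE_ENV=production -e API_KEY="$API_KEY" my-app:3a5f9c1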

Copy Dependencies In?

You’re building a node app. Do you copy node_modules in with the rest of your code, or do you npm install during the build process?

In general, I prefer running npm install inside the build.

  • It’s simpler. If you copy node_modules into the build, you’ll need to push additional configuration into your build environment. You might need to add additional steps to the build process, like a production prune and a native module rebuild (see the sketch below).

  • Your builds aren’t really faster. You’re probably not storing node_modules in the git repo. You’re probably using CI. Thus, your CI environment will have to run npm install if you want to copy dependencies in. If the choice is between “run it in the environment” and “run it in the build”, there’s little performance difference.

  • You’re still dependent on npm. For the same reason as the last point. If you use a clean CI environment, someone has to run npm install.

  • Caching is mostly unaffected. One argument for copying dependencies from the environment is that Docker can cache the intermediate image after a COPY when the copied files haven’t changed. But Docker caches RUN steps too, as long as nothing before them has changed: copy package.json in before npm install (see Cache Cash below) and the expensive install layer is reused whenever package.json is untouched. Either way, the step that copies your source code in will miss the cache whenever your code changes, so in practice the two approaches cache about equally well.

This advice only applies to projects which don’t ship dependencies with the git repo.
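
For a sense of what that extra configuration looks like, a CI script that copies node_modules into the image might need something like this before docker build (a rough sketch of the approach I’m arguing against, not a recommendation):

npm install
# Strip devDependencies so they don't end up in the image
npm prune --production
# Recompile any native modules
npm rebuild
docker build -t my-app .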

Ship Build Artifacts?

Let’s say you’ve got a Go app. Where do you run go build: outside or inside of your docker build? There are two schools of thought on this.

  • Running go build outside the build means you get a ridiculously small docker image. You could even use FROM scratch, which means the resulting image is only slightly larger than the size of the binary.

  • Running go build inside the build means your app is 100% reproducible from the source code with just a Dockerfile.

I prefer running it inside the build, for the same reasons I like running npm install inside the build; it means everything I need to get the app built is inside the Dockerfile.

That being said, even Google has examples of doing go build outside of the Dockerfile, so this isn’t a hard recommendation.
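
Here’s a minimal sketch of the “inside the build” approach, assuming a Go-modules project that builds a single binary (the base image tag and binary name are placeholders):

# Pin a specific toolchain version
FROM golang:1.21
WORKDIR /app
# Copy your source code in and build it
COPY . .
RUN go build -o /usr/local/bin/myapp .
# Start your app
CMD ["/usr/local/bin/myapp"]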

Cache Cash

Quiz time. Which of the following partial Dockerfiles is better?

FROM node
COPY package.json .
RUN npm i --production
COPY src .

or

FROM node
COPY src package.json ./
RUN npm i --production

It’s the first one. Docker can cache intermediate images resulting from a COPY if the content of the files you copy doesn’t change, and it can reuse a cached RUN as long as everything before it is unchanged. If I only change something in src, the first Dockerfile allows Docker to cache the first three steps, including the expensive npm i. In the second one, any change to any part of the source code forces a new npm i.

Don’t Fear Large Images

Docker makes really great use of image layering and caching. Starting with a large image like node isn’t ideal, but it does make your life easier and pushes a lot of that image maintenance to people who probably know a bit more about it than you do.

The trade-off is a large image upload to your Docker registry, but you only eat most of that cost during the first push. Thanks to image layer caching, every subsequent push should be much smaller, mostly dependent on the size of your app.

If you’re shipping a static binary, such as one from Go, you might be tempted to use the scratch image. I’d suggest avoiding this; it’s better to go with something like busybox, which is still pretty tiny but also gives you utilities to debug the container as it runs if you ever need to.
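
If you do go the prebuilt-binary route, the busybox version is barely bigger than scratch (the binary name is a placeholder, and this assumes a statically linked binary built outside the image):

# Assumes ./myapp was already built as a static binary, e.g. with CGO_ENABLED=0 go build
FROM busybox
COPY myapp /usr/local/bin/myapp
CMD ["/usr/local/bin/myapp"]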

Tagging

How should you tag images into your repository? I have a few thoughts on this.

  • Use latest with caution. It’s useful for convenience, but it’s highly important that you always know what “version” of your app is running on your host (whatever “version” means in your app). latest doesn’t let you do that.

  • I like using a simple commit hash. You can get the current commit’s hash with git rev-parse HEAD.

  • Let’s say you want to deploy uncommitted changes to a staging environment. A commit hash now can’t adequately represent what is inside the image. First, you can detect uncommitted changes by checking whether the output of git status -s is empty. Then, I’d personally use something like "staging_$(date -u +"%Y%m%dT%H%M%SZ")", which gives you a docker-safe ISO-8601-like string.
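
Putting those pieces together, a small tagging script might look like this (the registry and image name are placeholders):

# Use the commit hash when the tree is clean, a timestamped staging tag otherwise
if [ -z "$(git status -s)" ]; then
  TAG="$(git rev-parse HEAD)"
else
  TAG="staging_$(date -u +"%Y%m%dT%H%M%SZ")"
fi
docker build -t registry.example.com/my-app:"$TAG" .
docker push registry.example.com/my-app:"$TAG"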

How Do I Get This Image Hosted?

This should be a whole article on its own. But here are a few of my favorite services you can start looking into now.

  • Heroku: Offers pretty complete hosting for Docker images.
  • Hyper: Simpler and cheaper than Heroku.
  • Kubernetes and (optionally) GKE: This is the game changer. Look forward to another article about it!

Docker Is Awesome!

With the right tooling around it, Docker makes deploying and scaling applications highly consistent and reproducible. Every type of app can benefit from Docker, from monoliths to service-oriented architectures. Embrace the hype!