The production mistake to avoid: PID1 in a container

You carefully handled your application’s termination signals, set up a flawless graceful shutdown, and yet in production your container still refuses to die cleanly: it takes ten seconds to go down, then gets forcibly killed by the orchestrator. The culprit isn’t your application code. It’s hiding one level lower, in the way your process is launched inside the container, and more precisely in the very peculiar role of PID1.

In the first part of this series, we saw why it’s essential to correctly handle termination signals (SIGTERM, SIGINT) to achieve a graceful shutdown. But that work rests on a prerequisite that often goes unspoken: your application process still has to actually receive those signals. It sounds obvious, and yet that’s exactly where everything falls apart.

The shell-form ENTRYPOINT trap

There are many ways to launch an application inside a container, and several of them end up propagating signals incorrectly. The most common example involves the ENTRYPOINT instruction of a Dockerfile.

Docker offers two forms for ENTRYPOINT (and CMD):

the shell form: ENTRYPOINT node index.js
the exec form: ENTRYPOINT ["node", "index.js"]

When the shell form is used, Docker doesn’t run your application directly. It first launches a shell (/bin/sh -c), and it’s that shell that becomes the container’s PID1. Your application, for its part, is started as a child process of that shell.

The problem is mechanical: a shell invoked this way does not forward the signals it receives to its children. When the orchestrator sends SIGTERM to the container, the signal is delivered to PID1 (that is, the shell), and your application never sees it. All the graceful-shutdown code you wrote goes unused, and once the grace period expires the container is killed outright with SIGKILL.

One might be tempted to conclude that the ideal solution is simply to remove the intermediary between the containerized application and the orchestrator, by switching to the exec form so that the application itself becomes PID1. It’s an improvement, but it’s not the right line of reasoning.

The real problem isn’t the intermediary

The underlying problem isn’t that there’s an intermediary in the container. Nor is it solely about signal propagation. The heart of the matter is the very nature of PID1 and the responsibilities the kernel assigns to it.

Correctly managing a containerized application in production follows a precise rule:

The “init” process (that is, PID1) must be able to correctly assume the responsibilities the kernel entrusts to it.

To understand why, we need to step back for a moment to the fundamentals of an operating system and the organization of its processes.

PID1, the root of the process tree

On a Unix-like operating system, the set of all processes is organized as a tree. The root node of that tree carries the identifier 1: this is PID1, commonly called “init”. All other processes descend from it, directly or indirectly.

Unlike a run-of-the-mill process, the “init” process is entrusted by the kernel with some very particular responsibilities:

Initializing system services. It’s the one that starts all the processes required by the operating system at boot.
Adopting and cleaning up zombie and orphan processes. An orphan process is a process whose parent has terminated; it is then “re-adopted” by PID1. A zombie process is a process that has finished executing but whose entry persists in the process table until its parent reads its exit status with a wait call. Its memory has already been released, but the entry and its PID stay allocated. PID1’s role is to “reap” the zombies it inherits in order to free those entries.

In the specific context of containers, a third responsibility is added: the correct propagation of signals to child processes, precisely what was lacking with the shell form.
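The zombie state described above can be observed directly. The following sketch assumes Linux, since it reads the process state from /proc; proc_state and observe_zombie are hypothetical helper names chosen for illustration.

```python
import os
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; comm may contain spaces,
            # so split after the last closing parenthesis.
            return f.read().rpartition(")")[2].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return None


def observe_zombie(timeout: float = 2.0) -> Tuple[Optional[str], Optional[str]]:
    """Fork a child that exits at once; return its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately

    # The child has exited, but nobody has read its exit status yet:
    # wait until the kernel reports it as a zombie ("Z").
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)

    os.waitpid(pid, 0)  # the parent reads the exit status: the zombie is reaped
    after = proc_state(pid)
    return before, after
```

Calling observe_zombie() returns the state seen before and after the wait call. It is that wait call that an orphaned zombie never gets when PID1 doesn’t do this housekeeping.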

Why your application must not be PID1

We can therefore see why making your application PID1 is not the right solution. Most application runtimes (Node.js, Python, the JVM…) were never designed to assume the role of init. They generally don’t reap orphaned processes that get re-parented to them, and their default signal handling differs from what’s expected of a true init.

Concretely, if your application is PID1:

orphaned descendants of the sub-processes it launches (a system call, a script, an external tool) are re-parented to it and, once they exit, are never reaped: the resulting zombies can accumulate until they exhaust the PIDs available to the container;
signal semantics can be surprising: the kernel treats PID1 specially, and a signal whose default action would terminate the process (SIGTERM, for instance) is simply ignored unless the process has installed a handler for it.

None of these responsibilities can, or should, rest on your application. So you need a real init process at the top of the tree, and you attach your application to it as a child process. Not to add yet another intermediary, but to entrust the role of PID1 to a program built for it.
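One cheap way to catch the mistake early is a startup check that warns when the application finds itself running as PID1. This is a hypothetical helper (pid1_warning is not part of any library), shown here in Python as a sketch; the same idea applies in any runtime that exposes the current process ID.

```python
import os
from typing import Optional


def pid1_warning(pid: Optional[int] = None) -> Optional[str]:
    """Return a warning message if the given (or current) process is PID 1."""
    pid = os.getpid() if pid is None else pid
    if pid != 1:
        return None  # a parent process (ideally a real init) sits above us
    return (
        "running as PID 1: no init process is in place, so orphaned "
        "children will not be reaped and unhandled signals may be ignored"
    )
```

At startup, the application could log the returned message when it is not None. Inside a container started with an init process, the check stays silent.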

The solution: Tini

Several solutions exist to provide this minimal init process, but the reference in the field is Tini.

Tini has a single objective: to provide an “init” process that behaves exactly as you’d expect from a PID1. It does three things, and it does them well:

it correctly forwards the signals it receives to your application;
it reaps zombie processes to keep them from accumulating;
it stays extremely lightweight, without imposing any application logic.
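To make those three points concrete, here is a minimal sketch of such an init loop in Python. It is not Tini’s source code: mini_init and FORWARDED are hypothetical names, the list of forwarded signals is illustrative, and Linux is assumed because the sketch relies on sigtimedwait.

```python
import os
import signal
from typing import Sequence

# Signals relayed to the application (illustrative selection).
FORWARDED = {signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT}


def mini_init(argv: Sequence[str]) -> int:
    """Run argv as a child, forward signals to it, reap every child.

    Returns the application's exit code (128 + N if killed by signal N).
    """
    # Block the signals so they queue up instead of killing this process.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, FORWARDED)
    try:
        app_pid = os.fork()
        if app_pid == 0:
            # Child: make the signals deliverable again, then become the app.
            signal.pthread_sigmask(signal.SIG_UNBLOCK, FORWARDED)
            try:
                os.execvp(argv[0], list(argv))
            finally:
                os._exit(127)  # only reached if exec failed

        while True:
            # 1. Wait briefly for a signal and relay it to the application.
            info = signal.sigtimedwait(FORWARDED, 0.1)
            if info is not None:
                try:
                    os.kill(app_pid, info.si_signo)
                except ProcessLookupError:
                    pass  # the application is already gone

            # 2. Reap every child that has exited, adopted orphans included.
            while True:
                try:
                    pid, status = os.waitpid(-1, os.WNOHANG)
                except ChildProcessError:
                    return 1  # no children left to supervise
                if pid == 0:
                    break  # children exist, but none has exited yet
                if pid == app_pid:
                    if os.WIFSIGNALED(status):
                        return 128 + os.WTERMSIG(status)
                    return os.WEXITSTATUS(status)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

A wrapper script would call something like sys.exit(mini_init(["node", "index.js"])). In practice there is no reason to write this yourself: the point is only to show how small and application-agnostic the job is.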

It’s a standalone executable, but (and this is important) it’s been bundled by default in Docker since version 1.13. You can therefore enable it without even installing it:

on the command line, with the docker run --init flag;
with Docker Compose, by adding init: true to a service’s configuration (the option was added in version 2.2 of the Compose file format).

Explicit integration in a Dockerfile

If you prefer to explicitly control the init process in your image (for example, to avoid depending on the --init flag at runtime), you can install Tini and use it directly as the ENTRYPOINT, in exec form:

# Dockerfile

FROM node:18-alpine

RUN apk add --no-cache tini

# Copy app files...

ENTRYPOINT ["/sbin/tini", "--", "node", "index.js"]

Two details are worth highlighting in this ENTRYPOINT:

the exec form (["...", "..."]) is used, so Docker starts Tini directly, with no intermediate shell in front of it;
Tini becomes PID1 and, thanks to the -- separator, launches node index.js as a child process. From now on it’s Tini that receives the orchestrator’s signals and relays them cleanly to your application, while also taking care of cleaning up zombies.

Conclusion

Crafting your application’s graceful shutdown is essential, but it’s wasted effort if the signals never reach it. In a container, the responsibility for receiving, handling, and propagating those signals, like that of cleaning up zombie processes, falls to PID1, and your application isn’t built for that role.

The good practice fits in one sentence: entrust PID1 to a real init process. With Docker, that often takes no more than a simple --init flag, or a Tini ENTRYPOINT in exec form in your Dockerfile. A seemingly trivial detail, but one that makes all the difference between a container that shuts down cleanly and another that gets forcibly killed on every deployment.