← Back to Blog
Architecture

Directed Acyclic Graphs (DAGs) and topological sort

Understanding DAGs: why cycles are a problem, how topological sort determines task execution order, and how it unlocks parallelization.

📅 ✍️ Antoine Coulon
daggraphstopological-sorttask-orchestrationparallelization

Behind most systems that automate the execution of tasks (databases, CI/CD pipelines, data processing engines, compilers, runtimes) lies the same data structure: the directed acyclic graph, or DAG. It’s this structure, and the guarantee it provides, that makes possible optimizations as essential as parallelization or the reliable resolution of a dependency order. But you still need to understand why acyclicity isn’t a theoretical detail, and how you actually derive a concrete execution order from it.

What is a DAG?

A DAG is, quite simply, a directed graph that contains no cycle, no “loop”. Every edge has a direction, and by following those directions it’s impossible to return to a node you’ve already visited.

This absence of a cycle is precisely what distinguishes a DAG from any other directed graph. A cycle can form in two ways:

What’s the problem with cycles?

Cycles aren’t just a graph-theory curiosity: they introduce very concrete problems when resolving and interpreting a dependency graph.

The direct consequence is that a graph containing a cycle becomes, at best, partially interpretable, and at worst entirely impossible to resolve. How do you decide that A must run before B if B itself must run before A? The question has no answer. There are, of course, strategies to detect and break these cycles, but they all come with a non-negligible cost, in compute time as well as in implementation complexity.

That’s why guaranteeing acyclicity up front is so valuable: it turns a potentially unsolvable problem into one that always admits a solution.

Why DAGs are at the heart of task orchestration

One of the domains that relies most directly on DAGs, and not the least of them, is task orchestration. This covers a wide variety of systems:

Any system that automates task execution relies, directly or indirectly, on the properties a DAG offers. Without them, it would be deprived of a great many optimizations, parallelization first and foremost, which are often the very reason these systems are interesting in the first place.

A DAG-based execution model provides two fundamental guarantees: it ensures that all the dependencies a node requires are satisfied before executing it, and that the dependency order is strictly respected. These two properties are the foundation on which the correctness and efficiency of orchestration rest.

How do you determine the task execution order?

Once you have a DAG, the remaining task is to extract a usable execution order from it. Several DAG-specific algorithms can do this; one of the most popular is topological sort.

Topological sort produces a valid sequence of operations within a DAG, respecting the dependency order: for every “a node must precede another” dependency, the sequence places the first before the second.

Take a graph made of three nodes A, B, and C, where the notation A → B means “A depends on B”. With the dependencies A → B and B → C, topological sort tells us that C must run first, then B, then finally A.

By analyzing a graph’s nodes and their dependencies this way, you determine not only the execution order but also the nature of the applicable operation: sequential or parallel. In the example above, no parallelization is possible: A depends on B, which itself depends on C. Everything must unfold strictly sequentially. If, however, A depended on both B and C, but B and C were independent of each other, topological sort would reveal that B and C can be executed in parallel before moving on to A, exactly the kind of optimization the structure makes possible.

Topological sort in practice

A classic way to compute a topological sort is Kahn’s algorithm, which relies on each node’s in-degree, i.e. the number of dependencies not yet resolved. You start with the nodes that have no dependency, then “peel away” the graph:

type Graph = Map<string, string[]>;

/**
 * Returns a valid execution order (Kahn's algorithm).
 * Throws if the graph contains a cycle: it's precisely
 * the acyclicity property that guarantees an order exists.
 */
function topologicalSort(graph: Graph): string[] {
  const inDegree = new Map<string, number>();
  for (const node of graph.keys()) inDegree.set(node, 0);
  for (const deps of graph.values()) {
    for (const dep of deps) {
      inDegree.set(dep, (inDegree.get(dep) ?? 0) + 1);
    }
  }

  // Nodes with no incoming dependency are ready to be processed.
  const ready = [...inDegree].filter(([, d]) => d === 0).map(([n]) => n);
  const ordered: string[] = [];

  while (ready.length > 0) {
    const node = ready.shift()!;
    ordered.push(node);

    for (const dep of graph.get(node) ?? []) {
      const next = inDegree.get(dep)! - 1;
      inDegree.set(dep, next);
      if (next === 0) ready.push(dep);
    }
  }

  if (ordered.length !== graph.size) {
    throw new Error("Cycle detected: no topological order exists.");
  }

  return ordered;
}

The detail to remember isn’t so much the algorithm itself as what it reveals: if, at the end of the traversal, not all nodes have been ordered, it means some nodes’ in-degree never reached zero, in other words, a cycle. Cycle detection is therefore not a separate step: it falls out naturally from the algorithm. It’s the concrete illustration of the link between acyclicity and solvability.

Conclusion

The DAG isn’t an abstraction reserved for algorithms courses: it’s the silent foundation of everything that orchestrates tasks. Its central property, the absence of a cycle, is what turns a dependency graph into an always-solvable problem. Topological sort is its practical corollary: it translates this structure into a concrete execution order, while revealing opportunities for parallelization and flagging, where relevant, circular dependencies.

Understanding this mechanism means better grasping what happens under the hood of the compilers, CI/CD pipelines, and orchestrators we use every day, and, the day you design your own execution engine, knowing which guarantees to rely on.