I recently came across the interesting problem of dependency management in a build system. The basic idea is this:

Given a set of software modules to build, where a module may depend on other modules, how would you define an algorithm that builds all of the modules while satisfying their dependencies?

I have thought about this problem before, in relation to package management in Linux distributions, but never spent the time to explore the underlying algorithms. This article explores three fundamental algorithms that concern the detection of cyclic dependencies.

## Representing dependencies

When there are no module dependencies, we can do a linear sweep
through all of the modules, building them one after the other. When
there are non-cyclic dependencies, we can use a *depth-first search*
algorithm to traverse the dependency graph, building a module when all
of the modules it depends on have already been built, recursively
branching out when there are unmet dependencies. What is interesting
about this problem is the special case where there are cyclic
dependencies. The example graph used throughout this article contains
twelve modules, several of which depend on one another cyclically.

Formally, the dependencies between modules can be represented by a directed graph. In the algorithms that follow, we use an adjacency list representation of the dependencies, as opposed to an adjacency matrix representation. This allows us to efficiently traverse all of the connected modules starting from a given module. The following is a simple representation of the dependencies in our example above:

```
#define MAX_DEGREE 4

struct vertices_s {
    int deg;              /* number of edges touching this vertex */
    int adj[MAX_DEGREE];  /* signed labels of the adjacent vertices */
} vertices[] = {
    {3, {2, -3, 4}},      /* vertex 1 */
    {2, {-1, 3}},         /* vertex 2 */
    {3, {1, -2, 7}},      /* vertex 3 */
    {3, {-1, -5, 6}},     /* vertex 4 */
    {2, {4, -7}},         /* vertex 5 */
    {3, {-4, 7, -8}},     /* vertex 6 */
    {4, {-3, 5, -6, -12}},/* vertex 7 */
    {3, {6, -9, 11}},     /* vertex 8 */
    {2, {8, -10}},        /* vertex 9 */
    {3, {9, -11, -12}},   /* vertex 10 */
    {3, {-8, 10, 12}},    /* vertex 11 */
    {3, {7, 10, -11}}     /* vertex 12 */
};
```

Please note that, to keep the code simple, we have decided not to
encapsulate the above data structure inside a graph structure.
Furthermore, we use static arrays rather than dynamic memory
allocation. Any practical implementation should use proper
encapsulation and dynamic memory allocation; in this article, our aim
is to distill the algorithms down to their *gist*.

To represent the dependency graph, we first label each of the vertices
sequentially starting at 1. As will become clear shortly, we do not
start labelling at 0, so that we can use the sign of the label to
encode the directionality of the edges. For the entire graph, we
maintain an array of vertices, where for each vertex we store first
the *degree* (the number of edges that connect to the vertex) and then
the corresponding adjacency list. The adjacency list is itself stored
as an array of vertex labels. Hence, the first array element:

```
{3, {2, -3, 4}},
```

corresponds to the vertex with label `1` (note that array indexing
begins at 0). It means that vertex `1` is touched by 3 incoming or
outgoing edges. The adjacent vertices that correspond to each of these
edges are `2`, `3` and `4`. Where the label is negative, we have an
incoming edge emanating from that vertex. Hence, `(1, 2)`, `(1, 4)`
and `(3, 1)` are valid edges of the graph; whereas `(1, 3)` is not a
valid edge.

## Cyclic dependency

So, how do we deal with cyclic dependencies? Well, what do we mean by
a *cyclic dependency*? Two modules are cyclically dependent if there
is a dependency path that leads from the first module to the second,
and vice versa. For instance, in the example graph above, `1`, `2` and
`3` are cyclically dependent. If we were to do a recursive dependency
expansion on any one of these modules, we would get into an infinite
expansion loop. This is why it is very important to be able to detect
cyclically dependent modules.

Formally, a cyclic dependency is defined by what is known as a
*strongly connected component* of a graph. A strongly connected
component is a sub-graph of the original graph such that there is a
path from each vertex in the sub-graph to every other vertex in the
sub-graph. Hence, in our example graph above, we have three strongly
connected components:

```
[1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11, 12]
```

A directed graph where all of its strongly connected components are of
*order* one is an acyclic directed graph. In other words, if every
strongly connected component of a directed graph has only one vertex,
there are no cycles in the graph. (The *order* of a graph is the
number of vertices in the graph.)

Translating this to our module dependency problem, we can detect cyclic dependencies by identifying strongly connected components that have more than one module in them.

## The algorithms

There are three classical algorithms for identifying the strongly connected components of a directed graph. These are, in chronological order of publication:

- Tarjan's algorithm (1972)
- Dijkstra's path-based algorithm (1976)
- Kosaraju's algorithm (1978)

The following implementations include our dependency graph above as an
example use case. Personally, Tarjan's algorithm felt quite natural to
implement as I was able to easily track the steps using
pen-and-paper. With Kosaraju's algorithm, the trick was to invert the
sign of the labels in the adjacency list to carry out the second
depth-first search without actually creating a
transpose of the
original graph. Dijkstra's algorithm was slightly more complicated
as it uses two stacks. Tracking the steps with pen-and-paper was
a challenge.

I hope that you found this article useful.