Strongly connected components

July 29, 2014

I recently came across the interesting problem of dependency management in a build system. The basic idea is this:

If you are given a set of software modules to build, where a module may depend on other modules, how would you define an algorithm for building all of the modules while satisfying the module dependencies?

I have thought about this problem before, in relation to package management in Linux distributions, but didn't spend the time to explore the underlying algorithms. This article explores three fundamental algorithms that concern the detection of cyclic dependencies.

Representing dependencies

When there are no module dependencies, we can do a linear sweep through all of the modules, building them one after the other. When there are non-cyclic dependencies, we can use a depth-first search algorithm to traverse the dependency graph, building a module when all of the modules it depends on have already been built, recursively branching out when there are unmet dependencies. What is interesting about this problem is the special case when there are cyclic dependencies. The following is an example of cyclic dependencies:

Example graph with cyclic dependencies

Formally, the dependencies between modules can be represented by a directed graph. In the algorithms that follow, we use an adjacency list representation of the dependencies, as opposed to an adjacency matrix representation. This allows us to efficiently traverse all of the connected modules starting from a given module. The following is a simple representation of the dependencies in our example above:

#define MAX_DEGREE 4

struct vertices_s {
    int deg;              /* number of edges incident on this vertex */
    int adj[MAX_DEGREE];  /* adjacent vertex labels; negative = incoming */
} vertices[] = {
    {3, {2, -3, 4}},
    {2, {-1, 3}},
    {3, {1, -2, 7}},
    {3, {-1, -5, 6}},
    {2, {4, -7}},
    {3, {-4, 7, -8}},
    {4, {-3, 5, -6, -12}},
    {3, {6, -9, 11}},
    {2, {8, -10}},
    {3, {9, -11, -12}},
    {3, {-8, 10, 12}},
    {3, {7, 10, -11}}
};

Please note that, to simplify the code, we have not encapsulated the above data structure inside a graph abstraction. Furthermore, we use static arrays rather than dynamic memory allocation, again for simplicity. Any practical implementation should use proper encapsulation and dynamic memory allocation. In this article, our aim is to distill the algorithms down to their gist.

To represent the dependency graph, we first label each of the vertices sequentially starting at 1. As will become clear later on, we do not start labelling at 0 so that we can use the sign of the label to encode the directionality of the edges. For the entire graph, we maintain an array of vertices, where for each vertex we store the degree (the number of edges that connect to the vertex) followed by the corresponding adjacency list. The adjacency list is itself stored as an array of vertex labels. Hence, the first array element:

    {3, {2, -3, 4}},

corresponds to the vertex with label 1 (note that array indexing begins at 0). It means that vertex 1 is touched by 3 incoming or outgoing edges. The adjacent vertices that correspond to each of these edges are 2, 3 and 4. Where a label is negative, the edge is incoming, emanating from that vertex. Hence, (1, 2), (1, 4) and (3, 1) are valid edges of the graph; whereas, (1, 3) is not a valid edge.

Cyclic dependency

So, how do we deal with cyclic dependencies? Well, what do we mean by a cyclic dependency? Two modules are cyclically dependent if there is a dependency path that leads from the first module to the second, and vice versa. For instance, in the example graph above, modules 1, 2 and 3 are cyclically dependent. If we were to do a recursive dependency expansion on any one of these modules, we would get into an infinite expansion loop. This is why it is important to be able to detect cyclically dependent modules.

Formally, a cyclic dependency is defined by what is known as a strongly connected component of a graph. A strongly connected component is a sub-graph of the original graph such that there is a path from each of the vertices in the sub-graph to every other vertex in the sub-graph. Hence, in our example graph above, we have three strongly connected components:

[1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11, 12] 

A directed graph whose strongly connected components are all of order one is a directed acyclic graph; the order of a graph (or sub-graph) is the number of vertices it contains. In other words, if all of the strongly connected components of a directed graph have only one vertex, there are no cycles in the graph.

Translating this to our module dependency problem, we can detect cyclic dependencies by identifying strongly connected components that have more than one module in them.

The algorithms

There are three algorithms for identifying strongly connected components in a directed graph. These are, in chronological order of publication:

Tarjan's algorithm
Dijkstra's algorithm
Kosaraju's algorithm
The following implementations include our dependency graph above as an example use case. Personally, Tarjan's algorithm felt quite natural to implement as I was able to easily track the steps using pen-and-paper. With Kosaraju's algorithm, the trick was to invert the sign of the labels in the adjacency list to carry out the second depth-first search without actually creating a transpose of the original graph. Dijkstra's algorithm was slightly more complicated as it uses two stacks. Tracking the steps with pen-and-paper was a challenge.

I hope that you found this article useful.

Tarjan's algorithm

Dijkstra's algorithm

Kosaraju's algorithm
