Implementing a Beeswarm plot

April 05, 2013

A Beeswarm plot is a two-dimensional visualisation technique where data points are plotted relative to a fixed reference axis so that no two datapoints overlap. The beeswarm plot is a useful technique when we wish to see not only the measured values of interest for each data point, but also the distribution of these values. An example beeswarm plot is show below:

Example Beeswarm plot

Example Beeswarm plot

Properties of a beeswarm plot

For a set of data points with measurements in two-dimensions, say against the x and y axes, let us assume that the x-axis is plotted horizontally, with values increasing from the left to the right of the axis. Similarly, let us assume that the y-axis is plotted vertically, with values increasing from the bottom to the top.

To make a beeswarm plot, we must first choose an axis that we are interested in. For instance, we are interested in the y-axis values. A beeswarm plot, then, has the following properties:

  • All of the data points in the plot are non-overlapping; hence, they are all visible.
  • The vertical placement of the data point reflects its y-axis value.
  • The horizontal placement, however, is arbitrary and is chosen as such to prevent overlapping. It has no bearing on the x-axis value.
  • The preferred choice is to place a data point closest to the fixed axis.

An incremental algorithm

We describe here an incremental algorithm which plots each data point one after the other. Once a data point has been placed on the plot, it is not altered by subsequent data point placements. We shall refer to this set of plotted data points as the swarm. In other words, building a beeswarm plot involves growing the swarm from an empty set until it includes all of the data points.

This algorithm can be described as follows:

  • First, order the data points by sorting them using the value we are interested in. In our example, we sort the data points in increasing order of y-axis values.

  • Then, for each of the data points, we choose an x-axis value that satisfies the beeswarm properties.

To help us choose an x-axis value for each of the data points, we maintain a data structure referred to as the swarm_boundary. This data structure is simply a subset of the swarm. By using the swarm_boundary, we can efficiently decide if a chosen x-axis value will result in an overlap with any member of the swarm. Furthermore, we also use this data structure to generate a list of possible x-axis values that we can choose from to place the next data point.

Finding possible placements

In our example, we are mainly concerned that the vertical placement of a data point reflects its y-axis values (i.e., its height on the plot). We are less concerned with the horizontal placement, as long as it does not violate the beeswarm properties. Thus, all of the possible placement candidates trace a line that is parallel to the x-axis, and the height of this line corresponds to the y-axis value of the data point to be placed. We shall refer to this line as the placement line.

Valid and invalid placement

Valid and invalid placement

Assuming that each swarm member is plotted as a circle of a certain radius, we can quickly find for each member of the swarm the value interval on the placement line that we must avoid to prevent an overlap. Function find_intersections(circle, height) determines this interval for a swarm member circle on the placement line at height. Since there will be multiple intervals for each of the swarm members, we mark each interval with the associated member index. This will be used later to discard invalid intervals.

Finding invalid interval

Finding invalid interval

    function find_intersections(circle, height) {
        var effective_height = height - circle.y,
        diameter = 2 * circle.radius;
        if (effective_height - diameter > 0)
            return undefined;

        var cx = circle.x, x = Math.sqrt(diameter * diameter
            - effective_height * effective_height), index = circle.index;
        return {
            'p1': {
                'index': index,
                'isEnd': false,
                'isValid': true,
                'x': cx + x,
                'y': height
            },
            'p2': {
                'index': index,
                'isEnd': false,
                'isValid': true,
                'x': cx - x,
                'y': height
            }
        };
    }

Function find_candidate_intervals(height, swarm_boundary) determines all of the x-value intervals that must be avoid for each of the swarm members. Since, the swarm_boundary represents the overlap boundary for new data points, we only need to find the intervals for members of the swarm boundary to prevent an overlap with all members of the swarm. Furthermore, if a swarm boundary member does not intersect with a placement line, we can remove it from the swarm boundary as it will never intersect with future placement lines. This is a consequence of the sorting we did before we began data point placement.

    function find_candidate_intervals(height, swarm_boundary) {
        var i = 0, c = swarm_boundary.length, possible_intervals = [];
        while (c--) {
            var isects = find_intersections(swarm_boundary[i], height);
            if (isects === undefined) {
                swarm_boundary.splice(i, 1);
                continue;
            }
            possible_intervals.push(isects.p1);
            possible_intervals.push(isects.p2);
            ++i;
        }
        return possible_intervals;
    }

We use function get_comparator(p, q) to generate a sorting comparator that compares two properties. This will return a function that compares the values of property p. Where the values of p are the same, it will then compare the values of property q, if property q is defined.

    function get_comparator(p, q) {
        return function(a, b) {
            if (a[p] === b[p]) {
                if (q === undefined)
                    return 0;
                else {
                    if (a[q] === b[q])
                        return 0;
                    if (a[q] < b[q])
                        return -1;
                    return 1;
                }
            }
            if (a[p] < b[p])
                return -1;
            return 1;
        };
    }
Merging valid intervals

Merging valid intervals

Once we have identified all of the x-value intervals to avoid for each member of the swarm boundary, we must merge these intervals so that only the valid placement candidates are left. We do this using the function remove_invalid_intervals(intervals), which accepts a list of invalid x-value intervals and returns a set of acceptable x-value intervals. Note that the bounds of valid intervals are marked with isValid = true, and the upper bound for each interval is marked with isEnd = true.

    function remove_invalid_intervals(intervals) {
        var c = intervals.length, valid_intervals = [];
        if (c < 1)
            return valid_intervals;

        var i, j, k = c - 1;
        intervals.sort(get_comparator('x', 'index'));
        for (i = 0; i < k; ++i) {
            if (intervals[i].isEnd)
                continue;
            for (j = i + 1; j < c; ++j) {
                if (intervals[i].index === intervals[j].index) {
                    intervals[j].isEnd = true;
                    break;
                } else
                    intervals[j].isValid = false;
            }
        }
        for (i = 0; i < c; ++i)
            if (intervals[i].isValid)
                valid_intervals.push(intervals[i]);
        return valid_intervals;
    }

Once we have a list of acceptable placement intervals, we can choose any value within these intervals. However, because of the beeswarm property, we wish to place a data point as close as possible to the fixed axis of reference. Function choose_x(intervals, xaxis) returns a placement that is closest to the fixed axis.

    function choose_x(intervals, xaxis) {
        var i, c = intervals.length, distance = [];
        for (i = 0; i < c; ++i) {
            distance.push({
                'i': i,
                'd': Math.abs(xaxis - intervals[i].x)
            });
        }
        distance.sort(get_comparator('d'));
        return intervals[distance[0].i].x;
    }

Function get_x(index, datum, swarm_boundary, xaxis) returns the placement for a data point. It accepts a datapoint index that helps us uniquely identify the data point, the datum to plot (which contains the required y-axis value), the current swarm boundary and the fixed axis of reference. When a placement has been chosen, we grow the swarm by adding the data point.

Since we wish to place a data point as close as possible to the fixed axis, our preferred choice is to place it at the point of intersetion between the fixed axis and the placement line for that point. This is the reason why we include this preferred interval to the set of acceptable x-value intervals. Of course, if the preferred interval turns out to be unacceptable because of an overlap, it will be discarded by remove_invalid_intervals(intervals).

    function get_x(index, datum, swarm_boundary, xaxis, radius) {
        var x, y = datum.y,
        isects = find_candidate_intervals(y, swarm_boundary),
        preferred_choice = {
            'index': index,
            'isEnd': false,
            'isValid': true,
            'x': xaxis,
            'y': y
        };
        isects.push(preferred_choice);
        isects.push(preferred_choice);
        isects = remove_invalid_intervals(isects);
        x = choose_x(isects, xaxis);
        swarm_boundary.push({
            'index': index,
            'x': x,
            'y': y,
            'radius': radius
        });
        return x;
    }

Function Beeswarm(data, xaxis, radius) encapsulates the beeswarm module. When instantiated, it builds a swarm from the data, where the data points are aligned relative to the fixed vertical axis at x = xaxis. Each of the data points will be plotted as a circle with the specified radius. By specifying a radius that is larger than the radius of the plotted data points, we can introduce a minimum distance between data points.

Finally, the prototype function swarm() returns the swarm associated with a Beeswarm instance. The data points supplied during instantiation, i.e., data, is never altered.

    Beeswarm = function(data, xaxis, radius) {
        this.data = data;
        this.xaxis = xaxis;
        this.radius = radius;
    }

    Beeswarm.prototype = {
        swarm: function() {
            var me = this, swarm = [], swarm_boundary = [],
            xaxis = me.xaxis, radius = me.radius,
            data = me.data, i, c = data.length;
            data.sort(get_comparator('y'));
            for (i = 0; i < c; ++i)
                swarm.push({
                    'x': get_x(i, data[i], swarm_boundary, xaxis, radius),
                    'y': data[i].y
                });
            return swarm;
        }
    };

Source code

git clone https://github.com/gyaikhom/beeswarm.git

Example usage

In the following example, plot() is a function which actually renders the data points. We have not included this in the source.

    var beeswarm, num_data = 200,
        min = 10, max = 300,
        xaxis = 200, radius = 3
        dataset = [], i;
        
    for (i = 0; i < num_data; ++i)
        dataset.push({
            'x': undefined,
            'y': Math.floor(Math.random() * (max - min + 1)) + min
        });

    beeswarm = new Beeswarm(dataset, xaxis, radius);
    plot(beeswarm.swarm());

The example Beeswarm plot at the top of the article was generated using a plot() function implemented in d3js.

comments powered by Disqus