Implementing the Fuzzy c-Means algorithm

March 16, 2013

A clustering algorithm organises items into groups based on a similarity criterion. The Fuzzy c-Means algorithm is a clustering algorithm in which each item may belong to more than one group (hence the word fuzzy), and the degree of membership of each item is given by a probability distribution over the clusters.

Fuzzy c-Means Algorithm

The fuzzy c-means (FCM) algorithm is a clustering algorithm developed by Dunn, and later improved by Bezdek. It is useful when the required number of clusters is pre-determined; the algorithm then assigns each data point to the clusters. What makes FCM different is that it does not decide the absolute membership of a data point in a given cluster; instead, it calculates the likelihood (i.e., the degree of membership) that the data point belongs to that cluster. Hence, depending on the accuracy of the clustering required in practice, appropriate tolerance measures can be put in place. Since absolute membership is not calculated, FCM can be extremely fast: the number of iterations the algorithm performs is determined by the required accuracy of the clustering.

Iterations

In each iteration of the FCM algorithm, the following objective function \(J\) is minimised:

\begin{equation}J = \sum_{i=1}^{N}\sum_{j=1}^{C}\delta_{ij}^{m}\parallel x_i-c_j\parallel^2\end{equation}

Here, \(N\) is the number of data points, \(C\) is the number of clusters required, \(c_j\) is the centre vector for cluster \(j\), and \(\delta_{ij}\) is the degree of membership for the \(i\)th data point \(x_i\) in cluster \(j\). The norm, \(\parallel x_i-c_j\parallel\), measures the similarity (or closeness) of the data point \(x_i\) to the centre vector \(c_j\) of cluster \(j\). Note that, in each iteration, the algorithm maintains a centre vector for each of the clusters. These centre vectors are calculated as weighted averages of the data points, where the weights are given by the degrees of membership.
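As a concrete reference, the objective function can be sketched in a few lines of Python. This is an illustrative fragment, not the linked CWEB implementation; the names `distance` and `objective` are hypothetical, and the membership weights are raised to the fuzziness coefficient \(m\) as in standard FCM.

```python
import math

def distance(x, c):
    # Euclidean norm ||x - c|| for points given as tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def objective(points, centres, memberships, m=2.0):
    # J = sum_i sum_j delta_ij^m * ||x_i - c_j||^2
    return sum(
        memberships[i][j] ** m * distance(x, c) ** 2
        for i, x in enumerate(points)
        for j, c in enumerate(centres)
    )
```

Minimising \(J\) drives each centre towards the points that belong to it most strongly, since distant points with low membership contribute little to the sum.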

Degree of membership

For a given data point \(x_i\), the degree of its membership to cluster \(j\) is calculated as follows:

\begin{equation}\delta_{ij} = {1 \over {\sum_{k=1}^{C}{\left({\parallel x_i-c_j\parallel \over \parallel x_i-c_k\parallel}\right)}^{2 \over m-1}}}\end{equation}

where, \(m\) is the fuzziness coefficient and the centre vector \(c_j\) is calculated as follows:

\begin{equation}c_j = {\sum_{i=1}^{N}\delta_{ij}^{m}\,x_{i} \over \sum_{i=1}^{N}\delta_{ij}^{m}}\end{equation}

In equation (3) above, \(\delta_{ij}\) is the value of the degree of membership calculated in the previous iteration. Note that at the start of the algorithm, the degree of membership for data point \(i\) to cluster \(j\) is initialised with a random value \(\theta_{ij}\), \(0 \le \theta_{ij} \le 1\), such that \(\sum_{j=1}^{C}\delta_{ij} = 1\).
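The two update rules in equations (2) and (3) can be sketched as follows. Again, this is an illustrative Python fragment rather than the CWEB implementation; the function names are hypothetical, and the handling of a point that coincides exactly with a centre (where equation (2) would divide by zero) is one common convention: give the point crisp membership in the coinciding centre(s).

```python
import math

def distance(x, c):
    # Euclidean norm ||x - c||.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def update_memberships(points, centres, m=2.0):
    # Equation (2): delta_ij = 1 / sum_k (||x_i-c_j|| / ||x_i-c_k||)^(2/(m-1))
    memberships = []
    for x in points:
        dists = [distance(x, c) for c in centres]
        if 0.0 in dists:
            # Point coincides with a centre: crisp membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [r / sum(row) for r in row]
        else:
            row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                   for dj in dists]
        memberships.append(row)
    return memberships

def update_centres(points, memberships, m=2.0):
    # Equation (3): c_j is the weighted mean of the points, with
    # weights delta_ij^m, computed component-wise.
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centres = []
    for j in range(n_clusters):
        weights = [memberships[i][j] ** m for i in range(len(points))]
        total = sum(weights)
        centres.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)))
    return centres
```

Note that each row of the membership matrix sums to 1 by construction, which is what makes the degrees of membership behave like a probability distribution over the clusters.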

Fuzziness coefficient

In equations (2) and (3), the fuzziness coefficient \(m\), where \(1 < m < \infty\), measures the tolerance of the required clustering. This value determines how much the clusters can overlap with one another. The higher the value of \(m\), the larger the overlap between clusters. In other words, the higher the fuzziness coefficient, the more data points fall inside a fuzzy band where the degree of membership is neither 0 nor 1, but somewhere in between.
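The effect of \(m\) is easy to see numerically. The sketch below (illustrative only; the helper name is hypothetical) evaluates equation (2) for a point that is twice as far from one of two centres as from the other: a value of \(m\) close to 1 produces a near-crisp assignment, while a large \(m\) pushes the memberships towards \(1/C\).

```python
def membership(d, dists, m):
    # Equation (2) for one cluster at distance d, given all distances.
    return 1.0 / sum((d / dk) ** (2.0 / (m - 1.0)) for dk in dists)

dists = [1.0, 2.0]  # twice as far from cluster 1 as from cluster 0
crisp = membership(1.0, dists, m=1.1)  # low m: near-crisp assignment
fuzzy = membership(1.0, dists, m=5.0)  # high m: memberships drift towards 1/C
```

Here `crisp` comes out very close to 1, while `fuzzy` is only modestly above 0.5, even though the underlying distances are identical.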

Termination condition

The required accuracy of the degree of membership determines the number of iterations completed by the FCM algorithm. This measure of accuracy is calculated as the largest change in the degree of membership from one iteration to the next, taken over all data points and all of the clusters. If we represent the measure of accuracy between iterations \(k\) and \(k+1\) with \(\epsilon\), we calculate its value as follows:

\begin{equation}\epsilon = \Delta_{i}^{N}\Delta_{j}^{C} | \delta_{ij}^{k+1} - \delta_{ij}^{k} |\end{equation}

where \(\delta_{ij}^{k}\) and \(\delta_{ij}^{k+1}\) are the degrees of membership at iterations \(k\) and \(k+1\) respectively, and the operator \(\Delta\), when supplied with a vector of values, returns the largest value in that vector.
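Putting the pieces together, the whole algorithm (random initialisation, alternating updates from equations (2) and (3), and the \(\epsilon\) test from equation (4)) can be sketched end to end. This is a self-contained illustrative Python version, not the linked CWEB implementation; the function name `fcm` and its parameters are hypothetical, and a `max_iter` guard is added as a practical safety net.

```python
import math
import random

def fcm(points, n_clusters, m=2.0, epsilon=1e-5, max_iter=100, seed=0):
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    dist = lambda x, c: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))
    # Random initial memberships, normalised so each row sums to 1.
    memb = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        s = sum(row)
        memb.append([r / s for r in row])
    centres = []
    for _ in range(max_iter):
        # Centre update (equation 3).
        centres = []
        for j in range(n_clusters):
            w = [memb[i][j] ** m for i in range(n)]
            t = sum(w)
            centres.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / t
                for d in range(dim)))
        # Membership update (equation 2).
        new_memb = []
        for x in points:
            ds = [dist(x, c) for c in centres]
            if 0.0 in ds:
                row = [1.0 if d == 0.0 else 0.0 for d in ds]
                row = [r / sum(row) for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in ds)
                       for dj in ds]
            new_memb.append(row)
        # Termination (equation 4): largest membership change anywhere.
        eps = max(abs(new_memb[i][j] - memb[i][j])
                  for i in range(n) for j in range(n_clusters))
        memb = new_memb
        if eps < epsilon:
            break
    return centres, memb
```

On two well-separated groups of points, the returned centres settle near the respective group means, and each membership row remains a distribution summing to 1.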

Acknowledgement

This implementation was inspired by Matteo Matteucci's description of the algorithm.

Errata

Tue Sep 30 19:35:10 GMT 2014 - Steve pointed out an error in the interval specification for the fuzziness coefficient, where the lower bound should be open instead of closed. This is now fixed.

Wed Nov 13 20:22:26 GMT 2013 - HyunJun Park pointed out missing fuzziness in equation (3). This is now fixed. Fortunately, the program source code did not contain this error.

Source code

git clone https://github.com/gyaikhom/fcm.git
CWEB Documentation: fcm.pdf (Size 337.3 KB)

Addendum

Since I published this note, I have received a few requests for assistance with specific problems. Unfortunately, due to my schedule I am unable to be of much help. Furthermore, my aim in writing this note was to help readers understand how the algorithm works. I believe that a good understanding of the algorithm should be sufficient to adapt this implementation to specific domains.

I feel that most of the difficulty arises from a lack of understanding of what a distance function is. If this is the case, please research similarity metrics, which are domain-specific measures of the distance between two data points. In this note, I used points in Euclidean space because they are easy to understand. If you are clustering other entities (such as documents, user profiles, etc.), you should define an appropriate distance metric that quantifies the similarity between any two entities. There are several resources available on the Internet on these topics. I hope this is of some help.
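To illustrate the point about swapping the distance function: the Euclidean norm used in this note is just one choice, and for document-like vectors a cosine distance is a common alternative. The sketch below shows both side by side; the function names are hypothetical, and whichever function you choose simply replaces the norm \(\parallel x_i-c_j\parallel\) in equations (1) and (2).

```python
import math

def euclidean(x, y):
    # Straight-line distance; sensitive to vector magnitude.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x, y):
    # 1 - cosine similarity; suited to term-frequency vectors where
    # direction matters more than magnitude.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)
```

Note that under cosine distance two vectors pointing in the same direction are identical regardless of length, which is often the desired behaviour when comparing documents of different sizes.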
