In this note we demonstrate Receiver Operating Characteristic (ROC) curves. We use a representative binary classification example where the aim is to distinguish healthy individuals from sick patients using a threshold on some measured biochemical parameter (e.g., blood glucose level to diagnose diabetes). We hope that the interactive visualisation will make the concept easier to understand.
For a general introduction to ROC curves, please refer to "Better Decisions through Science" by John A. Swets, Robyn M. Dawes and John Monahan [Scientific American, October 2000, pp. 82-87], which gives an interesting account of the subject. Further details on ROC/AUC analysis and related algorithms are discussed in "An introduction to ROC analysis" by Tom Fawcett [Pattern Recognition Letters, 27 (2006), pp. 861-874].
In the following demonstration, we assume that the measured parameter follows a normal distribution for both the sick patients and the healthy individuals. You can change the mean and standard deviation of each group by dragging the corresponding slider, which updates the ROC curve at the top. In general, each choice of means and standard deviations corresponds to a different model. Within each model, you can change the threshold by moving the mouse cursor over the distribution plot. The corresponding point on the ROC curve is displayed as a data point, giving the true positive and false positive probabilities at that threshold.
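To make the relationship between the two distributions and the ROC curve concrete, here is a minimal sketch in Python (not the code used by the demonstration). It assumes, as the note does, that a value above the threshold is classified as healthy, i.e. positive. The means and standard deviations, and the names roc_point and roc_curve, are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical model parameters (arbitrary units), not taken from the demo.
healthy = NormalDist(mu=6.0, sigma=1.0)
sick = NormalDist(mu=4.0, sigma=1.0)


def roc_point(threshold, healthy, sick):
    """Return (false positive probability, true positive probability)."""
    # Values above the threshold are classified as healthy (positive).
    tpr = 1.0 - healthy.cdf(threshold)  # healthy individuals above threshold
    fpr = 1.0 - sick.cdf(threshold)     # sick patients above threshold
    return fpr, tpr


def roc_curve(healthy, sick, steps=200):
    """Sweep the threshold from high to low and collect the ROC points."""
    lo = min(healthy.mean - 4 * healthy.stdev, sick.mean - 4 * sick.stdev)
    hi = max(healthy.mean + 4 * healthy.stdev, sick.mean + 4 * sick.stdev)
    thresholds = [hi - i * (hi - lo) / steps for i in range(steps + 1)]
    return [roc_point(t, healthy, sick) for t in thresholds]


points = roc_curve(healthy, sick)
```

Each entry of points is one position of the data point on the ROC curve. Changing the parameters of either distribution gives a different list of points, which is what moving a slider amounts to.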
In this example, we use the threshold to separate healthy individuals from sick patients: an individual is classified as healthy (positive) if the measured value is above the given threshold. Hence, healthy individuals with values above the threshold are true positives, and sick patients with values above the threshold are false positives. Healthy individuals with values below the threshold are false negatives and, finally, sick patients with values below the threshold are true negatives. The ROC curve plots only the false positive and true positive probabilities.
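The four outcomes at a single threshold can be written down directly from the two cumulative distribution functions. The following Python sketch is illustrative only. The function name outcome_probabilities and the example parameters are assumptions, and each probability is conditional on the individual's true group.

```python
from statistics import NormalDist


def outcome_probabilities(threshold, healthy, sick):
    """Probabilities of the four outcomes, each conditional on the true group."""
    return {
        "true_positive": 1.0 - healthy.cdf(threshold),  # healthy, above threshold
        "false_negative": healthy.cdf(threshold),       # healthy, below threshold
        "false_positive": 1.0 - sick.cdf(threshold),    # sick, above threshold
        "true_negative": sick.cdf(threshold),           # sick, below threshold
    }


# Hypothetical example: healthy ~ N(6, 1), sick ~ N(4, 1), threshold at 5.
rates = outcome_probabilities(5.0, NormalDist(6.0, 1.0), NormalDist(4.0, 1.0))
```

Within each group the two probabilities sum to one, which is why the false positive and true positive probabilities alone are enough to draw the ROC curve.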
git clone https://github.com/gyaikhom/roc.git
This implementation uses D3.js for the visualisation. The probability calculations for the normal distributions are carried out using the Abramowitz and Stegun approximation given as formula 26.2.17 in the Handbook of Mathematical Functions [National Bureau of Standards, June 1964, p. 932].
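For readers who want to see the approximation itself, here is a short Python sketch of formula 26.2.17. It is not a copy of the code in the repository, and the function names are chosen for illustration. The constants are those of the handbook, which states an absolute error below 7.5e-8 for non-negative arguments. Negative arguments are handled here by symmetry of the normal distribution.

```python
import math

# Constants of formula 26.2.17 (Abramowitz and Stegun).
P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def standard_normal_cdf(x):
    """Approximate the standard normal cumulative distribution at x."""
    # Standard normal density at x.
    z = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    t = 1.0 / (1.0 + P * abs(x))
    # Fifth-degree polynomial in t, evaluated with Horner's scheme.
    poly = t * (B[0] + t * (B[1] + t * (B[2] + t * (B[3] + t * B[4]))))
    upper_tail = z * poly  # approximates the tail probability beyond |x|
    return 1.0 - upper_tail if x >= 0 else upper_tail


def normal_cdf(x, mean, sd):
    """Cumulative probability for a normal distribution with the given mean and sd."""
    return standard_normal_cdf((x - mean) / sd)
```

With the classification rule used in this note, the true positive probability at a threshold is 1 - normal_cdf(threshold, mean, sd) evaluated with the healthy group's parameters, and the false positive probability is the same expression with the sick group's parameters.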