Example 2: Applying cluster analysis to outlier detection
Cluster analysis is a statistical technique for grouping data points based on multiple attributes, such that data points within the same group (cluster) are more similar to one another than to those in other groups. Beyond grouping, you can also apply cluster analysis to outlier detection: if a particular cluster contains very few data points, those points are dissimilar to most of the other data and may be outliers.
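To make the idea concrete, here is a minimal sketch in plain Python, assuming k-means as the clustering method and a hypothetical `max_size` threshold that decides what counts as a "very small" cluster. The function names `kmeans` and `small_cluster_outliers` are illustrative, not part of any product API. The centroids are seeded with a deterministic farthest-first heuristic so the result is repeatable, which is an assumption of this sketch and not necessarily how Tableau seeds or chooses the number of clusters.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Farthest-first seeding (an assumption of this sketch): start from the
    # first point, then repeatedly add the point farthest from all seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def small_cluster_outliers(points, k, max_size):
    """Return indices of points whose cluster has at most max_size members."""
    labels = kmeans(points, k)
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_size]


# Two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(small_cluster_outliers(pts, k=3, max_size=1))  # prints [8]
```

The isolated point at index 8 ends up in a cluster of its own, so the small-cluster rule flags it while the two groups of four are treated as normal.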
For example, when analysing market trends over time, you might expect similar companies to show related changes in their stock prices, because they are affected by similar market forces. Days on which the direction or magnitude of a particular stock's change differs greatly from the others may be classified as "outlier days," because this could indicate that an event affected that company without affecting the market in general.
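For this kind of analysis, each day becomes one data point whose attributes are the price changes of the stocks being compared. The sketch below assumes you have daily closing prices per ticker (the tickers and prices are hypothetical) and uses percent change rather than raw price difference, so that stocks trading at very different price levels are comparable; the sign of each value captures direction and its size captures magnitude.

```python
def daily_pct_changes(closes):
    """Turn {ticker: [close_day0, close_day1, ...]} into one vector per day.

    Vector d holds each ticker's percent change from day d to day d + 1,
    in sorted ticker order. All tickers are assumed to cover the same days.
    """
    tickers = sorted(closes)
    n_days = len(closes[tickers[0]])
    return [
        tuple(100.0 * (closes[t][d + 1] - closes[t][d]) / closes[t][d]
              for t in tickers)
        for d in range(n_days - 1)
    ]


# Hypothetical closing prices: on the second day only AAA moves sharply.
closes = {"AAA": [100, 102, 51], "BBB": [50, 51, 51]}
print(daily_pct_changes(closes))  # prints [(2.0, 2.0), (-50.0, 0.0)]
```

The second vector, where one stock drops while the other is flat, is the kind of day that would sit far from the others and tend to land in a small cluster.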
When analysing tech stocks, you can identify days on which notable events may have occurred by selecting the clusters that contain the fewest days. In the example below, this information appears in the “Days per Cluster” visualization. Through set actions, an end user can make selections in the view to interactively explore normal versus abnormal days.
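Outside of a dashboard, the "Days per Cluster" logic can be sketched as counting days per cluster label and keeping the days that fall in the least-populated clusters. This assumes you already have one cluster label per day (for example, exported from a clustering step); the dates, labels, and the `n_clusters` parameter are hypothetical. Ties in cluster size are broken by the order in which clusters first appear.

```python
from collections import Counter


def days_per_cluster(labels):
    """Count how many days fall in each cluster."""
    return Counter(labels)


def days_in_smallest_clusters(dates, labels, n_clusters=1):
    """Return the dates belonging to the n_clusters clusters with the fewest days."""
    counts = days_per_cluster(labels)
    # sorted() is stable, so equal-sized clusters keep first-appearance order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1])
    smallest = {cluster for cluster, _ in ranked[:n_clusters]}
    return [d for d, lab in zip(dates, labels) if lab in smallest]


# Hypothetical dates and cluster labels.
dates = ["2020-01-02", "2020-01-03", "2020-01-06",
         "2020-01-07", "2020-01-08", "2020-01-09"]
labels = ["Cluster 1", "Cluster 1", "Cluster 2",
          "Cluster 1", "Cluster 2", "Cluster 3"]
print(days_per_cluster(labels))
print(days_in_smallest_clusters(dates, labels))  # prints ['2020-01-09']
```

Raising `n_clusters` widens the net to include the next-smallest clusters, which mirrors selecting several small clusters at once to compare them with the normal days.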