Using purity fields to better understand your clusters

In the Paris release, when you create a new Predictive Intelligence cluster solution definition, you can select up to three purity fields to calculate purity for your clusters. The purity measure describes the composition of each cluster. In the example below, we select assignment group and category as our purity fields. Our goal is to understand what percentage of incidents in each cluster share the same assignment group, and what percentage share the same category.

In the cluster tree map below, hovering over the cluster “request support computer incident” displays our purity fields. The purity measure shows that 86% of the incidents in this cluster share the same category and 23% share the same assignment group.

This tells us a great deal. The assignment group purity shows that only 23% of the incidents in this cluster have the same assignment group. We can infer that several assignment groups are handling computer request or computer support issues in this cluster, and we may want to investigate why a single assignment group is not handling these types of incidents. We can also infer that the quality of our category data is good, since the majority (86%) of computer support requests in this cluster go to the same category.
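To make the per-field reading concrete, here is a minimal Python sketch. It assumes that the purity shown for a field is the share of records in the cluster that hold that field's most common value, which matches the description above but is not taken from the Predictive Intelligence implementation. The records, the values, and the function name field_purity are all hypothetical.

```python
from collections import Counter

def field_purity(records, field):
    """Return the share of records that hold the most common value of `field`."""
    if not records:
        return 0.0
    counts = Counter(record[field] for record in records)
    top_count = counts.most_common(1)[0][1]  # size of the largest group
    return top_count / len(records)

# Hypothetical cluster of 10 incidents (illustrative values only).
categories = ["Hardware"] * 8 + ["Software"] * 2
groups = (["Service Desk"] * 3 + ["Desktop Support"] * 2 +
          ["Field Services"] * 2 + ["Hardware Team"] * 2 + ["Network"])
cluster = [{"category": c, "assignment_group": g}
           for c, g in zip(categories, groups)]

print(f"category purity: {field_purity(cluster, 'category'):.0%}")
print(f"assignment group purity: {field_purity(cluster, 'assignment_group'):.0%}")
```

In this made-up cluster the category field is fairly pure while the assignment group field is not, which is the same pattern as in the example above: one dominant category, but the work spread across several groups.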

We can also validate the assignment group purity of 23% by running a report that uses our cluster conditions and the cluster concept fields as filter criteria. In the report below, we can confirm that close to 23% of the incidents belong to the same assignment group.

To compute purity, each cluster is assigned to the class that is most frequent in the cluster, and the accuracy of this assignment is then measured by counting the number of correctly assigned documents and dividing by N, the total number of documents. Bad clusterings have purity values close to 0, while a perfect clustering has a purity of 1. The formula for computing purity is below (see footnote 1).
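Written out in the notation of the Stanford reference in footnote 1 (the symbols come from that reference, not from Predictive Intelligence), the purity of a clustering is:

```latex
\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} \lvert \omega_k \cap c_j \rvert
```

Here \Omega = \{\omega_1, \ldots, \omega_K\} is the set of clusters, C = \{c_1, \ldots, c_J\} is the set of classes, and N is the total number of documents. For each cluster, the inner term counts the documents that belong to that cluster's most frequent class.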

Footnote 1 - Stanford, Evaluation of Clustering. https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

To learn more about using clustering and purity fields, please refer to the Predictive Intelligence documentation.