Improve your Predictive Intelligence clustering results with DBSCAN!

Predictive Intelligence clustering is useful for identifying patterns in your data that you may not be able to see with analytics or reporting. Clustering works by grouping together data points that have similar features. Those data points can be knowledge articles, change requests, cases, or incidents. Clustering helps us identify automation opportunities and assign a value to automating them, as seen in Luca Morlupi’s Clustering Recommendations dashboard below (see fig1).

(fig1) Clustering Recommendation Utility - reach out to your ServiceNow sales team for setup guidance.

When you create a new clustering solution definition, the default algorithm is K-means. In Paris, when you click the Advanced Solution Settings tab of your clustering solution definition, you will see a number of cluster parameters and alternative clustering algorithms such as HDBSCAN and DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. Unlike K-means, DBSCAN does not require you to specify the number of clusters, K, in advance. This matters because with K-means the choice of K strongly influences the patterns you see. DBSCAN instead groups points that lie in dense regions, so it works well when clusters are dense but not convex, and it labels points in sparse regions as noise rather than forcing them into a cluster. Many in-depth articles explain how the algorithm works if you want to read further.
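The difference is easy to see on toy data. The following sketch uses scikit-learn to cluster two interleaving half-moons. scikit-learn is an assumption here: it is not part of the ServiceNow platform, and Predictive Intelligence's own implementation and defaults may differ. K-means has to be given K=2 and, because it partitions space around centroids, it can only split the moons with a straight boundary; DBSCAN finds the two shapes from density alone. The eps and min_samples values are chosen for this toy data set only.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: dense clusters that are not convex.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# K-means must be told the number of clusters (K) up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN takes no K. eps is the neighborhood radius and min_samples is how many
# points (including the point itself) a neighborhood needs for a "core" point.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)


def n_clusters(labels):
    """Count clusters, ignoring DBSCAN's noise label (-1)."""
    return len(set(labels.tolist()) - {-1})


# Adjusted Rand index: 1.0 means the labels match the true moons exactly.
print("K-means ARI:", round(adjusted_rand_score(y_true, kmeans_labels), 2))
print("DBSCAN clusters found:", n_clusters(dbscan_labels))
print("DBSCAN ARI:", round(adjusted_rand_score(y_true, dbscan_labels), 2))
```

On real data, eps and min_samples play a role similar to K: they still need tuning, but they describe how dense a cluster must be rather than how many clusters exist.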

The purpose of this article is to show you a practical application of DBSCAN. Although DBSCAN was introduced in Orlando, by default it’s only visible in the Paris release. Selecting DBSCAN switches the solution from the default K-means algorithm to DBSCAN (see fig2). To enable DBSCAN on an Orlando instance, open a HI ticket and ask ServiceNow Support to enable it, referencing KB0829924.

(fig2)

In the example below (fig3), a K-means clustering solution was created on the short_description field of the incident table. The diagram uses the new Paris cluster treemap plot to show cluster concept groupings of incident short_description values based on similar features. Cluster concepts are the most common terms in each cluster.

(fig3)

Let’s focus on clusters 1 and 2 in fig3. Cluster 1 indicates that the incidents in it have the word combination “network access error poor time” in common. We could interpret Cluster 1 to mean we have an issue with poor network access times, but we are guessing from the word combination. Cluster 2 lists “issue printer token rsa print” as the common terms for its incidents, which appears to mix two separate problems. We could postulate that “printer issues” and “rsa token issues” are both potential automation opportunities.

Let's experiment with a different clustering algorithm, DBSCAN, to see if we can get tighter clusters. Below (fig4) is the cluster treemap plot for the same incident short_description data; the only difference is that we used DBSCAN as the clustering algorithm.

(fig4)

Notice how much clearer the clusters are. We can now see the following distinct groups:

• Network access error
• Poor time response from network
• Token RSA issue
• Printer issue and print issue
• Request plan fidelity account termination
• Zoom fix and Zoom help
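If you want to experiment with the same idea outside the platform, this sketch runs scikit-learn's DBSCAN over TF-IDF vectors of a few short descriptions. It is a stand-in under several assumptions, not how Predictive Intelligence featurizes text or configures DBSCAN: the sample strings are made up in the style of the groups above, and cosine distance, eps=0.65, and min_samples=2 are illustrative choices. Note that no number of clusters is given, and descriptions that do not belong to any dense group get the label -1 (noise).

```python
from collections import defaultdict

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical short descriptions (illustrative only, not the data behind fig4).
short_descriptions = [
    "zoom fix",
    "zoom help",
    "rsa token issue",
    "token rsa issue",
    "printer issue",
    "print issue",
    "network access error",
    "request plan fidelity account termination",
]

# TF-IDF vectors; converted to a dense array to keep the example simple.
X = TfidfVectorizer().fit_transform(short_descriptions).toarray()

# Cosine distance suits text vectors. eps and min_samples are guesses that
# would need tuning on real data; note that no K is specified.
labels = DBSCAN(eps=0.65, min_samples=2, metric="cosine").fit_predict(X)

# Group descriptions by cluster label; -1 means DBSCAN treated it as noise.
groups = defaultdict(list)
for text, label in zip(short_descriptions, labels):
    groups[int(label)].append(text)

for label, texts in sorted(groups.items()):
    name = "noise" if label == -1 else f"cluster {label}"
    print(f"{name}: {texts}")
```

Because TF-IDF treats "print" and "printer" as different tokens, those two descriptions share only the word "issue" and may end up as noise at this eps. Stemming or a larger eps could group them, at the risk of merging unrelated clusters.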