Tuning Predictive Intelligence Models (part 2) – using PredictabilityEstimate
In this article I’ll cover two methods to tune your Predictive Intelligence classification model.
Method 1 - Auto-Tuning a classification model
The simplest way to “tune” a Predictive Intelligence classification model is to set the Target Metric Value. Navigate to your trained solution and click into the Solution Statistics tab. In the below example my Estimated Solution Precision is 74.09%.
If I want to improve the precision, I can set the Target Metric Value to a higher precision (in this case 80%) and press Apply Values. The target metric value optimizes for precision by adjusting the recall and coverage percentages of the model. The below picture shows that we have improved the model to a precision of 80.07%, although recall and coverage dropped to achieve that gain.
Method 2 - Adding inputs to your model
A second way of tuning our classification model is to add additional inputs to our classification solution definition. The question is, which inputs will have the most impact in improving the precision of the model? To answer this question we can use the PredictabilityEstimate object.
We’ll use the ML API to instantiate and use the PredictabilityEstimate object in three steps.
Step 1 – Create and submit the PredictabilityEstimate
The PredictabilityEstimate is a scriptable object that estimates which features can be useful for predicting the target field of a classification solution. To read more about the ML API and the PredictabilityEstimate object, please see our documentation.
Copy and paste the below code (attached as createEstimate.txt) and run it as a background script by going to System Definition > Scripts – Background. The JavaScript below uses the PredictabilityEstimate object to recommend input fields that improve the classification prediction for category. If you want to use a different table or a different predicted target field, just swap them out.
Here is a quick synopsis of the four key blocks of code:
Block 1 – Specifies the incident table as the table containing our training data.
Block 2 – Specifies classification as the capability we want to test in our solution.
Block 3 – Defines category as the target field to predict.
Block 4 – Adds our PredictabilityEstimate object, myEstimate, to the PredictabilityEstimateStore and submits the training job.
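The attached createEstimate.txt is not reproduced here, but the four blocks map onto a script along these lines. This is only a sketch based on the sn_ml scoped API: the label and field list are placeholders I chose for illustration, and the exact class and method names (including how the classification capability in block 2 is expressed) should be verified against the ML API documentation for your release.

```javascript
// Sketch only -- run in System Definition > Scripts - Background.
// Verify class/method names against the ML API docs for your release.

// Block 1: the incident table holds our training data (field list is illustrative)
var myData = new sn_ml.DatasetDefinition({
    'tableName': 'incident',
    'fieldNames': ['category', 'short_description', 'assignment_group', 'service_offering']
});

// Blocks 2 & 3: estimate predictability with category as the target field
// (how the classification capability is specified may vary by release -- check the docs)
var myEstimate = new sn_ml.PredictabilityEstimate({
    'label': 'Estimate inputs for category', // placeholder label
    'dataset': myData,
    'predictedFieldNames': ['category']
});

// Block 4: add the estimate to the store and submit the training job
var mySolution = sn_ml.PredictabilityEstimateStore.add(myEstimate);
gs.print(mySolution.submitTrainingJob().getStatus());
```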
Step 2 – Get the unique name of the PredictabilityEstimate
Go to the ML Solutions table and get the unique name of the predictability estimate object that you just created. In my example it’s ml_x_snc_global_global_predictability_estimate.
Step 3 – Query the PredictabilityEstimate for the recommended input fields
Run the below script to retrieve the Predictability Estimate score for predicting category using classification (see attached script retrieveEstimate).
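For completeness, the retrieval step (the attached retrieveEstimate script) looks roughly like the sketch below. Again, this assumes the sn_ml API surface; the getActiveVersion() and getResults() calls should be checked against the ML API documentation for your release.

```javascript
// Sketch only -- run in System Definition > Scripts - Background.
var myEstimate = sn_ml.PredictabilityEstimateStore.get('ml_x_snc_global_global_predictability_estimate');

// getResults() returns the estimate output as a JSON string; pretty-print it
var results = JSON.parse(myEstimate.getActiveVersion().getResults());
gs.print(JSON.stringify(results, null, 2));
```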
Interpreting the results:
The script outputs two valuable insights for predicting category. The first is the impact of nominal input fields on improving the model; the second is the density of the text input fields. I won’t get into all the data science details here, but at a high level we’re using feature ranking that leverages a mix of weighted information gain techniques, AUPRC (area under the precision-recall curve), ReliefF scoring, and a pre-trained model to select and rank relevant nominal input fields for the recommendation.
Nominal field inputs are discrete values that can be used as inputs to the classification model. The “modelImprovement” numbers below show how strongly each field correlates with our predicted target field, in this case category. We can see that subcategory is highly correlated with category, at 47%.
Now here is where we have to apply some human logic to the recommended input fields. We’re trying to predict category, but we won’t know the subcategory at prediction time. Also, category typically drives subcategory, so these two fields tend to be highly correlated. We should therefore skip subcategory as a potential input and look at the other recommended fields. For example, assignment_group and service_offering both look like input fields we might be able to use to improve the precision of our classification model. You may need to experiment with different combinations of the recommended input fields to get the best precision. I would suggest making a copy of the original solution definition, adding one new input at a time, and training that model to see whether precision improves.
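The “skip fields you won’t know at prediction time” rule is easy to automate once you’ve parsed the estimate output. The snippet below is a plain JavaScript sketch that assumes a hypothetical result shape of objects with fieldName and modelImprovement properties; the 21 and 18 scores are made up for illustration, and only subcategory’s 47% comes from the example above.

```javascript
// Rank candidate nominal inputs by modelImprovement, excluding fields that
// leak the answer or are unknown at prediction time.
// The input shape here is hypothetical -- adapt it to the actual estimate output.
function rankCandidateInputs(nominalFields, excludedFields) {
    return nominalFields
        .filter(function (f) { return excludedFields.indexOf(f.fieldName) === -1; })
        .sort(function (a, b) { return b.modelImprovement - a.modelImprovement; })
        .map(function (f) { return f.fieldName; });
}

var ranked = rankCandidateInputs([
    { fieldName: 'subcategory',      modelImprovement: 47 }, // leaks the answer -- excluded
    { fieldName: 'assignment_group', modelImprovement: 21 }, // illustrative score
    { fieldName: 'service_offering', modelImprovement: 18 }  // illustrative score
], ['subcategory']);
// ranked is ['assignment_group', 'service_offering']
```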
If you scroll to the bottom of the output, you will see textInputFields density.
Density of text input fields represents the fullness or sparsity of a potential text input field. Your Predictive Intelligence models will perform better if the text input fields have a high density of data. In our example above, short_description is a good text input field because it has a density of 99%. We want to avoid text input fields with low density percentages.
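Applying a density cutoff can likewise be scripted. The sketch below is plain JavaScript with a hypothetical input shape (field name mapped to density as a fraction); the low-density comments field is made up for illustration, and only short_description’s 99% comes from the example above.

```javascript
// Keep only text fields whose density (fraction of non-empty rows) meets a threshold.
// The input shape is hypothetical -- adapt it to the actual textInputFields output.
function denseTextFields(densityByField, minDensity) {
    return Object.keys(densityByField).filter(function (name) {
        return densityByField[name] >= minDensity;
    });
}

var keep = denseTextFields({
    'short_description': 0.99, // from the example above
    'comments': 0.05           // illustrative low-density field
}, 0.80);
// keep is ['short_description']
```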
In summary, we just learned two methods for tuning your classification model. There is a third method that I did not cover: using your knowledge of the data to determine which input fields are required, and analyzing the data using Performance Analytics or reporting.
Again, for reference, the documentation for using the Machine Learning APIs is here:
Best Regards, Lener
https://www.servicenow.com/community/intelligence-ml-articles/tuning-predictive-intelligence-models-part-2-using/ta-p/2299135