I need to clear something up... RE: Indicators / Sample Size

I have just seen the following statement in a post (I hope you don't mind my quoting it; it's important):

For indicators, sample size needs to be considered based on the volume of records being checked in the table - you can usually estimate this using a standard statistical sample size calculator, or simply use 0 if you want to have the indicator check all records in the table for a match (in the example I gave above, you'll note a specific sample size, but that's because at ServiceNow we do change records FOR EVERYTHING and the lion's share of those records are automatically generated, and so that table can get pretty huge for us, but your mileage may vary).

I need to stop this circulating.

I have clarified it on numerous occasions, but this feels like a great place to provide some further clarity on the matter (and get some wider feedback).

The sample size is purely used to specify how many records are collected for supporting data. It does not limit the number of records which are checked. It never has.

The indicator engine will check ALL records that meet the condition to decide pass or fail. Then, if a sample size is set, it randomly picks up to that many of the matching records and stores them as supporting data.

An auditor might not want to see all 100k+ records which meet the condition, but the system NEEDS to check them. Otherwise it would check only the first 10, say, and conclude that all is well. The devil is in the detail.

For full clarification see the following code excerpts:

IndicatorEngine is invoked through the following methods:

    runAllIndicators: function() {
        this._runAllIndicators();
    },

    runIndicator: function(indicatorRecord) {
        return this._runIndicator(indicatorRecord);
    },

runAllIndicators will invoke runIndicator for each indicator, so let's follow runIndicator through. We are only talking about the BASIC method here (because Manual and Scripted have their own approach to determining pass or fail).

Within runIndicator, it calls _getStrategy and then calls .run() on that strategy:

var strategy = this._getStrategy(indicatorRecord);
var resultId = strategy.run();

So let's jump to the run method of the BasicIndicatorStrategy API. I have annotated it with my own comments:

run: function () {
    var resultId = "";
    // Retrieve the table of the profile documentId
    var item = this.indicatorRecord.item;

    if (item != null) {
        var profile = item.profile;

        if (profile != null) {
            var count = new GlideAggregate(this.indicatorRecord.reference_table);
            var criteria = this.indicatorRecord.getValue("criteria");
            if (criteria != null && criteria != '')
                count.addEncodedQuery(criteria); // Phil: it is going to run the criteria!
            if (this.indicatorRecord.use_reference_field)
                count.addQuery(this.indicatorRecord.getValue("reference_field"), profile.applies_to); // Phil: for the profile source! *ahem* Entity
            count.addAggregate('COUNT'); 
            count.query();
            count.next();

            if (count.getAggregate('COUNT') > 0) { // Phil: records are present 
                if (this.indicatorRecord.passed == "passed") // Phil: toggle for good thing or bad thing
                    resultId = this.addResult(true, "", "", "");
                else
                    resultId = this.addResult(false, "", "", "");
            } else { // Phil: no records are present
                if (this.indicatorRecord.passed == "passed") // Phil: toggle for good thing or bad thing
                    resultId = this.addResult(false, "", "", "");
                else
                    resultId = this.addResult(true, "", "", "");
            }

            if (resultId != '') // Phil: if we got a result, then go and collect the supporting data... 
                this.collectSupportingData(resultId); // Phil: this method will look at sample size... 
        } else {
            var pt = new sn_grc.GRCUtils().getMessage("profile_lower");
            gs.addErrorMessage(gs.getMessage("The {0} has no item associated", pt));
        }
    } else
        gs.addErrorMessage(gs.getMessage("The indicator has no item associated"));
    return resultId;
},
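
Stripped of the Glide calls, the pass/fail branch above boils down to something like the following. This is only a sketch in plain JavaScript, not ServiceNow code: decideResult and its argument names are mine, matchingCount stands in for the GlideAggregate COUNT, and "failed" is just an illustrative value for anything other than "passed". Note that sample size is not an input at all.

```javascript
// Hypothetical stand-in for the pass/fail branch of BasicIndicatorStrategy.run().
// matchingCount: number of records matching the criteria (the COUNT aggregate).
// passedSetting: value of the indicator's "passed" field; "passed" means that
//                finding records is a good thing, anything else means the opposite.
function decideResult(matchingCount, passedSetting) {
    var recordsPresent = matchingCount > 0;
    var presenceIsGood = passedSetting == "passed";
    // Pass when reality matches expectation:
    // records present and that is good, or no records and presence is bad.
    return recordsPresent === presenceIsGood;
}

console.log(decideResult(100000, "passed")); // true
console.log(decideResult(100000, "failed")); // false
console.log(decideResult(0, "passed"));      // false
console.log(decideResult(0, "failed"));      // true
```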

For the avoidance of any further doubt:

collectSupportingData: function (result, task) {
    var sample = parseInt(this.indicatorRecord.getValue("sample_size")); // Phil: here she is !! 
    if (isNaN(sample))
        sample = 0;

    // Retrieve the supporting data records
    var gr = new GlideRecord(this.indicatorRecord.getValue("reference_table")); // Phil: using a GlideRecord now, not GlideAggregate 
    var criteria = this.indicatorRecord.getValue("criteria");
    if (criteria != null && criteria != '')
        gr.addEncodedQuery(criteria);

    if (!this.indicatorRecord.item.nil()) {
        if (!this.indicatorRecord.item.profile.nil()) {
            if (this.indicatorRecord.use_reference_field) {
                gr.addQuery(this.indicatorRecord.getValue("reference_field"), this.indicatorRecord.item.profile.applies_to);
            }
        } else {
            gs.debug("Associated item to " + pr + " not found during collection data");
            return;
        }
    } else {
        gs.debug("Indicator has no associated item during collection data");
        return;
    }

    gr.query();

    var rowCount = gr.getRowCount();
    if (rowCount == 0)
        return;

    if ((sample > rowCount) || (sample == 0)) // Phil: avoid minuses
        sample = rowCount;

    var probability = Math.min(sample / rowCount, 0.95) + 0.05; // Phil: set up for random

    // Build the supporting data field list. If none are selected, get all table fields
    var supportingDataFields = this.indicatorRecord.supporting_data_fields;
    var supportingDataFieldList = [];

    if (supportingDataFields != '')
        supportingDataFieldList = supportingDataFields.toString().split(',');
    else
        supportingDataFieldList = this.getAllFields(gr);

    var numRecordsAdded = 0;
    while (gr.next() && numRecordsAdded < sample) {
        if (Math.random() >= probability) // Phil: randomize 
            continue;

        // For each supporting data record, store every selected field
        for (var i = 0; i < supportingDataFieldList.length; i++) {
            var table = this.indicatorRecord.getValue("reference_table");
            var fieldName = supportingDataFieldList[i].toString();
            var fieldValue = gr.getElement(fieldName).getDisplayValue();
            this.insertSupportingDataRecord(table, gr.getUniqueValue(), gr.getElement(fieldName).getName(), fieldName, fieldValue, result, task);
        }
        numRecordsAdded++;
    }
},
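
To see what the sampling loop does with different sample sizes, here is my reading of it as a plain JavaScript sketch. It is an approximation for illustration, not the platform code: sampleIndexes and its parameters are hypothetical names, row indexes stand in for the GlideRecord rows, and the random source is injectable so the behaviour can be reproduced.

```javascript
// Plain-JavaScript sketch of the sampling loop in collectSupportingData.
// rowCount:   how many records matched the criteria (gr.getRowCount()).
// sampleSize: the indicator's sample_size value.
// random:     optional replacement for Math.random (hypothetical, for testing).
function sampleIndexes(rowCount, sampleSize, random) {
    random = random || Math.random;
    var sample = sampleSize;
    if (isNaN(sample)) sample = 0;
    if (rowCount === 0) return [];
    // A sample of 0, or one larger than the result set, means "take everything".
    if (sample > rowCount || sample === 0) sample = rowCount;

    var probability = Math.min(sample / rowCount, 0.95) + 0.05;
    var picked = [];
    // Same stop condition as the original: rows run out or the sample is full.
    for (var i = 0; i < rowCount && picked.length < sample; i++) {
        if (random() >= probability) continue; // skip this row
        picked.push(i);
    }
    return picked;
}

console.log(sampleIndexes(1000, 0).length);        // 1000
console.log(sampleIndexes(1000, 10).length <= 10); // true
```

With a sample size of 0 the probability works out to 1, so every matching row is kept. With a smaller sample the loop never keeps more than the sample size, and it can stop before reaching the end of the result set. None of this feeds back into the pass/fail result, which was already decided before this method was called.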

Now, please - can anyone explain to me if I have this wrong?

The query is executed, and that alone decides passed or failed (using GlideAggregate, with no setLimit). No sample size in sight.

The result is generated.

The supporting data is generated, and this is where sample size is used. Be aware, performance-wise: the GlideRecord query still selects every single record which meets the condition, regardless of sample size (getRowCount() counts them all), and the loop then walks that result set until the sample is filled or the records run out. It also writes a new record to the database for every selected field of every sampled record! So this is why it may take some time between the Indicator Result being created and the full sample set being available...
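
To put rough numbers on that, here is a small back-of-the-envelope sketch in plain JavaScript. The function name and the figures are made up for illustration; it gives an upper bound on the supporting data inserts, since the random skip can leave the sample short when a sample size is set.

```javascript
// Upper bound on supporting data records written for one indicator run.
// rowCount:   records matching the criteria.
// sampleSize: the indicator's sample_size (0 means "all matching records").
// fieldCount: number of supporting data fields stored per sampled record.
function maxSupportingDataInserts(rowCount, sampleSize, fieldCount) {
    var sample = sampleSize;
    if (isNaN(sample)) sample = 0;
    if (sample > rowCount || sample === 0) sample = rowCount;
    return sample * fieldCount; // one insert per field per sampled record
}

console.log(maxSupportingDataInserts(100000, 0, 20));  // 2000000
console.log(maxSupportingDataInserts(100000, 50, 20)); // 1000
```

So a sensible sample size keeps the supporting data manageable, even though the pass/fail check has still considered all 100,000 records.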

I hope this is helpful and maybe we can nip it in the bud!

For me, one of the most important things to specify on an Indicator in GRC (similar to PA) is some kind of time frame. You need to allow the indicator to pass at some stage.

E.g. check records created in the last week for a weekly indicator frequency, or incidents closed in the last month for a monthly frequency.

Otherwise it will fail forever, and performance will also become an issue...
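
Here is a toy illustration of the "fail forever" problem in plain JavaScript. Everything in it is hypothetical (the function, the record shape, the dates); in a real indicator the time frame would be part of the criteria rather than a script like this.

```javascript
var DAY = 24 * 60 * 60 * 1000; // milliseconds in a day

// A "finding a record is a bad thing" indicator: it fails when any record matches.
// windowDays == null means the criteria have no time frame at all.
function indicatorFails(records, now, windowDays) {
    var matching = records.filter(function (r) {
        if (windowDays == null) return true;          // every record ever counts
        return now - r.created <= windowDays * DAY;   // only recent records count
    });
    return matching.length > 0;
}

var now = Date.UTC(2024, 0, 31);
var breaches = [{ created: now - 20 * DAY }]; // one offending record, 20 days old

console.log(indicatorFails(breaches, now, null)); // true
console.log(indicatorFails(breaches, now, 7));    // false
```

Without a time frame, that one old record keeps the indicator failing on every run. With a 7-day window matching a weekly frequency, the indicator passes again once the record ages out, and each run has fewer records to look at.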

But please be aware that sample size will NOT limit how many records are checked (and having thought about that concept a few times, I fail to see why anyone would want it to, based on some arbitrary value alone).