How to import 4 million records in 3 hours

Import · Jun 11, 2023 · article

In my "Titans of ServiceNow" session with @Robert Fedoruk I talked about a self-implemented solution I'm proud of: a custom importer that was able to import roughly 4 million records in about 3 hours. I promised I would write an article about it, and here it is. Please forgive me for providing only the conceptual approach rather than a ready-made solution. On the one hand, the solution is very specific to our use case, and on the other hand, it was developed as part of a customer project, so I can't just hand out code that the customer has paid for.

But I think the approach should still be interesting, because the idea behind it can be transferred well to many other use cases.

Initial Situation & Constraints

For my current FSM-based project, various types of Foundation Data must be imported on an ongoing basis from two different company-internal databases so that Work Orders and Work Order Tasks can reference them, such as network elements (CMDB) or locations (table cmn_location). We don't have direct access to these databases; however, there is a large Elastic cluster that already imports them (and many more), preprocesses their data, and makes it available to consumers via a REST-based API.

Elastic is not a database but a so-called index, and it doesn't store database records but index documents. Each GET request to the REST-based API returns at most 10,000 documents, so pagination is mandatory. In Elastic, however, this cannot be done via simple URL parameters. Instead, you have to request a so-called PIT (Point in Time) ID, which represents a frozen snapshot of the underlying data. That PIT ID has to be provided in each GET payload, along with (for pagination) the sort pointer of the last document from the previous batch. It is not so important here how exactly this works; I only mention it to make clear that an OOTB data source is not sufficient. Maybe it would have worked with Data Stream Actions from the Integration Hub, but we only have the "Starter" version available according to our contract.
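To illustrate the pagination mechanics, here is a minimal sketch in plain JavaScript. The `makeMockApi` object simulates the Elastic cluster; in the real importer these calls would be outbound REST requests, and all names here are illustrative assumptions, not the actual project code:

```javascript
// Mock of the Elastic index: a fixed set of documents, served page by page.
function makeMockApi(totalDocs, pageSize) {
  var docs = [];
  for (var i = 0; i < totalDocs; i++) {
    docs.push({ id: i, sort: [i] }); // each document carries its sort pointer
  }
  return {
    openPit: function () { return 'pit-123'; }, // frozen view of the index
    search: function (pitId, searchAfter) {
      // return the next page after the given sort pointer
      var start = searchAfter === null ? 0 : searchAfter[0] + 1;
      return docs.slice(start, start + pageSize);
    }
  };
}

// Consume all pages: keep the PIT ID constant and advance via the sort
// pointer of the last document of the previous batch.
function fetchAll(api) {
  var pitId = api.openPit();
  var all = [];
  var searchAfter = null;
  while (true) {
    var batch = api.search(pitId, searchAfter);
    if (batch.length === 0) break;
    all = all.concat(batch);
    searchAfter = batch[batch.length - 1].sort;
  }
  return all;
}

var api = makeMockApi(25, 10);
var result = fetchAll(api);
console.log(result.length); // 25 documents fetched across 3 batches
```

The point is that the client, not the server, carries the cursor state (PIT ID plus last sort pointer) from one request to the next, which is why a simple URL-parameter data source can't do it.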

Challenges

Let's assume we implemented a naive solution in which requesting the data in about 400 batches (4 million divided by 10,000) and storing it in one of the target tables happens in the same script. Let's further assume that the creation of a new record takes 100 milliseconds on average; I think this is a realistic value for the CMDB or the cmn_location table, since several business rules fire after each insert. The whole import would then take about 4.6 days. This is not only unacceptable, but would also lead to considerable technical difficulties on both sides. So one of the two challenges is to consume the data as fast as possible and cache it in a table that is inherently optimized for performance.
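The back-of-the-envelope arithmetic behind those numbers:

```javascript
// Naive sequential estimate: 4 million inserts at ~100 ms each.
var totalRecords = 4000000;
var msPerInsert = 100;                              // assumed average insert time
var batches = Math.ceil(totalRecords / 10000);      // GET requests at 10,000 docs each
var days = (totalRecords * msPerInsert) / (1000 * 60 * 60 * 24);
console.log(batches);          // 400
console.log(days.toFixed(1));  // 4.6
```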

On the other hand, importing in just one script is very inefficient. The ServiceNow architecture is designed to run many actions simultaneously in the background. Furthermore, depending on the contract, each customer has at least 2 application nodes available that can be used for load balancing. The second challenge is therefore the parallelization of inserting the data into the corresponding target tables.

Solution

We agreed with the Elastic team that the data fields of the index documents are named exactly like our table fields in ServiceNow. This way, all the data can be transferred one-to-one into GlideRecords without time-consuming transformations. All additional calculations on the data, such as generating true ServiceNow references between two tables, were instead relegated from the importer to a secondary process I've called the "Afterburner", which we won't look at further here. The elimination of all dependencies between two index documents is a crucial prerequisite for the high degree of parallelization described below.

For both challenges, the solution uses the same artifact - the table sysevent with custom queues.

Due to its high importance in ServiceNow as a kind of message bus, it is very performant and designed to decouple data producers and data consumers from each other.

Challenge 1

The sysevent table is therefore used by the import script to cache blocks of Elastic documents: field parm1 stores the name of the target table, and field parm2 a JSON-stringified array of 1 to x index documents, where x depends on the average size of an index document delivered by Elastic.
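A minimal sketch of this caching step, assuming illustrative names throughout. The `enqueue` callback stands in for `gs.eventQueue(...)` in a real ServiceNow script; here it is injected so the logic can run standalone:

```javascript
// Slice index documents into event payloads and spread them
// round-robin over several custom event queues.
function enqueueDocuments(documents, targetTable, queueNames, chunkSize, enqueue) {
  var queueIndex = 0;
  for (var i = 0; i < documents.length; i += chunkSize) {
    var chunk = documents.slice(i, i + chunkSize);
    enqueue({
      name: 'x_import.' + targetTable,   // event name (assumption)
      parm1: targetTable,                // target table name
      parm2: JSON.stringify(chunk),      // block of index documents
      queue: queueNames[queueIndex]      // custom queue for parallelization
    });
    queueIndex = (queueIndex + 1) % queueNames.length; // round-robin
  }
}

// Demo with a collecting stub instead of the real event queue.
var events = [];
var docs = [];
for (var n = 0; n < 10; n++) docs.push({ u_number: n });
enqueueDocuments(docs, 'cmn_location', ['import_q1', 'import_q2'], 4, function (e) {
  events.push(e);
});
console.log(events.length); // 3 events: chunks of 4, 4 and 2
```

The round-robin over queue names is what fills the custom queues evenly, as described in Challenge 2 below.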

Challenge 2

The actual import does not take place in this script, but in registered Script Actions which ServiceNow executes whenever one of the import events occurs. The first step towards parallelization is to register multiple event queues, which are filled evenly by the import script. The second step is to use all existing application nodes. That can only be achieved by directly creating a record in the sys_trigger table with "System ID" = "ALL NODES":

[Screenshot: a parent record in table sys_trigger with "System ID" set to "ALL NODES"]

The field "Job context" of these parent records must contain the invocation of the method GlideEventManager('QUEUE NAME').process().

[Screenshot: the "Job context" field containing the GlideEventManager('QUEUE NAME').process() invocation]

As a result, ServiceNow creates child records automatically - one for each node:

[Screenshot: automatically created child records in sys_trigger, one per application node]
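The consuming side can be sketched as follows. This is not the project's actual Script Action; the GlideRecord insert is replaced by an injected callback so the unpacking logic can run standalone, and all names are illustrative:

```javascript
// Sketch of a Script Action body: unpack one import event and insert
// its documents. In ServiceNow, insertRecord would wrap
// `var gr = new GlideRecord(tableName); gr.setValue(...); gr.insert();`.
function importBatch(event, insertRecord) {
  var tableName = event.parm1;             // target table from parm1
  var documents = JSON.parse(event.parm2); // block of index documents from parm2
  var inserted = 0;
  for (var i = 0; i < documents.length; i++) {
    // document fields are named exactly like table columns,
    // so they map one-to-one without transformation
    insertRecord(tableName, documents[i]);
    inserted++;
  }
  return inserted;
}

// Demo with a stub insert function.
var rows = [];
var count = importBatch(
  { parm1: 'cmn_location', parm2: JSON.stringify([{ name: 'Berlin' }, { name: 'Bonn' }]) },
  function (table, fields) { rows.push({ table: table, fields: fields }); }
);
console.log(count); // 2
```

Because every event is self-contained (no dependencies between documents), many such Script Action executions can run in parallel across queues and nodes without coordinating with each other.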

In summary and visualized, the solution is as follows:

[Diagram: summary of the overall import solution]

Result

With the help of this approach, the time for the initial import could be reduced to about 3 hours. That was enough for us, because we also wanted to make sure that users could work in ServiceNow as usual during the import, without any performance losses, which was actually the case.

It makes no sense to derive a universally valid speed factor for your own scenarios, because the real duration depends on many other conditions. In our case, for example, our ServiceNow instances are operated on-premises and the provider has optimized the database and networks to the maximum, which makes our instances much faster than those in ServiceNow's own cloud. Furthermore, other performance-hungry jobs (other imports, Discovery, Service Mapping, etc.) can have a strong impact on the import speed. And last but not least, the import speed depends on the total amount of data as well as the type of target tables. In the CMDB, for example, a lot of background actions are triggered after an insert, which would not be the case with a simple custom table.

View original source

https://www.servicenow.com/community/now-platform-articles/how-to-import-4-million-records-in-3-hours/ta-p/2584686