Retrieving and Processing Files from Amazon S3
In an increasingly “as a service” world, we found that to import and export data files to and from other systems, we were building jobs outside of the platform to move those files between Amazon S3 and FTP. So we decided to cut out the middleman.
What
Our objective was to be able to list the contents of an Amazon S3 bucket, and then selectively read and process files (objects) from the bucket, for the purpose of creating and updating records in our ServiceNow instance. Given that we have IntegrationHub Enterprise, we had the option to leverage the Amazon S3 Spoke on the ServiceNow Store.
Note: the approach here can be adapted to work via scheduled jobs, script includes, and business rules, but if you have the option to use IntegrationHub and Flow Designer, that is a much better way to go – the interface makes it really easy to organize your work and build re-usable functions.
How
Step 1: Download and Install the Amazon S3 Spoke from the ServiceNow Store
We’ll be building a number of custom actions, but starting with the spoke gives us some solid core functionality, and allows us to work in an existing application scope.
Step 2: Configure your Connection and Credential Alias
Goes without saying, but easy to overlook. If you add the OOTB spoke action “List Objects By Bucket” and get errors, or the “Bucket Name” variable is empty, you probably don’t have the right connection settings or permissions. For the process described here, you’ll need an AWS Access Key and Secret Key for a user with at least AmazonS3ReadOnlyAccess (if you want to put files in the bucket, you’ll obviously need to adjust this accordingly).
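If you prefer a least-privilege custom policy over the managed AmazonS3ReadOnlyAccess policy, a minimal sketch covering the list and get calls this flow makes would look something like the following (the bucket name is a placeholder — substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the objects (the `/*` suffix) — both are needed for the listing and retrieval steps below.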
Step 3: Plan your Flow
Our use case involves creating and updating Customer Account and Account Relationship records. The bucket contains a large number of .JSON files. We need to go to the bucket, list all of the files, and determine whether they meet our naming criteria and whether they’ve been updated since the last time we executed the flow. We then process each file that meets the criteria.
The overall Flow we are building looks like this:
Step 4: Create a System Property
We’ll need someplace to store the last date and time we ran the flow. Why not use a system property? You can put your property in Global or the Amazon S3 Spoke, but remember you’ll need to set cross-scope access depending on where you put it.
Step 5: Build some Actions
The OOTB Actions for Amazon S3 are great, but aren’t quite suited to what we need. You can list objects in a bucket, but you can’t filter by date updated. You can download an object, but you can only attach it to an Incident record. No worries though – we can fix that! Keep in mind that, since we’re working in a vendor’s scope and all of our Actions are custom, we’ll want to use some sort of naming convention to keep track of which ones are ours.
Action 1: Custom List Objects By Bucket JSON
This action is very similar to an OOTB action, but has the added benefit of including the last modified date of the S3 object, which we need so we can be smart about what objects we process (keep those API calls to a minimum).
- Start with OOTB Action “List Objects By Bucket JSON”.
- Edit step 3, “Post-Processing & Error Handling”.
- On line 6, replace label with modified.
- On line 13, replace label with modified.
- Replace the entire contents of line 24 with "modified": contents[i]['LastModified'].toString().
Now you have an action that will go to your S3 bucket and put all of the object names and last modified dates in a JSON object that you can work with.
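To make the shape concrete, here is a hypothetical sample of the JSON this customized action returns — the object names and dates are made up for illustration; the real values come from your bucket listing:

```javascript
// Hypothetical sample of the "Contents" output from the customized action.
var contents = {
  data: [
    { name: "account_acme.json", modified: "2023-05-01 00:00:00" },
    { name: "account_globex.json", modified: "2023-05-02 00:00:00" }
  ]
};

// Downstream steps iterate the array and read both fields.
contents.data.forEach(function (obj) {
  console.log(obj.name + " (last modified " + obj.modified + ")");
});
```

The later “Build S3 File List” action loops over exactly this `data` array, reading `name` and `modified` for each object.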
Action 2: Custom Get System Property
Create a utility action you can re-use whenever you need it. This action has a simple script step that grabs the system property we created earlier and outputs it to a Flow variable we’ll use in the next step.
- Inputs
  - Label: Property; Name: property; Type: String; Mandatory: true
- Script Step
  - Input Variables
    - Name: prop_name; Value: action > Property
  - Script
    - outputs.prop_val = gs.getProperty(inputs.prop_name);
  - Output Variables
    - Label: Property Value; Name: prop_val; Type: String; Mandatory: true
- Outputs
  - Label: Property Value; Name: prop_val; Type: String; Mandatory: true; Value: step > Script step > Property Value
Action 3: Custom Build S3 File List
Get all of the files from Action 1 that meet the date criteria from Action 2. We’ll also do some optional file name filtering so we just get the objects we want.
- Inputs
  - Label: File JSON; Name: file_json; Type: JSON; Mandatory: true
  - Label: Last Run; Name: last_run; Type: String; Mandatory: false
  - Label: Object Name Filter; Name: object_name_filter; Type: String; Mandatory: false
  - Label: File Extension; Name: file_extension; Type: String; Mandatory: false
- Script Step
  - Input Variables
    - Name: fileJson; Value: action > File JSON
    - Name: lastRun; Value: action > Last Run
    - Name: nameFilter; Value: action > Object Name Filter
    - Name: fileExtension; Value: action > File Extension
  - Script
```
(function execute(inputs, outputs) {
    var lpVal = inputs.lastRun || "2000-01-01 00:00:00";
    var nameFilter = inputs.nameFilter || "";
    var extension = inputs.fileExtension || "";
    var fileString = JSON.stringify(inputs.fileJson);
    var processList = [];
    var debug = false;
    var lastProcessed = new GlideDateTime(lpVal);
    gs.info("Flow Designer Action: CH Build S3 File List - starting processing of JSON File: " + fileString + "; Contents length: " + inputs.fileJson.data.length);
    for (var i = 0; i < inputs.fileJson.data.length; i++) {
        var file = inputs.fileJson.data[i].name;
        var fmVal = inputs.fileJson.data[i].modified || "2000-01-02 00:00:00";
        if (debug) {
            gs.info("Flow Designer Action: CH Build S3 File List - Processing file " + file + ", last modified " + fmVal);
        }
        var modified = new GlideDateTime(fmVal);
        var compRes = "before";
        var filterRes = "No file filter was set";
        var action = "Ignoring";
        // GlideDateTime.after() is more reliable than comparing the objects with ">"
        if (modified.after(lastProcessed) && file.indexOf(extension) !== -1) {
            compRes = "after";
            action = "Processing";
            if (nameFilter) {
                if (file.indexOf(nameFilter) !== -1) {
                    filterRes = "File name matches filter";
                } else {
                    filterRes = "File name does not meet filter criteria";
                    action = "Ignoring";
                }
            }
        }
        if (debug) {
            gs.info("Flow Designer Action: CH Build S3 File List - For file " + file + ", last modified is " + modified + " which is " + compRes + " the last processing date of " + lastProcessed + "; " + filterRes + ". " + action + " this file");
        }
        if (action == "Processing") {
            processList.push(file);
        }
    }
    gs.info("Flow Designer Action: CH Build S3 File List - Finding files modified after " + lastProcessed + " using file name filter '" + nameFilter + "'; File list contains " + processList.length + " records: " + processList.toString());
    outputs.files = processList;
    outputs.records = processList.length;
})(inputs, outputs);
```
  - Output Variables
    - Label: files; Name: files; Type: Array.String; Mandatory: true
      - Label: file; Name: file_record; Type: String
    - Label: records; Name: records; Type: String; Mandatory: true
- Outputs
  - Label: Files; Name: files; Type: Array.String; Mandatory: true; Value: step > Build Outputs > files
    - Label: File Key; Name: variable_child0
  - Label: File Count; Name: file_count; Type: String; Mandatory: true; Value: step > Build Outputs > records
Note: The code above gets the timestamp from the S3 object as generated by AWS. When you set the value of fmVal using the timestamp, the time part will be set to 00:00:00. We’re just processing once a day, so that doesn’t matter for us, but it’s something you will want to address if you are processing multiple times per day.
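One way to address that (a sketch in plain JavaScript, not the spoke’s code) is to normalize the AWS ISO-8601 LastModified value into the "yyyy-MM-dd HH:mm:ss" format before handing it to GlideDateTime, so the time portion survives:

```javascript
// Sketch: normalize an AWS ISO-8601 LastModified string into
// "yyyy-MM-dd HH:mm:ss" so GlideDateTime keeps the time-of-day.
function normalizeS3Timestamp(isoString) {
  // "2023-05-01T13:45:12.000Z" -> "2023-05-01 13:45:12"
  return isoString.replace("T", " ").replace(/(\.\d+)?Z$/, "");
}

console.log(normalizeS3Timestamp("2023-05-01T13:45:12.000Z")); // "2023-05-01 13:45:12"
```

You would apply this where the action sets fmVal, then construct the GlideDateTime from the normalized string.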
Flow Logic: If any files need to be processed
For Each loops error when they have nothing to process. We need to know if there were any files that passed the check in the previous action, which is why we have the File Count output.
Condition 1: [3 > File Count] is not 0
Referring back to the overview of the Flow, the next 3 nodes will fall into this check.
Flow Logic: For Each Item in [3 > Files]
Go through the array of files we created in Action 3. This is where we will actually read in the objects from the bucket that met our criteria, and do something with them.
Action 4: Custom Get S3 Object
This is where you’ll start to do your own thing, based on your use case. We’ll all start out the same way, getting the object from the bucket. What you do with the object is up to you. In my case, the objects are text files containing JSON-formatted data. I end this action by getting the text and placing it in an output string variable, so I can use it later.
- Start with OOTB Action “Download S3 Object to ServiceNow Record”.
- In the Inputs section, remove “Record” and “Download As”.
- In the Pre-Processing script, remove Input and Output Variables “key” and “file_name”. Your script should look like this:

```
(function execute(inputs, outputs) {
    outputs.bucket_name = new AmazonS3Utils().ValidateBucketName(inputs.bucket_name);
    outputs.bucket_region = new AmazonS3Utils().getBucketRegion(outputs.bucket_name);
})(inputs, outputs);
```
- Rename the REST Step from “Download S3 Object to ServiceNow Record” to “Get Object by Key”.
- Under Request Details, replace the Resource Path variable with action > File Key (make sure you keep the “/” before it).
- Under Response Handling, uncheck Save as Attachment.
- Update the Post-Processing & Error Handling step. This is where we’ll make the object’s contents usable for whatever we need, rather than attaching it to an Incident record.
  - After line 3, insert outputs.contents = inputs.responsebody;
  - Add an Output Variable
    - Label: contents; Name: contents; Type: (depends on what you are doing with the contents; in my case, String)
Actions 5 through X: Process Files
This part is really up to you and your use case. For me, I needed to take the contents output in Action 4, convert the string to JSON, and use the resulting JSON object to create and update customer accounts, and to create relationships to other accounts.
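As an illustration, the first step in my processing looks roughly like the sketch below — the field names (account_name, relationships) are hypothetical, so use whatever your source files actually contain; the record creation itself happens in later Flow steps (or via GlideRecord) and isn’t shown:

```javascript
// Sketch: turn the "contents" string output by Action 4 into a usable object.
// Field names below are made up for illustration.
var contents = '{"account_name":"ACME Corp","relationships":["Globex","Initech"]}';

var parsed = JSON.parse(contents);
var related = parsed.relationships || [];

console.log("Account: " + parsed.account_name + "; relationships: " + related.length);
```

From there, each parsed record feeds a Create Record or Update Record step for the Customer Account and Account Relationship tables.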
Action X + 1: Get Current Date and Time
Now that we’re done processing, we need to set the system property we retrieved in Action 2. To do this, we need to get the current date and time.
- Inputs (none)
- Script Step
  - Input Variables (none)
  - Script

```
(function execute(inputs, outputs) {
    var now = new GlideDateTime();
    outputs.nowdtasdatetime = now;
    outputs.nowdtasstring = now.toString();
})(inputs, outputs);
```

  - Output Variables
    - Label: Now Date Time; Name: nowdtasdatetime; Type: Date/Time; Mandatory: true
    - Label: Now Date Time as String; Name: nowdtasstring; Type: String; Mandatory: true
- Outputs
  - Label: Now Date Time; Name: nowdtasdatetime; Type: Date/Time; Mandatory: true; Value: step > Script step > Now Date Time
  - Label: Now Date Time as String; Name: nowdtasstring; Type: String; Mandatory: true; Value: step > Script step > Now Date Time as String
Action X + 2: Set System Property
Finally, put the current date and time in our system property, so we know when we last processed.
- Inputs
  - Label: Property; Name: property; Type: String; Mandatory: true
  - Label: Value; Name: value; Type: String; Mandatory: true
- Script Step
  - Input Variables
    - Name: property; Value: action > Property
    - Name: value; Value: action > Value
  - Script
```
(function execute(inputs, outputs) {
    var gr = new GlideRecord("sys_properties");
    gr.addQuery("name", inputs.property);
    gr.query();
    if (gr.next()) {
        gr.value = inputs.value;
        gr.update();
        outputs.status = "success";
    } else {
        outputs.status = "error";
        outputs.message = "Unable to set system property " + inputs.property + "; please ensure this is a valid system property.";
    }
})(inputs, outputs);
```
  - Output Variables
    - Label: status; Name: status; Type: String; Mandatory: true
    - Label: message; Name: message; Type: String; Mandatory: true
- Outputs
  - Label: status; Name: status; Type: String; Mandatory: true; Value: step > Script Status > status
  - Label: message; Name: message; Type: String; Mandatory: true; Value: step > Script Status > message
All Together Now
Trigger
Whatever trigger conditions you want, but this was built to be run on a schedule.
Actions (and their inputs)
- Custom List Objects By Bucket JSON
  - Bucket Name: (select from list based on your AWS permissions)
- Custom Get System Property
  - Property: (name of system property used to store last run date/time)
- Custom Build S3 File List
  - File JSON: [1 > Contents]
  - Last Run: [2 > Property Value]
  - Object Name Filter: (optional literal value)
  - File Extension: (optional literal value)
- If at Least One File was Returned by Custom Build S3 File List
  - [3 > File Count] is not 0
- For Each Item in [3 > Files]
  - Custom Get S3 Object
    - Bucket Name: (same value as 1a)
    - File Key: [5 > File Key]
  - Use Case Processing
- Custom Get Current Date and Time
- Custom Set System Property
  - Property: (name of system property used in step 2)
  - Value: [8 > Now Date Time as String]
And there you have it. Have any suggestions to improve this? Did you find it useful? Let me know!
https://www.servicenow.com/community/developer-articles/retrieving-and-processing-files-from-amazon-s3/ta-p/2308065