logo

NJP

Scripted String Matching Techniques - Part 3 (Jaro-Winkler)

Import · Sep 01, 2018 · article

NOTE: MY POSTINGS REFLECT MY OWN VIEWS AND DO NOT NECESSARILY REPRESENT THE VIEWS OF MY EMPLOYER, ACCENTURE.

DIFFICULTY LEVEL: ADVANCED
Assumes having taken the class SSNF and has good advanced level of knowledge and/or familiarity with Scripting in ServiceNow.

So why am I pursuing this, what might seem, very boring topic? Well, this need has come up so often, and it is so inadequately addressed with the platform; that it needs a library of such functions to meet that need!

The idea here is to provide a fuzzy search that will allow the developer the ability to find close matches. Data-being-data you most likely will NOT be in control of what you are asked to process. Thus the need for this sort of thing!

So, the questions that present themselves: How many times have you had to go manually looking for a particular user in the sys_user table and been frustrated by effort? Sometime it takes you a bit to track them down via the source field? Wouldn't it be nice to have something human-like go through and find such things for you and just present you with the results? Quickly?

Answer for me: You bet! Happens all the time!

Okay, getting off my soap-box here. image

In this next-to-last article on string search algorithms (yes, yes, there will be one more) I will present a really great fuzzy search method: Jaro-Winkler Distance.

I searched the internet over (sounds like an old Hee-Haw song doesn't it?), and found several implementations in JavaScript. I converted a few of these, that looked reasonable, into a single Script Include Library, and then created a Fix Script that utilized the same data and results from my previous article for comparison purposes.

Note: I have attached the Fix Script and Script Includes I created to this article for download. You accept the install as-is and unsupported if you decide to use it.

See: Community Code Snippets: Scripted String Matching Techniques - Part 2 (Levenshtein)

In running the Fix Script the results were very interesting! For the "Buddy" matches I used my Levenshtein results as control values.

The results below are from the Fix Script. I edited them to include the control information and the variance (if significant):

*** Script: ---> 

    Jordan Thomas script:
    aaapppp: 0
    frog: 92.50
    fly: 0
    elephant: 44.17
    hippo: 44.17
    hippo: 0
    hello: 88.00
    ABC : 90.67
    D N H Enterprises: 95.25
    My Gym 94.20
    PENNSYLVANIA: 89.80
    Buddy: 66.71
    Buddy (flipped): 66.71

    Shannan Archer script:
    aaapppp: 0
    frog: 92.5
    fly: 0
    elephant: 44.166666666666664
    hippo: 44.166666666666664
    hippo: 0
    hello: 88.0
    ABC : 92.22222222222223
    D N H Enterprises: 95.25189786059352
    My Gym 95.16666666666667
    PENNSYLVANIA: 91.501554001554
    Buddy: 66.7055167055167
    Buddy (flipped): 66.7055167055167

    thsig script (Apache StringUtils Java library as control values):
    aaapppp: 0
    frog: 75.0
    fly: 0
    elephant: 58.25
    hippo: 44.166666666666664
    hippo: 0
    hello: 88.96000000000001
    ABC : 90.91228070175438
    D N H Enterprises: 87.82239477891652
    My Gym 86.39526508336331
    PENNSYLVANIA: 83.47820512820513
    Buddy: 90.69427268180065
    Buddy (flipped): 87.10764538417449

As you can see the differences from the control values were not that far off! I was especially interested in the Levenshtein accuracy. Overall not to different from the Apache Java library, and more accurate than the Levenshtein algorithm.

Okay, so now leaning toward Jaro-Winkler Distance as the better string matching method.

In my next article I will cover two other algorithms and we will see if they work better than Jaro-Winkler. They are:

  • Smith-Waterman
  • Needleman-Wunsch

Enjoy!

Steven Bell.

If you find this article helps you, don't forget to log in and mark it as "Helpful"!

image

Originally published on: 09-01-2018 7:45 AM

I updated the code and brought the article into alignment with my new formatting standard.

View original source

https://www.servicenow.com/community/developer-articles/community-code-snippets-scripted-string-matching-techniques-part/ta-p/2330301