Token set ratio

Supported in: Batch, Streaming

Computes the token set ratio between two strings. The token set ratio measures how similar two strings are and returns a value between 0 and 1. A value of 0 means the two strings have nothing in common. A value of 1 means they are the same, or that every word in one string also appears in the other; word order and repeated words are ignored.

Expression categories: Distance measurement, String

Declared arguments

  • Ignore case: Whether to ignore case when comparing the left and right strings.
    Literal<Boolean>
  • Left: Left string to compare.
    Expression<String>
  • Right: Right string to compare.
    Expression<String>

Output type: Double

Examples

Example 1: Base case

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right
left              | right        | Output
hello world       | world hello  | 1.0
Hello             | hello world  | 0.5
hello hello WorlD | hello world  | 0.8181818181818181
hello             | farewell     | 0.46153846153846156
empty string      | empty string | 1.0

Example 2: Ignore case

Description: By setting ignore case to true, letters of different case are treated as equal.

Argument values:

  • Ignore case: true
  • Left: left
  • Right: right
left              | right       | Output
Hello             | hello world | 1.0
hello hello WorlD | hello world | 1.0
hello             | FAREWELL    | 0.46153846153846156
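
The outputs in Examples 1 and 2 are consistent with the commonly used token set ratio algorithm: split each string into a set of whitespace-separated words, build three strings (the shared words, the shared words plus the words found only in left, and the shared words plus the words found only in right), and take the highest pairwise similarity. The following Python sketch illustrates that idea; it is not this expression's actual implementation. The function names are hypothetical, and the pairwise ratio is assumed to be 2 * matching characters / combined length, as computed by Python's difflib.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    # Assumed pairwise similarity: 2 * matches / (len(a) + len(b)).
    # difflib returns 1.0 when both strings are empty.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def token_set_ratio(left: str, right: str, ignore_case: bool = False) -> float:
    """Illustrative token set ratio; hypothetical helper, not the product API."""
    if ignore_case:
        left, right = left.lower(), right.lower()

    # Sets drop repeated words; sorting below removes word-order effects.
    left_tokens = set(left.split())
    right_tokens = set(right.split())

    common = " ".join(sorted(left_tokens & right_tokens))
    only_left = " ".join(sorted(left_tokens - right_tokens))
    only_right = " ".join(sorted(right_tokens - left_tokens))

    combined_left = (common + " " + only_left).strip()
    combined_right = (common + " " + only_right).strip()

    # If one token set contains the other, `common` equals one of the
    # combined strings, so the maximum is 1.0.
    return max(
        _ratio(common, combined_left),
        _ratio(common, combined_right),
        _ratio(combined_left, combined_right),
    )
```

Under these assumptions the sketch agrees with the example rows to within floating-point rounding: Hello against hello world gives 0.5 when case is respected and 1.0 when it is ignored.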

Example 3: Null case

Argument values:

  • Ignore case: false
  • Left: left
  • Right: right
left  | right | Output
hello | null  | null
null  | hello | null
null  | null  | null
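
Example 3 shows that a null on either side produces a null output rather than 0. A minimal Python sketch of that null-propagating behavior follows, with None standing in for null. The function name `similarity_or_null` is hypothetical, and difflib's plain character ratio is used only as a stand-in metric so the block stays self-contained.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity_or_null(left: Optional[str], right: Optional[str]) -> Optional[float]:
    """Return None if either input is None; otherwise a similarity in [0, 1]."""
    # Null check comes first, so no comparison is attempted on missing values.
    if left is None or right is None:
        return None
    # Stand-in metric (assumption): difflib's character-level ratio.
    return SequenceMatcher(None, left, right, autojunk=False).ratio()
```

Returning None instead of 0 keeps "no value to compare" distinct from "compared and found nothing in common".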