Edit distance

Supported in: Batch, Streaming

Compute the edit distance between two strings. Supports Levenshtein, indel, and Damerau-Levenshtein distance.

Expression categories: Distance measurement, String

Declared arguments

  • Distance function - Distance function used to calculate the edit distance between the two strings.
    Enum<Damerau-Levenshtein Distance, Indel Distance, Levenshtein Distance>
  • Ignore case - Do you want to ignore case when comparing the left and right strings?
    Literal<Boolean>
  • Left - Left string to compare.
    Expression<String>
  • Right - Right string to compare.
    Expression<String>
  • optional Normalize distance - Do you want to normalize the distance to a value between 0 and 1, where 0 means no difference between strings and 1 means no similarity?
    Literal<Boolean>

Output type: Double | Integer

Examples

Example 1: Base case

Description: String edit distance calculated using Levenshtein distance.

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right         | Output
hello | hello         | 0
hallo | hello         | 1
hlelo | hello         | 2
hello | hEllO         | 2
hello | hello, world! | 8
hello | farewell      | 6
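
The values in Example 1 can be reproduced with a textbook dynamic-programming Levenshtein routine. The following is a minimal Python sketch for illustration only, not the expression's actual implementation; the function name `levenshtein` is hypothetical.

```python
def levenshtein(left: str, right: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `left` into `right`."""
    # previous[j] holds the distance between the processed prefix of
    # `left` and the first j characters of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]  # distance from left[:i] to the empty string
        for j, right_char in enumerate(right, start=1):
            cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,         # delete left_char
                current[j - 1] + 1,      # insert right_char
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    pairs = [("hello", "hello"), ("hallo", "hello"), ("hlelo", "hello"),
             ("hello", "hEllO"), ("hello", "hello, world!"), ("hello", "farewell")]
    for left, right in pairs:
        print(left, right, levenshtein(left, right))
```

Because the comparison is case-sensitive here, `hello` and `hEllO` differ by two substitutions, and the swapped letters in `hlelo` also count as two edits.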

Example 2: Base case

Description: By setting ignore case to true, letters of different case are treated as equal. Here calculated using Damerau-Levenshtein distance.

Argument values:

  • Distance function: damerau_levenshtein
  • Ignore case: true
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right         | Output
hello | hello         | 0
hallo | hello         | 1
hlelo | hello         | 1
hello | hEllO         | 0
hello | hello, world! | 8
hello | farewell      | 6
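
To see why `hlelo` now scores 1 and `hEllO` scores 0, the sketch below adds an adjacent-transposition step and optional case folding to the usual recurrence. It assumes the restricted "optimal string alignment" variant of Damerau-Levenshtein and uses `str.lower()` for case folding. The source does not say which variant or folding rule the expression uses, so treat both as assumptions; the function name is hypothetical.

```python
def damerau_levenshtein_osa(left: str, right: str, ignore_case: bool = False) -> int:
    """Edit distance with insertions, deletions, substitutions, and
    transpositions of two adjacent characters (restricted OSA variant)."""
    if ignore_case:
        # Assumption: simple lowercasing stands in for "ignore case".
        left, right = left.lower(), right.lower()

    rows, cols = len(left) + 1, len(right) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of left[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of right[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if left[i - 1] == right[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "le" <-> "el" costs a single edit.
            if (i > 1 and j > 1
                    and left[i - 1] == right[j - 2]
                    and left[i - 2] == right[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]
```

For the six rows above, calling `damerau_levenshtein_osa(left, right, ignore_case=True)` agrees with the Output column; the restricted and unrestricted variants only diverge on inputs where a transposed pair is edited again.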

Example 3: Base case

Description: By setting normalize to true, the edit distance is normalized to a value between 0 and 1. Here calculated using indel distance.

Argument values:

  • Distance function: indel
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: true

left  | right         | Output
hello | hello         | 0.0
hallo | hello         | 0.2
hlelo | hello         | 0.2
hello | hEllO         | 0.4
hello | hello, world! | 0.4444444444444444
hello | farewell      | 0.5384615384615384
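
Indel distance allows only insertions and deletions, so it equals the combined length of both strings minus twice the length of their longest common subsequence. The outputs above are consistent with dividing that count by the combined length (for example, 8 / 18 for `hello` and `hello, world!`). That normalization rule is inferred from the example values rather than stated in the source. A minimal Python sketch under that assumption, with hypothetical function names:

```python
def lcs_length(left: str, right: str) -> int:
    """Length of the longest common subsequence of the two strings."""
    previous = [0] * (len(right) + 1)
    for left_char in left:
        current = [0]
        for j, right_char in enumerate(right, start=1):
            if left_char == right_char:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_distance(left: str, right: str) -> int:
    """Number of insertions plus deletions needed to turn left into right."""
    return len(left) + len(right) - 2 * lcs_length(left, right)


def normalized_indel_distance(left: str, right: str) -> float:
    """Indel distance scaled to [0, 1] by the combined string length.

    Assumption: two empty strings are treated as identical (0.0).
    """
    total = len(left) + len(right)
    if total == 0:
        return 0.0
    return indel_distance(left, right) / total
```

Under this rule, `hello` and `farewell` share the subsequence `ell`, giving 13 - 6 = 7 edits over a combined length of 13.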

Example 4: Null case

Argument values:

  • Distance function: levenshtein
  • Ignore case: false
  • Left: left
  • Right: right
  • Normalize distance: false

left  | right | Output
hello | null  | null
null  | hello | null
null  | null  | null
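
The null rows above show that a null on either side yields a null output. One way to model that behavior is a wrapper that short-circuits before any distance function runs. This is a sketch with Python's `None` standing in for null; `null_propagating` and `placeholder_distance` are hypothetical names, and the placeholder is deliberately not a real edit distance.

```python
from typing import Callable, Optional


def null_propagating(
    distance: Callable[[str, str], int],
) -> Callable[[Optional[str], Optional[str]], Optional[int]]:
    """Wrap a distance function so that a None input yields a None output."""
    def wrapped(left: Optional[str], right: Optional[str]) -> Optional[int]:
        if left is None or right is None:
            return None  # mirrors the null rows in Example 4
        return distance(left, right)
    return wrapped


def placeholder_distance(left: str, right: str) -> int:
    """Stand-in for a real edit distance: 0 if equal, otherwise 1."""
    return 0 if left == right else 1


safe_distance = null_propagating(placeholder_distance)
print(safe_distance("hello", None))  # None
print(safe_distance(None, "hello"))  # None
print(safe_distance(None, None))     # None
```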