2017-10-20

Navigation

Skip Navigation Links

Hash algorithms

Asymmetric Algorithms

Symmetric Cipher Algorithms

Encoding Algorithms

Compression Algorithms

Pseudo Random Number Algorithms

Steganography

Library Wrappers

String Comparison

Others

Syntax highlighting by Prism
PBCrypto.com Mirror

Levenshtein String Distance

Algorithm creator(s)

Vladimir Levenshtein


PB author(s)

Wayne Diamond


Description

Levenshtein Distance (LD) is a measure of the similarity between two strings - the source string (s) and the target string (t). The LD is the number of deletions, insertions, or substitutions required to transform s into t.


Note

n/a


Source

https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/25475-levenshtein-string-distance-algorithm-string-similarity-test?t=24815


See also

n/a


Source Code

Download source code file levenshtein.bas (Right-click -> "Save as ...")

'LEVENSHTEIN.BAS - Ported by Wayne Diamond to PB from the VB sample at http://www.merriampark.com/ld.htm
'Quote from ld.htm:
'  Levenshtein distance (LD) is a measure of the similarity between two strings,
'  which we will refer to as the source string (s) and the target string (t). The
'  distance is the number of deletions, insertions, or substitutions required to
'  transform s into t. For example, If s is "test" and t is "test", then LD(s,t) = 0,
'  because no transformations are needed. The strings are already identical. If s is
'  "test" and t is "tent", then LD(s,t) = 1, because one substitution
'  (change "s" to "n") is sufficient to transform s into t. The greater
'  the Levenshtein distance, the more different the strings are.
'  Levenshtein distance is named after the Russian scientist Vladimir
'  Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce
'  Levenshtein, the metric is also sometimes called edit distance.

#COMPILE EXE "lsdist.exe"

'********************************
'*** Compute Levenshtein Distance
'********************************
FUNCTION LD(BYVAL s AS STRING, BYVAL t AS STRING) AS LONG
DIM d() AS LONG  ' matrix
DIM m AS LONG    ' length of t
DIM n AS LONG    ' length of s
DIM i AS LONG    ' iterates through s
DIM j AS LONG    ' iterates through t
DIM s_i AS STRING   ' ith character of s
DIM t_j AS STRING   ' jth character of t
DIM cost AS LONG ' cost
  ' Step 1
  n = LEN(s)
  m = LEN(t)
  IF n = 0 THEN
    LD = m
    EXIT FUNCTION
  END IF
  IF m = 0 THEN
    LD = n
    EXIT FUNCTION
  END IF
  REDIM d(0 TO n, 0 TO m) AS LONG
  ' Step 2
  FOR i = 0 TO n
    d(i, 0) = i
  NEXT i
  FOR j = 0 TO m
    d(0, j) = j
  NEXT j
  ' Step 3
  FOR i = 1 TO n
    s_i = MID$(s, i, 1)
    ' Step 4
    FOR j = 1 TO m
      t_j = MID$(t, j, 1)
      ' Step 5
      IF s_i = t_j THEN
        cost = 0
      ELSE
        cost = 1
      END IF
      ' Step 6
      d(i, j) = MIN%(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
    NEXT j
  NEXT i
  ' Step 7
  LD = d(n, m)
  ERASE d
END FUNCTION
 
FUNCTION PBMAIN() AS LONG
DIM LevenshteinDistance AS LONG
String1$ = "test"
String2$ = "test"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
 
String1$ = "pie"
String2$ = "pies"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
 
String1$ = "doom"
String2$ = "wolfenstein"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
END FUNCTION

Mirror provided by Knuth Konrad