2019-05-21

### Navigation ### Others

Syntax highlighting by Prism

## Levenshtein String Distance

### Algorithm creator(s)

Vladimir Levenshtein

Wayne Diamond

### Description

Levenshtein Distance (LD) is a measure of the similarity between two strings - the source string (s) and the target string (t). The LD is the number of deletions, insertions, or substitutions required to transform s into t.

n/a

### See also

n/a

#### Source Code

Download source code file levenshtein.bas (Right-click -> "Save as ...")

``````'LEVENSHTEIN.BAS - Ported by Wayne Diamond to PB from the VB sample at http://www.merriampark.com/ld.htm
'Quote from ld.htm:
'  Levenshtein distance (LD) is a measure of the similarity between two strings,
'  which we will refer to as the source string (s) and the target string (t). The
'  distance is the number of deletions, insertions, or substitutions required to
'  transform s into t. For example, If s is "test" and t is "test", then LD(s,t) = 0,
'  because no transformations are needed. The strings are already identical. If s is
'  "test" and t is "tent", then LD(s,t) = 1, because one substitution
'  (change "s" to "n") is sufficient to transform s into t. The greater
'  the Levenshtein distance, the more different the strings are.
'  Levenshtein distance is named after the Russian scientist Vladimir
'  Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce
'  Levenshtein, the metric is also sometimes called edit distance.

#COMPILE EXE "lsdist.exe"

'********************************
'*** Compute Levenshtein Distance
'********************************
FUNCTION LD(BYVAL s AS STRING, BYVAL t AS STRING) AS LONG
DIM d() AS LONG  ' matrix
DIM m AS LONG    ' length of t
DIM n AS LONG    ' length of s
DIM i AS LONG    ' iterates through s
DIM j AS LONG    ' iterates through t
DIM s_i AS STRING   ' ith character of s
DIM t_j AS STRING   ' jth character of t
DIM cost AS LONG ' cost
' Step 1
n = LEN(s)
m = LEN(t)
IF n = 0 THEN
LD = m
EXIT FUNCTION
END IF
IF m = 0 THEN
LD = n
EXIT FUNCTION
END IF
REDIM d(0 TO n, 0 TO m) AS LONG
' Step 2
FOR i = 0 TO n
d(i, 0) = i
NEXT i
FOR j = 0 TO m
d(0, j) = j
NEXT j
' Step 3
FOR i = 1 TO n
s_i = MID\$(s, i, 1)
' Step 4
FOR j = 1 TO m
t_j = MID\$(t, j, 1)
' Step 5
IF s_i = t_j THEN
cost = 0
ELSE
cost = 1
END IF
' Step 6
d(i, j) = MIN%(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
NEXT j
NEXT i
' Step 7
LD = d(n, m)
ERASE d
END FUNCTION

FUNCTION PBMAIN() AS LONG
DIM LevenshteinDistance AS LONG
String1\$ = "test"
String2\$ = "test"
LevenshteinDistance = LD(String1\$, String2\$)
STDOUT "The Levenshtein Distance between " & String1\$ & " and " & String2\$ & " is: " & STR\$(LevenshteinDistance)

String1\$ = "pie"
String2\$ = "pies"
LevenshteinDistance = LD(String1\$, String2\$)
STDOUT "The Levenshtein Distance between " & String1\$ & " and " & String2\$ & " is: " & STR\$(LevenshteinDistance)

String1\$ = "doom"
String2\$ = "wolfenstein"
LevenshteinDistance = LD(String1\$, String2\$)
STDOUT "The Levenshtein Distance between " & String1\$ & " and " & String2\$ & " is: " & STR\$(LevenshteinDistance)
END FUNCTION
``````

Mirror provided by Knuth Konrad