Levenshtein String Distance
Algorithm creator(s)
Vladimir Levenshtein
PB author(s)
Wayne Diamond
Description
Levenshtein Distance (LD) is a measure of the similarity between two strings - the source string (s) and the target string (t). The LD is the number of deletions, insertions, or substitutions required to transform s into t.
Note
n/a
Source
https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/25475-levenshtein-string-distance-algorithm-string-similarity-test?t=24815
See also
n/a
Source Code
Download source code file levenshtein.bas (Right-click -> "Save as ...")
'LEVENSHTEIN.BAS - Ported by Wayne Diamond to PB from the VB sample at http://www.merriampark.com/ld.htm
'Quote from ld.htm:
' Levenshtein distance (LD) is a measure of the similarity between two strings,
' which we will refer to as the source string (s) and the target string (t). The
' distance is the number of deletions, insertions, or substitutions required to
' transform s into t. For example, If s is "test" and t is "test", then LD(s,t) = 0,
' because no transformations are needed. The strings are already identical. If s is
' "test" and t is "tent", then LD(s,t) = 1, because one substitution
' (change "s" to "n") is sufficient to transform s into t. The greater
' the Levenshtein distance, the more different the strings are.
' Levenshtein distance is named after the Russian scientist Vladimir
' Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce
' Levenshtein, the metric is also sometimes called edit distance.
#COMPILE EXE "lsdist.exe"
'********************************
'*** Compute Levenshtein Distance
'********************************
FUNCTION LD(BYVAL s AS STRING, BYVAL t AS STRING) AS LONG
DIM d() AS LONG ' matrix
DIM m AS LONG ' length of t
DIM n AS LONG ' length of s
DIM i AS LONG ' iterates through s
DIM j AS LONG ' iterates through t
DIM s_i AS STRING ' ith character of s
DIM t_j AS STRING ' jth character of t
DIM cost AS LONG ' cost
' Step 1
n = LEN(s)
m = LEN(t)
IF n = 0 THEN
LD = m
EXIT FUNCTION
END IF
IF m = 0 THEN
LD = n
EXIT FUNCTION
END IF
REDIM d(0 TO n, 0 TO m) AS LONG
' Step 2
FOR i = 0 TO n
d(i, 0) = i
NEXT i
FOR j = 0 TO m
d(0, j) = j
NEXT j
' Step 3
FOR i = 1 TO n
s_i = MID$(s, i, 1)
' Step 4
FOR j = 1 TO m
t_j = MID$(t, j, 1)
' Step 5
IF s_i = t_j THEN
cost = 0
ELSE
cost = 1
END IF
' Step 6
d(i, j) = MIN%(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
NEXT j
NEXT i
' Step 7
LD = d(n, m)
ERASE d
END FUNCTION
FUNCTION PBMAIN() AS LONG
DIM LevenshteinDistance AS LONG
String1$ = "test"
String2$ = "test"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
String1$ = "pie"
String2$ = "pies"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
String1$ = "doom"
String2$ = "wolfenstein"
LevenshteinDistance = LD(String1$, String2$)
STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance)
END FUNCTION