SoundEx is a phonetic algorithm for indexing names by sound, as pronounced in English. It is commonly used with databases to help with searching and is available in many database engines such as PostgreSQL and MySQL. SoundEx is not included with SQLite by default, and there may be situations where you want to use it when searching.
Fortunately the algorithm is not all that difficult. You can read more about SoundEx on Wikipedia, but here are the general steps:
- Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.
- Replace consonants with digits as follows (after the first letter):
- b, f, p, v → 1
- c, g, j, k, q, s, x, z → 2
- d, t → 3
- l → 4
- m, n → 5
- r → 6
- If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
- If the name has too few letters to produce three digits, pad with zeros until there are three digits. If it produces more than three digits, keep only the first three.
In general, implementing this algorithm is pretty straightforward. This is an example function for the SoundEx algorithm using Xojo, which should be pretty easy to translate to other languages as needed:
Private Function SoundEx(word As Text) As Text
  Const kLength As Integer = 4
  Dim value As Text
  Dim size As Integer = word.Length
  ' Words shorter than two characters are not encoded; an empty Text is returned
  If (size > 1) Then
    word = word.Uppercase
    ' Convert the word to a character array for faster processing
    Dim chars() As Text = word.Split
    ' For storing the SoundEx character codes
    Dim code() As Text
    ' The current and previous character codes
    Dim prevCode As Integer = 0
    Dim currCode As Integer = 0
    ' Add the first character
    code.Append(chars(0))
    Dim loopLimit As Integer = size - 1
    ' Loop through all the characters and convert them to the proper character code
    For i As Integer = 0 To loopLimit
      Select Case chars(i)
      Case "H", "W"
        currCode = -1
      Case "A", "E", "I", "O", "U", "Y"
        currCode = 0
      Case "B", "F", "P", "V"
        currCode = 1
      Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        currCode = 2
      Case "D", "T"
        currCode = 3
      Case "L"
        currCode = 4
      Case "M", "N"
        currCode = 5
      Case "R"
        currCode = 6
      Else
        ' Any other character (digit, punctuation) keeps the previous code,
        ' so it is skipped in the same way as H and W
        currCode = prevCode
      End Select
      If i > 0 Then
        ' Two letters with the same number separated by 'h' or 'w' are coded as a single number
        If currCode = -1 Then currCode = prevCode
        ' Only append when the current code differs from the last one
        If currCode <> prevCode Then
          ' A code of 0 is a vowel, which is never appended
          If currCode <> 0 Then
            code.Append(currCode.ToText)
          End If
        End If
      End If
      prevCode = currCode
      ' If the buffer size meets the length limit, then exit the loop
      If (code.Ubound = kLength - 1) Then
        Exit For
      End If
    Next
    ' Pad the code with zeros if required
    size = code.Ubound + 1
    For j As Integer = size To kLength - 1
      code.Append("0")
    Next
    ' Set the return value
    value = Text.Join(code, "")
  End If
  ' Return the computed soundex
  Return value
End Function
You call the SoundEx function like this:
Dim result As Text
result = SoundEx("Robert") ' R163
result = SoundEx("Rupert") ' R163
result = SoundEx("Rubin") ' R150
result = SoundEx("Ashcraft") ' A261
result = SoundEx("Ashcroft") ' A261
result = SoundEx("Tymczak") ' T522
result = SoundEx("Pfister") ' P236
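As an illustration of how the function might translate to another language, here is a rough Python sketch of the same steps. The names `CODES` and `soundex` are made up for this example, and it is not a line-by-line port: it drops non-letters before encoding, and it pads single-letter names (so "A" becomes "A000"), whereas the Xojo function above returns an empty value for them.

```python
# Letter-to-digit table taken from the steps listed earlier.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a four-character SoundEx code (hypothetical helper for this sketch)."""
    # Assumption: only the ASCII letters A-Z take part in the encoding.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    # The first letter's digit still counts when checking for adjacent duplicates.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a digit
        digit = CODES.get(ch, "")  # vowels and Y have no digit
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated digit is coded again
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


for name in ("Robert", "Rupert", "Rubin", "Ashcraft", "Tymczak", "Pfister"):
    print(name, soundex(name))
```

Run against the names used above, this sketch should produce the same codes as the Xojo function.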
By saving the SoundEx results (in SQLite, JSON or wherever) you can use them again to compare with SoundEx results on other values for better searching.
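To make that concrete, here is a small sketch using Python's standard `sqlite3` module. The table name `people`, the index name, and the helper `find_by_code` are assumptions for illustration only. The codes are copied from the results listed earlier; in a real app they would come from your SoundEx function when each row is saved.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, soundex TEXT)")

# Store each name together with its precomputed SoundEx code.
conn.executemany(
    "INSERT INTO people (name, soundex) VALUES (?, ?)",
    [("Robert", "R163"), ("Rupert", "R163"),
     ("Rubin", "R150"), ("Ashcraft", "A261")],
)

# An index on the code column keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_people_soundex ON people (soundex)")


def find_by_code(code):
    """Return the stored names whose SoundEx code equals the given code."""
    rows = conn.execute(
        "SELECT name FROM people WHERE soundex = ? ORDER BY name", (code,)
    )
    return [row[0] for row in rows]


# Searching with the code for "Rupert" also finds "Robert".
matches = find_by_code("R163")
print(matches)
```

The search term is encoded once and compared against the stored column, so SQLite never needs a SoundEx function of its own.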