Edit distance
The Edit distance is a problem to measure how much two strings are different from one another by counting the minimum number of operations required to convert one string into the other. Edit distance problem can be solved by many different approaches.But the most efficient approach to solve the Edit distance problem is Dynamic programming approach which takes the O(N * M) time complexity, where N and M are sizes of the strings.
Edit distance has different definitions which uses different sets of string operations. Levenshtein distance operations is the basic set of operations which is used in Edit distance Problem.
Operation allowed are:
- Delete any character from the string.
- Replace any character with any other
- Add any character into any part of the string.
Problem Statement
Given two strings str1 and str2, and the task is to find minimum number operations required to convert string str1 into str2.
Edit Distance Problem Example
Below is the example of Edit Distance Problem with input- output constraint and the solution for the example using the Dynamic programming approach.
Input – Output Data for the Algorithm
- Str_1 : This contains the first string.
- Str_2 : This contains the second string.
- N : This contains the size of the first string.
- M : This contains the size of the second string.
- Solution : This is used to store the number of operations required.
Input and Output of the Example
Given two strings str1 = “Big” and str2 = “Bang” of size of N = 3 and N = 3 respectively, and the task is to find minimum number operations required to convert string str1 into str2.
Answer: 2
Solution of the Edit Distance Problem Example
Solution of the above example using the Dynamic programming approach.
Given data are
String 1 : | B | i | g |
---|
String 1 : | B | a | n | g |
---|
1.Create a empty table where First column represents the String 1 and First Row represents the String 2 with additional Value( empty value) in both.
String 1 \ String 2 | Φ | B | a | n | g |
---|---|---|---|---|---|
Φ | |||||
B | |||||
i | |||||
g |
2.Let us start filling the table untill one of the string is empty. We will compare “Big” to Φ and then “Bang” to Φ.
- To convert Φ to Φ, we need no operation so value is 0.
- To convert B to Φ, we need 1 operation of modify, so value is 1.
- To convert i to Φ, we need 2 operation of modify and insert, so value is 2.
- for g to Φ, value is 3.
- Similarly for Bang, values will be 0, 1, 2, 3, 4.
Updated table will be
String 1 \ String 2 | Φ | B | a | n | g |
---|---|---|---|---|---|
Φ | 0 | 1 | 2 | 3 | 4 |
B | 1 | ||||
i | 2 | ||||
g | 3 |
3.Now check the each character of String 1 with String 2. And update the table according to approach.
If Str_1[i-1] == Str_2[j-1]:
Table[i,j] = Table[i-1,j-1]
else:
Table[i,j] = 1 + min(Table[i-1][j-1], Table[i-1][j], Table[i][j-1])
String 1 \ String 2 | Φ | B | a | n | g |
---|---|---|---|---|---|
Φ | 0 | 1 | 2 | 3 | 4 |
B | 1 | 0 | 1 | 2 | 3 |
i | 2 | 1 | 1 | 2 | 3 |
g | 3 | 2 | 2 | 2 | 2 |
6.Return Table[-1][-1] for the answer.
Answer: 2
Pseudocode for Edit Distance Problem using Dynamic programming
Here’s Pseudocode for solving the Edit Distance Problem using Dynamic programming.
Edit_Distance_Problem(Str_1, Str_2):
// length of strings
N <- length of Str_1
M <- length of Str_2
//for each r = 0,...,M
Initialize Table[0,r] = 0
//for each j = 1,...,N
Initialize Table[j,0] = 0
//for every row
For j = 1,...,N:
//for every column
For r = 0,...,M:
If Str_1[j-1] == Str_2[r-1]:
Table[j,r] = Table[j-1,r-1]
else:
Table[j,r] = 1 + min(Table[j-1][r-1], Table[j-1][r],Table[j][r-1])
Return Table[N,M]
Implementation of Edit Distance Problem using Dynamic programming in Python
# Python program for edit distance problem using
# Dynamic Programming Approach
# Create a table to store results of subproblems
def table():
return [[0 for i in range(M + 1)] for j in range(N + 1)]
# function to find minimum number operations
#required to convert string Str_1 into Str_2.
def Edit_Distance_Problem(Str_1, Str_2, N, M):
# Fill d[][] in bottom up manner
for i in range(N + 1):
for j in range(M + 1):
if j == 0:
Table[i][j] = i
elif i == 0:
Table[i][j] = j
elif Str_1[i-1] == Str_2[j-1]:
Table[i][j] = Table[i-1][j-1]
else:
Table[i][j] = 1 + min(Table[i-1][j-1], Table[i-1][j],Table[i][j-1])
# returning the result
return Table[N][M]
# Driver program
# Given two strings
Str_1 = "Big"
Str_2 = "Bang"
# Sizes of the strings
N = len(Str_1)
M = len(Str_2)
# create the Table
Table = table()
# Function calling
Solution = Edit_Distance_Problem(Str_1, Str_2, N, M)
# printing the solution
print(Solution)