Java Program to Implement Levenshtein Distance Computing Algorithm

This is a java program to implement Levenshtein Distance Computing Algorithm. In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of macromolecules such as DNA, which can be viewed as strings of the letters A, C, G and T. Several definitions of edit distance exist, using different sets of string operations. One of the most common variants is called Levenshtein distance.

Here is the source code of the Java Program to Implement Levenshtein Distance Computing Algorithm. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.

  1.  
  2. package com.sanfoundry.setandstring;
  3.  
  4. import java.util.ArrayList;
  5. import java.util.Collections;
  6. import java.util.List;
  7. import java.util.Scanner;
  8.  
  9. public class ApproxStringMatchingUsingLevenshteinDistance
  10. {
  11.     public static int distance(String a, String b)
  12.     {
  13.         a = a.toLowerCase();
  14.         b = b.toLowerCase();
  15.         int[] costs = new int[b.length() + 1];
  16.         for (int j = 0; j < costs.length; j++)
  17.             costs[j] = j;
  18.         for (int i = 1; i <= a.length(); i++)
  19.         {
  20.             costs[0] = i;
  21.             int nw = i - 1;
  22.             for (int j = 1; j <= b.length(); j++)
  23.             {
  24.                 int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
  25.                         a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
  26.                 nw = costs[j];
  27.                 costs[j] = cj;
  28.             }
  29.         }
  30.         return costs[b.length()];
  31.     }
  32.  
  33.     public static void main(String[] args)
  34.     {
  35.         Scanner sc = new Scanner(System.in);
  36.         String text = "In computer science, approximate string matching, "
  37.                 + "often colloquially referred to as fuzzy string searching is the technique of finding strings that match a pattern approximately rather than exactly. "
  38.                 + "The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding "
  39.                 + "dictionary strings that match the pattern approximately.";
  40.         System.out.println("System genetated string is: \n'" + text + "'");
  41.         System.out.println("Enter the keyword to search: ");
  42.         String keyword = sc.nextLine();
  43.         String[] data = text.split(" ");
  44.         List<Integer> dist = new ArrayList<Integer>();
  45.         for (int i = 0; i < data.length; i++)
  46.         {
  47.             dist.add(distance(data[i], keyword));
  48.         }
  49.         Collections.sort(dist);
  50.         System.out.print("Did you mean: ");
  51.         for (int i = 0; i < data.length; i++)
  52.         {
  53.             if (distance(data[i], keyword) == dist.get(0))
  54.             {
  55.                 System.out.print(data[i] + " ");
  56.             }
  57.         }
  58.         sc.close();
  59.     }
  60. }

Output:

$ javac ApproxStringMatchingUsingLevenshteinDistance.java
$ java ApproxStringMatchingUsingLevenshteinDistance
 
System genetated string is: 
'In computer science, approximate string matching, often colloquially referred to as fuzzy string searching is the technique of finding strings that match a pattern approximately rather than exactly. The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.'
Enter the keyword to search: 
cpmputer
Did you mean: computer

Sanfoundry Global Education & Learning Series – 1000 Java Programs.

advertisement
advertisement

Here’s the list of Best Books in Java Programming, Data Structures and Algorithms.

If you find any mistake above, kindly email to [email protected]

advertisement
advertisement
Subscribe to our Newsletters (Subject-wise). Participate in the Sanfoundry Certification contest to get free Certificate of Merit. Join our social networks below and stay updated with latest contests, videos, internships and jobs!

Youtube | Telegram | LinkedIn | Instagram | Facebook | Twitter | Pinterest
Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Alogrithms. Stay connected with him at LinkedIn.

Subscribe to his free Masterclasses at Youtube & discussions at Telegram SanfoundryClasses.