Java Program to Solve Approximate String Matching using Dynamic Programming

This is a java program to solve approximate string matching using dynamic programming.

Here is the source code of the Java Program to Use Dynamic Programming to Solve Approximate String Matching. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.

  1.  
  2. package com.sanfoundry.setandstring;
  3.  
  4. import java.util.ArrayList;
  5. import java.util.Collections;
  6. import java.util.HashMap;
  7. import java.util.List;
  8. import java.util.Map;
  9. import java.util.Scanner;
  10.  
  11. public class ApproxStringMatching
  12. {
  13.     private List<String>        foods          = new ArrayList<String>();
  14.     private Map<String, Double> matchingScores = new HashMap<String, Double>();
  15.     private Scanner             scanner;
  16.     private static double[][]   mismatchScoreTable;
  17.     private String              in;
  18.     private int                 inLength;
  19.  
  20.     public ApproxStringMatching(String text)
  21.     {
  22.         /*
  23.          * read the file, fill the food list
  24.          */
  25.         try
  26.         {
  27.             scanner = new Scanner(text);
  28.             while (scanner.hasNext())
  29.             {
  30.                 this.foods.add(scanner.nextLine());
  31.             }
  32.         }
  33.         catch (Exception e)
  34.         {
  35.             e.printStackTrace();
  36.             System.exit(1);
  37.         }
  38.         if (mismatchScoreTable == null)
  39.             initMismatchScoreTable();
  40.     }
  41.  
  42.     public List<String> getFoods()
  43.     {
  44.         return this.foods;
  45.     }
  46.  
  47.     private static void initMismatchScoreTable()
  48.     {
  49.         mismatchScoreTable = new double[256][256];
  50.         /*
  51.          * Score any combination of two characters as 1 by default.
  52.          */
  53.         for (int i = 0; i < 256; i++)
  54.             for (int j = 0; j < 256; j++)
  55.                 mismatchScoreTable[i][j] = 1.0d;
  56.         /*
  57.          * If the input charater and reference character are the same,
  58.          * there is no typo. So the error score is 0.
  59.          */
  60.         for (int i = 0; i < 256; i++)
  61.             mismatchScoreTable[i][i] = 0.0d;
  62.         /*
  63.          * For people who use both German keyboard and English keyboard,
  64.          * this typo is highly frequent.
  65.          */
  66.         mismatchScoreTable['y']['z'] = 0.1d;
  67.         mismatchScoreTable['z']['y'] = 0.1d;
  68.         mismatchScoreTable['v']['b'] = 0.15d;
  69.         mismatchScoreTable['b']['v'] = 0.15d;
  70.         mismatchScoreTable['n']['m'] = 0.11d;
  71.         mismatchScoreTable['m']['n'] = 0.11d;
  72.         mismatchScoreTable['t']['r'] = 0.15d;
  73.         mismatchScoreTable['r']['t'] = 0.15d;
  74.         mismatchScoreTable['g']['h'] = 0.15d;
  75.         mismatchScoreTable['h']['g'] = 0.15d;
  76.         mismatchScoreTable['y']['u'] = 0.15d;
  77.         mismatchScoreTable['u']['y'] = 0.15d;
  78.         /*
  79.          * more typo possibilities can be inserted here....
  80.          */
  81.     }
  82.  
  83.     public Map<String, Double> getScores(String in)
  84.     {
  85.         this.in = in;
  86.         this.inLength = in.length();
  87.         for (String food : this.foods)
  88.         {
  89.             int refLength = food.length();
  90.             double[][] errScore = new double[inLength + 1][refLength + 1];
  91.             errScore[0][0] = 0.0d;
  92.             for (int inCharAt = 1; inCharAt <= this.inLength; inCharAt++)
  93.                 errScore[inCharAt][0] = inCharAt;
  94.             for (int refCharAt = 1; refCharAt <= refLength; refCharAt++)
  95.                 errScore[0][refCharAt] = refCharAt;
  96.             for (int inCharAt = 1; inCharAt <= inLength; inCharAt++)
  97.                 for (int refCharAt = 1; refCharAt <= refLength; refCharAt++)
  98.                 {
  99.                     /*
  100.                      * if a character is absent at the given position
  101.                      * in the input string, we add score 1.
  102.                      */
  103.                     double charAbsence = errScore[inCharAt - 1][refCharAt] + 1;
  104.                     /*
  105.                      * if a character is redundant at the given position in the
  106.                      * input string, we add score 1.
  107.                      */
  108.                     double charRedundance = errScore[inCharAt][refCharAt - 1] + 1;
  109.                     /*
  110.                      * if it is a matching error, we add the score specified in
  111.                      * the score table for matching errors.
  112.                      */
  113.                     double mismatch = errScore[inCharAt - 1][refCharAt - 1]
  114.                             + mismatchScoreTable[this.in.charAt(inCharAt - 1)][food
  115.                                     .charAt(refCharAt - 1)];
  116.                     /*
  117.                      * initialize the score for swap error to a very big value.
  118.                      */
  119.                     double charPositionSwap = 999999d;
  120.                     /*
  121.                      * score for swap error
  122.                      */
  123.                     if (inCharAt > 1
  124.                             && refCharAt > 1
  125.                             && this.in.charAt(inCharAt - 1) == food
  126.                                     .charAt(refCharAt - 2)
  127.                             && this.in.charAt(inCharAt - 2) == food
  128.                                     .charAt(refCharAt - 1))
  129.                     {
  130.                         /*
  131.                          * the score for typing "ie" as "ei" and vice versa
  132.                          * is even lower
  133.                          */
  134.                         if (this.in.charAt(inCharAt - 2) == 'e'
  135.                                 && this.in.charAt(inCharAt - 1) == 'i')
  136.                         {
  137.                             charPositionSwap = errScore[inCharAt - 2][refCharAt - 2] + 0.25;
  138.                         }
  139.                         /*
  140.                          * more cases can be inserted here.
  141.                          */
  142.                         else
  143.                             charPositionSwap = errScore[inCharAt - 2][refCharAt - 2] + 0.5;
  144.                     }
  145.                     /*
  146.                      * more error cases can be inserted here.
  147.                      */
  148.                     double minScore = mismatch;
  149.                     if (charAbsence < minScore)
  150.                     {
  151.                         minScore = charAbsence;
  152.                     }
  153.                     if (charRedundance < minScore)
  154.                     {
  155.                         minScore = charRedundance;
  156.                     }
  157.                     if (charPositionSwap < minScore)
  158.                     {
  159.                         minScore = charPositionSwap;
  160.                     }
  161.                     errScore[inCharAt][refCharAt] = minScore;
  162.                 }
  163.             this.matchingScores.put(food, errScore[this.inLength][refLength]);
  164.         }
  165.         return this.matchingScores;
  166.     }
  167.  
  168.     @SuppressWarnings({ "unchecked", "rawtypes" })
  169.     public static void main(String[] args)
  170.     {
  171.         String text = "In computer science, approximate string matching "
  172.                 + "(often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). "
  173.                 + "The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding "
  174.                 + "dictionary strings that match the pattern approximately.";
  175.         Scanner sc = new Scanner(System.in);
  176.         ApproxStringMatching demo = new ApproxStringMatching(text);
  177.         System.out.print("Please type a word. Type q for exit: ");
  178.         sc.nextLine();
  179.         while (sc.hasNext())
  180.         {
  181.             String in = sc.nextLine();
  182.             if (in.equals("q"))
  183.             {
  184.                 System.exit(0);
  185.             }
  186.             System.out.println("You typed " + in);
  187.             System.out.println("--------------------------------------------");
  188.             Map scoreMap = demo.getScores(in);
  189.             for (String food : demo.getFoods())
  190.             {
  191.                 System.out.println(food + "\t error score: "
  192.                         + scoreMap.get(food));
  193.             }
  194.             System.out.println("--------------------------------------------");
  195.             double minScore = (Double) Collections.min(scoreMap.values());
  196.             if (minScore == 0.0d)
  197.             {
  198.                 System.out.println(in + " is in the list.");
  199.             }
  200.             else
  201.             {
  202.                 List<String> corrections = new ArrayList<String>();
  203.                 StringBuffer sb = new StringBuffer("Do you mean:- ");
  204.                 for (String food : demo.getFoods())
  205.                 {
  206.                     if (scoreMap.get(food).equals(minScore))
  207.                     {
  208.                         corrections.add(food);
  209.                         sb.append(food).append(" or ");
  210.                     }
  211.                 }
  212.                 sb.append("?");
  213.                 System.out.println(sb.toString());
  214.             }
  215.             System.out.println("Please type a word. Type q for exit: ");
  216.         }
  217.         sc.close();
  218.     }
  219. }

Output:

$ javac ApproxStringMatching.java
$ java ApproxStringMatching
 
Please type a word. Type q for exit: String 
You typed String
--------------------------------------------
In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.	 error score: 417.0
--------------------------------------------
Do you mean:- In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately. or ?
Please type a word. Type q for exit: Matching
You typed Matching
--------------------------------------------
In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately.	 error score: 410.0
--------------------------------------------
Do you mean:- In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately. or ?
Please type a word. Type q for exit:

Sanfoundry Global Education & Learning Series – 1000 Java Programs.

advertisement
advertisement

Here’s the list of Best Books in Java Programming, Data Structures and Algorithms.

If you find any mistake above, kindly email to [email protected]

advertisement
advertisement
Subscribe to our Newsletters (Subject-wise). Participate in the Sanfoundry Certification contest to get free Certificate of Merit. Join our social networks below and stay updated with latest contests, videos, internships and jobs!

Youtube | Telegram | LinkedIn | Instagram | Facebook | Twitter | Pinterest
Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Alogrithms. Stay connected with him at LinkedIn.

Subscribe to his free Masterclasses at Youtube & discussions at Telegram SanfoundryClasses.