Java Program to Implement Aho Corasick Algorithm for Pattern Searching

This is a java program to implement Aho-Corasick Algorithm. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the “dictionary”) within an input text. It matches all patterns simultaneously. The complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches. Note that because all matches are found, there can be a quadratic number of matches if every substring matches (e.g. dictionary = a, aa, aaa, aaaa and input string is aaaa).

Here is the source code of the Java Program to Implement Aho-Corasick Algorithm for String Matching. The Java program is successfully compiled and run on a Windows system. The program output is also shown below.

  1.  
  2. package com.sanfoundry.setandstring;
  3.  
  4. import java.util.Arrays;
  5. import java.util.Scanner;
  6.  
  7. public class StringSearchUsingAhoCorasickAlgo
  8. {
  9.     static final int ALPHABET_SIZE = 26;
  10.     Node[]           nodes;
  11.     int              nodeCount;
  12.  
  13.     public static class Node
  14.     {
  15.         int     parent;
  16.         char    charFromParent;
  17.         int     suffLink    = -1;
  18.         int[]   children    = new int[ALPHABET_SIZE];
  19.         int[]   transitions = new int[ALPHABET_SIZE];
  20.         boolean leaf;
  21.         {
  22.             Arrays.fill(children, -1);
  23.             Arrays.fill(transitions, -1);
  24.         }
  25.     }
  26.  
  27.     public StringSearchUsingAhoCorasickAlgo(int maxNodes)
  28.     {
  29.         nodes = new Node[maxNodes];
  30.         // create root
  31.         nodes[0] = new Node();
  32.         nodes[0].suffLink = 0;
  33.         nodes[0].parent = -1;
  34.         nodeCount = 1;
  35.     }
  36.  
  37.     public void addString(String s)
  38.     {
  39.         int cur = 0;
  40.         for (char ch : s.toCharArray())
  41.         {
  42.             int c = ch - 'a';
  43.             if (nodes[cur].children[c] == -1)
  44.             {
  45.                 nodes[nodeCount] = new Node();
  46.                 nodes[nodeCount].parent = cur;
  47.                 nodes[nodeCount].charFromParent = ch;
  48.                 nodes[cur].children[c] = nodeCount++;
  49.             }
  50.             cur = nodes[cur].children[c];
  51.         }
  52.         nodes[cur].leaf = true;
  53.     }
  54.  
  55.     public int suffLink(int nodeIndex)
  56.     {
  57.         Node node = nodes[nodeIndex];
  58.         if (node.suffLink == -1)
  59.             node.suffLink = node.parent == 0 ? 0 : transition(
  60.                     suffLink(node.parent), node.charFromParent);
  61.         return node.suffLink;
  62.     }
  63.  
  64.     public int transition(int nodeIndex, char ch)
  65.     {
  66.         int c = ch - 'a';
  67.         Node node = nodes[nodeIndex];
  68.         if (node.transitions[c] == -1)
  69.             node.transitions[c] = node.children[c] != -1 ? node.children[c]
  70.                     : (nodeIndex == 0 ? 0 : transition(suffLink(nodeIndex), ch));
  71.         return node.transitions[c];
  72.     }
  73.  
  74.     // Usage example
  75.     public static void main(String[] args)
  76.     {
  77.         StringSearchUsingAhoCorasickAlgo ahoCorasick = new StringSearchUsingAhoCorasickAlgo(
  78.                 1000);
  79.         Scanner sc = new Scanner(System.in);
  80.         System.out.println("Enter the main string: ");
  81.         String main = sc.nextLine().toLowerCase().trim();
  82.         System.out.println("Enter the pattern string: ");
  83.         String pattern = sc.nextLine().toLowerCase().trim();
  84.         ahoCorasick.addString(pattern);
  85.         int node = 0;
  86.         for (char ch : main.toCharArray())
  87.         {
  88.             node = ahoCorasick.transition(node, ch);
  89.         }
  90.         if (ahoCorasick.nodes[node].leaf)
  91.             System.out.println("A '" + pattern + "' string is substring of "
  92.                     + main + " string.");
  93.         else
  94.             System.out.println("A '" + pattern
  95.                     + "' string is not substring of " + main + " string.");
  96.         sc.close();
  97.     }
  98. }

Output:

$ javac StringSearchUsingAhoCorasickAlgo.java
$ java StringSearchUsingAhoCorasickAlgo
 
Enter the main string: 
Sanfoundry
Enter the pattern string: 
Asanfoundry
A 'asanfoundry' string is not substring of sanfoundry string.

Sanfoundry Global Education & Learning Series – 1000 Java Programs.

advertisement
advertisement

Here’s the list of Best Books in Java Programming, Data Structures and Algorithms.

If you find any mistake above, kindly email to [email protected]

advertisement
advertisement
Subscribe to our Newsletters (Subject-wise). Participate in the Sanfoundry Certification contest to get free Certificate of Merit. Join our social networks below and stay updated with latest contests, videos, internships and jobs!

Youtube | Telegram | LinkedIn | Instagram | Facebook | Twitter | Pinterest
Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Alogrithms. Stay connected with him at LinkedIn.

Subscribe to his free Masterclasses at Youtube & discussions at Telegram SanfoundryClasses.