C Program to Implement Boyer-Moore Algorithm for Pattern Searching

«
»
This is a C Program to implement Boyer-Moore algorithm. The Boyer-Moore algorithm is considered as the most efficient string-matching algorithm in usual applications. A simplified version of it or the entire algorithm is often implemented in text editors for the search and substitute commands.

The algorithm scans the characters of the pattern from right to left beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called matching shift and the bad-character shift (also called the occurrence shift).

Assume that a mismatch occurs between the character x[i]=a of the pattern and the character y[i+j]=b of the text during an attempt at position j.
Then, x[i+1 .. m-1]=y[i+j+1 .. j+m-1]=u and x[i] != y[i+j]. The good-suffix shift consists in aligning the segment y[i+j+1 .. j+m-1]=x[i+1 .. m-1] with its rightmost occurrence in x that is preceded by a character different from x[i]

Here is source code of the C Program to Implement Boyer-Moore Algorithm for String Matching. The C program is successfully compiled and run on a Linux system. The program output is also shown below.

advertisement
  1. # include <limits.h>
  2. # include <string.h>
  3. # include <stdio.h>
  4.  
  5. # define NO_OF_CHARS 256
  6.  
  7. // A utility function to get maximum of two integers
  8. int max(int a, int b) {
  9.     return (a > b) ? a : b;
  10. }
  11.  
  12. // The preprocessing function for Boyer Moore's bad character heuristic
  13. void badCharHeuristic(char *str, int size, int badchar[NO_OF_CHARS]) {
  14.     int i;
  15.  
  16.     // Initialize all occurrences as -1
  17.     for (i = 0; i < NO_OF_CHARS; i++)
  18.         badchar[i] = -1;
  19.  
  20.     // Fill the actual value of last occurrence of a character
  21.     for (i = 0; i < size; i++)
  22.         badchar[(int) str[i]] = i;
  23. }
  24.  
  25. void search(char *txt, char *pat) {
  26.     int m = strlen(pat);
  27.     int n = strlen(txt);
  28.  
  29.     int badchar[NO_OF_CHARS];
  30.  
  31.     badCharHeuristic(pat, m, badchar);
  32.  
  33.     int s = 0; // s is shift of the pattern with respect to text
  34.     while (s <= (n - m)) {
  35.         int j = m - 1;
  36.  
  37.         while (j >= 0 && pat[j] == txt[s + j])
  38.             j--;
  39.  
  40.         if (j < 0) {
  41.             printf("\n pattern occurs at shift = %d", s);
  42.  
  43.             s += (s + m < n) ? m - badchar[txt[s + m]] : 1;
  44.  
  45.         }
  46.  
  47.         else
  48.             s += max(1, j - badchar[txt[s + j]]);
  49.     }
  50. }
  51.  
  52. int main() {
  53.     char txt[] = "ABAAABCD";
  54.     char pat[] = "ABC";
  55.     search(txt, pat);
  56.     return 0;
  57. }

Output:

$ gcc Boyer-Moore.c
$ ./a.out
 
pattern occurs at shift = 4

Sanfoundry Global Education & Learning Series – 1000 C Programs.

Sanfoundry Certification Contest of the Month is Live. 100+ Subjects. Participate Now!
advertisement
advertisement

Here’s the list of Best Books in C Programming, Data Structures and Algorithms.

advertisement
advertisement
Subscribe to our Newsletters (Subject-wise). Participate in the Sanfoundry Certification contest to get free Certificate of Merit. Join our social networks below and stay updated with latest contests, videos, internships and jobs!

Youtube | Telegram | LinkedIn | Instagram | Facebook | Twitter | Pinterest
Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Alogrithms. Stay connected with him at LinkedIn.

Subscribe to his free Masterclasses at Youtube & technical discussions at Telegram SanfoundryClasses.