C++ Program to Implement Bitap Algorithm for String Matching

The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is a string matching algorithm that is often used for approximate matching. In its approximate form, the algorithm tells whether a given text contains a substring that is “approximately equal” to a given pattern, where approximate equality is defined in terms of Levenshtein distance: if the substring and the pattern are within a given distance k of each other, the algorithm considers them equal. The program on this page implements the exact-matching case (k = 0) and reports the position of the first occurrence of the pattern. The algorithm begins by precomputing a set of bitmasks, one per character of the alphabet, each containing one bit for every position of the pattern. It can then do most of the work with bitwise operations: as long as the pattern fits in a machine word (at most 63 characters in this program), each text character is handled with a constant number of shift and OR operations.
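The mask precomputation described above can be sketched in isolation. The helper name build_masks is hypothetical and introduced only for illustration; it uses the shift-or convention, in which a 0 bit marks a matching pattern position, and assumes a pattern of at most 63 characters.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, introduced for illustration only.
// Fills masks[c] for every byte value c so that bit i of masks[c] is 0
// exactly when pattern[i] == c, and 1 otherwise (shift-or convention).
// Assumes pattern.size() <= 63 so every position fits in a 64-bit word.
void build_masks(const std::string &pattern, std::uint64_t masks[256])
{
    // Start with all bits set: no character matches any position yet.
    for (int c = 0; c < 256; ++c)
        masks[c] = ~std::uint64_t(0);

    // Clear bit i in the mask of the character found at pattern position i.
    // The cast avoids a negative array index when char is signed.
    for (std::size_t i = 0; i < pattern.size(); ++i)
        masks[static_cast<unsigned char>(pattern[i])] &= ~(std::uint64_t(1) << i);
}
```

For the pattern "abab", the low four bits of the mask for 'a' are 1010 (positions 0 and 2 cleared, position 0 being the rightmost bit), the low four bits of the mask for 'b' are 0101, and the mask for any character that does not occur in the pattern keeps all bits set.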

Here is the source code of the C++ program to implement the bitap algorithm for string matching. The program was compiled and run successfully on a Linux system, and its output is shown below.

#include <cstdint>
#include <iostream>
#include <string>

using namespace std;

/** Returns the index of the first exact occurrence of pattern in text, or -1 **/
int bitap_search(const string &text, const string &pattern)
{
    const size_t m = pattern.length();
    if (m == 0)
        return -1;
    if (m > 63)
    {
        cout << "Pattern is too long!";
        return -1;
    }

    /** Initialize the pattern bitmasks: bit i of pattern_mask[c] is 0 only when pattern[i] == c **/
    uint64_t pattern_mask[256];
    for (int i = 0; i <= 255; ++i)
        pattern_mask[i] = ~uint64_t(0);
    /** Index by unsigned char so that characters above 127 cannot give a negative index **/
    for (size_t i = 0; i < m; ++i)
        pattern_mask[static_cast<unsigned char>(pattern[i])] &= ~(uint64_t(1) << i);

    /** Initialize the bit array R: every bit set except bit 0 **/
    uint64_t R = ~uint64_t(1);
    for (size_t i = 0; i < text.length(); ++i)
    {
        /** Update the bit array **/
        R |= pattern_mask[static_cast<unsigned char>(text[i])];
        R <<= 1;
        /** Bit m is clear only when the last m text characters match the whole pattern **/
        if ((R & (uint64_t(1) << m)) == 0)
            return static_cast<int>(i + 1 - m);
    }
    return -1;
}

void findPattern(const string &t, const string &p)
{
    int pos = bitap_search(t, p);
    if (pos == -1)
        cout << "\nNo Match\n";
    else
        cout << "\nPattern found at position : " << pos;
}

int main()
{
    cout << "Bitap Algorithm Test\n";
    cout << "Enter Text\n";
    string text;
    cin >> text;
    cout << "Enter Pattern\n";
    string pattern;
    cin >> pattern;
    findPattern(text, pattern);
}
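
The listing above reports exact matches only. As a rough sketch of how the approximate matching mentioned in the introduction could be added, the function below keeps one bit array per allowed error count, in the style of the Wu-Manber extension of bitap. The name bitap_fuzzy_search and its parameter k are hypothetical and not part of the original program. Unlike the listing, the sketch uses the shift-and convention (a 1 bit marks a match), assumes 0 <= k < m <= 63, and returns the index at which the first approximate match ends, since the start of an approximate match is not uniquely defined.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical function, not part of the original program.
// Returns the index of the text character at which the first substring
// within Levenshtein distance k of the pattern ends, or -1 if there is none.
int bitap_fuzzy_search(const std::string &text, const std::string &pattern, int k)
{
    const std::size_t m = pattern.length();
    if (m == 0 || m > 63 || k < 0 || static_cast<std::size_t>(k) >= m)
        return -1;

    // Shift-and convention: bit j of mask[c] is 1 when pattern[j] == c.
    std::uint64_t mask[256] = {0};
    for (std::size_t j = 0; j < m; ++j)
        mask[static_cast<unsigned char>(pattern[j])] |= std::uint64_t(1) << j;

    // R[d] has bit j set when pattern[0..j] matches a suffix of the text read
    // so far with at most d errors. Before any text is read, a prefix of
    // length d matches the empty string by dropping its d characters.
    std::vector<std::uint64_t> R(k + 1);
    for (int d = 0; d <= k; ++d)
        R[d] = (std::uint64_t(1) << d) - 1;

    const std::uint64_t accept = std::uint64_t(1) << (m - 1);
    for (std::size_t i = 0; i < text.length(); ++i)
    {
        const std::uint64_t c = mask[static_cast<unsigned char>(text[i])];
        std::uint64_t prev_old = R[0];           // R[d-1] before this character
        R[0] = ((R[0] << 1) | 1) & c;            // exact extension only
        for (int d = 1; d <= k; ++d)
        {
            const std::uint64_t old = R[d];
            R[d] = (((old << 1) | 1) & c)        // characters match
                 | prev_old                      // extra character in the text
                 | ((prev_old << 1) | 1)         // substituted character
                 | ((R[d - 1] << 1) | 1);        // pattern character missing from the text
            prev_old = old;
        }
        if (R[k] & accept)
            return static_cast<int>(i);
    }
    return -1;
}
```

With k = 0 the sketch behaves like an exact search that reports the end of the match instead of its start. Each text character costs k + 1 word updates, so the running time grows linearly with the allowed distance.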

Output:

$ g++ BitapStringMatching.cpp
$ ./a.out
 
Bitap Algorithm Test
Enter Text
DharmendraHingu
Enter Pattern
Hingu
 
Pattern found at position : 10
------------------
(program exited with code: 0)
Press return to continue

Sanfoundry Global Education & Learning Series – 1000 C++ Programs.
