Regular expressions can be thought of as the algebraic description of a regular language. Regular expression can be defined by the following rules:
- Every letter of the alphabet ∑ is a regular expression.
- Null string є and empty set Φ are regular expressions.
- If r1 and r2 are regular expressions, then
(i) r1, r2
(ii) r1r2 ( concatenation of r1r2 )
(iii) r1 + r2 ( union of r1 and r2 )
(iv) r1*, r2* ( kleen closure of r1 and r2 )
are also regular expressions - If a string can be derived from the rules 1, 2 and 3 then it is also a regular expression.
Note that a* means zero or more occurrence of a in the string while a+ means that one or more occurrence of a in the string. That means a* denotes language L = {є , a, aa, aaa, ….} and a+ represents language L = {a, aa, aaa, ….}. And also note that there can be more than one regular expression for a given set of strings.
Order for precedence for the operations is: kleen > concatenation > union. This rule allows us to lessen the use of parentheses while writing the regular expression. For example a + b*c is the simplified form of (a + ((b)*c)). Note that (a + b)* is not the same as a + b*, a + b* is (a + (b)*).
Question 1: Write a regular expression for a set of strings of 0s and 1s with even number of 0s.
View Answer
For a regular expression r, we denote the language it represents as L(r).
Some rules applicable on regular languages are as follows:
For two regular expressions r1 and r2
- r1 + r2 is a regular expression denoting union of L(r1) and L(r2). That is
L(r1 + r2) = L(r1) U L(r2) - r1r2 is a regular expression denoting the concatenation of L(r1) and L(r2). That is
L(r1r2) = L(r1) . L(r2) - r* is a regular expression denoting the closure of L(r). That is
L(r*) = L(r)*
These rules will be used to create regular expressions from a given languages.
For example consider the language L = {an bm : (n + m) is even}. For writing the regular expression we will have to consider two cases
Case 1: n and m both are even
Case 2: n and m are both odd.
Let the regular expression for case 1 be r1, then
r1 = (aa)* (bb)* (as even numbers can be represented as 2n)
Let the regular expression for case 2 be r1, then
r2 = (aa)* a (bb)* b (as odd numbers can be represented as 2n + 1)
Let the regular expression for the language L be r, then
r = r1 + r1 (as it will be the union of both the cases described above)
r = ((aa)* (bb)*) + (aa)* a (bb)* b)) is the regular expression for the given language.
Question 2: Write a regular expression for the language containing odd number of 1s, ∑ = {0,1}.
View Answer
0*(10*10*)10*
Question 3: Write a regular expression for the language of C identifiers.
View Answer
l = a + b + …. Z + A + B + … + Z
and let d denote regular expression for the digit, then
d = 0 + 1 + … + 9
The language for the identifier contains letter or underscore followed by any number of letters, underscores or digits. The language can be represented by the following regular expression
(l + _)(l + _ + d)*
Question 4: Write a regular expression for the language
L = {abnw: n >= 3, w є (a + b)+}
View Answer
ab3b*(a + b)+
Programming regular expressions
Regular expressions are used to implement fast searching and matching operations. To program a regular expression in C++ we can use the header file “regex”. To access the functions of this header file go to compiler options of the development kit and write -std=c++11 or write -std=gnu++11. In this tutorial we will only cover the theoretical part of regular expression so only one sample program is given. Search the internet for more functions and examples of the regex library.
Here is source code of the C++ Program for matching using regular expressions. The C++ program is successfully compiled and run on a Linux system. The program output is also shown below.
// regex_match example
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
string s ("regular expressions");
regex e ("(regular)(.*)");
if (regex_match(s, e))
cout << "string object matched\n";
if (regex_match( s.begin(), s.end(), e ))
cout << "range matched\n";
smatch sm;
regex_match(s, sm, e);
cout << "string object with " << sm.size() << " matches\n";
regex_match(s.cbegin(), s.cend(), sm, e);
cout << "range with " << sm.size() << " matches\n";
cout << "the matches were: ";
for (unsigned i = 0; i < sm.size(); ++i)
cout << "[" << sm[i] << "] ";
cout << endl;
return 0;
}
Output: string object matched range matched string object with 3 matches range with 3 matches the matches were: [regular expressions] [regular] [r expressions]
Sanfoundry Global Education & Learning Series – 100+ Theory of Computation Tutorials.
If you wish to look at all Tutorials and their examples, go to Theory of Computation Tutorials.