Theory of Computation – Regular Expressions and Regular Languages

Regular languages are languages that can be generated from one-element languages by applying certain standard operations a finite number of times. They are the languages that can be recognized by finite automata. These simple operations include concatenation, union and kleen closure. By the use of these operations regular languages can be represented by an explicit formula.

Regular expressions can be thought of as the algebraic description of a regular language. Regular expression can be defined by the following rules:

  1. Every letter of the alphabet ∑ is a regular expression.
  2. Null string є and empty set Φ are regular expressions.
  3. If r1 and r2 are regular expressions, then
    (i) r1, r2
    (ii) r1r2 ( concatenation of r1r2 )
    (iii) r1 + r2 ( union of r1 and r2 )
    (iv) r1*, r2* ( kleen closure of r1 and r2 )
    are also regular expressions

  4. If a string can be derived from the rules 1, 2 and 3 then it is also a regular expression.

Note that a* means zero or more occurrence of a in the string while a+ means that one or more occurrence of a in the string. That means a* denotes language L = {є , a, aa, aaa, ….} and a+ represents language L = {a, aa, aaa, ….}. And also note that there can be more than one regular expression for a given set of strings.


Order for precedence for the operations is: kleen > concatenation > union. This rule allows us to lessen the use of parentheses while writing the regular expression. For example a + b*c is the simplified form of (a + ((b)*c)). Note that (a + b)* is not the same as a + b*, a + b* is (a + (b)*).

Question 1: Write a regular expression for a set of strings of 0s and 1s with even number of 0s.
View Answer

The set of strings would be 00, 001, 0011, 1001, 1010, …. The regular expression for the above set of strings can be written as (((00)*1*) + (01*0))*.

For a regular expression r, we denote the language it represents as L(r).

Sanfoundry Certification Contest of the Month is Live. 100+ Subjects. Participate Now!

Some rules applicable on regular languages are as follows:
For two regular expressions r1 and r2

  1. r1 + r2 is a regular expression denoting union of L(r1) and L(r2). That is
    L(r1 + r2) = L(r1) U L(r2)

  2. r1r2 is a regular expression denoting the concatenation of L(r1) and L(r2). That is
    L(r1r2) = L(r1) . L(r2)

  3. r* is a regular expression denoting the closure of L(r). That is
    L(r*) = L(r)*

These rules will be used to create regular expressions from a given languages.
For example consider the language L = {an bm : (n + m) is even}. For writing the regular expression we will have to consider two cases
Case 1: n and m both are even
Case 2: n and m are both odd.
Let the regular expression for case 1 be r1, then
r1 = (aa)* (bb)* (as even numbers can be represented as 2n)
Let the regular expression for case 2 be r1, then
r2 = (aa)* a (bb)* b (as odd numbers can be represented as 2n + 1)
Let the regular expression for the language L be r, then
r = r1 + r1 (as it will be the union of both the cases described above)
r = ((aa)* (bb)*) + (aa)* a (bb)* b)) is the regular expression for the given language.

Question 2: Write a regular expression for the language containing odd number of 1s, ∑ = {0,1}.
View Answer

The language will contain at least one 1. It may contain any number of 0s anywhere in the string. So the language we have to write a regular expression for is 1, 01, 01101, 0111, 111, . . . . This language can be represented by the following regular expression

Question 3: Write a regular expression for the language of C identifiers.
View Answer

The language of C identifiers can contain letters, digit and underscore (_). The first character has be the letter or the underscore. The literal cannot begin with a digit. Suppose l denotes the regular expression of letters, i.e
l = a + b + …. Z + A + B + … + Z
and let d denote regular expression for the digit, then
d = 0 + 1 + … + 9
The language for the identifier contains letter or underscore followed by any number of letters, underscores or digits. The language can be represented by the following regular expression
(l + _)(l + _ + d)*

Question 4: Write a regular expression for the language
L = {abnw: n >= 3, w є (a + b)+}
View Answer

The strings in the language begins with a followed by three bs and followed by string w. w will contain at least one a or b. The strings are like abbba, abbbb, abbbbababab, abbbaaaa, . . . This language can be represented by the following regular expression
ab3b*(a + b)+

Programming regular expressions
Regular expressions are used to implement fast searching and matching operations. To program a regular expression in C++ we can use the header file “regex”. To access the functions of this header file go to compiler options of the development kit and write -std=c++11 or write -std=gnu++11. In this tutorial we will only cover the theoretical part of regular expression so only one sample program is given. Search the internet for more functions and examples of the regex library.

Here is source code of the C++ Program for matching using regular expressions. The C++ program is successfully compiled and run on a Linux system. The program output is also shown below.

  1. // regex_match example
  2. #include <iostream>
  3. #include <string>
  4. #include <regex>
  5. using namespace std;
  7. int main() {
  8.     string s ("regular expressions");
  9.     regex e ("(regular)(.*)");
  10.     if (regex_match(s, e))
  11.         cout << "string object matched\n";
  12.     if (regex_match( s.begin(),  s.end(),  e ))
  13.         cout << "range matched\n";
  14.     smatch sm;
  15.     regex_match(s, sm, e);
  16.     cout << "string object with " << sm.size() << " matches\n";
  17.     regex_match(s.cbegin(),  s.cend(),  sm,  e);
  18.     cout << "range with " << sm.size() << " matches\n";
  19.     cout << "the matches were: ";
  20.     for (unsigned i = 0; i < sm.size();  ++i)
  21.         cout << "[" << sm[i] << "] ";
  22.     cout << endl;
  23.     return 0;
  24. }

string object matched
range matched
string object with 3 matches
range with 3 matches
the matches were: [regular expressions] [regular] [r expressions]

Sanfoundry Global Education & Learning Series – 100+ Theory of Computation Tutorials.


If you wish to look at all Tutorials and their examples, go to Theory of Computation Tutorials.

Subscribe to our Newsletters (Subject-wise). Participate in the Sanfoundry Certification contest to get free Certificate of Merit. Join our social networks below and stay updated with latest contests, videos, internships and jobs!

Youtube | Telegram | LinkedIn | Instagram | Facebook | Twitter | Pinterest
Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Alogrithms. Stay connected with him at LinkedIn.

Subscribe to his free Masterclasses at Youtube & technical discussions at Telegram SanfoundryClasses.