Theory of Computation – Context Free Grammars (CFG) and Context Free Languages (CFL)

In this tutorial we will introduce context free grammars and context free language. Context free languages have great practical significance. CFLs are used by the compiler in the parsing phase as they define the syntax of a programming language and are used in many editors.

There are four important components in a grammatical description of a language:

  1. There is a set of symbols that form the strings of the language being defined. They are called terminal symbols, represented by Vt.
  2. There is a finite set of variables, called non-terminals. These are represented by Vn.
  3. One of the variable represents the language being defined; it is called the start symbol. It is represent by S.
  4. There are finite set of rules called productions that represent the recursive definition of a language. Each production consists of:
    1. A variable that is being defined by the production. This variable is often called the head of the production.
    2. The production symbol ->.
    3. A string of zero or more terminals and variable.

Formal definition


A context-free grammar (CFG) is a 4-tuple G=(Vn, Vt, S, P), where Vn and Vt are disjoint finite sets, S is an element of Vn, and P is a finite set of formulas of the form A -> α, where A ϵ Vn and α ϵ (Vn U Vt)*.

Iterative Derivation

If α -> β is a production of P in CFG and a and b are strings in (Vn, Vt)*, then
Aαb =>G aβb
We say that the production α -> β is applied to the string aαb to obtain aβb or we say that aαb directly drives aβb.
Now suppose α1, α2, α3, …., αm are string in (Vn U Vt)*,
m >= 1 and α1 =>G α2, α2 =>G α3, …., αm-1 =>Gαm
Then we say that α1 =>G* αm, i.e. , we say α1, derives αm in grammar G. If α derives β by exactly i steps, we say α=>Gi β.

The Language Generated by a CFG


Let G = (Vn, Vt, S, P) be a CFG. The language generated by G is
L(G) = {x ϵ Σ* | S =>G*x}
A language L is a context-free language(CFL) if there is a CFG G so that L = L(G).

Bacos Naur Form (BNF)

CFGs are generally written in BNF. In this notation we combine multiple productions with same non-terminal on the left hand side as follows:
S -> bA/aB

Note: We always begin with the start symbol S.


Example:Write a CFG for the language
L = {wcwr | w ϵ (a, b)*}

Let G be a CFG for the language L. G = (Vn, Vt, P, S)
Here, Vn = {S}
Vt = {a, b, c}
And P is given by
S -> aSa
S -> bSb
S -> c
Let us check that abbcbba can be derived from the given CFG.
S =>aSa (use the production S -> aSa)
=> abSba (use S -> bSb)
=> abbSbba (use S ->bSb)
=> abbcbba (use S->c)
So string abbcbba can be derived from given CFG.

Eample 2: Write a CFG for the regular expression r = 0*1(0+1)*

To generate a grammar we divide the regular expression into 3 parts 0*, 1, (0 + 1)* represented by 3 non-terminal symbols.
Let the CFG be G = (Vn, Vt, P, S)
Here, Vn = {S, A, B}
Vt = {0, 1}
And productions P are defined as
S -> A1B
S -> 0A/ϵ
S -> 0B/1B/ϵ
Let us test the grammar against an example : 001011
S => A1B
=> 0A10B
=> 00A101B
=> 001011

Question 1: Write a CFG, which generates palindrome for binary numbers.

The grammar will generate palindrome for binary numbers, that is 00, 010, 001100, …
Let G be the CFG G = (Vn, Vt, P, S)
Vn = {S}
Vt = {0, 1}
And production rule P is defined as
S -> 0S0/ 1S1
S -> 0/1/ϵ

Question 2: Write a CFG for language L = {0i1j2k | k <= i or k <= j}

We can think of L as union of two language,
L = L1 U L2
Where L1 = {0i1j2k | k <= i} And L1 = {0i1j2k | k <= j} Let us assume that CFG for L1 is G1
G1 = (Vn, Vt, P1, S1)
Vn = {S1, X, Y}
Vt = {0, 1}
And P1 is defined as
S1 -> S12/0Y2
Y -> C/0Y/ϵ
C -> 1C/ϵ
Let us assume that CFG for L2 is G2
G2 = (Vn, Vt, P2, S2)
Vn = {S2, A, B}
Vt = {0, 1}
And P2 is defined as
S2 -> AB
A -> 0A/ϵ
B -> 1B2/1B/ϵ
We can define CFG for L(since L = L1 U L2)
Let us assume that CFG for L is G
G = (Vn, Vt, P , S)
Vn = { S1, S2, X, Y, A, B}
Vt = {0, 1}
P is defined as
S -> S1/S2
S1 -> S12/0Y2
Y -> C/0Y/ϵ
C -> 1C/ϵ
S2 -> AB
A -> 0A/ϵ
B -> 1B2/1B/ϵ

Sanfoundry Global Education & Learning Series – 100+ Theory of Computation Tutorials.

If you wish to look at all Tutorials and their examples, go to Theory of Computation Tutorials.

Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He is Linux Kernel Developer & SAN Architect and is passionate about competency developments in these areas. He lives in Bangalore and delivers focused training sessions to IT professionals in Linux Kernel, Linux Debugging, Linux Device Drivers, Linux Networking, Linux Storage, Advanced C Programming, SAN Storage Technologies, SCSI Internals & Storage Protocols such as iSCSI & Fiber Channel. Stay connected with him @ LinkedIn