TOC Notes 2020


CS602 - Advanced Theory of Computation

Compiled by

Dr. Abdus Salam


MSc Computer Science, Quaid-e-Azam University, Islamabad (1987)
PhD Computer Science, International Islamic University Islamabad (2011)

Department of Computing
Abasyn University Peshawar
2020



Contents
1 Introduction
  1.1 Complexity Theory
  1.2 Computability Theory
  1.3 Automata Theory
  1.4 Theory of Computation
  1.5 Formal Definition of Computation
2 Preliminaries
  2.1 Classifying Formal Languages
  2.2 Sets
  2.3 Operations on Sets
  2.4 Additional Terminology
  2.5 Functions
  2.6 Graphs
  2.7 Trees
  2.8 Proof Techniques
  2.9 Fundamental Concepts
3 Type-3: Regular Languages
  3.1 Finite Automata (FA)
  3.2 Deterministic Finite Automata (DFA)
  3.3 Nondeterministic Finite Automata
  3.4 DFA = NFA
4 Grammars for Regular Languages
  4.1 Regular Grammars
  4.2 Closure I
  4.3 Closure II: Union, Concatenation, Negation, Kleene Star, Reverse
  4.4 Closure III: Intersection and Set Difference
  4.5 The Pumping Lemma
  4.6 Applying the Pumping Lemma
5 Type-2: Context-Free Languages
  5.1 Languages and Grammars
  5.2 Context-Free Grammars
  5.3 Derivations (Leftmost and Rightmost)
  5.4 Derivation Trees
  5.5 Simplifying Context-Free Grammars
  5.6 Normal Forms of Context-Free Grammars
  5.7 Chomsky’s Normal Form
  5.8 Parsing
6 Pushdown Automaton (PDA)
  6.1 Pushdown Automaton (PDA)
  6.2 Instantaneous Description
  6.3 Accepting Strings with PDA
  6.4 Graphical Representation
  6.5 Deterministic Pushdown Automata (DPDA)
  6.6 Nondeterministic PDAs
  6.7 CFL and PDA Equivalence
  6.8 From NPDA to CFG
7 Type-1: Context-Sensitive Languages
  7.1 Definitions of Context Sensitive Languages
  7.2 Linear Bounded Automata
8 Type-0: Recursively Enumerable Languages
  8.1 Turing Machine
  8.2 Formal Definition of Turing Machines
  8.3 Recursively Enumerable Languages
  8.4 Turing Machine as a Language Recognizer
9 Decidability
  9.1 Decision Problems
  9.2 The Church-Turing Thesis
  9.3 The Halting Problem for Turing Machines
10 Undecidability
  10.1 Problems That Computers Cannot Solve
  10.2 Programs that Print “Hello World"
  10.3 The Hypothetical “Hello World" Tester



1 Introduction
What are the fundamental capabilities and limitations of computers?

This question goes back to the 1930s when mathematical logicians first began to explore the
meaning of computation. Technological advances since that time have greatly increased our
ability to compute and have brought this question out of the realm of theory into the world of
practical concern.
In each of the three areas (automata, computability, and complexity) this question is interpreted
differently, and the answers vary according to the interpretation. Following this, we explore
each area separately. Here, we introduce these parts in reverse order because, starting from
the end, you can better understand the reason for the beginning.

Purpose of the Theory of Computation: Develop formal mathematical models of computation that reflect real-world computers.

1.1 Complexity Theory


Computer problems come in different varieties; some are easy, and some are hard. For
example, the sorting problem is an easy one. Say that you need to arrange a list of numbers in
ascending order. Even a small computer can sort a million numbers rather quickly. Compare
that to a scheduling problem. Say that you must find a schedule of classes for the entire
university to satisfy some reasonable constraints, such as that no two classes take place in the
same room at the same time. The scheduling problem seems to be much harder than the
sorting problem. If you have just a thousand classes, finding the best schedule may require
centuries, even with a supercomputer.

What makes some problems computationally hard and others easy?

Central Question in Complexity Theory: Classify problems according to their degree of “difficulty”. Give a rigorous proof that problems that seem to be “hard” really are “hard”.

This is the central question of complexity theory. Remarkably, we don't know the answer to it,
though it has been intensively researched for the past 35 years. In one of the important
achievements of complexity theory thus far, researchers have discovered an elegant scheme
for classifying problems according to their computational difficulty. It is analogous to the
periodic table for classifying elements according to their chemical properties. Using this
scheme, we can demonstrate a method for giving evidence that certain problems are
computationally hard, even if we are unable to prove that they are.
You have several options when you confront a problem that appears to be computationally
hard.



First, by understanding which aspect of the problem is at the root of the difficulty, you may
be able to alter it so that the problem is more easily solvable.
Second, you may be able to settle for less than a perfect solution to the problem. In certain
cases finding solutions that only approximate the perfect one is relatively easy.
Third, some problems are hard only in the worst case situation, but easy most of the time.
Depending on the application, you may be satisfied with a procedure that occasionally is slow
but usually runs quickly.
Finally, you may consider alternative types of computation, such as randomized computation,
that can speed up certain tasks.
One applied area that has been affected directly by complexity theory is the ancient field of
cryptography. In most fields, an easy computational problem is preferable to a hard one
because easy ones are cheaper to solve. Cryptography is unusual because it specifically
requires computational problems that are hard, rather than easy, because secret codes should
be hard to break without the secret key or password. Complexity theory has pointed
cryptographers in the direction of computationally hard problems around which they have
designed revolutionary new codes.

1.2 Computability Theory


During the first half of the twentieth century, mathematicians such as Kurt Gödel, Alan
Turing, and Alonzo Church discovered that certain basic problems cannot be solved by
computers.
One example of this phenomenon is the problem of determining whether a mathematical
statement is true or false. This task is the bread and butter of mathematicians. It seems like a
natural candidate for solution by computer because it lies strictly within the realm of mathematics. But
no computer algorithm can perform this task.
Among the consequences of this profound result was the development of ideas concerning
theoretical models of computers that eventually would help lead to the construction of actual
computers.

Central Question in Computability Theory: Classify problems as being solvable or unsolvable.

The theories of computability and complexity are closely related. In complexity theory, the
objective is to classify problems as easy ones and hard ones, whereas in computability theory
the classification of problems is by those that are solvable and those that are not.
Computability theory introduces several of the concepts used in complexity theory.

1.3 Automata Theory


Automata theory deals with the definitions and properties of mathematical models of
computation. These models play a role in several applied areas of computer science. One
model, called the finite automaton, is used in text processing, compilers, and hardware design.

Another model, called the context-free grammar, is used in programming languages and
artificial intelligence.



Automata theory is an excellent place to begin the study of the theory of computation. The
theories of computability and complexity require a precise definition of a computer. Automata
theory allows practice with formal definitions of computation as it introduces concepts
relevant to other non-theoretical areas of computer science.

Central Question in Automata Theory: Do these models have the same power, or
can one model solve more problems than the other?

1.4 Theory of Computation


The theory of computation begins with a question: What is a computer? It is perhaps a silly
question, as everyone knows that this thing I type on is a computer. But real computers are
quite complicated, too much so to allow us to set up a manageable mathematical theory of
them directly. Instead we use an idealized computer called a computational model. As with
any model in science, a computational model may be accurate in some ways but perhaps not
in others.
Thus we use several different computational models, depending on the features we want to
focus on. You are already familiar with the simplest model, called the finite state machine or
finite automaton, as defined in Definition 1.

Definition 1:
A finite automaton is a 5-tuple (Q, Σ, δ, q0, F), where
1. Q is a finite set called the states,
2. Σ is a finite set called the alphabet,
3. δ: Q × Σ → Q is the transition function,
4. q0 ∈ Q is the start state, and
5. F ⊆ Q is the set of accept states.

1.5 Formal Definition of Computation


Based on the formal definition of a finite automaton, we can define computation.
Let M = (Q, Σ, δ, q0, F) be a finite automaton and let w = w1w2 . . . wn be a string where each
wi is a member of the alphabet Σ. Then M accepts w if a sequence of states r0, r1, . . . , rn in Q
exists with three conditions:
1. r0 = q0,
2. δ(ri, wi+1) = ri+1, for i = 0, ..., n - 1, and
3. rn ∈ F.
Condition 1 says that the machine begins in the start state.
Condition 2 says that the machine goes from state to state according to the transition
function.
Condition 3 says that the machine accepts its input if it ends up in an accept state.
We say that M recognizes language A if A = {w | M accepts w}.
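The three conditions above translate directly into a short simulation loop. The following is a minimal sketch in Python; the example machine (which accepts binary strings ending in 1) and the helper name accepts are illustrative assumptions, not part of the formal definition.

    # A DFA modeled as a dictionary-based 5-tuple. The machine below is an
    # illustrative example: it accepts binary strings that end in 1.
    M = {
        "delta": {("q0", "0"): "q0", ("q0", "1"): "q1",
                  ("q1", "0"): "q0", ("q1", "1"): "q1"},
        "start": "q0",
        "accept": {"q1"},
    }

    def accepts(M, w):
        """Return True iff M accepts w, following conditions 1-3 above."""
        r = M["start"]                    # condition 1: r0 = q0
        for symbol in w:                  # condition 2: r_{i+1} = delta(r_i, w_{i+1})
            r = M["delta"][(r, symbol)]
        return r in M["accept"]           # condition 3: r_n is in F

    print(accepts(M, "0101"))  # True
    print(accepts(M, "0110"))  # False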



2 Preliminaries
2.1 Classifying Formal Languages

Chomsky’s classification of formal languages:

Table 1: Chomsky's Hierarchy

TYPE                                       GRAMMAR                             AUTOMATON
                                           (Language Generating Device)        (Language Recognition Device)
Type-3: Regular Languages                  Regular Grammar (RG)                Finite Automata (DFA/NFA/ε-NFA)
Type-2: Context-Free Languages             Context-Free Grammar (CFG)          Pushdown Automata (PDA)
Type-1: Context-Sensitive Languages        Context-Sensitive Grammar (CSG)     Linear Bounded Automata (LBA)
Type-0: Recursively Enumerable Languages   Unrestricted Grammar                Turing Machine

Type-3 ⊂ Type-2 ⊂ Type-1 ⊂ Type-0

2.2 Sets
• Importance: languages are sets
• A set is a collection of "things," called the elements or members of the set. It is
  essential to have a criterion for determining, for any given thing, whether it is or is not
  a member of the given set. This criterion is called the membership criterion of the set.
• There are two common ways of indicating the members of a set:
  o List all the elements, e.g. {a, e, i, o, u}
  o Provide some sort of an algorithm or rule, such as a grammar
• Notation:
  o To indicate that x is a member of set S, we write x ∈ S
  o We denote the empty set (the set with no members) as {} or ∅
  o If every element of set A is also an element of set B, we say that A is a subset
    of B, and write A ⊆ B
  o If every element of set A is also an element of set B, but B also has some
    elements not contained in A, we say that A is a proper subset of B, and write
    A ⊂ B

2.3 Operations on Sets


• The union of sets A and B, written A ∪ B, is a set that contains everything that is in A,
  or in B, or in both.
• The intersection of sets A and B, written A ∩ B, is a set that contains exactly those
  elements that are in both A and B.
• The set difference of set A and set B, written A - B, is a set that contains everything
  that is in A but not in B.
• The complement of a set A, written Ā (A with a bar drawn over it), is the set containing
  everything that is not in A. This is almost always used in the context of some universal
  set U that contains "everything" (meaning "everything we are interested in at the
  moment"). Then Ā is shorthand for U - A.

2.4 Additional Terminology


The cardinality of a set A, written |A|, is the number of elements in A. The power set of a
set Q, written 2^Q, is the set of all subsets of Q. The notation suggests the fact that a set
containing n elements has a power set containing 2^n elements. Two sets are disjoint if they
have no elements in common, that is, if A ∩ B = ∅.

2.5 Functions
A function is a rule which relates the values of one variable quantity to the values of another
variable quantity, and does so in such a way that the value of the second variable quantity is
uniquely determined by (i.e. is a function of) the value of the first variable quantity.

2.6 Graphs
Automata are graphs. A graph consists of two sets:
• A set V of vertices (or nodes), and
• A set E of edges (or arcs).
o An edge consists of a pair of vertices in V. If the edges are ordered, the graph
is a digraph (a contraction of "directed graph").
o A walk is a sequence of edges, where the finish vertex of each edge is the start
vertex of the next edge. e.g.: (a, e), (e, i), (i, o), (o, u).
o A path is a walk with no repeated edges.
o A simple path is a path with no repeated vertices.

2.7 Trees
• Trees are used in some algorithms.
• A tree is a kind of digraph:
  o It has one distinguished vertex called the root;
  o There is exactly one path from the root to each vertex; and
  o The level of a vertex is the length of the path to it from the root.
• Terminology:
  o If there is an edge from A to B, then A is the parent of B, and B is the child of A.
  o A leaf is a node with no children.
  o The height of a tree is the largest level number of any vertex.

2.8 Proof Techniques


Importance
• Proofs are encapsulated understanding.
• You may be asked to learn a few important proofs.

Proof by induction
• Prove something about P1 (the basis).
• Prove that if it is true for Pn, then it is true for Pn+1 (the inductive step).
• Conclude that it is true for all P.

Proof by contradiction (also called reductio ad absurdum)
• Assume some fact P is false.
• Show that this leads to a contradiction.
• Conclude P must be true.

2.9 Fundamental Concepts


• Symbol
A symbol is a single object and is an abstract entity that has no meaning by itself. It can
be a letter, a digit, or a special character such as 0, 1, a, b, #, $, etc.

• Alphabet
An alphabet is a finite, nonempty set of symbols; Σ (sigma) is used to denote an alphabet.
For example:
{0, 1} is an alphabet with two symbols,
{a, b} is another alphabet with two symbols, and
the English alphabet is also an alphabet.
Σ = {0, 1} is the binary alphabet.
Σ = {a, b, …, z} is the set of all lower-case letters.

• String
  o A string is a finite sequence of symbols chosen from some alphabet, e.g. 01101
    is a string from the binary alphabet Σ = {0, 1}.
  o The number of symbols in a string is called the length of the string, e.g. 011101
    has length 6 (and is built from 2 distinct symbols). The standard notation for the
    length of a string w is |w|, e.g. |011| = 3 and |ε| = 0.
  o The empty string (also called the null string) is the string with length zero. That is,
    it has no symbols. It is denoted ε (epsilon) and may be formed over any
    alphabet whatsoever. Thus |ε| = 0.
  o Powers of an Alphabet: If Σ is an alphabet, we define Σ^k to be the set of strings
    of length k, each of whose symbols is in Σ, e.g. Σ^0 = {ε}. If Σ = {0,1}, then
    Σ^1 = {0, 1}
    Σ^2 = {00, 01, 10, 11}
    Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}
  o Kleene Star: The set of all strings over an alphabet Σ is denoted Σ*. For
    instance, {0,1}* = {ε, 0, 1, 00, 01, 10, 11, 000, …}. Put another way,
    Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 …
    Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 …
  o Concatenation of Strings: Let x and y be strings. Then xy denotes the string
    obtained by concatenating x with y, that is, xy is the string obtained by
    appending the sequence of symbols of y to that of x, e.g. if x = aab and y =
    bbab, then xy = aabbbab. Note that, in general, xy ≠ yx.



• Language
  o A language is a set of strings over an alphabet. Thus {a, ab, baa} is a language
    (over alphabet {a, b}).
  o A set of strings, all of which are chosen from some Σ*, where Σ is a particular
    alphabet, is called a language. If Σ is an alphabet and L ⊆ Σ*, then L is a
    language over Σ.
  o An example is English, where the collection of legal English words is a set of
    strings over the alphabet that consists of all the letters.

• Language Examples
  o L1 = {w ∈ {a, b}* : each a in w is immediately preceded and immediately
    followed by a b}.
  o L2 = {w ∈ {a, b}* : w has abab as a substring}.
  o L3 = {w ∈ {a, b}* : w has neither aa nor bb as a substring}.
  o L4 = {w ∈ {a, b}* : w has an even number of occurrences of the substring ab}.
  o L5 = {w ∈ {a, b}* : w has both ab and ba as substrings}.
  o L6 = {w ∈ {a, b}* : w contains a's and b's and ends in bb}.
  o L7 = {w ∈ {a, b}* : w has different first and last letters; if the word begins
    with an a, to be accepted it must end with a b, and vice versa}.
  o L8 = {w ∈ {a, b}* : w starts with a and has an odd number of a's, or starts with b
    and has an even number of b's}.
  o L9 = {w ∈ {a, b}* : w has three consecutive b's (not necessarily at the end)}.
  o L10 = {w ∈ {a, b}* : w contains an odd number of a's and an odd number of b's}.

There are three fundamental concepts that we will be working with in this course:

• Languages
  o A language is a subset of the set of all possible strings formed from a given set
    of symbols.
  o There must be a membership criterion for determining whether a particular
    string is in the set.
• Grammars
  o A grammar is a formal system for generating the strings of a language.
  o A grammar may be used as the membership criterion for a language.
• Automata
  o An automaton is a simplified, formalized model of a computer.
  o An automaton may be used to compute the membership function for a
    language.
  o Automata can also compute other kinds of things.

Assignment No. 1



3 Type-3: Regular Languages
3.1 Finite Automata (FA)
The finite automaton is basically a very simple computer that consists only of an input tape, a
tape-reading device called the reading head, and a finite control unit. The input tape provides
the string of symbols to be processed. Initially, the reading head is placed at the leftmost
square of the tape and the finite control is set to a designated start state. The reading head
reads the tape linearly, telling the finite control unit which symbol is currently being read. The
finite control unit is always in one of a finite number of states, among which are a start state
and some number of final states. For each symbol that is read, the control unit either stays in
the same state or moves to another state; for any state and any symbol this decision is fixed.
If, after reading a string from the input tape, the automaton is in a final state, then the string is
said to be accepted by the automaton.

Figure 1: A finite automaton

3.2 Deterministic Finite Automata (DFA)


A deterministic finite automaton is a simple language recognition device. It is called
deterministic because its operation is completely determined by the present state and the input
symbol read. DFAs are:
• Deterministic -- there is no element of choice
• Finite -- only a finite number of states and arcs
• Automata -- produce only a yes/no answer
A DFA is drawn as a graph, with each state represented by a circle.

One designated state is the start state. Some states (possibly including the start state) can be
designated as final states. Final states are represented by a double circle.

Arcs between states represent state transitions -- each such arc is labeled with the symbol that
triggers the transition.

Example 3.1: Design a DFA that accepts all strings containing an even number of 0’s and an even number of 1’s.
Example input string: 1 0 0 1 1 1 0 0
Operation
Start with the "current state" set to the start state and a "read head" at the beginning of the
input string; while there are still characters in the string:



Read the next character and advance the read head; from the current state, follow the arc that
is labeled with the character just read; the state that the arc points to becomes the next current
state.

When all characters have been read, accept the string if the current state is a final state;
otherwise reject the string.

Figure 2: A DFA for the Even-Even Language

Sample trace (reconstructed here assuming the Figure 2 state names q0 = even 0's and even 1's,
q1 = even 0's and odd 1's, q2 = odd 0's and even 1's, q3 = odd 0's and odd 1's):

q0 --1--> q1 --0--> q3 --0--> q1 --1--> q0 --1--> q1 --1--> q0 --0--> q2 --0--> q0

Since q0 is a final state, the string is accepted.

3.2.1 Formal Definition of a DFA


A deterministic finite automaton or DFA is a quintuple M = (Q, Σ, δ, q0, F), where:
Q is a finite set of states,
Σ is a finite set of symbols, the input alphabet,
δ: Q × Σ → Q is the transition function,
q0 ∈ Q is the initial state,
F ⊆ Q is the set of final states.

3.2.2 Automata for Ada identifiers


In Ada, an identifier consists of a letter followed by any number of letters, digits, and
underlines. However, the identifier may not end in an underline or have two underlines in a
row. Here is an automaton to recognize Ada identifiers.

M = (Q, Σ, δ, q0, F), where


Q is {q0, q1, q2, q3},
Σ is {letter, digit, underline},
δ is given by
δ(q0, letter) = q1 δ(q1, letter) = q1
δ(q0, digit) = q3 δ(q1, digit) = q1
δ(q0, underline) = q3 δ(q1, underline) = q2
δ(q2, letter) = q1 δ(q3, letter) = q3
δ(q2, digit) = q1 δ(q3, digit) = q3
δ(q2, underline) = q3 δ(q3, underline) = q3

q0  Q is the initial state,


{q1}  Q is a set of final states.
Figure 3: A DFA for Ada identifiers
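The transition table above can be exercised directly in code. The sketch below (Python) hard-codes δ from the table; the classify helper, which maps concrete characters onto the abstract alphabet {letter, digit, underline}, is an assumed convenience, not part of the automaton.

    # delta copied from the table above; q3 acts as the trap (error) state
    delta = {
        ("q0", "letter"): "q1",    ("q1", "letter"): "q1",
        ("q0", "digit"): "q3",     ("q1", "digit"): "q1",
        ("q0", "underline"): "q3", ("q1", "underline"): "q2",
        ("q2", "letter"): "q1",    ("q3", "letter"): "q3",
        ("q2", "digit"): "q1",     ("q3", "digit"): "q3",
        ("q2", "underline"): "q3", ("q3", "underline"): "q3",
    }

    def classify(ch):
        """Map a concrete character to the automaton's abstract input alphabet."""
        if ch.isalpha():
            return "letter"
        if ch.isdigit():
            return "digit"
        if ch == "_":
            return "underline"
        raise ValueError("not part of the alphabet: " + ch)

    def is_ada_identifier(s):
        state = "q0"
        for ch in s:
            state = delta[(state, classify(ch))]
        return state == "q1"   # {q1} is the set of final states

    print(is_ada_identifier("count_2"))   # True
    print(is_ada_identifier("count__2"))  # False (two underlines in a row)
    print(is_ada_identifier("count_"))    # False (ends in an underline)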



3.2.3 Extended Transition Function (DFA)

Definition δ*: The fact that δ is a total function implies that every vertex has an outgoing arc for
each member of Σ. We can also define an extended transition function δ*: Q × Σ* → Q.
Basis: δ*(q, ε) = q
Induction: Suppose w = xa; then δ*(q, w) = δ(δ*(q, x), a)

If a DFA M = (Q, Σ, δ, q0, F) is used as a membership criterion, then the set of strings
accepted by M is a language. That is, L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}.
Languages that can be defined by DFAs are called regular languages.

Extended transition function for the DFA discussed in Example 3.1 (assuming the Figure 2
state names q0 = even-even, q1 = even 0's/odd 1's, q2 = odd 0's/even 1's, q3 = odd-odd):

δ*(q0, ε) = q0
δ*(q0, 1) = δ(δ*(q0, ε), 1) = δ(q0, 1) = q1
δ*(q0, 10) = δ(δ*(q0, 1), 0) = δ(q1, 0) = q3
δ*(q0, 101) = δ(δ*(q0, 10), 1) = δ(q3, 1) = q2
δ*(q0, 1010) = δ(δ*(q0, 101), 0) = δ(q2, 0) = q0 ∈ F, so 1010 is accepted.

3.2.4 Abbreviated Automata for Ada Identifiers


The following is an abbreviated automaton to recognize Ada identifiers. The difference is that,
in this automaton, δ does not appear to be a function. It looks like a partial function; that is, it
is not defined for all values of Q × Σ.

Figure 4: An abbreviated DFA for Ada identifiers

We can complete the definition of δ by assuming the existence of an "invisible" state and some
"invisible" arcs. Specifically, there is exactly one implicit error state. If there is no path shown
from a state for a given symbol in Σ, there is an implicit path for that symbol to the error state.
The error state is a trap state: once you get into it, all arcs (one for each symbol in Σ) lead back
to it, and the error state is not a final state.
The automaton represented above is really exactly the same as the automaton on the previous
page; we just haven't bothered to draw one state and a whole bunch of arcs that we know must
be there.

3.3 Nondeterministic Finite Automata


A finite-state automaton can be nondeterministic in either or both of two ways:

Figure 5: A simple NFA
Figure 6: A simple NFA with empty moves

A state may have two or more arcs emanating from it labeled with the same symbol (Figure
5). When the symbol occurs in the input, either arc may be followed.



A state may have one or more arcs emanating from it labeled with λ or ε (the empty string)
(Figure 6). These arcs may optionally be followed without looking at the input or consuming
an input symbol.
Due to non-determinism, the same string may cause an NFA to end up in one of several
different states, some of which may be final while others are not. The string is accepted if any
possible ending state is a final state.

3.3.1 Formal Definition of NFAs


A nondeterministic finite automaton or NFA is defined by the quintuple
M = (Q, Σ, δ, q0, F)
where
• Q is a finite set of states,
• Σ is a finite set of symbols, the input alphabet,
• δ: Q × (Σ ∪ {λ}) → 2^Q is the transition function,
• q0 ∈ Q is the initial state,
• F ⊆ Q is the set of final states.

These are all the same as for a DFA except for the definition of δ:
• Transitions on ε (or λ) are allowed in addition to transitions on elements of Σ, and
• The range of δ is 2^Q rather than Q. This means that the values of δ are not single
  elements of Q, but rather are subsets of Q.

The language defined by NFA M is defined as

L(M) = {w ∈ Σ* : δ*(q0, w) ∩ F ≠ ∅}

Extended Transition Function for NFA (δ*)

Definition: We can define an extended transition function δ*: Q × Σ* → 2^Q.
Basis: δ*(q, ε) = {q}
Induction: Suppose w = xa and δ*(q, x) = {p1, p2, ..., pk}; then
δ*(q, w) = δ(p1, a) ∪ δ(p2, a) ∪ … ∪ δ(pk, a)

Example 3.2:
δ*(q0, aba) = ?
δ*(q0, ε) = {q0}
...

Figure 5: A simple NFA



Extended Transition Function for ε-NFA (δ*)
The presence of ε-transitions (i.e., when q ∈ δ(p, ε)) causes technical problems. To overcome
these problems we introduce the notion of ε-closure.

For any state p of an NFA we define the ε-closure of p to be the set ε-closure(p) consisting of all
states q such that there is a path from p to q whose spelling is ε. This means that either q = p,
or all the edges on the path from p to q have the label ε.

Definition
Basis: State p is in ε-closure(p).
Induction: If state p is in ε-closure(q), and there is a transition from state p to state r labelled ε,
then r is also in ε-closure(q).

Definition: The extended transition function δ*: Q × Σ* → 2^Q for an ε-NFA is defined as:
Basis: δ*(q, ε) = ε-closure(q)
Induction: Suppose w = xa and δ*(q, x) = {p1, p2, ..., pk}; then
δ*(q, w) = ε-closure(δ(p1, a)) ∪ ε-closure(δ(p2, a)) ∪ … ∪ ε-closure(δ(pk, a))

Example 3.3

Figure 6: An ε-NFA for the language of integers

Figure 7: An ε-NFA for the language of integer or real numbers

δ*(q0, +12.5) = ?
δ*(q0, ε) = ε-closure({q0}) = {q0, q1}
...
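The ε-closure and δ* definitions above translate into a few lines of code. The sketch below is in Python; the toy ε-NFA hard-coded here (a sign-then-digits fragment in the spirit of Figure 7) is an assumption for illustration, not the machine of the figure.

    # Toy ε-NFA (assumed): q0 --ε--> q1, q0 --'+'--> q1, q1 --'d'--> q1; q1 final
    eps   = {"q0": {"q1"}}                       # ε-transition table
    delta = {("q0", "+"): {"q1"}, ("q1", "d"): {"q1"}}
    start, finals = "q0", {"q1"}

    def eclosure(states):
        """All states reachable from `states` along ε-edges only."""
        result, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for q in eps.get(p, set()):
                if q not in result:
                    result.add(q)
                    stack.append(q)
        return result

    def delta_star(q, w):
        current = eclosure({q})                  # basis: δ*(q, ε) = ε-closure(q)
        for a in w:                              # induction, one symbol at a time
            moved = set()
            for p in current:
                moved |= delta.get((p, a), set())
            current = eclosure(moved)
        return current

    print(delta_star(start, "+dd"))              # {'q1'}: accepted
    print(delta_star(start, "dd") & finals)      # {'q1'}: accepted via the ε-move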

Assignment No. 2



3.4 DFA = NFA
Two automata are equivalent if they accept the same language. A DFA is just a special case of
an NFA that happens not to have any null transitions or multiple transitions on the same
symbol. So DFAs are not more powerful than NFAs.

For any NFA, we can construct an equivalent DFA (see below). So NFAs are not more
powerful than DFAs. DFAs and NFAs define the same class of languages -- the regular
languages.

To translate an NFA into a DFA, the trick is to label each state in the DFA with a set of states
from the NFA. Each state in the DFA summarizes all the states that the NFA might be in. If
the NFA contains |Q| states, the resultant DFA could contain as many as 2^|Q| states. (Usually
far fewer states will be needed.)

Example 3.4: Consider the following non-deterministic finite automaton (NFA).

Figure 8: An ε-NFA to be converted to a DFA

Q = states = {1, 2, 3, 4, 5}
Start state: { 1 }
Accepting state(s): { 5 }

Now construct an equivalent DFA.


• The states of the DFA will be determined by subsets of the states of the NFA.
  Unfortunately, there are 2^5 = 32 different subsets of Q. Here are some of them:
• QD = states for the DFA
• Some states in QD: the empty set, {1}, {2}, {3}, {4}, {5}, {1,2}, {1,3}, ..., {1,2,3,4,5}
• But not all will be reachable from the start state of the DFA to be constructed.
• The start state is the ε-closure of the start state of the NFA.
• The final states of the DFA will correspond to the subsets of the NFA that contain a
  final state.

The ε-closure of a set of states, R, of the NFA will be denoted by E(R). E(R) = R ∪ { q | there
is an r in R with an ε transition to q }. In the example, E({1}) = {1} ∪ { 2 } = {1,2}



Construction
1. Compute the start state: E({1}) = {1,2}
2. Start a table for the transition function or draw a diagram with {1,2} but only
containing the start state:

a b
{1,2}

3. Compute the transition function for the DFA from the start state.
a. For one of the inputs, say 'a', consider all possible states that can be reached in
the NFA from any one of the states in {1,2} on input 'a'. These are states that
are reachable along paths labeled 'a', also allowing any edges labeled ε.

For example, DFA state {1, 2}


From 1, we have
a path from 1 to 3 labeled 'a':      1 --a--> 3
a path from 1 to 4 labeled ε, 'a':   1 --ε--> 2 --a--> 4
a path from 1 to 5 labeled ε, 'a':   1 --ε--> 2 --a--> 5
From 2, we have
a path from 2 to 4 labeled 'a':      2 --a--> 4
a path from 2 to 5 labeled 'a':      2 --a--> 5

So altogether we can reach {3,4,5} from {1,2} on input 'a'

a b
{1,2} {3,4,5} ∅

b. Next compute the transitions from the start state with input 'b'. But when the
NFA transitions are examined there are no paths from either state in {1,2} with
label 'b'. So the subset of states that can be reached is the empty set, ∅.
4. If a new state is reached when the transitions on 'a' and 'b' are computed, the process
has to be repeated for this new state. For example, {3, 4, 5} is a new state for the DFA and
so we must compute transitions from this state.

DFA state {3, 4, 5}, input 'a'


From 3, we have no transition paths labeled 'a'
From 4, a path from 4 to 5 labeled 'a': 4 --a--> 5
From 5, there are no transition paths labeled 'a'

So altogether we can reach {5} from {3, 4, 5} on input 'a'

DFA state {3, 4, 5}, input 'b'


From state 3, a path from 3 to 4 labeled 'b': 3 --b--> 4
From state 4, a path from 4 to 5 labeled 'b': 4 --b--> 5
From 5, there are no transition paths labeled 'b'

So altogether we can reach {4, 5} from {3, 4, 5} on input 'b'



Filling in the table we get:
a b
{1,2} {3,4,5} ∅
{3,4,5} {5} {4,5}

5. Continue filling in the table as long as any states are entered that do not yet have a
row. For example, neither {5} nor {4, 5} has a row yet. So pick one and compute its
transitions.

The final states of the DFA are the sets that contain 5 since that is the only final state of the
NFA. The final table and corresponding DFA state diagram are:

a b
{1,2} {3,4,5} ∅
{3,4,5} {5} {4,5}
{5} ∅ ∅
{4,5} {5} {5}
∅ ∅ ∅
Converted DFA and its transitions.

Figure 9: Final DFA
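The whole construction mechanizes easily. The following Python sketch reproduces the table above; since Figure 8 itself is not reproduced here, the NFA's transition set is an assumption, reconstructed to be consistent with the worked example.

    # NFA reconstructed from the worked example (an assumption; Figure 8 is omitted):
    # 1 --ε--> 2, 1 --a--> 3, 2 --a--> 4, 2 --a--> 5, 3 --b--> 4, 4 --a--> 5, 4 --b--> 5
    eps = {1: {2}}
    delta = {(1, 'a'): {3}, (2, 'a'): {4, 5}, (3, 'b'): {4},
             (4, 'a'): {5}, (4, 'b'): {5}}
    finals = {5}

    def E(R):
        """ε-closure of a set R of NFA states."""
        closure, stack = set(R), list(R)
        while stack:
            p = stack.pop()
            for q in eps.get(p, set()):
                if q not in closure:
                    closure.add(q)
                    stack.append(q)
        return frozenset(closure)

    def subset_construction(start, alphabet):
        start_state = E({start})
        table, work = {}, [start_state]
        while work:
            S = work.pop()
            if S in table:
                continue
            table[S] = {}
            for a in alphabet:
                moved = set()
                for p in S:                       # all NFA moves on 'a' from S
                    moved |= delta.get((p, a), set())
                T = E(moved)                      # then close under ε
                table[S][a] = T
                work.append(T)
        dfa_finals = {S for S in table if S & finals}
        return start_state, dfa_finals, table

    start, dfa_finals, table = subset_construction(1, "ab")
    for S in table:                               # prints the five rows of the table
        print(sorted(S), {a: sorted(T) for a, T in table[S].items()})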

Assignment No. 3



4 Grammars for Regular Languages
Now we shift our focus to study certain types of formal language generators. Such a device
begins, when given some sort of “start” signal, to construct (produce) a string. Its operation is
not completely determined from the beginning but is nevertheless limited by a set of rules.
Eventually this process halts, and the device outputs a completed string. The language defined
by the device is the set of all strings that it can generate.

In computer programming languages, a variable or identifier is a storage address associated
with a quantity referred to as its value. An identifier is an arbitrarily long sequence of digits,
underscores, and lowercase and uppercase letters. A valid identifier must begin with a non-digit
character (letter, underscore, or Unicode non-digit character). Identifiers are case-sensitive
(lowercase and uppercase letters are distinct), and every character is significant. The rules for
a C++ identifier can be written as:

<id> → <letter><rest>
<letter> → a | b | … | z | A | B | … | Z | _
<rest> → <letter><rest> | <digit><rest> | ε
<digit> → 0 | 1 | 2 | … | 9

Grammar
A grammar is a quadruple G = (V, T, S, P), where:
• V is a finite set of (meta)symbols, or variables.
• T is a finite set of terminal symbols.
• S ∈ V is a distinguished element of V called the start symbol.
• P is a finite set of productions.

The above is true for all grammars. We will distinguish among different kinds of grammars
based on the form of the productions. If the productions of a grammar all follow a certain
pattern, we have one kind of grammar. If the productions all fit a different pattern, we have a
different kind of grammar.

Productions have the form:


(V ∪ T)+ → (V ∪ T)*

Different types of grammars can be defined by putting additional restrictions on the left-hand
side of productions, the right-hand side of productions, or both.

We know that languages can be defined by grammars. Now we will begin to classify
grammars; and the first kinds of grammars we will look at are the regular grammars.

4.1 Regular Grammars


A regular grammar is either a right-linear grammar or a left-linear grammar.

To be a right-linear grammar, every production of the grammar must have one of the two
forms V → T*V or V → T*.



To be a left-linear grammar, every production of the grammar must have one of the two forms
V → VT* or V → T*.

You do not get to mix the two. For example, consider a grammar with the following
productions:

S → ε
S → aX
X → Sb

This grammar is neither right-linear nor left-linear, hence it is not a regular grammar. We have
no reason to suppose that the language it generates is a regular language (one that is recognized
by a DFA).

In fact, the grammar generates a language whose strings are of the form a^n b^n. This language
cannot be recognized by a DFA. (Why not?)

4.1.1 Right-Linear Grammars


In general, productions have the form:
(V ∪ T)+ → (V ∪ T)*.

In a right-linear grammar, all productions have one of the two forms:

V → T*V
or
V → T*

That is, the left-hand side must consist of a single variable, and the right-hand side consists of
any number of terminals (members of T) optionally followed by a single variable. (The
"right" in "right-linear grammar" refers to the fact that, following the arrow, a variable can
occur only as the rightmost symbol of the production.)

4.1.2 Right-Linear Grammars and NFAs


So DFAs, NFAs, and regular grammars are all "equivalent," in the sense that any language you
define with one of these could be defined by the others as well. A regular grammar can be
converted to an equivalent NFA; thus, a language defined by a regular grammar is a regular
language.
Let G = (V, Σ, P, S) be a regular grammar. Define the NFA M = (Q, Σ, δ, S, F) as follows:
i) Q = V ∪ {f}, where f ∉ V, if P contains a rule A → a;
   Q = V otherwise.
ii) B ∈ δ(A, a) whenever A → aB ∈ P;
    f ∈ δ(A, a) whenever A → a ∈ P.
iii) F = {A | A → ε ∈ P} ∪ {f} if f ∈ Q;
     F = {A | A → ε ∈ P} otherwise.

Then L(M) = L(G).
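The construction can be carried out mechanically. Below is a Python sketch; the production encoding (a list of (head, body) pairs with body "" for ε, and bodies restricted to the forms ε, a, and aB with one-character terminals and variables) is an assumed representation. The grammar encoded here is the even-even grammar presented just below.

    productions = [("S", ""), ("S", "0B"), ("S", "1A"),
                   ("A", "0C"), ("A", "1S"),
                   ("B", "0S"), ("B", "1C"),
                   ("C", "0A"), ("C", "1B")]
    variables = {head for head, _ in productions}

    delta, finals, need_f = {}, set(), False
    for head, body in productions:
        if body == "":                      # A -> ε : A is a final state
            finals.add(head)
        elif body[1:] in variables:         # A -> aB : B is in δ(A, a)
            delta.setdefault((head, body[0]), set()).add(body[1:])
        else:                               # A -> a : f is in δ(A, a)
            delta.setdefault((head, body[0]), set()).add("f")
            need_f = True
    if need_f:
        finals.add("f")

    def accepts(w, start="S"):
        """Simulate the resulting NFA on w."""
        current = {start}
        for a in w:
            nxt = set()
            for p in current:
                nxt |= delta.get((p, a), set())
            current = nxt
        return bool(current & finals)

    print(accepts("1010"))  # True: even number of 0's and of 1's
    print(accepts("10"))    # False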



This simple relationship between right-linear grammars and NFAs can also be depicted
diagrammatically (the diagrams are omitted here; in words):

A → xB     an arc from state A to state B labeled x
A → xyzB   a path of arcs from A to B labeled x, y, z, passing through new intermediate states
A → B      a λ-arc from state A to state B
A → x      a path of arcs spelling x from A to the final state

As an example of the correspondence between an NFA and a right-linear grammar, the
following automaton and grammar both recognize the set of strings consisting of an even
number of 0's and an even number of 1's.

S → ε
S → 0B
S → 1A

A → 0C
A → 1S

B → 0S
B → 1C

C → 0A
C → 1B

Find a regular grammar for the following NFA:

4.1.3 Left-Linear Grammars


In a left-linear grammar, all productions have one of the two forms:
V → VT*
or
V → T*
That is, the left-hand side must consist of a single variable, and the right-hand side consists of
an optional single variable followed by any number of terminals. This is just like a right-linear
grammar except that, following the arrow, a variable can occur only on the left of the
terminals, rather than only on the right.



Example 4.1:

G = (V, T, S, P): V = {S, A, B} T = {a, b} S = S P = {S → Aab, A → Aab|B, B → a}

We won't pay much attention to left-linear grammars, because they turn out to be equivalent
to right-linear grammars. Given a left-linear grammar for language L, we can construct a
right-linear grammar for the same language, as follows:

Step 1: Construct a right-linear grammar for the (different) language L^R.
Method: Replace each production A → x of L with a production A → x^R, and replace each
production A → Bx with a production A → x^R B.

Step 2: Construct an NFA for L^R from the right-linear grammar; this NFA should have just
one final state.
Method: We talked about deriving an NFA from a right-linear grammar on an earlier page. If
the NFA has more than one final state, we can make those states nonfinal, add a new final
state, and put λ-transitions from each previously final state to the new final state.

Step 3: Reverse the NFA for L^R to obtain an NFA for L.
Method:
1. Ensure the NFA has only a single final state.
2. Reverse the direction of the arcs.
3. Make the initial state final and the final state initial.

Step 4: Construct a right-linear grammar for L from the NFA for L.
Method: This is the technique we just talked about on an earlier page.

4.2 Closure I
A set is closed under an operation if, whenever the operation is applied to members of the set,
the result is also a member of the set.

For example, the set of integers is closed under addition, because x+y is an integer whenever
x and y are integers. However, integers are not closed under division: if x and y are integers,
x/y may or may not be an integer.

We have defined several operations on languages:

L1  L2 Strings in either L1 or L2
L1  L2 Strings in both L1 and L2
L1L2 Strings composed of one string from L1 followed by one string from L2
-L1 All strings (over the same alphabet) not in L1
L1* Zero or more strings from L1 concatenated together
L1 - L2 Strings in L1 that are not in L2
L1R Strings in L1, reversed



We will show that the set of regular languages is closed under each of these operations.

4.3 Closure II: Union, Concatenation, Negation, Kleene Star, Reverse

4.3.1 General Approach


• Build automata (DFAs or NFAs) for each of the languages involved.
• Show how to combine the automata to create a new automaton that recognizes the
  desired language.
• Since the language is represented by an NFA or DFA, conclude that the language is
  regular.

4.3.2 Union of L1 and L2


L1 ∪ L2 = {w : w ∈ L1 or w ∈ L2}
• Create a new start state.
• Make a λ-transition from the new start state to each of the original start states.
A minimal code sketch of this construction appears below.
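The sketch below (Python) shows how the union construction might look in code. The dict-based NFA representation, with λ-moves kept in an "eps" table, is an assumption for illustration; the other constructions in this section modify the same representation similarly.

    def tag(nfa, label):
        """Rename every state of `nfa` by prefixing it, so two machines can't clash."""
        ren = lambda s: label + s
        return {
            "start": ren(nfa["start"]),
            "finals": {ren(s) for s in nfa["finals"]},
            "delta": {(ren(p), a): {ren(q) for q in qs}
                      for (p, a), qs in nfa["delta"].items()},
            "eps": {ren(p): {ren(q) for q in qs} for p, qs in nfa["eps"].items()},
        }

    def union(n1, n2):
        a, b = tag(n1, "1."), tag(n2, "2.")
        return {
            "start": "new_start",
            "finals": a["finals"] | b["finals"],
            "delta": {**a["delta"], **b["delta"]},
            # λ-transitions from the new start state to each original start state
            "eps": {**a["eps"], **b["eps"],
                    "new_start": {a["start"], b["start"]}},
        }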

4.3.3 Concatenation of L1 and L2


• Put a λ-transition from each final state of L1 to the initial state of L2.
• Make the original final states of L1 nonfinal.

4.3.4 Negation of L1 (Complement)


• Start with a (complete) DFA, not with an NFA.
• Make every final state nonfinal and every nonfinal state final.

4.3.5 Kleene Star of L1


• Make a new start state; connect it to the original start state with a λ-transition.



• Make a new final state; connect the original final states (which become nonfinal) to it
  with λ-transitions.
• Connect the new start state and new final state with a pair of λ-transitions.

4.3.6 Reverse of L1
• Start with an automaton with just one final state.
• Make the initial state final and the final state initial.
• Reverse the direction of every arc.

4.4 Closure III: Intersection and Set Difference


Just as with the other operations, you prove that regular languages are closed under
intersection and set difference by starting with automata for the initial languages, and
constructing a new automaton that represents the operation applied to the initial languages.
However, the constructions are somewhat trickier.

In these constructions you form a completely new machine, whose states are each labeled with
an ordered pair of state names: the first element of each pair is a state from L1, and the second
element of each pair is a state from L2. (Usually you won't need a state for every such pair,
just some of them.)

Begin by creating a start state whose label is (start state of L1, start state of L2).
Repeat the following until no new arcs can be added:
  Find a state (A, B) that lacks a transition for some x in Σ.
  Add a transition on x from state (A, B) to state (δ1(A, x), δ2(B, x)), where δ1 and δ2 are
  the transition functions of the two automata. (If this state doesn't already exist, create it.)
The same construction is used for both intersection and set difference. The distinction is in
how the final states are selected.

Intersection: Mark a state (A, B) as final if both (i) A is a final state in L1, and (ii) B is a final
state in L2.

Set difference: Mark a state (A, B) as final if A is a final state in L1, but B is not a final state
in L2.

Example: Even 0’s and 1’s



Suppose L1 is the set of binary strings with an even number of 0’s, and L2 the set of binary
strings with an even number of 1’s. The FAs for these languages both have two states, and the
FA for L1 ∩ L2 has four states (the figures are omitted here; the sketch below reconstructs the
product machine).
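A minimal Python sketch of the product construction, using the two two-state DFAs just described. The state labels ("e0"/"o0" for even/odd 0's, "e1"/"o1" for even/odd 1's) are illustrative names, not taken from the figures.

    # DFA 1 tracks the parity of 0's; DFA 2 tracks the parity of 1's.
    d1 = {("e0", "0"): "o0", ("o0", "0"): "e0", ("e0", "1"): "e0", ("o0", "1"): "o0"}
    d2 = {("e1", "1"): "o1", ("o1", "1"): "e1", ("e1", "0"): "e1", ("o1", "0"): "o1"}

    def product(d1, s1, f1, d2, s2, f2, alphabet, difference=False):
        start = (s1, s2)
        seen, work, delta = {start}, [start], {}
        while work:
            a, b = state = work.pop()
            for x in alphabet:
                nxt = (d1[(a, x)], d2[(b, x)])
                delta[(state, x)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
        if difference:   # set difference: final in L1 but not in L2
            finals = {(p, q) for (p, q) in seen if p in f1 and q not in f2}
        else:            # intersection: final in both
            finals = {(p, q) for (p, q) in seen if p in f1 and q in f2}
        return start, finals, delta

    start, finals, delta = product(d1, "e0", {"e0"}, d2, "e1", {"e1"}, "01")
    print(len({s for (s, _) in delta}))  # 4 states, as claimed
    print(finals)                        # {('e0', 'e1')}

Passing difference=True selects the set-difference final states instead; the transition structure is identical, exactly as the text says.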

Assignment No. 4

4.5 The Pumping Lemma


Here's what the pumping lemma says:

• If an infinite language is regular, it can be defined by a DFA.
• The DFA has some finite number of states (say, n).
• Since the language is infinite, some strings of the language must have length > n.
• For a string of length > n accepted by the DFA, the walk through the DFA must
  contain a cycle.
• Repeating the cycle an arbitrary number of times must yield another string accepted by
  the DFA.

The pumping lemma for regular languages is another way of proving that a given (infinite)
language is not regular. (The pumping lemma cannot be used to prove that a given language is
regular.)

The proof is always by contradiction. A brief outline of the technique is as follows:


• Assume the language L is regular.
• By the pigeonhole principle, any sufficiently long string in L must repeat some state in
  the DFA; thus, the walk contains a cycle.
• Show that repeating the cycle some number of times ("pumping" the cycle) yields a
  string that is not in L.
• Conclude that L is not regular.



Why this is hard:
• We don't know the DFA (if we did, the language would be regular!). Thus, we have to
  do the proof for an arbitrary DFA that accepts L.
• Since we don't know the DFA, we certainly don't know the cycle.

Why we can sometimes pull it off:


• We get to choose the string (but it must be in L).
• We get to choose the number of times to "pump."

4.6 Applying the Pumping Lemma


Here's a more formal definition of the pumping lemma:

If L is an infinite regular language, then there exists some positive integer m such that any
string w ∈ L whose length is m or greater can be decomposed into three parts, xyz, where

• |xy| is less than or equal to m,
• |y| > 0, and
• w_i = xy^i z is also in L for all i = 0, 1, 2, 3, ....

Here's what it all means:

• m is a (finite) number chosen so that strings of length m or greater must contain a
  cycle. Hence, m must be equal to or greater than the number of states in the DFA.
  Remember that we don't know the DFA, so we can't actually choose m; we just know
  that such an m must exist.
• Since string w has length greater than or equal to m, we can break it into two parts, xy
  and z, such that xy must contain a cycle. We don't know the DFA, so we don't know
  exactly where to make this break, but we know that |xy| can be less than or equal to m.
• We let x be the part before the cycle, y be the cycle, and z the part after the cycle. (It is
  possible that x and z contain cycles, but we don't care about that.) Again, we don't
  know exactly where to make this break.
• Since y is the cycle we are interested in, we must have |y| > 0, otherwise it isn't a cycle.
• By repeating y an arbitrary number of times, xy*z, we must get other strings in L.
• If, despite all the above uncertainties, we can show that the DFA has to accept some
  string that we know is not in the language, then we can conclude that the language is
  not regular.

To use this lemma, we need to show:


1. For any choice of m,
2. for some w ∈ L that we get to choose (and we will choose one of length at least m),
3. for any way of decomposing w into xyz, so long as |xy| isn't greater than m and y isn't
λ,
4. we can choose an i such that xy^i z is not in L.

We can view this as a game wherein our opponent makes moves 1 and 3 (choosing m and
choosing xyz) and we make moves 2 and 4 (choosing w and choosing i). Our goal is to show
that we can always beat our opponent. If we can show this, we have proved that L is not
regular.



4.6.1 Pumping Lemma Example 1
Prove that L = {a^n b^n : n ≥ 0} is not regular.

1. We don't know m, but assume there is one.


2. Choose a string w = a^n b^n where n > m, so that any prefix of length m consists entirely
of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, xy must consist
entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0. This has the effect of dropping |y| a's out of the string, without affecting
the number of b's. The resultant string has fewer a's than b's, hence does not belong to
L. Therefore L is not regular.
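The case analysis in steps 3-4 can be checked exhaustively for a concrete m. A small Python sketch (the choice m = 7 is illustrative; the lemma's real m is unknown):

    def in_L(s):
        """Membership test for {a^n b^n : n >= 0}."""
        n = len(s) // 2
        return len(s) % 2 == 0 and s == "a" * n + "b" * n

    m = 7                              # pretend the unknown DFA had 7 states
    w = "a" * m + "b" * m              # our chosen string, |w| >= m

    for p in range(m + 1):             # x = w[:p]
        for k in range(1, m - p + 1):  # y = w[p:p+k], so |xy| <= m and |y| > 0
            x, y, z = w[:p], w[p:p + k], w[p + k:]
            assert not in_L(x + z)     # i = 0 pumps the string out of L
    print("every legal decomposition fails for i = 0")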

4.6.2 Pumping Lemma Example 2


Prove that L = {a^n b^k : n > k ≥ 0} is not regular.

1. We don't know m, but assume there is one.


2. Choose a string w = a^n b^k where n > m, so that any prefix of length m consists entirely
of a's, and k = n - 1, so that there is just one more a than b.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, xy must consist
entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0. This has the effect of dropping |y| a's out of the string, without affecting
the number of b's. The resultant string has fewer a's than before, so it has either fewer
a's than b's, or the same number of each. Either way, the string does not belong to L, so
L is not regular.

4.6.3 Pumping Lemma Example 3


Prove that L = {a^n : n is a prime number} is not regular.

1. We don't know m, but assume there is one.


2. Choose a string w = a^n where n is a prime number and |xyz| = n > m + 1. (This can
always be done because there is no largest prime number.) Any prefix of w consists
entirely of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, it follows that
|z| > 1. As usual, |y| > 0.
4. Since |z| > 1, |xz| > 1. Choose i = |xz|. Then |xy^i z| = |xz| + i|y| = |xz| + |xz||y| =
(1 + |y|)|xz|. Since (1 + |y|) and |xz| are each greater than 1, the product must be a
composite number. Thus |xy^i z| is composite, so xy^i z ∉ L, and L is not regular.



5 Type-2: Context-Free Languages
5.1 Languages and Grammars
A context-free language is a language that can be defined by a context-free grammar. If a
grammar G is context-free but not regular, we know that the language L(G) is context-free; we
do not know that L(G) is not regular. It might be possible to find a regular grammar G2 that
also defines L(G).

5.2 Context-Free Grammars


A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the
form
V → (V ∪ T)*
i.e., the left-hand side is a single variable and the right-hand side is any string of variables and
terminals.

Thus, a context-free grammar is a language generating device for context-free languages.

5.2.1 Regular Grammars Are Context Free


Recall that productions of a right-linear grammar must have one of the two forms
A → x
or
A → xB
where
• A, B ∈ V, and
• x ∈ T*.

Since T* ⊆ (V ∪ T)* and T*V ⊆ (V ∪ T)*, it follows that every right-linear grammar is also
a context-free grammar. Similarly, left-linear grammars and linear grammars are also
context-free grammars. A context-free language (CFL) is a language that can be defined by a
context-free grammar.

5.2.2 Examples of CFG


Example 5.1: Consider the following grammar:

G = ({S, A, B}, {a, b}, S, {S → AB, A → aA, A → ε, B → Bb, B → ε})

Is G a context-free grammar? Yes.

Is G a regular grammar? No.

Is L(G) a context-free language? Yes.

Is L(G) a regular language? Yes -- the language L(G) is regular because it can be defined by
the regular grammar:

G2 = ({S, A, B}, {a, b}, S, {S → A, A → aA, A → B, B → bB, B → ε})



Example 5.2: We have shown that L = {a^n b^n | n ≥ 0} is not regular. Here is a context-free
grammar for this language.

G = ({S}, {a, b}, S, {S → aSb, S → ε})
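For example, two applications of S → aSb followed by S → ε give the derivation S ⇒ aSb ⇒ aaSbb ⇒ aabb; in general, n applications of S → aSb followed by S → ε yield a^n b^n.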

Example 5.3: We have shown that L = {a^n b^k | k > n ≥ 0} is not regular. Here is a context-free
grammar for this language.

G = ({S, B}, {a, b}, S, {S → aSb, S → B, B → bB, B → b}).

Example 5.4: The language L = {ww^R | w ∈ {a, b}*}, where each string in L is a palindrome,
is not regular. Here is a context-free grammar for this language.

G = ({S}, {a, b}, S, {S → aSa, S → bSb, S → ε}).

Example 5.5: The language L = {w | w ∈ {a, b}*, na(w) = nb(w)}, where each string in L has
an equal number of a's and b's, is not regular. Consider the following grammar:

G = ({S}, {a, b}, S, {S → aSb, S → bSa, S → SS, S → ε}).

1. Does every string generated by this grammar have an equal number of a's and b's?
2. Is every string consisting of an equal number of a's and b's generated by this
grammar?

Example 5.6: The language L, consisting of balanced strings of parentheses, is context-free
but not regular. The grammar is simple, but we have to be careful to keep our symbols ( and )
separate from our metasymbols.

G = ({S}, {(, )}, S, {S → (S), S → SS, S → ε}).

5.2.3 Sentential Forms


A sentential form is the start symbol S of a grammar or any string in (V ∪ T)* that can be
derived from S.

Consider the linear grammar ({S, B}, {a, b}, S, {S → aS, S → B, B → bB, B → ε}).

A derivation using this grammar might look like this:

S ⇒ aS ⇒ aB ⇒ abB ⇒ abbB ⇒ abb

Each of {S, aS, aB, abB, abbB, abb} is a sentential form. Because this grammar is linear, each
sentential form has at most one variable. Hence there is never any choice about which variable
to expand next.



Figure 10: Description of grammar

5.3 Derivations (Leftmost and Rightmost)


Now consider the grammar

G = ({S, A, B, C}, {a, b, c}, S, P)

where
P = {S → ABC, A → aA, A → ε, B → bB, B → ε, C → cC, C → ε}.

With this grammar, there is a choice of variables to expand. Here is a sample derivation:

S ⇒ ABC ⇒ aABC ⇒ aABcC ⇒ aBcC ⇒ abBcC ⇒ abBc ⇒ abbBc ⇒ abbc

If we always expanded the leftmost variable first, we would have a leftmost derivation:

S ⇒ ABC ⇒ aABC ⇒ aBC ⇒ abBC ⇒ abbBC ⇒ abbC ⇒ abbcC ⇒ abbc

Conversely, if we always expanded the rightmost variable first, we would have a rightmost
derivation:

S ⇒ ABC ⇒ ABcC ⇒ ABc ⇒ AbBc ⇒ AbbBc ⇒ Abbc ⇒ aAbbc ⇒ abbc

There are two things to notice here:

1. Different derivations result in quite different sentential forms, but


2. For a context-free grammar, it really doesn't make much difference in what order we
expand the variables.

5.4 Derivation Trees


Since the order in which we expand the variables in a sentential form doesn't make any
difference, it would be nice to show a derivation in some way that is independent of the order.
A derivation tree is a way of presenting a derivation in an order-independent fashion.

For example, for the following derivation:



S ABC aABC aABcC aBcC abBcC abBc abbBc abbc

we would have the derivation tree:

Figure 11: A parse tree

This tree represents not just the given derivation, but all the different orders in which the same
productions could be applied to produce the string abbc.

A partial derivation tree is any subtree of a derivation tree such that, for any node of the
subtree, either all of its children are also in the subtree, or none of them are.

The yield of the tree is the final string obtained by reading the leaves of the tree from left to
right, ignoring the λs (unless all the leaves are λ, in which case the yield is λ). The yield of the
above tree is the string abbc, as expected.

The yield of a partial derivation tree that contains the root is a sentential form.

5.4.1 Ambiguity
The following grammar generates strings having an equal number of a's and b's.
G = ({S}, {a, b}, S, {S → aSb | bSa | SS | λ})

The string "abab" can be generated from this grammar in two distinct ways, as shown by the
following derivation trees:

Similarly, abab has two distinct leftmost derivations:

S aSb abSab abab


S SS aSbS abS abaSb abab



Likewise, abab has two distinct rightmost derivations:

S aSb abSab abab


S SS SaSb Sab aSbab abab

Each derivation tree can be turned into a unique rightmost derivation, or into a unique
leftmost derivation. Each leftmost or rightmost derivation can be turned into a unique
derivation tree. So these representations are largely interchangeable.
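The ambiguity just demonstrated can also be checked mechanically. The following Python sketch (an illustrative encoding of our own, not from the text) counts leftmost derivations of a string that use at most a fixed number of rule applications; finding two or more witnesses ambiguity:

def count_lmd(rules, target, form="S", steps=8):
    """Count leftmost derivations of `target` from `form` that use at
    most `steps` rule applications. Uppercase letters are variables,
    lowercase letters are terminals; "" encodes λ."""
    i = 0
    while i < len(form) and form[i].islower():   # match terminal prefix
        if i >= len(target) or form[i] != target[i]:
            return 0                             # terminal mismatch
        i += 1
    form, target = form[i:], target[i:]
    if not form:
        return 1 if not target else 0
    if steps == 0:
        return 0
    return sum(count_lmd(rules, target, rhs + form[1:], steps - 1)
               for rhs in rules[form[0]])

rules = {"S": ["aSb", "bSa", "SS", ""]}
# any count of 2 or more witnesses the ambiguity of the grammar
print(count_lmd(rules, "abab") >= 2)             # True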

5.4.2 Ambiguous Grammars, Ambiguous Languages


Because derivation trees, leftmost derivations, and rightmost derivations are equivalent
notations, the following definitions are equivalent:
A grammar G is ambiguous if there exists some string w ∈ L(G) for which
 there are two or more distinct derivation trees, or
 there are two or more distinct leftmost derivations, or
 there are two or more distinct rightmost derivations.

Grammars are used in compiler construction. Ambiguous grammars are undesirable because
the derivation tree provides considerable information about the semantics of a program;
conflicting derivation trees provide conflicting information.
Ambiguity is a property of a grammar, and it is usually (but not always) possible to find an
equivalent unambiguous grammar.

An inherently ambiguous language is a language for which no unambiguous grammar exists.

Assignment No. 5

5.5 Simplifying Context-Free Grammars


The productions of context-free grammars can be forced into a variety of forms without
affecting the expressive power of the grammar.

5.5.1 Empty production removal


If the empty string does not belong to a language, then there is a way to eliminate productions
of the form A → λ from the grammar.

If the empty string does belong to a language, then we can eliminate λ from all productions
save for the single production S → λ. In this case we can also eliminate any occurrences of S
from the right-hand side of productions.

5.5.2 Unit production removal


We can eliminate productions of the form A → B from a context-free grammar.

5.5.3 Left Recursion Removal


A variable A is left-recursive if it occurs in a production of the form
A → Ax



for any x ∈ (V ∪ T)*. A grammar is left-recursive if it contains at least one left-recursive
variable. Every context-free language can be represented by a grammar that is not left-
recursive.

5.6 Normal Forms of Context-Free Grammars


5.6.1 Chomsky Normal Form
A grammar is in Chomsky Normal Form if all productions are of the form
A → BC
or
A → a
where A, B, and C are variables and a is a terminal. Any context-free grammar that does not
contain λ can be put into Chomsky Normal Form. (Most textbook authors also allow the
production S → λ so long as S does not appear on the right-hand side of any production.)

Chomsky Normal Form is particularly useful for programs that have to manipulate grammars.

5.6.2 Greibach Normal Form


A grammar is in Greibach Normal Form if all productions are of the form
A → ax
where a is a terminal and x ∈ V*.

Grammars in Greibach Normal Form are typically ugly and much longer than the CFG from
which they were derived. Greibach Normal Form is useful for proving the equivalence of
CFGs and NPDAs. When we discuss converting a CFG to an NPDA, or vice versa, we will
use Greibach Normal Form.

5.7 Chomsky’s Normal Form


When working with context-free grammars, it is often convenient to have them in simplified
form. One of the simplest and most useful forms is called the Chomsky normal form. Chomsky
normal form is useful in giving algorithms for working with context-free grammars.
DEFINITION
A context-free grammar is in Chomsky normal form if every rule is of the form
A → BC
A → a
where a is any terminal and A, B, and C are any variables, except that B and C
may not be the start variable. In addition we permit the rule S → ε, where S is
the start variable.

THEOREM
Any context-free language is generated by a context-free grammar in Chomsky normal form.

PROOF IDEA
We can convert any grammar G into Chomsky normal form. The conversion has several stages
wherein rules that violate the conditions are replaced with equivalent ones that are satisfactory.
First, we add a new start variable. Then, we eliminate all ε-rules of the form A → ε. We also



eliminate all unit rules of the form A → B. In both cases we patch up the grammar to be sure
that it still generates the same language. Finally, we convert the remaining rules into the proper
form.

PROOF
First, we add a new start variable S0 and the rule S0 → S, where S was the original start variable.
This change guarantees that the start variable doesn't occur on the right-hand side of a rule.

Second, we take care of all ε-rules. We remove an ε-rule A → ε, where A is not the start variable.
Then for each occurrence of an A on the right-hand side of a rule, we add a new rule with that
occurrence deleted. In other words, if R → uAv is a rule in which u and v are strings of variables
and terminals, we add the rule R → uv. We do so for each occurrence of an A, so the rule R →
uAvAw causes us to add R → uvAw, R → uAvw, and R → uvw. If we have the rule R → A, we
add R → ε unless we had previously removed the rule R → ε. We repeat these steps until we
eliminate all ε-rules not involving the start variable.

Third, we handle all unit rules. We remove a unit rule A → B. Then, whenever a rule B → u
appears, we add the rule A → u unless this was a unit rule previously removed. As before, u is
a string of variables and terminals. We repeat these steps until we eliminate all unit rules.

Finally, we convert all remaining rules into the proper form. We replace each rule A → u1u2
. . . uk, where k ≥ 3 and each ui is a variable or terminal symbol, with the rules A → u1A1,
A1 → u2A2, A2 → u3A3, . . . , and Ak−2 → uk−1uk.
The Ai's are new variables. If k ≥ 2, we replace any terminal ui in the preceding rule(s) with the
new variable Ui and add the rule Ui → ui.

Example 5.7:
Consider the following CFG and convert it to Chomsky normal form by using the conversion
procedure just given. The series of grammars presented illustrates the steps in the conversion;
each grammar is shown after the indicated rules have been removed and any compensating
rules added.

1. The original CFG is shown on the left. The result of applying the first step to make a new
start variable appears on the right.

S → ASA | aB                  S0 → S
A → B | S                     S → ASA | aB
B → b | ε                     A → B | S
                              B → b | ε

2. Remove the ε-rule B → ε (shown on the left), then the ε-rule A → ε (shown on the right).

S0 → S                        S0 → S
S → ASA | aB | a              S → ASA | aB | a | SA | AS | S
A → B | S | ε                 A → B | S
B → b                         B → b

3a. Remove the unit rules S → S (shown on the left) and S0 → S (shown on the right).



S0S S0 ASA | aB | a | SA | AS
SASA | aB | a | SA | AS | S S ASA | aB | a | SA | AS
AB | S AB | S
Bb Bb

3b. Remove the unit rules A → B (shown on the left) and A → S (shown on the right).


S0 ASA | aB | a | SA | AS S0 ASA | aB | a | SA | AS
S ASA | aB | a | SA | AS S ASA | aB | a | SA | AS
AB | S | b A S | b | ASA | aB | a | SA | AS
Bb Bb

4. Convert the remaining rules into the proper form by adding additional variables and rules.
The final grammar in Chomsky normal form, equivalent to the original grammar, follows.
(Actually the procedure given in the theorem produces several variables Ui along with several
rules Ui → a. We simplified the resulting grammar by using a single variable U and rule U → a.)

S0 → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS
A1 → SA
U → a
B → b

5.8 Parsing
There are two ways to use a grammar:
 Use the grammar to generate strings of the language. This is easy -- start with the start
symbol, and apply derivation steps until you get a string composed entirely of
terminals.
 Use the grammar to recognize strings; that is, test whether they belong to the
language. For CFGs, this is usually much harder.

A language is a set of strings, and any well-defined set must have a membership criterion. A
context-free grammar can be used as a membership criterion -- if we can find a general
algorithm for using the grammar to recognize strings.

Parsing a string is finding a derivation (or a derivation tree) for that string. Parsing a string is
like recognizing a string. An algorithm to recognize a string will give us only a yes/no answer;
an algorithm to parse a string will give us additional information about how the string can be
formed from the grammar. The only realistic way to recognize a string of a context-free
grammar is to parse it.

5.8.1 Exhaustive Search Parsing


The basic idea of exhaustive search parsing is this: to parse a string w, generate all strings in
L and see if w is among them.

Problem: L may be an infinite language.



We need two things:

1. A systematic approach, so that we know we haven't overlooked any strings, and


2. A way to stop after generating only a finite number of strings -- knowing that, if we
haven't generated w by now, we never will.

Systematic approaches are easy to find. Almost any exhaustive search technique will do.

We can (almost) make the search finite by terminating every search path at the point that it
generates a sentential form containing more than |w| terminals.

5.8.2 Grammars for Exhaustive Parsing


The idea of exhaustive search parsing for a string w is to generate all strings of length not
greater than |w|, and see whether w is among them. To ensure that the search is finite, we need
to make sure that we can't get into an infinite loop applying productions that don't increase the
length of the generated string.

Note: for the time being, we will ignore the possibility that λ is in the language.

Suppose we make the following restrictions on the grammar:


 Every variable expands to at least one terminal. We can enforce this by disallowing
productions of the form A → λ.
 Every production either has at least one terminal on its right-hand side (thus directly
increasing the number of terminals), or it has at least two variables (thus indirectly
increasing the number of terminals). In other words, we disallow productions of the
form A → B, where A and B are both variables.

With these restrictions,


 A sentential form of length n yields a sentence of length at least n.
 Every derivation step increases either the length of the sentential form or the number
of terminals in it.
 Hence, any string w ∈ L can be generated in at most 2|w| − 1 derivation steps, as the
sketch below illustrates.
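Here is a minimal Python sketch of exhaustive search parsing under the two restrictions above (the dictionary encoding of the grammar is an illustrative choice of our own). Because sentential forms never shrink, any form longer than the target can be discarded, which makes the search finite:

from collections import deque

def exhaustive_parse(rules, target, start="S"):
    """Breadth-first exhaustive search parsing. Assumes the grammar has
    no A -> λ and no A -> B productions, so sentential forms never
    shrink and forms longer than the target can be pruned."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, deriv = queue.popleft()
        if form == target:
            return deriv                          # a derivation of target
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            continue                              # some other sentence
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append((new, deriv + [new]))
    return None                                   # target is not in L(G)

rules = {"S": ["aSb", "ab"]}                      # meets both restrictions
print(" => ".join(exhaustive_parse(rules, "aabb")))   # S => aSb => aabb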

5.8.3 Grammars for Exhaustive Parsing II


We have shown that exhaustive search parsing is a finite process, provided that there are no
productions of the form A → λ or A → B in the grammar. Chomsky normal form gives a
method for removing such productions from a grammar without altering the language
generated by the grammar. There is, however, one special case we need to consider.

If λ belongs to the language, we need to keep the production S → λ. This creates a problem if
S occurs on the right-hand side of some production, because then we have a way of decreasing
the length of a sentential form. All we need to do in this case is to add a new start symbol, say
S0, and to replace the production S → λ with the pair of productions

S0 → λ
S0 → S



5.8.4 Efficient Parsing
Exhaustive search parsing is, of course, extremely inefficient. It requires time exponential in
|w|. For any context-free grammar G, there are algorithms for parsing strings w ∈ L(G) in time
proportional to the cube of |w|. This is still unsatisfactory for practical purposes.
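One such cubic-time method is the CYK algorithm, which tests membership for a grammar in Chomsky normal form by filling a table of the variables that derive each substring. A minimal sketch, assuming the single-character grammar encoding used earlier; the a^n b^n grammar shown is our own CNF rewriting, not taken from the text:

def cyk(rules, w):
    """CYK membership test, O(|w|^3). `rules` maps each variable to its
    right-hand sides; the grammar must be in Chomsky normal form."""
    n = len(w)
    # table[i][j] holds the set of variables deriving w[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):                    # substrings of length 1
        for lhs, alts in rules.items():
            if ch in alts:
                table[i][0].add(lhs)
    for length in range(2, n + 1):                # longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, alts in rules.items():
                    for rhs in alts:
                        if (len(rhs) == 2 and rhs[0] in left
                                and rhs[1] in right):
                            table[i][length - 1].add(lhs)
    return "S" in table[0][n - 1]

# a^n b^n (n >= 1) in CNF: S -> AT | AB, T -> SB, A -> a, B -> b
g = {"S": ["AT", "AB"], "T": ["SB"], "A": ["a"], "B": ["b"]}
print(cyk(g, "aaabbb"), cyk(g, "aabbb"))          # True False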

There are ways to further restrict context-free grammars so that strings may be parsed in linear
or near-linear time. These restricted grammars are covered in courses in compiler construction
but will not be considered here. All such methods do reduce the power of the grammar, thus
limiting the languages that can be recognized. There is no known linear or near-linear
algorithm for parsing strings of a general context-free grammar.



6 Pushdown Automaton (PDA)
6.1 Pushdown Automaton (PDA)
A pushdown automaton (PDA) is basically a DFA/NFA which has an additional stack storage.
The transitions that a machine makes are based not only on the input and current state, but
also on the stack.

The PDA is formally defined as a sextuple:

M = (K, Σ, Γ, Δ, s, F) where,
K = finite state set
Σ = finite input alphabet
Γ = finite stack alphabet
s ∈ K: start state
F ⊆ K: final states

Figure 12: Pushdown Automaton (PDA)

The transition relation, Δ is a finite subset of (K×(Σ∪{ λ })×Γ*) × (K×Γ*)

Δ is now a function of three arguments. The first two are the same as before: the state, and
either λ or a symbol from the input alphabet. The third argument is the symbol on top of the
stack. Just as the input symbol is "consumed" when the function is applied, the stack symbol
is also "consumed" (removed from the stack). We must have the finite qualifier because the
full subset is infinite by virtue of the Γ* component.

The meaning of the transition relation is that, for σ ∈ Σ, if ((p, σ, α), (q, β)) ∈ Δ:

the current state is p


the current input symbol is σ
the string at the top of the stack is α
then:
the new state is q
replace α on the top of the stack by β (pop the α and push the β)
Otherwise, if ((p, λ, α), (q, β)) ∈ Δ, this means that if
the current state is p
the string at the top of the stack is α
then (not consulting the input symbol), we can
change the state to q
replace α on the top of the stack by β

The transition ((p, u, λ), (q, a)) pushes symbol a onto the stack, and ((p, u, a), (q, λ))
pops a from the stack.

6.2 Instantaneous Description


Suppose someone is in the middle of stepping through a string with a DFA, and we need to
take over and finish the job. We will need to know two things: (1) the state the DFA is in, and



(2) what the remaining input is. But if the automaton is an NPDA instead of a DFA, we also
need to know (3) the contents of the stack.

An instantaneous description of a pushdown automaton is an element of K×Σ*×Γ*, a triplet


(q, w, u), where

 q is the current state of the automaton,


 w is the unread part of the input string, and
 u is the stack contents (written as a string, with the leftmost symbol at the top of the
stack).

Let the symbol "⊢" indicate a move of the PDA, and suppose that δ(q1, a, x) = {(q2, y), ...}.
Then the following move is possible:
(q1, aw, xZ) ⊢ (q2, w, yZ)

where w indicates the rest of the string following the a, and Z indicates the rest of the stack
contents underneath the x. This notation says that in moving from state q1 to state q2, an a is
consumed from the input string aw, and the x at the top (left) of the stack xZ is replaced with
y, leaving yZ on the stack.

We define the usual yields in one step relation:


(p, σw, αz) ⊢ (q, w, βz) if ((p, σ, α), (q, β)) ∈ Δ
or
(p, w, αz) ⊢ (q, w, βz) if ((p, λ, α), (q, β)) ∈ Δ

We have the notation "⊢" to indicate a single move of a PDA. We will also use "⊢*" to
indicate a sequence of zero or more moves, and we will use "⊢+" to indicate a sequence of one
or more moves. As expected, the yields relation, ⊢*, is the reflexive, transitive closure of ⊢.

If M = (Q, Σ, Γ, δ, q0, z, F) is a PDA, then the language accepted by M is given by

L(M) = {w ∈ Σ*: (q0, w, z) ⊢* (p, λ, u), p ∈ F, u ∈ Γ*}.

6.3 Accepting Strings with PDA


Suppose you have the PDA M = (Q, Σ, Γ, δ, q0, z, F). How do you use this PDA to recognize
strings? A language can be accepted by PDA using two approaches: (i) Acceptance by Empty
Stack, and (ii) Acceptance by Final State.

A string w is accepted by the PDA if (q0, w, z) ⊢* (f, ε, ε) for some f ∈ F. Namely, starting
from the start state, we
• process the entire string,
• end in a final state, and
• end with an empty stack.
The language accepted by the PDA, L(M) = {w | (q0, w, z) ⊢* (f, ε, ε), f ∈ F}, is the set of
all accepted strings. The empty stack is our key new requirement relative to finite state
machines. To recognize string w, begin with the instantaneous description (q0, w, z)
where
 q0 is the start state,



 w is the entire string to be processed, and
 z is the start stack symbol.

Starting with this instantaneous description, make zero or more moves, just as you would with
an NFA. There are two kinds of moves that you can make:

 λ-transitions. If you are in state q1, x is the top (leftmost) symbol in the stack, and δ(q1,
λ, x) = {(q2, w2), ...}, then you can replace the symbol x with the string w2 and move to
state q2.
 Nonempty transitions. If you are in state q1, a is the next unconsumed input symbol, x
is the top (leftmost) symbol in the stack, and δ(q1, a, x) = {(q2, w2), ...}, then you can
remove the a from the input string, replace the symbol x with the string w2, and move
to state q2.

In final state acceptability, a PDA accepts a string when, after reading the entire string, the
PDA is in a final state. From the starting state, we can make moves that end up in a final state
with any stack values. The stack values are irrelevant as long as we end up in a final state.
For a PDA (Q, Σ, Γ, δ, q0, z, F), the language accepted by the set of final states F is
L(PDA) = {w | (q0, w, z) ⊢* (q, ε, x), q ∈ F} for any stack string x.
If you are in a final state when you reach the end of the string (and maybe make some λ
transitions after reaching the end), then the string is accepted by the PDA. It doesn't matter
what is left on the stack.

6.4 Graphical Representation


Graphically a PDA transition ((q0, x, α), (q1, β)), where x = ε or x ∈ Σ, would be depicted
as an arc from q0 to q1 labeled with the input symbol x and the stack usage α/β.

The stack usage represented by α/β represents these actions:


• the top of the stack must match α
• if we make the transition, pop α and push β
A PDA is non-deterministic. There are several forms of non-determinism in the description:
• Δ is a relation
• there are ε-transitions in terms of the input
• there are ε-transitions in terms of the stack contents
The true PDA ε-transition, in the sense of being equivalent to the NFA ε-transition, is
((p, ε, ε), (q, ε)): it consults neither the input nor the stack and leaves the previous
configuration intact.

6.5 Deterministic Pushdown Automata (DPDA)


A nondeterministic finite automaton differs from a deterministic finite automaton in two ways:



 The transition function δ is single-valued for a DFA, multi-valued for an NFA.
 An NFA may have λ-transitions.

A nondeterministic pushdown automaton (NPDA) differs from a deterministic pushdown


automaton (DPDA) in almost the same ways:
 The transition function δ is at most single-valued for a DPDA, multi-valued for an
NPDA.
Formally: |δ(q, a, b)| = 0 or 1, for every q ∈ Q, a ∈ Σ ∪ {λ}, and b ∈ Γ.

 Both NPDAs and DPDAs may have λ-transitions; but a DPDA may have a λ-transition
only if no other transition is possible.
Formally: If δ(q, λ, b) ≠ ∅, then δ(q, c, b) = ∅ for every c ∈ Σ.

A deterministic context-free language is a language that can be recognized by a DPDA. The


deterministic context-free languages are a proper subset of the context-free languages.

Example 6.1:
Every finite automaton can be viewed as a pushdown automaton that never operates on its stack.
Let M = (K, ∑, ∆, s, F) be a finite automaton and let M’ = (K, ∑, Γ, ∆’, s, F), where Γ = ∅ and
∆’ = {((p, u, e), (q, e)) : (p, u, q) ∈ ∆}.
Then M and M’ accept the same language.
Example 6.2:
Design a PDA to accept the language L = {wcw^R : w ∈ {a, b}*}.
Let M = (K, ∑, Γ, ∆, s, F) where K = { s, f }, ∑ = {a, b, c}, Γ = {a, b} and F = {f} and ∆
contains the following transitions.
1. ((s, a, e), (s, a)) push a
2. ((s, b, e), (s, b)) push b
3. ((s, c, e), (f, e)) change state
4. ((f, a, a), (f, e)) pop a
5. ((f, b, b), (f, e)) pop b
State Unread Input Stack Transition
s abbcbba e -
s bbcbba a 1
s bcbba ba 2
s cbba bba 2
f bba bba 3
f ba ba 5
f a a 5
f e e 4
Observe that this PDA is deterministic in the sense that there are no choices in transitions.
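The behavior of such a machine can be checked with a small simulator. The sketch below is an illustrative encoding of our own (acceptance is taken as final state plus empty stack with the input exhausted); it searches the transition relation depth-first, and may fail to terminate on machines with ε-push loops, which none of these examples have:

def pda_accepts(delta, start, finals, w):
    """Nondeterministic PDA run. delta is a set of ((state, input, pop),
    (state, push)) pairs, with "" playing the role of e. The stack is a
    string whose leftmost symbol is the top."""
    def search(state, rest, stack, seen):
        if (state, rest, stack) in seen:
            return False
        seen.add((state, rest, stack))
        if not rest and not stack and state in finals:
            return True
        for (p, a, alpha), (q, beta) in delta:
            if p != state or not stack.startswith(alpha):
                continue
            if a and not rest.startswith(a):
                continue                      # input symbol does not match
            if search(q, rest[len(a):], beta + stack[len(alpha):], seen):
                return True
        return False
    return search(start, w, "", set())

# the five transitions of Example 6.2 (L = {w c w^R})
delta = {(("s", "a", ""), ("s", "a")), (("s", "b", ""), ("s", "b")),
         (("s", "c", ""), ("f", "")),
         (("f", "a", "a"), ("f", "")), (("f", "b", "b"), ("f", ""))}
print(pda_accepts(delta, "s", {"f"}, "abbcbba"))   # True
print(pda_accepts(delta, "s", {"f"}, "abcbba"))    # False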



Example 6.3:
The a^n b^n language. The language is L = {w ∈ {a, b}* : w = a^n b^n, n ≥ 0}.
Here are two PDAs for L:
The idea in both machines is to stack the a's and match off the b's. The first one is non-
deterministic in the sense that it could prematurely guess that the a's are done and start matching
off b's. The second version is deterministic in that the first b acts as a trigger to start matching
off. Note that we must make both states final in the second version in order to accept ε.

6.5.1 Empty Stack Knowledge


There is no mechanism built into a PDA to determine whether the stack is empty or not. It's
important to realize that a transition ((p, x, ε), (q, β)), where x = σ ∈ Σ or x = ε, means to
make the move without consulting the stack; it says nothing about whether the stack is empty
or not. Nevertheless, one can maintain knowledge of an empty stack by using a dedicated
stack symbol, c, representing the "stack bottom", with the property that it is pushed onto an
empty stack by a transition from the start state with no other outgoing or incoming transitions.

Example 6.4:
Design a PDA to accept the language having equal numbers of a's and b's. Let #σ(w) denote
the number of occurrences of σ in w. The language is L = {w ∈ {a, b}* : #a(w) = #b(w)}.
• The PDA keeps a special symbol c on the bottom of the stack as a marker.
• Either a string of a's or a string of b's is kept by M on its stack.

Let M = (K, ∑, Γ, ∆, s, F), where K = {s, q, f}, ∑ = {a, b}, Γ = {a, b, c},
F = {f} and ∆ is listed below:
1. ((s, e, e), (q, c))
2. ((q, a, c), (q, ac))
3. ((q, a, a), (q, aa))
4. ((q, a, b), (q, e))
5. ((q, b, c), (q, bc))



6. ((q, b, b), (q, bb))
7. ((q, b, a), (q, e))
8. ((q, e, c), (f, e))
State Unread input Stack Transition used
s abbbabaa e -
q abbbabaa c 1
q bbbabaa ac 2
q bbabaa c 7
q babaa bc 5
q abaa bbc 6
q baa bc 4
q aa bbc 6
q a bc 4
q e c 4
f e e 8

6.5.2 Deterministic Context Free language (DCFL)


A DPDA is a PDA in which no state p has two different outgoing transitions
((p, x, α), (q, β)) and ((p, x′, α′), (q′, β′))
which are compatible in the sense that both could be applied. A DCFL is basically a language
accepted by a DPDA, but we need to qualify this further.
We want to argue that the language L = {w ∈ {a, b}* : #a(w) = #b(w)} is deterministic
context free in the sense there is DPDA which accepts it.
In the above PDA, the only non-determinism is the issue of guessing the end of input;
however, this form of non-determinism is considered artificial. When one considers whether a
language L supports a DPDA or not, a dedicated end-of-input symbol is always added to
strings in the language.
Formally, a language L over Σ is deterministic context free, or L is a DCFL, if L$ is accepted
by a DPDA M where $ is a dedicated symbol not belonging to Σ. The significance is that we
can make intelligent usage of the knowledge of the end of input to decide what to do about the
stack. In our case, we would simply replace the transition into the final state by one that reads
the end-marker, ((q, $, c), (f, e)). With this change, our PDA is now a DPDA.

Example 6.5: a*b* examples



Two common variations on a's followed by b's. When they're equal, no stack bottom is
necessary. When they're unequal, you must be prepared to recognize that the stacked a's have
been completely matched or not.

a. {a^n b^n : n ≥ 0}

b. {a^m b^n : 0 ≤ m < n}

6.6 Nondeterministic PDAs


The transition function for an NPDA has the form δ: Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ*.

In the deterministic case, when the function is applied, the automaton moves to a new state q ∈ Q
and pushes a new string of symbols x ∈ Γ* onto the stack. Since we are dealing with a
nondeterministic pushdown automaton, the result of applying δ is a finite set of (q, x) pairs. If
we were to draw the automaton, each such pair would be represented by a single arc.

As with an NFA, we do not need to specify δ for every possible combination of arguments.
For any case where δ is not specified, the transition is to ∅, the empty set.
Consider the example of following NPDA.

Example 6.6: Q = {q0, q1, q2, q3}, Σ = {a, b}, Γ = {0, 1}, δ, q0, z = 0, F = {q3}, where

δ(q0, a, 0) = {(q1, 10), (q3, λ)}
δ(q0, λ, 0) = {(q3, λ)}
δ(q1, a, 1) = {(q1, 11)}
δ(q1, b, 1) = {(q2, λ)}
δ(q2, b, 1) = {(q2, λ)}
δ(q2, λ, 0) = {(q3, λ)}

The transition graph of this NPDA is drawn below.

Note: The top of the stack is considered to be to the left, so that, for example, if we get an a
from the starting position, the stack changes from 0 to 10.



We can compute (recognize) the string aaabbb by the following instantaneous description
sequence of moves:

(q0, aaabbb, 0) ⊢ (q1, aabbb, 10)
⊢ (q1, abbb, 110)
⊢ (q1, bbb, 1110)
⊢ (q2, bb, 110)
⊢ (q2, b, 10)
⊢ (q2, λ, 0)
⊢ (q3, λ, λ).

Since q3 ∈ F, the string is accepted.

Example 6.7:
Construct a PDA to accept L = {ww^R : w ∈ {a, b}*}.
M = (K, ∑, Γ, ∆, s, F) where K = {s, f}, ∑ = {a, b}, Γ = {a, b}, and F = {f} and ∆ contains the
following five transitions:
1. ((s, a, e), (s, a))
2. ((s, b, e), (s, b))
3. ((s, e, e), (f, e))
4. ((f, a, a), (f, e))
5. ((f, b, b), (f, e))

 The machine guesses when it has reached the middle of the input string and changes
from state s to f in a non-deterministic fashion.
 Whenever the machine is in state s, it can non-deterministically choose either to push
the next input symbol into the stack, or to switch to state f without consuming any input.
This PDA is identical to the PDA in Example 6.2 except for the ε-transition. Nevertheless, there
is a significant difference in that this PDA must guess when to stop pushing symbols, jump to
the final state and start matching off of the stack.
Therefore, this machine is decidedly non-deterministic. In a general programming model (like
Turing Machines), we have the luxury of preprocessing the string to determine its length and
thereby knowing when the middle is coming.

Assignment No. 6

6.7 CFL and PDA Equivalence


The goal of this section is to prove that CFGs and PDAs are equivalent, namely, for every
CFG G there is a PDA M such that L(G) = L(M) and vice versa. For any context-free
grammar in Greibach Normal Form we can build an equivalent nondeterministic pushdown
automaton. This establishes that an NPDA is at least as powerful as a CFG.

The proof is split into two lemmas:


• Lemma 1: given a grammar, construct a PDA and show the equivalence



• Lemma 2: given a PDA, construct a grammar and show the equivalence

Of the two, the first uses a very simple, intuitive construction: it mimics a leftmost
derivation by using a PDA. The converse step is far more complicated than the first. The
construction in the first is quite simple. The construction in the second involves two steps:
• Convert a PDA into a simple PDA.
The notion of simple means that the stack is always consulted, and pushed or popped
by one symbol only, or kept the same size. This part is quite straightforward.
• Convert the simple PDA into a grammar.

6.7.1 Grammar to PDA construction


This construction is quite simple. Given G = (V, Σ, R, S), what the PDA will do is effect a
leftmost derivation of a string w ∈ L(G). The PDA is
M = ( {0,1}, Σ, Γ, Δ, 0, {1} )
Namely, there are only two states. The PDA moves immediately to the final state 1 with the
start symbol on the stack and then stays in this state.

There are three types of transitions in Δ:


Type 1: ( (0, ε, ε), (1, S) )
Type 2: ( (1, ε, A), (1, α) ) for each A → α ∈ R
Type 3: ( (1, σ, σ), (1, ε) ) for each σ ∈ Σ

This PDA is inherently non-deterministic; if there is a choice of rules to apply to a non-
terminal, then there is a non-deterministic choice of processing steps.

To say that G and M are equivalent means that L(M) = L(G), or, considering an arbitrary
string w ∈ Σ*:
S ⇒* w ⇔ (0,w,ε) ⊢* (1,ε,ε)

The grammar for a^n b^n is: S → ε | aSb


Here is the PDA:

A simple run is:


(0, aabb, ε) ⊢ (1, aabb, S) ⊢ (1, aabb, aSb) ⊢ (1, abb, Sb) ⊢ (1, abb, aSbb) ⊢ (1, bb, Sbb)
⊢ (1, bb, bb) ⊢ (1, b, b) ⊢ (1, ε, ε)
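The construction itself is easy to mechanize. The following Python sketch is our own illustrative encoding (the start symbol is assumed to be S, and the depth bound is a crude safeguard, not part of the construction); it builds the three transition types from a grammar dictionary and searches for an accepting sequence of moves:

def grammar_pda(rules):
    """Build the three transition types of this construction from a CFG
    encoded as a dict; the states are 0 and 1, and "" stands for ε."""
    delta = [((0, "", ""), (1, "S"))]                  # type 1
    for lhs, alts in rules.items():
        for rhs in alts:
            delta.append(((1, "", lhs), (1, rhs)))     # type 2
    terminals = {c for alts in rules.values()
                 for rhs in alts for c in rhs if c.islower()}
    for t in sorted(terminals):
        delta.append(((1, t, t), (1, "")))             # type 3
    return delta

def accepts(delta, rest, stack="", state=0, depth=25):
    """Depth-bounded search over PDA configurations."""
    if state == 1 and not rest and not stack:
        return True
    if depth == 0:
        return False
    for (p, a, alpha), (q, beta) in delta:
        if (p == state and stack.startswith(alpha)
                and rest.startswith(a)):
            if accepts(delta, rest[len(a):],
                       beta + stack[len(alpha):], q, depth - 1):
                return True
    return False

rules = {"S": ["aSb", ""]}                             # S -> aSb | ε
print(accepts(grammar_pda(rules), "aabb"))             # True
print(accepts(grammar_pda(rules), "aab"))              # False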



Example 6.8:
Consider the grammar G = ({S, A, B}, {a, b}, S, P), where
P = {S → a, S → aAB, A → aA, A → a, B → bB, B → b}.
These productions can be turned into transition functions by rearranging the components:
This yields the following table:

(start)      δ(q0, ε, ε) = {(q1, S)}        Type 1
S → a        δ(q1, ε, S) = {(q1, a)}        Type 2
S → aAB      δ(q1, ε, S) = {(q1, aAB)}      Type 2
A → aA       δ(q1, ε, A) = {(q1, aA)}       Type 2
A → a        δ(q1, ε, A) = {(q1, a)}        Type 2
B → bB       δ(q1, ε, B) = {(q1, bB)}       Type 2
B → b        δ(q1, ε, B) = {(q1, b)}        Type 2
for a ∈ Σ    δ(q1, a, a) = {(q1, ε)}        Type 3
for b ∈ Σ    δ(q1, b, b) = {(q1, ε)}        Type 3

For example, the derivation


S  aAB  aaB  aabB  aabb

maps into the sequence of moves


(q0, aabb, ) ⊢ (q1, aabb, S) ⊢ (q1, aabb, aAB) ⊢ (q1, abb, AB) ⊢(q1, abb, aB) ⊢ (q1, bb, B) ⊢ (q1, bb,
bB) ⊢(q1, b, B) ⊢(q1, b, b) ⊢ (q1, , )

6.7.2 Expression Grammar


The most informative examples are those in which there exist the possibility of not using a
leftmost derivation such as our expression grammar:
E→E+T|T
T→T*F|F
F → (E) | a
We can readily match up a leftmost derivation of a + a * a with the corresponding machine
configuration processing as follows:
(0, a + a * a, ε)
⊢ (1, a + a * a, E)
⊢ (1, a + a * a, E + T) E⇒E+T
⊢ (1, a + a * a, T + T) ⇒T+T
⊢ (1, a + a * a, F + T) ⇒F+T
⊢ (1, a + a * a, a + T) ⇒a+T
⊢ (1, + a * a, + T)
⊢ (1, a * a, T)
⊢ (1, a * a, T * F) ⇒a+T*F
⊢ (1, a * a, F * F) ⇒a+F*F
⊢ (1, a * a, a * F) ⇒a+a*F
⊢ (1, * a, * F)
⊢ (1, a, F)
⊢ (1, a, a) ⇒a+a*a
⊢ (1, ε, ε)



6.7.3 Grammar to PDA proof
For simplicity, assume that ⇒ means a leftmost derivation step.
Claim: For w ∈ Σ*, α ∈ (V-Σ)V* ∪ {ε}:

S ⇒* wα ⇔ (1,w,S) ├* (1,ε,α)

(Proof ⇒):
Induction on the length of the leftmost derivation.
S ⇒^n α′ ⇒ wα
Then, since the last step was leftmost, we can write:
α′ = xAβ for x ∈ Σ*
and then
xzβ = wα for A → z (A)
By induction, since S ⇒^n xAβ:
(1,x,S) ├* (1,ε,Aβ)
Furthermore, applying the transition of type 2, we have:
(1,ε,Aβ) ├ (1,ε,zβ)
Putting these two together:
(1,x,S) ├* (1,ε,zβ) (B)
Looking at (A) we see that the string x must be a prefix of w because α begins with a non-
terminal, or is empty. Write
w = xy and therefore, zβ = yα (C)
As a consequence of (B) we get:
(1,xy,S) ├* (1,y,zβ) (D)
Combine (C) and (D) to get:
(1,w,S) ├* (1,y,yα) (E)
Apply |y| transitions of type 3 to get
(1,y,yα) ├* (1,ε,α) (F)
Combine (E) and (F) to get the desired result:
(1,w,S) ├* (1,ε,α)

(Proof ⇐):
The proof going this direction is by induction on the number of type-2 steps in the derivation.
This restriction makes the entire proof simpler than the converse that we just proved.
We'll proceed to the induction step. Assume true for up to n steps and prove true for n+1 type-
2 steps. Write:
(1,w,S) ├* (1,y,Aβ) ├ (1,y,zβ) ├* (1,ε,α)
where the use of the rule
A → z
represents the final type-2 step and the last part of the chain is type-3 steps only. The string y
must be a suffix of w, and so writing
w = xy
we have:
(1,xy,S) ├* (1,y,Aβ)
and therefore,



(1,x,S) ├* (1,ε,Aβ)
Therefore by induction:

S ⇒* xAβ
and consequently,

S ⇒* xzβ (A)
Now if we look at the last part:
(1,y,zβ) ├* (1,ε,α)
We observe, that since this consists of only type-3 transitions, it must be that
yα = zβ (B)
and so, putting (A) and (B) together, we get:

S ⇒* xyα
Knowing that w = xy gives us the result we want:

S ⇒* wα

6.8 From NPDA to CFG


We have shown that, for any CFG, we can produce an equivalent NPDA. We will now show
that, for any NPDA, we can produce an equivalent CFG. This will establish the equivalence of
CFGs and NPDAs.

We assert without proof that any NPDA can be transformed into an equivalent NPDA that has
the following form:

1. The NPDA has only one final state, which it enters if and only if the stack is empty;
2. With a ∈ Σ ∪ {λ}, all transitions must have the form
δ(qi, a, A) = (qj, λ)
or
δ(qi, a, A) = (qj, BC)

When we write a grammar, we can use any variable names we choose. As in programming
languages, we like to use "meaningful" variable names. When we translate an NPDA into a
CFG, we will use variable names that encode information about both the state of the NPDA
and the stack contents. Variable names will have the form [qiAqj], where qi and qj are states
and A is a variable. The "meaning" of the variable [qiAqj] is that the NPDA can go from state
qi with Ax on the stack to state qj with x on the stack.

Each transition of the form δ (qi, a, A) = (qj, λ) results in a single grammar rule.
Each transition of the form δ (qi, a, A) = (qj, BC) results in a multitude of grammar rules, one
for each pair of states qx and qy in the NPDA.



This algorithm results in a lot of useless (unreachable) productions, but the useful productions
define the context-free grammar recognized by the NPDA.

6.8.1 Example Converting a PDA into a CFG


Here we give an example of how to create a CFG generating the language accepted by a PDA
(by empty stack). The example PDA accepts the language {0^n 1^n}. Formally, the PDA is (Q, Σ,
Γ, δ, q, Z, F) where Q = { q, r}, Σ = {0, 1}, Γ = {Z, X}, δ is defined by:
δ(q, 0, Z) = (q, XZ)
δ(q, 0, X) = (q, XX)
δ(q, 1, X) = (r, ε)
δ(r, 1, X) = (r, ε)
δ(r, ε, Z) = (r, ε)

Since the PDA accepts by empty stack, the final set F is irrelevant. The construction defines a
CFG G = (V, T, P, S) where V contains a start symbol S as well as a symbol [sYt] for every
combination of stack symbol Y and states s and t. Thus V = {S, [qZq], [qZr], [qXq], [qXr],
[rZq], [rZr], [rXq], [rXr]}. T = {0, 1}, the input alphabet of the PDA. S is the start symbol for
the grammar.
The intuitive meaning of variables like [qXq] is that it represents the language (set of strings)
that label paths from q to q that have the net effect of popping X off the stack (without going
deeper into the stack).
The productions P of G have two forms. First, for the start symbol S we add productions to
the "[startState, startStackSymbol, state]" variable for every state in the PDA. The language
generated by S will correspond to the set of strings labeling paths from S to any other state
that have the net effect of emptying the stack (popping off the starting stack symbol).

For our example PDA we get the two productions:


S → [qZq] | [qZr]
(recall that the | separates alternate right-hand-sides for the variable).
Next, for each transition in the PDA of the form δ(s, a, Y ) contains (t, ε) (i.e. state s goes to t
while reading symbol a from the input and popping Y off the stack) we add the production

[sY t] → a

to the grammar (we have three transitions of this form in the PDA, all into state r). This
corresponds to the fact that there is a path from s to t labeled by a that has the net effect of
popping Y off the stack.
After this stage we have the following productions (with all non-terminals listed even if they
don’t have any productions):

S → [qZq] | [qZr]
[qZq] →
[qZr] →
[qXq] →
[qXr] → 1
[rZq] →
[rZr] → ε



[rXq] →
[rXr] → 1

These "push nothing" transitions are just a special case of the general rule:
If there is a transition from s to t that reads a from the input, Y from the stack, and pushes k
symbols Y1Y2···Yk onto the stack, add all productions of the form [sYsk] → a[tY1s1][s1Y2s2]···
[sk−1Yksk] to the grammar (for all combinations of states s1, s2, . . . , sk).
This expresses the intuition that the PDA can go from s to sk with a net effect of popping Y by
first going from s to t while popping Y pushing the Yi’s on the stack, and then taking a path
that pops off each Yi in turn.
In the "push nothing" case, this results in RHS’s that are single terminals as above.
Let's apply the general rule to the δ(q, 0, Z) = (q, XZ) transition. In this case, we are pushing
the string XZ of length 2 on the stack, and need to add all productions of the form:

[qZs2] → 0[qXs1][s1Zs2]

where s1 and s2 can be any combinations of q and/or r. In other words, we add the
productions:

[qZq] → 0[qXq][qZq] | 0[qXr][rZq]


[qZr] → 0[qXq][qZr] | 0[qXr][rZr]

to the grammar.
Repeating this for the δ(q, 0, X) = (q, XX) transition gives us:
[qXq] → 0[qXq][qXq] | 0[qXr][rXq]
[qXr] → 0[qXq][qXr] | 0[qXr][rXr]

Collecting all of these productions together, we get (again listing all variables even if they
have no productions):

S → [qZq] | [qZr]
[qZq] → 0[qXq][qZq] | 0[qXr][rZq]
[qZr] → 0[qXq][qZr] | 0[qXr][rZr]
[qXq] → 0[qXq][qXq] | 0[qXr][rXq]
[qXr] → 1 | 0[qXq][qXr] | 0[qXr][rXr]
[rZq] →
[rZr] → ε
[rXq] →
[rXr] → 1

To verify that the language generated by the grammar is the same as the language accepted by
the PDA (by empty-stack), we will look at an example string to gain further insight into why
the construction works.

Consider the string 0011 in the language. The PDA accepts the string by pushing two X’s on
the stack while reading 0’s and then popping them off while reading 1’s. To derive 0011 in



the grammar we do the following:

S ⇒ [qZr] ⇒ 0[qXr][rZr] ⇒ 00[qXr][rXr][rZr] ⇒ 001[rXr][rZr] ⇒ 0011[rZr] ⇒ 0011

Consider all the productions again. If we ever generate a [rZq] or [rXq] symbol, then we can’t
remove it (since those symbols have no productions). Therefore we can remove those symbols
and the productions containing them from the grammar.

S → [qZq] | [qZr]
[qZq] → 0[qXq][qZq]
[qZr] → 0[qXq][qZr] | 0[qXr][rZr]
[qXq] → 0[qXq][qXq]
[qXr] → 1 | 0[qXq][qXr] | 0[qXr][rXr]
[rZr] → ε
[rXr] → 1

Now the only production for [qZq] produces another [qZq] symbol, so if we ever generate
the [qZq] variable we will never be able to get to a string of just terminals. Similarly, the only
production for [qXq] produces more [qXq]’s. Removing these two variables (and the
productions that use them) simplifies the grammar to:

S → [qZr]
[qZr] → 0[qXr][rZr]
[qXr] → 1 | 0[qXr][rXr]
[rZr] → ε
[rXr] → 1

Variable [rZr] generates only ε, so it is a no-op and can be deleted. However, unlike the
previous simplifications we must keep modified versions (with [rZr] replaced by ε) of the
productions using [rZr]. Similarly, variable [rXr] generates only the terminal 1, so we can
replace all uses of it with 1.

S → [qZr]
[qZr] → 0[qXr]
[qXr] → 1 | 0[qXr]1

Now it is easy to see that all derivations of the grammar start with S ⇒ [qZr] ⇒ 0[qXr].
Furthermore, we can now see that variable [qXr] generates all sequences of k ≥ 0 zeros
followed by k + 1 ones. Therefore, S generates {0^n 1^n | n ≥ 1}.
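The triple construction is entirely mechanical, as the following Python sketch shows (the tuple encoding of variables like [qZr] is our own illustrative choice). Running it on the five transitions above reproduces exactly the productions listed before the simplification steps; variables with no productions simply never appear as a head:

from itertools import product

def pda_to_cfg(delta, states, q0, Z):
    """Generate the productions of the triple construction. delta is a
    list of ((state, input, pop), (state, pushed-string)) transitions;
    a variable [sYt] is encoded as the tuple (s, Y, t)."""
    prods = [("S", [(q0, Z, t)]) for t in states]     # S -> [q0 Z t]
    for (s, a, Y), (t, push) in delta:
        if push == "":                                # [sYt] -> a
            prods.append(((s, Y, t), [a] if a else []))
            continue
        for seq in product(states, repeat=len(push)): # choose s1..sk
            body, prev = ([a] if a else []), t
            for Yi, si in zip(push, seq):
                body.append((prev, Yi, si))           # [prev Yi si]
                prev = si
            prods.append(((s, Y, seq[-1]), body))
    return prods

delta = [(("q", "0", "Z"), ("q", "XZ")), (("q", "0", "X"), ("q", "XX")),
         (("q", "1", "X"), ("r", "")), (("r", "1", "X"), ("r", "")),
         (("r", "", "Z"), ("r", ""))]

def show(x):
    return x if isinstance(x, str) else "[%s%s%s]" % x

for head, body in pda_to_cfg(delta, ["q", "r"], "q", "Z"):
    print(show(head), "->", " ".join(show(x) for x in body) or "ε")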

Assignment No. 7



7 Type-1: Context-Sensitive Languages
A context-sensitive language is defined by a set of rewriting rules (called a grammar)
involving symbols from a given finite set of symbols. These rules take a symbol from one
distinct set, the variables, and replace it with one or more symbols drawn from the variables
and the terminal symbols. The application of these rules depends on the symbols preceding and
following the variable symbol, hence they are called context-sensitive. A context-sensitive
language is a language defined by a context-sensitive grammar. A context-sensitive grammar
is a formal grammar, in which the left-hand sides and right-hand sides of any production rule
may be surrounded by a context of terminal and non-terminal symbols. In this section we
investigate different definitions for context-sensitive languages which are useful in
understanding their behavior.

7.1 Definitions of Context Sensitive Languages


Definition 1: A context-sensitive grammar (CSG) is a quadruple (V, Σ, P, S), such that:
V is a finite set of variable symbols.
Σ is the alphabet (of terminal symbols) of the grammar. It is required that
∑ ∩ V = ∅.
S ∈ V is the starting variable.
P is a finite set of productions of the form α → β, where α, β ∈ (Σ ∪ V)∗ and |α| ≤ |β|.

Definition 2: Given a grammar G = (V, Σ, P, S ), G is context-sensitive if every production in


P is of the form αAβ → αγβ,
with A ∈ V, α, β ∈ (Σ ∪ V)∗ and γ ∈ (V ∪ Σ)∗ − {λ}. However S → λ is allowed provided that
there is no rule in P with S on the right. A language L ⊆ Σ∗ is a context-sensitive language if
it is generated by some context-sensitive grammar G = (V, Σ, P, S). This means that for every
string s ∈ L there is a derivation of s from S, using the productions in P.

Definition 3:
A Context-sensitive language is specified with a context-sensitive grammar (CSG), in which
every production has the form:
α → β, where |β| ≥ |α|

Example 7.1:
Let G = (V, T, P, S) be a context-sensitive grammar whose production rules are
P = { S → aBb
aB → bBB
bB → aa
B → b}.
The derivation for w = aaabb is as follows:
S ⇒ aBb
⇒ bBBb
⇒ aaBb
⇒ abBBb
⇒ aaaBb
⇒ aaabb



Example 7.2:
The standard example of a Type 1 language is {a^n b^n c^n | n ≥ 1}, the set of words that consist of
equal numbers of a's, b's and c's, in that order:

a a . . . a   b b . . . b   c c . . . c
 n of them    n of them     n of them

Constructing a Type 1 grammar


To show how one writes a Type 1 grammar, we shall now derive a grammar for this language.
Starting with the simplest case, we have the rule
0. S → abc

Having got one instance of S, we may want to prepend more a’s to the beginning; if we want
to remember how many there were, we shall have to append something to the end as well at
the same time, and that cannot be a b or a c. We shall use a yet unknown symbol Q. The
following rule pre- and postpends:
1. S → abc | aSQ

If we apply this rule, for instance, three times, we get the sentential form
aaabcQQ

Now, to get aaabbbccc from this, each Q must be worth one b and one c, as was to be
expected, but we cannot just write
Q → bc

because that would allow b’s after the first c. The above rule would, however, be all right if it
were allowed to do replacement only between a b and a c; there, the newly inserted bc will do
no harm:
2. bQc → bbcc

Still, we cannot apply this rule since normally the Q’s are to the right of the c; this can be
remedied by allowing a Q to hop left over a c:

3. cQ → Qc

We can now finish our derivation:


aaabcQQ (3 times rule 1)
aaabQcQ (rule 3)
aaabbccQ (rule 2)
aaabbcQc (rule 3)
aaabbQcc (rule 3)
aaabbbccc (rule 2)
It should be noted that the above derivation only shows that the grammar will produce the
right strings, and the reader will still have to convince himself that it will not generate other
and incorrect strings.

S → abc | aSQ



bQc → bbcc
cQ → Qc

A derivation tree for a^2 b^2 c^2 is given in the following figure. The grammar is monotonic and
therefore of Type 1; it can be proved that there is no Type 2 grammar for the language.

Figure 13: Derivation tree

7.2 Linear Bounded Automata


A linear bounded automaton (LBA) is an automaton which has a finite length store (usually
called a tape). Characters can be read or written at any position on this tape. It therefore has a
read-write head which can move both left and right. The same tape is normally used for both
the input and the store. The special symbols < and > are used to mark the finite bounds of the
tape beyond which the read-write head cannot move. These special symbols cannot be
overwritten.
At each step, the LBA will perform one of the actions A ∈ {Y, N, L, R}:
• Y denotes “Yes”, accept the input string
• N denotes “No”, do not accept the input string
• L denotes “Left”, move read-write head one space to the left
• R denotes “Right”, move read-write head one space to the right
A language is context-sensitive iff it can be recognized by an LBA.

LBA Definition
A linear bounded automaton is a 5-tuple M =
(Q, Σ, Γ, q0, δ), where:
• Q is a finite set of states.
• ∑ is an alphabet (input symbols).
• Γ is an alphabet (store symbols).
• q0 ∈ Q is the initial state.
• δ, the transition function, is from Q ×( Γ ∪ {<, >}) to Q×( Γ ∪ {<, >}) × A.
If ((q, a), (q', b, action)) ∈ δ, then when in state q with a at the current read position on the
tape, M may replace a with b on the tape, perform the specified action, and enter state q'.
M accepts w ∈ ∑* iff it starts with configuration (q0, <w >) and the action Y is taken.



LBA Example 7.3:
L = {a^n b^n c^n : n ≥ 0}
M = (Q, Σ, Γ, q0, δ) where:
Q = {s, t, u, v, w}
∑ = {a, b, c}
Γ = {a, b, c, x}
q0 = s
δ=
1. ((s, <), (t, <, R))
2. ((t, >), (t, >, Y))
3. ((t, x), (t, x, R))
4. ((t, a), (u, x, R))
5. ((u, a), (u, a, R))
6. ((u, x), (u, x, R))
7. ((u, b), (v, x, R))
8. ((v, b), (v, b, R))
9. ((v, x), (v, x, R))
10. ((v, c), (w, x, L))
11. ((w, c), (w, c, L))
12. ((w, b), (w, b, L))
13. ((w, a), (w, a, L))
14. ((w, x), (w, x, L))
15. ((w, <), (t, <, R))
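These fifteen transitions can be executed directly by a small simulator. A minimal Python sketch, assuming the tape is given as a list bounded by < and >, and treating an undefined transition as rejection (an encoding choice of our own):

def run_lba(delta, tape, state="s"):
    """Run an LBA given as a dict (state, symbol) -> (state, write,
    action) over a tape like list("<aabbcc>"). Returns True iff the
    machine takes action Y."""
    pos = 0
    while True:
        move = delta.get((state, tape[pos]))
        if move is None:
            return False                      # stuck: reject
        state, write, action = move
        tape[pos] = write
        if action == "Y":
            return True
        if action == "N":
            return False
        pos += 1 if action == "R" else -1     # L or R

delta = {("s", "<"): ("t", "<", "R"), ("t", ">"): ("t", ">", "Y"),
         ("t", "x"): ("t", "x", "R"), ("t", "a"): ("u", "x", "R"),
         ("u", "a"): ("u", "a", "R"), ("u", "x"): ("u", "x", "R"),
         ("u", "b"): ("v", "x", "R"), ("v", "b"): ("v", "b", "R"),
         ("v", "x"): ("v", "x", "R"), ("v", "c"): ("w", "x", "L"),
         ("w", "c"): ("w", "c", "L"), ("w", "b"): ("w", "b", "L"),
         ("w", "a"): ("w", "a", "L"), ("w", "x"): ("w", "x", "L"),
         ("w", "<"): ("t", "<", "R")}
print(run_lba(delta, list("<aabbcc>")))       # True
print(run_lba(delta, list("<aabcc>")))        # False (unequal counts)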

LBA Example 7.4


This can be represented as a transition diagram as follows:

Figure 14: Transition diagram for an LBA

x → y, A denotes reading symbol x, writing symbol y and performing action A .


The intuition behind the previous example is that on each pass through the input string, we
match one a, one b and one c and replace each of them with an x until there are no a's, b's or
c's left.



Each of the states can be explained as follows:

 State t looks for the leftmost a, changes this to an x, and moves into state u. If no symbol
from the input alphabet can be found, then the input string is accepted.
 State u moves right past any a’s or x’s until it finds a b. It changes this b to an x, and
moves into state v.
 State v moves right past any b’s or x’s until it finds a c. It changes this c to an x, and
moves into state w.
 State w moves left past any a’s, b’s, c’s or x’s until it reaches the start boundary, and
moves into state t.

LBA Computation
Consider input aabbcc for the previous LBA M:

(s, <aabbcc >) ├M (t, <aabbcc >) 4


├M (u, <xabbcc >) 5
├M (u, <xabbcc >) 7
├M (v, <xaxbcc >) 8
├M (v, <xaxbcc >) 10
├M (w, <xaxbxc >) 12
├M (w, <xaxbxc >) 14
├M (w, <xaxbxc >) 13
├M (w, <xaxbxc >) 14
├M (w, <xaxbxc >) 15
├M (t, <xaxbxc >) 3
├M (t, <xaxbxc >) 4
├M (u, <xxxbxc >) 6
├M (u, <xxxbxc >) 7
├M (v, <xxxxxc >) 9
├M (v, <xxxxxc >) 10
├M (w, <xxxxxx >) 14
├M (w, <xxxxxx >) 14
├M (w, <xxxxxx >) 14
├M (w, <xxxxxx >) 14
├M (w, <xxxxxx >) 14
├M (w, <xxxxxx >) 15
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx >) 3
├M (t, <xxxxxx>) 2



8 Type-0: Recursively Enumerable
Languages
8.1 Turing Machine
In the previous chapters, we have studied three types of computational models: i) finite
automata (FA), ii) pushdown automata (PDA), and iii) linear bounded automata (LBA). A
finite automaton is an abstract machine having a finite control but no run-time memory. A
finite automaton produces only a binary output (accept/reject) and only accept the regular
languages. A pushdown automaton has a finite control and has, in addition, a single pushdown
stack as the run-time memory, whose size may grow at the run time. The class of languages
recognized by pushdown automata is characterized as context-free languages. Then we
studied a linear bounded automaton (LBA) which has a finite length tape (memory).
Characters can be read or written at any position on this tape. The class of languages
recognized by LBA is characterized as context-sensitive languages. These machines are of
limited computational power, compared with our real digital computers. For instance, the
language {a^n b^n c^n : n ≥ 0} is not a context-free language but can be easily recognized by a short
program in the real computers.
In this chapter, we introduce a new machine model, called the Turing machine, which has a
finite control plus a tape as the run-time memory, whose size is unlimited. A Turing machine
uses a tape, which is infinite in both directions. The tape consists of a series of squares, each
of which can hold a single symbol. The tape head, or read-write head, can read a symbol
from the tape, write a symbol to the tape, and move one square in either direction.

Figure 15: Model of a Turing machine

Unlike the other automata we have discussed, a Turing machine does not read "input."
Instead, there may be (and usually are) symbols on the tape before the Turing machine begins;
the Turing machine might read some, all, or none of these symbols. The initial tape may, if
desired, be thought of as "input."

8.2 Formal Definition of Turing Machines


A Turing machine is a quintuple M = (Q, Σ, Γ, δ, q0) where Q is a finite set of states, Γ is a
finite set called the tape alphabet, Γ contains a special symbol # that represents a blank, Σ is a
subset of Γ−{#} called the input alphabet, δ is a partial function from Q×Γ to Q×Γ×{L, R}
called the transition function, and q0 ∈ Q is a distinguished state called the start state.



Because the Turing machine has to be able to find its input, and to know when it has
processed all of that input, we require:
 The tape is initially blank (every symbol is #) except possibly for a finite, contiguous
sequence of symbols.
 If there are initially nonblank symbols on the tape, the tape head is initially positioned
on one of them.
Most other textbooks make no distinction between Σ (the input alphabet) and Γ (the tape
alphabet). We do this to emphasize that the "input" (the nonblank symbols on the tape) does
not contain #. Also, there may be more symbols in Γ than are present in the input.

8.2.1 Transition Function, Instantaneous Descriptions, and Moves


The transition function for Turing machines is given by
δ: Q × Γ → Q × Γ × {L, R}
When the machine is in a given state (in Q) and reads a given symbol (in Γ) from the tape, it
replaces the symbol on the tape with some other symbol (in Γ), goes to some other state (in Q),
and moves the tape head one square left (L) or right (R).
An instantaneous description or configuration of a Turing machine requires (1) the state the
Turing machine is in, (2) the contents of the tape, and (3) the position of the tape head on the
tape. This can be summarized in a string of the form

xi...xjqmxk...xl

where the x's are the symbols on the tape, qm is the current state, and the tape head is on the
square containing xk (the symbol immediately following qm). A move of a Turing machine can
therefore be represented as a pair of instantaneous descriptions, separated by the symbol "⊢".
For example, if

δ(q5, b) = (q8, c, R)
then a possible move might be
abbabq5babb ⊢ abbabcq8abb

8.2.2 Turing Machines as Automata


A Turing machine halts when it no longer has any available moves. If it halts in a final state, it
accepts its input; otherwise, it rejects its input. Formally, a Turing machine T = (Q, Σ, Γ, δ, q0,
#, F) accepts a language L(M), where
L(M) = {w ∈ Σ+ : q0w ⊢* xi qf xj
for some qf ∈ F, xi, xj ∈ Γ*}

(Notice that this definition assumes that the Turing machine starts with its tape head
positioned on the leftmost symbol.)

We said a Turing machine accepts its input if it halts in a final state. There are two ways this
could fail to happen:
1. The Turing machine could halt in a non-final state, or
2. The Turing machine could never stop (in which case we say it is in an infinite loop. )
If a Turing machine halts, the sequence of configurations leading to the halt state is called a
computation.



8.3 Recursively Enumerable Languages
A language is recursively enumerable if some Turing machine accepts it. Let L be a
recursively enumerable language and M the Turing Machine that accepts it. For string w:
if w ∈ L then M halts in a final state,
if w ∉ L then M halts in a non-final state or loops forever.
Remember that there are three possible outcomes of executing a Turing machine over a given
input. The Turing machine may; Halt and accept the input; Halt and reject the input; or Never
halt.
A language is recursive if there exists a Turing machine that accepts every string of the
language and rejects every string (over the same alphabet) that is not in the language.

8.4 Turing Machine as a Language Recognizer


8.4.1 Recognizing a Regular Language
L = {x ∈ {a, b}* | x contains the substring aba}.

Figure 16: An FA accepting the (a+b)*aba(a+b)*

Figure 17: A TM that accepts (a+b)*aba(a+b)*

Assignment No. 8

8.4.2 Recognizing a Context-Free Language


This machine will match strings of the form {a^n b^n : n ≥ 0}. q1 is the only final state.

Table 2: State table of a TM accepting a^n b^n

Current State   Symbol Read   Symbol Written   Head Direction   Next State
Find the left end of the input
q0              a             a                L                q0
q0              b             b                L                q0
q0              #             #                R                q1
Erase the 'a' at the left end of the input
q1              a             #                R                q2
Find the right end of the input
q2              a             a                R                q2
q2              b             b                R                q2
q2              #             #                L                q3
Erase the 'b' at the right end of the input
q3              b             #                L                q0
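Table 2 can be run directly by a generic simulator. A minimal Python sketch, assuming (as an encoding choice of our own) that the tape is unbounded in both directions with blank '#', the head starts on the leftmost input symbol, and the machine accepts iff it halts in the final state q1:

def run_tm(delta, contents, state="q0", final=("q1",)):
    """Run a TM given as a dict (state, symbol) -> (write, move, state).
    The machine halts when no transition applies; it accepts iff it
    halts in a state listed in `final`."""
    tape = {i: c for i, c in enumerate(contents)}
    pos = 0
    while (state, tape.get(pos, "#")) in delta:
        write, move, state = delta[(state, tape.get(pos, "#"))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return state in final

# Table 2, the TM for {a^n b^n : n >= 0}
delta = {("q0", "a"): ("a", "L", "q0"), ("q0", "b"): ("b", "L", "q0"),
         ("q0", "#"): ("#", "R", "q1"),
         ("q1", "a"): ("#", "R", "q2"),
         ("q2", "a"): ("a", "R", "q2"), ("q2", "b"): ("b", "R", "q2"),
         ("q2", "#"): ("#", "L", "q3"),
         ("q3", "b"): ("#", "L", "q0")}
print(run_tm(delta, "aabb"), run_tm(delta, "aab"))   # True False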

Example 8.2: Consider the language of palindromes over {a, b}. The Turing machine is given
as follows:

Figure 18: State diagram of a TM for the language of Palindrome

Tracing the moves of the machine:

 (q0, #abaa) ├ (q1, # a baa) ├ (q2, ##baa) ├*(q2, ##baa#) ├ (q3, ##baa)
├ (q4, ##ba) ├* (q4, ##ba) ├ (q1, ##ba) ├ (q5, ###a) ├ (q5, ###a#) ├ (q6, ###a) crash.

 (q0, #aba) ├ (q1, # a ba) ├ (q2, ##ba) ├*(q2, ##ba#) ├ (q3, ##ba)
├ (q4, ##b) ├ (q4, ##b) ├ (q1, ##b) ├ (q5, ####) ├ (q6, ###) ├ (h, #####) accept.



8.4.3 Turing Machines as Transducers
A Turing machine can be used as a transducer. The most obvious way to do this is to treat the
entire nonblank portion of the initial tape as input, and to treat the entire nonblank portion of
the tape when the machine halts as output.
In other words, a Turing machine defines a function y = f(x) for strings x, y ∈ Σ* if

q0x ⊢* qf y

where qf is a final state. A function f is Turing computable if there exists a Turing machine
that can perform the above task.

Let ∑ = {a}, and let L = {w ∈ ∑*: |w| is even}.


The following Turing machine M = (K, ∑, δ, s) is a decision procedure for L:
K = {q0, q1, …, q6}, ∑ = {a, Y, N, #}, s = q0 and δ is given by the following table.
q    σ    δ(q, σ)
q0 # (q1, L)
q1 a (q2, #)
q1 # (q4, R)
q2 # (q3, L)
q3 a (q0, #)
q3 # (q2, R)
q4 # (q5, Y)
q5 Y (h, R)
q5 N (h, R)
q0 # (q5, N)

(q0, ##aaaa#) ├ (q1, ##aaaa#) ├ (q2, ##aaa##) ├ (q3, ##aaa##) ├ (q0, ##aa###)
├ (q1, ##aa###) ├ ( q2, ##a####) ├ (q3, ##a####) ├ (q0, #######) ├ (q1, #######)
├ (q4, #######) ├ (q5, ##Y###) ├ (h, ##Y###).

8.4.4 Recognizing Other Languages


Now consider a non-context-free language L = {ww | w ∈ {a, b}*}. (Double word)



Figure 19: State diagram of TM for the language of double word

Let’s trace the moves by the machine for the input string abab.

 (q0, #abab) ├ (q1, # abab) ├ (q2, #Abab) ├*(q2, #Abab#) ├ (q3, #Abab)
├ (q4, #AbaB) ├* (q4, #AbaB) ├ (q1, #AbaB) ├ (q2, #ABaB)
├ (q2, #ABaB) ├ (q3, #ABaB) ├ (q4, #ABAB) ├ (q1, #ABAB)
├ (q5, #ABAB) ├ (q5, #AbAB)├ (q5, #abAB)
(First phase completed, center found.)
├ (q6, #abAB) ├ (q8, # AbAB) ├ (q8, #abAB) ├ (q9, #Ab#B)

├ (q9, #Ab#B) ├ (q6, #Ab#B) ├ (q7, #AB#B) ├ (q7, #AB#B)

├ (q9, #AB#) ├ (q9, #AB) ├ (q6, #AB#) ├ (h, #AB#) (accept).

Turing Machine to Copy Strings


Let us construct a TM that creates a copy of the input string to the right of the input but with a
blank separating the original from the copy.

Figure 20: State diagram of TM to copy a string

Let’s trace the computation of the string aba:

(q0, #aba) ├ (q1, #aba) ├ (q2, #Aba) ├ (q2, #Aba) ├ (q2, #Aba#)
├ (q3, #Aba##) ├ (q4, #Aba#a) ├ (q4, #Aba#a) ├ (q4, #Aba#a)
├ (q4, #Aba#a) ├ (q1, #Aba#a) ├ (q5, #ABa#) ├ (q5, #ABa#a)
├ (q6, #ABa#a) ├ (q6, #ABa#a#)
├ (q6, #ABa#ab) ├ (q4, #ABa#ab) ├ (q4, #ABa#ab)
├ (q4, #ABa#ab) ├ (q1, #ABa#ab) ├ (q2, #ABA#ab)
├ (q3, #ABA#ab) ├ (q3, #ABA#ab) ├ (q3, #ABA#ab#)
├ (q4, #ABA#aba) ├ (q4, #ABA#aba) ├ (q4, #ABA#aba)
├ (q4, #ABA#aba) ├ (q1, #ABA#aba) ├ (q7, #ABA#aba)
├ (q7, #ABa#aba) ├ (q7, #Aba#aba) ├ (q7, #aba#aba)
├ (h, #aba#aba). (Accepted).
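
The marking idea is easy to mirror in C: capitalize the leftmost uncopied symbol (a → A,
b → B), append its copy after the separator, and lowercase the marks again at the end. The
sketch below (ours) produces the same final tape as the trace:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char tape[64] = "#aba#";           /* unused cells are already '\0' */
    size_t i;
    for (i = 1; tape[i] != '#'; i++) { /* each not-yet-copied symbol */
        char c = tape[i];
        tape[i] = (char)toupper((unsigned char)c);  /* mark it: a -> A */
        tape[strlen(tape)] = c;        /* deposit the copy at the right end */
    }
    for (i = 1; tape[i] != '#'; i++)   /* second phase: unmark A -> a */
        tape[i] = (char)tolower((unsigned char)tape[i]);
    printf("%s\n", tape);              /* prints #aba#aba */
    return 0;
}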

8.4.5 Sorting Machine


Given a string consisting of a's and b's, this machine will rearrange the string so that all the a's
come before all the b's.
Current State   Symbol Read   Symbol Written   Head Direction   Next State
Find the left end of the input
q0 a a L q0
q0 b b L q0
q0 # # R q1
Find the leftmost 'b'

q1 a a R q1
q1 b b R q2
q1 # # L h
Look for an 'a' to the right of a 'b', replace with 'b'
q2 a b L q3
q2 b b R q2
q2 # # L h
Already replaced 'a' with 'b', now replace 'b' with 'a'
q3 b a L q0
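
Because the table is total on every configuration the machine can actually reach, it transcribes
neatly into a table-driven C simulator (the encoding is ours; h is represented as -1):

#include <stdio.h>

struct Rule { int q; char read, write, move; int next; };

/* One struct per row of the table; 'move' is 'L' or 'R'. */
static const struct Rule rules[] = {
    {0, 'a', 'a', 'L',  0}, {0, 'b', 'b', 'L',  0}, {0, '#', '#', 'R',  1},
    {1, 'a', 'a', 'R',  1}, {1, 'b', 'b', 'R',  2}, {1, '#', '#', 'L', -1},
    {2, 'a', 'b', 'L',  3}, {2, 'b', 'b', 'R',  2}, {2, '#', '#', 'L', -1},
    {3, 'b', 'a', 'L',  0},
};

int main(void)
{
    char tape[64] = "#babba#";
    int q = 0, pos = 1;                /* head on the first input symbol */
    while (q != -1) {
        size_t r, n = sizeof rules / sizeof rules[0];
        for (r = 0; r < n; r++)
            if (rules[r].q == q && rules[r].read == tape[pos]) break;
        if (r == n) break;             /* undefined transition: crash */
        tape[pos] = rules[r].write;
        pos += (rules[r].move == 'R') ? 1 : -1;
        q = rules[r].next;
    }
    printf("%s\n", tape);              /* prints #aabbb# */
    return 0;
}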

8.4.6 Multitrack Machines


A multitrack tape is one in which the tape is divided into tracks. Multiple tracks increase the
amount of information that can be considered when determining the appropriate transition. A
tape position in a two-track machine is represented by the ordered pair [x, y], where x is the
symbol in track 1 and y in track 2. The states, input alphabet, tape alphabet, initial state, and
final states of a two-track machine are the same as in the standard Turing machine. A two-
track transition reads and rewrites the entire tape position. A transition of a two-track machine
is written δ(qi, [x, y]) = [qj, [z, w], d], where d ∈ {L, R}.
The input to a two-track machine is placed in the standard input position in track 1. All the
positions in track 2 are initially blank. Acceptance in multitrack machines is by final state.
Languages accepted by two-track machines are precisely the recursively enumerable
languages.
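
In C, the pair notation can be made concrete with a small struct; the following is only a
sketch of the shapes involved, with all names our own:

/* A two-track tape cell [x, y] and the result of one transition
   delta(q, [x, y]) = [q', [z, w], d]. */
enum Dir { DIR_L, DIR_R };

struct Cell { char track1, track2; };  /* [x, y] */

struct Step {
    int next_state;                    /* q' */
    struct Cell rewrite;               /* [z, w], written back in place */
    enum Dir dir;                      /* d */
};

/* The transition function of a two-track machine then has this shape: */
struct Step delta(int state, struct Cell scanned);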

8.4.7 Two-Way Tape Machines


A Turing machine with a two-way tape is identical to the standard model except that the tape
extends indefinitely in both directions. Since a two-way tape has no left boundary, the input
can be placed anywhere on the tape. All other tape positions are assumed to be blank. The
tape head is initially positioned on the blank to the immediate left of the input string.

8.4.8 Multitape Machines


A k-tape machine consists of k tapes and k independent tape heads. The states and alphabets
of a multitape machine are the same as in a standard Turing machine. The machine reads all
the tapes simultaneously but has a single finite control, depicted by connecting each of the
independent tape heads to one control indicating the current state. A transition is
determined by the state and symbols scanned by each of the tape heads. A transition in a
multitape machine may
i) change the state
ii) write a symbol on each of the tapes
iii) independently reposition each of the tape heads.
The repositioning consists of moving the tape head one square to the left or one square to the
right or leaving it at its current position. The input to a multitape machine is placed in the
standard position on tape 1. All the other tapes are assumed to be blank. The tape heads
originally scan the leftmost position of each tape. Any tape head attempting to move to the left
of the boundary of its tape terminates the computation abnormally. Any language accepted by
a k-tape machine is accepted by a (2k + 1)-track machine.

8.4.9 Nondeterministic Turing Machines


A nondeterministic Turing machine may specify any finite number of transitions for a given
configuration. The components of a nondeterministic machine, apart from the transition
function, are identical to those of the standard Turing machine. Transitions in a
nondeterministic machine are defined by a partial function from Q × Γ to subsets of Q × Γ ×
{L, R}.
The language accepted by a nondeterministic Turing machine is recursively enumerable.

8.4.10 Turing Machines as Language Enumerators


A k-tape Turing machine E = (Q, Σ, Γ, δ, q0) enumerates a language L if
i) the computation begins with all tapes blank
ii) with each transition, the tape head on tape 1 (the output tape) remains stationary or
moves to the right
iii) at any point in the computation, the nonblank portion of tape 1 has the form
B#u1#u2# · · · #uk# or B#u1#u2# · · · #uk#v, where ui ∈ L and v ∈ Σ*
iv) u is written on output tape 1, preceded and followed by #, if, and only if, u ∈ L.
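
The four conditions become concrete in a small example. The C program below (ours)
enumerates {aⁿbⁿ : n ≥ 0}: standard output plays the role of tape 1, the output cursor only
ever moves right, the nonblank portion always has the form #u1#u2# · · · #uk#v, and the
program, like a true enumerator, runs forever:

#include <stdio.h>

int main(void)
{
    int n, i;
    putchar('#');
    for (n = 0; ; n++) {               /* u_n = a^n b^n; never terminates */
        for (i = 0; i < n; i++) putchar('a');
        for (i = 0; i < n; i++) putchar('b');
        putchar('#');
        fflush(stdout);
    }
}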



9 Decidability
9.1 Decision Problems
A decision problem P is a set of questions, each of which has a yes or no answer. The single
question "Is 8 a perfect square?" is an example of the type of question under consideration in a
decision problem. A decision problem usually consists of an infinite number of related
questions. For example, the problem PSQ of determining whether an arbitrary natural number
is a perfect square consists of the following questions:
p0 : Is 0 a perfect square?
p1 : Is 1 a perfect square?
p2 : Is 2 a perfect square?
...
A solution to a decision problem P is an algorithm that determines the appropriate answer to
every question p ∈ P. An algorithm that solves a decision problem should be
i) Complete
ii) Mechanistic
iii) Deterministic.
A procedure that satisfies the preceding properties is often called effective. A problem is
decidable if it has a representation in which the set of accepted input strings forms a
recursive language. Since computations of deterministic multitrack and multitape machines
can be simulated on a standard Turing machine, solutions using these machines also
establish the decidability of a problem.
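
For PSQ such an effective procedure is easy to exhibit. The C function below answers every
question pn by brute-force search; it is complete, mechanistic, and deterministic in exactly the
sense just listed:

#include <stdbool.h>

/* Decides "is n a perfect square?" by searching k = 0, 1, 2, ...
   The loop terminates because k*k eventually reaches or passes n. */
bool is_perfect_square(unsigned long n)
{
    unsigned long k = 0;
    while (k * k < n)
        k++;
    return k * k == n;
}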

9.2 The Church-Turing Thesis


The Church-Turing thesis asserts that every solvable decision problem can be transformed
into an equivalent Turing machine problem.
The Church-Turing thesis for decision problems: There is an effective procedure to solve a
decision problem if, and only if, there is a Turing machine that halts for all input strings and
solves the problem.
The extended Church-Turing thesis for decision problems: A decision problem P is
partially solvable if, and only if, there is a Turing machine that accepts precisely the elements
of P whose answer is yes.
A proof by the Church-Turing thesis is a shortcut often taken in establishing the existence of a
decision algorithm. Rather than constructing a Turing machine solution to a decision problem,
we describe an intuitively effective procedure that solves the problem. The Church-Turing
thesis asserts that a decision problem P has a solution if, and only if, there is a Turing machine
that determines the answer for every p ∈ P. If no such Turing machine exists, the problem is
said to be undecidable.

9.3 The Halting Problem for Turing Machines


Theorem: The halting problem for Turing machines is undecidable.
Proof: The proof is by contradiction. Assume that there is a Turing machine H that solves the
halting problem. A string is accepted by H if
i) the input consists of the representation of a Turing machine M followed by a string w
ii) the computation of M with input w halts.
If either of these conditions is not satisfied, H rejects the input. The operation of the machine
H is depicted in Fig. 9.1.

Figure 9.1: Halting Machine

The machine H is modified to construct a Turing machine H’. The computations of H’ are the
same as those of H, except that H’ loops indefinitely whenever H terminates in an accepting
state, that is, whenever M halts on input w. The transition function of H’ is constructed from
that of H by adding transitions that cause H’ to move indefinitely to the right upon entering an
accepting configuration of H.
H’ is combined with a copy machine to construct another Turing machine D. The input to D is
a Turing machine representation R(M). A computation of D begins by creating the string
R(M)R(M) from the input R(M). The computation continues by running H’ on R(M)R(M).
The input to the machine D may be the representation of any Turing machine with alphabet
{0, 1, B}. In particular, D is such a machine. Consider a computation of D with input R(D).
Rewriting the previous diagram with M replaced by D and R(M) by R(D), we get

Figure 9.2: Turing Machine D with R(M) as input

Figure 9.3: Turing Machine D with R(D) as input

Examining the preceding computation, we see that D halts with input R(D) if, and only if, D
does not halt with input R(D). This is obviously a contradiction. However, the machine D can
be constructed directly from a machine H that solves the halting problem. The assumption that
the halting problem is decidable produces the preceding contradiction. Therefore, we conclude
that the halting problem is undecidable.



10 Undecidability
There are specific problems we cannot solve using a computer. These problems are called
“undecidable”. While a Turing machine looks nothing like a PC, it has been recognized as an
accurate model for what any physical computing device can do. We use the Turing machine
to develop a theory of “undecidable” problems. We show that several problems that are easy
to express are in fact undecidable.

10.1 Problems That Computers Cannot Solve


One particular problem that we discuss is whether the first thing a C program prints is hello
world. Although we might imagine that simulation of the program would allow us to tell what
the program does, we must in reality contend with programs that take an unimaginably long
time before making any output at all. This problem - not knowing when, if ever, something
will occur - is the ultimate cause of our inability to tell what a program does.

10.2 Programs that Print “Hello World"


In Fig. 10.1, it is easy to discover that the program prints hello world and terminates. However,
there are other programs that also print hello world, yet the fact that they do so is far from
obvious. Figure 10.2 shows another program that might print hello world. It takes an input n
and looks for positive integer solutions to the equation xⁿ + yⁿ = zⁿ. If it finds one, it prints
hello world. If it never finds integers x, y and z to satisfy the equation, then it continues
searching forever, and never prints hello world.

#include <stdio.h>

int main()
{
    printf("hello world");
    return 0;
}

Figure 10.1: Hello-World Program

If the value of n that the program reads is 2, then it will eventually find combinations of
integers such as total = 12, x = 3, y = 4, and z = 5, for which xⁿ + yⁿ = zⁿ. Thus, for input 2, the
program does print hello world.
However, for any integer n > 2, the program will never find a triple of positive integers to
satisfy xⁿ + yⁿ = zⁿ, and thus will fail to print hello world. Interestingly, until a few years ago, it
was not known whether this program would print hello world for some large integer n. The
claim that it would not, i.e., that there are no integer solutions to the equation xⁿ + yⁿ = zⁿ if n
> 2, was made by Fermat 300 years ago, but no proof was found until quite recently. This
statement is often referred to as “Fermat’s last theorem”.
Let us define the “hello world” problem to be: determine whether a given C program, with a
given input, prints hello world as the first 11 characters that it prints. It would be remarkable
indeed if we could write a program that could examine any program P and input I for P, and
tell whether P, run with I as its input, would print hello world. We shall prove that no such
program exists.

#include <stdio.h>

int exp(int i, int n)
/* computes i to the power n */
{
    int ans, j;
    ans = 1;
    for (j = 1; j <= n; j++) ans *= i;
    return ans;
}

int main()
{
    int n, total, x, y, z;
    scanf("%d", &n);
    total = 3;
    while (1) {
        for (x = 1; x <= total - 2; x++)
            for (y = 1; y <= total - x - 1; y++) {
                z = total - x - y;     /* so that x + y + z = total */
                if (exp(x, n) + exp(y, n) == exp(z, n))
                    printf("hello world");
            }
        total++;
    }
}
Figure 10.2: Fermat’s last theorem expressed as a hello-world program

10.3 The Hypothetical “Hello World" Tester


The proof of impossibility of making the hello-world test is a proof by contradiction. That is,
we assume there is a program, call it H, that takes as input program P and an input I, and tells
whether P with input I prints hello world. If a problem has an algorithm like H, that always
tells correctly whether an instance of the problem has answer “yes" or “no", then the problem
is said to be “decidable". Our goal is to prove that H does not exist, i.e. the hello-world
problem is undecidable.
In order to prove that statement by contradiction, we are going to make several changes to H,
eventually constructing a related program called H2 that we show cannot exist. Since the
changes to H are simple transformations that can be done to any C program, the only
questionable step is the existence of H, so it is that assumption that will have been contradicted.
To simplify our discussion, we shall make a few assumptions about C programs.
i) All output is character-based, e.g., we are not using a graphics package or any other
facility to make output that is not in the form of characters.
ii) All character-based output is performed using printf, rather than putchar() or another
character-based output function.
We now assume that the program H exists. Our first modification is to change the output no,
which is the response that H makes when its input program P does not print hello world as its
first output in response to input I. As soon as H prints “n”, we know it will eventually follow
with “o”. Thus, we can modify any printf statement in H that prints “n” to instead print hello
world, and omit any printf statement that prints the “o” but not the “n”. As a result, the new
program, which we call H1, behaves like H, except it prints hello world exactly when H would
print no.
Since we are interested in programs that take other programs as input and tell something about
them, we shall restrict H1 so that it:
a. Takes only input P, not P and I.
b. Asks what P would do if its input were its own code, i.e., what H1 would report about
P run with P itself as its input I.
The modifications we must perform on H1 to produce the program H2 are as follows:
1. H2 first reads the entire input P and stores it in an array A, which it “malloc’s” for the
purpose.
2. H2 then simulates H1, but whenever H1 would read input from P or I, H2 reads from the
stored copy in A. To keep track of how much of P and I H1 has read, H2 can maintain
two cursors that mark positions in A.

Now consider what H2 does when given itself as input. Recall that H2, given any program P as
input, outputs yes if P prints hello world when given itself as input, and prints hello world if P,
given itself as input, does not print hello world as its first output.
Suppose that H2, given itself as input, makes the output yes. Then H2 is asserting about its
input H2 that H2, given itself as input, prints hello world as its first output. But we just
supposed that the first output H2 makes in this situation is yes rather than hello world.
Thus, it appears that the output of H2 is hello world, since it must be one or the other. But if
H2, given itself as input, prints hello world first, then the output of H2 must be yes. Whichever
output we suppose H2 makes, we can argue that it makes the other output.
This situation is paradoxical, and we conclude that H2 cannot exist. As a result, we have
contradicted the assumption that H exists. That is, we have proved that no program H can tell
whether or not a given program P with input I prints hello world as its first output.
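
The whole chain of constructions can be summarized in schematic C. The stub body given for
H below is of course pure fiction, since the argument just given shows that no correct body
can exist, but the transformations from H to H1 to H2 are exactly the simple ones described
above:

#include <stdio.h>

typedef const char *Text;              /* program text and input as strings */

/* The hypothetical tester: would return 1 iff program P, run on input I,
   prints hello world first.  Stub only; no correct body can exist. */
int H(Text P, Text I) { (void)P; (void)I; return 0; }

/* H1: identical to H except that it prints hello world exactly where
   H would print no. */
void H1(Text P, Text I)
{
    if (H(P, I))
        printf("yes");
    else
        printf("hello world");
}

/* H2: takes only P and asks what H1 says about P run on its own code. */
void H2(Text P) { H1(P, P); }

int main(void)
{
    Text self = "(the text of this very program)";   /* placeholder only */
    H2(self);                          /* either answer contradicts itself */
    return 0;
}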
