Grammars
A grammar is a 4-tuple G = (V, T, P, S) where
1) V is a finite set of nonterminal symbols (also called variables or syntactic categories)
2) T is a finite set of terminal symbols, disjoint from V
3) P is a finite subset of (V ∪ T)* V (V ∪ T)* × (V ∪ T)*
4) S is a distinguished symbol in V called the start symbol (or sentence symbol)

An element (α, β) in P is called a production and is written as α → β. α is referred to as the left-hand side of the production and β as the right-hand side. Notice that the left-hand side must contain at least one nonterminal.

Example: G_1 = (V, T, P, S) where
V = {A, S}
T = {0, 1}
P = {S → 0A1, 0A → 00A1, A → ε}
Note: Grammars are generators, while automata are recognizers. A grammar defines a language in a recursive manner.

Definition: A sentential form of a grammar G = (V, T, P, S) is defined recursively as follows:
1. S is a sentential form
2. If αβγ is a sentential form and β → δ is a production in P, then αδγ is also a sentential form
A sentential form of G containing no nonterminal symbol is called a sentence generated by G.

The language generated by a grammar G, denoted by L(G), is the set of all sentences generated by G.
Terminology:
⇒_G (directly derives) is a relation on (V ∪ T)*: if β → δ is a production in P, then αβγ ⇒_G αδγ
⇒^k is the k-fold product of ⇒
⇒+ is the transitive closure of ⇒
⇒* is the reflexive transitive closure of ⇒

If α ⇒^k β then there is a sequence of strings α_0, α_1, …, α_k such that α_0 = α, α_{i-1} ⇒ α_i (1 ≤ i ≤ k), and α_k = β. This sequence of strings is called a derivation sequence of length k.
L(G) = { w | w in T* and S ⇒* w }
Example: L(G_1) = {0^n 1^n | n > 0}
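The claim about L(G_1) can be spot-checked by enumerating derivations. The following is a minimal sketch, not a general parser: it assumes every symbol is a single character, and it prunes sentential forms longer than max_len + slack, which is safe for G_1 because its forms carry at most one erasable nonterminal. The function name and the slack parameter are illustrative choices.

```python
from collections import deque

def sentences(productions, nonterminals, start, max_len, slack=1):
    """Enumerate sentences of length <= max_len by breadth-first search
    over sentential forms. Symbols are single characters."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        # A form with no nonterminal is a sentence; no production applies to it,
        # because every left-hand side contains at least one nonterminal.
        if not any(sym in nonterminals for sym in form):
            if len(form) <= max_len:
                found.add(form)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                # Rewrite this occurrence of lhs: one "directly derives" step.
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return found

# G_1 from the example: S -> 0A1, 0A -> 00A1, A -> epsilon
G1_PRODUCTIONS = [("S", "0A1"), ("0A", "00A1"), ("A", "")]
print(sorted(sentences(G1_PRODUCTIONS, {"A", "S"}, "S", max_len=6), key=len))
# prints ['01', '0011', '000111']
```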

Design of grammars
L = {w | w ∈ {0,1}* and w starts and ends with the same symbol}
S → 0A | 1B | 0 | 1 | ε
A → 0A | 1A | 0
B → 0B | 1B | 1

Classification of grammars
Definition: A grammar G is said to be
1) Right-linear if each production in P is of the form A → xB or A → x, where A and B are in V and x is in T*.
2) Context-free if each production is of the form A → α, where A is in V and α is in (V ∪ T)*.
3) Context-sensitive if each production is of the form α → β, where |α| ≤ |β|.
4) A grammar with no restrictions as above is called unrestricted.
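The four definitions translate directly into checks on the shape of each production. The sketch below assumes single-character symbols and productions given as (left, right) string pairs; classify is a hypothetical helper name, and it reports the first (most restrictive) class whose condition every production satisfies.

```python
def classify(productions, nonterminals):
    """Return the most restrictive class, per the definitions above,
    that every production satisfies."""
    def right_linear(lhs, rhs):
        if len(lhs) != 1 or lhs not in nonterminals:
            return False
        # Allow at most one nonterminal, and only as the last symbol.
        body = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
        return not any(sym in nonterminals for sym in body)

    def context_free(lhs, rhs):
        return len(lhs) == 1 and lhs in nonterminals

    def context_sensitive(lhs, rhs):
        return len(lhs) <= len(rhs)

    checks = [
        ("right-linear", right_linear),
        ("context-free", context_free),
        ("context-sensitive", context_sensitive),
    ]
    for name, check in checks:
        if all(check(lhs, rhs) for lhs, rhs in productions):
            return name
    return "unrestricted"

# G_1 has the production 0A -> 00A1 (not context-free) and A -> epsilon
# (which shrinks the string), so it fits none of the restricted classes.
print(classify([("S", "0A1"), ("0A", "00A1"), ("A", "")], {"S", "A"}))
# prints unrestricted
```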

If a language L is generated by a ‘type x’ grammar, then L is said to be a ‘type x’ language. Right-linear grammars generate right-linear languages, context-free grammars (CFGs) generate context-free languages (CFLs), and context-sensitive grammars (CSGs) generate context-sensitive languages (CSLs). The four types are referred to as the Chomsky hierarchy.

Equivalence of regular languages and right-linear languages

Lemma 1: (1) ∅, (2) {ε}, (3) {a} for all a in Σ are right-linear languages.
Proof: In each case we construct a right-linear grammar generating the language.
(1) G = ({S}, Σ, ∅, S) is a right-linear grammar such that L(G) = ∅.
(2) G = ({S}, Σ, {S → ε}, S) is a right-linear grammar such that L(G) = {ε}.
(3) G_a = ({S}, Σ, {S → a}, S) is a right-linear grammar such that L(G_a) = {a}.

Lemma 2: If L_1 and L_2 are right-linear languages, then (1) L_1 ∪ L_2, (2) L_1L_2, (3) L_1* are right-linear languages.
Proof: Since L_1 and L_2 are right-linear languages, we can assume that there are right-linear grammars G_1 = (N_1, Σ, P_1, S_1) and G_2 = (N_2, Σ, P_2, S_2) such that L(G_1) = L_1 and L(G_2) = L_2. We can assume that N_1 and N_2 are disjoint (otherwise rename the symbols). Define the following grammars:
(1) G_3 = (N_1 ∪ N_2 ∪ {S_3}, Σ, P_1 ∪ P_2 ∪ {S_3 → S_1 | S_2}, S_3)

(2) G_4 = (N_1 ∪ N_2, Σ, P_4, S_1) where P_4 is defined as follows:
(i) if A → xB is in P_1, then A → xB is in P_4
(ii) if A → x is in P_1, then A → xS_2 is in P_4
(iii) all productions in P_2 are in P_4
(3) G_5 = (N_1 ∪ {S_5}, Σ, P_5, S_5) where P_5 is defined as follows:
(i) if A → xB is in P_1, then A → xB is in P_5
(ii) if A → x is in P_1, then A → xS_5 and A → x are in P_5
(iii) S_5 → S_1 | ε are in P_5
Claim: (1) L(G_3) = L(G_1) ∪ L(G_2), (2) L(G_4) = L(G_1)L(G_2), and (3) L(G_5) = (L(G_1))*.

Proof of claim (2): If S_1 ⇒* w_1 in G_1 then S_1 ⇒* w_1S_2 in G_4. If S_2 ⇒* x in G_2 then S_2 ⇒* x in G_4. So L(G_1)L(G_2) ⊆ L(G_4). Now suppose S_1 ⇒* w in G_4. Since G_4 has no productions of the form A → x that came from G_1, we can write the derivation in the form S_1 ⇒* xS_2 ⇒* xy where w = xy. By construction, there must be derivations S_1 ⇒* x in G_1 and S_2 ⇒* y in G_2. Hence L(G_4) ⊆ L(G_1)L(G_2). Proofs of claims 1 and 3 are left as an exercise.
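The three constructions of Lemma 2 can be tried out on small grammars. In the sketch below a right-linear grammar is a triple (nonterminals, productions, start), and a production A → xB is the tuple (A, x, B), with B set to None for A → x. This representation, the helper names, and the default names "S3" and "S5" for the new start symbols are assumptions made for illustration.

```python
def language(grammar, max_len):
    """All words of length <= max_len generated by a right-linear grammar."""
    _, productions, start = grammar
    seen = {("", start)}
    stack = [("", start)]
    words = set()
    while stack:
        prefix, nonterminal = stack.pop()
        for a, x, b in productions:
            if a != nonterminal or len(prefix + x) > max_len:
                continue
            if b is None:
                words.add(prefix + x)            # A -> x ends the derivation
            elif (prefix + x, b) not in seen:
                seen.add((prefix + x, b))        # A -> xB continues it
                stack.append((prefix + x, b))
    return words

def union(g1, g2, new_start="S3"):
    (n1, p1, s1), (n2, p2, s2) = g1, g2
    extra = [(new_start, "", s1), (new_start, "", s2)]   # S3 -> S1 | S2
    return (n1 | n2 | {new_start}, p1 + p2 + extra, new_start)

def concat(g1, g2):
    (n1, p1, s1), (n2, p2, s2) = g1, g2
    # A -> x in P1 becomes A -> x S2; A -> xB is copied unchanged.
    p4 = [(a, x, s2 if b is None else b) for a, x, b in p1] + p2
    return (n1 | n2, p4, s1)

def star(g1, new_start="S5"):
    n1, p1, s1 = g1
    p5 = list(p1)
    p5 += [(a, x, new_start) for a, x, b in p1 if b is None]  # A -> x S5
    p5 += [(new_start, "", s1), (new_start, "", None)]        # S5 -> S1 | epsilon
    return (n1 | {new_start}, p5, new_start)

# Hypothetical inputs: L(g1) = {a, aa, aaa, ...} and L(g2) = {b}
g1 = ({"S1"}, [("S1", "a", "S1"), ("S1", "a", None)], "S1")
g2 = ({"S2"}, [("S2", "b", None)], "S2")
print(sorted(language(concat(g1, g2), 3)))
```

Comparing language(...) on the constructed grammar with the expected set for a small length bound is only a sanity check, not a substitute for the proof of the claim.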

Theorem: A regular language is a right-linear language.
Proof: Follows from Lemma 1 and inductive applications of Lemma 2 on the construction of the regular language.

Regular Expression Equations
X = aX + b, where a and b are regular expressions and X is an unknown, is a regular expression equation. It can be verified that a*b is a solution of the equation.
A set of regular expression equations is said to be in standard form over a set of indeterminates Δ = {X_1, …, X_n} if for each X_i there is an equation of the form
X_i = a_i0 + a_i1 X_1 + … + a_in X_n
where the a_ij are regular expressions over some alphabet disjoint from Δ.
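The verification that a*b solves X = aX + b is a one-line substitution, written out here as a sketch. It uses only the identity a* = aa* + ε.

```latex
\begin{align*}
a\,(a^{*}b) + b &= (aa^{*} + \varepsilon)\,b \\
                &= a^{*}b
\end{align*}
```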

Solving regular expression equations
Example:
X_1 = 0X_2 + 1X_1 + ε
X_2 = 0X_3 + 1X_2
X_3 = 0X_1 + 1X_3
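One way to solve this system is by substitution, using the rule that X = aX + b has the solution a*b. The elimination order below (X_3 first, then X_2, then X_1) is a choice made for this sketch; other orders give equivalent expressions.

```latex
\begin{align*}
X_3 &= 1X_3 + 0X_1 &&\text{so } X_3 = 1^{*}0X_1 \\
X_2 &= 1X_2 + 0X_3 &&\text{so } X_2 = 1^{*}0X_3 = 1^{*}01^{*}0X_1 \\
X_1 &= 1X_1 + 0X_2 + \varepsilon
     = (1 + 01^{*}01^{*}0)X_1 + \varepsilon &&\text{so } X_1 = (1 + 01^{*}01^{*}0)^{*}
\end{align*}
```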

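Theorem: The language defined by a right-linear grammar is a regular set.
Proof: Construct a set of regular expression equations from the productions, one equation per nonterminal: each production A → xB contributes the term xB to the equation for A, and each production A → x contributes the term x. The solution for the start symbol is a regular expression denoting the language defined by the right-linear grammar.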

Example: Construct a regular expression equivalent to
S → 0A | 1S | ε
A → 0B | 1A
B → 0S | 1B
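A candidate answer can be checked mechanically. The sketch below assumes the candidate (1 + 01*01*0)*, which is what solving the corresponding equations by substitution yields, and compares it with the grammar on every binary string up to a small length. The dictionary layout and the function name are illustrative, and the simulation only handles productions of the forms A → aB and A → ε, which is all this grammar uses.

```python
import re
from itertools import product

# Productions of the example grammar as (terminal, next nonterminal);
# ("", None) stands for the epsilon production.
PRODUCTIONS = {
    "S": [("0", "A"), ("1", "S"), ("", None)],
    "A": [("0", "B"), ("1", "A")],
    "B": [("0", "S"), ("1", "B")],
}

def generates(word):
    """True if the grammar derives word, tracking the reachable nonterminals."""
    current = {"S"}
    for ch in word:
        current = {b for nt in current
                   for x, b in PRODUCTIONS[nt] if x == ch and b is not None}
    # The derivation can only finish through an epsilon production.
    return any(x == "" and b is None
               for nt in current for x, b in PRODUCTIONS[nt])

# Candidate regular expression, in Python's re syntax.
CANDIDATE = re.compile(r"(1|01*01*0)*")

mismatches = [
    "".join(bits)
    for n in range(9)
    for bits in product("01", repeat=n)
    if generates("".join(bits)) != bool(CANDIDATE.fullmatch("".join(bits)))
]
print(mismatches)  # prints []
```

An empty list means the two agree on all strings of length at most 8. That is evidence for the candidate, not a proof of equivalence.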

Definitions:
1) A grammar G = (V, T, P, S) is said to be left-linear if each production is of the form A → Bx or A → x, where A and B are in V and x is in T*.
2) A grammar G = (V, T, P, S) is said to be regular if every production takes one of the forms A → aB or A → a or S → ε, where A and B are in V and a is in T. If S → ε is a production, then S does not appear on the right-hand side of any production.

Example:
A → B | C
B → 0B | 1B | 011
C → 0D | 1C | ε
D → 0C | 1D

Let M = (Q, Σ, δ, q_0, F) be a DFA. Define a grammar G = (Q, Σ, P, q_0) where P is defined as follows:
P = {B → aC | δ(B, a) = C} ∪ {B → a | δ(B, a) is in F}
If q_0 is in F, add q_0 → ε to P. Then L(M) = L(G).
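The construction above is short enough to run. In this sketch the DFA transition function is a dictionary keyed by (state, symbol), and a production B → aC is the tuple (B, a, C), with C set to None for B → a and ("q", "", None) for q → ε. The example DFA (an even number of 1s, with hypothetical state names E and O) and all function names are assumptions made for illustration.

```python
from itertools import product

def dfa_to_grammar(delta, start, finals):
    """Build the production set P from a DFA, following the construction above."""
    productions = set()
    for (state, symbol), target in delta.items():
        productions.add((state, symbol, target))      # B -> aC
        if target in finals:
            productions.add((state, symbol, None))    # B -> a
    if start in finals:
        productions.add((start, "", None))            # q0 -> epsilon
    return productions

def generates(productions, start, word):
    """True if the grammar derives word."""
    current = {start}
    for i, ch in enumerate(word):
        is_last = i == len(word) - 1
        # The last symbol may be produced by a terminating production B -> a.
        if is_last and any(b in current and a == ch and c is None
                           for b, a, c in productions):
            return True
        current = {c for b, a, c in productions
                   if b in current and a == ch and c is not None}
    # Otherwise the derivation must end with an epsilon production.
    return any((b, "", None) in productions for b in current)

def dfa_accepts(delta, start, finals, word):
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals

# Hypothetical DFA: binary strings with an even number of 1s.
DELTA = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
P = dfa_to_grammar(DELTA, "E", {"E"})
agree = all(
    generates(P, "E", "".join(w)) == dfa_accepts(DELTA, "E", {"E"}, "".join(w))
    for n in range(7) for w in product("01", repeat=n)
)
print(agree)  # prints True
```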

[Figure: state diagram of a finite automaton with states A, B, C and transitions labeled a and b]

Let G = (V, T, P, S) be a regular grammar. Define a finite automaton M = (V ∪ {f}, T, δ, S, {f}), where f is a new state not in V and
δ(q, a) contains p if q → ap is in P
δ(q, a) contains f if q → a is in P
δ(S, ε) contains f if S → ε is a production
M is nondeterministic in general. Then L(G) = L(M).
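The reverse construction can be sketched the same way. Productions are tuples (q, a, p), with p set to None for q → a and ("S", "", None) for S → ε; the automaton is simulated as an NFA with ε-moves. The state name "f", the example grammar, and the function names are assumptions chosen for this illustration.

```python
def grammar_to_nfa(productions, start, final="f"):
    """Build the transition relation of M; the key (q, "") is an epsilon-move."""
    delta = {}
    for q, a, p in productions:
        target = final if p is None else p   # q -> a and S -> epsilon lead to f
        delta.setdefault((q, a), set()).add(target)
    return delta, start, {final}

def nfa_accepts(delta, start, finals, word):
    def closure(states):
        # Follow epsilon-moves until no new state is reached.
        result, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for p in delta.get((q, ""), ()):
                if p not in result:
                    result.add(p)
                    stack.append(p)
        return result

    current = closure({start})
    for ch in word:
        current = closure({p for q in current for p in delta.get((q, ch), ())})
    return bool(current & finals)

# Hypothetical regular grammar: S -> 0A | 1 | epsilon, A -> 0A | 1
GRAMMAR = [("S", "0", "A"), ("S", "1", None), ("S", "", None),
           ("A", "0", "A"), ("A", "1", None)]
delta, start, finals = grammar_to_nfa(GRAMMAR, "S")
print([w for w in ["", "1", "01", "0001", "0", "10", "011"]
       if nfa_accepts(delta, start, finals, w)])
# prints ['', '1', '01', '0001']
```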
