AUTOMATA THEORY
Chapter 05 CONTEXT-FREE GRAMMARS AND LANGUAGES
Introduction Context-free grammars (CFGs) have played a central role in compiler technology since the 1960s. They turned the implementation of parsers from a time-consuming, ad hoc task into a routine one. Parsers are functions that discover the structure of a program.
An informal example Let us consider the language of palindromes. A palindrome is a string that reads the same forward and backward, such as otto or madamimadam. We describe only the palindromes over the alphabet {0,1}, for example 0110 and 11011.
A Context-free Grammar for Palindromes
1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
These productions cover palindromes over binary strings only.
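As an illustration, the five productions can be read directly as a recursive test. The sketch below is only one possible reading; the function name derivable_from_P is an assumption made for this example and does not come from the slides.

```python
def derivable_from_P(w: str) -> bool:
    """Return True if w can be derived from P with the five productions."""
    if any(ch not in "01" for ch in w):
        return False            # the grammar only covers binary strings
    if len(w) <= 1:
        return True             # P → ε, P → 0, P → 1
    # P → 0P0 or P → 1P1: both ends must match, the middle must derive from P
    return w[0] == w[-1] and derivable_from_P(w[1:-1])


for s in ["0110", "11011", "011"]:
    print(s, derivable_from_P(s))
```

Each recursive call strips one matching pair of end symbols, which mirrors one application of production 4 or 5.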
Definition of CFG A CFG is a way of describing a language by recursive rules called productions. A CFG consists of:
1. A finite set of terminals (also called symbols or terminal symbols).
2. A finite set of variables (also called nonterminals).
3. A start symbol (also called the start variable).
4. A finite set of productions (also called rules).
Definition of CFG (continued) Each production consists of:
a. The head of the production, which is a variable.
b. The production symbol →.
c. The body of the production, a string of zero or more terminals and variables.
Definition of CFG (continued) The four components of a CFG G can be represented as G = (V, T, P, S), where V is the set of variables, T is the set of terminals, P is the set of productions, and S is the start variable.
A Context-free Grammar for Palindromes The grammar G_pal for the palindromes is represented by G_pal = ({P}, {0,1}, A, P), where A represents the set of five productions:
P → ε
P → 0
P → 1
P → 0P0
P → 1P1
(binary strings only)
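The four-component definition maps naturally onto a small record type. The following is a minimal sketch under the assumption that every symbol is a single character; the class name Grammar, its field names, and the method is_well_formed are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    variables: frozenset      # V
    terminals: frozenset      # T
    productions: tuple        # P, as (head, body) pairs; body "" stands for ε
    start: str                # S

    def is_well_formed(self) -> bool:
        """Check the conditions stated in the definition of a CFG."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


G_pal = Grammar(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(("P", ""), ("P", "0"), ("P", "1"), ("P", "0P0"), ("P", "1P1")),
    start="P",
)
print(G_pal.is_well_formed())
```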
Example of CFG A CFG for simple expressions in which the operators ‘+’ and ‘*’ are present. Identifiers use only the letters ‘a’ and ‘b’ and the digits ‘0’ and ‘1’. Every identifier must begin with a or b, which may be followed by any string in {a,b,0,1}*.
G = ({E, I}, T, P, E), where T = {0, 1, a, b, +, *, (, )} and P is the set of productions:
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
Derivation using grammar Derivation of (ab+ab0):
E ⇒ (E) ⇒ (E+E) ⇒ (I+E) ⇒ (Ib+E) ⇒ (ab+E) ⇒ (ab+I) ⇒ (ab+I0) ⇒ (ab+Ib0) ⇒ (ab+ab0)
Productions: 1. E → I 2. E → E+E 3. E → E*E 4. E → (E) 5. I → a 6. I → b 7. I → Ia 8. I → Ib 9. I → I0 10. I → I1
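A derivation can be checked mechanically: each step must replace exactly one variable by one of its production bodies. The sketch below does this for the derivation of (ab+ab0); the names PRODUCTIONS, one_step and is_derivation are assumptions made for this example, and it relies on every symbol being a single character.

```python
PRODUCTIONS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def one_step(alpha: str, beta: str) -> bool:
    """True if beta results from alpha by rewriting one variable occurrence."""
    for i, sym in enumerate(alpha):
        for body in PRODUCTIONS.get(sym, []):
            if alpha[:i] + body + alpha[i + 1:] == beta:
                return True
    return False


def is_derivation(steps: list) -> bool:
    """True if every consecutive pair of sentential forms is a valid step."""
    return all(one_step(a, b) for a, b in zip(steps, steps[1:]))


STEPS = ["E", "(E)", "(E+E)", "(I+E)", "(Ib+E)", "(ab+E)",
         "(ab+I)", "(ab+I0)", "(ab+Ib0)", "(ab+ab0)"]
print(is_derivation(STEPS))
```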
Example of CFG A CFG for syntactically correct infix algebraic expressions in the variables x, y and z.
G = ({S}, T, P, S), where T = {x, y, z, -, +, *, /, (, )} and P is the set of productions:
S → x
S → y
S → z
S → S + S
S → S - S
S → S * S
S → S / S
S → ( S )
Derivation using grammar Productions: S → x, S → y, S → z, S → S + S, S → S - S, S → S * S, S → S / S, S → ( S )
LMD and RMD
LMD (Leftmost Derivation): At each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation. We indicate that a derivation is leftmost by using the relations ⇒_lm and ⇒*_lm, for one step and for zero or more steps, respectively.
RMD (Rightmost Derivation): At each step we replace the rightmost variable by one of its production bodies. Such a derivation is called a rightmost derivation. We indicate that a derivation is rightmost by using the relations ⇒_rm and ⇒*_rm, for one step and for zero or more steps, respectively.
Leftmost Derivation
CFG: E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
LMD of a*(a+b00):
E ⇒_lm E*E ⇒_lm I*E ⇒_lm a*E ⇒_lm a*(E) ⇒_lm a*(E+E) ⇒_lm a*(I+E) ⇒_lm a*(a+E) ⇒_lm a*(a+I) ⇒_lm a*(a+I0) ⇒_lm a*(a+I00) ⇒_lm a*(a+b00)
Rightmost Derivation
CFG: E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
RMD of a*(a+b00):
E ⇒_rm E*E ⇒_rm E*(E) ⇒_rm E*(E+E) ⇒_rm E*(E+I) ⇒_rm E*(E+I0) ⇒_rm E*(E+I00) ⇒_rm E*(E+b00) ⇒_rm E*(I+b00) ⇒_rm E*(a+b00) ⇒_rm I*(a+b00) ⇒_rm a*(a+b00)
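Whether a derivation is leftmost or rightmost can also be checked step by step: in every sentential form, only the leftmost (or rightmost) variable may be rewritten. The sketch below tests both derivations of a*(a+b00); the helper names are hypothetical, and single-character symbols are assumed.

```python
PRODUCTIONS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def _follows(steps, pick):
    """Check that each step rewrites the variable chosen by pick (min or max)."""
    for alpha, beta in zip(steps, steps[1:]):
        positions = [i for i, s in enumerate(alpha) if s in PRODUCTIONS]
        if not positions:
            return False                      # nothing left to rewrite
        i = pick(positions)
        if not any(alpha[:i] + body + alpha[i + 1:] == beta
                   for body in PRODUCTIONS[alpha[i]]):
            return False
    return True


def is_leftmost(steps):
    return _follows(steps, min)


def is_rightmost(steps):
    return _follows(steps, max)


LMD = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)", "a*(a+E)",
       "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]
RMD = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(E+I)", "E*(E+I0)", "E*(E+I00)",
       "E*(E+b00)", "E*(I+b00)", "E*(a+b00)", "I*(a+b00)", "a*(a+b00)"]
print(is_leftmost(LMD), is_rightmost(RMD))
```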
The Language of a Grammar If G = (V, T, P, S) is a CFG, the language of G, denoted L(G), is the set of terminal strings that have derivations from the start symbol. That is, L(G) = { w in T* | S ⇒* w }, where ⇒* denotes derivation in zero or more steps using the productions of G. If a language L is the language of some context-free grammar, then L is said to be a context-free language, or CFL.
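For small grammars, the definition of L(G) can be explored by brute force: expand sentential forms breadth-first and collect the terminal strings. The sketch below is an illustration only. The function name language_up_to is an assumption, symbols are single characters, and the search is only guaranteed to stop for grammars, such as the palindrome grammar, whose sentential forms cannot keep gaining variables without also gaining terminals.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from start."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            found.add(form)                   # no variables left: a terminal string
            continue
        for body in productions[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            terminals = sum(1 for s in new if s not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


PAL = {"P": ["", "0", "1", "0P0", "1P1"]}     # "" stands for ε
print(sorted(language_up_to(PAL, "P", 3), key=lambda w: (len(w), w)))
```

Expanding only the leftmost variable loses nothing here, because every terminal string with a derivation also has a leftmost derivation.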
Parse Tree A parse tree is a tree representation of derivations that shows clearly how the symbols of a terminal string are grouped into substrings. In a compiler, the parse tree is the data structure used to represent the source program. The tree structure of the source program facilitates its translation into executable code by allowing natural, recursive functions to perform the translation. In short, a parse tree is a graphical representation of a derivation.
Constructing Parse Tree Let us fix on a grammar G = (V, T, P, S). The parse trees for G are trees with the following conditions:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by either a variable, a terminal, or ε.
3. If an interior node is labeled A, and its children are labeled X1, X2, …, Xk respectively, from the left, then A → X1X2…Xk is a production.
Parse Tree Example A parse tree showing the derivation of I+E from E:
E
├── E
│   └── I
├── +
└── E
Parse Tree Example (continued) A parse tree showing the derivation P ⇒* 0110, using the productions 1. P → ε 2. P → 0 3. P → 1 4. P → 0P0 5. P → 1P1:
P
├── 0
├── P
│   ├── 1
│   ├── P
│   │   └── ε
│   └── 1
└── 0
The Yield of a Parse Tree If we look at the leaves of any parse tree and concatenate them from the left, we get a string called the yield of the tree, which is always a string derived from the root variable. Of special importance are the parse trees in which:
1. The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
2. The root is labeled by the start symbol.
The yields of these trees are exactly the strings in the language of the grammar.
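The conditions on parse trees and the definition of the yield translate into two short recursive functions. The following is a sketch only: the class Node and the functions tree_yield and is_parse_tree are names invented for this example, and the tree built at the end is the palindrome tree with yield 0110.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node: Node) -> str:
    """Concatenate the leaf labels from the left; ε contributes nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(c) for c in node.children)


def is_parse_tree(node: Node, productions: dict) -> bool:
    """Check that every interior node and its children form a production."""
    if not node.children:
        return True                           # leaf: variable, terminal, or ε
    if node.label not in productions:
        return False                          # interior nodes must be variables
    body = "".join("" if c.label == "ε" else c.label for c in node.children)
    return body in productions[node.label] and all(
        is_parse_tree(c, productions) for c in node.children
    )


PAL = {"P": ["", "0", "1", "0P0", "1P1"]}     # "" stands for ε
tree = Node("P", [
    Node("0"),
    Node("P", [Node("1"), Node("P", [Node("ε")]), Node("1")]),
    Node("0"),
])
print(tree_yield(tree), is_parse_tree(tree, PAL))
```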
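Parse tree showing a*(a+b00):
E
├── E
│   └── I
│       └── a
├── *
└── E
    ├── (
    ├── E
    │   ├── E
    │   │   └── I
    │   │       └── a
    │   ├── +
    │   └── E
    │       └── I
    │           ├── I
    │           │   ├── I
    │           │   │   └── b
    │           │   └── 0
    │           └── 0
    └── )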
Parse tree showing ( x + y ) * x - z * y / ( x + x )
Parse tree showing The man read this book
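Inference, Derivations, and Parse Trees The following are equivalent ways of showing that a terminal string w belongs to the language of a variable A: recursive inference, derivation, leftmost derivation, rightmost derivation, and a parse tree with root A and yield w.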
Self Study: Theorems 5.12, 5.14, and 5.18
Ambiguous Grammar So far we have assumed that a grammar uniquely determines a structure for each string in its language. However, not every grammar provides unique structures. When a grammar fails to provide a unique structure for some string, that is, when some string has more than one parse tree (equivalently, more than one leftmost derivation), it is known as an ambiguous grammar.
Ambiguous Grammar example Let us consider the CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
Expression: a + a*a
LMD: E ⇒ E+E ⇒ I+E ⇒ a+E ⇒ a+E*E ⇒ a+I*E ⇒ a+a*E ⇒ a+a*I ⇒ a+a*a
RMD: E ⇒ E*E ⇒ E*I ⇒ E*a ⇒ E+E*a ⇒ E+I*a ⇒ E+a*a ⇒ I+a*a ⇒ a+a*a
The two derivations correspond to two different parse trees with the same yield.
Parse tree for the LMD (Fig: tree with yield a+a*a):
E
├── E
│   └── I
│       └── a
├── +
└── E
    ├── E
    │   └── I
    │       └── a
    ├── *
    └── E
        └── I
            └── a
Parse tree for the RMD (Fig: tree with yield a+a*a):
E
├── E
│   ├── E
│   │   └── I
│   │       └── a
│   ├── +
│   └── E
│       └── I
│           └── a
├── *
└── E
    └── I
        └── a
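Ambiguity can be made concrete by counting parse trees. The sketch below counts the trees of a given string by trying every way of splitting it among the symbols of a production body. It is an illustration under stated assumptions: symbols are single characters, the grammar has no ε-productions and no cycles of unit productions, and the function name count_parse_trees is invented for this example.

```python
from functools import lru_cache


def count_parse_trees(productions: dict, start: str, w: str) -> int:
    @lru_cache(maxsize=None)
    def sym(x, i, j):
        """Number of trees with root x and yield w[i:j]."""
        if x not in productions:              # terminal: must match exactly
            return 1 if w[i:j] == x else 0
        return sum(seq(body, i, j) for body in productions[x])

    @lru_cache(maxsize=None)
    def seq(body, i, j):
        """Number of ways the symbols of body jointly yield w[i:j]."""
        if len(body) == 1:
            return sym(body, i, j)
        total = 0
        # the first symbol covers w[i:k]; each symbol covers at least one character
        for k in range(i + 1, j - len(body) + 2):
            left = sym(body[0], i, k)
            if left:
                total += left * seq(body[1:], k, j)
        return total

    return sym(start, 0, len(w))


AMBIGUOUS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}
print(count_parse_trees(AMBIGUOUS, "E", "a+a*a"))
```

A count greater than one for any string is enough to show that the grammar is ambiguous.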
Removing Ambiguity from Grammar Two causes of ambiguity in the expression grammar:
1. The precedence of operators is not respected.
2. A sequence of identical operators can group either from the left or from the right.
[Figure, Prof. Busch - LSU: two derivation trees for one expression, a good tree and a bad tree. Compute the expression result using each tree.]
Removing Ambiguity from Grammar The solution to the problem of enforcing precedence is to introduce several different variables:
1. A factor is an expression that cannot be broken apart by any adjacent operator. The only factors in our expression language are:
   i. Identifiers: it is not possible to separate the letters of an identifier by attaching an operator.
   ii. Any parenthesized expression, no matter what appears inside the parentheses.
2. A term is an expression that cannot be broken by the ‘+’ operator. A term is a product of one or more factors.
3. An expression is a sum of one or more terms.
Removing Ambiguity from Grammar Let us consider the CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
An unambiguous expression grammar:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
Unambiguous Grammar example
CFG: I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
Expression: a + a*a
Derivation (leftmost): E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
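The layering into expressions, terms, and factors is what lets a simple hand-written parser group a + a*a correctly. The sketch below is a recursive-descent parser for the unambiguous grammar. It is not taken from the slides: the left-recursive productions E → E+T and T → T*F are handled with loops, a standard substitution, and all function names are assumptions made for this example.

```python
LETTERS = {"a", "b"}
ID_CHARS = {"a", "b", "0", "1"}


def parse_expression(text: str):
    """Return a nested tuple (operator, left, right) showing the grouping."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def identifier():                 # I → a | b | Ia | Ib | I0 | I1
        nonlocal pos
        begin = pos
        if peek() not in LETTERS:
            raise SyntaxError(f"expected identifier at position {pos}")
        pos += 1
        while peek() in ID_CHARS:
            pos += 1
        return text[begin:pos]

    def factor():                     # F → I | (E)
        if peek() == "(":
            expect("(")
            node = expression()
            expect(")")
            return node
        return identifier()

    def term():                       # T → F | T*F, written as a loop
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expression():                 # E → T | E+T, written as a loop
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expression()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


print(parse_expression("a+a*a"))
```

The loops build left-nested tuples, so a+a+a groups from the left, matching the left-recursive productions.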
Inherent Ambiguity A context-free language is inherently ambiguous if every grammar for it is ambiguous. Example: L = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}
Unambiguous Grammar example Fig: the parse tree with yield a+a*a:
E
├── E
│   └── T
│       └── F
│           └── I
│               └── a
├── +
└── T
    ├── T
    │   └── F
    │       └── I
    │           └── a
    ├── *
    └── F
        └── I
            └── a
Example of CFG A CFG that generates prefix expressions with operands x and y and binary operators +, -, *. Productions:
E → x
E → y
E → +EE
E → -EE
E → *EE
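Because every production body of the prefix grammar starts with a distinct terminal, a string can be checked with a single left-to-right recursive scan. The sketch below shows the idea; the function names consume_E and is_prefix_expression are assumptions made for this example.

```python
def consume_E(s: str, i: int = 0):
    """Return the index just past one E starting at i, or None if none fits."""
    if i >= len(s):
        return None
    if s[i] in {"x", "y"}:            # E → x | y
        return i + 1
    if s[i] in {"+", "-", "*"}:       # E → +EE | -EE | *EE
        j = consume_E(s, i + 1)
        return None if j is None else consume_E(s, j)
    return None


def is_prefix_expression(s: str) -> bool:
    return consume_E(s) == len(s)


for s in ["+xy", "*+xy-xy", "x+y"]:
    print(s, is_prefix_expression(s))
```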
Example of CFG Design A CFG for the set of all strings with an equal number of a’s and b’s. Productions: S → aSbS | bSaS | ε
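A quick sanity check of this design is to generate random strings from the grammar and count the letters. The sketch below does that. The depth cut-off is an assumption introduced here so that the random expansion always terminates, and a passing check is evidence, not a proof, that every generated string is balanced.

```python
import random


def expand_S(rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Randomly expand S using S → aSbS | bSaS | ε."""
    # Assumption: beyond max_depth we always choose S → ε to force termination.
    body = "" if depth >= max_depth else rng.choice(["aSbS", "bSaS", ""])
    return "".join(
        expand_S(rng, depth + 1, max_depth) if sym == "S" else sym
        for sym in body
    )


rng = random.Random(0)
samples = [expand_S(rng) for _ in range(1000)]
print(all(s.count("a") == s.count("b") for s in samples))
```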
Example of CFG Design For the CFG G with productions S → aS | Sb | a | b, it can be shown by induction on the string length that no string in L(G) has ba as a substring.
Example of CFG Design A CFG for the regular expression 0*1(0+1)*. Productions:
S → A1B
A → 0A | ε
B → 0B | 1B | ε
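The grammar can be compared against the regular expression on all short binary strings. In the sketch below, the functions match_A, match_B and in_grammar are hand-written readings of the three productions (names assumed for this example), and the regular expression 0*1(0+1)* is written in Python syntax as 0*1[01]*.

```python
import re
from itertools import product


def match_A(w: str, i: int) -> int:
    """A → 0A | ε: skip a possibly empty run of 0s starting at i."""
    while i < len(w) and w[i] == "0":
        i += 1
    return i


def match_B(w: str, i: int) -> int:
    """B → 0B | 1B | ε: skip any run of 0s and 1s starting at i."""
    while i < len(w) and w[i] in {"0", "1"}:
        i += 1
    return i


def in_grammar(w: str) -> bool:
    """S → A1B: zeros, then a 1, then any binary string."""
    i = match_A(w, 0)
    if i >= len(w) or w[i] != "1":
        return False
    return match_B(w, i + 1) == len(w)


PATTERN = re.compile(r"0*1[01]*")
words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
print(all(in_grammar(w) == bool(PATTERN.fullmatch(w)) for w in words))
```

Consuming all leading 0s in match_A is safe because the symbol that follows A in S → A1B must be a 1.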
Applications of CFG CFGs were originally conceived as a way to describe natural languages. Two of their uses in computing:
1. Parsers
2. Markup languages (HTML, XML)
Parsers: A parse tree is a graphical representation of a derivation. Parsing is the process of determining whether a string of tokens can be generated by a grammar. A compiler may not actually construct a parse tree; however, a parser must be capable of constructing such a tree. A parser can be constructed for any grammar. The CFG is an essential concept for the implementation of parsers.
YACC Parser Generator Tools such as YACC take a CFG as input and produce a parser.
Exp : Id              {…}
    | Exp '+' Exp     {…}
    | Exp '*' Exp     {…}
    | '(' Exp ')'     {…}
    ;
Id  : 'a'             {…}
    | 'b'             {…}
    | Id 'a'          {…}
    | Id 'b'          {…}
    | Id '0'          {…}
    | Id '1'          {…}
    ;
Rules for YACC Parser Generator
1. The colon is used as the production symbol.
2. All productions with a given head are grouped together, with their bodies separated by the vertical bar.
3. The list of bodies for a given head ends with a semicolon.
4. Terminals are quoted with single quotes.
5. Variable names are unquoted.
Markup Language There is a family of languages called markup languages. The strings in these languages are documents with certain marks (called tags) in them. Tags tell us something about the semantics of various strings within the document.
a) The text as viewed:
The things I hate:
1. ABC xyz
2. AB ABC XYZ xy
b) The HTML source marks up the same text (The things I hate ABC xyz AB ABC XYZ xy) with these tags:
EM: emphasized string
P: paragraph
OL: ordered list
LI: list item
1. Char → a | A | …
2. Text → ε | Char Text
3. Doc → ε | Element Doc
4. Element → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL> | …
5. ListItem → <LI> Doc
6. List → ε | ListItem List
Thank You