Published by Abner Woods. Modified over 9 years ago.
1
Lecture 5: Grammars
Topics
  Moving on from Lexical Analysis
  Grammars
  Derivations
  CFLs
Readings: 4.1
January 25, 2006
CSCE 531 Compiler Construction
2
CSCE 531 Spring 2006
Overview
Last Time
  Symbol table: hash table from K&R
  DFA review
  Simulating DFA, figure 3.22
  NFAs
  Thompson Construction: re → NFA
  Examples
  NFA → DFA, the subset construction
  ε-closure(s), ε-closure(T), move(T,a)
Today’s Lecture
  Flex example, Fig 3.28 revisited
References
3
Pop Quiz (I will be a couple of minutes late)
  Draw the NFA that recognizes (00 | 11)* (01 | 10).
  Given an NFA M_R that recognizes the language denoted by a regular expression R, build a machine that recognizes R_even, that is, one that matches R an even number of times.
4
Lexical analyzer for subset of C
  int constants: int, octal, hex
  Float constants
  C identifiers
  Keywords: for, while, if, else
  Relational operators: >= <= != ==
  Arithmetic, Boolean and bit operators: + - * / && || ! ~ & |
  Other symbols: ; { } [ ] * ->
5
Write core.l Flex Specification
  Due Monday Jan 30
  Notes
    Install identifiers and constants into the symbol table.
    Return a separate token code for each relational operator. Not as in text!!
Homework 02
  Due Thursday Jan 26 (now Saturday 28)
  Construct an NFA recognizing (a|b|ε)(ab)*
  Convert it to a DFA
6
Flex example Fig 3.28 revisited
  /class/csce531-001/Examples/Flex
  Put “e=/class/csce531-001/Examples/” in your .bash_profile in your home directory (the leading period makes the file hidden).
  Then when you log in you can use “cd $e” to move to the Examples directory.
  Files
    ex0.l, ex1.l (note the last character is a lowercase “L”)
    ex3.18.l, Makefile, y.tab.h
  Fixed a few things so it would actually compile and run.
7
Building and Running ex3.18
Preliminary steps
    cp $e/Flex/ex3.18.l .     # copy lex-spec to current directory
    cp $e/Flex/Makefile .
    cp $e/Flex/y.tab.h .
    flex ex3.18.l             # creates the file lex.yy.c
    ls
    gcc lex.yy.c -lfl
    ./a.out
    if then else xbar
    (output)
8
Routines section
    %%
    int main(void) {
        int tok;
        /* yylex() returns 0 at end of input, so loop until it returns 0 */
        while ((tok = yylex()) != 0) {
            printf("Token code %d\t lexeme %s \n", tok, yytext);
        }
        return 0;
    }
    /* Code for install_id() and install_num(); stubs for now */
    int install_id() { return 0; }
    int install_num() { return 0; }
9
Regular Languages
  Regular Expressions
  NFA
  DFA
All specify/recognize the same languages; in formal language theory these languages are called regular.
10
Example of a Non-Regular Language
L = { 0^n 1^n | n > 0 } is non-regular.
Proof: Suppose that L were a regular language. Then there would exist some DFA M that accepts L. Suppose that M has k states. Consider the collection of strings
  0, 00, 000, …, 0^k, 0^(k+1)
By the pigeonhole principle, if you start at q0 and follow the paths determined by the k+1 strings above, two of the strings, say 0^i and 0^j with i ≠ j, leave you in the same state q. Because 0^i 1^i is in L, following the path determined by the string 1^i from state q must leave you in a final state. But then 0^j 1^i must be accepted also, even though i ≠ j. This is a contradiction, which proves that L is not regular. QED
Intuitively, a DFA can count only a finite (bounded) number of things.
The language of balanced parentheses is also non-regular.
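To make the counting intuition concrete, here is a minimal sketch of a recognizer for L that is not a DFA: it keeps two counters that can grow without bound, which is exactly what a fixed set of k states cannot do. The function name is_0n1n is hypothetical and not part of the course materials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true iff s is in L = { 0^n 1^n | n > 0 }.
   is_0n1n is a hypothetical name used only for this sketch. */
bool is_0n1n(const char *s)
{
    assert(s != NULL);
    size_t zeros = 0, ones = 0;

    while (*s == '0') { zeros++; s++; }   /* count the leading 0s */
    while (*s == '1') { ones++; s++; }    /* count the 1s that follow */

    /* Accept only if the whole input was consumed, there is at least
       one 0, and the two (unbounded) counts agree. */
    return *s == '\0' && zeros > 0 && zeros == ones;
}
```

The counters are the part a DFA cannot imitate: their possible values are not bounded by any fixed number of states.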
11
Moving on Up to the Parsing Side
Lexical analysis can’t do it all.
Syntax analysis recognizes things from context. It is the process of discovering the structure of some sentence or program.
  Need a mathematical model of syntax: a grammar G
  Need an algorithm for testing membership in L(G)
12
The Role of the Parser
Figure 4.1
13
Context Free Grammars
A context free grammar is a formal mathematical model that has 4 components, G = (N, T, P, S), where
  N is a set of grammar symbols called nonterminals
  T is a set of terminals (or tokens)
  P is a set of productions or rewrite rules of the form
      Nonterminal → string of grammar symbols
      E.g., N → a b N
      Terminology: left hand side, right hand side
      grammar symbols = N ∪ T
  S is the start symbol (a nonterminal)
Generally a grammar is specified by listing the productions.
14
Example Context Free Grammars
Example: G = (N, T, P, S)
  N = {S, T}
  T = {a, b, c}
  P = { S → aS, S → bT, T → c }
Notational conventions
  Nonterminals are typically represented by capital letters N, T, P, S, … or lower case strings in italics, e.g., expr
  Terminals are typically represented by lower case letters a, b, … z, punctuation symbols, operators, parentheses, digits
  Unless otherwise stated, the nonterminal of the first production is the start symbol
  “|” shorthand: S → aS | bT is shorthand for the two S productions S → aS, S → bT
  Lowercase Greek letters represent strings of grammar symbols
15
Derivations
The derives (=>) relation is a binary relation between strings of grammar symbols. We define derives as below:
  If T → X1 X2 … Xn is a production and α and β are strings of grammar symbols, then we say αTβ derives αX1 X2 … Xn β and denote this by
      αTβ => αX1 X2 … Xn β
Example
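The slide's own example is not preserved in this transcript. As a small illustration, assuming the example grammar from the previous slide (S → aS | bT, T → c), each one-step derivation below rewrites a single nonterminal using one production:

```latex
\begin{align*}
S &\Rightarrow aS   && \text{using } S \to aS \\
  &\Rightarrow aaS  && \text{using } S \to aS \\
  &\Rightarrow aabT && \text{using } S \to bT \\
  &\Rightarrow aabc && \text{using } T \to c
\end{align*}
```

In the third step, for instance, the definition applies with α = aa, β = ε, and the production S → bT.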
16
Review of Properties of Binary Relations
If R is a binary relation on A then R is
  a subset of A x A
  Symmetric if a R b implies b R a
  Transitive if a R b and b R c implies a R c
The transitive closure of R is the minimal subset of A x A that contains R and is a transitive relation.
Henceforth we will use =>* (read "derives") to denote the transitive closure of “=>”, the “one-step” derives on the previous slide.
  α =>* β means α => α1 => α2 => … => αn = β
Thus α =>* β means that one can apply a production to rewrite α as α1, then apply a production to α1 to obtain α2, and so on, eventually obtaining β.
17
Derivations and Sentential Forms
If α =>* β, that is, α => α1 => α2 => … => αn = β, then we say the sequence of rewrites forms a derivation of β from α.
The purpose of a grammar is to rewrite strings of grammar symbols until we obtain a string of terminals (tokens).
If G = (N, T, P, S) is a grammar then α is a sentential form if
  the start symbol derives α: S =>* α
  α derives a string of tokens: α =>* ω, where ω ∈ T*
Or, written more concisely, S =>* α =>* ω, where ω ∈ T*
18
Language Generated by a Grammar
If G = (N, T, P, S) is a grammar then the language generated by G, denoted by L(G), is
  L(G) = { x ∈ T* | S =>* x }
Example
  S → 0 S 1 | ε
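One way to see what L(G) is for the example grammar is to turn each production into code. The following is a minimal sketch, not the course's method: parse_S and in_language are hypothetical names, and the choice between the two productions is made by looking at the next input character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to match one S starting at s[*pos], advancing *pos past the match.
   Grammar: S -> 0 S 1 | epsilon */
static bool parse_S(const char *s, size_t *pos)
{
    if (s[*pos] == '0') {              /* choose S -> 0 S 1 */
        (*pos)++;
        if (!parse_S(s, pos)) return false;
        if (s[*pos] != '1') return false;
        (*pos)++;
        return true;
    }
    return true;                       /* choose S -> epsilon */
}

/* True iff the whole string s can be derived from S. */
bool in_language(const char *s)
{
    assert(s != NULL);
    size_t pos = 0;
    return parse_S(s, &pos) && s[pos] == '\0';
}
```

Because of the ε production, the empty string is accepted here, so this grammar generates { 0^n 1^n | n ≥ 0 }.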
19
Parse Trees
A parse tree is a graphical presentation of a derivation, satisfying
  The root is the start symbol
  Each leaf is a token or ε (note different font from text)
  Each interior node is a nonterminal
  If A is a parent with children X1, X2 … Xn then A → X1 X2 … Xn is a production
20
Top down vs. Bottom Up Construction of Parse Trees
G:
  S → (E)*S
  S → (E)
  E → id + id
[Figure: parse tree with root S and children ( E ) * S, where E → id + id and the inner S → ( E )]
21
Bottom Up Construction of Parse Trees
G:
  S → (E)*S
  S → (E)
  E → id + id
X * Y + Z * W
22
Leftmost (Rightmost) Derivations
A derivation S =>* ω, where ω ∈ (N ∪ T)*, is called leftmost if at each step you rewrite the leftmost nonterminal in the sentential form.
If we want to emphasize that this is a leftmost derivation we will write S =>*LM ω, read "S leftmost derives ω".
Example
  E → E + E
  E → E * E
  E → id
We will henceforth use the ‘|’ shorthand and write this grammar as
  E → E + E | E * E | id
Rightmost derivations are defined in a similar manner.
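As a short worked illustration with the grammar E → E + E | E * E | id, here are a leftmost and a rightmost derivation of the string id + id. Both describe the same parse tree; they differ only in the order in which nonterminals are rewritten.

```latex
\begin{align*}
\text{Leftmost:}  \quad & E \Rightarrow E + E \Rightarrow \mathbf{id} + E \Rightarrow \mathbf{id} + \mathbf{id} \\
\text{Rightmost:} \quad & E \Rightarrow E + E \Rightarrow E + \mathbf{id} \Rightarrow \mathbf{id} + \mathbf{id}
\end{align*}
```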
23
Ambiguity
A grammar is ambiguous if there is a string of terminals that has two distinct parse trees (or two distinct leftmost derivations, or two distinct rightmost derivations).
Example: E → E + E | E * E | id
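To see the ambiguity concretely, the string id + id * id has two distinct leftmost derivations in the grammar E → E + E | E * E | id. The first starts with the + production and groups the input as id + (id * id); the second starts with the * production and groups it as (id + id) * id.

```latex
\begin{align*}
\text{(1)} \quad & E \Rightarrow E + E \Rightarrow \mathbf{id} + E \Rightarrow \mathbf{id} + E * E
  \Rightarrow \mathbf{id} + \mathbf{id} * E \Rightarrow \mathbf{id} + \mathbf{id} * \mathbf{id} \\
\text{(2)} \quad & E \Rightarrow E * E \Rightarrow E + E * E \Rightarrow \mathbf{id} + E * E
  \Rightarrow \mathbf{id} + \mathbf{id} * E \Rightarrow \mathbf{id} + \mathbf{id} * \mathbf{id}
\end{align*}
```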
24
Eliminating Ambiguity
The approach taken is to rewrite the grammar.
However, for certain languages every grammar that generates the language is ambiguous. These languages are called inherently ambiguous languages.
We will not consider any of these languages in this class.
25
Consider the grammar for expressions
  E → E + E | E * E | id
26
Derivations and Precedence
This grammar has no notion of precedence!
To add precedence
  Create a non-terminal for each level of precedence
  Isolate the corresponding part of the grammar
  Force the parser to recognize high precedence subexpressions first
For algebraic expressions
  Multiplication and division, first (level one)
  Subtraction and addition, next (level two)
27
Rewriting the Expression Grammar
Add nonterminals for each level of precedence
  Term (product) for components of sums
  Factor for components of products (terms)
Grammar
  Expr → Expr + Term
  Expr → Expr - Term
  Expr → Term
  Term → Term * Factor
  Term → Term / Factor
  Term → Factor
  Factor → ID
  Factor → NUMBER
  Factor → ( Expr )
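The effect of the layered grammar can be sketched with a small evaluator that has one function per nonterminal. Several assumptions apply: Term is taken to use * and / (as the precedence levels on the previous slide indicate), the left-recursive productions are implemented with loops rather than recursion, only NUMBER and parenthesized factors are handled (ID is omitted), and malformed input simply trips an assert. All function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;                  /* cursor into the input string */

static long expr(void);

static void skip_spaces(void) { while (*p == ' ') p++; }

/* Factor -> NUMBER | ( Expr )      (ID omitted in this sketch) */
static long factor(void)
{
    long v = 0;
    skip_spaces();
    if (*p == '(') {
        p++;
        v = expr();
        skip_spaces();
        assert(*p == ')');
        p++;
        return v;
    }
    assert(isdigit((unsigned char)*p));
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* Term -> Term * Factor | Term / Factor | Factor, written as a loop */
static long term(void)
{
    long v = factor();
    for (;;) {
        skip_spaces();
        if (*p == '*') { p++; v *= factor(); }
        else if (*p == '/') { p++; long d = factor(); assert(d != 0); v /= d; }
        else return v;
    }
}

/* Expr -> Expr + Term | Expr - Term | Term, written as a loop */
static long expr(void)
{
    long v = term();
    for (;;) {
        skip_spaces();
        if (*p == '+') { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

/* Evaluate a complete expression string. */
long eval(const char *s)
{
    assert(s != NULL);
    p = s;
    long v = expr();
    skip_spaces();
    assert(*p == '\0');                /* the whole input must be consumed */
    return v;
}
```

Because expr only ever combines complete terms, a product is always finished before it takes part in a sum, which is how the grammar encodes precedence. The loops also make - and / group to the left, matching the left-recursive productions.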
28
Derivation of 5 * X + 3 * Y
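The derivation itself is not preserved in this transcript. The following leftmost derivation is one way to write it out, assuming the precedence grammar with Term → Term * Factor, and with the lexemes 5 and 3 scanned as NUMBER and X and Y scanned as ID:

```latex
\begin{align*}
\mathit{Expr} &\Rightarrow \mathit{Expr} + \mathit{Term} \\
 &\Rightarrow \mathit{Term} + \mathit{Term} \\
 &\Rightarrow \mathit{Term} * \mathit{Factor} + \mathit{Term} \\
 &\Rightarrow \mathit{Factor} * \mathit{Factor} + \mathit{Term} \\
 &\Rightarrow \mathrm{NUMBER} * \mathit{Factor} + \mathit{Term} \\
 &\Rightarrow \mathrm{NUMBER} * \mathrm{ID} + \mathit{Term} \\
 &\Rightarrow \mathrm{NUMBER} * \mathrm{ID} + \mathit{Term} * \mathit{Factor} \\
 &\Rightarrow \mathrm{NUMBER} * \mathrm{ID} + \mathit{Factor} * \mathit{Factor} \\
 &\Rightarrow \mathrm{NUMBER} * \mathrm{ID} + \mathrm{NUMBER} * \mathit{Factor} \\
 &\Rightarrow \mathrm{NUMBER} * \mathrm{ID} + \mathrm{NUMBER} * \mathrm{ID}
\end{align*}
```

The first step must use Expr → Expr + Term, so the two products end up as subtrees below the sum.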
29
Notes on rewritten grammar
  It is more complex; more nonterminals, more productions.
  It requires more steps in the derivation.
  But it does eliminate the ambiguity.
30
Ambiguous Grammar 2: If-else
The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar.
Classic example: the if-then-else problem
  Stmt → if Expr then Stmt
       | if Expr then Stmt else Stmt
       | other stmts
31
Ambiguity
This sentential form has two derivations:
  if Expr1 then if Expr2 then Stmt1 else Stmt2
32
Removing the ambiguity
To eliminate the ambiguity
  We must rewrite the grammar to avoid generating the problem
  We must associate each else with the innermost unmatched if
[Grammar fragment: S → withElse …]
33
Ambiguity
Removing the ambiguity
  Must rewrite the grammar to avoid generating the problem
  Match each else to innermost unmatched if
With this grammar, the example has only one derivation.
Intuition: a NoElse always has no else on its last cascaded else if statement.
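The grammar these slides refer to is not preserved in this transcript. One commonly used formulation, written here with the nonterminal names WithElse and NoElse that the slide mentions, is sketched below; treat it as an assumption about what the slide showed rather than a reconstruction of it.

```latex
\begin{align*}
\mathit{Stmt}     &\to \mathit{WithElse} \mid \mathit{NoElse} \\
\mathit{WithElse} &\to \textbf{if } \mathit{Expr} \textbf{ then } \mathit{WithElse} \textbf{ else } \mathit{WithElse}
                   \mid \text{other stmts} \\
\mathit{NoElse}   &\to \textbf{if } \mathit{Expr} \textbf{ then } \mathit{Stmt}
                   \mid \textbf{if } \mathit{Expr} \textbf{ then } \mathit{WithElse} \textbf{ else } \mathit{NoElse}
\end{align*}
```

Under this formulation, the statement between then and else must be a WithElse, so it can never be an if that is still missing its else. That restriction is what forces each else to pair with the innermost unmatched if.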
34
Ambiguity
  if Expr1 then if Expr2 then Stmt1 else Stmt2
This binds the else controlling Stmt2 to the inner if.
35
Deeper Ambiguity
Ambiguity usually refers to confusion in the CFG.
Overloading can create deeper ambiguity
  a = f(17)
In many Algol-like languages, f could be either a function or a subscripted variable.
Disambiguating this one requires context
  Need values of declarations
  Really an issue of type, not context-free syntax
  Requires an extra-grammatical solution (not in CFG)
  Must handle these with a different mechanism
Step outside the grammar rather than use a more complex grammar.
36
Regular Languages and Grammars
A grammar where all productions are of the form
  A → a or A → a B, where A, B ∈ N and a ∈ T
is called right-linear (the nonterminal, if any, is at the right end of the right hand side) or sometimes a regular grammar.
It turns out that the language generated by a right-linear grammar is a regular language. How would you prove that?
37
Context Free Languages
A language L is called a context free language (CFL) if there exists a context free grammar G that generates it, i.e., L = L(G).
38
Left recursion
  A → Aα | β
39
Elimination of Immediate Left Recursion
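The body of this slide is not preserved in the transcript. The standard transformation, stated here for the immediately left-recursive pair A → Aα | β (where β does not begin with A), introduces a new nonterminal A′ and generates the same strings βα…α with right recursion instead:

```latex
\begin{align*}
A  &\to \beta A' \\
A' &\to \alpha A' \mid \varepsilon
\end{align*}
```

As an illustration, applying it to Expr → Expr + Term | Term (so α is "+ Term" and β is "Term") gives:

```latex
\begin{align*}
\mathit{Expr}  &\to \mathit{Term}\ \mathit{Expr}' \\
\mathit{Expr}' &\to +\ \mathit{Term}\ \mathit{Expr}' \mid \varepsilon
\end{align*}
```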
40
Error handling
  Error detection
  Error recovery
41
Fig 3.27 NFA → DFA
Each row is a DFA state (a set of NFA states) followed by its transitions on a and on b.

  {0,1,2,4,7}
      a: ε-clos{3,8}  = {3,8,6,1,2,4,7}  = {1,2,3,4,6,7,8}
      b: ε-clos{5}    = {5,6,1,2,4,7}    = {1,2,4,5,6,7}
  {1,2,3,4,6,7,8}
      a: {3,6,1,2,4,7,8} = {1,2,3,4,6,7,8}  (a loop)
      b: ε-clos{5,9}  = {5,9,6,1,2,4,7}  = {1,2,4,5,6,7,9}
  {1,2,4,5,6,7}
      a: ε-clos{3,8}  = {3,8,6,1,2,4,7}  = {1,2,3,4,6,7,8}
      b: ε-clos{5}    = {5,6,1,2,4,7}    = {1,2,4,5,6,7}  (a loop)
  {1,2,4,5,6,7,9}
      a: ε-clos{3,8}  = {1,2,3,4,6,7,8}
      b: ε-clos{5,10} = {5,10,6,1,2,4,7} = {1,2,4,5,6,7,10}
  {1,2,4,5,6,7,10}
      a: ε-clos{3,8}  = {1,2,3,4,6,7,8}
      b: ε-clos{5}    = {1,2,4,5,6,7}
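The two operations used in this table can be sketched with bitmasks. The NFA edges below are an assumption: they are the usual Thompson-construction NFA for (a|b)*abb with states 0 to 10, chosen because it reproduces the closures listed on the slide. The names eps_closure and nfa_move are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSTATES 11
#define BIT(s) ((uint16_t)(1u << (s)))   /* set containing only state s */

/* Assumed NFA: eps[s] = states reachable from s on a single ε-edge. */
static const uint16_t eps[NSTATES] = {
    [0] = BIT(1) | BIT(7),
    [1] = BIT(2) | BIT(4),
    [3] = BIT(6),
    [5] = BIT(6),
    [6] = BIT(1) | BIT(7),
};
/* Assumed labelled edges: 2-a->3, 7-a->8, 4-b->5, 8-b->9, 9-b->10. */
static const uint16_t on_a[NSTATES] = { [2] = BIT(3), [7] = BIT(8) };
static const uint16_t on_b[NSTATES] = { [4] = BIT(5), [8] = BIT(9), [9] = BIT(10) };

/* ε-closure(T): add ε-successors until the set stops growing. */
uint16_t eps_closure(uint16_t T)
{
    uint16_t closure = T, prev;
    do {
        prev = closure;
        for (int s = 0; s < NSTATES; s++)
            if (closure & BIT(s)) closure |= eps[s];
    } while (closure != prev);
    return closure;
}

/* move(T, c): states reachable from some state in T on one c-edge. */
uint16_t nfa_move(uint16_t T, char c)
{
    assert(c == 'a' || c == 'b');
    const uint16_t *delta = (c == 'a') ? on_a : on_b;
    uint16_t out = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & BIT(s)) out |= delta[s];
    return out;
}
```

Each table entry is then eps_closure(nfa_move(T, c)) for a DFA state T, starting from eps_closure(BIT(0)), which is {0,1,2,4,7}.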