Published by Jean Morris. Modified over 9 years ago.
1
Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014
2
3. Context-Free Grammars and Parsing (PART TWO)
3
Contents
PART ONE: 3.1 The Parsing Process; 3.2 Context-Free Grammars; 3.3 Parse Trees and Abstract Syntax Trees
PART TWO: 3.4 Ambiguity; 3.5 Extended Notations: EBNF and Syntax Diagrams; 3.6 Formal Properties of Context-Free Languages
4
3.4 Ambiguity
5
What is Ambiguity
Parse trees and syntax trees are meant to express the structure of syntax uniquely. However, a grammar may permit a string to have more than one parse tree. For example, consider the simple integer arithmetic grammar
exp → exp op exp | ( exp ) | number
op → + | - | *
and the string 34-3*42.
6
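What is Ambiguity
This string has two different parse trees (figure not reproduced):
1. exp → exp op exp with op = *, where the left exp expands again to exp op exp for number - number and the right exp is number; this groups the string as (34-3)*42.
2. exp → exp op exp with op = -, where the left exp is number and the right exp expands again to exp op exp for number * number; this groups the string as 34-(3*42).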
7
What is Ambiguity
The two parse trees correspond to two different leftmost derivations:
exp => exp op exp => exp op exp op exp => number op exp op exp => number - exp op exp => number - number op exp => number - number * exp => number - number * number
exp => exp op exp => number op exp => number - exp => number - exp op exp => number - number op exp => number - number * exp => number - number * number
8
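What is Ambiguity
The associated syntax trees (figure not reproduced) are: one with * at the root, whose left child is a - node over 34 and 3 and whose right child is 42; and one with - at the root, whose left child is 34 and whose right child is a * node over 3 and 42.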
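To make the ambiguity concrete, here is a minimal Python sketch (not from the slides) that enumerates every parse tree the grammar exp → exp op exp | ( exp ) | number allows for a token list, then evaluates each tree. Assumptions: the input is already tokenized, any all-digit token counts as number, and trees are nested tuples (op, left, right); the names parse_trees and evaluate are hypothetical.

```python
from functools import lru_cache

OPS = {"+", "-", "*"}

def parse_trees(tokens):
    """Return every parse tree of exp -> exp op exp | ( exp ) | number.

    Trees are nested tuples: a number is its token string, ( exp ) is
    ("()", t), and exp op exp is (op, left, right).
    """
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All parse trees for exp spanning toks[i:j].
        result = []
        if j - i == 1 and toks[i].isdigit():                      # exp -> number
            result.append(toks[i])
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":  # exp -> ( exp )
            result.extend(("()", t) for t in trees(i + 1, j - 1))
        for k in range(i + 1, j - 1):                             # exp -> exp op exp
            if toks[k] in OPS:
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((toks[k], left, right))
        return tuple(result)

    return list(trees(0, len(toks)))


def evaluate(tree):
    """Compute the integer value that a tree denotes."""
    if isinstance(tree, str):
        return int(tree)
    if tree[0] == "()":
        return evaluate(tree[1])
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b if op == "-" else a * b


for t in parse_trees(["34", "-", "3", "*", "42"]):
    print(t, "=", evaluate(t))
```

The two trees evaluate to -92 for 34-(3*42) and 1302 for (34-3)*42, which is why a compiler cannot leave the choice of tree open.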
9
An Ambiguous Grammar
A grammar is ambiguous if it generates a string with two distinct parse trees (equivalently, two distinct leftmost derivations). Ambiguity represents a serious problem for a parser, because the grammar then does not specify precisely the syntactic structure of a program. In some sense, an ambiguous grammar is like a nondeterministic automaton, in which two separate paths can accept the same string.
10
An Ambiguous Grammar
Ambiguity in grammars cannot be removed nearly as easily as nondeterminism in finite automata: there is no algorithm for doing so, unlike the situation for automata. Ambiguous grammars fail the tests that we introduce later for the standard parsing algorithms. A body of standard techniques has been developed to deal with the typical ambiguities that come up in programming languages.
11
Two Basic Methods for Dealing with Ambiguity
1. A disambiguating rule specifies, in each ambiguous case, which of the parse trees (or syntax trees) is the correct one. The advantage: it corrects the ambiguity without changing (and possibly complicating) the grammar. The disadvantage: the syntactic structure of the language is no longer given by the grammar alone.
2. Change the grammar into a form that forces the construction of the correct parse tree, thus removing the ambiguity.
Of course, in either method, we must first decide which of the trees in an ambiguous case is the correct one.
12
Remove the Ambiguity in the Simple Expression Grammar
A disambiguating rule establishes the relative precedence of the three operations: give addition and subtraction the same precedence, and give multiplication a higher precedence. A further disambiguating rule fixes the associativity of each operation: specify that all three of these operations are left associative.
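A disambiguating rule can live entirely outside the grammar. The Python sketch below uses precedence climbing, a common parsing technique that these slides do not describe, to apply the two rules just stated: the table PREC puts + and - on one level below *, and the `PREC[op] + 1` bound on the right operand makes every operator left associative. The token format and the names parse and PREC are assumptions for illustration.

```python
PREC = {"+": 1, "-": 1, "*": 2}   # rule 1: + and - share a level below *

def parse(tokens):
    """Parse a token list into an (op, left, right) syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok == "(":
            tree = climb(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def climb(min_prec):
        left = primary()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            # rule 2 (left associativity): the right operand may only contain
            # operators that bind strictly tighter than op
            right = climb(PREC[op] + 1)
            left = (op, left, right)
        return left

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree
```

For example, parse("34 - 3 * 42".split()) groups 3*42 under the -, and parse("34 - 3 - 42".split()) groups (34-3) first. The grammar itself is never changed, which is exactly the advantage and the disadvantage of method 1.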
13
Remove the Ambiguity in the Simple Expression Grammar
Alternatively, specify that an operation is nonassociative: a sequence of more than one operator in an expression is not allowed. For instance, writing the simple expression grammar in the following form forces fully parenthesized expressions:
exp → factor op factor | factor
factor → ( exp ) | number
op → + | - | *
14
Remove the Ambiguity in the Simple Expression Grammar
Strings such as 34-3-42 and even 34-3*42 are now illegal and must instead be written with parentheses, such as (34-3)-42 and 34-(3*42). The disadvantage: this not only changes the grammar but also changes the language being recognized.
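The change in the language can be checked with a small recognizer. The Python sketch below follows the fully parenthesized grammar directly: each exp reads one factor and at most one operator. Assumptions: tokens are space-separated and any all-digit token counts as number; the function name accepts is hypothetical.

```python
OPS = {"+", "-", "*"}

def accepts(tokens):
    """Recognize exp -> factor op factor | factor; factor -> ( exp ) | number."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = advance()
        if tok == "(":                 # factor -> ( exp )
            exp()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif not tok.isdigit():        # factor -> number
            raise SyntaxError(f"unexpected token {tok!r}")

    def exp():
        factor()
        if peek() in OPS:              # at most one operator per exp
            advance()
            factor()

    try:
        exp()
    except SyntaxError:
        return False
    return pos == len(tokens)          # leftover tokens mean a second operator
```

With this recognizer, 34 - 3 - 42 and 34 - 3 * 42 are rejected, while ( 34 - 3 ) - 42 and 34 - ( 3 * 42 ) are accepted.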
15
3.4.2 Precedence and Associativity
16
Two Typical Kinds of Ambiguities
1. Ambiguity in arithmetic grammars, such as the simple integer arithmetic grammar
exp → exp op exp | ( exp ) | number
op → + | - | *
with the string 34-3*42.
2. The dangling else problem (discussed later).
We use precedence and associativity to solve the first kind of ambiguity.
17
Group of Equal Precedence
Precedence can be added to our simple expression grammar as follows:
exp → exp addop exp | term
addop → + | -
term → term mulop term | factor
mulop → *
factor → ( exp ) | number
Addition and subtraction will appear "higher" (closer to the root) in the parse and syntax trees, so they receive lower precedence. This grammar fixes precedence but is still ambiguous with respect to associativity (for example, 34-3-42).
18
Precedence Cascade
Grouping operators into different precedence levels in this way is called a precedence cascade, a standard method in syntactic specification using BNF. To fix associativity, replace the rule
exp → exp addop exp | term
by
exp → exp addop term | term
or by
exp → term addop exp | term
A left recursive rule makes operators associate on the left; a right recursive rule makes them associate on the right.
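One way to see why the direction of recursion fixes associativity: a left recursive rule nests earlier operators deeper on the left, and a right recursive rule nests later operators deeper on the right. The Python sketch below builds both tree shapes for a list of already-parsed terms separated by addops and evaluates them. The helper names and the tuple encoding are assumptions for illustration.

```python
def build_left(operands, operators):
    """Tree shape produced by exp -> exp addop term | term (left recursive)."""
    tree = operands[0]
    for op, operand in zip(operators, operands[1:]):
        tree = (op, tree, operand)        # earlier operators end up deeper on the left
    return tree

def build_right(operands, operators):
    """Tree shape produced by exp -> term addop exp | term (right recursive)."""
    tree = operands[-1]
    for op, operand in zip(reversed(operators), reversed(operands[:-1])):
        tree = (op, operand, tree)        # later operators end up deeper on the right
    return tree

def evaluate(tree):
    """Evaluate a tree whose internal nodes are "+" or "-"."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b

terms, ops = [34, 3, 42], ["-", "-"]
print(build_left(terms, ops), evaluate(build_left(terms, ops)))
print(build_right(terms, ops), evaluate(build_right(terms, ops)))
```

For 34-3-42 the left recursive shape gives (34-3)-42 = -11, the conventional answer, while the right recursive shape gives 34-(3-42) = 73.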
19
Removal of Ambiguity
To remove the ambiguity in the BNF rules for simple arithmetic expressions, write the rules so that all the operations are left associative:
exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number
20
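New Parse Tree
In the parse tree for the expression 34-3*42 (figure not reproduced), the - appears at the top level through exp → exp addop term, and 3*42 is grouped beneath it inside the right-hand term through term → term mulop factor.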
21
New Parse Tree
In the parse tree for the expression 34-3-42 (figure not reproduced), left recursion groups the expression as (34-3)-42. The precedence cascade causes the parse trees to become much more complex; the syntax trees, however, are not affected.
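The claim that the cascade inflates parse trees but leaves syntax trees unchanged can be checked in code. The Python sketch below builds full parse trees for the left-associative cascade grammar and then collapses them to syntax trees. Assumptions: the left recursion is realized by a loop that folds the tree built so far into a new left child, nodes are (nonterminal, children) pairs, input is well formed, and parse_tree, to_syntax, and size are hypothetical names.

```python
def parse_tree(tokens):
    """Parse tree for exp -> exp addop term | term; term -> term mulop factor
    | factor; factor -> ( exp ) | number. Nodes are (nonterminal, [children])."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        if peek() == "(":                                      # factor -> ( exp )
            advance()
            inner = exp()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return ("factor", ["(", inner, ")"])
        return ("factor", [advance()])                         # factor -> number

    def term():
        node = ("term", [factor()])                            # term -> factor
        while peek() == "*":                                   # term -> term mulop factor
            node = ("term", [node, ("mulop", [advance()]), factor()])
        return node

    def exp():
        node = ("exp", [term()])                               # exp -> term
        while peek() in ("+", "-"):                            # exp -> exp addop term
            node = ("exp", [node, ("addop", [advance()]), term()])
        return node

    return exp()


def to_syntax(node):
    """Collapse a parse tree into an (op, left, right) syntax tree."""
    if isinstance(node, str):
        return node
    _, kids = node
    if len(kids) == 1:                                         # chain rule or number leaf
        return to_syntax(kids[0])
    if kids[0] == "(":                                         # parentheses vanish
        return to_syntax(kids[1])
    left, (_, [op]), right = kids
    return (op, to_syntax(left), to_syntax(right))


def size(node):
    """Count parse tree nodes (each token leaf counts as one node)."""
    return 1 if isinstance(node, str) else 1 + sum(size(k) for k in node[1])
```

For 34 - 3 * 42 the parse tree has 15 nodes, but to_syntax reduces it to the 5-node tree ("-", "34", ("*", "3", "42")).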
22
3.4.3 The dangling else problem
23
An Ambiguous Grammar
Consider the following grammar:
statement → if-stmt | other
if-stmt → if ( exp ) statement | if ( exp ) statement else statement
exp → 0 | 1
This grammar is ambiguous as a result of the optional else. Consider the string
if (0) if (1) other else other
26
Dangling else Problem
Which parse tree is correct? The first associates the else-part with the first if-statement; the second associates it with the second if-statement. This ambiguity is called the dangling else problem. The usual disambiguating rule is the most closely nested rule (an else-part is associated with the nearest preceding if-statement that does not already have an else-part), which implies that the second parse tree is the correct one.
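A brute-force enumerator makes the two trees visible. The Python sketch below tries every way the ambiguous grammar could split a token list, including every choice of which if owns an else. Assumptions: space-separated tokens, conditions restricted to 0 and 1 as in the grammar, and a hypothetical tuple encoding ("if", cond, then) or ("if-else", cond, then, else). The search is exponential, so it is only meant for tiny inputs like this one.

```python
def if_parses(tokens):
    """All parse trees of the ambiguous grammar
    statement -> if-stmt | other
    if-stmt   -> if ( exp ) statement | if ( exp ) statement else statement
    exp       -> 0 | 1
    """
    toks = tuple(tokens)

    def stmts(i, j):
        # All statement trees spanning toks[i:j].
        result = []
        if j - i == 1 and toks[i] == "other":
            result.append("other")
        if (j - i >= 5 and toks[i:i + 2] == ("if", "(")
                and toks[i + 2] in ("0", "1") and toks[i + 3] == ")"):
            cond = toks[i + 2]
            # if ( exp ) statement
            result.extend(("if", cond, s) for s in stmts(i + 4, j))
            # if ( exp ) statement else statement: try every else as the split point
            for k in range(i + 5, j - 1):
                if toks[k] == "else":
                    for then_part in stmts(i + 4, k):
                        for else_part in stmts(k + 1, j):
                            result.append(("if-else", cond, then_part, else_part))
        return result

    return stmts(0, len(toks))


for tree in if_parses("if ( 0 ) if ( 1 ) other else other".split()):
    print(tree)
```

The loop prints two trees; the one that the most closely nested rule selects is ("if", "0", ("if-else", "1", "other", "other")).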
27
An Example
For example, in C:
if (x != 0) if (y == 1/x) ok = TRUE; else z = 1/x;
Here the else-part belongs to the inner if-statement. Note that, if we wanted, we could associate the else-part with the first if-statement by using braces {...} in C, as in
if (x != 0) { if (y == 1/x) ok = TRUE; } else z = 1/x;
28
A Solution to the Dangling else Ambiguity in the BNF
statement → matched-stmt | unmatched-stmt
matched-stmt → if ( exp ) matched-stmt else matched-stmt | other
unmatched-stmt → if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt
exp → 0 | 1
This permits only a matched-stmt to come before an else in an if-statement, forcing all else-parts to be matched as soon as possible.
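Counting parse trees makes the claim that this grammar is unambiguous checkable on examples. The Python sketch below counts the derivations of each nonterminal over each token span, rule by rule. Assumptions: space-separated tokens, conditions restricted to 0 and 1, and count_parses as a hypothetical name. For if (0) if (1) other else other it finds exactly one tree, the one that attaches the else to the inner if.

```python
from functools import lru_cache

def count_parses(tokens):
    """Count parse trees under the unambiguous grammar
    statement      -> matched-stmt | unmatched-stmt
    matched-stmt   -> if ( exp ) matched-stmt else matched-stmt | other
    unmatched-stmt -> if ( exp ) statement
                    | if ( exp ) matched-stmt else unmatched-stmt
    exp            -> 0 | 1
    """
    toks = tuple(tokens)

    def if_head(i, j):
        # Does toks[i:j] start with "if ( 0 )" or "if ( 1 )" and leave room for a body?
        return (j - i >= 5 and toks[i:i + 2] == ("if", "(")
                and toks[i + 2] in ("0", "1") and toks[i + 3] == ")")

    @lru_cache(maxsize=None)
    def count(nt, i, j):
        if nt == "statement":
            return count("matched", i, j) + count("unmatched", i, j)
        total = 0
        if nt == "matched" and j - i == 1 and toks[i] == "other":
            total += 1
        if if_head(i, j):
            if nt == "unmatched":                    # if ( exp ) statement
                total += count("statement", i + 4, j)
            tail = "matched" if nt == "matched" else "unmatched"
            for k in range(i + 5, j - 1):            # if ( exp ) matched-stmt else <tail>
                if toks[k] == "else":
                    total += count("matched", i + 4, k) * count(tail, k + 1, j)
        return total

    return count("statement", 0, len(toks))
```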
30
More about Dangling else
The dangling else problem has its origins in the syntax of Algol60. It is possible to design the syntax in such a way that the dangling else problem does not appear:
1. Require the presence of the else-part. This method has been used in LISP and other functional languages (where a value must also be returned).
2. Use a bracketing keyword for the if-statement. Languages that use this solution include Algol68 and Ada.
31
More about Dangling else
For example, in Ada, to associate the else-part with the second if-statement, the programmer writes
if x /= 0 then
  if y = 1/x then ok := true;
  else z := 1/x;
  end if;
end if;
To associate the else-part with the first if-statement, the programmer writes
if x /= 0 then
  if y = 1/x then ok := true;
  end if;
else z := 1/x;
end if;
32
More about Dangling else
The BNF in Ada (somewhat simplified) is
if-stmt → if condition then statement-sequence end if
        | if condition then statement-sequence else statement-sequence end if
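With a bracketing end if, a straightforward recursive-descent parser never has to guess where an else belongs: the else simply attaches to whichever if is still open. The Python sketch below parses a simplified token form of the two Ada fragments. Assumptions, not Ada's real grammar: other stands for any simple statement, each condition is a single token, and parse_ada_if is a hypothetical name. Trees are ("if", cond, then_statements, else_statements or None).

```python
def parse_ada_if(tokens):
    """Parse stmt -> other ; | if COND then seq [ else seq ] end if ;
    where seq -> stmt { stmt }."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok=None):
        # Consume the next token; if tok is given, it must match.
        nonlocal pos
        if pos >= len(tokens) or (tok is not None and tokens[pos] != tok):
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1
        return tokens[pos - 1]

    def seq():
        stmts = [stmt()]
        while peek() not in ("else", "end", None):
            stmts.append(stmt())
        return stmts

    def stmt():
        if peek() == "other":
            expect("other")
            expect(";")
            return "other"
        expect("if")
        cond = expect()
        expect("then")
        then_part = seq()
        else_part = None
        if peek() == "else":          # belongs to the if that is still open
            expect("else")
            else_part = seq()
        expect("end")
        expect("if")
        expect(";")
        return ("if", cond, then_part, else_part)

    tree = stmt()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree
```

Placing end if ; before or after the else is enough to select the inner or the outer if-statement.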
33
3.4.4 Inessential ambiguity
34
Why Inessential
A grammar may be ambiguous and yet always produce unique abstract syntax trees. For example, we could write the grammar for statement sequences ambiguously as
stmt-sequence → stmt-sequence ; stmt-sequence | stmt
stmt → s
Such an ambiguity is called an inessential ambiguity: either a right recursive or a left recursive grammar rule would still result in the same syntax tree structure.
35
Why Inessential
In an inessential ambiguity, the associated semantics do not depend on which disambiguating rule is used. Examples are arithmetic addition and string concatenation, which are associative operations (a binary operator $\oplus$ is associative if $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ for all values $a$, $b$, and $c$). The syntax trees are still distinct, but they represent the same semantic value. A parsing algorithm will still need to apply some disambiguating rule, which the compiler writer may need to supply.
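The distinction can be checked mechanically: enumerate every way to group a sequence of operands and evaluate each grouping. In the Python sketch below (hypothetical helpers, with trees as nested pairs and operands assumed not to be tuples themselves), string concatenation gives a single value across all five groupings of four strings, while subtraction, which is not associative, gives different values.

```python
import operator

def groupings(items):
    """Yield every binary tree (nested pairs) over items, keeping their order."""
    if len(items) == 1:
        yield items[0]
        return
    for k in range(1, len(items)):
        for left in groupings(items[:k]):
            for right in groupings(items[k:]):
                yield (left, right)

def evaluate(tree, op):
    """Apply op at every internal node of a tree built by groupings."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    return op(evaluate(left, op), evaluate(right, op))

words = ["s1", "s2", "s3", "s4"]
print({evaluate(t, operator.add) for t in groupings(words)})              # concatenation
print(sorted({evaluate(t, operator.sub) for t in groupings([34, 3, 42])}))  # subtraction
```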
36
3.5 Extended Notations: EBNF and Syntax Diagrams
37
3.5.1 EBNF Notation
38
Special Notations for Repetitive Constructs
Repetition is expressed by rules of the form
1. $A \to A\,\alpha \mid \beta$ (left recursive)
2. $A \to \alpha\,A \mid \beta$ (right recursive)
where $\alpha$ and $\beta$ are arbitrary strings of terminals and nonterminals; in the first rule $\beta$ does not begin with $A$, and in the second $\beta$ does not end with $A$.
39
Special Notations for Repetitive Constructs
As with regular expressions, we could use the asterisk *: $A \to \beta\,\alpha^{*}$ for the left recursive rule and $A \to \alpha^{*}\,\beta$ for the right recursive rule. EBNF instead uses curly brackets {...} to express repetition: $A \to \beta\,\{\alpha\}$ and $A \to \{\alpha\}\,\beta$. The problem with any repetition notation is that it obscures how the parse tree is to be constructed, but, as we have seen, we often do not care.
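To check the correspondence between the recursive rules and the starred forms, the Python sketch below expands sentential forms for a one-nonterminal grammar and compares the resulting sentences with the regular expressions $\beta\alpha^{*}$ and $\alpha^{*}\beta$. Assumptions: single-character symbols, A as the only nonterminal, $\alpha$ = a and $\beta$ = b, and a length bound so the search terminates; language is a hypothetical helper.

```python
import re

def language(rules, max_len):
    """Terminal strings (length <= max_len) derivable from A.

    rules: right-hand sides for A as strings of single-character symbols,
    where "A" is the only nonterminal and every other character is a terminal.
    """
    words, seen, worklist = set(), {"A"}, ["A"]
    while worklist:
        form = worklist.pop()
        if "A" not in form:
            words.add(form)              # a sentence: only terminals remain
            continue
        i = form.index("A")              # expand the leftmost (here: only) A
        for rhs in rules:
            new = form[:i] + rhs + form[i + 1:]
            if len(new.replace("A", "")) <= max_len and new not in seen:
                seen.add(new)
                worklist.append(new)
    return words

left = language(["Aa", "b"], max_len=4)     # A -> A a | b
right = language(["aA", "b"], max_len=4)    # A -> a A | b
assert all(re.fullmatch(r"ba*", w) for w in left)     # beta alpha*
assert all(re.fullmatch(r"a*b", w) for w in right)    # alpha* beta
print(sorted(left), sorted(right))
```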
40
Examples
Example: the case of statement sequences. In right recursive form, the grammar is
stmt-sequence → stmt ; stmt-sequence | stmt
stmt → s
In EBNF this would appear as
stmt-sequence → { stmt ; } stmt (right recursive form)
or
stmt-sequence → stmt { ; stmt } (left recursive form)
41
Examples
A more significant problem occurs when associativity matters, as in
exp → exp addop term | term
In EBNF,
exp → term { addop term } (implies left associativity)
exp → { term addop } term (implies right associativity)
42
Special Notations for Repetitive Constructs
Optional constructs are indicated by surrounding them with square brackets [...]. The grammar rules for if-statements with optional else-parts would be written as follows in EBNF:
statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1
Likewise, the rule stmt-sequence → stmt ; stmt-sequence | stmt is written as
stmt-sequence → stmt [ ; stmt-sequence ]
43
3.5.2 Syntax Diagrams
44
Syntax Diagrams
Syntax diagrams are graphical representations of EBNF rules. As an example, consider the grammar rule
factor → ( exp ) | number
[Figure: syntax diagram for factor, with one path through ( exp ) and an alternative path through number]
The syntax diagram for a context-free grammar plays a role like that of the DFA for a regular expression: it helps us translate grammars into programs.
45
Syntax Diagrams
Syntax diagrams consist of:
- Boxes representing terminals and nonterminals.
- Arrowed lines representing sequencing and choices.
- A nonterminal label for each diagram, naming the nonterminal whose grammar rule the diagram defines.
A round or oval box indicates a terminal in a diagram; a square or rectangular box indicates a nonterminal.
46
Syntax Diagrams
[Figure: syntax diagrams for a repetition, A → {B}, and an optional construct, A → [B]]
47
Examples
Example: consider the grammar of simple arithmetic expressions
exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number
This BNF includes associativity and precedence.
48
Examples
The corresponding EBNF is
exp → term { addop term }
addop → + | -
term → factor { mulop factor }
mulop → *
factor → ( exp ) | number
The corresponding syntax diagrams are given as follows:
[Figure: syntax diagram for exp, a term followed by a loop through addop term]
49
Examples
[Figure: syntax diagrams for addop (+ or -), term (a factor followed by a loop through mulop factor), mulop (*), and factor (( exp ) or number)]
50
Examples
Example: consider the grammar of simplified if-statements. The BNF is
statement → if-stmt | other
if-stmt → if ( exp ) statement | if ( exp ) statement else statement
exp → 0 | 1
and the EBNF is
statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1
51
The corresponding syntax diagrams are given in the following figure.
[Figure: syntax diagrams for statement (if-stmt or other), if-stmt (if ( exp ) statement, optionally followed by else statement), and exp (0 or 1)]
52
3.6 Formal Properties of Context-Free Languages
53
3.6.1 A Formal Definition of Context-Free Language
54
Definition
Definition: A context-free grammar consists of the following:
1. A set $T$ of terminals.
2. A set $N$ of nonterminals (disjoint from $T$).
3. A set $P$ of productions, or grammar rules, of the form $A \to \alpha$, where $A$ is an element of $N$ and $\alpha$ is an element of $(T \cup N)^{*}$ (a possibly empty sequence of terminals and nonterminals).
4. A start symbol $S$ from the set $N$.
55
Definition
Let $G$ be a grammar as defined above, $G = (T, N, P, S)$. A derivation step over $G$ is of the form $\alpha A \gamma \Rightarrow \alpha \beta \gamma$, where $\alpha$ and $\gamma$ are elements of $(T \cup N)^{*}$ and $A \to \beta$ is in $P$. The set of symbols is the union $T \cup N$ of the sets of terminals and nonterminals. A sentential form is a string $\alpha$ in $(T \cup N)^{*}$.
56
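Definition
The relation $\alpha \Rightarrow^{*} \beta$ is defined to be the reflexive transitive closure of the derivation step relation $\Rightarrow$; that is, $\alpha \Rightarrow^{*} \beta$ if and only if there is a sequence of $n \ge 0$ derivation steps $\alpha_0 \Rightarrow \alpha_1 \Rightarrow \cdots \Rightarrow \alpha_{n-1} \Rightarrow \alpha_n$ such that $\alpha = \alpha_0$ and $\beta = \alpha_n$. (If $n = 0$, then $\alpha = \beta$.)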
57
Definition
A derivation over the grammar $G$ is of the form $S \Rightarrow^{*} w$, where $w \in T^{*}$ (i.e., $w$ is a string of terminals only, called a sentence) and $S$ is the start symbol of $G$. The language generated by $G$, written $L(G)$, is defined as the set
$L(G) = \{\, w \in T^{*} \mid \text{there exists a derivation } S \Rightarrow^{*} w \text{ of } G \,\}$.
That is, $L(G)$ is the set of sentences derivable from $S$.
58
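Definition
A leftmost derivation $S \Rightarrow^{*}_{lm} w$ is a derivation in which each derivation step $\alpha A \gamma \Rightarrow \alpha \beta \gamma$ is such that $\alpha \in T^{*}$; that is, $\alpha$ consists only of terminals. Similarly, a rightmost derivation is one in which each derivation step $\alpha A \gamma \Rightarrow \alpha \beta \gamma$ has the property that $\gamma \in T^{*}$.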
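These definitions translate almost line for line into code. The Python sketch below encodes $G = (T, N, P, S)$ as a small dataclass, generates every derivation step $\alpha A \gamma \Rightarrow \alpha \beta \gamma$ together with a flag for $\alpha \in T^{*}$, and checks that the two derivations of number - number * number from section 3.4 are both leftmost. The class and function names are hypothetical, and symbols are assumed to be whole words such as number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    productions: tuple       # P: pairs (A, beta), beta a tuple of symbols
    start: str               # S

def steps(g, form):
    """Yield (new_form, is_leftmost) for every step alpha A gamma => alpha beta gamma."""
    for i, sym in enumerate(form):
        if sym in g.nonterminals:
            leftmost = all(s in g.terminals for s in form[:i])   # alpha in T*
            for lhs, beta in g.productions:
                if lhs == sym:
                    yield form[:i] + beta + form[i + 1:], leftmost

def is_leftmost_derivation(g, forms):
    """True if forms is S =>* w with w in T* and every step leftmost."""
    if forms[0] != (g.start,) or not all(s in g.terminals for s in forms[-1]):
        return False
    return all(any(new == b and lm for new, lm in steps(g, a))
               for a, b in zip(forms, forms[1:]))

G = Grammar(
    terminals=frozenset({"number", "(", ")", "+", "-", "*"}),
    nonterminals=frozenset({"exp", "op"}),
    productions=(("exp", ("exp", "op", "exp")), ("exp", ("(", "exp", ")")),
                 ("exp", ("number",)), ("op", ("+",)), ("op", ("-",)), ("op", ("*",))),
    start="exp",
)

def forms(*lines):
    return [tuple(line.split()) for line in lines]

first = forms("exp", "exp op exp", "exp op exp op exp", "number op exp op exp",
              "number - exp op exp", "number - number op exp",
              "number - number * exp", "number - number * number")
second = forms("exp", "exp op exp", "number op exp", "number - exp",
               "number - exp op exp", "number - number op exp",
               "number - number * exp", "number - number * number")
print(is_leftmost_derivation(G, first), is_leftmost_derivation(G, second))
```

Two distinct leftmost derivations of the same sentence are exactly what the formal definition of ambiguity asks for.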
59
Parse Tree over Grammar G
A parse tree over $G$ is a rooted labeled tree with the following properties:
1. Each node is labeled with a terminal, a nonterminal, or $\varepsilon$.
2. The root node is labeled with the start symbol $S$.
3. Each leaf node is labeled with a terminal or with $\varepsilon$.
4. Each non-leaf node is labeled with a nonterminal.
5. If a node with label $A \in N$ has $n$ children with labels $X_1, X_2, \ldots, X_n$ (which may be terminals or nonterminals), then $A \to X_1 X_2 \ldots X_n \in P$ (a production of the grammar).
60
CFG & Ambiguity
A set of strings $L$ is said to be a context-free language if there is a context-free grammar $G$ such that $L = L(G)$. A grammar $G$ is ambiguous if there exists a string $w \in L(G)$ such that $w$ has two distinct parse trees (or, equivalently, two distinct leftmost or two distinct rightmost derivations).
61
3.6.2 Grammar Rules as Equations
62
Meaning of Equation
Grammar rules use the arrow symbol instead of an equal sign to represent the definition of names for structures (nonterminals). The left- and right-hand sides still express a kind of equality, but the process of defining the language that results from this view is different. Consider, for example, the following grammar rule, extracted (in simplified form) from our simple expression grammar:
exp → exp + exp | number
63
Rules as Equations
A nonterminal name like exp defines a set of strings of terminals, called $E$ (which is the language of the grammar if the nonterminal is the start symbol). Let $N$ be the set of natural numbers (corresponding to the regular expression name number). Then the given grammar rule can be interpreted as the set equation
$E = (E + E) \cup N$
where $E + E$ denotes the set of strings $e_1 + e_2$ with $e_1, e_2 \in E$. This is a recursive equation for the set $E$, whose (least) solution is
$E = N \cup (N + N) \cup (N + N + N) \cup (N + N + N + N) \cup \cdots$
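One way to read the recursive equation is as a fixed point: start from the empty set and repeatedly apply the right-hand side. The Python sketch below does this with a stand-in set $N$ = {"n"} (a single hypothetical number token, chosen so the sets stay small). After $k$ rounds the approximation contains exactly the strings with 1 to $2^{k-1}$ copies of n, which grows toward the infinite union above.

```python
def approximate_E(N, rounds):
    """Iterate E_{k+1} = (E_k + E_k) | N starting from E_0 = empty set."""
    E = set()
    for _ in range(rounds):
        E = {a + "+" + b for a in E for b in E} | set(N)
    return E

for k in range(1, 4):
    print(k, sorted(approximate_E({"n"}, k), key=len))
```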
64
3.6.3 Chomsky Hierarchy and Limits of Context-Free Syntax
65
The Power of CFG
Consider the definition of a number as a sequence of digits using regular expressions:
digit = 0|1|2|3|4|5|6|7|8|9
number = digit digit*
Writing this definition in BNF instead gives
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
number → number digit | digit
Note: the recursion in the second rule is used to express repetition only.
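The two definitions describe the same strings, and a small Python sketch can confirm this on samples. It reads number → number digit | digit right to left, peeling off one trailing digit per step, so the recursion does nothing but repeat. DIGITS, NUMBER_RE, and derives_number are hypothetical names.

```python
import re

DIGITS = "0123456789"
NUMBER_RE = re.compile(r"[0-9][0-9]*")   # number = digit digit*

def derives_number(s):
    """Does number -> number digit | digit derive the string s?"""
    if len(s) == 1:
        return s in DIGITS                                   # number -> digit
    # number -> number digit: strip the last digit and recurse on the rest
    return len(s) > 1 and s[-1] in DIGITS and derives_number(s[:-1])

for s in ["7", "42", "007", "", "4a", "a4"]:
    assert derives_number(s) == bool(NUMBER_RE.fullmatch(s)), s
```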
66
Regular Grammar
A grammar like this, in which recursion is used only to express repetition, is called a regular grammar. Regular grammars can express everything that regular expressions can. Consequently, we could design a parser that accepts characters directly from the input source file and dispense with the scanner altogether, although a parser is a more powerful machine than a scanner and is less efficient. The grammar would then express the complete syntactic structure, including the lexical structure, and the language implementers would be expected to extract the lexical definitions from the grammar and turn them into a scanner.
67
Context Rules
Context-free rule: nonterminals appear by themselves to the left of the arrow in context-free rules. A rule $A \to \alpha$ says that $A$ may be replaced by $\alpha$ regardless of where the $A$ occurs.
Context-sensitive grammar rule: a rule that would apply only if $\beta$ occurs before and $\gamma$ occurs after the nonterminal. We would write this as $\beta A \gamma \to \beta \alpha \gamma$ ($\alpha \ne \varepsilon$).
Context-sensitive grammars are more powerful than context-free grammars, but they are also much more difficult to use as the basis for a parser.
68
Requirement of a Context-Sensitive Grammar Rule
Expressing the C rule that requires declaration before use would need context-sensitive rules:
First, we would have to include the name strings themselves in the grammar rules, rather than treating all names as identifier tokens that are indistinguishable.
Second, for each name, we would have to write a rule establishing its declaration prior to a potential use.
Generally, the length of an identifier is unrestricted, so the number of possible identifiers is (at least potentially) infinite. Even if names were allowed to be only two characters long, there would be the potential for hundreds of new grammar rules. Clearly, this is an impossible situation.
69
Solution like a Disambiguating Rule
Instead, we state a rule (declaration before use) that is not explicit in the grammar. Such a rule cannot be enforced by the parser itself, since it is beyond the power of (reasonable) context-free rules to express. This rule becomes part of semantic analysis, because it depends on the use of the symbol table (which records which identifiers have been declared). The static semantics of the language include type checking (in a statically typed language) and rules such as declaration before use. We regard as syntax only those rules that can be expressed by BNF rules; everything else we regard as semantics.
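As a rough illustration of moving the rule into semantic analysis, the Python sketch below enforces declaration before use with a symbol table during a single pass over already-parsed statements. Assumptions: a single flat scope and a hypothetical statement encoding of (kind, name) pairs; a real compiler would also handle nested scopes and record types in the table.

```python
def check_declaration_before_use(statements):
    """Return error messages for names used before they are declared.

    statements: list of (kind, name) pairs with kind "decl" or "use".
    """
    symbol_table = set()             # names declared so far (one flat scope)
    errors = []
    for kind, name in statements:
        if kind == "decl":
            symbol_table.add(name)
        elif name not in symbol_table:
            errors.append(f"'{name}' used before declaration")
    return errors

program = [("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "y"), ("use", "y")]
print(check_declaration_before_use(program))
```

The grammar only needs to know that x and y are identifier tokens; the ordering constraint is checked by this separate pass over the symbol table.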
70
Unrestricted Grammars
Unrestricted grammars are more general than context-sensitive grammars. They have grammar rules of the form $\alpha \to \beta$, where there are no restrictions on the form of the strings $\alpha$ and $\beta$ (except that $\alpha$ cannot be $\varepsilon$).
71
Types of Grammars
These grammars and the language classes they construct are referred to as the Chomsky hierarchy, after Noam Chomsky, who pioneered their use to describe natural languages:
type 0: unrestricted grammars, equivalent to Turing machines
type 1: context-sensitive grammars
type 2: context-free grammars, equivalent to pushdown automata
type 3: regular grammars, equivalent to finite automata
These grammars represent distinct levels of computational power: $\text{Type 0} \supset \text{Type 1} \supset \text{Type 2} \supset \text{Type 3}$.
72
Summary
Lexical analysis: regular expressions (their power is limited).
Syntax analysis: context-free grammars, covering the formal definition, derivations (leftmost and rightmost), parse trees and syntax trees (a graphical explanation of derivations), and ambiguity and its elimination.
73
End of Part Two THANKS