Published by Philomena Cox. Modified over 9 years ago.

1
241-437 Compilers: syntax/4 1 Compiler Structures Objective – –describe general syntax analysis, grammars, parse trees, FIRST and FOLLOW sets 241-437, Semester 1, 2011-2012 4. Syntax Analysis
Overview

1. What is a Syntax Analyzer?
2. What is a Grammar?
3. Parse Trees
4. Types of CFG Parsing
5. Syntax Analysis Sets

In this lecture

[Figure: compiler pipeline. Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Int. Code Generator -> Intermediate Code -> Code Optimizer -> Target Code Generator -> Target Lang. Prog. The stages are grouped into a Front End and a Back End.]

4
241-437 Compilers: syntax/4 4 1. What is a Syntax Analyzer? Lexical Analyzer if (a == 0) a = b; if(a==0)a=b; Syntax Analyzer builds a parse tree IF EQASSIGN a0ab
5
241-437 Compilers: syntax/4 5 Syntax Analyses that we do IgaveJim cardthe pronounverb proper noun noun phrase articlenoun - Identify the function of each word - Recognize if a sentence is grammatically correct sentence (subject) (action)(object) verb phrase (indirect object) grammar types / categories
6
241-437 Compilers: syntax/4 6 Languages We use a natural language to communicate – –its grammar rules are very complex – –the rules don’t cover important things We use a formal language to define a programming language – –its grammar rules are fairly simple – –the rules cover almost everything
7
241-437 Compilers: syntax/4 7 2. What is a Grammar? A grammar is a notation for defining a language, and is made from 4 parts: – –the terminal symbols – –the syntactic categories (nonterminal symbols) e.g. statement, expression, noun, verb – –the grammar rules (productions) e,g, A => B1 B2... Bn – –the starting nonterminal the top-most syntactic category for this grammar continued
8
241-437 Compilers: syntax/4 8 We define a grammar G as a 4-tuple: G = (T, N, P, S) – –T = terminal symbols – –N = nonterminal symbols – –P = productions/rules – –S = starting nonterminal
9
241-437 Compilers: syntax/4 9 2.1. Example 1 Consider the grammar: T = {0, 1} N = {S, R} P = {S => 0 S => 0 R R => 1 S } S is the starting nonterminal the right hand sides of productions usually use a mix of terminals and nonterminals
10
241-437 Compilers: syntax/4 10 Is “01010” in the language? Start with a S rule: – –RuleString Generated --S S => 0 R0 R R => 1 S0 1 S S => 0 R0 1 0 R R => 1 S0 1 0 1 S S => 00 1 0 1 0 No more rules can be applied since there are no more nonterminals left in the string. Yes, it is in the language.
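The 4-tuple and the derivation above can be sketched in Python. This is only an illustration: the `Grammar` class, its field names and the `derive` helper are assumptions made for this example, not part of the lecture. Each step rewrites the leftmost occurrence of the rule's head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (head, body); body is a tuple of symbols
    start: str


# Example 1 from the slides: G = (T, N, P, S)
G1 = Grammar(
    terminals=frozenset({"0", "1"}),
    nonterminals=frozenset({"S", "R"}),
    productions=(("S", ("0",)), ("S", ("0", "R")), ("R", ("1", "S"))),
    start="S",
)


def derive(grammar, steps):
    """Apply each (head, body) rule to the leftmost occurrence of head."""
    form = [grammar.start]
    for head, body in steps:
        if (head, body) not in grammar.productions:
            raise ValueError(f"not a production: {head} => {body}")
        i = form.index(head)  # raises ValueError if head is not in the string
        form[i:i + 1] = body
    return form


steps = [
    ("S", ("0", "R")),
    ("R", ("1", "S")),
    ("S", ("0", "R")),
    ("R", ("1", "S")),
    ("S", ("0",)),
]
result = derive(G1, steps)
print(" ".join(result))  # 0 1 0 1 0
```

The result contains only terminals, which is the stopping condition stated on the slide.
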
11
241-437 Compilers: syntax/4 11 Example 2 Consider the grammar: T = {a, b, c, d, z} N = {S, R, U, V} P = {S => R U z | z R => a | b R U => d V U | c V => b | c } S is the starting nonterminal
12
241-437 Compilers: syntax/4 12 The notation: X => Y | Z is shorthand for the two rules: X => Y X => Z Read ‘|’ as ‘or’.
13
241-437 Compilers: syntax/4 13 Is “adbdbcz” in the language? RuleString Generated --S S => R U zR U z R => aa U z U => d V Ua d V U z V => ba d b U z U => d V Ua d b d V U z V => ba d b d b U z U => ca d b d b c z Yes! This grammar has choices about how to rewrite the string.
14
241-437 Compilers: syntax/4 14 Example 3: Sums The grammar: T = {+, -, 0, 1, 2, 3,..., 9} N = {L, D} P = {L => L + D | L – D | D D => 0 | 1 | 2 |... | 9 } L is the starting nonterminal e.g. 5 + 6 - 2
15
241-437 Compilers: syntax/4 15 Example 4: Brackets The grammar: T = { '(', ')' } N = {L} P = {L => '(' L ')' L L => ε } L is the starting nonterminal ε means 'nothing'
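As a rough sketch, the bracket grammar can be turned into a recognizer with one function per nonterminal. The function names `parse_L` and `balanced` are made up for this example, and the input is assumed to be a string of '(' and ')' with no spaces.

```python
def parse_L(s, i=0):
    """Match L starting at s[i]; return the index just after the match."""
    if i < len(s) and s[i] == "(":
        i = parse_L(s, i + 1)              # inner L in  '(' L ')' L
        if i >= len(s) or s[i] != ")":
            raise SyntaxError(f"expected ')' at position {i}")
        return parse_L(s, i + 1)           # trailing L
    return i                               # L => ε consumes nothing


def balanced(s):
    """True if the whole string can be derived from L."""
    try:
        return parse_L(s) == len(s)
    except SyntaxError:
        return False


print(balanced("(())()"), balanced("(()"))  # True False
```

The recursion depth tracks how many brackets are currently open, which is the kind of 'counting' that a context-free grammar allows.
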
16
241-437 Compilers: syntax/4 16 2.2. Derivations A sequence of the form: w 0 w 1 … w n is a derivation of w n from w 0 (or w 0 * w n ) Example: Lrule L => ( L ) L ( L ) Lrule L => ( ) Lrule L => ( ) L * ( ) This means that the sentence ( ) is a derivation of L
17
241-437 Compilers: syntax/4 17 L rule L => ( L ) L ( L ) L rule L => ( L ) L ( L ) ( L ) L rule L => ( L ) ( L ) rule L => ( L ) L (( L ) L ) ( L ) rule L => (( ) L ) ( L ) rule L => ( ( ) L ) ( ) rule L => ( ( ) ) ( ) so L * (( )) ( )
18
241-437 Compilers: syntax/4 18 2.3. Kinds of Grammars There are 4 main kinds of grammar, of increasing expressive power: – –regular (type 3) grammars – –context-free (type 2) grammars – –context-sensitive (type 1) grammars – –unrestricted (type 0) grammars They vary in the kinds of productions they allow.
19
241-437 Compilers: syntax/4 19 Regular Grammars Every production is of the form: A => a | a B | – –A, B are nonterminals, a is a terminal These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last. Regular grammars are equivalent to REs. S => wT T => xT T => a
20
241-437 Compilers: syntax/4 20 Example Integer => + UInt | - UInt | 0 Digits | 1 Digits |... | 9 Digits UInt => 0 Digits | 1 Digits |... | 9 Digits Digits => 0 Digits | 1 Digits |... | 9 Digits |
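Since regular grammars are equivalent to REs, the Integer grammar above can be written as a single regular expression: an optional sign followed by one or more digits. A small sketch using Python's standard `re` module (the names `INTEGER` and `is_integer` are chosen just for this example):

```python
import re

# Integer => (+ | -)? digit Digits,  Digits => digit Digits | ε
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer(text):
    """True if the whole text is derivable from Integer."""
    return INTEGER.fullmatch(text) is not None


print(is_integer("-42"), is_integer("+"))  # True False
```
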
21
241-437 Compilers: syntax/4 21 Context-Free Grammars (CFGs) Every production is of the form: A => – –A is a nonterminal, can be any number of nonterminals or terminals The Syntax Analyzer uses CFGs. A => a A => aBcd B => ae
22
241-437 Compilers: syntax/4 22 2.4. REs for Syntax Analysis? Why not use REs to describe the syntax of a programming language? – –they don’t have enough power Examples: – –nested blocks, if statements, balanced braces We need the ability to 'count', which can be implemented with CFGs but not REs.
23
241-437 Compilers: syntax/4 23 3. Parse Trees A parse tree is a graphical way of showing how productions are used to generate a string. The syntax analyzer creates a parse tree to store information about the program being compiled.
24
241-437 Compilers: syntax/4 24 Example The grammar: T = { a, b } N = { S } P = { S => S S | a S b | a b | b a } S is the starting nonterminal
25
241-437 Compilers: syntax/4 25 Parse Tree for “aabbba” The root of the tree is the start symbol S: S Expand using S => S S S S S Expand using S => a S b continued expand the symbol in the circle
26
241-437 Compilers: syntax/4 26 S S S S a b Expand using S => a b S S S S a b ab Expand using S => b a continued
27
241-437 Compilers: syntax/4 27 S S S a b ab S ba Stop when there are no more nonterminals in leaf positions. Read off the string by reading the leaves left to right.
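Reading the leaves left to right can be sketched as a short recursive walk. The representation below (a node is a pair of symbol and children, a terminal leaf is a plain string) is an assumption made for this example only.

```python
# Parse tree for "aabbba" using S => S S, S => a S b, S => a b, S => b a
tree = ("S", [
    ("S", ["a", ("S", ["a", "b"]), "b"]),
    ("S", ["b", "a"]),
])


def leaves(node):
    """Collect the terminal leaves of a parse tree, left to right."""
    if isinstance(node, str):
        return [node]
    _symbol, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


print("".join(leaves(tree)))  # aabbba
```
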
28
241-437 Compilers: syntax/4 28 3.1. Ambiguity Two (or more) parse trees for the same string E => E + E E => E – E E => 0 | … | 9 E E + E E - E E + E E - E E 23 4 2 34 2 – 3 + 4 or
29
241-437 Compilers: syntax/4 29 The two derivations: E E + E E E – E E – E + E 2 – E 2 – E + E 2 – E + E 2 – 3 + E 2 – 3 + E 2 – 3 + 4 2 – 3 + 4
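One way to see why the ambiguity matters is to evaluate both trees. In this sketch a tree is written as a nested tuple (operator, left, right); that encoding and the `evaluate` function are assumptions for illustration only.

```python
left_tree = ("+", ("-", 2, 3), 4)    # first derivation:  (2 - 3) + 4
right_tree = ("-", 2, ("+", 3, 4))   # second derivation: 2 - (3 + 4)


def evaluate(node):
    """Evaluate a tree whose inner nodes are '+' or '-' and leaves are ints."""
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    a, b = evaluate(lhs), evaluate(rhs)
    return a + b if op == "+" else a - b


print(evaluate(left_tree), evaluate(right_tree))  # 3 -5
```

The same string gets two different values, so a compiler needs the grammar to pick one tree.
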
30
241-437 Compilers: syntax/4 30 Fixing Ambiguity An ambiguous grammar can sometimes be made unambiguous: E => E + T | E – T | T T => 0 | … | 9 We'll look at some techniques in chapter 5.
31
241-437 Compilers: syntax/4 31 4. Types of CFG Parsing Top-down (chapter 5) – –recursive descent (predictive) parsing – –LL methods Bottom-up (chapter 6) – –operator precedence parsing – –LR methods – –SLR, canonical LR, LALR
32
241-437 Compilers: syntax/4 32 4.1. A Statement Block Grammar The grammar: T = {begin, end, simplestmt, ;} N = {B, SS, S} P = {B => begin SS end SS => S ; SS | ε S => simplestmt | begin SS end } B is the starting nonterminal
33
241-437 Compilers: syntax/4 33 Parse Tree begin simplestmt ; simplestmt ; end S S SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end begin simplestmt ; simplestmt ; end
34
241-437 Compilers: syntax/4 34 4.2. Top Down (LL) Parsing begin simplestmt ; simplestmt ; end SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
35
241-437 Compilers: syntax/4 35 begin simplestmt ; simplestmt ; end S SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
36
241-437 Compilers: syntax/4 36 begin simplestmt ; simplestmt ; end S SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
37
241-437 Compilers: syntax/4 37 begin simplestmt ; simplestmt ; end S S SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
38
241-437 Compilers: syntax/4 38 begin simplestmt ; simplestmt ; end S S SS B B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
39
241-437 Compilers: syntax/4 39 begin simplestmt ; simplestmt ; end S S SS B 1 2 3 4 5 6 B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end
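A minimal recursive descent (predictive) sketch for the statement block grammar, with one method per nonterminal. The `Parser` class, its method names, the whitespace-separated token format and the "$" end marker are all assumptions for this example; the `trace` list records the productions in the order a top-down parser applies them.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0
        self.trace = []                      # productions, in order applied

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_B(self):
        self.trace.append("B => begin SS end")
        self.expect("begin")
        self.parse_SS()
        self.expect("end")

    def parse_SS(self):
        # One token of lookahead picks the production.
        if self.peek() in ("simplestmt", "begin"):
            self.trace.append("SS => S ; SS")
            self.parse_S()
            self.expect(";")
            self.parse_SS()
        else:
            self.trace.append("SS => ε")

    def parse_S(self):
        if self.peek() == "simplestmt":
            self.trace.append("S => simplestmt")
            self.expect("simplestmt")
        else:
            self.trace.append("S => begin SS end")
            self.expect("begin")
            self.parse_SS()
            self.expect("end")


def parse(text):
    """Parse a whitespace-separated token string; return the production trace."""
    p = Parser(text.split())
    p.parse_B()
    p.expect("$")
    return p.trace


for step in parse("begin simplestmt ; simplestmt ; end"):
    print(step)
```

For the slide's input the trace has six entries, one per nonterminal node of the parse tree, starting at the root B.
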
40
241-437 Compilers: syntax/4 40 4.3. Bottomup (LR) Parsing begin simplestmt ; simplestmt ; end S B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
41
241-437 Compilers: syntax/4 41 begin simplestmt ; simplestmt ; end S S B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
42
241-437 Compilers: syntax/4 42 begin simplestmt ; simplestmt ; end S S SS B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
43
241-437 Compilers: syntax/4 43 begin simplestmt ; simplestmt ; end S S SS B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
44
241-437 Compilers: syntax/4 44 begin simplestmt ; simplestmt ; end S S SS B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end continued
45
241-437 Compilers: syntax/4 45 begin simplestmt ; simplestmt ; end S S SS B 6 5 1 4 2 3 B => begin SS end SS => S ; SS SS => S => simplestmt S => begin SS end
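The bottom-up order can be replayed as a list of reductions, each replacing a production body by its head. This is not an LR parser: the reduction positions below were chosen by hand for this one input, and `reduce_at` is a helper name invented for the sketch.

```python
def reduce_at(form, index, head, body):
    """Replace body found at form[index:] by head; raise if it is not there."""
    end = index + len(body)
    if tuple(form[index:end]) != tuple(body):
        raise ValueError(f"cannot reduce {body} to {head} at {index}")
    return form[:index] + [head] + form[end:]


form = "begin simplestmt ; simplestmt ; end".split()
reductions = [
    (1, "S", ("simplestmt",)),
    (3, "S", ("simplestmt",)),
    (5, "SS", ()),                     # SS => ε inserts SS before "end"
    (3, "SS", ("S", ";", "SS")),
    (1, "SS", ("S", ";", "SS")),
    (0, "B", ("begin", "SS", "end")),
]
history = [form]
for index, head, body in reductions:
    form = reduce_at(form, index, head, body)
    history.append(form)

print(form)  # ['B']
```

Six reductions are needed, and the root B is created last, the reverse of the top-down order.
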
46
241-437 Compilers: syntax/4 46 5. Syntax Analysis Sets Syntax analyzers for top-down (LL) and bottom-up (LR) parsing utilize two types of sets: – –FIRST sets – –FOLLOW sets These sets are generated from the programming language CFG.
47
241-437 Compilers: syntax/4 47 5.1. The FIRST Sets FIRST( ) = set of all terminals that start productions for that non-terminal Example: S => ping S => begin S end FIRST(S) = { ping, begin }
48
241-437 Compilers: syntax/4 48 More Mathematically A is a non-terminal. FIRST(A) = – –{ c | A =>* c , c is a terminal } { } if A =>* is the rest of the terminals and nonterminals after 'c'
49
241-437 Compilers: syntax/4 49 Building FIRST Sets For each non-terminal A, FIRST(A) = FIRST_SEQ( ) FIRST_SEQ( ) ... for all productions A => , A => ,... – – , are the bodies of the productions
50
241-437 Compilers: syntax/4 50 FIRST_SEQ() FIRST_SEQ( ) = { } FIRST_SEQ(c ) = { c }, if c is a terminal FIRST_SEQ(A ) = FIRST(A), if FIRST(A) = (FIRST(A) – { }) FIRST_SEQ( ), if FIRST(A) – – is a sequence of terminals and non-terminals, and possibly empty
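The FIRST_SEQ rules above can be sketched in Python. Instead of choosing an order by hand (as the worked examples do), this sketch simply repeats the rules until no set grows; that fixed-point loop, the dictionary encoding of the grammar and the function names are assumptions of the example. An empty tuple stands for an ε body.

```python
EPS = "ε"


def first_seq(seq, first, nonterminals):
    """FIRST_SEQ of a sequence of symbols, using the current FIRST table."""
    result = set()
    for sym in seq:
        if sym not in nonterminals:      # FIRST_SEQ(c α) = { c }
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # FIRST_SEQ(A α) = FIRST(A)
            return result
    result.add(EPS)                      # empty, or every symbol can vanish
    return result


def first_sets(grammar):
    """grammar maps each nonterminal to a list of bodies (tuples of symbols)."""
    nonterminals = set(grammar)
    first = {a: set() for a in grammar}
    changed = True
    while changed:                       # repeat until no set grows
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                new = first_seq(body, first, nonterminals)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


grammar = {
    "S": [("a", "S", "e"), ("B",)],
    "B": [("b", "B", "e"), ("C",)],
    "C": [("c", "C", "e"), ("d",)],
}
print(sorted(first_sets(grammar)["S"]))  # ['a', 'b', 'c', 'd']
```
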
51
241-437 Compilers: syntax/4 51 FIRST() Example 1 S => a S e S => B B => b B e B => C C => c C e C => d FIRST(C) = {c,d} FIRST(C) = {c,d} FIRST(B) = FIRST(B) = FIRST(S) = FIRST(S) = Start with FIRST(C) since its rules only start with terminals continued
52
241-437 Compilers: syntax/4 52 FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = do FIRST(B) now that we know FIRST(C) S => a S e S => a S e S => B S => B B => b B e B => b B e B => C B => C C => c C e C => c C e C => d C => d continued
53
241-437 Compilers: syntax/4 53 FIRST(C) = {c,d} FIRST(B) = {b,c,d} FIRST(S) = {a,b,c,d} S => a S e S => a S e S => B S => B B => b B e B => b B e B => C B => C C => c C e C => c C e C => d C => d do FIRST(S) now that we know FIRST(B)
54
241-437 Compilers: syntax/4 54 FIRST() Example 2 P => i | c | n T S Q => P | a S | b S c S T R => b | S => c | R n | T => R S q FIRST(P) = {i,c,n} FIRST(P) = {i,c,n} FIRST(Q) = FIRST(Q) = FIRST(R) = {b, } FIRST(R) = {b, } FIRST(S) = FIRST(S) = FIRST(T) = FIRST(T) = continued Start with P and R since their rules only start with terminals or
55
241-437 Compilers: syntax/4 55 FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,b} FIRST(R) = {b, } FIRST(S) = FIRST(T) = P => i | c | n T S P => i | c | n T S Q => P | a S | b S c S T Q => P | a S | b S c S T R => b | R => b | S => c | R n | S => c | R n | T => R S q T => R S q continued do FIRST(Q) now that we know FIRST(P)
56
241-437 Compilers: syntax/4 56 FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,b} FIRST(R) = {b, } FIRST(S) = {c,b,n, } FIRST(T) = do FIRST(S) now that we know FIRST(R) Note: S R n n because R * P => i | c | n T S P => i | c | n T S Q => P | a S | b S c S T Q => P | a S | b S c S T R => b | R => b | S => c | R n | S => c | R n | T => R S q T => R S q continued
57
241-437 Compilers: syntax/4 57 FIRST(P) = {i,c,n} FIRST(Q) = {i,c,n,a,b} FIRST(R) = {b, } FIRST(S) = {c,b,n, } FIRST(T) = {b,c,n,q} do FIRST(T) now that we know FIRST(R) and FIRST(S) Note: T R S q S q q because both R and S * P => i | c | n T S P => i | c | n T S Q => P | a S | b S c S T Q => P | a S | b S c S T R => b | R => b | S => c | R n | S => c | R n | T => R S q T => R S q
58
241-437 Compilers: syntax/4 58 FIRST() Example 3 S => a S e | S T S T => R S e | Q R => r S r | Q => S T | FIRST(S) = {a} FIRST(S) = {a} FIRST(T) = {r, a, } FIRST(T) = {r, a, } FIRST(R) = {r, } FIRST(R) = {r, } FIRST(Q) = {a, } FIRST(Q) = {a, } Order 1) R, S 2) Q 3) T
59
241-437 Compilers: syntax/4 59 5.2. The FOLLOW Sets FOLLOW( ) = – –set of all the terminals that follow in productions – –the set includes $ if nothing follows
60
241-437 Compilers: syntax/4 60 Example: S => bing A bong | ping A pong | zing A A => ha FOLLOW(A) = { bong, pong, $ }
61
241-437 Compilers: syntax/4 61 More Mathematically A is a non-terminal. FOLLOW(A) = { c in terminals | S => +... A c... } { $ } if S => +... is a sequence of terminals and non-terminals => + is any number of => expansions
62
241-437 Compilers: syntax/4 62 Building FOLLOW() Sets To make the FOLLOW(A) set, apply rules 1-4: 1. for all productions (B =>... A ) add FIRST_SEQ( )-{ } 2. for all (B =>... A ) and FIRST_SEQ( ) add FOLLOW(B) 3. for all (B =>... A) add FOLLOW(B) 4. if A is the start symbol then add { $ } is a sequence of termminals and non-terminals
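Rules 1-4 can be sketched in Python as follows. As with FIRST, the sketch repeats the rules until nothing changes rather than choosing an order by hand; the grammar encoding (a dictionary from nonterminal to a list of bodies, with an empty tuple for ε) and all function names are assumptions of this example. The test grammar is the expression grammar S => T E1, E1 => + T E1 | ε, T => F T1, T1 => * F T1 | ε, F => ( S ) | id.

```python
EPS, END = "ε", "$"


def first_seq(seq, first):
    """FIRST_SEQ of a symbol sequence; symbols not in `first` are terminals."""
    out = set()
    for sym in seq:
        if sym not in first:
            return out | {sym}
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    return out | {EPS}


def first_sets(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                new = first_seq(body, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {a: set() for a in grammar}
    follow[start].add(END)                       # rule 4
    changed = True
    while changed:
        changed = False
        for b, bodies in grammar.items():
            for body in bodies:
                for i, a in enumerate(body):
                    if a not in grammar:         # terminals have no FOLLOW set
                        continue
                    rest = first_seq(body[i + 1:], first)
                    new = rest - {EPS}           # rule 1
                    if EPS in rest:              # rules 2 and 3
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow


grammar = {
    "S": [("T", "E1")],
    "E1": [("+", "T", "E1"), ()],
    "T": [("F", "T1")],
    "T1": [("*", "F", "T1"), ()],
    "F": [("(", "S", ")"), ("id",)],
}
follow = follow_sets(grammar, "S")
print(sorted(follow["F"]))  # ['$', ')', '*', '+']
```

When nothing comes after A in a body, `first_seq` of the empty remainder returns { ε }, so rule 3 falls out as a special case of rule 2.
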
63
241-437 Compilers: syntax/4 63 What is in FOLLOW(A) for the productions: B => A C C => s FOLLOW(A) gets FIRST_SEQ(C) == FIRST(C) == { s } – –uses rule 1 continued Small Examples
64
241-437 Compilers: syntax/4 64 What is in FOLLOW(A) for the productions: C => B r B => t A FOLLOW(A) gets FOLLOW(B) == { r } – –uses rule 3
65
241-437 Compilers: syntax/4 65 FOLLOW() Example 1 S => a S e | B B => b B C f | C C => c C g | d | FIRST(C) = {c,d, } FIRST(B) = {b,c,d, } FIRST(S) = {a,b,c,d, } FOLLOW(C) = FOLLOW(C) = FOLLOW(B) = FOLLOW(B) = FOLLOW(S) = {$, e} FOLLOW(S) = {$, e} S is the start symbol continued
66
241-437 Compilers: syntax/4 66 S => a S e | B B => b B C f | C C => c C g | d | FIRST(C) = {c,d, } FIRST(B) = {b,c,d, } FIRST(S) = {a,b,c,d, } FOLLOW(C) = {f,g} follow(B) FOLLOW(C) = {f,g} follow(B) FOLLOW(B) = FIRST_SEQ(C f) -{ } FOLLOW(S) = {c, d, f, $, e} FOLLOW(B) = FIRST_SEQ(C f) -{ } FOLLOW(S) = {c, d, f, $, e} FOLLOW(S) = {$,e} FOLLOW(S) = {$,e} continued
67
241-437 Compilers: syntax/4 67 S => a S e | B B => b B C f | C C => c C g | d | FIRST(C) = {c,d, } FIRST(B) = {b,c,d, } FIRST(S) = {a,b,c,d, } FOLLOW(C) = {f,g,c,d,$,e} FOLLOW(C) = {f,g,c,d,$,e} FOLLOW(B) = {c, d, f, $, e} FOLLOW(B) = {c, d, f, $, e} FOLLOW(S) = {$,e} FOLLOW(S) = {$,e}
68
241-437 Compilers: syntax/4 68 FOLLOW() Example 2 S => ( A ) | A => T E E => & T E | T => ( A ) | a | b | c FIRST(T) = {(,a,b,c} FIRST(E) = {&, } FIRST(A) = {(,a,b,c} FIRST(S) = {(, } FOLLOW(S) = {$} FOLLOW(S) = {$} FOLLOW(A) = {)} FOLLOW(A) = {)} FOLLOW(E) = FOLLOW(E) = FOLLOW(T) = FOLLOW(T) = continued
69
241-437 Compilers: syntax/4 69 S => ( A ) | A => T E E => & T E | T => ( A ) | a | b | c FIRST(T) = {(,a,b,c} FIRST(E) = {&, } FIRST(A) = {(,a,b,c} FIRST(S) = {(, } FOLLOW(S) = { $ } FOLLOW(S) = { $ } FOLLOW(A) = { ) } FOLLOW(A) = { ) } FOLLOW(E) = FOLLOW(E) = FOLLOW(A) FOLLOW(E) = { ) } FOLLOW(A) FOLLOW(E) = { ) } FOLLOW(T) = FOLLOW(T) = (FIRST_SEQ(E) – { }) FOLLOW(A) FOLLOW(E) = {&, )} (FIRST_SEQ(E) – { }) FOLLOW(A) FOLLOW(E) = {&, )}
70
241-437 Compilers: syntax/4 70 FOLLOW() Example 3 S => T E1 E1 => + T E1 | T => F T1 T1 => * F T1 | F => ( S ) | id FIRST(F) = FIRST(T) = FIRST(S) = {(,id} FIRST(T1) = {*, } FIRST(E1) = {+, } FOLLOW(S) = {$,)} FOLLOW(S) = {$,)} FOLLOW(E1) = FOLLOW(E1) = FOLLOW(T) = FOLLOW(T) = FOLLOW(T1) = FOLLOW(T1) = FOLLOW(F) = FOLLOW(F) = continued
71
241-437 Compilers: syntax/4 71 S => T E1 E1 => + T E1 | T => F T1 T1 => * F T1 | F => ( S ) | id FIRST(F) = FIRST(T) = FIRST(S) = {(,id} FIRST(T1) = {*, } FIRST(E1) = {+, } FOLLOW(S) = {$,)} FOLLOW(S) = {$,)} FOLLOW(E1) = FOLLOW(S) Follow(E1) = {$,)} FOLLOW(E1) = FOLLOW(S) Follow(E1) = {$,)} FOLLOW(T) = FIRST(E1) FOLLOW(S) FOLLOW(E1) = {+,$,)} FOLLOW(T) = FIRST(E1) FOLLOW(S) FOLLOW(E1) = {+,$,)} FOLLOW(T1) = FOLLOW(T) = {+,$,)} FOLLOW(T1) = FOLLOW(T) = {+,$,)} FOLLOW(F) = FIRST(T1) FOLLOW(T) FOLLOW(T1) = {*,+,$,)} FOLLOW(F) = FIRST(T1) FOLLOW(T) FOLLOW(T1) = {*,+,$,)}
72
241-437 Compilers: syntax/4 72 FOLLOW() Example 4 S => A B C | A D A => a | a A B => b | c | C => D a C D => b b | c c FIRST(D) = FIRST(C) = {b,c} FIRST(B) = {b,c FIRST(A) = FIRST(S) = {a} FOLLOW(S) = {$} FOLLOW(S) = {$} FOLLOW(D) = {a,$} FOLLOW(D) = {a,$} FOLLOW(A) = FOLLOW(A) = FOLLOW(B) = FOLLOW(B) = FOLLOW(C) = FOLLOW(C) = continued
73
241-437 Compilers: syntax/4 73 S => A B C | A D A => a | a A B => b | c | C => D a C D => b b | c c FIRST(D) = FIRST(C) = {b,c} FIRST(B) = {b,c FIRST(A) = FIRST(S) = {a} FOLLOW(S) = {$} FOLLOW(S) = {$} FOLLOW(D) = {a,$} FOLLOW(D) = {a,$} FOLLOW(A) = {b,c} FOLLOW(A) = {b,c} FOLLOW(B) = {b,c} FOLLOW(B) = {b,c} FOLLOW(C) = {$} FOLLOW(C) = {$}