Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.

Similar presentations


Presentation on theme: "Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1."— Presentation transcript:

1 Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1

2 Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 2

3 Grammars Specify the syntax of a language  Hierarchical structure Java if-else statement if ( expr ) stmt else stmt A production rule for if-else statement stmt  if ( expr ) stmt else stmt Terminals and nonterminals 3

4 Context Free Grammars The notation to specify syntax  Context Free Grammar (CFG)  Backus-Naur Form (BNF) A context-free grammar  Analyze the syntax  Also used to translate the programs Context free grammar  Grammar 4

5 Components of Grammars A set of terminal symbols  For example: token, +, -, keywords A set of nonterminals  Sets of strings help define the language  Nonterminals impose a hierarchical structure  For example expr, stmt as follows: stmt  if ( expr ) stmt else stmt 5

6 Components of Grammars A set of productions  The head or left side Consists of a nonterminal  An arrow means can have the form  Body or right side A sequence of terminals and nonterminals Start symbol  A special nonterminal symbol  The productions for the start symbol are listed first 6

7 Example The arithmetic expression consisting of + or – E  E + E | E – E | E*E | (E) | int int  0|1|2|3|4|5|6|7|8|9 7

8 Derivations Beginning with the start symbol Each rewriting step replaces a nonterminal by the body of one of its productions Left most derivation  Leftmost nonterminal is always chosen  LL grammar (parses from left to right, left most) Rightmost derivation  Rightmost nonterminal is always chosen  LR grammar (parses from left to right, right most) 8

9 Left Most Derivation Given E  E + E | E – E | E*E | (E) | int String int * int + int E => E + E => E*E + E => int *E + E => int * int + E => int * int + int 9

10 Right Most Derivation String int * int + int

11 Parse Tree 11 E EE + EE * int String int * int + int

12 Ambiguity Grammar that produces more than one parse tree for some sentence E EE + EE * int E E E * EE + For string: int * int + int

13 Reasons for Ambiguity Associativity and Precedence +, -, *, / are left associate *, / have higher precedence than +, - Use E and T for two levels of precedence Use F for basic units of expression

14 Non Ambiguous F  int | (E) T  T * F | T / F | F E  E + T | E – T | T String: int * (int + int)

15 Ambiguity: The Dangling Else Consider the grammar S → if E then S | if E then S else S | other This grammar is also ambiguous

16 Ambiguity: The Dangling Else The expression if E1 then if E2 then S1 else S2 has two parse trees 16 S ifE1thenS ifE2thenS1 elseS2 S ifE1thenS ifE2thenS1 elseS2 Typically we want the second form

17 The Dangling Else: A Fix else matches the closest unmatched then We can describe this in the grammar S → MS /* all then are matched */ | US /* some then are unmatched */ MS → if E then MS else MS | other US → if E then S | if E then MS else US 17

18 The Dangling Else: The Parse Tree 18 US ifE1thenS ifE2thenS1 elseS2 S MS The expression if E1 then if E2 then S1 else S2

19 CFG vs RE Grammars are more powerful notation than RE For RE: (a l b)*abb A 0  aA 0 | bA 0 | aA 1 A 1  bA 2 A 2  bA 3 A 3  Ɛ

20 Why us RE in Lexical Analysis Two manageable-sized components More Simple More Concise Construction of Lexical Analyzer becomes easier and efficient 20

21 RE vs CFG REs are most useful for  Identifiers, constants, keywords, and white space Grammars are most useful for describing nested structure  B alanced parentheses, matching begin-end's, corresponding if-then-else Nested structure cannot be described by RE 21

22 Parsing Top down parsing:  Starts at the root and proceeds towards the leave  Easier to understand and program manually Bottom up parsing  Starts at the leaves and proceeds towards the root  more powerful, used by most parser generators 22

23 Recursive Descent Parsing Consider the grammar E → T + E | T T → int | int * T | ( E ) Token stream is: int * int Start with top-level non-terminal E Try the rules for E in order 23

24 Recursive Descent Parsing - Example Try E → T + E Then try a rule for T → ( E ) But ( does not match input token int Try T → int - Token matches. But + after T does not match input token * Try T → int * T This will match but + after T will be unmatched Has exhausted the choices for T  Backtrack to choice for E 24

25 Recursive Descent Parsing - Example Token stream is: int * int Try E → T Follow same steps as before for T And succeed with T → int * T and T → int With the following parse tree 25 E T intE *

26 When Recursive Descent Does Not Work Consider the left-recursive grammar S → S α | β S is called itself without consuming any symbol  Gets into an infinite loop Recursive descent does not work in such cases 26

27 Elimination of Left Recursion Consider the left-recursive grammar S → S α | β S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S → β S’ S’ → α S’ | ε 27

28 More Elimination of Left- Recursion In general S → S α1 | … | S αn | β1 | … | βm All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn Rewrite as S → β1 S’ | … | βm S’ S’ → α1 S’ | … | αn S’ | ε 28

29 General Left Recursion The grammar S → A α | δ A → S β is also left-recursive because S → S β α This left-recursion can also be eliminated See book, Section 4.3 for general algorithm 29

30 Summary of Recursive Descent Simple and general parsing strategy  Left-recursion must be eliminated first  … but that can be done automatically Unpopular because of backtracking  Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar 30

31 Predictive Parsers Like recursive-descent but parser can “predict” which production to use  By looking at the next few tokens  No backtracking Predictive parsers accept LL(k) grammars  L means “left-to-right” scan of input  L means “leftmost derivation”  k means “predict based on k tokens of lookahead” In practice, LL(1) is used 31

32 LL(1) Languages In recursive-descent, for each non-terminal and input token, may be a choice of production LL(1) means that for each non-terminal and token there is only one production Can be specified via 2D tables  One dimension for current non-terminal to expand  One dimension for next token  A table entry contains one production 32

33 Predictive Parsing and Left Factoring Recall the grammar E → T + E | T T → int | int * T | ( E ) Hard to predict because  For T two productions start with int  For E it is not clear how to predict A grammar must be left-factored before use for predictive parsing 33

34 Left-Factoring Example Recall the grammar E → T + E | T T → int | int * T | ( E ) Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε 34

35 Left-Factoring Example Left-factored grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε Token stream is: int * int 35 E T int Y * x T Y ε ε

36 LL(1) Parsing Table Example Left-factored grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε LL(1) parsing table: 36 int*+()$ ET X X+ Eεε Tint Y( E ) Y* Tεεε

37 LL(1) Parsing Table Example Consider the [E, int] entry  “When current non-terminal is E and next input is int, use production E → T X”  This production can generate a int in the first place Consider the [Y,+] entry  “When current non-terminal is Y and current token is +, get rid of Y”  Y can be followed by + only in a derivation in which Y → ε 37

38 LL(1) Parsing Tables - Errors Blank entries indicate error situations  Consider the [E,*] entry  “There is no way to derive a string starting with * from non-terminal E” 38

39 Using Parsing Tables Method similar to recursive descent, except  For each non-terminal S  We look at the next token a  And chose the production shown at [S, a] We use a stack to keep track of pending nonterminals We reject when we encounter an error state We accept when we encounter end-of-input 39

40 LL(1) Parsing Algorithm initialize stack = and next repeat case stack of : if T[X,*next] = Y1…Yn then stack ← ; else error (); : if t == *next ++ then stack ← ; else error (); until stack == 40

41 LL(1) Parsing Example Stack Input Action E $ int* int $ T X T X $ int *int $ int Y int Y X $ int *int $ terminal Y X $ * int $ * T * T X $ * int $terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT 41

42 Constructing Parsing Tables LL(1) languages are those defined by a parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables from CFG 42

43 Constructing Parsing Tables If A → α, where in the line of A we place α ? In the column of t where t can start a string derived from α  α =>* t β  We say that t ∈ First(α) In column of t if α is ε and t can follow an A  S =>* β A t δ  We say t ∈ Follow(A) 43

44 Computing First Sets Definition: First(X) = { t | X =>* tα} ∪ {ε | X =>* ε} Algorithm sketch (see book for details): 1. For all terminals t do First(t) ← { t } 2. If X → A 1 … A k  If a ∈ First(A 1 ), add a to First(X)  Everything in First(A 1 ) is in First(X)  If A 1 does not drive ε stop  If A 1 =>* ε then we add First(A 2 ), and so on 3. For each production X → ε, add ε in First(X) 44

45 First Sets - Example Recall the grammar E → T X X → + E | ε T → ( E ) | int YY → * T | ε First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First(int) = {int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * } 45

46 Computing Follow Sets Definition: Follow(B) = { t | S =>* β B t δ } If S is the start symbol then $ ∈ Follow(S) If A → α B β then First(β) - ε is in Follow(B) If A → α B or A → α B β and ε ∈ First(β)  Follow(A) is in Follow(B) 46

47 Follow Sets. Example Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε Follow sets Follow( + ) = {int, ( } Follow( E ) = {), $} Follow( ( ) = {int, ( } Follow( X ) = {), $} Follow( * ) = {int, ( }Follow( T ) = {+, ), $} Follow( ) ) = {+, ), $} Follow( Y ) = {+, ), $} Follow(int) = {*, +, ), $} 47

48 Constructing LL(1) Parsing Tables Construct a parsing table T for CFG, G For each production A → α in G do:  For each terminal t ∈ First(α) do T[A, t] = α  If ε ∈ First(α), for each t ∈ Follow(A) do T[A, t] = α  If ε ∈ First(α) and $ ∈ Follow(A) do T[A, $] = α 48

49 Constructing LL(1) Parsing Tables Grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε 49 int*+()$ ET X X+ Eεε Tint Y( E ) Y* Tεεε First Sets First( T ) = {int, ( } First( E ) = {int, ( } First( X ) = {+, ε } First( Y ) = {*, ε } Follow Sets Follow( X ) = {), $} Follow( E ) = {), $} Follow( T ) = {+, ), $} Follow( Y ) = {+, ), $}

50 LL(1) Parsing Example Stack Input Action E $ int* int $ T X T X $ int *int $ int Y int Y X $ int *int $ terminal Y X $ * int $ * T * T X $ * int $terminal T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT 50

51 Predictive Parsing for Dangling Else Grammar Dangling else grammar S → i E t S | i E t S e S | a E → b Left factoring S → i E t S S’ | a S’ → e S | ε E → b 51

52 Predictive Parsing for Dangling Else Grammar S → i E t S S’ | a S’ → e S | ε E → b 52 abeiT$ SS→aS→iEtSS’ S’S’→eS S’→ε EE→b First(S) = {i, a} First(E) = {b} First(S’) = {e, ε } Follow(S) = {e, $} Follow(S’) = {e, $} Follow(E) = {t}

53 Error Handling in Syntax Analysis Goals  Report the presence of errors clearly and accurately  Recover from each error quickly To detect subsequent errors  Add minimal overhead to the processing of correct programs 53

54 Error Recovery Strategies Panic-Mode Recovery  Discards input symbols one at a time  Synchronizing tokens is used Follow set, keyword, etc Phrase-Level Recovery  Perform local correction on the remaining inputs  Replace a comma by a semicolon, delete an extraneous semicolon  For the empty cells of the parsing table implement the error correcting routines 54

55 Error Recovery Strategies Error Productions  Augment the grammar for erroneous inputs Global Correction  Make as few changes as possible in processing an incorrect input string Read section 4.4.5 55

56 Error Recovery id+*()$ E E' T T' F E → TE' T → FT' F → id E → +TE1 synch T' + ε synch T' →* FT' synch E → TE' T → FT' F → (E) E → ε synch T‘→ ε synch E → ε synch T→ ε synch 56 Table entry [A, a] is empty input a is skipped If the entry is synch then the stack top is popped If the stack top terminal does not match input then stack top is popped

57 Error Recovery: Panic Mode StackInputRemark E $ TE'$ FT'E' $ id T'E'$ TIE' $ * FT'E' $ FT'E' $ TIE' $ E' $ + TE' $ TE' $ FT'E' $ id T'E' $ T'E' $ E' $ $ ) id * + id $ id * + id $ * + i d$ + id $ id $ $ error, skip ) id is in FIRST(E) error, M [F, +] = synch F has been popped 57

58 Bottom-up Parsing Bottom-up parsing is more general than top- down parsing  Efficient although difficult by hand  Similar ideas of top-down parsing Bottom-up is the preferred method in practice Reading: Section 4.5 58

59 Bottom-up Parsing Bottom-up parsers don’t need left factored grammars Hence we can revert to the “natural” grammar for our example: E → T + E | T T → int * T | int | (E) Consider the string: int * int + int 59

60 Bottom-up Parsing Bottom-up parsing reduces a string to the start symbol by inverting productions: int * int + int T → int int * T + int T → int * T T + int T → int T + T E → T T + E E → T + E E 60

61 Observation Read productions from bottom-up parse in reverse (i.e., from bottom to top) This is a rightmost derivation! int * int + int T → int int * T + int T → int * T T + int T → int T + T E → T T + E E → T + E E 61

62 Trivial Bottom-Up Parsing Algorithm Let I = input string repeat pick a non-empty substring β of I where X→ β is a production if no such β, backtrack replace one β by X in I until I = “S” (the start symbol) or all possibilities are exhausted 62


Download ppt "Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1."

Similar presentations


Ads by Google