Chapter 2 (part) + Chapter 4: Syntax Analysis, S. M. Farhad



Grammars
Specify the syntax of a language
  - Hierarchical structure
Java if-else statement
  if ( expr ) stmt else stmt
A production rule for the if-else statement
  stmt → if ( expr ) stmt else stmt
Terminals and nonterminals

Context-Free Grammars
Notations used to specify syntax
  - Context-free grammar (CFG)
  - Backus-Naur Form (BNF)
A context-free grammar
  - Is used to analyze the syntax
  - Is also used to translate programs
"Context-free grammar" is shortened to "grammar" from here on

Components of Grammars
A set of terminal symbols
  - For example: tokens, +, -, keywords
A set of nonterminals
  - Sets of strings that help define the language
  - Nonterminals impose a hierarchical structure
  - For example, expr and stmt in: stmt → if ( expr ) stmt else stmt

Components of Grammars
A set of productions
  - The head or left side: consists of a nonterminal
  - An arrow: means "can have the form"
  - The body or right side: a sequence of terminals and nonterminals
A start symbol
  - A special nonterminal symbol
  - The productions for the start symbol are listed first

Example
Arithmetic expressions built with +, - and *
  E → E + E | E - E | E * E | ( E ) | int
  int → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivations
Begin with the start symbol
Each rewriting step replaces a nonterminal by the body of one of its productions
Leftmost derivation
  - The leftmost nonterminal is always chosen
  - LL parsing (scans the input left to right, builds a leftmost derivation)
Rightmost derivation
  - The rightmost nonterminal is always chosen
  - LR parsing (scans the input left to right, builds a rightmost derivation in reverse)

Leftmost Derivation
Given E → E + E | E - E | E * E | ( E ) | int
String: int * int + int
  E => E + E
    => E * E + E
    => int * E + E
    => int * int + E
    => int * int + int

Rightmost Derivation
String: int * int + int
  E => E + E
    => E + int
    => E * E + int
    => E * int + int
    => int * int + int

Parse Tree
String: int * int + int

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  int
    |     |
   int   int

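Ambiguity
A grammar is ambiguous if it produces more than one parse tree for some sentence
For string: int * int + int

Tree 1, grouping (int * int) + int:

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  int
    |     |
   int   int

Tree 2, grouping int * (int + int):

          E
        / | \
       E  *  E
       |   / | \
      int E  +  E
          |     |
         int   int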

Reasons for Ambiguity
The grammar does not encode associativity and precedence
  - +, -, *, / are left-associative
  - *, / have higher precedence than +, -
Use E and T for the two levels of precedence
Use F for the basic units of an expression

Non-Ambiguous Grammar
  F → int | ( E )
  T → T * F | T / F | F
  E → E + T | E - T | T
String: int * (int + int)

Ambiguity: The Dangling Else
Consider the grammar
  S → if E then S
    | if E then S else S
    | other
This grammar is also ambiguous

Ambiguity: The Dangling Else
The expression if E1 then if E2 then S1 else S2 has two parse trees
  1. if E1 then [ if E2 then S1 ] else S2    (else attached to the outer if)
  2. if E1 then [ if E2 then S1 else S2 ]    (else attached to the inner if)
Typically we want the second form

The Dangling Else: A Fix
else matches the closest unmatched then
We can describe this in the grammar
  S  → MS                        /* all then are matched */
     | US                        /* some then are unmatched */
  MS → if E then MS else MS
     | other
  US → if E then S
     | if E then MS else US

The Dangling Else: The Parse Tree
The expression if E1 then if E2 then S1 else S2

  S
  |
  US
  +-- if E1 then
  +-- S
      |
      MS
      +-- if E2 then
      +-- MS (S1)
      +-- else
      +-- MS (S2)

CFG vs RE
Grammars are a more powerful notation than REs
For the RE (a | b)*abb, an equivalent grammar is
  A0 → a A0 | b A0 | a A1
  A1 → b A2
  A2 → b A3
  A3 → ε
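The equivalence claimed on this slide can be checked with a small sketch. The dictionary encoding of the grammar, and the names `GRAMMAR`, `ACCEPTING`, `grammar_accepts` and `regex_accepts`, are illustrative assumptions and not part of the slides. The recognizer tracks the set of nonterminals that could still derive the rest of the input.

```python
import re

# Right-linear grammar from the slide, encoded (as an assumption) as
# nonterminal -> list of (terminal, next nonterminal) pairs.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],  # A3 -> ε is its only production
}
ACCEPTING = {"A3"}  # nonterminals that have an ε-production


def grammar_accepts(s: str) -> bool:
    """Track every nonterminal that could derive the remaining input."""
    current = {"A0"}
    for ch in s:
        current = {nxt for nt in current for (t, nxt) in GRAMMAR[nt] if t == ch}
    return bool(current & ACCEPTING)


def regex_accepts(s: str) -> bool:
    """Match the same language with the regular expression from the slide."""
    return re.fullmatch(r"(a|b)*abb", s) is not None
```

Both functions should agree on every string over {a, b}; the grammar adds nothing here because the language is regular.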

Why Use RE in Lexical Analysis
Splits the front end into two manageable-sized components
REs are simpler
REs are more concise
Construction of the lexical analyzer becomes easier and more efficient

RE vs CFG
REs are most useful for
  - Identifiers, constants, keywords, and white space
Grammars are most useful for describing nested structure
  - Balanced parentheses, matching begin-end's, corresponding if-then-else
Nested structure cannot be described by an RE

Parsing
Top-down parsing
  - Starts at the root and proceeds towards the leaves
  - Easier to understand and to program manually
Bottom-up parsing
  - Starts at the leaves and proceeds towards the root
  - More powerful, used by most parser generators

Recursive Descent Parsing
Consider the grammar
  E → T + E | T
  T → int | int * T | ( E )
Token stream is: int * int
Start with the top-level non-terminal E
Try the rules for E in order

Recursive Descent Parsing - Example
Try E → T + E
  - Then try T → ( E ): but ( does not match the input token int
  - Try T → int: the token matches, but the + after T does not match the input token *
  - Try T → int * T: this matches the whole input, but the + after T is left unmatched
The choices for T are exhausted
  - Backtrack to the choice for E

Recursive Descent Parsing - Example
Token stream is: int * int
Try E → T
Follow the same steps as before for T
  - And succeed with T → int * T and T → int
  - With the following parse tree

        E
        |
        T
      / | \
   int  *  T
           |
          int
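A minimal sketch of a backtracking recursive-descent recognizer for this grammar follows. It is an illustration, not the slides' own code: the function names `parse_E`, `parse_T` and `accepts` are hypothetical, and backtracking is modelled with Python generators that yield every input position at which a non-terminal could end, so all alternatives are explored.

```python
def parse_E(tokens, pos):
    """Yield every position where an E that starts at pos can end."""
    # E -> T + E
    for mid in parse_T(tokens, pos):
        if mid < len(tokens) and tokens[mid] == "+":
            yield from parse_E(tokens, mid + 1)
    # E -> T
    yield from parse_T(tokens, pos)


def parse_T(tokens, pos):
    """Yield every position where a T that starts at pos can end."""
    if pos < len(tokens) and tokens[pos] == "int":
        # T -> int
        yield pos + 1
        # T -> int * T
        if pos + 1 < len(tokens) and tokens[pos + 1] == "*":
            yield from parse_T(tokens, pos + 2)
    # T -> ( E )
    if pos < len(tokens) and tokens[pos] == "(":
        for mid in parse_E(tokens, pos + 1):
            if mid < len(tokens) and tokens[mid] == ")":
                yield mid + 1


def accepts(tokens):
    """The input is accepted if some parse of E consumes every token."""
    return any(end == len(tokens) for end in parse_E(tokens, 0))
```

For the token stream int * int, the attempt through E → T + E produces no complete parse, and the recognizer then succeeds through E → T with T → int * T, as in the example.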

When Recursive Descent Does Not Work
Consider the left-recursive grammar
  S → S α | β
The procedure for S calls itself without consuming any input symbol
  - It gets into an infinite loop
Recursive descent does not work in such cases

Elimination of Left Recursion
Consider the left-recursive grammar
  S → S α | β
S generates all strings starting with a β and followed by any number of α
Can rewrite using right recursion
  S  → β S'
  S' → α S' | ε

More Elimination of Left Recursion
In general
  S → S α1 | … | S αn | β1 | … | βm
All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn
Rewrite as
  S  → β1 S' | … | βm S'
  S' → α1 S' | … | αn S' | ε
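The rewrite above can be sketched as a small function. This is only an illustration for immediate left recursion: the name `eliminate_immediate_left_recursion`, the list-of-symbols encoding of production bodies, and the use of an empty list for ε are all assumptions made for the example.

```python
def eliminate_immediate_left_recursion(nt, bodies):
    """Rewrite  nt -> nt α1 | … | nt αn | β1 | … | βm  using right recursion.

    bodies is a list of production bodies, each a list of symbols;
    the empty list [] stands for ε.
    """
    # The αi: what follows the leading nt in the left-recursive bodies.
    alphas = [body[1:] for body in bodies if body and body[0] == nt]
    # The βj: the bodies that do not start with nt.
    betas = [body for body in bodies if not body or body[0] != nt]
    if not alphas:
        return {nt: bodies}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }
```

For example, applying it to S → S a | b returns S → b S' and S' → a S' | ε.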

General Left Recursion
The grammar
  S → A α | δ
  A → S β
is also left-recursive, because S =>+ S β α
This left recursion can also be eliminated
See book, Section 4.3, for the general algorithm

Summary of Recursive Descent
Simple and general parsing strategy
  - Left recursion must be eliminated first
  - … but that can be done automatically
Unpopular because of backtracking
  - Thought to be too inefficient
In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers
Like recursive descent, but the parser can "predict" which production to use
  - By looking at the next few tokens
  - No backtracking
Predictive parsers accept LL(k) grammars
  - L means "left-to-right" scan of input
  - L means "leftmost derivation"
  - k means "predict based on k tokens of lookahead"
In practice, LL(1) is used

LL(1) Languages
In recursive descent, for each non-terminal and input token there may be a choice of productions
LL(1) means that for each non-terminal and token there is at most one production
Can be specified via 2D tables
  - One dimension for the current non-terminal to expand
  - One dimension for the next token
  - A table entry contains one production

Predictive Parsing and Left Factoring
Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )
Hard to predict because
  - For T, two productions start with int
  - For E, it is not clear how to predict
A grammar must be left-factored before it is used for predictive parsing

Left-Factoring Example
Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )
Factor out common prefixes of productions
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
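One round of left factoring can be sketched as follows. The helper names `common_prefix` and `left_factor`, the list-of-symbols encoding, and the caller-supplied fresh non-terminal names are assumptions for illustration; alternatives are grouped by their first symbol and the longest prefix shared by each group is factored out.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared at the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(nt, bodies, fresh_names):
    """One round of left factoring for nt; [] stands for ε.

    fresh_names supplies the names of the new non-terminals (hypothetical,
    chosen by the caller, for example ["X"] or ["Y"]).
    """
    groups = {}
    for body in bodies:
        key = body[0] if body else None
        groups.setdefault(key, []).append(body)

    names = iter(fresh_names)
    result = {nt: []}
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nt].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        new_nt = next(names)
        result[nt].append(prefix + [new_nt])
        result[new_nt] = [body[len(prefix):] for body in group]
    return result
```

Calling `left_factor("E", [["T", "+", "E"], ["T"]], ["X"])` yields E → T X and X → + E | ε, matching the slide up to the order of alternatives.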

Left-Factoring Example
Left-factored grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
Token stream is: int * int

            E
          /   \
         T     X
        / \    |
     int   Y   ε
          / \
         *   T
            / \
         int   Y
               |
               ε

LL(1) Parsing Table Example
Left-factored grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
LL(1) parsing table:

       int      *      +      (        )      $
  E    T X                    T X
  X                    + E             ε      ε
  T    int Y                  ( E )
  Y             * T    ε               ε      ε

LL(1) Parsing Table Example
Consider the [E, int] entry
  - "When the current non-terminal is E and the next input is int, use production E → T X"
  - This production can generate an int in the first position
Consider the [Y, +] entry
  - "When the current non-terminal is Y and the current token is +, get rid of Y"
  - Y can be followed by + only in a derivation in which Y → ε

LL(1) Parsing Tables - Errors
Blank entries indicate error situations
  - Consider the [E, *] entry
  - "There is no way to derive a string starting with * from non-terminal E"

Using Parsing Tables
Method similar to recursive descent, except
  - For each non-terminal S
  - We look at the next token a
  - And choose the production shown at [S, a]
We use a stack to keep track of pending non-terminals
We reject when we encounter an error state
We accept when we encounter end-of-input with no pending symbols left on the stack

LL(1) Parsing Algorithm
  initialize stack = <S $> and next
  repeat
    case stack of
      <X, rest> : if T[X, *next] = Y1 … Yn
                    then stack ← <Y1 … Yn rest>;
                    else error();
      <t, rest> : if t == *next++
                    then stack ← <rest>;
                    else error();
  until stack == < >
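A minimal Python sketch of this table-driven loop, using the parsing table of the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative assumptions; an empty list stands for ε and a missing key stands for a blank (error) entry.

```python
# LL(1) table for the left-factored grammar; [] stands for ε.
TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"], ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"], ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]   # append the end-of-input marker
    stack = ["$", start]            # the top of the stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:        # blank table entry
                raise SyntaxError(f"no table entry for [{top}, {look}]")
            applied.append((top, body))
            stack.extend(reversed(body))  # push Y1 … Yn with Y1 on top
        elif top == look:           # terminal on the stack matches the input
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return applied
```

For the input int * int, the returned list gives the productions in the order they are chosen: E → T X, T → int Y, Y → * T, T → int Y, Y → ε, X → ε.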

LL(1) Parsing Example

  Stack          Input          Action
  E $            int * int $    T X
  T X $          int * int $    int Y
  int Y X $      int * int $    terminal
  Y X $          * int $        * T
  * T X $        * int $        terminal
  T X $          int $          int Y
  int Y X $      int $          terminal
  Y X $          $              ε
  X $            $              ε
  $              $              ACCEPT

Constructing Parsing Tables
LL(1) languages are those defined by a parsing table for the LL(1) algorithm
No table entry can be multiply defined
We want to generate parsing tables from a CFG

Constructing Parsing Tables
Given a production A → α, in which columns of row A do we place α?
In the column of t, where t can start a string derived from α
  - α =>* t β
  - We say that t ∈ First(α)
In the column of t, if α is ε (or derives ε) and t can follow an A
  - S =>* β A t δ
  - We say that t ∈ Follow(A)

Computing First Sets
Definition: First(X) = { t | X =>* t α } ∪ { ε | X =>* ε }
Algorithm sketch (see book for details):
  1. For all terminals t, set First(t) ← { t }
  2. For each production X → A1 … Ak
     - If a ∈ First(A1) and a is not ε, add a to First(X)
     - If A1 does not derive ε, stop
     - If A1 =>* ε, then also add First(A2), and so on
  3. For each production X → ε, add ε to First(X); do the same if every Ai in a body derives ε

First Sets - Example
Recall the grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
First sets
  First( ( )   = { ( }         First( T ) = { int, ( }
  First( ) )   = { ) }         First( E ) = { int, ( }
  First( int ) = { int }       First( X ) = { +, ε }
  First( + )   = { + }         First( Y ) = { *, ε }
  First( * )   = { * }
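The First sets of the non-terminals can be reproduced with a small fixed-point computation. This is a sketch under assumptions: the dictionary encoding `GRAMMAR`, the marker `EPS` and the function name `first_sets` are illustrative, and any symbol that is not a key of the grammar is treated as a terminal.

```python
EPS = "ε"

# Left-factored grammar from the slides; [] stands for an ε-production.
GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}


def first_sets(grammar):
    """Iterate the First-set rules until no set changes."""
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # First(t) = { t } for a terminal t.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                before = len(first[nt])
                for symbol in body:
                    f = first_of(symbol)
                    first[nt] |= f - {EPS}
                    if EPS not in f:
                        break          # this symbol cannot vanish: stop here
                else:
                    first[nt].add(EPS)  # empty body, or every symbol derives ε
                if len(first[nt]) != before:
                    changed = True
    return first
```

Running `first_sets(GRAMMAR)` gives the same sets as listed on the slide for E, X, T and Y.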

Computing Follow Sets
Definition: Follow(B) = { t | S =>* β B t δ }
If S is the start symbol, then $ ∈ Follow(S)
If A → α B β, then First(β) - {ε} ⊆ Follow(B)
If A → α B, or A → α B β with ε ∈ First(β)
  - Follow(A) ⊆ Follow(B)

Follow Sets - Example
Recall the grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
Follow sets
  Follow( + )   = { int, ( }        Follow( E ) = { ), $ }
  Follow( ( )   = { int, ( }        Follow( X ) = { ), $ }
  Follow( * )   = { int, ( }        Follow( T ) = { +, ), $ }
  Follow( ) )   = { +, ), $ }       Follow( Y ) = { +, ), $ }
  Follow( int ) = { *, +, ), $ }
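The Follow sets of the non-terminals can be checked with a similar fixed-point sketch. To keep the example short, the First sets are copied from the earlier slide rather than recomputed; the names `GRAMMAR`, `FIRST`, `first_of_sequence` and `follow_sets` are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# First sets of the non-terminals, copied from the First Sets example.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}


def first_of_sequence(symbols):
    """First of a string of symbols; a terminal t has First(t) = { t }."""
    result = set()
    for s in symbols:
        f = FIRST.get(s, {s})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)  # every symbol can vanish (or the string is empty)
    return result


def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue  # Follow is computed for non-terminals only
                    tail = first_of_sequence(body[i + 1:])
                    new = tail - {EPS}       # First(β) - {ε} ⊆ Follow(B)
                    if EPS in tail:
                        new |= follow[head]  # Follow(A) ⊆ Follow(B)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

`follow_sets(GRAMMAR, "E")` returns the sets shown above for E, X, T and Y; the Follow sets of terminals on the slide are not computed by this sketch.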

Constructing LL(1) Parsing Tables
Construct a parsing table T for a CFG G
For each production A → α in G do:
  - For each terminal t ∈ First(α), set T[A, t] = α
  - If ε ∈ First(α), for each t ∈ Follow(A), set T[A, t] = α
  - If ε ∈ First(α) and $ ∈ Follow(A), set T[A, $] = α

Constructing LL(1) Parsing Tables
Grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

       int      *      +      (        )      $
  E    T X                    T X
  X                    + E             ε      ε
  T    int Y                  ( E )
  Y             * T    ε               ε      ε

First Sets
  First( T ) = { int, ( }
  First( E ) = { int, ( }
  First( X ) = { +, ε }
  First( Y ) = { *, ε }
Follow Sets
  Follow( X ) = { ), $ }
  Follow( E ) = { ), $ }
  Follow( T ) = { +, ), $ }
  Follow( Y ) = { +, ), $ }
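The construction rule can be sketched directly from the First and Follow sets. In this illustration the sets are copied from the slide instead of being computed, $ is treated like any other member of Follow(A), and the names `build_table` and `first_of_sequence` are assumptions. A multiply defined entry raises an error, since that means the grammar is not LL(1).

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"}, "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}


def first_of_sequence(symbols, first):
    """First of a production body; a terminal t has First(t) = { t }."""
    result = set()
    for s in symbols:
        f = first.get(s, {s})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (non-terminal, token) to a production body; [] stands for ε."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_sequence(body, first)
            columns = f - {EPS}
            if EPS in f:
                columns |= follow[head]  # includes $ when $ ∈ Follow(head)
            for t in columns:
                if (head, t) in table:
                    raise ValueError(f"entry [{head}, {t}] is multiply defined")
                table[(head, t)] = body
    return table
```

`build_table(GRAMMAR, FIRST, FOLLOW)` fills the same eleven entries as the table on the slide and leaves every other entry blank.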

LL(1) Parsing Example

  Stack          Input          Action
  E $            int * int $    T X
  T X $          int * int $    int Y
  int Y X $      int * int $    terminal
  Y X $          * int $        * T
  * T X $        * int $        terminal
  T X $          int $          int Y
  int Y X $      int $          terminal
  Y X $          $              ε
  X $            $              ε
  $              $              ACCEPT

Predictive Parsing for Dangling Else Grammar
Dangling else grammar
  S → i E t S | i E t S e S | a
  E → b
Left factoring
  S  → i E t S S' | a
  S' → e S | ε
  E  → b

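Predictive Parsing for Dangling Else Grammar
  S  → i E t S S' | a
  S' → e S | ε
  E  → b

        a        b        e          i              t      $
  S     S→a                          S→iEtSS'
  S'                      S'→eS                            S'→ε
  E              E→b

  First(S)  = { i, a }        Follow(S)  = { e, $ }
  First(S') = { e, ε }        Follow(S') = { e, $ }
  First(E)  = { b }           Follow(E)  = { t }

Because e ∈ Follow(S'), the production S' → ε also competes for the [S', e] entry; keeping S' → eS there matches each else with the closest then.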

Error Handling in Syntax Analysis
Goals
  - Report the presence of errors clearly and accurately
  - Recover from each error quickly, so that subsequent errors can be detected
  - Add minimal overhead to the processing of correct programs

Error Recovery Strategies
Panic-Mode Recovery
  - Discards input symbols one at a time
  - Synchronizing tokens are used: Follow sets, keywords, etc.
Phrase-Level Recovery
  - Performs local correction on the remaining input
  - For example, replace a comma by a semicolon, or delete an extraneous semicolon
  - For the empty cells of the parsing table, implement error-correcting routines

Error Recovery Strategies
Error Productions
  - Augment the grammar with productions for erroneous inputs
Global Correction
  - Make as few changes as possible in processing an incorrect input string
Read section

Error Recovery

        id        +           *           (         )        $
  E     E→TE'                             E→TE'     synch    synch
  E'              E'→+TE'                           E'→ε     E'→ε
  T     T→FT'     synch                   T→FT'     synch    synch
  T'              T'→ε        T'→*FT'               T'→ε     T'→ε
  F     F→id      synch       synch       F→(E)     synch    synch

If table entry [A, a] is empty, input a is skipped
If the entry is synch, the nonterminal on top of the stack is popped
If the terminal on top of the stack does not match the input, it is popped

Error Recovery: Panic Mode

  Stack          Input             Remark
  E $            ) id * + id $     error, skip )
  E $            id * + id $       id is in FIRST(E)
  TE' $          id * + id $
  FT'E' $        id * + id $
  id T'E' $      id * + id $
  T'E' $         * + id $
  * FT'E' $      * + id $
  FT'E' $        + id $            error, M[F, +] = synch
  T'E' $         + id $            F has been popped
  E' $           + id $
  + TE' $        + id $
  TE' $          id $
  FT'E' $        id $
  id T'E' $      id $
  T'E' $         $
  E' $           $
  $              $

Bottom-up Parsing
Bottom-up parsing is more general than top-down parsing
  - Efficient, although difficult to do by hand
  - Builds on ideas similar to those of top-down parsing
Bottom-up is the preferred method in practice
Reading: Section

Bottom-up Parsing
Bottom-up parsers don't need left-factored grammars
Hence we can revert to the "natural" grammar for our example:
  E → T + E | T
  T → int * T | int | ( E )
Consider the string: int * int + int

Bottom-up Parsing
Bottom-up parsing reduces a string to the start symbol by inverting productions:

  int * int + int      T → int
  int * T + int        T → int * T
  T + int              T → int
  T + T                E → T
  T + E                E → T + E
  E

Observation
Read the productions from the bottom-up parse in reverse (i.e., from bottom to top)
This is a rightmost derivation!

  int * int + int      T → int
  int * T + int        T → int * T
  T + int              T → int
  T + T                E → T
  T + E                E → T + E
  E

Trivial Bottom-Up Parsing Algorithm
  Let I = input string
  repeat
    pick a non-empty substring β of I where X → β is a production
    if no such β, backtrack
    replace one β by X in I
  until I = "S" (the start symbol) or all possibilities are exhausted
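A brute-force sketch of this trivial algorithm for the "natural" grammar E → T + E | T, T → int * T | int | ( E ). The names `PRODUCTIONS` and `reduces_to_start` are hypothetical; backtracking is done by recursion, and a set of already-explored sentential forms (an addition not on the slide) prevents repeating failed work.

```python
# "Natural" grammar from the bottom-up slides, as (head, body) pairs.
PRODUCTIONS = [
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]


def reduces_to_start(form, start="E", seen=None):
    """Try every possible reduction, backtracking when a choice dead-ends."""
    form = tuple(form)
    if seen is None:
        seen = set()
    if form == (start,):
        return True
    if form in seen:
        return False  # already explored without success
    seen.add(form)
    for head, body in PRODUCTIONS:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:
                # Replace one occurrence of the body β by its head X.
                reduced = form[:i] + (head,) + form[i + n:]
                if reduces_to_start(reduced, start, seen):
                    return True
    return False
```

For example, `reduces_to_start(["int", "*", "int", "+", "int"])` finds a sequence of reductions ending in E, while an input such as int + has none. The search is exponential in general, which is why practical bottom-up parsers use tables to decide when to reduce.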