Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.

Presentation transcript:

Spring 16 CSCI 4430, A Milanova

Announcements
  HW1 is due on Monday, February 8th.
  Name and date your submission.
  Submit electronically in Homework Server AND on paper at the beginning of class.
  You may submit late on Thursday, February 11th for 50% credit.
  No other submissions are accepted.
  Questions to

Last Class
  Formal languages describe and recognize programming language syntax.
  Regular languages specify tokens (e.g., keywords, identifiers, numeric literals).
    Generated by regular expressions
    Recognized by DFAs (in a compiler, the scanner)
  Context-free languages describe more complex constructs (e.g., expressions and statements).
    Generated by context-free grammars
    Recognized by PDAs (in a compiler, the parser)
  We reviewed regular expressions, CFGs, derivations, parse trees, and ambiguity.
  Scanning

Today’s Lecture Outline
  Top-down parsing vs. bottom-up parsing
  Top-down parsing
    Introduction
    A backtracking parser
    Recursive descent predictive parser
    Table-driven predictive parser
  LL(1) parsing table
    FIRST, FOLLOW and PREDICT sets
    Constructing LL(1) parsing tables

Programming Language Syntax, Parsing Read: Scott, Chapter and 2.3.2

A Simple Calculator Language
  Character stream (input to the scanner): position = initial + rate * time;
  Token stream (output of the scanner, input to the parser): id = id + id * id ;
  Grammar:
    asst_stmt → id = expr ;    // asst_stmt is the start symbol
    expr → expr + expr | expr * expr | id
  [Figure: parse tree for the token stream, rooted at asst_stmt. Parse tree simplified to fit on slide.]
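The scanner step on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's scanner: the name `scan` and the regular expression `TOKEN_RE` are assumptions, and the only tokens handled are identifiers (reported as `id`) and the symbols `=`, `+`, `*`, `;` that appear in the slide's grammar.

```python
import re

# Hypothetical token pattern: an identifier, or one of the symbols = + * ;
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|([=+*;]))")

def scan(text):
    """Turn a character stream into a list of token names."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        # Every identifier is reported as the token "id"; symbols stand for themselves.
        tokens.append("id" if m.group(1) else m.group(2))
        pos = m.end()
    return tokens

print(" ".join(scan("position = initial + rate * time;")))  # prints: id = id + id * id ;
```

The scanner accepts any sequence of valid tokens; deciding whether the token stream is well-formed is left to the parser.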

A Simple Calculator Language
  Character stream (input to the scanner): position + initial = rate * time;
  Token stream (output of the scanner, input to the parser): id + id = id * id ;
  Grammar:
    asst_stmt → id = expr ;    // asst_stmt is the start symbol
    expr → expr + expr | expr * expr | id
  The token stream is ill-formed according to our grammar, so parse tree construction fails: syntax error!
  Most compiler errors occur in the parser.

Parsing
  Objective: build a parse tree for an input string of tokens, from a single scan of the input!
    This is possible only for special subclasses of context-free grammars.
  Two approaches
    Top-down: builds the parse tree from the root to the leaves
    Bottom-up: builds the parse tree from the leaves to the top
  Both are easily automated.

Grammar for Comma-separated Lists
  list → id list_tail    // list is the start symbol
  list_tail → , id list_tail | ;
  Generates comma-separated lists of id’s, e.g., id; and id, id, id;
  For example: list ⇒ id list_tail ⇒ id , id list_tail ⇒ id , id ;

Top-down Parsing
  Grammar:
    list → id list_tail
    list_tail → , id list_tail | ;
  Terminals are seen in the order of appearance in the token stream: id , id , id ;
  The parse tree is constructed from the top to the leaves.
    Corresponds to a left-most derivation.
  Look at the left-most nonterminal in the current sentential form and at the lookahead terminal, and “predict” which production to apply.
  [Figure: parse tree for id , id , id ; rooted at list]

Bottom-up Parsing
  Grammar:
    list → id list_tail
    list_tail → , id list_tail | ;
  Terminals are seen in the order of appearance in the token stream: id , id , id ;
  The parse tree is constructed from the leaves to the top.
    Corresponds to a right-most derivation in reverse.
  [Figure: parse tree for id , id , id ; rooted at list]

Top-down Predictive Parsing
  “Predicts” the production to apply based on one or more lookahead token(s)
  Predictive parsers work with LL(k) grammars
    First L stands for “left-to-right” scan of input
    Second L stands for “left-most” derivation: the parse corresponds to a left-most derivation
    k stands for “need k tokens of lookahead to predict”
  We are interested in LL(1)

Question
  Grammar:
    list → id list_tail
    list_tail → , id list_tail | ;
  Can we always predict what production to apply based on one token of lookahead? Token stream: id , id , id ;
  Yes, there is at most one choice (i.e., at most one production that applies).
  This grammar is an LL(1) grammar.
  [Figure: parse tree for id , id , id ; rooted at list]
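To see why one token of lookahead suffices for this grammar, here is a minimal Python sketch of a recognizer for it. The function name `parse_list` and the helper names are illustrative assumptions, and `$$` is used as an end-of-input marker. In `list_tail`, the lookahead token alone selects the production: `,` picks `list_tail → , id list_tail` and `;` picks `list_tail → ;`.

```python
def parse_list(tokens):
    """Return True if tokens are derivable from: list -> id list_tail."""
    pos = 0

    def lookahead():
        # "$$" stands for end of input (an assumption of this sketch).
        return tokens[pos] if pos < len(tokens) else "$$"

    def match(expected):
        nonlocal pos
        if lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, saw {lookahead()!r}")
        pos += 1

    def list_():
        match("id")
        list_tail()

    def list_tail():
        if lookahead() == ",":        # predict list_tail -> , id list_tail
            match(",")
            match("id")
            list_tail()
        elif lookahead() == ";":      # predict list_tail -> ;
            match(";")
        else:
            raise SyntaxError(f"unexpected token {lookahead()!r}")

    try:
        list_()
        return lookahead() == "$$"    # all input must be consumed
    except SyntaxError:
        return False

print(parse_list(["id", ",", "id", ",", "id", ";"]))  # prints: True
```

At no point does the recognizer have to undo a choice, which is the practical meaning of "at most one production applies".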

Question
  A new grammar:
    list → list_prefix ;
    list_prefix → list_prefix , id | id
  What language does it generate? The same: comma-separated lists.
  Can we predict based on one token of lookahead? Token stream: id , id , id ;
  [Figure: partial parse tree with root list and children list_prefix and ; with a “?” below list_prefix]

Aside: Top-down Depth-first Parsing
  For each nonterminal, exhaustively try productions in order, backtracking if necessary.
  Consider the grammar (S is the start symbol):
    S → c A d
    A → ab | a
  Consider the string c a d.

Aside: Top-down Depth-first Parsing
  Grammar:
    S → c A d
    A → ab | a
  Input string: cad
  Start with the start symbol.
  Try S → c A d. Leaf c matches input c.
  Try a rule for A: A → ab. Leaf a matches input a, but b ≠ d. Backtrack to A.
  Try the second rule for A: A → a. Leaf a matches a; d matches d.
  Done: S ⇒ c A d ⇒ cad
  [Figure: the parse trees tried for cad, first with A → ab, then with A → a]

Aside: Top-down Depth-first Parsing
  Grammar: S → aSbS | bSaS | ε
  Input string: abba
  [Figure: search tree of the sentential forms tried while parsing abba, with some steps omitted]

Aside: Backtracking, more generally
  A general search technique: searches for a solution in a (large) space.
  1. We have a partial solution. Make the next choice and expand the solution.
     If it is a solution, then done.
     If the (partial) solution is invalid, backtrack to the last point p where there is an untried choice (undoing all work since that point), and repeat 1. If there is no such p, then there is no solution.
     If the partial solution is valid, repeat 1.

Aside: Depth-first Parsing
  A string of tokens to parse: t_1 t_2 … t_n
  Begin with the start symbol as the sentential form.
  Let A be the leftmost nonterminal in the current sentential form. Exhaustively try each production for A, backtracking if necessary.
  E.g., input cad with grammar S → c A d, A → ab | a:
    S ⇒ c A d      (try A → ab, get cabd, no match, backtrack)
    c A d ⇒ cad    (try A → a, get cad, DONE!)
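The depth-first strategy can be sketched as a small backtracking recognizer in Python. This is an illustrative sketch, not the lecture's code: the names `GRAMMAR`, `match_seq`, and `accepts` are assumptions, terminals are single characters, and productions are tried in the order listed. Python generators provide the backtracking: when one alternative yields no position, the loop simply moves on to the next production. Like any naive depth-first parser, it would not terminate on a left-recursive grammar.

```python
# Grammar from the slide: S -> c A d,  A -> ab | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def match_seq(symbols, text, pos):
    """Yield every input position reachable after deriving `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: exhaustively try its productions in order.
        for rhs in GRAMMAR[first]:
            for mid in match_seq(rhs, text, pos):
                yield from match_seq(rest, text, mid)
    elif pos < len(text) and text[pos] == first:
        # Terminal that matches the next input character.
        yield from match_seq(rest, text, pos + 1)
    # Otherwise: dead end. Nothing is yielded, so the caller backtracks.

def accepts(text, start="S"):
    """True if some derivation from `start` consumes the whole input."""
    return any(end == len(text) for end in match_seq([start], text, 0))

print(accepts("cad"))  # prints: True
```

On input cad, the sketch first tries A → ab, fails on b versus d, and then succeeds with A → a, mirroring the steps on the slide.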

Aside: Depth-first Parsing
  The sentential form is t_1 t_2 … t_k A…
    Initially k = 0 and A… is the start symbol.
  Try a production for A (the leftmost nonterminal), say A → t_{k+1} t_{k+2} B…, to get t_1 t_2 … t_k t_{k+1} t_{k+2} B…
    Backtrack if necessary.
  Accept when there are no more nonterminals and all terminals match; reject when there are no more productions left.
  A problematic strategy…

Top-down Predictive Parsing
  Back to predictive parsing!
  “Predicts” the production to apply based on one or more lookahead token(s)
  No backtracking! The parser always gets it right.
  Predictive parsers work with LL(k) grammars

Top-down Predictive Parsing
  Expression grammar: not LL(1)
    expr → expr + expr | expr * expr | id
  Unambiguous version: still not LL(1). Why?
    expr → expr + term | term
    term → term * id | id
  LL(1) version:
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε

Exercise
  Draw the parse tree for the expression id + id * id + id, using the grammar:
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε

Lecture Outline
  Top-down parsing vs. bottom-up parsing
  Top-down parsing
    Introduction
    A backtracking parser
    Recursive descent predictive parsing
    Table-driven top-down predictive parsing
  LL(1) parsing table
    FIRST, FOLLOW and PREDICT sets
    Constructing LL(1) parsing tables

Recursive Descent
  Each nonterminal has a procedure.
  The right-hand sides (rhs) for the nonterminal form the body of its procedure.
  lookahead(): peeks at the current token in the input stream
  match(t): if lookahead() == t then consume the current token, else PARSE_ERROR

Recursive Descent
  Grammar:
    start → expr $$
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε

  start()
    case lookahead() of
      id: expr(); match($$)        // $$ is the end-of-input marker
      otherwise: PARSE_ERROR

  expr()
    case lookahead() of
      id: term(); term_tail()
      otherwise: PARSE_ERROR

  term_tail()
    case lookahead() of
      +: match('+'); term(); term_tail()    // predicting production term_tail → + term term_tail
      $$: skip                              // predicting epsilon production term_tail → ε
      otherwise: PARSE_ERROR

Recursive Descent
  Grammar:
    start → expr $$
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε

  term()
    case lookahead() of
      id: match('id'); factor_tail()
      otherwise: PARSE_ERROR

  factor_tail()
    case lookahead() of
      *: match('*'); match('id'); factor_tail()    // predicting production factor_tail → * id factor_tail
      +, $$: skip                                  // predicting production factor_tail → ε
      otherwise: PARSE_ERROR
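The five procedures translate almost line for line into Python. The following is a sketch under stated assumptions rather than a reference implementation: the class name `Parser`, the exception `ParseError`, and the helper `parses` are hypothetical, tokens are plain strings, and the end-of-input marker `$$` is appended by the constructor.

```python
class ParseError(Exception):
    """Raised where the slides' pseudocode says PARSE_ERROR."""

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$$"]   # append the end-of-input marker
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead() != t:
            raise ParseError(f"expected {t!r}, saw {self.lookahead()!r}")
        self.pos += 1

    def start(self):                # start -> expr $$
        if self.lookahead() != "id":
            raise ParseError("start")
        self.expr()
        self.match("$$")

    def expr(self):                 # expr -> term term_tail
        if self.lookahead() != "id":
            raise ParseError("expr")
        self.term()
        self.term_tail()

    def term_tail(self):
        if self.lookahead() == "+":           # term_tail -> + term term_tail
            self.match("+")
            self.term()
            self.term_tail()
        elif self.lookahead() != "$$":        # on $$ predict term_tail -> ε (skip)
            raise ParseError("term_tail")

    def term(self):                 # term -> id factor_tail
        self.match("id")
        self.factor_tail()

    def factor_tail(self):
        if self.lookahead() == "*":           # factor_tail -> * id factor_tail
            self.match("*")
            self.match("id")
            self.factor_tail()
        elif self.lookahead() not in ("+", "$$"):   # on + or $$ predict ε (skip)
            raise ParseError("factor_tail")

def parses(tokens):
    """Convenience wrapper: True if the token list is accepted."""
    try:
        Parser(tokens).start()
        return True
    except ParseError:
        return False

print(parses("id + id * id + id".split()))  # prints: True
```

Each `case` arm of the pseudocode becomes one branch of an `if`, and the lookahead tokens tested in each branch are exactly the ones listed on the slides.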

LL(1) Parsing Table
  But how does the parser “predict”? It uses the LL(1) parsing table.
    One dimension is the nonterminal to expand.
    The other dimension is the lookahead token. We are interested in one token of lookahead.
    The entry “nonterminal on token” contains the production to apply, or contains nothing.

LL(1) Parsing Table
  Grammar:
    start → expr $$
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε

                 id                +                   *                   $$
  start          expr $$           -                   -                   -
  expr           term term_tail    -                   -                   -
  term_tail      -                 + term term_tail    -                   ε
  term           id factor_tail    -                   -                   -
  factor_tail    -                 ε                   * id factor_tail    ε

Question
  Fill in the LL(1) parsing table for the comma-separated list grammar:
    start → list $$
    list → id list_tail
    list_tail → , id list_tail | ;

               id              ,                 ;    $$
  start        list $$         -                 -    -
  list         id list_tail    -                 -    -
  list_tail    -               , id list_tail    ;    -

Table-driven Top-down Parsing
  Uses parse_stack, parse_table

  parse_stack.push(start_symbol)
  loop
    expected_sym : symbol := parse_stack.pop
    if expected_sym is a terminal or $$ then
      match(expected_sym)
      if expected_sym = $$ then return    // SUCCESS!
    else    // expected_sym is a nonterminal
      if parse_table[expected_sym, lookahead()] = ERROR then
        return PARSE_ERROR
      else
        production : production := parse_table[expected_sym, lookahead()]
        foreach sym : symbol in reverse from production
          parse_stack.push(sym)
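A minimal Python sketch of this driver loop follows, using the LL(1) table for the expression grammar given earlier in the lecture. The names `TABLE`, `NONTERMINALS`, and `ll1_parse` are assumptions made for illustration. A missing dictionary key plays the role of an ERROR entry, an empty list stands for an ε production, and the input is assumed not to contain `$$` itself.

```python
# LL(1) table for the expression grammar: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("start", "id"): ["expr", "$$"],
    ("expr", "id"): ["term", "term_tail"],
    ("term_tail", "+"): ["+", "term", "term_tail"],
    ("term_tail", "$$"): [],                          # ε
    ("term", "id"): ["id", "factor_tail"],
    ("factor_tail", "+"): [],                         # ε
    ("factor_tail", "*"): ["*", "id", "factor_tail"],
    ("factor_tail", "$$"): [],                        # ε
}
NONTERMINALS = {"start", "expr", "term_tail", "term", "factor_tail"}

def ll1_parse(tokens, start_symbol="start"):
    """Return True if the token list is accepted, False on a parse error."""
    tokens = list(tokens) + ["$$"]        # append the end-of-input marker
    pos = 0
    stack = [start_symbol]
    while stack:
        expected = stack.pop()
        if expected not in NONTERMINALS:  # a terminal or $$: it must match the input
            if tokens[pos] != expected:
                return False
            pos += 1
            if expected == "$$":
                return pos == len(tokens)
        else:
            production = TABLE.get((expected, tokens[pos]))
            if production is None:        # empty table entry: parse error
                return False
            # Push the right-hand side in reverse so its first symbol ends up on top.
            stack.extend(reversed(production))
    return False

print(ll1_parse("id + id * id".split()))  # prints: True
```

The driver is independent of the language being parsed: replacing `TABLE` and `NONTERMINALS` with those of another LL(1) grammar is enough to parse that grammar.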

Constructing LL(1) Parsing Tables
  We can construct an LL(1) parsing table for any context-free grammar.
  In general, the table will have multiply-defined entries. That is, for some nonterminal and lookahead token, more than one production applies.
  A grammar whose LL(1) parsing table has no multiply-defined entries is said to be an LL(1) grammar.
  LL(1) grammars are a very special subset of context-free grammars.

Intuition
  Top-down parsing
    The parse tree is built from the top to the leaves.
    Always expand the leftmost nonterminal.
  Grammar:
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε
  Candidate productions: factor_tail → * id factor_tail and factor_tail → ε
  What production applies for factor_tail on +? + does not belong to an expansion of factor_tail. However, factor_tail has an epsilon production, and + belongs to an expansion of term_tail, which follows factor_tail. Thus, predict the epsilon production.
  [Figure: partial parse tree with expr expanded to term term_tail and term expanded to id factor_tail; factor_tail is expanded to ε]

Intuition
  Top-down parsing
    The parse tree is built from the top to the leaves.
    Always expand the leftmost nonterminal.
  Grammar:
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε
  Candidate productions: term_tail → + term term_tail and term_tail → ε
  What production applies for term_tail on +? + is the first symbol in expansions of + term term_tail. Thus, predict production term_tail → + term term_tail.
  [Figure: partial parse tree with expr expanded to term term_tail, term expanded to id factor_tail, factor_tail expanded to ε, and term_tail expanded to + term term_tail]

FIRST and FOLLOW sets
  Let α be any sequence of nonterminals and terminals.
    FIRST(α) contains the set of terminals a that begin the strings derived from α.
    If there is a derivation α ⇒* ε, then ε is in FIRST(α).
  Let A be a nonterminal.
    FOLLOW(A) contains the set of terminals b that can appear immediately to the right of A in some sentential form. In other words, there is a derivation S ⇒* …A b … ⇒* …

Computing FIRST
  Notation: α is an arbitrary sequence of terminals and nonterminals.
  Given a grammar, apply these rules until no more terminals or ε can be added to any FIRST(α) set:
  (1) If α starts with a terminal a, then FIRST(α) = { a }.
  (2) If α is a nonterminal X and X → ε, then add ε to FIRST(X).
  (3) If α is a nonterminal X and X → Y_1 Y_2 … Y_k, then place a in FIRST(X) if for some i, a is in FIRST(Y_i) and ε is in all of FIRST(Y_1), …, FIRST(Y_{i-1}). If ε is in all of FIRST(Y_1), …, FIRST(Y_k), add ε to FIRST(X).
      Everything in FIRST(Y_1) other than ε is surely in FIRST(X).
      If Y_1 does not derive ε, then we add nothing more; otherwise, we add FIRST(Y_2), and so on.

Example
  Grammar:
    start → expr $$
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε
  FIRST(start) = { id }
  FIRST(expr) = { id }
  FIRST(term) = { id }
  FIRST(term_tail) = { +, ε }
  FIRST(+ term term_tail) = { + }
  FIRST(factor_tail) = { *, ε }
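The fixed-point computation of FIRST can be sketched in Python as follows. The names `GRAMMAR`, `first_of_sequence`, and `compute_first` are illustrative assumptions. The grammar is encoded as a dictionary from each nonterminal to a list of right-hand sides, an empty list stands for an ε production, and any symbol that is not a dictionary key is treated as a terminal.

```python
EPS = "ε"

GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "term": [["id", "factor_tail"]],
    "factor_tail": [["*", "id", "factor_tail"], []],
}

def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given the current FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        # A terminal's FIRST set is just the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result             # this symbol cannot vanish, so stop here
    result.add(EPS)                   # every symbol (or the empty sequence) derives ε
    return result

def compute_first(grammar):
    """Apply the rules repeatedly until no FIRST set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

first = compute_first(GRAMMAR)
print(sorted(first["term_tail"]))  # prints: ['+', 'ε']
```

The outer `while` loop is the "apply these rules until nothing can be added" part of the definition; each pass can only grow the sets, so the loop terminates.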

Question
  Grammar:
    start → list $$
    list → id list_tail
    list_tail → , id list_tail | ;
  Compute FIRST sets:
    FIRST(start) =
    FIRST(list) =
    FIRST(list_tail) =
    FIRST(list $$) =
    FIRST(, id list_tail) =

Computing FOLLOW
  Notation: A, B, S are nonterminals. α, β are arbitrary sequences of terminals and nonterminals.
  Given a grammar, apply these rules until nothing can be added to any FOLLOW(A) set:
  (1) If there is a production A → αBβ, then everything in FIRST(β) except for ε is in FOLLOW(B).
  (2) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Example
  Grammar:
    start → expr $$
    expr → term term_tail
    term_tail → + term term_tail | ε
    term → id factor_tail
    factor_tail → * id factor_tail | ε
  FOLLOW(expr) = { $$ }
  FOLLOW(term) = { +, $$ }
  Derivation: start ⇒ expr $$ ⇒ term term_tail $$ ⇒ term + term term_tail $$ ⇒ term + term $$
    + follows term (in term + term term_tail $$)
    $$ follows term (in term + term $$)
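The two FOLLOW rules can be sketched in Python on top of a FIRST computation. As before, this is an illustration under assumptions: the names `GRAMMAR`, `first_of`, `compute_first`, and `compute_follow` are hypothetical, an empty list encodes an ε production, and `$$` is treated as an ordinary terminal of the grammar.

```python
EPS = "ε"

GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "term": [["id", "factor_tail"]],
    "factor_tail": [["*", "id", "factor_tail"], []],
}

def first_of(symbols, first):
    """FIRST of a sequence of symbols; the empty sequence yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first):
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:            # production A -> α B β
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue               # FOLLOW is defined for nonterminals only
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}   # rule (1)
                    if EPS in beta_first:      # rule (2): β is empty or derives ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

follow = compute_follow(GRAMMAR, compute_first(GRAMMAR))
print(sorted(follow["term"]))  # prints: ['$$', '+']
```

For the expression grammar this reproduces the sets in the example: `$$` reaches FOLLOW(term) through rule (2), because term_tail can derive ε, and `+` reaches it through rule (1).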