MIT 6.035 Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

Compiler Construction
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
Parsing III (Eliminating left recursion, recursive descent parsing)
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
Professor Yihjia Tsai Tamkang University
Top-Down Parsing.
Recursive Descent Parsing Top-down parsing: build tree from root symbol Each production corresponds to one recursive procedure Each procedure recognizes.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
-Mandakinee Singh (11CS10026).  What is parsing? ◦ Discovering the derivation of a string: If one exists. ◦ Harder than generating strings.  Two major.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing
Comp 311 Principles of Programming Languages Lecture 3 Parsing Corky Cartwright August 28, 2009.
Muhammad Idrees, Lecturer University of Lahore 1 Top-Down Parsing Top down parsing can be viewed as an attempt to find a leftmost derivation for an input.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Context free grammars  Terminals  Nonterminals  Start symbol  productions E --> E + T E --> E – T E --> T T --> T * F T --> T / F T --> F F --> (F)
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Top-Down Parsing.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
LECTURE 7 Lex and Intro to Parsing. LEX Last lecture, we learned a little bit about how we can take our regular expressions (which specify our valid tokens)
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #6 Parsing.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Introduction to Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing III (Top-down parsing: recursive descent & LL(1) )
A Simple Syntax-Directed Translator
Introduction to Parsing
Programming Languages Translator
Context free grammars Terminals Nonterminals Start symbol productions
Lecture #12 Parsing Types.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Parsing IV Bottom-up Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Chapter 4 Top-Down Parsing Part-1 September 8, 2018
4 (c) parsing.
Parsing Techniques.
Top-Down Parsing CS 671 January 29, 2008.
Lecture 7: Introduction to Parsing (Syntax Analysis)
Lecture 8: Top-Down Parsing
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Introduction to Parsing
Introduction to Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
LL and Recursive-Descent Parsing
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
Presentation transcript:

MIT Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Orientation Language specification Lexical structure – regular expressions Syntactic structure – grammar This Lecture - recursive descent parsers Code parser as set of mutually recursive procedures Structure of program matches structure of grammar

Starting Point Assume lexical analysis has produced a sequence of tokens Each token has a type and value Types correspond to terminals Values to contents of token read in Examples Int 549 – integer token with value 549 read in if - if keyword, no need for a value AddOp + - add operator, value +

Basic Approach Start with Start symbol Build a leftmost derivation If leftmost symbol is nonterminal, choose a production and apply it If leftmost symbol is terminal, match against input If all terminals match, have found a parse! Key: find correct productions for nonterminals

Graphical Illustration of Leftmost Derivation Apply Production Here NT 1 T 1 T 2 T 3 NT 2 NT 3 Not Here Sentential Form

Grammar for Parsing Example Start  Expr Expr  Expr + Term Expr  Expr - Term Expr  Term Term  Term * Int Term  Term / Int Term  Int Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]* For convenience, may represent each Int n token by n

Start Parse Tree Sentential Form Remaining Input 2-2*2 Start Current Position in Parse Tree Parsing Example

Applied Production Start  Expr Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr Current Position in Parse Tree Parsing Example

Applied Production Expr  Expr - Term Parse Tree Sentential Form Remaining Input 2-2*2 Expr - Term Start Expr Term Expr- Parsing Example

Applied Production Expr  Term Start Parse Tree Sentential Form Remaining Input 2-2*2 Term - Term Expr Term Expr- Term Parsing Example

Applied Production Term  Int Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr Term Expr- Term Int Int - Term Parsing Example

Start Parse Tree Sentential Form Remaining Input 2-2*2 2 - Term Expr Term Expr- Term Match Input Token! Int 2 Parsing Example

Start Parse Tree Sentential Form Remaining Input -2*2 2 - Term Expr Term Expr- Term Match Input Token! Int 2 Parsing Example

Start Parse Tree Sentential Form Remaining Input 2*2 2 - Term Expr Term Expr- Term Match Input Token! Int 2 Parsing Example

Applied Production Term  Term * Int Start Parse Tree Sentential Form Remaining Input 2*2 2 - Term*Int Expr Term Expr- Term Int* Int 2 Parsing Example

Applied Production Term  Int Start Parse Tree Sentential Form Remaining Input 2*2 2 - Int * Int Expr Term Expr- Term Int* Int 2 Int Parsing Example

Match Input Token! Start Parse Tree Sentential Form Remaining Input 2* * Int Expr Term Expr- Term Int* Int 2 Parsing Example

Match Input Token! Start Parse Tree Sentential Form Remaining Input * * Int Expr Term Expr- Term Int* Int 2 Parsing Example

Match Input Token! Start Parse Tree Sentential Form Remaining Input * Int Expr Term Expr- Term Int* Int 2 Parsing Example

Start Parse Tree Sentential Form Remaining Input *2 Expr Term Expr- Term Int 2* Parse Complete!

Summary Three Actions (Mechanisms) Apply production to expand current nonterminal in parse tree Match current terminal (consuming input) Accept the parse as correct Parser generates preorder traversal of parse tree visit parents before children visit siblings from left to right

Policy Problem Which production to use for each nonterminal? Classical Separation of Policy and Mechanism One Approach: Backtracking Treat it as a search problem At each choice point, try next alternative If it is clear that current try fails, go back to previous choice and try something different General technique for searching Used a lot in classical AI and natural language processing (parsing, speech recognition)

Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Start

Applied Production Start  Expr Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr

Applied Production Expr  Expr + Term Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr + Term Expr Term Expr+

Applied Production Expr  Term Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Term + Term Expr Term Expr+ Term

Applied Production Term  Int Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Int + Term Expr Term Expr+ Term Int Match Input Token!

Applied Production Term  Int Backtracking Example Start Parse Tree Sentential Form Remaining Input -2*2 2 - Term Expr Term Expr+ Term Int 2 Can’t Match Input Token!

Applied Production Start  Expr Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr So Backtrack!

Applied Production Expr  Expr - Term Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Expr - Term Expr Term Expr-

Term Applied Production Expr  Term Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Term - Term Expr Term Expr-

Term Applied Production Term  Int Backtracking Example Start Parse Tree Sentential Form Remaining Input 2-2*2 Int - Term Expr Term Expr- Int

Match Input Token! Term Backtracking Example Start Parse Tree Sentential Form Remaining Input -2*2 2 - Term Expr Term Expr- Int 2

Match Input Token! Term Backtracking Example Start Parse Tree Sentential Form Remaining Input 2*2 2 - Term Expr Term Expr- Int 2

Left Recursion + Top-Down Parsing = Infinite Loop Example Production: Term  Term*Num Potential parsing steps: Term Num * Term Num * Term Num *

General Search Issues Three components Search space (parse trees) Search algorithm (parsing algorithm) Goal to find (parse tree for input program) Would like to (but can’t always) ensure that Find goal (hopefully quickly) if it exists Search terminates if it does not Handled in various ways in various contexts Finite search space makes it easy Exploration strategies for infinite search space Sometimes one goal more important (model checking) For parsing, hack grammar to remove left recursion

Eliminating Left Recursion Start with productions of form A  A  A   ,  sequences of terminals and nonterminals that do not start with A Repeated application of A  A  builds parse tree like this: A  A  A  

Eliminating Left Recursion Replacement productions –A  A  A   R R is a new nonterminal –A   R   R – R   A  A   A  R  R  R  Old Parse Tree New Parse Tree

Hacked Grammar Original Grammar Fragment Term  Term * Int Term  Term / Int Term  Int New Grammar Fragment Term  Int Term’ Term’  * Int Term’ Term’  / Int Term’ Term’  

Term Int* Term Int * Term IntTerm’ Int* Term’ Int* Term’  Parse Tree Comparisons Original GrammarNew Grammar

Eliminating Left Recursion Changes search space exploration algorithm Eliminates direct infinite recursion But grammar less intuitive Sets things up for predictive parsing

Predictive Parsing Alternative to backtracking Useful for programming languages, which can be designed to make parsing easier Basic idea Look ahead in input stream Decide which production to apply based on next tokens in input stream We will use one token of lookahead

Predictive Parsing Example Grammar Start  Expr Expr  Term Expr’ Expr’  + Expr’ Expr’  - Expr’ Expr’   Term  Int Term’ Term’  * Int Term’ Term’  / Int Term’ Term’  

Choice Points Assume Term’ is current position in parse tree Have three possible productions to apply Term’  * Int Term’ Term’  / Int Term’ Term’   Use next token to decide If next token is *, apply Term’  * Int Term’ If next token is /, apply Term’  / Int Term’ Otherwise, apply Term’  

Predictive Parsing + Hand Coding = Recursive Descent Parser One procedure per nonterminal NT Productions NT   1, …, NT   n Procedure examines the current input symbol T to determine which production to apply If T  First(  k ) Apply production k Consume terminals in  k (check for correct terminal) Recursively call procedures for nonterminals in  k Current input symbol stored in global variable token Procedures return true if parse succeeds false if parse fails

Example Boolean Term() if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) Boolean TermPrime() if (token = *) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) else if (token = /) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) else return(true) Term  Int Term’ Term’  * Int Term’ Term’  / Int Term’ Term’  

Multiple Productions With Same Prefix in RHS Example Grammar NT  if then NT  if then else Assume NT is current position in parse tree, and if is the next token Unclear which production to apply Multiple k such that T  First(  k ) if  First(if then) if  First(if then else)

Solution: Left Factor the Grammar New Grammar Factors Common Prefix Into Single Production NT  if then NT’ NT’  else NT’   No choice when next token is if! All choices have been unified in one production.

Nonterminals What about productions with nonterminals? NT  NT 1  1 NT  NT 2  2 Must choose based on possible first terminals that NT 1 and NT 2 can generate What if NT 1 or NT 2 can generate  ? Must choose based on  1 and  2

NT derives  Two rules NT   implies NT derives  NT  NT 1... NT n and for all 1  i  n NT i derives  impliesNT derives 

Fixed Point Algorithm for Derives for all nonterminals NT set NT derives  to be false for all productions of the form NT   set NT derives  to be true while (some NT derives  changed in last iteration) for all productions of the form NT  NT 1... NT n if (for all 1  i  n NT i derives  ) set NT derives  to be true

First(  ) T  First(  ) if T can appear as the first symbol in a derivation starting from  1) T  First(T ) 2) First(S )  First(S  ) 3) NT derives  implies First(  )  First(NT  ) 4) NT  S  implies First(S  )  First(NT ) Notation T is a terminal, NT is a nonterminal, S is a terminal or nonterminal, and  is a sequence of terminals or nonterminals

Rules + Request Generate System of Subset Inclusion Constraints Grammar Term’  * Int Term’ Term’  / Int Term’ Term’   Request: What is First(Term’ )? Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/) Rules 1) T  First(T ) 2) First(S)  First(S  ) 3) NT derives  implies First(  )  First(NT  ) 4) NT  S  implies First(S  )  First(NT )

Constraint Propagation Algorithm Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/) Solution First(Term’ ) = {} First(* Num Term’ ) = {} First(/ Num Term’ ) = {} First(*) = {*} First(/) = {/} Initialize Sets to {} Propagate Constraints Until Fixed Point

Constraint Propagation Algorithm Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/) Solution First(Term’ ) = {} First(* Num Term’ ) = {} First(/ Num Term’ ) = {} First(*) = {*} First(/) = {/} Grammar Term’  * Int Term’ Term’  / Int Term’ Term’  

Constraint Propagation Algorithm Solution First(Term’ ) = {} First(* Num Term’ ) = {*} First(/ Num Term’ ) = {/} First(*) = {*} First(/) = {/} Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/) Grammar Term’  * Int Term’ Term’  / Int Term’ Term’  

Constraint Propagation Algorithm Solution First(Term’ ) = {*,/} First(* Num Term’ ) = {*} First(/ Num Term’ ) = {/} First(*) = {*} First(/) = {/} Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/) Grammar Term’  * Int Term’ Term’  / Int Term’ Term’  

Grammar Term’  * Int Term’ Term’  / Int Term’ Term’   Constraint Propagation Algorithm Solution First(Term’ ) = {*,/} First(* Num Term’ ) = {*} First(/ Num Term’ ) = {/} First(*) = {*} First(/) = {/} Constraints First(* Num Term’ )  First(Term’ ) First(/ Num Term’ )  First(Term’ ) First(*)  First(* Num Term’ ) First(/)  First(/ Num Term’ ) *  First(*) /  First(/)

Building A Parse Tree Have each procedure return the section of the parse tree for the part of the string it parsed Use exceptions to make code structure clean

Building Parse Tree In Example Term() if (token = Int n) oldToken = token; token = NextToken(); node = TermPrime(); if (node == NULL) return oldToken; else return(new TermNode(oldToken, node); else throw SyntaxError TermPrime() if (token = *) || (token = /) first = token; next = NextToken(); if (next = Int n) token = NextToken(); return(new TermPrimeNode(first, next, TermPrime()) else throw SyntaxError else return(NULL)

Parse Tree for 2*3*4 Term Int 2 Term’ Int 3 * Term’ Int 4 * Term’  Term Int 3 * Term Int 4 * Int 2 Concrete Parse Tree Desired Abstract Parse Tree

Why Use Hand-Coded Parser? Why not use parser generator? What do you do if your parser doesn’t work? Recursive descent parser – write more code Parser generator Hack grammar But if parser generator doesn’t work, nothing you can do If you have complicated grammar Increase chance of going outside comfort zone of parser generator Your parser may NEVER work

Bottom Line Recursive descent parser properties Probably more work But less risk of a disaster - you can almost always make a recursive descent parser work May have easier time dealing with resulting code Single language system No need to deal with potentially flaky parser generator No integration issues with automatically generated code If your parser development time is small compared to rest of project, or you have a really complicated language, use hand-coded recursive descent parser

Summary Top-Down Parsing Use Lookahead to Avoid Backtracking Parser is Hand-Coded Set of Mutually Recursive Procedures

Direct Generation of Abstract Tree TermPrime builds an incomplete tree Missing leftmost child Returns root and incomplete node (root, incomplete) = TermPrime() Called with token = * Remaining tokens = 3 * 4 Term Int 3 * Term Int 4 * root incomplete Missing left child to be filled in by caller

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Int 2 token 2*3*4 Input to parse

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Int 2 token 2*3*4 Input to parse

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Int 2 token 2*3*4 Input to parse

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Term Int 3 * Term Int 4 * Int 2 root incomplete leftmostInt 2*3*4 Input to parse

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Term Int 3 * Term Int 4 * Int 2 root incomplete leftmostInt 2*3*4 Input to parse

Code for Term Term() if (token = Int n) leftmostInt = token; token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete.leftChild = leftmostInt; return root; else throw SyntaxError Term Int 3 * Term Int 4 * Int 2 root incomplete leftmostInt 2*3*4 Input to parse

Code for TermPrime TermPrime() if (token = *) || (token = /) op = token; next = NextToken(); if (next = Int n) token = NextToken(); (root, incomplete) = TermPrime(); if (root == NULL) root = new ExprNode(NULL, op, next); return (root, root); else newChild = new ExprNode(NULL, op, next); incomplete.leftChild = newChild; return(root, newChild); else throw SyntaxError else return(NULL,NULL) Missing left child to be filled in by caller