Prof. Bodik CS 164 Lecture 6: Building a Parser II. CS164, 3:30-5:00 TT, 10 Evans.

Presentation transcript:

Prof. Bodik CS 164 Lecture 6 1 Building a Parser II CS164 3:30-5:00 TT 10 Evans

Prof. Bodik CS 164 Lecture 6 2 Administrivia PA2 assigned today –due in 12 days WA1 assigned today –due in a week –it’s practice for the exam First midterm –Oct 5 –will contain some project-inspired questions

Prof. Bodik CS 164 Lecture 6 3 Overview: grammars and derivations; recursive descent parser; eliminating left recursion

Prof. Bodik CS 164 Lecture 6 4 Grammars Programming language constructs have recursive structure. –which is why our hand-written parser had this structure, too An expression is either: number, or variable, or expression + expression, or expression - expression, or ( expression ), or …

Prof. Bodik CS 164 Lecture 6 5 Context-free grammars (CFG): a natural notation for this recursive structure. Grammar for our balanced-parens expressions: BalancedExpression → a | ( BalancedExpression ). It describes (generates) strings of symbols: –a, (a), ((a)), (((a))), … Like regular expressions, but a rule can refer to –other expressions (here, BalancedExpression) –and do this recursively (which makes the language “non-finite state”)
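
A minimal sketch of how the recursive production maps onto a recursive function, assuming the input is a plain C string with no whitespace. The names `balanced` and `is_balanced_expression` are made up for this illustration and do not come from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

/* Recognizer for: BalancedExpression -> a | ( BalancedExpression )
 * Tries to match one BalancedExpression starting at s[*pos] and
 * advances *pos past whatever it consumed. */
static bool balanced(const char *s, size_t *pos) {
    if (s[*pos] == 'a') {              /* production: a */
        (*pos)++;
        return true;
    }
    if (s[*pos] == '(') {              /* production: ( BalancedExpression ) */
        (*pos)++;
        if (!balanced(s, pos)) return false;
        if (s[*pos] != ')') return false;
        (*pos)++;
        return true;
    }
    return false;                      /* no production applies */
}

/* The string is in the language only if a single BalancedExpression
 * covers it exactly, with nothing left over. */
bool is_balanced_expression(const char *s) {
    size_t pos = 0;
    return balanced(s, &pos) && s[pos] == '\0';
}
```

The recursion depth equals the nesting depth of the parentheses, which is the counting that a finite-state recognizer cannot do.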

Prof. Bodik CS 164 Lecture 6 6 Example: arithmetic expressions Simple arithmetic expressions: E → n | id | ( E ) | E + E | E * E Some elements of this language: –id –n –( n ) –n + id –id * ( id + id )

Prof. Bodik CS 164 Lecture 6 7 Symbols: Terminals and Nonterminals grammars use two kinds of symbols terminals: –no rules for replacing them –once generated, terminals are permanent –these are tokens of our language nonterminals: –to be replaced (expanded) –in regular expression lingo, these serve as names of expressions –start non-terminal: the first symbol to be expanded

Prof. Bodik CS 164 Lecture 6 8 Notational Conventions In these lecture notes, let’s adopt a notation: –Non-terminals are written upper-case –Terminals are written lower-case or as symbols, e.g., token LPAR is written as ( –The start symbol is the left-hand side of the first production

Prof. Bodik CS 164 Lecture 6 9 Derivations This is how a grammar generates strings: –think of grammar rules (called productions) as rewrite rules Derivation: the process of generating a string
1. begin with the start non-terminal
2. rewrite the non-terminal with one of its productions
3. select a non-terminal in your current string
   i. if no non-terminal is left, done.
   ii. otherwise go to step 2.

Prof. Bodik CS 164 Lecture 6 10 Example: derivation Grammar: E → n | id | ( E ) | E + E | E * E
a derivation:
E        rewrite E with ( E )
( E )    rewrite E with n
( n )    this is the final string of terminals
another derivation (written more concisely): E ⇒ ( E ) ⇒ ( E * E ) ⇒ ( E + E * E ) ⇒ ( n + E * E ) ⇒ ( n + id * E ) ⇒ ( n + id * id )
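
The concise derivation can be replayed mechanically as string rewriting. The sketch below assumes the sentential form is kept in a fixed-size character buffer and that `E` is the only non-terminal; `rewrite_leftmost_E` and `SENTENTIAL_MAX` are hypothetical names chosen for the example. It always rewrites the leftmost `E`, which happens to be the choice made at every step of the derivation on the slide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SENTENTIAL_MAX 128   /* assumed buffer capacity */

/* Rewrite the leftmost occurrence of the non-terminal 'E' in `form`
 * with the right-hand side `rhs`. Returns false if no 'E' is left
 * (the derivation is finished) or the result would not fit. */
bool rewrite_leftmost_E(char form[SENTENTIAL_MAX], const char *rhs) {
    char *e = strchr(form, 'E');
    if (e == NULL) return false;

    size_t prefix_len = (size_t)(e - form);
    size_t rhs_len = strlen(rhs);
    size_t suffix_len = strlen(e + 1);
    if (prefix_len + rhs_len + suffix_len + 1 > SENTENTIAL_MAX) return false;

    /* Shift the suffix (with its terminating NUL) to make room,
     * then copy the right-hand side over the old 'E'. */
    memmove(e + rhs_len, e + 1, suffix_len + 1);
    memcpy(e, rhs, rhs_len);
    return true;
}
```

Starting from the buffer `"E"` and applying the right-hand sides `( E )`, `E * E`, `E + E`, `n`, `id`, `id` in that order yields `( n + id * id )`; one more call returns false because no non-terminal is left.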

Prof. Bodik CS 164 Lecture 6 11 So how do derivations help us in parsing? A program (a string of tokens) has no syntax error if it can be derived from the grammar. –but so far you only know how to derive some (any) string, not how to check if a given string is derivable So how to do parsing? –a naïve solution: derive all possible strings and check if your program is among them –not as bad as it sounds: there are parsers that do this, kind of. Coming soon.

Prof. Bodik CS 164 Lecture 6 12 Decaf Example A fragment of Decaf:
STMT → while ( EXPR ) STMT | id ( EXPR ) ;
EXPR → EXPR + EXPR | EXPR - EXPR | EXPR < EXPR | ( EXPR ) | id

Prof. Bodik CS 164 Lecture 6 13 Decaf Example (Cont.) Some elements of the (fragment of the) language. Question: One of the strings is not from the language. Which one?
id ( id ) ;
id ( ( ( ( id ) ) ) ) ;
while ( id < id ) id ( id ) ;
while ( while ( id ) ) id ( id ) ;
while ( id ) while ( id ) while ( id ) id ( id ) ;
Grammar:
STMT → while ( EXPR ) STMT | id ( EXPR ) ;
EXPR → EXPR + EXPR | EXPR - EXPR | EXPR < EXPR | ( EXPR ) | id

Prof. Bodik CS 164 Lecture 6 14 CFGs (definition) A CFG consists of –A set of terminal symbols T –A set of non-terminal symbols N –A start symbol S (a non-terminal) –A set of productions: productions are of two forms (X ∈ N): X → ε, or X → Y₁ Y₂ ... Yₙ where Yᵢ ∈ N ∪ T
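
One possible way to hold this four-part definition in memory, sketched under the assumption that symbols are plain strings and right-hand sides have a small fixed maximum length. The type names `Production` and `Grammar`, the limit `MAX_RHS`, and the sample data `ARITH` (the arithmetic grammar E → n | id | ( E ) | E + E | E * E from the earlier slide) are choices made for this example, not part of the lecture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RHS 8   /* assumed upper bound on right-hand-side length */

/* One production X -> Y1 Y2 ... Yn; rhs_len == 0 encodes X -> epsilon. */
typedef struct {
    const char *lhs;
    const char *rhs[MAX_RHS];
    size_t rhs_len;
} Production;

typedef struct {
    const char *const *terminals;    size_t n_terminals;     /* T */
    const char *const *nonterminals; size_t n_nonterminals;  /* N */
    const char *start;                                       /* S */
    const Production *productions;   size_t n_productions;
} Grammar;

static bool contains(const char *const *set, size_t n, const char *sym) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0) return true;
    return false;
}

/* Checks the side conditions of the definition: S is in N, every
 * left-hand side X is in N, and every Yi is in N or in T. */
bool grammar_is_well_formed(const Grammar *g) {
    if (!contains(g->nonterminals, g->n_nonterminals, g->start)) return false;
    for (size_t p = 0; p < g->n_productions; p++) {
        const Production *prod = &g->productions[p];
        if (prod->rhs_len > MAX_RHS) return false;
        if (!contains(g->nonterminals, g->n_nonterminals, prod->lhs)) return false;
        for (size_t i = 0; i < prod->rhs_len; i++) {
            const char *y = prod->rhs[i];
            if (!contains(g->nonterminals, g->n_nonterminals, y) &&
                !contains(g->terminals, g->n_terminals, y)) return false;
        }
    }
    return true;
}

/* Sample data: E -> n | id | ( E ) | E + E | E * E */
static const char *const ARITH_T[] = {"n", "id", "(", ")", "+", "*"};
static const char *const ARITH_N[] = {"E"};
static const Production ARITH_P[] = {
    {"E", {"n"}, 1},
    {"E", {"id"}, 1},
    {"E", {"(", "E", ")"}, 3},
    {"E", {"E", "+", "E"}, 3},
    {"E", {"E", "*", "E"}, 3},
};
static const Grammar ARITH = {ARITH_T, 6, ARITH_N, 1, "E", ARITH_P, 5};
```

Calling `grammar_is_well_formed(&ARITH)` returns true; a production that mentions a symbol outside N ∪ T makes it return false.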

Prof. Bodik CS 164 Lecture 6 15 context-free grammars what is “context-free”? –means the grammar is not context-sensitive context-sensitive grammars –can describe more languages than CFGs –because their productions can restrict when a non-terminal may be rewritten. An example production: d N → d A B c –meaning: N can be rewritten into A B c only when preceded by d –can be used to encode semantic checks, but parsing is hard

Prof. Bodik CS 164 Lecture 6 16 Now let’s parse a string recursive descent parser derives all strings –until it matches derived string with the input string –or until it is sure there is a syntax error

Prof. Bodik CS 164 Lecture 6 17 Recursive Descent Parsing Consider the grammar
E → T + E | T
T → int | int * T | ( E )
Token stream is: int₅ * int₂ Start with top-level non-terminal E Try the rules for E in order

Prof. Bodik CS 164 Lecture 6 18 Recursive Descent Parsing. Example (Cont.)
Try E₀ → T₁ + E₂
Then try a rule for T₁: T₁ → ( E₃ ) –But ( does not match input token int₅
Try T₁ → int. Token matches. –But + after T₁ does not match input token *
Try T₁ → int * T₂ –This will match but + after T₁ will be unmatched
Have exhausted the choices for T₁ –Backtrack to choice for E₀

Prof. Bodik CS 164 Lecture 6 19 Recursive Descent Parsing. Example (Cont.) Try E₀ → T₁ Follow same steps as before for T₁ –And succeed with T₁ → int * T₂ and T₂ → int –With the following parse tree:
E₀
└─ T₁
   ├─ int₅
   ├─ *
   └─ T₂
      └─ int₂

Prof. Bodik CS 164 Lecture 6 20 A Recursive Descent Parser (2) Define boolean functions that check the token string for a match of
–A given token terminal: bool term(TOKEN tok) { return in[next++] == tok; }
–A given production of S (the nth): bool Sn() { … }
–Any production of S: bool S() { … }
These functions advance next

Prof. Bodik CS 164 Lecture 6 21 A Recursive Descent Parser (3)
For production E → T + E: bool E1() { return T() && term(PLUS) && E(); }
For production E → T: bool E2() { return T(); }
For all productions of E (with backtracking): bool E() { int save = next; return (next = save, E1()) || (next = save, E2()); }

Prof. Bodik CS 164 Lecture 6 22 A Recursive Descent Parser (4) Functions for non-terminal T
bool T1() { return term(OPEN) && E() && term(CLOSE); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(INT); }
bool T() { int save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }

Prof. Bodik CS 164 Lecture 6 23 Recursive Descent Parsing. Notes. To start the parser –Initialize next to point to first token –Invoke E() Notice how this simulates our backtracking example from lecture Easy to implement by hand Predictive parsing is more efficient
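
Putting the fragments from the previous slides together gives roughly the following self-contained C version. Three things here are assumptions added for the sketch and are not on the slides: the `END` sentinel token that terminates the stream, the guard in `term` that refuses to read past it, and the wrapper `parse`, which also checks that the whole input was consumed (without that check, `E()` can return true after matching only a prefix).

```c
#include <stdbool.h>
#include <stddef.h>

/* END is an assumed sentinel marking the end of the token stream. */
typedef enum { INT, PLUS, TIMES, OPEN, CLOSE, END } TOKEN;

static const TOKEN *in;   /* token stream, terminated by END */
static size_t next;       /* index of the next unread token */

static bool E(void);
static bool T(void);

/* Match one terminal. END is never consumed, so `next` cannot run
 * past the sentinel. */
static bool term(TOKEN tok) {
    if (in[next] == END) return false;
    return in[next++] == tok;
}

static bool E1(void) { return T() && term(PLUS) && E(); }  /* E -> T + E */
static bool E2(void) { return T(); }                       /* E -> T     */

/* Try each production of E in order, restoring `next` before each try. */
static bool E(void) {
    size_t save = next;
    return (next = save, E1()) || (next = save, E2());
}

static bool T1(void) { return term(OPEN) && E() && term(CLOSE); } /* T -> ( E )   */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }  /* T -> int * T */
static bool T3(void) { return term(INT); }                        /* T -> int     */

static bool T(void) {
    size_t save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Initialize `next` to the first token, invoke E(), and require that
 * nothing but the END sentinel is left. */
bool parse(const TOKEN *tokens) {
    in = tokens;
    next = 0;
    return E() && in[next] == END;
}
```

For the stream `INT TIMES INT END`, `E1` fails on the missing `+`, the parser backtracks to `E2`, and `parse` returns true, mirroring the trace from the earlier example. Note that the functions try `int * T` before `int`: once `T()` has returned true it is never re-entered to try a later alternative, so putting the longer production first matters for this scheme.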

Prof. Bodik CS 164 Lecture 6 24 Recursive Descent Parsing. Notes. Easy to implement by hand –An example implementation is provided as a supplement “Recursive Descent Parsing” But does not always work …

Prof. Bodik CS 164 Lecture 6 25 Recursive-Descent Parsing Parsing: given a string of tokens t₁ t₂ … tₙ, find its parse tree Recursive-descent parsing: Try all the productions exhaustively –At a given moment the fringe of the parse tree is: t₁ t₂ … tₖ A … –Try all the productions for A: if A → B C is a production, the new fringe is t₁ t₂ … tₖ B C … –Backtrack when the fringe doesn’t match the string –Stop when there are no more non-terminals

Prof. Bodik CS 164 Lecture 6 26 When Recursive Descent Does Not Work Consider a production S → S a: –In the process of parsing S we try the above rule –What goes wrong? A left-recursive grammar has a non-terminal S such that S ⇒⁺ S α for some α Recursive descent does not work in such cases –It goes into an infinite loop
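
To see what goes wrong, here is a sketch of the function a naive recursive descent parser would contain for the production S → S a. The depth counter and `DEPTH_LIMIT` are an artificial guard added only so the example terminates; `S_left` and `left_recursion_diverges` are hypothetical names.

```c
#include <stdbool.h>

#define DEPTH_LIMIT 1000   /* artificial guard, not part of the technique */

static int depth;          /* current nesting of S_left() calls */
static bool hit_limit;     /* set when the guard fires */

/* Naive recursive descent for the production S -> S a.
 * The first thing S_left does is call itself at the same input
 * position, so no token is consumed before the recursive call. */
static bool S_left(const char *in, int *next) {
    if (++depth > DEPTH_LIMIT) {   /* without this guard: unbounded recursion */
        hit_limit = true;
        depth--;
        return false;
    }
    bool ok = S_left(in, next) && in[(*next)++] == 'a';
    depth--;
    return ok;
}

/* Returns true if the guard had to cut off the recursion. */
bool left_recursion_diverges(const char *in) {
    int next = 0;
    depth = 0;
    hit_limit = false;
    S_left(in, &next);
    return hit_limit;
}
```

The guard fires for every input, including the empty string, because the recursion never looks at a token before calling itself again.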

Prof. Bodik CS 164 Lecture 6 27 Elimination of Left Recursion Consider the left-recursive grammar S → S α | β S generates all strings starting with a β and followed by any number of α Can rewrite using right-recursion:
S → β S’
S’ → α S’ | ε

Prof. Bodik CS 164 Lecture 6 28 Elimination of Left-Recursion. Example Consider the grammar S → 1 | S 0 ( β = 1 and α = 0 ) can be rewritten as
S → 1 S’
S’ → 0 S’ | ε
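
The rewritten grammar S → 1 S', S' → 0 S' | ε can be parsed by recursive descent directly, since every recursive call is now preceded by consuming a token. A minimal sketch, assuming the input is a C string of the characters '0' and '1'; the function names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* S' -> 0 S' | epsilon : consume as many 0s as are available. */
static void S_prime(const char *in, size_t *next) {
    if (in[*next] == '0') {
        (*next)++;              /* a token is consumed before recursing */
        S_prime(in, next);
    }
    /* otherwise take the epsilon production and consume nothing */
}

/* S -> 1 S' */
static bool S(const char *in, size_t *next) {
    if (in[*next] != '1') return false;
    (*next)++;
    S_prime(in, next);
    return true;
}

/* Accepts exactly the strings generated by S -> 1 | S 0,
 * that is, a single 1 followed by any number of 0s. */
bool matches_one_then_zeros(const char *in) {
    size_t next = 0;
    return S(in, &next) && in[next] == '\0';
}
```

Both grammars generate the same language; only the shape of the parse tree changes from left-leaning to right-leaning.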

Prof. Bodik CS 164 Lecture 6 29 More Elimination of Left-Recursion In general S → S α₁ | … | S αₙ | β₁ | … | βₘ All strings derived from S start with one of β₁, …, βₘ and continue with several instances of α₁, …, αₙ Rewrite as
S → β₁ S’ | … | βₘ S’
S’ → α₁ S’ | … | αₙ S’ | ε
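
The general rewrite is mechanical enough to sketch as a small function. The version below assumes the productions of one non-terminal have already been split into the α parts (the tails of the left-recursive alternatives) and the β parts (the other alternatives), all given as strings; it only handles immediate left recursion. The function name, the textual output format, and the spelling `eps` for ε are choices made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Append `text` to the NUL-terminated buffer `out` of capacity `size`,
 * silently truncating if it does not fit. */
static void emit(char *out, size_t size, const char *text) {
    size_t used = strlen(out);
    if (used + 1 < size)
        strncat(out, text, size - used - 1);
}

/* Given  nt -> nt alpha_1 | ... | nt alpha_n | beta_1 | ... | beta_m
 * write the equivalent right-recursive productions into `out`:
 *   nt  -> beta_1 nt' | ... | beta_m nt'
 *   nt' -> alpha_1 nt' | ... | alpha_n nt' | eps                     */
void eliminate_immediate_left_recursion(
        const char *nt,
        const char *const alphas[], size_t n,
        const char *const betas[], size_t m,
        char *out, size_t size) {
    if (size == 0) return;
    out[0] = '\0';

    emit(out, size, nt);
    emit(out, size, " -> ");
    for (size_t j = 0; j < m; j++) {
        if (j > 0) emit(out, size, " | ");
        emit(out, size, betas[j]);
        emit(out, size, " ");
        emit(out, size, nt);
        emit(out, size, "'");
    }
    emit(out, size, "\n");

    emit(out, size, nt);
    emit(out, size, "' -> ");
    for (size_t i = 0; i < n; i++) {
        emit(out, size, alphas[i]);
        emit(out, size, " ");
        emit(out, size, nt);
        emit(out, size, "' | ");
    }
    emit(out, size, "eps\n");
}
```

With `nt = "S"`, `alphas = {"0"}` and `betas = {"1"}` this reproduces the rewritten grammar for S → 1 | S 0: the first output line is `S -> 1 S'` and the second is `S' -> 0 S' | eps`.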

Prof. Bodik CS 164 Lecture 6 30 General Left Recursion The grammar
S → A α | δ
A → S β
is also left-recursive because S ⇒⁺ S β α This left-recursion can also be eliminated See [ASU], Section 4.3 for general algorithm

Prof. Bodik CS 164 Lecture 6 31 Summary of Recursive Descent Simple and general parsing strategy –Left-recursion must be eliminated first –… but that can be done automatically Unpopular because of backtracking –Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar