Copyright © 2009 Elsevier Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott.

Slides:



Advertisements
Similar presentations
Compiler Construction
Advertisements

Bottom up Parsing Bottom up parsing trys to transform the input string into the start symbol. Moves through a sequence of sentential forms (sequence of.
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
YANGYANG 1 Chap 5 LL(1) Parsing LL(1) left-to-right scanning leftmost derivation 1-token lookahead parser generator: Parsing becomes the easiest! Modifying.
Mooly Sagiv and Roman Manevich School of Computer Science
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Parsing III (Eliminating left recursion, recursive descent parsing)
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
Shift/Reduce and LR(1) Professor Yihjia Tsai Tamkang University.
Bottom-up parsing Goal of parser : build a derivation
CPSC 388 – Compiler Design and Construction
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Syntax and Semantics Structure of programming languages.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Chapter 5 Top-Down Parsing.
Syntax. Syntax defines what is grammatically valid in a programming language – Set of grammatical rules – E.g. in English, a sentence cannot begin with.
Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.
Parsing G Programming Languages May 24, 2012 New York University Chanseok Oh
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
1 Compiler Construction Syntax Analysis Top-down parsing.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Syntax and Semantics Structure of programming languages.
1 Lecture 5: Syntax Analysis (Section 2.2) CSCI 431 Programming Languages Fall 2002 A modification of slides developed by Felix Hernandez-Campos at UNC.
Copyright © 2009 Elsevier Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott.
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 4.
1 Compiler Construction Syntax Analysis Top-down parsing.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Top-Down Parsing.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
Parsing V LR(1) Parsers. LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax and Semantics Structure of programming languages.
CS 326 Programming Languages, Concepts and Implementation
Programming Languages Translator
Chapter 2 :: Programming Language Syntax
Chapter 2 :: Programming Language Syntax
Chapter 2 :: Programming Language Syntax
Table-driven parsing Parsing performed by a finite state machine.
CS 363 – Chapter 2 The first 2 phases of compilation Scanning Parsing
Chapter 2 :: Programming Language Syntax
CS 3304 Comparative Languages
Top-Down Parsing CS 671 January 29, 2008.
Chapter 2 :: Programming Language Syntax
Chapter 2 :: Programming Language Syntax
Kanat Bolazar February 16, 2010
Announcements HW2 due on Tuesday Fall 18 CSCI 4430, A Milanova.
Chapter 2 :: Programming Language Syntax
Chapter 2 :: Programming Language Syntax
Presentation transcript:

Copyright © 2009 Elsevier Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott

Copyright © 2009 Elsevier Parsing Fortunately, there are large classes of grammars for which we can build parsers that run in linear time –The two most important classes are called LL and LR LL stands for 'Left-to-right, Leftmost derivation'. LR stands for 'Left-to-right, Rightmost derivation’

Copyright © 2009 Elsevier Parsing LL parsers are also called 'top-down', or 'predictive' parsers & LR parsers are also called 'bottom-up', or 'shift-reduce' parsers There are several important sub-classes of LR parsers –SLR –LALR We won't be going into detail on the differences between them

Copyright © 2009 Elsevier Parsing Every LL(1) grammar is also LR(1), though right recursion in production tends to require very deep stacks and complicates semantic analysis Every CFL that can be parsed deterministically has an SLR(1) grammar (which is LR(1)) Every deterministic CFL with the prefix property (no valid string is a prefix of another valid string) has an LR(0) grammar

Copyright © 2009 Elsevier Parsing You commonly see LL or LR (or whatever) written with a number in parentheses after it –This number indicates how many tokens of look-ahead are required in order to parse –Almost all real compilers use one token of look-ahead The expression grammar (with precedence and associativity) you saw before is LR(1), but not LL(1)

Copyright © 2009 Elsevier LL Parsing Here is an LL(1) grammar (Fig 2.15): 1. program → stmt list $$$ 2. stmt_list → stmt stmt_list 3. | ε 4. stmt → id := expr 5. | read id 6. | write expr 7. expr→ term term_tail 8. term_tail → add op term term_tail 9. | ε

Copyright © 2009 Elsevier LL Parsing LL(1) grammar (continued) 10. term→ factor fact_tailt 11. fact_tail → mult_op fact fact_tail 12. | ε 13. factor → ( expr ) 14. | id 15. | number 16. add_op → | mult_op → * 19. | /

Copyright © 2009 Elsevier LL Parsing Like the bottom-up grammar, this one captures associativity and precedence, but most people don't find it as pretty –for one thing, the operands of a given operator aren't in a RHS together! –however, the simplicity of the parsing algorithm makes up for this weakness How do we parse a string with this grammar? –by building the parse tree incrementally

Copyright © 2009 Elsevier LL Parsing Example (average program) read A read B sum := A + B write sum write sum / 2 We start at the top and predict needed productions on the basis of the current left-most non-terminal in the tree and the current input token

Copyright © 2009 Elsevier LL Parsing Parse tree for the average program (Figure 2.17)

Copyright © 2009 Elsevier LL Parsing Table-driven LL parsing: you have a big loop in which you repeatedly look up an action in a two-dimensional table based on current leftmost non-terminal and current input token. The actions are (1) match a terminal (2) predict a production (3) announce a syntax error

Copyright © 2009 Elsevier LL Parsing LL(1) parse table for parsing for calculator language

Copyright © 2009 Elsevier LL Parsing To keep track of the left-most non-terminal, you push the as-yet-unseen portions of productions onto a stack –for details see Figure 2.20 The key thing to keep in mind is that the stack contains all the stuff you expect to see between now and the end of the program –what you predict you will see

Copyright © 2009 Elsevier LL Parsing Problems trying to make a grammar LL(1) –left recursion example: id_list→ id | id_list, id equivalently id_list→ id id_list_tail id_list_tail →, id id_list_tail | epsilon we can get rid of all left recursion mechanically in any grammar

Copyright © 2009 Elsevier LL Parsing Problems trying to make a grammar LL(1) –common prefixes: another thing that LL parsers can't handle solved by "left-factoring” example: stmt → id := expr | id ( arg_list ) equivalently stmt → id id_stmt_tail id_stmt_tail → := expr | ( arg_list) we can eliminate left-factor mechanically

Copyright © 2009 Elsevier LL Parsing Note that eliminating left recursion and common prefixes does NOT make a grammar LL –there are infinitely many non-LL LANGUAGES, and the mechanical transformations work on them just fine –the few that arise in practice, however, can generally be handled with kludges

Copyright © 2009 Elsevier LL Parsing Problems trying to make a grammar LL(1) –the"dangling else" problem prevents grammars from being LL(1) (or in fact LL(k) for any k) –the following natural grammar fragment is ambiguous (Pascal) stmt → if cond then_clause else_clause | other_stuff then_clause → then stmt else_clause → else stmt | epsilon

Copyright © 2009 Elsevier LL Parsing The less natural grammar fragment can be parsed bottom-up but not top-down stmt → balanced_stmt | unbalanced_stmt balanced_stmt → if cond then balanced_stmt else balanced_stmt | other_stuff unbalanced_stmt → if cond then stmt | if cond then balanced_stmt else unbalanced_stmt

Copyright © 2009 Elsevier LL Parsing The usual approach, whether top-down OR bottom-up, is to use the ambiguous grammar together with a disambiguating rule that says –else goes with the closest then or –more generally, the first of two possible productions is the one to predict (or reduce)

Copyright © 2009 Elsevier LL Parsing Better yet, languages (since Pascal) generally employ explicit end-markers, which eliminate this problem In Modula-2, for example, one says: if A = B then if C = D then E := F end else G := H end Ada says 'end if'; other languages say 'fi'

Copyright © 2009 Elsevier LL Parsing One problem with end markers is that they tend to bunch up. In Pascal you say if A = B then … else if A = C then … else if A = D then … else if A = E then … else...; With end markers this becomes if A = B then … else if A = C then … else if A = D then … else if A = E then … else...; end; end; end; end;

Copyright © 2009 Elsevier LL Parsing The algorithm to build predict sets is tedious (for a "real" sized grammar), but relatively simple It consists of three stages: –(1) compute FIRST sets for symbols –(2) compute FOLLOW sets for non-terminals (this requires computing FIRST sets for some strings) –(3) compute predict sets or table for all productions

Copyright © 2009 Elsevier LL Parsing It is conventional in general discussions of grammars to use –lower case letters near the beginning of the alphabet for terminals –lower case letters near the end of the alphabet for strings of terminals –upper case letters near the beginning of the alphabet for non-terminals –upper case letters near the end of the alphabet for arbitrary symbols –greek letters for arbitrary strings of symbols

Copyright © 2009 Elsevier LL Parsing Algorithm First/Follow/Predict: –FIRST(α) == {a : α →* a β} ∪ (if α =>* ε THEN {ε} ELSE NULL) –FOLLOW(A) == {a : S → + α A a β} ∪ (if S →* α A THEN {ε} ELSE NULL) –Predict (A → X 1... X m ) == (FIRST (X 1... X m ) - {ε}) ∪ (if X 1,..., X m →* ε then FOLLOW (A) ELSE NULL) Details following…

Copyright © 2009 Elsevier LL Parsing

Copyright © 2009 Elsevier LL Parsing

Copyright © 2009 Elsevier LL Parsing If any token belongs to the predict set of more than one production with the same LHS, then the grammar is not LL(1) A conflict can arise because –the same token can begin more than one RHS –it can begin one RHS and can also appear after the LHS in some valid program, and one possible RHS is 

Copyright © 2009 Elsevier LR Parsing LR parsers are almost always table-driven: –like a table-driven LL parser, an LR parser uses a big loop in which it repeatedly inspects a two- dimensional table to find out what action to take –unlike the LL parser, however, the LR driver has non-trivial state (like a DFA), and the table is indexed by current input token and current state –the stack contains a record of what has been seen SO FAR (NOT what is expected)

Copyright © 2009 Elsevier LR Parsing A scanner is a DFA –it can be specified with a state diagram An LL or LR parser is a PDA –Early's & CYK algorithms do NOT use PDAs –a PDA can be specified with a state diagram and a stack the state diagram looks just like a DFA state diagram, except the arcs are labeled with pairs, and in addition to moving to a new state the PDA has the option of pushing or popping a finite number of symbols onto/off the stack

Copyright © 2009 Elsevier LR Parsing An LL(1) PDA has only one state! –well, actually two; it needs a second one to accept with, but that's all (it's pretty simple) –all the arcs are self loops; the only difference between them is the choice of whether to push or pop –the final state is reached by a transition that sees EOF on the input and the stack

Copyright © 2009 Elsevier LR Parsing An SLR/LALR/LR PDA has multiple states –it is a "recognizer," not a "predictor" –it builds a parse tree from the bottom up –the states keep track of which productions we might be in the middle The parsing of the Characteristic Finite State Machine (CFSM) is based on –Shift –Reduce

Copyright © 2009 Elsevier LR Parsing To illustrate LR parsing, consider the grammar (Figure 2.24, Page 73): 1. program→ stmt list $$$ 2. stmt_list → stmt_list stmt 3. | stmt 4. stmt → id := expr 5. | read id 6. | write expr 7. expr → term 8. | expr add op term

Copyright © 2009 Elsevier LR Parsing LR grammar (continued): 9. term → factor 10. | term mult_op factor 11. factor →( expr ) 12. | id 13. | number 14. add op → | mult op → * 17. | /

Copyright © 2009 Elsevier LR Parsing This grammar is SLR(1), a particularly nice class of bottom-up grammar –it isn't exactly what we saw originally –we've eliminated the epsilon production to simplify the presentation For details on the table driven SLR(1) parsing please note the following slides

Copyright © 2009 Elsevier LR Parsing

Copyright © 2009 Elsevier LR Parsing

Copyright © 2009 Elsevier LR Parsing

Copyright © 2009 Elsevier LR Parsing SLR parsing is based on –Shift –Reduce and also –Shift & Reduce (for optimization)