Lecture 9: Bottom-up parsers
Datalog, CYK parser, Earley parser
Ras Bodik, Ali and Mangpo
Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2013, UC Berkeley

Hidden slides
This slide deck contains hidden slides that may help in studying the material. These slides show up in the exported pdf file but not when you view the ppt file in Slide Show mode.

Today
– Datalog: a special subset of Prolog
– CYK parser: builds the parse bottom-up
– Earley parser: solves CYK's inefficiency

Prolog Parser
A top-down parser: builds the parse tree by descending from the root.

Parser for the full expression grammar

E ::= T | T + E
T ::= F | F * T
F ::= a

e(In,Out) :- t(In,Out).
e(In,Out) :- t(In,[+|R]), e(R,Out).
t(In,Out) :- f(In,Out).
t(In,Out) :- f(In,[*|R]), t(R,Out).
f([a|Out],Out).
parse(S) :- e(S,[]).

?- parse([a,+,a,*,a]).
true

Question: what answers does the query e([a,+,a],Out) return?

Construct also the parse tree

E = T | T + E
T = F | F * T
F = a

e(In,Out,e(T1)) :- t(In,Out,T1).
e(In,Out,e(T1,+,T2)) :- t(In,[+|R],T1), e(R,Out,T2).
t(In,Out,t(T1)) :- f(In,Out,T1).
t(In,Out,t(T1,*,T2)) :- f(In,[*|R],T1), t(R,Out,T2).
f([a|Out],Out,f(a)).
parse(S,T) :- e(S,[],T).

?- parse([a,+,a,*,a],T).
T = e(t(f(a)), +, e(t(f(a), *, t(f(a)))))

Construct also the AST

E = T | T + E
T = F | F * T
F = a

e(In,Out,T1) :- t(In,Out,T1).
e(In,Out,plus(T1,T2)) :- t(In,[+|R],T1), e(R,Out,T2).
t(In,Out,T1) :- f(In,Out,T1).
t(In,Out,times(T1,T2)) :- f(In,[*|R],T1), t(R,Out,T2).
f([a|Out],Out,a).
parse(S,T) :- e(S,[],T).

?- parse([a,+,a,*,a],T).
T = plus(a, times(a, a))
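For comparison, here is a minimal sketch of the same AST-building parser written as a recursive-descent parser in Python (my own translation, not from the slides; the function names are invented). Each function mirrors one Prolog predicate: it takes the remaining token list and returns the AST plus the rest of the input, just like the (In, Out, Tree) arguments above.

# Sketch: the AST parser above, rewritten as recursive descent in Python.
def parse_f(tokens):                      # F = a
    if tokens and tokens[0] == 'a':
        return 'a', tokens[1:]
    raise SyntaxError("expected 'a'")

def parse_t(tokens):                      # T = F | F * T
    left, rest = parse_f(tokens)
    if rest and rest[0] == '*':
        right, rest = parse_t(rest[1:])
        return ('times', left, right), rest
    return left, rest

def parse_e(tokens):                      # E = T | T + E
    left, rest = parse_t(tokens)
    if rest and rest[0] == '+':
        right, rest = parse_e(rest[1:])
        return ('plus', left, right), rest
    return left, rest

def parse(tokens):
    ast, rest = parse_e(tokens)
    if rest:
        raise SyntaxError('trailing input')
    return ast

print(parse(['a', '+', 'a', '*', 'a']))   # ('plus', 'a', ('times', 'a', 'a'))

Unlike Prolog, this version commits to the longer alternative whenever it sees '+' or '*' instead of backtracking; for this grammar the two behaviors agree.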

Datalog (a subset of Prolog, more or less)

Datalog: a well-behaved subset of Prolog
Datalog is a restricted subset of Prolog:
– It disallows compound terms as arguments of predicates: p(1,2) is admissible but not p(f1(1),2). Hence we can't use lists.
– It allows only range-restricted variables: each variable in the head of a rule must also appear in a non-negated clause in the body of that rule. Hence we can compute the values of variables from ground facts.
– It imposes stratification restrictions on the use of negation; these can be satisfied by simply not using negation.
From Wikipedia: query evaluation in Datalog is based on first-order logic, and is thus sound and complete. See The Art of Prolog (Sec 11.3) for why Prolog is not logic.

Why do we care about Datalog?
– Predictable semantics: all Datalog programs terminate (unlike Prolog programs), thanks to the restrictions above, which make the set of all possible proofs finite.
– Efficient evaluation: uses bottom-up evaluation (dynamic programming). Various methods have been proposed to evaluate queries efficiently, e.g. the Magic Sets algorithm. If interested, see more in Wikipedia.
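To make "bottom-up evaluation" concrete before we apply it to parsing, here is a minimal naive fixed-point evaluator in Python for one tiny Datalog program; the program and its edge facts are invented for illustration.

# Naive bottom-up evaluation of a hypothetical Datalog program:
#   path(X,Y) :- edge(X,Y).
#   path(X,Y) :- edge(X,Z), path(Z,Y).
# Start from the ground facts and apply the rules until no new fact
# appears; termination is guaranteed because the universe is finite.
edge = {(1, 2), (2, 3), (3, 4)}           # hypothetical ground facts

path = set(edge)                          # rule 1: path(X,Y) :- edge(X,Y).
while True:
    new = {(x, y) for (x, z) in edge for (z2, y) in path if z2 == z}
    if new <= path:                       # fixed point reached
        break
    path |= new                           # rule 2 deduced new facts

print(sorted(path))   # [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)]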

More reasons why we care about Datalog
We can mechanically derive famous parsers. Mechanically == without thinking too hard. Indeed, the rest of the lecture is about this:
1) CYK parser == Datalog version of the Prolog recursive-descent parser
2) Earley == Magic Set transformation of CYK
There is a bigger cs164 lesson here: restricting your language may give you desirable properties. Just think how much easier your PA1 interpreter would be to implement without having to support recursion (although it would be much less useful without recursion). Luckily, with Datalog, we don't lose anything when it comes to parsing.

CYK parser (can we run a parser in polynomial time?)

Turning our Prolog parser into Datalog
Recursive descent in Prolog, for E ::= a | a + E:

e([a|Out], Out).
e([a,+|R], Out) :- e(R,Out).

Let's check the Datalog rules:
– No negation: check
– Range restricted: check
– No compound terms as arguments: nope (uses lists)

Turning our Prolog parser into Datalog, cont.
Let's refactor the program a little, using the grammar

E --> a | E + E | E * E

Yes, with Datalog, we can use left-recursive grammars!

Datalog parser: e(i,j) is true iff the substring input[i:j] can be derived from the non-terminal E. (input[i:j] is the input from index i to index j-1.)

e(I,I+1) :- input[I]=='a'.
e(I,J) :- e(I,K), input[K]=='+', e(K+1,J).
e(I,J) :- e(I,K), input[K]=='*', e(K+1,J).

A graphical way to visualize the evaluation
Initial graph: the input (terminals). Repeat: add non-terminal edges until no more can be added. An edge is added when adjacent edges form the rhs of a grammar production.
[Figure: CYK graph for the input a + a * a, with terminal edges a1, +2, a3, *4, a5 and non-terminal edges E6 through E11.]

Bottom-up evaluation of the Datalog program
Input: a + a * a
Let's compute which facts we know hold; we'll deduce facts gradually until no more can be deduced.
Step 1: base case (process input segments of length 1):
e(0,1) = e(2,3) = e(4,5) = true
Step 2: inductive case (input segments of length 3):
e(0,3) = true (using rule #2)
e(2,5) = true (using rule #3)
Step 2 again: inductive case (segments of length 5):
e(0,5) = true (using either rule #2 or #3)
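The same bottom-up deduction can be run mechanically; here is a small Python sketch (mine, not the course's implementation) that computes all facts e(i,j) for this input from the three Datalog rules above.

# Fixed-point evaluation of the Datalog parser rules on a + a * a.
tokens = ['a', '+', 'a', '*', 'a']
n = len(tokens)

# Base case: rule e(I,I+1) :- input[I]=='a'.
e = {(i, i + 1) for i in range(n) if tokens[i] == 'a'}

# Inductive case: the '+' and '*' rules, applied until nothing changes.
changed = True
while changed:
    changed = False
    for (i, k) in list(e):
        if k < n and tokens[k] in ('+', '*'):
            for (k1, j) in list(e):
                if k1 == k + 1 and (i, j) not in e:
                    e.add((i, j))
                    changed = True

print(sorted(e))      # the deduced facts, ending with e(0,5)
print((0, n) in e)    # True: the whole input derives E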

Visualize this parser in tabular form

Home exercise: find the bug in this CYK algo
We assume that each rule is of the form A → B C, i.e., two symbols on the rhs.

for i=0..N-1 do
    add (i,i+1,nonterm(input[i])) to graph   -- create nonterminal edges A → d
    enqueue( (i,i+1,nonterm(input[i])) )     -- nonterm() maps d to A
while queue not empty do
    (j,k,B) = dequeue()
    for each edge (i,j,A) do                 -- for each edge "left-adjacent" to (j,k,B)
        if rule T → A B exists then
            if edge e=(i,k,T) does not exist then
                add e to graph; enqueue(e)
    for each edge (k,l,C) do                 -- for each edge "right-adjacent" to (j,k,B)
        ... analogous ...
end while
if edge (0,N,S) does not exist then "syntax error"
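While hunting for the bug, it may help to compare against an independently written worklist CYK; the sketch below is my own Python version (the grammar encoding and the example grammar are assumptions of this sketch, not the course's code).

from collections import deque

# Worklist CYK sketch. Binary rules: {(B, C): {A, ...}} for A -> B C;
# terminal rules: {d: {A, ...}} for A -> d. Both are illustrative only.
binary = {('T', 'V'): {'D'}}              # e.g. D -> T V
term   = {'int': {'T'}, 'id': {'V'}}      # e.g. T -> int, V -> id

def cyk(tokens, start):
    edges = set()                         # facts (i, j, A): A =>* tokens[i:j]
    queue = deque()

    def add(edge):
        if edge not in edges:
            edges.add(edge)
            queue.append(edge)

    for i, tok in enumerate(tokens):      # base case: edges A -> d
        for a in term.get(tok, ()):
            add((i, i + 1, a))

    while queue:
        (j, k, b) = queue.popleft()
        for (i, j2, a) in list(edges):    # left-adjacent: A ends where B starts
            if j2 == j:
                for t in binary.get((a, b), ()):
                    add((i, k, t))
        for (k2, l, c) in list(edges):    # right-adjacent: C starts where B ends
            if k2 == k:
                for t in binary.get((b, c), ()):
                    add((j, l, t))

    return (0, len(tokens), start) in edges

print(cyk(['int', 'id'], 'D'))            # True, since D -> T V

Every pair of adjacent edges is considered when the later of the two is dequeued, so no combination is missed.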

Constructing the parse tree
Nodes in the parse tree correspond to edges in the CYK reduction:
– edge e=(0,N,S) corresponds to the root of the parse tree r
– edges that caused the insertion of e are children of r
It helps to label edges with entire productions:
– not just the LHS symbol of the production
– make symbols unique with subscripts
– such labels make the parse tree explicit

A graphical way to visualize this evaluation
[Figure: the parse tree overlaid on the CYK graph for the input a + a * a, with non-terminal edges E6 through E11.]

Example of CYK execution
Input: int1 id2 ,3 id4 ;5
Reductions performed (subscripts make edge labels unique):
TYPE6 --> int1
VARLIST7 --> id2
VARLIST8 --> id4
VARLIST9 --> VARLIST7 ,3 id4
DECL10 --> TYPE6 VARLIST9 ;5
You should be able to reconstruct the grammar from this parse tree (find the productions in the parse tree).

Grammars, derivations, parse trees
Example grammar:
DECL --> TYPE VARLIST ;
TYPE --> int | float
VARLIST --> id | VARLIST , id
Example string: int id, id ;
Derivation of the string:
DECL --> TYPE VARLIST ; --> int VARLIST ; --> ... --> int id, id ;
[Figure: the corresponding parse tree, with nodes DECL10, TYPE6, VARLIST9, VARLIST7.]

CYK execution
[Figure: the same CYK graph and parse tree as above, showing the reductions TYPE6 --> int1, VARLIST7 --> id2, VARLIST8 --> id4, VARLIST9 --> VARLIST7 ,3 id4, DECL10 --> TYPE6 VARLIST9 ;5.]

Key invariant
Edge (i,j,T) exists iff T -->* input[i:j]:
– T -->* input[i:j] means that the i:j slice of the input can be derived from T in zero or more steps
– T can be either a terminal or a non-terminal
Corollary: the input is in L(G) iff the algorithm creates the edge (0,N,S), where N is the input length.

Constructing the parse tree from the CYK graph
[Figure: the CYK graph labeled with full productions (TYPE6 --> int1, VARLIST7 --> id2, VARLIST8 --> id4, VARLIST9 --> VARLIST7 ,3 id4, DECL10 --> TYPE6 VARLIST9 ;5) next to the parse tree it induces.]

CYK graph to parse tree
Parse tree nodes are obtained from CYK edges: the nodes are grammar productions. Parse tree edges are obtained from reductions (i.e., which rhs produced the lhs).

CYK Parser
Builds the parse bottom-up: given a grammar containing A → B C, when you find adjacent B C in the CYK graph, reduce B C to A.

CYK: the algorithm
CYK is easiest for grammars in Chomsky Normal Form, and it is asymptotically more efficient in this form: O(N^3) time, O(N^2) space.
Chomsky Normal Form allows productions of the forms:
A → BC or A → d or S → ε (only the start non-terminal can derive ε)
Each grammar can be rewritten into this form.

CYK: dynamic programming
Systematically fill in the graph with solutions to subproblems (what are these subproblems?). When complete, the graph contains all possible solutions to all of the subproblems needed to solve the whole problem. This solves the reparsing inefficiency, because subtrees are not reparsed but looked up.

Complexity, implementation tricks
Time complexity: O(N^3); space complexity: O(N^2). Convince yourself this is the case (hint: consider the grammar to be of constant size).
Implementation: the graph implementation may be too slow. Instead, store solutions to subproblems in a 2D array: solutions[i,j] stores a list of the labels of all edges from i to j.
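Here is one way the 2D-array version might look in Python (a sketch under my own encoding, with an invented CNF grammar); it also records back-pointers so the parse tree can be rebuilt, as the earlier slides describe.

# Table-based CYK recognizer with back-pointers (illustrative sketch).
# sol[i][j] maps each nonterminal deriving tokens[i:j] to a back-pointer.
binary = {('A', 'B'): {'S'}, ('P', 'A'): {'B'}}   # hypothetical CNF rules
term   = {'a': {'A'}, '+': {'P'}}

def cyk_table(tokens, start):
    n = len(tokens)
    sol = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):               # length-1 segments
        for a in term.get(tok, ()):
            sol[i][i + 1][a] = tok
    for length in range(2, n + 1):                 # O(N^3) time, O(N^2) space
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # split point
                for b in sol[i][k]:
                    for c in sol[k][j]:
                        for a in binary.get((b, c), ()):
                            sol[i][j].setdefault(a, (k, b, c))
    return sol if start in sol[0][n] else None

def tree(sol, i, j, a):                            # rebuild the parse tree
    bp = sol[i][j][a]
    if isinstance(bp, str):                        # terminal back-pointer
        return (a, bp)
    k, b, c = bp
    return (a, tree(sol, i, k, b), tree(sol, k, j, c))

sol = cyk_table(['a', '+', 'a'], 'S')
print(tree(sol, 0, 3, 'S'))   # ('S', ('A','a'), ('B', ('P','+'), ('A','a')))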

Earley Parser

Inefficiency in CYK
CYK may build useless parse subtrees: useless = not part of the (final) parse tree. This is true even for non-ambiguous grammars.
Example grammar: E ::= E + id | id; input: id+id+id. Can you spot the inefficiency?
This inefficiency is the difference between O(n^3) and O(n^2). It's parsing 100 vs 1000 characters in the same time!

Example
Grammar: E → E + id | id; input: id1 +2 id3 +4 id5.
Three useless reductions are done (E7, E8 and E10):
E6 --> id1
E7 --> id3
E8 --> id5
E9 --> E6 + id3
E10 --> E7 + E8
E11 --> E9 + id5

Earley parser fixes (part of) the inefficiency
Space complexity: Earley and CYK are both O(N^2).
Time complexity: on unambiguous grammars, Earley is O(N^2) while CYK is O(N^3), plus there is the constant-factor improvement due to avoiding the inefficiency above.
Why learn about Earley?
– the idea of Earley states is used by the faster parsers, like LALR, so you learn the key idea behind those modern parsers
– you will implement it in PA4
– in HW4 (required), you will optimize an inefficient version of Earley

Key idea
Process the input left-to-right, as opposed to arbitrarily, as in CYK. Reduce only productions that appear non-useless: consider only reductions with a chance to be in the parse tree. Decide whether to reduce based on the input seen so far; after seeing more, we may still realize we built a useless tree.
The algorithm: propagate a "context" of the parsing process. The context tells us which non-terminals can appear in the parse at the given point of the input. Those that cannot won't be reduced.

Key idea: suppress useless reductions
Grammar: E → E + id | id
[Figure: the input id1 +2 id3 +4 id5, on which the useless reductions from the previous example will be suppressed.]

The intuition
Use CYK edges (aka reductions), plus more edges. Idea: we ask "What CYK edges can possibly start in node 0?"
1) those reducing to the start non-terminal
2) those that may produce non-terminals needed by (1)
3) those that may produce non-terminals needed by (2), etc.
Grammar: E --> T + id | id; T --> E
[Figure: the input id1 +2 id3 +4 id5 with the edges E --> id, E --> T0 + id, T0 --> E starting in node 0.]

Prediction
Prediction (def): determining which productions apply at the current point of the input. It is performed top-down through the grammar, by examining all possible derivation sequences. This tells us which non-terminals we can use in the tree (starting at the current point of the string). We will do prediction not only at the beginning of parsing but at each parsing step. A small sketch of this closure computation follows below.
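As a concrete illustration (my own sketch, using an assumed grammar encoding), prediction at a point is just a closure over the non-terminals that can start there:

# Prediction as a closure over "what can start here" (illustrative sketch).
grammar = {'E': [('T', '+', 'id'), ('id',)],   # E --> T + id | id
           'T': [('E',)]}                      # T --> E

def predict(symbol):
    predicted = set()
    worklist = [symbol]
    while worklist:
        a = worklist.pop()
        if a in predicted or a not in grammar:
            continue                     # already predicted, or a terminal
        predicted.add(a)
        for rhs in grammar[a]:
            worklist.append(rhs[0])      # only the first symbol can start here
    return predicted

print(predict('E'))   # {'E', 'T'}: exactly the dotted items E --> . T + id,
                      # E --> . id, T --> . E predicted in Example (1) below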

Example (1)
Grammar: E --> T + id | id; T --> E
Initial predicted edges, at node 0:
E --> . T + id
E --> . id
T --> . E
[Figure: the three predicted edges at node 0 of the input id1 +2 id3 +4 id5.]

Example (1.1)
Let's compress the visual representation: these three edges become a single edge with three labels: E --> . T + id, E --> . id, T --> . E.

Example (2)
We add a complete edge E --> id., which leads to another complete edge T --> E., and that in turn leads to an in-progress edge E --> T . + id.

Example (3)
We advance the in-progress edge, the only edge we can add at this point, obtaining E --> T + . id.

Example (4)
Again, we advance the in-progress edge. But now we have created a complete edge: E --> T + id.

Example (5)
The complete edge leads to reductions to another complete edge, exactly as in CYK: T --> E.

Example (6)
We also advance the predicted edge, creating a new in-progress edge: E --> T . + id.

Example (7)
We advance that edge across the second '+', creating another in-progress edge: E --> T + . id.

Example (8)
We advance again, creating a complete edge E --> T + id., which leads to another complete edge and an in-progress edge, as before. Done.

Example (a note)
Compare with CYK: we avoided creating these six CYK edges:
E --> id, T --> E, E --> id, T --> E, E --> T + id, T --> E

Generalize CYK edges: three kinds of edges
Productions are extended with a dot '.'; the dot indicates the position in the input (how much of the rule we have seen).
Completed: A --> B C .
We found an input substring that reduces to A. These are the original CYK edges.
Predicted: A --> . B C
We are looking for a substring that reduces to A (i.e., we are allowed to reduce to A), but we have seen nothing of B C yet.
In-progress: A --> B . C
Like the completed case, but we have so far seen only the substring that reduces to B.

Earley Algorithm
Three main functions do all the work. For all terminals in the input, left to right:
– Scanner: moves the dot across a terminal found next on the input
Repeat until no more edges can be added:
– Predict: adds predictions into the graph
– Complete: moves the dot to the right across a non-terminal when that non-terminal is found
A compact sketch of the whole loop follows below.
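Putting the three functions together, here is a compact Earley recognizer sketch in Python (my own version, not the course's reference implementation; it assumes the grammar has no epsilon-productions, which sidesteps a known corner case of Earley completion).

# Earley recognizer sketch. An item (lhs, rhs, dot, origin) is a dotted
# production that started at input position `origin`.
grammar = {'E': [('T', '+', 'id'), ('id',)],   # E --> T + id | id
           'T': [('E',)]}                      # T --> E
START = 'E'

def earley(tokens):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[START]:
        chart[0].add((START, rhs, 0, 0))       # predict the start symbol
    for pos in range(len(tokens) + 1):
        worklist = list(chart[pos])
        while worklist:
            (lhs, rhs, dot, origin) = worklist.pop()
            if dot < len(rhs) and rhs[dot] in grammar:      # PREDICT
                for prod in grammar[rhs[dot]]:
                    new = (rhs[dot], prod, 0, pos)
                    if new not in chart[pos]:
                        chart[pos].add(new)
                        worklist.append(new)
            elif dot < len(rhs):                            # SCAN a terminal
                if pos < len(tokens) and tokens[pos] == rhs[dot]:
                    chart[pos + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # COMPLETE
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[pos]:
                            chart[pos].add(new)
                            worklist.append(new)
    return any(lhs == START and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[len(tokens)])

print(earley(['id', '+', 'id', '+', 'id']))   # True

Predicted, in-progress, and completed items correspond exactly to the three kinds of edges on the previous slide.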

HW4
You'll get a clean implementation of Earley in Python. It will visualize the parse, but it will be very slow. Your goal will be to optimize its data structures, and change the grammar a little, to make the parser run in linear time.