Parsing G22.2110 Programming Languages May 24, 2012 New York University Chanseok Oh


Chapter 2 Scanning Parsing

Overview
– Scanner (also called tokenizer, lexer, or lexical analyzer)
    Input: IF ( A >=.30 ) THEN { …
    Tokens: IF, LPAREN, IDENT(A), GTE, FPN(.30), RPAREN, THEN, …
    Tokens, lexemes
    DFA, NFA, regular expressions
    lex, flex, JLex
– Parser
    DPDA, deterministic context-free grammars
    Yacc, Bison
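The scanner step on this slide can be sketched with Python's standard `re` module. This is only a rough illustration: the token names follow the slide (with the parenthesis tokens spelled LPAREN and RPAREN), while the regular expressions and the `tokenize` helper are assumptions made for the example.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("FPN",    r"\d*\.\d+"),       # floating-point number such as .30
    ("IF",     r"IF\b"),
    ("THEN",   r"THEN\b"),
    ("IDENT",  r"[A-Za-z_]\w*"),   # tried after the keywords
    ("GTE",    r">="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),            # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Turn a character string into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind != "SKIP":
            # Identifiers and numbers carry their lexeme along with the token name.
            tokens.append(f"{kind}({match.group()})" if kind in ("IDENT", "FPN") else kind)
        pos = match.end()
    return tokens

print(tokenize("IF ( A >=.30 ) THEN"))
# ['IF', 'LPAREN', 'IDENT(A)', 'GTE', 'FPN(.30)', 'RPAREN', 'THEN']
```

Tools such as lex and flex generate a DFA from patterns like these; the alternation above only imitates that behavior by trying the patterns in order.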

Table of Contents
– Practical parsers (linear time)
    LL (top-down, predictive)
    LR (bottom-up, shift-reduce)
– Related side topics
    Ambiguity, language and parser hierarchy
– Examples: Simple Calculator Language

A Language
– A set of strings (of given symbols)
    { finite, set, with, five, strings }
    { ab, aaba, abbaba, … }
    { 0^n 1^n }
    { a^i b^j | i < j }
    { void main() { int i = 0 }, … }
– Is an input string in the language?
    cf. Recursive, Turing-decidable languages
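The membership question can be made concrete for two of the example languages. The sketch below assumes n ≥ 0 for { 0^n 1^n } (the slide does not state a lower bound), and the function names are made up for illustration.

```python
import re

def in_0n1n(s):
    """Membership in { 0^n 1^n }, assuming n >= 0."""
    n = len(s) // 2
    return s == "0" * n + "1" * n

def in_aibj(s):
    """Membership in { a^i b^j | i < j }."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) < len(m.group(2))

print(in_0n1n("000111"), in_0n1n("0011101"))  # True False
print(in_aibj("abb"), in_aibj("aab"))         # True False
```

Both checks need to compare counts, which is what puts these languages beyond regular expressions alone and calls for a context-free description.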

Context-Free Languages (CFL)
– Languages that can be generated by CFGs
– Languages that can be recognized by PDAs
– Not all languages are context-free.
– CFG: suitable for most PLs.
    := PERIOD
– Deterministic CFL

Example
Here is our CFG:
    S := id A
    A := , id A
    A := ;
Input: sum, a1, ptr ;

Parse Tree
    S
      id (sum)
      A
        ,
        id (a1)
        A
          ,
          id (ptr)
          A
            ;
Grammar: S := id A, A := , id A, A := ;

Ambiguous Grammars
– Is a given grammar ambiguous? Undecidable in general.
– No general procedure for converting to unambiguous grammars
– Ambiguity can be allowed to some extent for deterministic parsing, e.g., by defining precedence or associativity.
    E → E + E
    E → E – E
    E → E * E
    E → E / E
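To see the ambiguity concretely, the sketch below counts the parse trees of a token string under E → E op E, with an added base case E → number that is assumed here so that operands can be written. The function name is hypothetical.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}

def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E op E | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 0 if tokens[lo] in OPERATORS else 1
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:  # tokens[k] is the root operator
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("10 - 4 - 3".split()))  # 2
```

The two trees correspond to (10 - 4) - 3 = 3 and 10 - (4 - 3) = 9. Declaring the operator left-associative selects the first tree, which is the kind of rule the slide refers to.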

Parsers
– LL (Left-to-right, Leftmost derivation)
    Top-down
    Predictive
    Simple and easy to understand
– LR (Left-to-right, Rightmost derivation)
    Bottom-up
    Shift-reduce
    Most common in production-level use
    Variants: SLR (Simple), LALR (Look-ahead)

LL(k) Parser
– LL(k) parser
    Uses k look-ahead symbols
    Does not backtrack (deterministic)
– LL(1) is the most popular kind of LL parser.
– LL(k) languages
    Not all CFLs are LL(k) languages.
(Diagram labels: CFL, LL(k))

LL Parsing Example
    id_list := id id_list_tail
    id_list_tail := , id id_list_tail
    id_list_tail := ;
It is an LL grammar. The language is also LL.
Input to parse: sum, a1, ptr ;
(Diagram labels: CFL, LL)

Parse Tree
    id_list
      id (sum)
      id_list_tail
        ,
        id (a1)
        id_list_tail
          ,
          id (ptr)
          id_list_tail
            ;
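A top-down, predictive parse of this input can be sketched as a recursive-descent recognizer with one function per nonterminal (id_list and id_list_tail). The helper names, the use of "$$" as an end-of-input marker, and the use of Python's `str.isidentifier` to recognize an id are assumptions made for the sketch.

```python
def parse_id_list(tokens):
    """Recognize an id_list and return the identifiers it contains."""
    pos = 0
    names = []

    def peek():
        # "$$" stands for end of input (a convention assumed here).
        return tokens[pos] if pos < len(tokens) else "$$"

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def match_id():
        nonlocal pos
        if not peek().isidentifier():
            raise SyntaxError(f"expected id, found {peek()!r}")
        names.append(tokens[pos])
        pos += 1

    def id_list():               # id_list -> id id_list_tail
        match_id()
        id_list_tail()

    def id_list_tail():
        if peek() == ",":        # id_list_tail -> , id id_list_tail
            match(",")
            match_id()
            id_list_tail()
        elif peek() == ";":      # id_list_tail -> ;
            match(";")
        else:
            raise SyntaxError(f"expected ',' or ';', found {peek()!r}")

    id_list()
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return names

print(parse_id_list(["sum", ",", "a1", ",", "ptr", ";"]))  # ['sum', 'a1', 'ptr']
```

Each call to id_list_tail decides which production to use by looking at a single token, which is what makes the grammar LL(1).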

LR Parser
– LR(k) parser
    Uses k look-ahead symbols.
    Usually k is 1, and the term "LR parser" often refers to this case.
– LR(k) languages
    Not all CFLs are LR(k) languages.
(Diagram labels: CFL, LR)

Language Relationships
(Diagram of nested classes. Region labels: Unambiguous languages, Ambiguous languages, LR(0), SLR, LALR, LR(1), LL(0), LL(1))

LR Parsing Example
With the same grammar:
    id_list → id id_list_tail
    id_list_tail → , id id_list_tail
    id_list_tail → ;
It is also an LR grammar, and the language is LR.
Input to parse (as before): sum, a1, ptr ;
(Diagram labels: CFL, LR(1), LL)

Parse Tree
    id_list
      id (sum)
      id_list_tail
        ,
        id (a1)
        id_list_tail
          ,
          id (ptr)
          id_list_tail
            ;

Another LR Parsing Example
Consider a modified grammar:
    id_list := id_list_prefix ;
    id_list_prefix := id_list_prefix , id
    id_list_prefix := id
The grammar is not LL (though the language itself is both LR and LL).

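LR Parsing
    id_list
      id_list_prefix
        id_list_prefix
          id_list_prefix
            id (sum)
          ,
          id (a1)
        ,
        id (ptr)
      ;
Grammar: id_list := id_list_prefix ;  id_list_prefix := id_list_prefix , id  id_list_prefix := id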
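The bottom-up construction for the modified, left-recursive grammar can be sketched as a small shift-reduce loop. This is a simplification rather than a table-driven LR parser: handles are found by comparing the top of the stack with right-hand sides. The nonterminal name id_list_prefix and the function name are assumptions.

```python
def shift_reduce(tokens):
    """Parse an identifier list bottom-up and return the sequence of actions."""
    stack, trace = [], []
    rest = list(tokens)
    while True:
        if stack[-3:] == ["id_list_prefix", ",", "id"]:
            stack[-3:] = ["id_list_prefix"]
            trace.append("reduce id_list_prefix -> id_list_prefix , id")
        elif stack == ["id"]:
            stack[:] = ["id_list_prefix"]
            trace.append("reduce id_list_prefix -> id")
        elif stack == ["id_list_prefix", ";"]:
            stack[:] = ["id_list"]
            trace.append("reduce id_list -> id_list_prefix ;")
        elif rest:
            token = rest.pop(0)
            # Anything that is not punctuation is treated as an id token.
            stack.append(token if token in {",", ";"} else "id")
            trace.append(f"shift {token}")
        else:
            break
    if stack != ["id_list"]:
        raise SyntaxError(f"parse failed with stack {stack}")
    return trace

for action in shift_reduce(["sum", ",", "a1", ",", "ptr", ";"]):
    print(action)
```

The reductions come out in the order id_list_prefix -> id, then two uses of id_list_prefix -> id_list_prefix , id, then id_list -> id_list_prefix ; which is a rightmost derivation read backwards. Note that the stack never grows beyond three symbols, because each identifier is reduced as soon as it is seen.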

Simple Calculator Language
    3 + ( 4 * 1 )
    total := 7
    read n
    write ( 10 – ( total + 1 ) / 3 * n )

Simple Arithmetic Expression
    E → E + E | E – E
    E → E * E | E / E
    E → id | number | ( E )

Simple Arithmetic Expression
– LL language, but not an LL grammar (yet an LR one)
    expr → term | expr add_op term
    term → factor | term mult_op factor
    factor → id | number | ( expr )
    add_op → + | -
    mult_op → * | /
– Two most common obstacles to "LL(1)-ness"
    Left recursion
        stmt → stmt stmt_list
    Common prefixes
        stmt → id := expr | id ( arg_list )

Converting to LL-Grammars
    Original rules:
        stmt → stmt stmt_list
        stmt → id := expr | id ( arg_list )
    Converted rules:
        stmt_list → stmt stmt_list | є
        stmt → id stmt_list_tail
        stmt_list_tail → := expr | ( arg_list )
– Alternatively, you can employ conflict-resolution rules.

Converted LL(1) Grammar
    expr → term term_tail
    term_tail → add_op term term_tail | є
    term → factor factor_tail
    factor_tail → mult_op factor factor_tail | є
    factor → ( expr ) | id | number
    add_op → + | -
    mult_op → * | /
Not every CFG can be converted to an LL grammar. Why?
(Diagram labels: CFL, LL)
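The step from the left-recursive expression rules to the term_tail and factor_tail rules follows the standard transformation for immediate left recursion: A → A α | β becomes A → β A', A' → α A' | є. A minimal sketch, assuming a grammar is represented as lists of symbol lists with the empty list standing for є (the function name and representation are made up for illustration):

```python
def eliminate_left_recursion(nonterminal, alternatives, tail=None):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    tail = tail or f"{nonterminal}_tail"
    # Alternatives of the form A alpha, with the leading A stripped off.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],  # [] is epsilon
    }

print(eliminate_left_recursion(
    "expr", [["term"], ["expr", "add_op", "term"]], tail="term_tail"))
# {'expr': [['term', 'term_tail']], 'term_tail': [['add_op', 'term', 'term_tail'], []]}
```

Applying the same function to term → factor | term mult_op factor with tail="factor_tail" gives the term and factor_tail rules. The sketch handles only immediate left recursion; indirect left recursion and common prefixes need separate treatment.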

LL(1) for Simple Calculator Language
    program → stmt_list $$
    stmt_list → stmt stmt_list | є
    stmt → id := expr | read id | write expr
    expr → term term_tail
    term_tail → add_op term term_tail | є
    term → factor factor_tail
    factor_tail → mult_op factor factor_tail | є
    factor → ( expr ) | id | number
    add_op → + | -
    mult_op → * | /
Added three more production rules to the previous LL(1) grammar for expressions.

LL Parsing
– Input program
    read A
    read B
    sum := A + B
    write sum
    write sum / 2

Predict Sets
    program     → stmt_list $$                      {id, read, write, $$}
    stmt_list   → stmt stmt_list                    {id, read, write}
                | є                                 {$$}
    stmt        → id := expr                        {id}
                | read id                           {read}
                | write expr                        {write}
    expr        → term term_tail                    {(, id, number}
    term_tail   → add_op term term_tail             {+, -}
                | є                                 {), id, read, write, $$}
    term        → factor factor_tail                {(, id, number}
    factor_tail → mult_op factor factor_tail        {*, /}
                | є                                 {+, -, ), id, read, write, $$}
    factor      → ( expr )                          {(}
                | id                                {id}
                | number                            {number}
    add_op      → +                                 {+}
                | -                                 {-}
    mult_op     → *                                 {*}
                | /                                 {/}
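The predict sets can be reproduced mechanically from FIRST and FOLLOW sets. The sketch below uses the usual fixed-point formulation: PREDICT(A → α) is FIRST(α), plus FOLLOW(A) when α can derive є. The grammar encoding (empty list for є) and all function names are assumptions.

```python
GRAMMAR = [
    ("program", ["stmt_list", "$$"]),
    ("stmt_list", ["stmt", "stmt_list"]), ("stmt_list", []),
    ("stmt", ["id", ":=", "expr"]), ("stmt", ["read", "id"]), ("stmt", ["write", "expr"]),
    ("expr", ["term", "term_tail"]),
    ("term_tail", ["add_op", "term", "term_tail"]), ("term_tail", []),
    ("term", ["factor", "factor_tail"]),
    ("factor_tail", ["mult_op", "factor", "factor_tail"]), ("factor_tail", []),
    ("factor", ["(", "expr", ")"]), ("factor", ["id"]), ("factor", ["number"]),
    ("add_op", ["+"]), ("add_op", ["-"]),
    ("mult_op", ["*"]), ("mult_op", ["/"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbols, first, nullable):
    """Return FIRST of a symbol string and whether the string can derive epsilon."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:      # a terminal ends the scan
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True

def predict_sets():
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:                       # iterate until nothing grows
        changed = False
        for lhs, rhs in GRAMMAR:
            f, eps = first_of(rhs, first, nullable)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if eps and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    f, eps = first_of(rhs[i + 1:], first, nullable)
                    new = f | (follow[lhs] if eps else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    table = []
    for lhs, rhs in GRAMMAR:
        f, eps = first_of(rhs, first, nullable)
        table.append((lhs, rhs, f | (follow[lhs] if eps else set())))
    return table

for lhs, rhs, predict in predict_sets():
    print(lhs, "->", " ".join(rhs) or "є", sorted(predict))
```

A grammar is LL(1) when, for every nonterminal, the predict sets of its alternatives are pairwise disjoint, which can be checked directly on the returned table.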

Predict Sets
– Notice the pairwise disjoint sets: {id}, {read}, {write}
– Suppose the parser has to expand stmt.
– Look ahead 1 token (LL(1)).
    stmt → id := expr     {id}
    stmt → read id        {read}
    stmt → write expr     {write}

LL(1)
    program → stmt_list $$
    stmt_list → stmt stmt_list | є
    stmt → id := expr | read id | write expr
    expr → term term_tail
    term_tail → add_op term term_tail | є
    term → factor factor_tail
    factor_tail → mult_op factor factor_tail | є
    factor → ( expr ) | id | number
    add_op → + | -
    mult_op → * | /
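With this grammar, a predictive parser can be written by hand as one method per nonterminal, each choosing a production from a single look-ahead token. The sketch below is a recognizer only (it builds no tree). The whitespace-based token classification and the class and method names are assumptions for the example.

```python
class Parser:
    """Recursive-descent recognizer for the LL(1) calculator grammar."""

    def __init__(self, text):
        # Assumption: tokens are separated by whitespace.
        self.tokens = [self.classify(word) for word in text.split()] + ["$$"]
        self.pos = 0

    @staticmethod
    def classify(word):
        if word in {"read", "write", ":=", "+", "-", "*", "/", "(", ")"}:
            return word
        return "number" if word.isdigit() else "id"

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected}, found {self.peek()}")
        self.pos += 1

    def parse(self):                      # program -> stmt_list $$
        self.stmt_list()
        self.match("$$")
        return True

    def stmt_list(self):
        if self.peek() in {"id", "read", "write"}:
            self.stmt()                   # stmt_list -> stmt stmt_list
            self.stmt_list()
        elif self.peek() != "$$":         # stmt_list -> є is predicted only by $$
            raise SyntaxError(f"unexpected {self.peek()}")

    def stmt(self):
        token = self.peek()
        if token == "id":
            self.match("id"); self.match(":="); self.expr()
        elif token == "read":
            self.match("read"); self.match("id")
        elif token == "write":
            self.match("write"); self.expr()
        else:
            raise SyntaxError(f"unexpected {token}")

    def expr(self):                       # expr -> term term_tail
        self.term(); self.term_tail()

    def term_tail(self):
        if self.peek() in {"+", "-"}:
            self.match(self.peek()); self.term(); self.term_tail()
        # otherwise є: a wrong token is reported by the caller

    def term(self):                       # term -> factor factor_tail
        self.factor(); self.factor_tail()

    def factor_tail(self):
        if self.peek() in {"*", "/"}:
            self.match(self.peek()); self.factor(); self.factor_tail()

    def factor(self):
        token = self.peek()
        if token == "(":
            self.match("("); self.expr(); self.match(")")
        elif token in {"id", "number"}:
            self.match(token)
        else:
            raise SyntaxError(f"unexpected {token}")

print(Parser("read A read B sum := A + B write sum write sum / 2").parse())  # True
```

The є alternatives of term_tail and factor_tail simply return without checking the look-ahead; this defers error detection slightly but accepts the same inputs.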

Better grammar: LR(1)
– More intuitive than LL
    However, not exactly the same language (no empty string)
– Left recursion is advantageous.
    program → stmt_list $$
    stmt_list → stmt_list stmt | stmt
    stmt → id := expr | read id | write expr
    expr → term | expr add_op term
    term → factor | term mult_op factor
    factor → id | number | ( expr )
    add_op → + | -
    mult_op → * | /

LR Parsing
– With the same input program,
    read A
    read B
    sum := A + B
    write sum
    write sum / 2

State Transition Diagram
    State 0 (initial state)
        program → ● stmt_list $$
        stmt_list → ● stmt_list stmt
        stmt_list → ● stmt
        stmt → ● id := expr
        stmt → ● read id
        stmt → ● write expr
    State 1 (from State 0 on read)
        stmt → read ● id
    State 1’ (from State 1 on id)
        stmt → read id ●
        Reduce (shifting stmt from a viewpoint of State 0)
    State 0’ (from State 0 on stmt)
        stmt_list → stmt ●
        Reduce (shifting stmt_list)
    State 2 (from State 0 on stmt_list)
        program → stmt_list ● $$
        stmt_list → stmt_list ● stmt
        stmt → ● id := expr
        stmt → ● read id
        stmt → ● write expr
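The item sets in these states can be computed with the standard LR(0) closure and goto operations. The sketch below includes only the statement-level productions of the LR grammar, so expr is never expanded; that is enough for the states shown here, none of which has the dot in front of expr. The item representation and function names are assumptions.

```python
# Statement-level productions only (expr is left unexpanded in this sketch).
GRAMMAR = {
    "program": [["stmt_list", "$$"]],
    "stmt_list": [["stmt_list", "stmt"], ["stmt"]],
    "stmt": [["id", ":=", "expr"], ["read", "id"], ["write", "expr"]],
}

def closure(items):
    """Close a set of items (lhs, rhs, dot) under 'dot before a nonterminal'."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], tuple(alternative), 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result

def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('●',) + rhs[dot:])}"

state0 = closure({("program", ("stmt_list", "$$"), 0)})
state2 = goto(state0, "stmt_list")
for item in sorted(state2):
    print(show(item))
```

Here goto(state0, "read") gives State 1, goto(state0, "stmt") gives State 0’, and goto(state0, "stmt_list") gives State 2.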

Shift/Reduce Conflicts
    expr → ● term
    factor → id ● …
Reduce/Reduce Conflicts
    expr → id ●
    factor → id ●

Resolving Conflicts
LR(0)
– Any LR language has an LR(0) grammar (with $$).
– Not practical: prohibitively large and unintuitive
SLR
– SLR grammar: no shift/reduce or reduce/reduce conflicts when using FOLLOW sets
– FOLLOW sets: also used in LL to generate PREDICT sets
LALR(1)
– LALR(1) grammar (may not be SLR)
– Same states as SLR
– Improvement over SLR with local look-ahead
– LALR parsers are the most common parsers in practice.
LR(1)
– LR(1) grammars (may not be LALR(1) or SLR)