
Parsing G22.2110 Programming Languages May 24, 2012 New York University Chanseok Oh (chanseok@cs.nyu.edu)

Chapter 2 Scanning Parsing

Overview
– Scanner (also called Tokenizer, Lexer, Lexical Analyzer)
  Input: IF ( A >= .30 ) THEN { …
  Output: IF, LPAREN, IDENT(A), GTE, FPN(.30), RPAREN, THEN, …
  Tokens, Lexemes
  DFA, NFA, Regular expressions
  lex, flex, JLex
– Parser
  DPDA, Deterministic context-free grammars
  Yacc, Bison
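A minimal sketch of the scanner step, assuming Python's standard `re` module. The token names (IF, LPAREN, IDENT, GTE, FPN, RPAREN, THEN) come from the slide; the regular expressions and the `tokenize` helper are illustrative assumptions, not what lex or flex would generate.

```python
import re

# Token names follow the slide; the regular expressions are assumptions.
TOKEN_SPEC = [
    ("IF",     r"IF\b"),
    ("THEN",   r"THEN\b"),
    ("FPN",    r"\d*\.\d+"),        # floating-point number such as .30
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("GTE",    r">="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),             # whitespace is matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs, raising on unknown input."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("IF ( A >= .30 ) THEN"))
```

Keywords are listed before IDENT so that IF and THEN are not scanned as identifiers; each pair holds the token first and its lexeme second.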

Table of Contents
– Practical parsers (linear time)
  LL (top-down, predictive)
  LR (bottom-up, shift-reduce)
– Related side topics
  Ambiguity, language and parser hierarchy
– Examples: Simple Calculator Language

A Language
– A set of strings (of given symbols)
  { finite, set, with, five, strings }
  { ab, aaba, abbaba, … }
  { 0^n 1^n }
  { a^i b^j | i < j }
  { void main() { int i = 0 }, … }
– Is an input string in the language?
  cf. Recursive, Turing-decidable languages
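The question "is an input string in the language?" can be made concrete for two of the sets above. This is a small sketch; the function names are hypothetical, and it assumes n >= 0 for { 0^n 1^n }, which the slide does not state.

```python
import re

def in_0n1n(s):
    """True if s is in { 0^n 1^n | n >= 0 } (n >= 0 is an assumption)."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def in_ai_bj(s):
    """True if s is in { a^i b^j | i < j }."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) < len(m.group(2))

print(in_0n1n("0011"), in_0n1n("0101"))
print(in_ai_bj("abb"), in_ai_bj("ab"))
```

Both checks need to compare two counts, which is what places these languages beyond regular expressions alone.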

Context-Free Languages (CFL)
– Languages that can be generated by CFGs
– Languages that can be recognized by PDAs
– Not all languages are context-free.
– CFG: suitable for most PLs.
  := PERIOD
– Deterministic CFL

Example
Here is our CFG:
  S := id A
  A := , id A
  A := ;
Input: sum, a1, ptr ;

Parse Tree (for the input sum, a1, ptr ;)
  S
    id (sum)
    A
      ,
      id (a1)
      A
        ,
        id (ptr)
        A
          ;
Grammar:
  S := id A
  A := , id A
  A := ;

Ambiguous Grammars
– Is a given grammar ambiguous? Undecidable in general.
– No general procedure for converting to unambiguous grammars
– Ambiguity can be tolerated to some extent in deterministic parsing, e.g., by defining precedence or associativity.
  E → E + E
  E → E - E
  E → E * E
  E → E / E
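To see why the grammar E → E + E | E - E | E * E | E / E is ambiguous, the sketch below enumerates every parse tree of a flat token list and evaluates each one. The helper name `all_values` is hypothetical, and operands are plain numbers rather than id tokens.

```python
def all_values(tokens):
    """Values of every parse of tokens under E -> E op E | number."""
    if len(tokens) == 1:
        return [tokens[0]]
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    results = []
    # Choose each operator in turn as the root of the parse tree.
    for i in range(1, len(tokens), 2):
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                results.append(ops[tokens[i]](left, right))
    return results

print(all_values([8, "-", 3, "-", 2]))  # prints [7, 3]
```

The input 8 - 3 - 2 has two parse trees: 8 - (3 - 2) = 7 and (8 - 3) - 2 = 3. Declaring "-" left-associative keeps only the second.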

Parsers
– LL (Left-to-right, Left-most derivation)
  Top-down
  Predictive
  Simple and easy to understand
– LR (Left-to-right, Right-most derivation, traced in reverse)
  Bottom-up
  Shift-reduce
  Most common in production-level compilers
  Variants: SLR (Simple), LALR (Look-ahead)

LL(k) Parser
– LL(k) Parser
  Uses k look-ahead symbols
  Does not backtrack (deterministic).
– LL(1) is the most popular kind of LL parser.
– LL(k) Languages
  Not all CFLs are LL(k) languages.
  [Figure: LL(k) drawn inside CFL]

LL Parsing Example
It is an LL grammar. The language is also LL.
  id_list := id id_list_tail
  id_list_tail := , id id_list_tail
  id_list_tail := ;
Input to parse: sum, a1, ptr ;
[Figure: LL drawn inside CFL]

Parse Tree
[Figure: parse tree for the input sum, a1, ptr ;]
  id_list := id id_list_tail
  id_list_tail := , id id_list_tail
  id_list_tail := ;
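A predictive parser for this grammar can be sketched as one function per nonterminal. The grammar is id_list := id id_list_tail, id_list_tail := , id id_list_tail, id_list_tail := ; (nonterminal names as on the LR example slide). The function `parse_id_list` and its token handling are assumptions made for illustration: every token other than "," and ";" is treated as an id.

```python
def parse_id_list(tokens):
    """Predictive (LL(1)) parse; returns the productions used, in order."""
    pos = 0
    steps = []

    def peek():
        return tokens[pos] if pos < len(tokens) else "$$"

    def match(expected):
        nonlocal pos
        kind = "id" if peek() not in (",", ";", "$$") else peek()
        if kind != expected:
            raise SyntaxError(f"expected {expected}, got {peek()!r}")
        pos += 1

    def id_list():
        steps.append("id_list := id id_list_tail")
        match("id")
        id_list_tail()

    def id_list_tail():
        # One token of look-ahead selects the production.
        if peek() == ",":
            steps.append("id_list_tail := , id id_list_tail")
            match(",")
            match("id")
            id_list_tail()
        elif peek() == ";":
            steps.append("id_list_tail := ;")
            match(";")
        else:
            raise SyntaxError(f"unexpected {peek()!r}")

    id_list()
    if peek() != "$$":
        raise SyntaxError("trailing input")
    return steps

for step in parse_id_list(["sum", ",", "a1", ",", "ptr", ";"]):
    print(step)
```

The printed productions are the steps of the left-most derivation, i.e. the tree is built from the root down.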

LR Parser
– LR(k) parser
  Uses k look-ahead symbols.
  Usually k is 1, and the term "LR parser" often refers to this case.
– LR(k) Languages
  Not all CFLs are LR(k) languages.
  [Figure: LR drawn inside CFL]

Language Relationships
[Figure: diagram of language classes. Top-level regions: unambiguous languages, ambiguous languages. Labeled classes: LR(0), SLR, LALR, LR(1), LL(0), LL(1)]

LR Parsing Example
With the same grammar: it is also an LR grammar, and the language is LR.
  id_list := id id_list_tail
  id_list_tail := , id id_list_tail
  id_list_tail := ;
Input to parse (as before): sum, a1, ptr ;
[Figure: LL drawn inside LR(1), inside CFL]

Parse Tree
[Figure: parse tree for the input sum, a1, ptr ;]
  id_list := id id_list_tail
  id_list_tail := , id id_list_tail
  id_list_tail := ;

Another LR Parsing Example
Consider a modified grammar:
  id_list := id_list_prefix ;
  id_list_prefix := id_list_prefix , id
  id_list_prefix := id
The grammar is not LL (though the language itself is both LR and LL).

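LR Parsing
[Figure: parse tree for the input sum, a1, ptr ;]
  id_list := id_list_prefix ;
  id_list_prefix := id_list_prefix , id
  id_list_prefix := id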
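The shift-reduce idea can be sketched for the left-recursive list grammar X := Y ;, Y := Y , id, Y := id. The nonterminal names id_list (X) and id_list_prefix (Y) used below are assumptions, since the slide text does not preserve them. This is a naive loop that matches handles on top of the stack, not a table-driven LR automaton; it happens to suffice for this small grammar.

```python
# (left-hand side, right-hand side); the longer prefix rule is tried first.
RULES = [
    ("id_list_prefix", ["id_list_prefix", ",", "id"]),
    ("id_list",        ["id_list_prefix", ";"]),
    ("id_list_prefix", ["id"]),
]

def shift_reduce(tokens):
    """Return the trace of shift/reduce actions, or raise SyntaxError."""
    stack, trace = [], []
    for tok in tokens:
        # Shift: every token other than "," and ";" is treated as an id.
        stack.append(tok if tok in (",", ";") else "id")
        trace.append(f"shift {tok}")
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    # Reduce: replace the handle by its left-hand side.
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(f"reduce {lhs} := {' '.join(rhs)}")
                    reduced = True
                    break
    if stack != ["id_list"]:
        raise SyntaxError(f"could not reduce to id_list: {stack}")
    return trace

for action in shift_reduce(["sum", ",", "a1", ",", "ptr", ";"]):
    print(action)
```

Because the grammar is left-recursive, each id is reduced as soon as it is shifted, so the stack never holds more than three symbols. The same left recursion is what makes the grammar unsuitable for LL parsing.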

Simple Calculator Language
  3 + ( 4 * 1 )
  total := 7
  read n
  write ( 10 - ( total + 1 ) / 3 * n )

Simple Arithmetic Expression
  E → E + E | E - E
  E → E * E | E / E
  E → id | number | ( E )

Simple Arithmetic Expression
– LL language, but not an LL grammar (it is an LR grammar, though)
  expr → term | expr add_op term
  term → factor | term mult_op factor
  factor → id | number | ( expr )
  add_op → + | -
  mult_op → * | /
– Two most common obstacles to "LL(1)-ness"
  Left recursion
  Common prefixes
    stmt → id := expr | id ( arg_list )
  [Figure labels: stmt, stmt, stmt_list]

Converting to LL-Grammars
  Common prefix:
    stmt → id := expr | id ( arg_list )
  After left factoring:
    stmt → id stmt_list_tail
    stmt_list_tail → := expr | ( arg_list )
  Right-recursive list:
    stmt_list → stmt stmt_list | ε
  [Figure labels: stmt, stmt, stmt_list]
– Alternatively, you can employ conflict-resolution rules.
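Removing immediate left recursion follows a standard pattern: A → A α | β becomes A → β A', A' → α A' | ε. The sketch below applies it to expr → term | expr add_op term; the function name and the list-of-lists grammar representation are assumptions for illustration.

```python
def remove_left_recursion(nt, productions, tail):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    productions is a list of right-hand sides (lists of symbols);
    tail is the name to use for the new nonterminal A'.
    """
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}
    return {
        nt: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],  # [] is epsilon
    }

print(remove_left_recursion(
    "expr", [["term"], ["expr", "add_op", "term"]], "term_tail"))
```

The result is expr → term term_tail and term_tail → add_op term term_tail | ε, which is the shape an LL(1) version of the calculator grammar needs.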

LL Parsing
– Input program
  read A
  read B
  sum := A + B
  write sum
  write sum / 2

Predict Sets
  Production                                     PREDICT set
  program → stmt_list $$                         {id, read, write, $$}
  stmt_list → stmt stmt_list                     {id, read, write}
            | ε                                  {$$}
  stmt → id := expr                              {id}
       | read id                                 {read}
       | write expr                              {write}
  expr → term term_tail                          {(, id, number}
  term_tail → add_op term term_tail              {+, -}
            | ε                                  {), id, read, write, $$}
  term → factor factor_tail                      {(, id, number}
  factor_tail → mult_op factor factor_tail       {*, /}
              | ε                                {+, -, ), id, read, write, $$}
  factor → ( expr )                              {(}
         | id                                    {id}
         | number                                {number}
  add_op → +                                     {+}
         | -                                     {-}
  mult_op → *                                    {*}
          | /                                    {/}
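The PREDICT sets can be recomputed mechanically from FIRST and FOLLOW. The following is a sketch using the usual fixed-point iteration; the grammar is transcribed from this slide, while the data layout and the function names are assumptions.

```python
EPS = "ε"
GRAMMAR = [  # (lhs, rhs) pairs transcribed from the slide; [] is epsilon
    ("program", ["stmt_list", "$$"]),
    ("stmt_list", ["stmt", "stmt_list"]), ("stmt_list", []),
    ("stmt", ["id", ":=", "expr"]), ("stmt", ["read", "id"]),
    ("stmt", ["write", "expr"]),
    ("expr", ["term", "term_tail"]),
    ("term_tail", ["add_op", "term", "term_tail"]), ("term_tail", []),
    ("term", ["factor", "factor_tail"]),
    ("factor_tail", ["mult_op", "factor", "factor_tail"]), ("factor_tail", []),
    ("factor", ["(", "expr", ")"]), ("factor", ["id"]), ("factor", ["number"]),
    ("add_op", ["+"]), ("add_op", ["-"]),
    ("mult_op", ["*"]), ("mult_op", ["/"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbols, first):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}

def predict_sets():
    """Return (lhs, rhs, PREDICT set) for every production."""
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:  # iterate until neither FIRST nor FOLLOW grows
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of(rhs[i + 1:], first)
                new = rest - {EPS}
                if EPS in rest:  # what follows lhs can also follow sym
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    result = []
    for lhs, rhs in GRAMMAR:
        f = first_of(rhs, first)
        pred = (f - {EPS}) | (follow[lhs] if EPS in f else set())
        result.append((lhs, rhs, pred))
    return result

for lhs, rhs, pred in predict_sets():
    print(lhs, "->", " ".join(rhs) or EPS, sorted(pred))
```

For an ε production the PREDICT set is the FOLLOW set of the left-hand side, which is why term_tail → ε is predicted by ), id, read, write and $$.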

Predict Sets
– Notice the pairwise disjoint sets: {id}, {read}, {write}
– You are to expand stmt.
– Look ahead 1 token (LL(1)).
  stmt → id := expr      {id}
       | read id         {read}
       | write expr      {write}
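The disjointness condition can be checked directly. In this sketch, `is_ll1` is a hypothetical helper that takes, for each nonterminal, the list of PREDICT sets of its productions.

```python
from itertools import combinations

def is_ll1(predict):
    """True if, for every nonterminal, its PREDICT sets are pairwise disjoint."""
    return all(a.isdisjoint(b)
               for sets in predict.values()
               for a, b in combinations(sets, 2))

# stmt -> id := expr | read id | write expr
print(is_ll1({"stmt": [{"id"}, {"read"}, {"write"}]}))
# stmt -> id := expr | id ( arg_list ), a common prefix
print(is_ll1({"stmt": [{"id"}, {"id"}]}))
```

The second call fails the check because both productions start with id, so one token of look-ahead cannot choose between them.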

LR Parsing
– With the same input program,
  read A
  read B
  sum := A + B
  write sum
  write sum / 2

State Transition Diagram
State 0 (initial state)
  program → ● stmt_list $$
  stmt_list → ● stmt_list stmt
  stmt_list → ● stmt
  stmt → ● id := expr
  stmt → ● read id
  stmt → ● write expr
State 1 (from State 0 on read)
  stmt → read ● id
State 1' (from State 1 on id)
  stmt → read id ●
  Reduce (shifting stmt from the viewpoint of State 0)
State 0' (from State 0 on stmt)
  stmt_list → stmt ●
  Reduce (shifting stmt_list)
State 2 (from State 0 on stmt_list)
  program → stmt_list ● $$
  stmt_list → stmt_list ● stmt
  stmt → ● id := expr
  stmt → ● read id
  stmt → ● write expr
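The item sets in these states come from two standard operations, closure and goto. The sketch below covers only the three nonterminals whose items appear on the slide; expr is deliberately left unexpanded, and the tuple representation of items is an assumption.

```python
# Productions of the left-recursive statement grammar shown in the states.
GRAMMAR = {
    "program":   [["stmt_list", "$$"]],
    "stmt_list": [["stmt_list", "stmt"], ["stmt"]],
    "stmt":      [["id", ":=", "expr"], ["read", "id"], ["write", "expr"]],
}

def closure(items):
    """Items are (lhs, rhs tuple, dot position)."""
    result = list(items)
    for lhs, rhs, dot in result:          # the list grows while we iterate
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], tuple(prod), 0)
                if item not in result:
                    result.append(item)
    return result

def goto(items, symbol):
    """Advance the dot over symbol, then take the closure."""
    moved = [(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol]
    return closure(moved)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('●',) + rhs[dot:])}"

state0 = closure([("program", ("stmt_list", "$$"), 0)])
for item in state0:
    print(show(item))
print([show(i) for i in goto(state0, "read")])
```

Here `closure` of the initial item yields the six items of State 0, `goto(state0, "read")` yields State 1, and `goto(state0, "stmt_list")` yields the five items of State 2.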

Shift/Reduce Conflicts
  expr → ● term
  factor → id ● …
Reduce/Reduce Conflicts
  expr → id ●
  factor → id ●
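In LR(0) terms, a state has a reduce/reduce conflict when it holds two complete items, and a shift/reduce conflict when it holds a complete item together with an item whose dot sits before a terminal. The sketch below checks one item set for both; the function name, the item representation, and the second example state are hypothetical.

```python
def lr0_conflicts(items, terminals):
    """items: set of (lhs, rhs tuple, dot). Returns the kinds of conflict present."""
    complete = [it for it in items if it[2] == len(it[1])]
    can_shift = any(d < len(r) and r[d] in terminals for _, r, d in items)
    kinds = set()
    if len(complete) > 1:
        kinds.add("reduce/reduce")
    if complete and can_shift:
        kinds.add("shift/reduce")
    return kinds

TERMINALS = {"id", "*"}
# Two complete items in one state: expr -> id ●  and  factor -> id ●
print(lr0_conflicts({("expr", ("id",), 1), ("factor", ("id",), 1)}, TERMINALS))
# Hypothetical state: expr -> term ●  and  term -> term ● * factor
print(lr0_conflicts({("expr", ("term",), 1),
                     ("term", ("term", "*", "factor"), 1)}, TERMINALS))
```

SLR, LALR(1) and LR(1) parsers resolve many such states by consulting look-ahead symbols before deciding to reduce.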

Resolving Conflicts
LR(0)
– Any LR language has an LR(0) grammar (with $$).
– Not practical: prohibitively large and unintuitive
SLR
– SLR grammar: no shift/reduce or reduce/reduce conflicts when using FOLLOW sets
– FOLLOW sets: also used in LL to generate PREDICT sets
LALR(1)
– LALR(1) grammar (may not be SLR)
– Same states as SLR
– Improvement over SLR with local look-ahead
– LALR parsers are the most common parsers in practice.
LR(1)
– LR(1) grammars (may not be LALR(1) or SLR)
