Prof. Bodik, CS 164, Lecture 5: Building a Parser I. CS164, 3:30-5:00 TT, 10 Evans.



PA2

In PA2, you’ll work in pairs, no exceptions
– except the exception if odd # of students

Hate team projects? Form a “coalition team”
– team members work alone, but
    discuss design, clarify the handout, keep a common eye on the newsgroup, etc.
    share some or all code, at the very least their test cases!
– a win-win proposition: work mainly alone but hedge your grade
    each member submits his/her project, graded separately
    score: the lower-scoring team member gets a bonus equal to half the difference between his and his partner’s score

Administrivia

Section room change
– 3113 Etcheverry moving next door, to 3111 Etch.
– starting 9/22.

Overview

What does a parser do, again?
– its two tasks
– parse tree vs. AST

A hand-written parser
– and why it gets hard to get it right

What does a parser do?

Recall: The Structure of a Compiler

Decaf program (stream of characters)
  -> scanner  -> stream of tokens
  -> parser   -> Abstract Syntax Tree (AST)
  -> checker  -> AST with annotations (types, declarations)
  -> code gen -> maybe x86

Recall: Syntactic Analysis

Input: sequence of tokens from scanner
Output: abstract syntax tree

Actually,
– parser first builds a parse tree
– AST is then built by translating the parse tree
– parse tree rarely built explicitly; only determined by, say, how parser pushes stuff to stack
– our lectures first focus on constructing the parse tree; later we’ll show the translation to AST.

Example

Decaf: 4*(2+3)
Parser input: NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR
Parser output (AST):

        *
       / \
  NUM(4)  +
         / \
    NUM(2)  NUM(3)
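To make the AST for 4*(2+3) concrete, here is a minimal sketch in C. The names `Node`, `make_num`, `make_op`, `eval`, and `example_ast` are hypothetical and are not part of the lecture or of Decaf; the point is only that the tree shape alone carries the nesting, so no LPAR/RPAR nodes are needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical AST node: either a number leaf or a binary operator. */
typedef enum { NODE_NUM, NODE_PLUS, NODE_TIMES } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* meaningful only when kind == NODE_NUM */
    struct Node *left, *right; /* meaningful only for operator nodes */
} Node;

Node *make_num(int value) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = NODE_NUM;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

Node *make_op(NodeKind kind, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Evaluate bottom-up; grouping comes from the tree shape, not from tokens. */
int eval(const Node *n) {
    switch (n->kind) {
    case NODE_NUM:   return n->value;
    case NODE_PLUS:  return eval(n->left) + eval(n->right);
    case NODE_TIMES: return eval(n->left) * eval(n->right);
    }
    return 0; /* not reached for well-formed trees */
}

/* Builds the AST of the example: TIMES at the root, PLUS as its right child. */
Node *example_ast(void) {
    return make_op(NODE_TIMES,
                   make_num(4),
                   make_op(NODE_PLUS, make_num(2), make_num(3)));
}
```

Calling `eval(example_ast())` walks the three leaves and two operator nodes; the sketch never frees the nodes, which is acceptable only for a short demonstration.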

Parse tree for the example

[Figure: parse tree. Leaves are tokens: NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR. Interior nodes: EXPR.]

Another example

Decaf: if (x == y) { a=1; }
Parser input: IF LPAR ID EQ ID RPAR LBR ID AS INT SEMI RBR
Parser output (AST):

      IF-THEN
      /     \
    ==       =
   /  \     /  \
  ID   ID  ID   INT

Parse tree for the example

[Figure: parse tree. Leaves are tokens: IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR. Interior nodes: EXPR, EXPR, STMT, BLOCK, STMT.]

Parse tree vs. abstract syntax tree

Parse tree
– contains all tokens, including those that parser needs “only” to discover
    intended nesting: parentheses, curly braces
    statement termination: semicolons
– technically, parse tree shows concrete syntax

Abstract syntax tree (AST)
– abstracts away artifacts of parsing, by flattening tree hierarchies, dropping tokens, etc.
– technically, AST shows abstract syntax

Comparison with Lexical Analysis

Phase    Input                     Output
Lexer    Sequence of characters    Sequence of tokens
Parser   Sequence of tokens        AST, built from parse tree

Summary

Parser performs two tasks:
– syntax checking
    a program with a syntax error may produce an AST that’s different from what the programmer intended
– parse tree construction
    usually implicit
    used to build the AST

How to build a parser for Decaf?

Writing the parser

Can do it all by hand, of course
– ok for small languages, but hard for Decaf

Just like with the scanner, we’ll write ourselves a parser generator
– we’ll concisely describe Decaf’s syntactic structure
    that is, what expressions, statements, definitions look like
– and the generator produces a working parser

Let’s start with a hand-written parser
– to see why we want a parser generator

First example: balanced parens

Our problem: check the syntax
– are parentheses in input string balanced?

The simple language
– parenthesized number literals
– Ex.: 3, (4), ((1)), (((2))), etc.

Before we look at the parser
– why aren’t finite automata sufficient for this task?

Why can’t DFA/NFA’s find syntax errors?

When checking balanced parentheses, FA’s can either
– accept all correct (i.e., balanced) programs but also some incorrect ones, or
– reject all incorrect programs but also reject some correct ones.

Problem: finite state
– can’t count parens seen so far
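The counting argument can be illustrated with a short C sketch. It assumes, purely for illustration, that the input arrives as a character string rather than as tokens, and the function name `parens_balanced` is hypothetical. The checker needs one integer that can grow with the nesting depth, which is exactly what a machine with a fixed number of states lacks. Note that it checks only balance, not the full "parenthesized number" language.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every ')' closes an earlier '(' and nothing is left open. */
int parens_balanced(const char *s) {
    assert(s != NULL);
    long depth = 0; /* number of currently open parentheses */
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '(') {
            depth++;
        } else if (s[i] == ')') {
            if (depth == 0) return 0; /* close without a matching open */
            depth--;
        }
        /* any other character (e.g. a digit) does not affect the count */
    }
    return depth == 0;
}
```

For any fixed bound on `depth` one could build a finite automaton with that many states, but no single automaton handles every depth.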

Parser code preliminaries

Let TOKEN be an enumeration type of tokens:
– INT, OPEN, CLOSE, PLUS, TIMES, NUM, LPAR, RPAR

Let the global in[] be the input string of tokens
Let the global next be an index in the token string

Parsers use a stack to implement infinite state

Balanced parentheses parser:

    void Parse() {
        nextToken = in[next++];
        if (nextToken == NUM) return;
        if (nextToken != LPAR) print("syntax error");
        Parse();
        if (in[next++] != RPAR) print("syntax error");
    }
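A compilable variant of the parser above might look like the following sketch. It keeps the same recursion but makes three assumptions that are not in the slides: `Parse` returns 1 or 0 instead of printing, the token string ends with an `END` marker, and a hypothetical driver `parse_all` resets the globals and checks that nothing follows the expression.

```c
#include <assert.h>
#include <stddef.h>

/* END is an assumed end-of-input marker, not one of the lecture's tokens. */
typedef enum { NUM, LPAR, RPAR, END } TOKEN;

const TOKEN *in; /* the input string of tokens */
size_t next;     /* index of the next unread token */

/* Returns 1 if a parenthesized number starts at in[next], else 0.
   Each recursive call corresponds to one level of parentheses, so the
   call stack plays the role of the unbounded counter. */
int Parse(void) {
    TOKEN nextToken = in[next++];
    if (nextToken == NUM) return 1;
    if (nextToken != LPAR) return 0; /* syntax error */
    if (!Parse()) return 0;
    return in[next++] == RPAR;       /* syntax error if the close is missing */
}

/* Hypothetical driver: the whole token string must be one expression. */
int parse_all(const TOKEN *tokens) {
    assert(tokens != NULL);
    in = tokens;
    next = 0;
    return Parse() && in[next] == END;
}
```

Returning a status rather than printing stops the parser at the first error, so it never reads past the `END` marker.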

Where’s the parse tree constructed?

In this parser, the parse tree is given by the call tree.

[Figure: call tree of nested Parse() invocations for the input string (((1))), with leaves ( ( ( 1 ) ) ).]

Second example: subtraction expressions

The language of this example: 1, 1-2, 1-2-3, (1-2)-3, (2-(3-4)), etc.

    void Parse() {
        if (in[++next] == NUM) {
            if (in[++next] == MINUS) {
                Parse();
            }
        } else if (in[next] == LPAR) {
            Parse();
            if (in[++next] != RPAR) print("syntax error");
        } else print("syntax error");
    }
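In the code above, the test `in[++next] == MINUS` advances `next` even when the token after a number is not MINUS, and no MINUS is looked for after a closing parenthesis, so inputs such as (1) or (1-2)-3 appear to be mishandled. The following C sketch is one possible repair, not the lecture's own solution. It assumes `next` is the index of the next unread token, an `END` marker closes the input, and it peeks at a token before deciding to consume it; `parse_all` is a hypothetical driver.

```c
#include <assert.h>
#include <stddef.h>

/* END is an assumed end-of-input marker. */
typedef enum { NUM, MINUS, LPAR, RPAR, END } TOKEN;

const TOKEN *in; /* the input string of tokens */
size_t next;     /* index of the next unread token */

/* Returns 1 if a subtraction expression starts at in[next], else 0. */
int Parse(void) {
    /* First an operand: a number or a parenthesized expression. */
    if (in[next] == NUM) {
        next++;
    } else if (in[next] == LPAR) {
        next++;
        if (!Parse()) return 0;
        if (in[next] != RPAR) return 0; /* syntax error */
        next++;
    } else {
        return 0;                       /* syntax error */
    }
    /* Then an optional "- expression"; a non-MINUS token is left unread. */
    if (in[next] == MINUS) {
        next++;
        return Parse();
    }
    return 1;
}

/* Hypothetical driver: the whole token string must be one expression. */
int parse_all(const TOKEN *tokens) {
    assert(tokens != NULL);
    in = tokens;
    next = 0;
    return Parse() && in[next] == END;
}
```

Because a token is consumed only after it has matched, the index never moves past `END`. The recursion after MINUS still nests subtractions to the right, as in the original.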

Subtraction expressions continued

Observations:
– a more complex language
    hence, harder to see how the parser works (and if it works correctly at all)
– the parse tree is actually not really what we want
    consider input 1-2-3
    what’s undesirable about this parse tree’s structure?

[Figure: Parse() call tree over the tokens 1 - 2 - 3.]
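One likely answer: because Parse() recurses after each MINUS, an input such as 1-2-3 is grouped as 1-(2-3), whereas subtraction is conventionally left-associative, (1-2)-3. The C sketch below compares the two groupings on a list of operands; the function names are hypothetical and the operands are passed as a plain array rather than as tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Value implied by a right-nested tree: n0 - (n1 - (n2 - ...)). */
int eval_right_nested(const int *nums, size_t count) {
    assert(count > 0);
    if (count == 1) return nums[0];
    return nums[0] - eval_right_nested(nums + 1, count - 1);
}

/* Value under the usual left-associative reading: ((n0 - n1) - n2) - ... */
int eval_left_nested(const int *nums, size_t count) {
    assert(count > 0);
    int acc = nums[0];
    for (size_t i = 1; i < count; i++) {
        acc -= nums[i];
    }
    return acc;
}
```

For the operands 1, 2, 3 the right-nested grouping gives 1-(2-3) = 2 and the left-nested grouping gives (1-2)-3 = -4, so the shape of the tree changes the meaning of the program.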

We need a clean syntactic description

Just like with the scanner, writing the parser by hand is painful and error-prone
– consider adding +, *, / to the last example!

So, let’s separate the what and the how
– what: the syntactic structure, described with a context-free grammar
– how: the parser, which reads the grammar and the input, and produces the parse tree