Introduction to Parsing


Introduction to Parsing
Lecture 8, adapted from slides by G. Necula and R. Bodik. 2/8/2008, Prof. Hilfinger, CS164 Lecture 8.

Outline
Limitations of regular languages
Parser overview
Context-free grammars (CFGs)
Derivations
Syntax-directed translation

Languages and Automata
Formal languages are very important in CS, especially in programming languages. Regular languages are the weakest formal languages that are widely used, and they have many applications. We will also study context-free languages.

Limitations of Regular Languages
Intuition: a finite automaton that runs long enough must repeat states. A finite automaton cannot remember the number of times it has visited a particular state, because it has finite memory: only enough to store which state it is in. It cannot count, except up to a finite limit. E.g., the language of balanced parentheses { (^i )^i | i ≥ 0 } is not regular.
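The counting argument can be made concrete: recognizing balanced parentheses needs a single unbounded counter, which is exactly what a finite automaton lacks. A minimal Python sketch (our own illustration, not from the slides):

```python
# Balanced-parentheses check: one unbounded integer counter stands in
# for the memory a finite automaton does not have.
def balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
    return depth == 0          # every '(' was eventually closed
```

No DFA can implement `depth`, since its states can only distinguish finitely many nesting levels.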

The Structure of a Compiler
Source → (Lexical analysis) → Tokens → (Parsing) → Intermediate language → (Optimization, Code generation) → Machine code. Today we start on parsing.

The Functionality of the Parser
Input: sequence of tokens from the lexer
Output: abstract syntax tree of the program

Example
Pyth source: if x == y: z = 1 else: z = 2
Parser input: IF ID == ID : ID = INT NEWLINE ELSE : ID = INT NEWLINE
Parser output (abstract syntax tree): an IF-THEN-ELSE node whose test is the == comparison of the two IDs and whose branches are the two assignments (= of an ID and an INT).

Why a Tree?
Each stage of the compiler has two purposes:
Detect and filter out some class of errors
Compute some new information, or translate the representation of the program to make things easier for later stages
The recursive structure of the tree suits the recursive structure of the language definition. With a tree, later stages can easily find, e.g., "the else clause", rather than having to scan through tokens to find it.

Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Syntax tree

The Role of the Parser
Not all sequences of tokens are programs. The parser must distinguish between valid and invalid sequences of tokens. We need:
A language for describing valid sequences of tokens
A method for distinguishing valid from invalid sequences of tokens

Programming Language Structure
Programming languages have recursive structure. Consider the language of arithmetic expressions with integers, +, *, and ( ). An expression is either:
an integer
an expression followed by "+" followed by an expression
an expression followed by "*" followed by an expression
a "(" followed by an expression followed by ")"
So int, int + int, and ( int + int ) * int are expressions.

Notation for Programming Languages
An alternative notation:
E → int
E → E + E
E → E * E
E → ( E )
We can view these rules as rewrite rules: we start with E and replace occurrences of E with some right-hand side.
E ⇒ E * E ⇒ ( E ) * E ⇒ ( E + E ) * E ⇒ … ⇒ ( int + int ) * int

Observation
All arithmetic expressions can be obtained by a sequence of replacements, and any sequence of replacements forms a valid arithmetic expression. This means that we cannot obtain ( int ) ) by any sequence of replacements. Why? This set of rules is a context-free grammar.

Context-Free Grammars
A CFG consists of:
A set of non-terminals N (by convention, written with capital letters in these notes)
A set of terminals T (by convention, either lower-case names or punctuation)
A start symbol S (a non-terminal)
A set of productions: assuming E ∈ N, each production has the form E → ε, or E → Y1 Y2 … Yn where each Yi ∈ N ∪ T

Examples of CFGs
Simple arithmetic expressions:
E → int
E → E + E
E → E * E
E → ( E )
One non-terminal: E. Several terminals: int, +, *, (, ). They are called terminals because they are never replaced. By convention, the non-terminal of the first production is the start symbol.

The Language of a CFG
Read productions as replacement rules:
X → Y1 … Yn means X can be replaced by Y1 … Yn
X → ε means X can be erased (replaced with the empty string)

Key Idea
1. Begin with a string consisting of the start symbol "S".
2. Replace any non-terminal X in the string by the right-hand side of some production X → Y1 … Yn.
3. Repeat step 2 until there are only terminals in the string.
The successive strings created in this way are called sentential forms.
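The rewriting process above can be sketched directly in Python; the list encoding of sentential forms and the helper name are our own, not part of the lecture:

```python
# Sentential-form rewriting for the grammar E -> int | E + E | E * E | ( E ).
def rewrite(form, pos, rhs):
    """Replace the non-terminal at index pos by a production's right-hand side."""
    return form[:pos] + rhs + form[pos + 1:]

form = ["E"]                                   # step 1: start with the start symbol
form = rewrite(form, 0, ["E", "*", "E"])       # E * E
form = rewrite(form, 0, ["(", "E", ")"])       # ( E ) * E
form = rewrite(form, 1, ["E", "+", "E"])       # ( E + E ) * E
while "E" in form:                             # step 3: repeat until only terminals remain
    form = rewrite(form, form.index("E"), ["int"])
print(" ".join(form))  # ( int + int ) * int
```

Each intermediate value of `form` is one sentential form of the derivation shown earlier.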

The Language of a CFG (Cont.)
More formally, we may write
X1 … Xi-1 Xi Xi+1 … Xn ⇒ X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production Xi → Y1 … Ym

The Language of a CFG (Cont.)
Write X1 … Xn ⇒* Y1 … Ym if X1 … Xn ⇒ … ⇒ Y1 … Ym in 0 or more steps.

The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
L(G) = { a1 … an | S ⇒* a1 … an and every ai is a terminal }

Examples
S → 0 and S → 1 (also written as S → 0 | 1) generates the language { "0", "1" }.
What about S → 1 A, A → 0 | 1, A → 0 | 1 A?
What about S → ε | ( S )?

Pyth Example
A fragment of Pyth:
Compound → while Expr: Block | if Expr: Block Elses
Elses → ε | else: Block | elif Expr: Block Elses
Block → Stmt_List | Suite
(Formal language papers use one-character non-terminals, but we don't have to!)

Derivations and Parse Trees
A derivation is a sequence of sentential forms resulting from the application of a sequence of productions: S ⇒ … ⇒ …
A derivation can be represented as a parse tree:
The start symbol is the tree's root
For a production X → Y1 … Yn, add children Y1, …, Yn to node X

Derivation Example
Grammar: E → E + E | E * E | ( E ) | int
String: int * int + int

Derivation Example (Cont.)
E ⇒ E + E ⇒ E * E + E ⇒ int * E + E ⇒ int * int + E ⇒ int * int + int

Derivation in Detail (1)-(6)
Each step of the derivation adds a level to the growing parse tree (each slide paired the current sentential form with the partially built tree):
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
⇒ int * int + E
⇒ int * int + int

Notes on Derivations
A parse tree has terminals at the leaves and non-terminals at the interior nodes. A left-to-right traversal of the leaves is the original input. The parse tree shows the association of operations; the input string does not! There may be multiple ways to match the input; derivations (and parse trees) choose one.

The Payoff: Parser as a Translator
Syntax-directed translation: the parser consumes a stream of tokens and produces ASTs or assembly code, driven by the syntax plus translation rules (typically hardcoded in the parser).

Mechanism of Syntax-Directed Translation
Syntax-directed translation is done by extending the CFG: a translation rule is defined for each production. Given X → d A B c, the translation of X is defined recursively using:
the translations of the nonterminals A and B
the values of attributes of the terminals d and c
constants

To translate an input string:
1. Build the parse tree.
2. Working bottom-up, use the translation rules to compute the translation of each nonterminal in the tree.
Result: the translation of the string is the translation of the parse tree's root nonterminal.
Why bottom-up? A nonterminal's value may depend on the values of the symbols on its right-hand side, so translate a non-terminal node only after its children's translations are available.

Example 1: Arithmetic Expression to Value
Syntax-directed translation rules:
E → E + T    E1.trans = E2.trans + T.trans
E → T        E.trans = T.trans
T → T * F    T1.trans = T2.trans * F.trans
T → F        T.trans = F.trans
F → int      F.trans = int.value
F → ( E )    F.trans = E.trans
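As a sanity check, the rules above can be applied bottom-up in a few lines of Python; the tuple encoding of parse-tree nodes is our own convention, not from the slides:

```python
# Bottom-up evaluation of a parse tree for the expression grammar above.
# Chain productions (E -> T, T -> F) just copy the child's value, so they
# need no case of their own in this encoding.
def trans(node):
    kind = node[0]
    if kind == "int":                          # F -> int : F.trans = int.value
        return node[1]
    if kind == "plus":                         # E -> E + T
        return trans(node[1]) + trans(node[2])
    if kind == "times":                        # T -> T * F
        return trans(node[1]) * trans(node[2])
    if kind == "paren":                        # F -> ( E )
        return trans(node[1])
    raise ValueError(f"unknown node kind: {kind}")

# 2 * (4 + 5), the input used on the annotated-parse-tree slide
tree = ("times", ("int", 2), ("paren", ("plus", ("int", 4), ("int", 5))))
print(trans(tree))  # 18
```

Each recursive call computes the .trans attribute of one node, exactly as the rules prescribe.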

Example 1: Bison/Yacc Notation
E : E + T { $$ = $1 + $3; }
T : T * F { $$ = $1 * $3; }
F : int { $$ = $1; }
F : '(' E ')' { $$ = $2; }
Key:
$$ : semantic value of the left-hand side
$n : semantic value of the nth symbol on the right-hand side

Example 1 (Cont.): Annotated Parse Tree
Input: 2 * (4 + 5). In the annotated parse tree, each node carries its translation: the int leaves carry 2, 4, and 5; the subtree for ( 4 + 5 ) has value 9; and the root, for 2 * ( 4 + 5 ), has value 18.

Example 2: Compute the Type of an Expression
E → E + E     if $1 == INT and $3 == INT: $$ = INT else: $$ = ERROR
E → E and E   if $1 == BOOL and $3 == BOOL: $$ = BOOL
E → E == E    if $1 == $3 and $1 != ERROR: $$ = BOOL
E → true      $$ = BOOL
E → false     $$ = BOOL
E → int       $$ = INT
E → ( E )     $$ = $2
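These rules translate almost line for line into Python. The sketch below is our own encoding (node and type names are ours), again computing the attribute bottom-up:

```python
INT, BOOL, ERROR = "INT", "BOOL", "ERROR"

def typeof(node):
    """Compute the type attribute of an expression node, bottom-up."""
    kind = node[0]
    if kind == "int":                            # E -> int
        return INT
    if kind in ("true", "false"):                # E -> true | false
        return BOOL
    if kind == "paren":                          # E -> ( E )
        return typeof(node[1])
    left, right = typeof(node[1]), typeof(node[2])
    if kind == "+":                              # both operands must be INT
        return INT if left == INT and right == INT else ERROR
    if kind == "and":                            # both operands must be BOOL
        return BOOL if left == BOOL and right == BOOL else ERROR
    if kind == "==":                             # operands must have the same, non-ERROR type
        return BOOL if left == right and left != ERROR else ERROR
    return ERROR

# (2 + 2) == 4, the input on the next slide
expr = ("==", ("paren", ("+", ("int",), ("int",))), ("int",))
print(typeof(expr))  # BOOL
```

Note how the children's types are computed before the parent's, which is the "why bottom-up" point from the previous slide.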

Example 2 (Cont.)
Input: (2 + 2) == 4. In the annotated parse tree, each int leaf has type INT, the parenthesized sum (2 + 2) has type INT, and the whole == expression has type BOOL.

Building Abstract Syntax Trees
In the examples so far, streams of tokens were translated into integer values or types. Translating into ASTs is not very different.

AST vs. Parse Tree
An AST is a condensed form of a parse tree:
Operators appear at internal nodes, not at leaves
"Chains" of single productions are collapsed
Lists are "flattened"
Syntactic details are omitted (e.g., parentheses, commas, semicolons)
An AST is a better structure for later compiler stages: it omits details having to do with the source language and only contains information about the essential structure of the program.

Example: 2 * (4 + 5), Parse Tree vs. AST
The parse tree contains every grammar symbol, including the parentheses; the AST is just a * node with children 2 and a + node, which in turn has children 4 and 5.

AST-Building Translation Rules
E → E + T    $$ = new PlusNode($1, $3)
E → T        $$ = $1
T → T * F    $$ = new TimesNode($1, $3)
T → F        $$ = $1
F → int      $$ = new IntLitNode($1)
F → ( E )    $$ = $2
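A sketch of these rules in Python, reusing the slide's node names (PlusNode, TimesNode, IntLitNode); the dataclass encoding and the evaluate helper are our own additions:

```python
from dataclasses import dataclass

@dataclass
class IntLitNode:
    value: int

@dataclass
class PlusNode:
    left: object
    right: object

@dataclass
class TimesNode:
    left: object
    right: object

# Applying the rules bottom-up to 2 * (4 + 5): the chain productions
# E -> T, T -> F and the parenthesis rule just pass $1 (or $2) along,
# so they leave no node behind in the AST.
ast = TimesNode(IntLitNode(2), PlusNode(IntLitNode(4), IntLitNode(5)))

def evaluate(node):
    """Walk the finished AST; the parentheses are already gone."""
    if isinstance(node, IntLitNode):
        return node.value
    l, r = evaluate(node.left), evaluate(node.right)
    return l + r if isinstance(node, PlusNode) else l * r

print(evaluate(ast))  # 18
```

The three-node-type AST is visibly smaller than the parse tree on the previous slide, which is the point of the condensation.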

Example: 2 * (4 + 5), Steps in Creating the AST
The slide walks the parse tree bottom-up, attaching AST nodes as semantic values: first leaf nodes for 2, 4, and 5, then a + node for 4 + 5, and finally the * node for the whole product (only some of the semantic values are shown).

Leftmost and Rightmost Derivations
Rightmost derivation (always act on the rightmost non-terminal):
E ⇒ E + E ⇒ E + int ⇒ E * E + int ⇒ E * int + int ⇒ int * int + int
Leftmost derivation (always act on the leftmost non-terminal):
E ⇒ E + E ⇒ E * E + E ⇒ int * E + E ⇒ int * int + E ⇒ int * int + int

Rightmost Derivation in Detail (1)-(6)
The rightmost derivation builds the same parse tree, expanding the rightmost non-terminal at each step (each slide paired the sentential form with the partially built tree):
E
⇒ E + E
⇒ E + int
⇒ E * E + int
⇒ E * int + int
⇒ int * int + int

Aside: Canonical Derivations
Take a look at that last derivation in reverse. The active part (shown in red on the slides) tends to move left to right. We call this a reverse rightmost, or canonical, derivation. It comes up in bottom-up parsing; we'll return to it in a couple of lectures.
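To see why the reverse rightmost derivation matters, here is a hand-driven shift-reduce trace for int * int + int: the sequence of reductions is exactly the rightmost derivation run backwards. The action list is chosen by hand in this sketch; deciding when to shift and when to reduce is precisely what a bottom-up (LR) parser automates:

```python
tokens = ["int", "*", "int", "+", "int"]
# ("reduce", n) pops n symbols off the stack and pushes E in their place.
actions = [
    ("shift",), ("reduce", 1),               # int     => E      (E -> int)
    ("shift",), ("shift",), ("reduce", 1),   # E * int => E * E  (E -> int)
    ("reduce", 3),                           # E * E   => E      (E -> E * E)
    ("shift",), ("shift",), ("reduce", 1),   # E + int => E + E  (E -> int)
    ("reduce", 3),                           # E + E   => E      (E -> E + E)
]
stack = []
for act in actions:
    if act[0] == "shift":
        stack.append(tokens.pop(0))          # move the next token onto the stack
    else:
        del stack[-act[1]:]                  # pop the production's right-hand side
        stack.append("E")
print(stack)  # ['E']
```

Reading the reductions top to bottom gives int, int, E * E, int, E + E, which is the rightmost derivation above reversed, with the active region moving left to right.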

Derivations and Parse Trees
For each parse tree there is exactly one leftmost and one rightmost derivation. The difference is the order in which branches are added, not the structure of the tree.

Summary of Derivations
We are not just interested in whether s ∈ L(G); we also need a derivation (or parse tree) and an AST.
Parse trees slavishly reflect the grammar. Abstract syntax trees abstract from the grammar, cutting out detail that interferes with later stages.
A derivation defines a parse tree, but one parse tree may have many derivations.
Derivations drive translation (to ASTs, etc.).
Leftmost and rightmost derivations are the most important in parser implementation.