Introduction to Parsing

Slides:



Advertisements
Similar presentations
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Advertisements

CS5371 Theory of Computation
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
CS 536 Spring Ambiguity Lecture 8. CS 536 Spring Announcement Reading Assignment –“Context-Free Grammars” (Sections 4.1, 4.2) Programming.
Lecture 9UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 9.
COP4020 Programming Languages
ParsingParsing. 2 Front-End: Parser  Checks the stream of words and their parts of speech for grammatical correctness scannerparser source code tokens.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
Prof. Fateman CS 164 Lecture 71 More Finite Automata/ Lexical Analysis /Introduction to Parsing Lecture 7.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
8/19/2015© Hal Perkins & UW CSEC-1 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Syntax and Semantics Structure of programming languages.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
C H A P T E R TWO Syntax and Semantic.
Syntax and Semantics Structure of programming languages.
Parsing Lecture 5 Fri, Jan 28, Syntax Analysis The syntax of a language is described by a context-free grammar. Each grammar rule has the form A.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Context-free grammars. Roadmap Last time – Regex == DFA – JLex for generating Lexers This time – CFGs, the underlying abstraction for Parsers.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Syntax and Semantics Structure of programming languages.
Introduction to Parsing
Programming Languages Translator
CS510 Compiler Lecture 4.
Lexical and Syntax Analysis
Introduction to Parsing (adapted from CS 164 at Berkeley)
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
Lexical and Syntax Analysis
Lecture 7: Introduction to Parsing (Syntax Analysis)
LR Parsing. Parser Generators.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
COMPILER CONSTRUCTION
Presentation transcript:

Introduction to Parsing Lecture 4 Prof. Necula CS 164 Lecture 5

Programming Assignment 2 is out this week Administrivia Programming Assignment 2 is out this week Due October 1st Work in teams begins Required Readings Lex Manual Red Dragon Book Chapter 4 Prof. Necula CS 164 Lecture 5

Regular languages revisited Parser overview Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations Prof. Necula CS 164 Lecture 5

Languages and Automata Formal languages are very important in CS Especially in programming languages Regular languages The weakest formal languages widely used Many applications We will also study context-free languages Prof. Necula CS 164 Lecture 5

Limitations of Regular Languages Intuition: A finite automaton that runs long enough must repeat states Finite automaton can’t remember # of times it has visited a particular state Finite automaton has finite memory Only enough to store in which state it is Cannot count, except up to a finite limit E.g., language of balanced parentheses is not regular: { (i )i | i ¸ 0} Prof. Necula CS 164 Lecture 5

The Functionality of the Parser Input: sequence of tokens from lexer Output: parse tree of the program Prof. Necula CS 164 Lecture 5

IF ID = ID THEN INT ELSE INT FI Example Cool if x = y then 1 else 2 fi Parser input IF ID = ID THEN INT ELSE INT FI Parser output IF-THEN-ELSE = ID INT Prof. Necula CS 164 Lecture 5

Comparison with Lexical Analysis Phase Input Output Lexer Sequence of characters Sequence of tokens Parser Parse tree Prof. Necula CS 164 Lecture 5

Not all sequences of tokens are programs . . . The Role of the Parser Not all sequences of tokens are programs . . . . . . Parser must distinguish between valid and invalid sequences of tokens We need A language for describing valid sequences of tokens A method for distinguishing valid from invalid sequences of tokens Prof. Necula CS 164 Lecture 5

Context-Free Grammars Programming language constructs have recursive structure An EXPR is if EXPR then EXPR else EXPR fi , or while EXPR loop EXPR pool , or … Context-free grammars are a natural notation for this recursive structure Prof. Necula CS 164 Lecture 5

CFGs (Cont.) A CFG consists of A set of terminals T A set of non-terminals N A start symbol S (a non-terminal) A set of productions Assuming X  N X => e , or X => Y1 Y2 ... Yn where Yi  (N U T) Prof. Necula CS 164 Lecture 5

Notational Conventions In these lecture notes Non-terminals are written upper-case Terminals are written lower-case The start symbol is the left-hand side of the first production Prof. Necula CS 164 Lecture 5

Examples of CFGs A fragment of Cool: Prof. Necula CS 164 Lecture 5

Examples of CFGs (cont.) Simple arithmetic expressions: Prof. Necula CS 164 Lecture 5

Read productions as replacement rules: X => Y1 ... Yn X => e The Language of a CFG Read productions as replacement rules: X => Y1 ... Yn Means X can be replaced by Y1 ... Yn X => e Means X can be erased (replaced with empty string) Prof. Necula CS 164 Lecture 5

Begin with a string consisting of the start symbol “S” Key Idea Begin with a string consisting of the start symbol “S” Replace any non-terminal X in the string by a right-hand side of some production X => Y1 … Yn Repeat (2) until there are no non-terminals in the string Prof. Necula CS 164 Lecture 5

The Language of a CFG (Cont.) More formally, write X1 … Xi … Xn => X1 … Xi-1 Y1 … Ym Xi+1 … Xn if there is a production Xi => Y1 … Ym Prof. Necula CS 164 Lecture 5

The Language of a CFG (Cont.) Write X1 … Xn =>* Y1 … Ym if X1 … Xn => … => … => Y1 … Ym in 0 or more steps Prof. Necula CS 164 Lecture 5

{ a1 … an | S =>* a1 … an and every ai is a terminal } The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: { a1 … an | S =>* a1 … an and every ai is a terminal } Prof. Necula CS 164 Lecture 5

Terminals are called because there are no rules for replacing them Once generated, terminals are permanent Terminals ought to be tokens of the language Prof. Necula CS 164 Lecture 5

L(G) is the language of CFG G Strings of balanced parentheses Examples L(G) is the language of CFG G Strings of balanced parentheses Two grammars: OR Prof. Necula CS 164 Lecture 5

Cool Example A fragment of COOL: Prof. Necula CS 164 Lecture 5

Some elements of the language Cool Example (Cont.) Some elements of the language Prof. Necula CS 164 Lecture 5

Simple arithmetic expressions: Arithmetic Example Simple arithmetic expressions: Some elements of the language: Prof. Necula CS 164 Lecture 5

The idea of a CFG is a big step. But: Notes The idea of a CFG is a big step. But: Membership in a language is “yes” or “no” we also need parse tree of the input Must handle errors gracefully Need an implementation of CFG’s (e.g., bison) Prof. Necula CS 164 Lecture 5

Form of the grammar is important More Notes Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar Note: Tools for regular languages (e.g., flex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice Prof. Necula CS 164 Lecture 5

Derivations and Parse Trees A derivation is a sequence of productions S => … => … A derivation can be drawn as a tree Start symbol is the tree’s root For a production X => Y1 … Yn add children Y1, …, Yn to node X Prof. Necula CS 164 Lecture 5

Derivation Example Grammar String Prof. Necula CS 164 Lecture 5

Derivation Example (Cont.) + E E * E id id id Prof. Necula CS 164 Lecture 5

Derivation in Detail (1) Prof. Necula CS 164 Lecture 5

Derivation in Detail (2) + E Prof. Necula CS 164 Lecture 5

Derivation in Detail (3) + E E * E Prof. Necula CS 164 Lecture 5

Derivation in Detail (4) + E E * E id Prof. Necula CS 164 Lecture 5

Derivation in Detail (5) + E E * E id id Prof. Necula CS 164 Lecture 5

Derivation in Detail (6) + E E * E id id id Prof. Necula CS 164 Lecture 5

An in-order traversal of the leaves is the original input Notes on Derivations A parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not Prof. Necula CS 164 Lecture 5

Left-most and Right-most Derivations The previous example is a left-most derivation At each step, replace the left-most non-terminal Here is an equivalent notion of a right-most derivation Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (1) Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (2) + E Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (3) + E id Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (4) + E E * E id Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (5) + E E * E id id Prof. Necula CS 164 Lecture 5

Right-most Derivation in Detail (6) + E E * E id id id Prof. Necula CS 164 Lecture 5

Derivations and Parse Trees Note that right-most and left-most derivations have the same parse tree The difference is the order in which branches are added Prof. Necula CS 164 Lecture 5

Summary of Derivations We are not just interested in whether s  L(G) We need a parse tree for s A derivation defines a parse tree But one parse tree may have many derivations Left-most and right-most derivations are important in parser implementation Prof. Necula CS 164 Lecture 5