Introduction to Parsing, Lecture 5. Outline: regular languages revisited, parser overview, context-free grammars (CFGs), derivations.

Presentation transcript:

1 Introduction to Parsing Lecture 5

2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations

3 Languages and Automata Formal languages are very important in CS –Especially in programming languages Regular languages –The weakest formal languages widely used –Many applications We will also study context-free languages

4 Limitations of Regular Languages Intuition: a finite automaton that runs long enough must repeat states. A finite automaton can’t remember the number of times it has visited a particular state. A finite automaton has finite memory –Only enough to store which state it is in –Cannot count, except up to a finite limit E.g., the language of balanced parentheses is not regular: { (^i )^i | i >= 0 }
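
The counting argument can be made concrete with a minimal sketch in Python. The function name `is_nested` is hypothetical and not part of the lecture. Recognizing { (^i )^i | i >= 0 } takes a counter that can grow without bound, which is exactly what a finite automaton lacks.

```python
def is_nested(s: str) -> bool:
    """Return True if s is i '(' characters followed by i ')' characters."""
    depth = 0           # unbounded counter: a fixed set of states cannot replace it
    seen_close = False  # becomes True once the first ')' has been read
    for ch in s:
        if ch == "(":
            if seen_close:   # an opener after a closer breaks the (^i )^i shape
                return False
            depth += 1
        elif ch == ")":
            seen_close = True
            depth -= 1
            if depth < 0:    # more closers than openers so far
                return False
        else:
            return False     # only parentheses belong to this language
    return depth == 0


print(is_nested("((()))"), is_nested("(()"))
```

Capping `depth` at any fixed constant would turn this into a finite-state recognizer, and it would then misjudge inputs nested more deeply than the cap.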

5 The Functionality of the Parser Input: sequence of tokens from lexer Output: parse tree of the program

6 Example Cool: if x = y then 1 else 2 fi Parser input: IF ID = ID THEN INT ELSE INT FI Parser output: a tree with root IF-THEN-ELSE whose children are = (itself with children ID and ID), INT, and INT

7 Comparison with Lexical Analysis
Phase  | Input                  | Output
Lexer  | Sequence of characters | Sequence of tokens
Parser | Sequence of tokens     | Parse tree

8 The Role of the Parser Not all sequences of tokens are programs Parser must distinguish between valid and invalid sequences of tokens We need –A language for describing valid sequences of tokens –A method for distinguishing valid from invalid sequences of tokens

9 Programming Language Structure Programming languages have recursive structure Consider the language of arithmetic expressions with integers, +, *, and ( ) An expression is either: –an integer –an expression followed by “+” followed by an expression –an expression followed by “*” followed by an expression –a “(” followed by an expression followed by “)” int, int + int, ( int + int ) * int are expressions

10 Notation for Programming Languages An alternative notation: E --> int E --> E + E E --> E * E E --> ( E ) We can view these rules as rewrite rules –We start with E and replace occurrences of E with some right-hand side E --> E * E --> ( E ) * E --> ( E + E ) * E --> (int + int) * int
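
The rewriting view can be sketched in a few lines of Python. The helper name `rewrite_leftmost` and the token-list representation are assumptions made for illustration. The sketch replays the derivation from the slide one replacement at a time, so the last step of the slide (which replaces three E's at once) appears as three separate steps.

```python
def rewrite_leftmost(form: list[str], rhs: list[str]) -> list[str]:
    """Replace the left-most occurrence of E in form by the right-hand side rhs."""
    i = form.index("E")
    return form[:i] + rhs + form[i + 1:]


# One right-hand side per rewriting step, each taken from the four rules for E.
steps = [
    ["E", "*", "E"],
    ["(", "E", ")"],
    ["E", "+", "E"],
    ["int"],
    ["int"],
    ["int"],
]

form = ["E"]
trace = [" ".join(form)]
for rhs in steps:
    form = rewrite_leftmost(form, rhs)
    trace.append(" ".join(form))

print(" --> ".join(trace))
```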

11 Observation All arithmetic expressions can be obtained by a sequence of replacements Any sequence of replacements forms a valid arithmetic expression This means that we cannot obtain ( int ) ) by any sequence of replacements. Why? This notation is a context free grammar

12 Context Free Grammars A CFG consists of –A set of non-terminals N By convention, written with capital letters in these notes –A set of terminals T By convention, either lower case names or punctuation –A start symbol S (a non-terminal) –A set of productions Assuming E ∈ N: E --> ε, or E --> Y_1 Y_2 ... Y_n where Y_i ∈ N ∪ T

13 Examples of CFGs Simple arithmetic expressions: E --> int E --> E + E E --> E * E E --> ( E ) –One non-terminal: E –Several terminals: int, +, *, (, ) Called terminals because they are never replaced –By convention the non-terminal for the first production is the start one
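
As a rough illustration, the four components of a CFG map directly onto a small Python record. The class name `Grammar`, its field names, and the `check` method are hypothetical. The requirement that N and T be disjoint is a standard assumption that the slides do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    start: str
    productions: tuple  # pairs (lhs, rhs); rhs is a tuple of symbols, () stands for epsilon

    def check(self) -> bool:
        """Verify the conditions from the definition of a CFG."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:  # assumed: N and T are disjoint
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(y in symbols for y in rhs)
            for lhs, rhs in self.productions
        )


# The simple arithmetic grammar: one non-terminal E and five terminals.
arith = Grammar(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"int", "+", "*", "(", ")"}),
    start="E",
    productions=(
        ("E", ("int",)),
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
    ),
)

print(arith.check())
```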

14 The Language of a CFG Read productions as replacement rules: X --> Y_1 ... Y_n means X can be replaced by Y_1 ... Y_n; X --> ε means X can be erased (replaced with the empty string)

15 Key Idea 1. Begin with a string consisting of the start symbol “S” 2. Replace any non-terminal X in the string by a right-hand side of some production X --> Y_1 … Y_n 3. Repeat (2) until there are only terminals in the string
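
The three steps can be sketched as a random generator, here applied to the arithmetic grammar with start symbol E. The names `PRODUCTIONS` and `derive`, and the `max_steps` budget, are assumptions of this sketch. The budget is needed because purely random choices may keep growing the string; once it is spent, the shortest right-hand side is always chosen so the loop is guaranteed to end.

```python
import random

# Arithmetic grammar: each non-terminal maps to its list of right-hand sides.
PRODUCTIONS = {
    "E": [["int"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}


def derive(start="E", max_steps=20, seed=0):
    """Generate one string of the language by repeated replacement."""
    rng = random.Random(seed)
    form = [start]                                   # step 1: the start symbol
    steps = 0
    while any(sym in PRODUCTIONS for sym in form):   # step 3: repeat while non-terminals remain
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        i = rng.choice(positions)                    # step 2: pick any non-terminal
        options = PRODUCTIONS[form[i]]
        if steps >= max_steps:
            rhs = min(options, key=len)              # budget spent: force termination
        else:
            rhs = rng.choice(options)
        form[i:i + 1] = rhs                          # replace it by the chosen right-hand side
        steps += 1
    return form


print(" ".join(derive(seed=1)))
```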

16 The Language of a CFG (Cont.) More formally, write X_1 … X_{i-1} X_i X_{i+1} … X_n --> X_1 … X_{i-1} Y_1 … Y_m X_{i+1} … X_n if there is a production X_i --> Y_1 … Y_m

17 The Language of a CFG (Cont.) Write X_1 … X_n -->* Y_1 … Y_m if X_1 … X_n --> … --> … --> Y_1 … Y_m in 0 or more steps

18 The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: { a_1 … a_n | S -->* a_1 … a_n and every a_i is a terminal }

19 Examples: S --> 0 and S --> 1, also written as S --> 0 | 1, generate the language { “0”, “1” } What about S --> 1 A, A --> 0 | 1? What about S --> 1 A, A --> 0 | 1 A? What about S --> ε | ( S )?
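
One way to explore the "What about" questions is to enumerate each language up to a length bound. The sketch below is only an illustration: the function name `language` and the dictionary encoding are assumptions, and the pruning rule (discard forms longer than `max_len + 1` symbols) is only adequate for small grammars like these, where every form contains at most one non-terminal.

```python
from collections import deque


def language(productions, start="S", max_len=4):
    """Collect every terminal string of length <= max_len derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Index of the left-most non-terminal, or None if the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in productions), None)
        if idx is None:
            if len(form) <= max_len:
                result.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) > max_len + 1 or new in seen:  # prune long or repeated forms
                continue
            seen.add(new)
            queue.append(new)
    return result


g1 = {"S": [["0"], ["1"]]}
g2 = {"S": [["1", "A"]], "A": [["0"], ["1"]]}
g3 = {"S": [["1", "A"]], "A": [["0"], ["1", "A"]]}
g4 = {"S": [[], ["(", "S", ")"]]}  # [] encodes the epsilon production

for g in (g1, g2, g3, g4):
    print(sorted(language(g)))
```

With the default bound of 4, the third grammar yields strings of 1s ending in a single 0, and the fourth yields nested parentheses, including the empty string.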

20 Arithmetic Example Simple arithmetic expressions: Some elements of the language:

21 Cool Example A fragment of COOL:

22 Cool Example (Cont.) Some elements of the language

23 Notes The idea of a CFG is a big step. But: Membership in a language is “yes” or “no” –we also need parse tree of the input Must handle errors gracefully Need an implementation of CFG’s (e.g., bison)

24 More Notes Form of the grammar is important –Many grammars generate the same language –Tools are sensitive to the grammar –Note: Tools for regular languages (e.g., flex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice

25 Derivations and Parse Trees A derivation is a sequence of productions S --> … --> … A derivation can be drawn as a tree –Start symbol is the tree’s root –For a production X --> Y_1 … Y_n add children Y_1, …, Y_n to node X
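
The tree construction can be sketched as follows, using the arithmetic grammar with terminals int, +, and *. The names `Node`, `expand`, and `leaves` are hypothetical, and the input int * int + int is chosen only for illustration. Each call to `expand` applies one production by adding one child per right-hand-side symbol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


def expand(node, rhs):
    """Apply the production node.symbol --> rhs by adding one child per symbol."""
    node.children = [Node(sym) for sym in rhs]
    return node.children


def leaves(node):
    """Read the leaves of the tree from left to right."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in leaves(child)]


root = Node("E")                                  # the start symbol is the root
left, plus, right = expand(root, ["E", "+", "E"])
l1, star, l2 = expand(left, ["E", "*", "E"])
expand(l1, ["int"])
expand(l2, ["int"])
expand(right, ["int"])

print(" ".join(leaves(root)))  # int * int + int
```

The flat leaf sequence does not show that int * int is grouped under one subtree; that grouping is recorded only in the tree.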

26 Derivation Example Grammar String

27 Derivation Example (Cont.)

28-33 Derivation in Detail (1)-(6) [Figures: the parse tree grown one step at a time. As far as the extracted text shows, the root E first gets children E + E, the left E then gets children E * E, and the remaining E leaves are replaced by id from left to right.]

34 Notes on Derivations A parse tree has –Terminals at the leaves –Non-terminals at the interior nodes A left-to-right traversal of the leaves is the original input The parse tree shows the association of operations; the input string does not!

35 Left-most and Right-most Derivations The example is a left-most derivation –At each step, replace the left-most non-terminal There is an equivalent notion of a right-most derivation

36-41 Right-most Derivation in Detail (1)-(6) [Figures: the same parse tree grown in right-most order. As far as the extracted text shows, the root E first gets children E + E, the right E is replaced by id, the left E then gets children E * E, and its E leaves are replaced by id from right to left.]

42 Derivations and Parse Trees Note that for each parse tree there is a left-most and a right-most derivation The difference is the order in which branches are added
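
The correspondence between one parse tree and its two canonical derivations can be sketched in Python. The tree below is an assumption reconstructed from the slide residue (an E + E root whose left operand is E * E, with id leaves); the exact grammar and input string on the original slides did not survive extraction. The names `TREE`, `render`, and `derivation` are hypothetical.

```python
# Parse tree as nested tuples: (symbol, child, child, ...); a bare string is a terminal.
TREE = ("E",
        ("E", ("E", "id"), "*", ("E", "id")),
        "+",
        ("E", "id"))


def render(form):
    """Show a sentential form: subtrees print as their root symbol."""
    return " ".join(x[0] if isinstance(x, tuple) else x for x in form)


def derivation(tree, leftmost=True):
    """List the sentential forms of the left-most or right-most derivation of tree."""
    form = [tree]  # mix of unexpanded subtrees (tuples) and terminals (strings)
    steps = [render(form)]
    while True:
        idxs = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        if not idxs:
            return steps
        i = idxs[0] if leftmost else idxs[-1]
        form[i:i + 1] = list(form[i][1:])  # replace the non-terminal by its children
        steps.append(render(form))


print(" --> ".join(derivation(TREE, leftmost=True)))
print(" --> ".join(derivation(TREE, leftmost=False)))
```

Both calls walk the same tree and end in the same string; only the order in which subtrees are expanded differs.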

43 Summary of Derivations We are not just interested in whether s ∈ L(G) –We need a parse tree for s A derivation defines a parse tree –But one parse tree may have many derivations Left-most and right-most derivations are important in parser implementation