Parsing Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 3.



Context-Free Grammars Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ is the set of terminals, P is the set of productions, and S ∈ Φ is the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*. Re-writing using grammar rules: –β A γ => β α γ if A → α (derivation).

String Derivations Left-most derivation: At each step, the left-most nonterminal is re-written. Right-most derivation: At each step, the right-most nonterminal is re-written.
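The difference between the two derivation orders can be sketched in a few lines of Python. The tiny grammar (E → E + T | T, T → i), the `GRAMMAR` dictionary, and the `derive` helper are illustrative assumptions, not part of the lecture; the function just applies a given list of production choices, always rewriting the left-most (or right-most) nonterminal.

```python
# Hypothetical toy grammar: keys are nonterminals, every other symbol is a terminal.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["i"]],
}

def derive(start, choices, leftmost=True):
    """Apply the chosen productions in order, always rewriting the
    left-most (or right-most) nonterminal; return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Positions of all nonterminals in the current sentential form.
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        rhs = GRAMMAR[form[pos]][choice]
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

left = derive("E", [0, 1, 0, 0], leftmost=True)
right = derive("E", [0, 0, 1, 0], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same sentence, i + i, and use the same productions; only the order of the re-writes differs.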

Derivation Trees Derivation trees: Describe re-writes, independently of the order (left-most or right-most). Each tree branch matches a production rule in the grammar.

Derivation Trees Notes: 1) Leaves are terminals. 2) Bottom contour is the sentence. 3) Left recursion causes left branching. 4) Right recursion causes right branching.

Goal of Parsing Examine input string, determine whether it's legal. Equivalent to building derivation tree. Added benefit: tree embodies syntactic structure of input. Therefore, tree should be unique.

Ambiguous Grammars Definition: A CFG is ambiguous if there exist two different right-most derivations (or, equivalently, two different left-most derivations) for some sentence z. A left-most and a right-most derivation of the same sentence do not by themselves indicate ambiguity. (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

Ambiguous Grammars Classic ambiguities:
– Simultaneous left/right recursion:
    E → E + E
      → i
– Dangling else problem:
    S → if E then S
      → if E then S else S
      → ...
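As a rough illustration of the first ambiguity, the sketch below counts the derivation trees that E → E + E | i admits for a given token list. The function name `count_trees` and the counting approach (try each + as the root of the tree) are assumptions made for this example, not something taken from the slides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees for tokens under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E derives tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try each '+' as the root of the tree: E -> E + E.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "+":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i + i".split()))      # one tree
print(count_trees("i + i + i".split()))  # more than one tree: ambiguous
```

Any sentence with a count above one has two different derivation trees, which is exactly the second definition of ambiguity.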

Operator Precedence and Associativity Let’s build a CFG for expressions consisting of: –elementary identifier i. –+ and - (binary ops) have lowest precedence, and are left associative. –* and / (binary ops) have middle precedence, and are right associative. –+ and - (unary ops) have highest precedence, and are right associative.

Corresponding Grammar for Expressions
E → E + T        E consists of T's,
  → E - T        separated by –’s and +'s
  → T            (lowest precedence).
T → F * T        T consists of F's,
  → F / T        separated by *'s and /'s
  → F            (next precedence).
F → - F          F consists of a single P,
  → + F          preceded by +'s and -'s.
  → P            (next precedence).
P → '(' E ')'    P consists of a parenthesized E,
  → i            or a single i (highest precedence).

Operator Precedence and Associativity Operator precedence: –The lower in the grammar, the higher the precedence. Operator Associativity: –Tie breaker for precedence. –Left recursion in the grammar means left associativity of the operator, left branching in the tree. –Right recursion in the grammar means right associativity of the operator, right branching in the tree.

Building Derivation Trees Sample Input : - + i - i * ( i + i ) / i + i (Human) derivation tree construction: Bottom-up. On each pass, scan entire expression, process operators with highest precedence (parentheses are highest). Lowest precedence operators are last, at the top of tree.

Abstract Syntax Trees AST is a condensed version of the derivation tree. No noise (intermediate nodes). String-to-tree transduction grammar: – rules of the form A → ω => 's'. Build 's' tree node, with one child per tree from each nonterminal in ω.

Example
E → E + T        => +
  → E - T        => -
  → T
T → F * T        => *
  → F / T        => /
  → F
F → - F          => neg
  → + F          => +
  → P
P → '(' E ')'
  → i            => i

Sample Input : - + i - i * ( i + i ) / i + i
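A minimal sketch of how the transduction grammar could build an AST for this input, written as a recursive-descent parser in Python. This is an assumption-laden illustration rather than the lecture's method: the left-recursive E rules are replaced by a loop (a standard substitution, since a recursive-descent routine cannot call itself on left recursion), trees are plain nested tuples, and the names `parse_expression`, `peek`, and `expect` are hypothetical.

```python
def parse_expression(tokens):
    """Return the AST for a token list as nested tuples, e.g. ('+', 'i', 'i')."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_E():
        # E -> E + T | E - T | T, left recursion replaced by a loop
        # so that + and - still branch to the left.
        tree = parse_T()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            tree = (op, tree, parse_T())
        return tree

    def parse_T():
        # T -> F * T | F / T | F, right recursion gives right branching.
        left = parse_F()
        if peek() in ("*", "/"):
            op = peek()
            expect(op)
            return (op, left, parse_T())
        return left

    def parse_F():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            expect("-")
            return ("neg", parse_F())
        if peek() == "+":
            expect("+")
            return ("+", parse_F())
        return parse_P()

    def parse_P():
        # P -> '(' E ')' | i ; parentheses leave no node in the AST.
        if peek() == "(":
            expect("(")
            tree = parse_E()
            expect(")")
            return tree
        expect("i")
        return "i"

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

sample = "- + i - i * ( i + i ) / i + i".split()
print(parse_expression(sample))
```

In the resulting tuple the binary + and - nodes nest to the left and the * and / nodes nest to the right, matching the associativity encoded in the grammar.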

String-to-Tree Transduction We transduce from vocabulary of input symbols, to vocabulary of tree node names. Could eliminate construction of unary + node, anticipating semantics.
F → - F          => neg
  → + F          // no more unary + node
  → P

The Game of Syntactic Dominoes The grammar:
E → E+T    T → P*T    P → (E)
  → T        → P        → i
The playing pieces: An arbitrary supply of each piece (one per grammar rule). The game board: Start domino at the top. Bottom dominoes are the "input."

The Game of Syntactic Dominoes Game rules: –Add game pieces to the board. –Match the flat parts and the symbols. –Lines are infinitely elastic. Object of the game: –Connect start domino with the input dominoes. –Leave no unmatched flat parts.

Parsing Strategies Same as for the game of syntactic dominoes. –“Top-down” parsing: start at the start symbol, work toward the input string. –“Bottom-up” parsing: start at the input string, work towards the goal symbol. In either strategy, the input can be processed left-to-right or right-to-left.

Top-Down Parsing Attempt a left-most derivation, by predicting the re-write that will match the remaining input. Use a string (a stack, really) from which the input can be derived.

Top-Down Parsing Start with S on the stack. At every step, two alternatives: 1) The stack begins with a terminal t. Match t against the first input symbol. 2) The stack begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input. The OPF does the “predicting” in such a predictive parser.

Classical Top-Down Parsing Algorithm
Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ then          // top of stack is a terminal
    if Top(Stack) = Head(input) then
      input := tail(input)
      Pop(Stack)
    else error (Stack, input)
  else
    P := OPF (Stack, input)
    Push (Pop(Stack), RHS(P))
od

Top-Down Parsing Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1). We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input. Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

LL(1) Grammars Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all nonterminals A, and for all pairs of distinct productions A → α, A → β (α ≠ β), Select (A → α) ∩ Select (A → β) = ∅. Previous example: Grammar is not LL(1). More later on why, and what to do about it.

Example (⊥ marks the end of the input; the last production for A has an empty right-hand side):
S → A        {b, ⊥}
A → bAd      {b}
  →          {d, ⊥}
Disjoint! Grammar is LL(1)!

       d          b            ⊥
S                 S → A        S → A
A      A →        A → bAd      A →

(At most) one production per entry.
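The example can be turned into a small table-driven parser. The sketch below is a minimal illustration under stated assumptions: the end-of-input marker is written ⊥, the Select sets are entered by hand rather than computed, and the names `PRODUCTIONS`, `build_table`, and `parse` are hypothetical. Filling the table doubles as the LL(1) check, since two productions landing in the same entry means their Select sets overlap.

```python
END = "⊥"  # assumed end-of-input marker

# Productions with their Select sets, entered by hand from the example.
# An empty right-hand side is the empty production A -> (nothing).
PRODUCTIONS = [
    ("S", ["A"], {"b", END}),
    ("A", ["b", "A", "d"], {"b"}),
    ("A", [], {"d", END}),
]
NONTERMINALS = {"S", "A"}

def build_table(productions):
    """Fill OPF[(A, t)]; a clash means the Select sets overlap (not LL(1))."""
    table = {}
    for lhs, rhs, select in productions:
        for t in select:
            if (lhs, t) in table:
                raise ValueError(f"not LL(1): conflict at ({lhs}, {t})")
            table[(lhs, t)] = rhs
    return table

def parse(tokens, table, start="S"):
    """Classical top-down algorithm; returns True iff tokens are accepted."""
    remaining = list(tokens) + [END]
    stack = [start]              # top of the stack is the end of the list
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = table.get((top, remaining[0]))
            if rhs is None:
                return False     # no production predicted: syntax error
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends on top
        elif top == remaining[0]:
            remaining.pop(0)     # terminal matches the input: consume it
        else:
            return False
    return remaining == [END]

TABLE = build_table(PRODUCTIONS)
for text in ["", "bd", "bbdd", "b", "bdd"]:
    print(repr(text), parse(list(text), TABLE))
```

The table lookup plays the role of OPF (A, t): one dictionary access per nonterminal on top of the stack, with no backtracking.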
