ASTs, Grammars, Parsing, Tree traversals

Slides:



Advertisements
Similar presentations
GRAMMARS & PARSING Lecture 7 CS2110 – Fall
Advertisements

GRAMMARS & PARSING Lecture 7 CS2110 – Fall Java Tips 2  Declare fields and methods public if they are to be visible outside the class; helper methods.
Topic 15 Implementing and Using Stacks
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
Context-Free Grammars Lecture 7
Chapter 2 A Simple Compiler
Class 4: Stacks. cis 335 Fall 2001 Barry Cohen What is a stack? n A stack is an ordered sequence of items, of which only the last (‘top’) item can be.
ADTS, GRAMMARS, PARSING, TREE TRAVERSALS Lecture 12 CS2110 – Spring
Binary Trees. Node structure Data A data field and two pointers, left and right.
Grammars and Parsing. Sentence  Noun Verb Noun Noun  boys Noun  girls Noun  dogs Verb  like Verb  see Grammars Grammar: set of rules for generating.
Chapter Chapter Summary Introduction to Trees Applications of Trees (not currently included in overheads) Tree Traversal Spanning Trees Minimum.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
1 CSC 222: Computer Programming II Spring 2005 Stacks and recursion  stack ADT  push, pop, peek, empty, size  ArrayList-based implementation, java.util.Stack.
(c) University of Washington20d-1 CSC 143 Java Applications of Trees.
Parsing arithmetic expressions Reading material: These notes and an implementation (see course web page). The best way to prepare [to be a programmer]
1 Top Down Parsing. CS 412/413 Spring 2008Introduction to Compilers2 Outline Top-down parsing SLL(1) grammars Transforming a grammar into SLL(1) form.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Binary Trees 2 Overview Trees. Terminology. Traversal of Binary Trees. Expression Trees. Binary Search Trees.
Tree (new ADT) Terminology:  A tree is a collection of elements (nodes)  Each node may have 0 or more successors (called children)  How many does a.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
ADTS, GRAMMARS, PARSING, TREE TRAVERSALS Lecture 12 CS2110 – Spring
Chapter 2. Design of a Simple Compiler J. H. Wang Sep. 21, 2015.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
GRAMMARS & PARSING Lecture 8 CS2110 – Spring If you are going to form a group for A2, please do it before tomorrow (Friday) noon.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
TREES K. Birman’s and G. Bebis’s Slides. Tree Overview 2  Tree: recursive data structure (similar to list)  Each cell may have zero or more successors.
Generics with ArrayList and HashSet 1 ge·ner·ic adjective \jə ̇ˈ nerik, -rēk\ relating or applied to or descriptive of all members of a genus, species,
Wednesday, March 2 Homework #2 is posted Homework #2 is posted Due Friday, March 4 (BOC) Due Friday, March 4 (BOC) Program #5 is posted Program #5 is posted.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
ADTS, GRAMMARS, PARSING, TREE TRAVERSALS Lecture 13 CS2110 – Spring
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Chapter 3 Lists, Stacks, Queues. Abstract Data Types A set of items – Just items, not data types, nothing related to programming code A set of operations.
Comp 411 Principles of Programming Languages Lecture 3 Parsing
CSE 3302 Programming Languages
Stacks Access is allowed only at one point of the structure, normally termed the top of the stack access to the most recently added item only Operations.
Grammars and Parsing.
A Simple Syntax-Directed Translator
Introduction to Parsing
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Textbook:Modern Compiler Design
Trees Lecture 12 CS2110 – Spring 2017.
Even-Even Devise a grammar that generates strings with even number of a’s and even number of b’s.
CO4301 – Advanced Games Development Week 2 Introduction to Parsing
PROGRAMMING LANGUAGES
Context-free Languages
ASTs, Grammars, Parsing, Tree traversals
Compiler Construction
Trees Lecture 12 CS2110 – Fall 2016.
CS 363 Comparative Programming Languages
4 (c) parsing.
Stack Data Structure, Reverse Polish Notation, Homework 7
CSE 3302 Programming Languages
Syntax-Directed Translation
ENERGY 211 / CME 211 Lecture 15 October 22, 2008.
ADTs, Grammars, Parsing, Tree traversals
Lecture 7: Introduction to Parsing (Syntax Analysis)
Syntax-Directed Translation
ASTs, Grammars, Parsing, Tree traversals
Announcements Prelim 1 on Tuesday! A4 will be posted today
CSC 143 Java Applications of Trees.
Trees, part 2 Lecture 13 CS2110 – Spring 2019
The Recursive Descent Algorithm
CO4301 – Advanced Games Development Week 3 Parsing Continued
Trees Lecture 10 CS2110 – Spring 2013.
Parsing CSCI 432 Computer Science Theory
Presentation transcript:

ASTs, Grammars, Parsing, Tree traversals Lecture 13 CS2110 – Spring 2017

Keep an eye out for rescheduling details. Prelim 1 tonight! Goldwin Smith 132 Uris Hall G01 SNOW DAY! Keep an eye out for rescheduling details. 5:30 Last name A, B, C Last name D..Z 7:30 Last name W..Z Last name A..V You should already know which time slot you’re taking the exam. Bring your Cornell ID. You will need it to get into the exam room. No recitation this week. Tuesday recitations are office hours open to all unless otherwise noted on Piazza.

Today: Parse Trees, Parsing, and Grammars Parse trees: text, section 23.36 Definition of Java Language, sometimes useful: docs.oracle.com/javase/specs/jls/se8/html/index.html Grammar for most of Java, for those who are curious: docs.oracle.com/javase/specs/jls/se8/html/jls-19.html Tree traversals –preorder, inorder, postorder: text, sections 23.13 .. 23.15.

Expression trees From last time: we can draw a syntax tree for 2 * 3 – (1 + 2 * 4) - * + 2 3 1 * 2 4 interface Expr { /* evalute this Expr and return the value*/ public abstract int eval(); /* return an infix representation */ public abstract String infix(); }

Tree traversals Other standard kinds of traversals preorder traversal Process root Process left subtree Process right subtree postorder traversal level-order traversal Not recursive: uses a queue (we’ll cover this later) “Walking” over the whole tree is a tree traversal Done often enough that there are standard names Previous example: in-order traversal Process left subtree Process root Process right subtree Note: Can do other processing besides printing

tree for (2 + 3) * (1 + 4) * + + 2 3 1 4 prefix and postfix notation proposed by Jan Lukasiewicz in 1951 Preorder traversal: 1. Visit the root 2. Visit left subtree, in preorder 3. Visit right subtree, in preorder + 2 3 + 1 4 *

tree for (2 + 3) * (1 + 4) Postfix is easy to compute. Process elements left to right. Number? Push it on a stack Binary operator? Remove two top stack elements, apply operator to it, push result on stack Unary operator? Remove top stack element, apply operator to it, push result on stack * + + 2 3 1 4 Postfix notation 2 3 + 1 4 + *

tree for (2 + 3) * (1 + 4) * + + 2 3 1 4 Postorder traversal: 1. Visit left subtree, in postorder 2. Visit right subtree, in postorder 3. Visit the root In about 1974, Gries paid $300 for an HP calculator, which had some memory and used postfix notation! Still works. Come up to see it. 2 3 + 1 4 + * Postfix notation (also called Reverse Polish Notation)

tree for (2 + 3) * (1 + 4) * + + 2 3 1 4 Inorder traversal: 1. Visit left subtree, in inorder 2. Visit the root 3. Visit right subtree, in inorder To help out, put parens around expressions with operators (2 + 3) (1 + 4) *

Prefix and Postfix Notation Not as strange as it looks! add(a, b) is prefix notation for the binary add operator! (in some languages, this is simply written add a b) n! is a postfix application of the factorial operator! No parentheses needed! Infix Prefix Postfix (5 + 3) * 4 5 + (3 * 4) 1+2+3*4-7 * + 5 3 4 + 5 * 3 4 + 1 + 2 - * 3 4 7 5 3 + 4 * 5 3 4 * + 1 2 + 3 4 * + 7 -

Expression trees: in code public interface Expr { String infix(); // returns an infix representation int eval(); // returns the value of the expression } public class Int implements Expr { private int v; public int eval() { return v; } public String infix() { return “ “ + v + “ “; } } public class Sum implements Expr { private Expr left, right; public int eval() { return left.eval() + right.eval(); } public String infix() { return "(" + left.infix() + “+” + right.infix() + ")";

Motivation for grammars The cat ate the rat. The cat ate the rat slowly. The small cat ate the big rat slowly. The small cat ate the big rat on the mat slowly. The small cat that sat in the hat ate the big rat on the mat slowly, then got sick. … Not all sequences of words are legal sentences The ate cat rat the How many legal sentences are there? How many legal Java programs? How do we know what programs are legal? http://docs.oracle.com/javase/specs/jls/se8/html/index.html

A Grammar Sentence  Noun Verb Noun Noun  goats Noun  astrophysics Noun  bunnies Verb  like | see White space between words does not matter A very boring grammar because the set of Sentences is finite (exactly 18 sentences) Our sample grammar has these rules: A Sentence can be a Noun followed by a Verb followed by a Noun A Noun can be goats or astrophysics or bunnies A Verb can be like or see

A Grammar Sentence  Noun Verb Noun Noun  goats Noun  astrophysics Noun  bunnies Verb  like Verb  see Grammar: set of rules for generating sentences of a language. Examples of Sentence: goats see bunnies bunnies like astrophysics The words goats, astrophysics, bunnies, like, see are called tokens or terminals The words Sentence, Noun, Verb are called nonterminals

A recursive grammar Sentence  Sentence and Sentence Sentence  Sentence or Sentence Sentence  Noun Verb Noun Noun  goats Noun  astrophysics Noun  bunnies Verb  like | see This grammar is more interesting than previous one because the set of Sentences is infinite What makes this set infinite? Answer: Recursive definition of Sentence

Aside What if we want to add a period at the end of every sentence? Sentence  Sentence and Sentence . Sentence  Sentence or Sentence . Sentence  Noun Verb Noun . Noun  … Does this work? No! This produces sentences like: goats like bunnies. and bunnies like astrophysics. . Sentence Sentence Sentence

Sentences with periods PunctuatedSentence  Sentence . Sentence  Sentence and Sentence Sentence  Sentence or Sentence Sentence  Noun Verb Noun Noun  goats Noun  astrophysics Noun  bunnies Verb  like Verb  see New rule adds a period only at end of sentence. Tokens are the 7 words plus the period (.) Grammar is ambiguous: goats like bunnies and bunnies like goats or bunnies like astrophysics

Grammars for programming languages Grammar describes every possible legal expression You could use the grammar for Java to list every possible Java program. (It would take forever.) Grammar tells the Java compiler how to “parse” a Java program and defines what is syntactically legal (the compiler accepts) docs.oracle.com/javase/specs/jls/se8/html/jls-2.html#jls-2.3 docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

Grammar for simple expressions (not the best) E  integer E  ( E + E ) Simple expressions: An E can be an integer. An E can be ‘(’ followed by an E followed by ‘+’ followed by an E followed by ‘)’ Set of expressions defined by this grammar is a recursively-defined set Is language finite or infinite? Do recursive grammars always yield infinite languages? Some legal expressions: 2 (3 + 34) ((4+23) + 89) Some illegal expressions: (3 3 + 4 Tokens of this grammar: ( + ) and any integer

Parsing E  integer E  ( E + E ) Use a grammar in two ways: A grammar defines a language (i.e. the set of properly structured sentences) A grammar can be used to parse a sentence (thus, checking if a string is a sentence is in the language) To parse a sentence is to build a parse tree: much like diagramming a sentence Example: Show that ((4+23) + 89) is a valid expression E by building a parse tree E ( ) + 89 4 23

Ambiguity E + 1 2 Grammar is ambiguous if it allows two parse trees for a sentence. The grammar below, using no parentheses, is ambiguous. The two parse trees to right show this. We don’t know which + to evaluate first in the expression 1 + 2 + 3 3 21 E + 1 2 E  integer E  E + E 3 E  integer E  E + E

Recursive descent parsing Write a set of mutually recursive methods to check if a sentence is in the language (show how to generate parse tree later). One method for each nonterminal of the grammar. The method is completely determined by the rules for that nonterminal. On the next pages, we give a high-level version of the method for nonterminal E: E  integer E  ( E + E )

Parsing an E E  integer E  ( E + E ) /** Unprocessed input starts an E. Recognize that E, throwing away each piece from the input as it is recognized. Return false if error is detected and true if no errors. Upon return, processed tokens have been removed from input. */ public boolean parseE() ( 2 + ( 4 + 8 ) ) + 9 ) before call: already processed unprocessed ( 2 + ( 4 + 8 ) ) + 9 ) after call: already processed unprocessed (call returns true)

Expression trees: Class Hierarchy (interface) Expr Negation Int BinaryExpression (abstract) Sum Product Quotient public interface Expr { String infix(); // returns an infix representation int eval(); // returns the value of the expression // could easily also include prefix, postfix }

Specification: /** Unprocessed input starts an E. …*/ E  integer E  ( E + E ) public boolean parseE() { if (first token is an integer) remove it from input and return true; if (first token is not ‘(‘ ) return false else remove it from input; if (!parseE()) return false; if (first token is not ‘+‘ ) return false else remove it from input; if (first token is not ‘)‘ ) return false else remove it from input; return true; }

Illustration of parsing to check syntax E  integer E  ( E + E ) E E E ( 1 + ( 2 + 4 ) )

The scanner constructs tokens An object scanner of class Scanner is in charge of the input String. It constructs the tokens from the String as necessary. e.g. from the string “1464+634” build the token “1464”, the token “+”, and the token “634”. It is ready to work with the part of the input string that has not yet been processed and has thrown away the part that is already processed, in left-to-right fashion. already processed unprocessed ( 2 + ( 4 + 8 ) + 9 )

Change parser to generate a tree /** … Return an Expr for an E , or null if the string is illegal */ public Expr parseE() { if (next token is integer) { int val= the value of the token; remove the token from the input; return new Int(val); } if (next token is ‘(‘) remove it; else return null; Expr e1 = parseE(); if (next token is ‘+’) remove it; else return null; Expr e2 = parseE(); if (next token is ‘)’) remove it; else return null; return new Sum(e1, e2); E  integer E  ( E + E )

Grammar that gives precedence to * over + E -> T { + T } T -> F { * F } F -> integer F -> ( E ) Notation: { xxx } means 0 or more occurrences of xxx. E: Expression T: Term F: Factor E E T T T T F F F F F F + 3 * 4 says do * first + 3 * 4 Try to do + first, can’t complete tree

Does recursive descent always work? Some grammars cannot be used for recursive descent Trivial example (causes infinite recursion): S  b S  Sa Can rewrite grammar S  bA A  a A  aA For some constructs, recur- sive descent is hard to use Other parsing techniques exist – take the compiler writing course