Trees That Represent Grammars


Trees That Represent Grammars (95.3002)

Goals for This Section

To understand how to represent grammars as trees.
To understand how to build grammar objects, with production objects that have FSMs for their right parts, from the trees.
This can be done by hand or automated. To automate it, we have to write a program, referred to as a "tree walker".
We developed a code generator in the first lecture. We are now about to develop another one that creates grammar objects instead of generating virtual machine code.

Grammars in Tree Form

We want to understand how to represent a grammar as a tree. Why do we want to do this? Consider the grammar

grammar G {EndOfFile} -> A.
A -> (a | b)? c*.

and its tree (yellow for tree nodes, orange for tokens):

grammar
	production
		leftPartWithLookahead: G EndOfFile
		rightPart: A
	production
		leftPart: A
		rightPart:
			concatenation
				zeroOrOne
					alternation: a b
				zeroOrMore: c

The point is that the tree is equivalent to the text form of the grammar.

Recall What Compilers Do
A compiler takes a program written in text form, in some programming language; this is the concrete syntax. It produces the equivalent version in tree form: the abstract syntax tree. The tree is much, much easier to manipulate than the text form.

Examples

We Are the Compiler Writers
We design the trees, so we could just change some of the labels. For example, keep "grammar", "leftPartWithLookahead", "rightPart", and "leftPart", but relabel each "production" as "<-", "concatenation" as "." (dot), "zeroOrOne" as "?", "zeroOrMore" as "*", and "alternation" as "|". The leaves (G, EndOfFile, A, a, b, c) stay the same.

We Are the Compiler Writers
We can also change the structure of the trees. Do we need the leftPart and rightPart nodes at all? We could hang G, EndOfFile, and the right-part expression directly off each "<-" node. But we do have to settle on some design. I have changed the design about a dozen times so far.

There Are 2 Ways to Construct the Trees
Do it by hand. It's doable for small examples. But surely we don't need this just on paper: we actually need to construct tree objects (we used trees in the first assignment).
Write a program to do it. It can also handle large examples.

Constructing Trees by Hand
To aid the process, use LISP lists. The tree for our example grammar becomes

(grammar
	(production (leftPartWithLookahead G EndOfFile) (rightPart A))
	(production (leftPart A)
		(rightPart (concatenation (zeroOrOne (alternation a b)) (zeroOrMore c)))))

Then Write Some Code To Convert
You may not recall, but trees in the assignments had leaves that were tokens. A Tree class method:

fromArray: anObject
	"This assumes the array encodes the tree in LISP notation; e.g., (root child1 child2 child3 ...)."
	(anObject isKindOf: Array) ifTrue: [
		^Tree new
			label: anObject first;
			children: (anObject rest collect: [:object | self fromArray: object])].
	(anObject isKindOf: Number) ifTrue: [^Token new label: #Integer; symbol: anObject printString].
	(anObject isKindOf: Character) ifTrue: [^Token new label: #Character; symbol: (String with: anObject)].
	(anObject isKindOf: Symbol) ifTrue: [^Token new label: #Identifier; symbol: anObject]. "or Symbol"
	(anObject isKindOf: String) ifTrue: [^Token new label: #String; symbol: anObject].
	"We can't distinguish a symbol from an identifier because all identifiers become symbols in array literals."
	^self halt: 'Tree fromArray: did not expect the object ', anObject printString

A complete spec of the design is in the next assignment for those of you who wish to build your trees by hand.
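
For instance, a hand-built tree for the example grammar could be created from a Smalltalk literal array: the nested parentheses become nested arrays and the bare identifiers become symbols, which is exactly what fromArray: expects. This is a sketch that assumes the Tree class from the assignment.

| tree |
tree := Tree fromArray:
	#(grammar
		(production (leftPartWithLookahead G EndOfFile) (rightPart A))
		(production (leftPart A)
			(rightPart (concatenation (zeroOrOne (alternation a b)) (zeroOrMore c))))).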

Trees Constructed By A Program

Having a Program Construct Trees
You already have one; it's the classes associated with the scanner/parser. What you are missing are the scanner and parser tables. I'll give you the tables. The tables were built from a description of the "grammar language".
Wilf: Open a browser on grammarTables.txt

Summary of What We Saw
The grammar for "grammars" describes what the language looks like and how to build trees for text written in that language. It is in fact a spec for what the trees look like too.

What's Different With the Notes
After iterating the descriptions, we decided that it might be better to create trees that are easier for a program to work with. We renamed all the tree and token labels with alphabetic names prefixed by "walk":
Instead of a tree labelled "grammar", it's actually "walkGrammar".
Instead of a token labelled "identifier", it's actually "walkIdentifier".
This way, we don't need a mapping from the actual labels to preferred ones.
WILF TODO: Need to make sure I finished doing this.

The Tree Walker

In a grammar builder class, we need an instance method

walkTree: aTree ….

along with a host of methods for walking specific kinds of trees:

walkIdentifier: aTree ….
walkStar: aTree ….
walkPlus: aTree ….

The Generic "walkTree:" Method
It's a simple switch; i.e., it switches to the appropriate method to execute, just like a switch statement but done in an object-oriented style.

walkTree: aTree
	^self perform: aTree label with: aTree

Each method is free to use recursion to walk the children. All it has to worry about is what to expect back and what to return from the method.
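
As a rough usage sketch (the class name GrammarBuilder is hypothetical, and parseTree is assumed to already be bound to the root Tree the parser produced for some grammar text), the whole walk starts with a single send; perform:with: then routes each subtree to its walkXxx: method by label, so a tree labelled #walkStar: ends up in walkStar:, and an unimplemented label halts with a "message not understood" error.

| builder parseTree grammarObject |
"parseTree is assumed to hold the root Tree produced by the parser."
builder := GrammarBuilder new. "hypothetical name for the grammar builder class"
grammarObject := builder walkTree: parseTree.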

Example 1: Star
How it's described in parserGrammar.txt:

RepetitionOption -> Expression "*" => walkStar:

The Expression could be anything. (The original version used to have '*' as a child here.)

walkStar: aTree
	| fsm |
	fsm := self walkTree: (aTree child: 1).
	^fsm star

I know this kind of tree has 1 child. What I would like to get back is an FSM. Then I need to build a star of that FSM.
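
The other repetition operators follow exactly the same shape as walkStar:. A sketch for plus, for instance (the selector walkPlus: matches the method list earlier, but the FSM operation plus is an assumption by analogy with star; check your FSM class for the real name):

walkPlus: aTree
	| fsm |
	fsm := self walkTree: (aTree child: 1).
	^fsm plus "assumed FSM operation, by analogy with star"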

Example 2: Minus
How it's described in parserGrammar.txt:

RepetitionOption -> Expression "-" Expression => walkMinus:

Each Expression could be anything. (The original version used to have '-' as a child here.)

walkMinus: aTree
	| fsm1 fsm2 |
	fsm1 := self walkTree: (aTree child: 1).
	fsm2 := self walkTree: (aTree child: 2).
	^fsm1 minus: fsm2

I know this kind of tree has 2 children. What I would like to get back from each is an FSM. Then I need to subtract the two FSMs.

Example 3: Or
How it's described in parserGrammar.txt:

Alternation
	-> Concatenation
	-> Concatenation ("|" Concatenation)+ => walkAlternation:

Each Concatenation could be anything. (The original version used to have '|' as a child here.)

walkOr: aTree
	| fsms |
	fsms := aTree children collect: [:child | self walkTree: child].
	^FiniteStateMachine or: fsms

I know this kind of tree has an arbitrary number of children. What I would like to get back are independent FSMs. Then I need to or ALL the FSMs together. The or: method had better work for 0 FSMs too…
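
Concatenation can be handled with the same collect-then-combine pattern as walkOr:. This is only a sketch: the tree label walkConcatenation: and the class-side constructor concatenateAll: are assumptions modelled on walkOr: and or:, and may be named differently in the actual tables and FSM class.

walkConcatenation: aTree
	| fsms |
	fsms := aTree children collect: [:child | self walkTree: child].
	^FiniteStateMachine concatenateAll: fsms "assumed constructor analogous to or:"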

Example 4: Production
Look at the production trees again. If the right-part subtrees return FSMs, then I shouldn't blindly use recursion on the left-part subtrees too: a left part should give me a nonterminal symbol, not an FSM. But I could require that "leftPartWithLookahead" and "leftPart" return a pair instead. WAIT A MINUTE: EndOfFile is a simple case. I've seen situations where the lookahead portion looked like "All - alphabetics", where each part was a macro, so the lookahead itself still needs to be walked as an FSM. OK. I'm ready to implement.

Example 4: Production (continued)
How it's described in parserGrammar.txt: too long to show here, but the node label is walkProduction: instead of production.

walkProduction: aTree
	| leftPartArray rightPartFSM |
	leftPartArray := self walkTree: (aTree child: 1).
	rightPartFSM := self walkTree: (aTree child: 2).
	^Production new
		leftPart: leftPartArray first;
		lookahead: leftPartArray second;
		rightPart: rightPartFSM

Child 1 is expected to return a leftPart/lookahead pair; child 2 an FSM. Should the production be stored in the grammar being built now, or when walkGrammar: is done?

walkMacro: aTree
	"Left for you to do."

Should you share common code with the above? Presumably the result should get stored as a macro…
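
One possible shape for walkMacro: is sketched below; it assumes the macro tree has the macro name token as child 1 and its defining expression as child 2, and that the builder keeps a macros Dictionary, all of which you should verify against parserGrammar.txt and your own design.

walkMacro: aTree
	| name fsm |
	name := (aTree child: 1) symbol.
	fsm := self walkTree: (aTree child: 2).
	macros at: name put: fsm. "macros: an assumed Dictionary instance variable of the builder"
	^fsm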

Example 4: Production (continued)

walkRightPart: aTree
	^self walkTree: (aTree child: 1)

Use recursion and expect an FSM back.

walkLeftPart: aTree
	| token |
	token := aTree child: 1.
	^Array with: token symbol with: nil

walkLeftPartWithLookahead: aTree
	| token fsm |
	token := aTree child: 1.
	fsm := self walkTree: (aTree child: 2). "use recursion on the lookahead expression; expect an FSM back"
	^Array with: token symbol with: fsm allInitialStateTransitionSymbol

Both left-part methods answer the expected leftPart/lookahead pair; when there is no lookahead, the second element is nil.
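
If productions are not stored as they are built, walkGrammar: is the natural place to collect them. The sketch below assumes a Grammar class with an addProduction: method and that every child of the grammar node walks to a Production; macros and defaults, if they appear as children too, would need their own cases.

walkGrammar: aTree
	| newGrammar |
	newGrammar := Grammar new. "assumed grammar class"
	aTree children do: [:child | newGrammar addProduction: (self walkTree: child)].
	^newGrammar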

Example 5
Recall the FSM constructors:

fromParserTransitionLabel: aTransitionName
fromScannerCollection: aCollection attributes: attributes

walkString: aTree
	"The tree must be a token."
	| string name |
	string := aTree symbol.
	^self isParser
		ifTrue: [
			name := TransitionName new symbol: string; attributes: self parserDefaults.
			self class fromParserTransitionLabel: name]
		ifFalse: [self class fromScannerCollection: string attributes: self scannerDefaults]

Knowledge about whether we are processing for a scanner or a parser has to be available to the object doing the processing (the sponsor/user): for us, the grammar builder.

Example 6
How is walkIdentifier: different?

walkIdentifier: aTree
	"Code not shown."

What if the identifier is a macro? The grammar builder should know; it's been storing them. If it is a macro, the builder should already have an FSM for it, and that FSM could be complicated. You know that most of your FSM operations are destructive. So if you take a "star" of it in one place and it's part of a "minus" somewhere else, the original gets MUCKED UP. How do you avoid that? You make a copy.
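
A sketch of what walkIdentifier: could look like under those constraints. It assumes the macros Dictionary mentioned above, that a plain (non-macro) identifier can be treated like a transition symbol via walkString:, and that copy is deep enough for your FSMs (in practice you may need a dedicated copying operation).

walkIdentifier: aTree
	| name |
	name := aTree symbol.
	^(macros includesKey: name)
		ifTrue: [(macros at: name) copy] "copy so destructive FSM operations don't muck up the stored macro"
		ifFalse: [self walkString: aTree]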

Additional FSM Operations
Some mechanism is needed to change the attributes associated with an FSM's transitions. Attribute keywords appear in brackets after an expression in the grammar; for example:

IdentificationCode -> Expression [look node noStack].
Expression -> … a whole bunch of things … -> {identifier}.

The noStack keyword is a token in which the label is #noStack and the symbol is also #noStack. Braces mean "look".

walkAttributes: aTree
	| expression attributeTokens attributeSymbols fsm |
	expression := aTree child: 1.
	attributeTokens := aTree children rest. "as in Lisp's first and rest"
	attributeSymbols := attributeTokens collect: [:token | token symbol].
	fsm := self walkTree: expression.
	^fsm override: attributeSymbols

walkLook: aTree
	^(self walkTree: (aTree child: 1)) override: #(#look)

A new FSM operation "override" is needed.
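
On the FSM side, override: might look roughly like this. It is only a sketch: it assumes the FSM can enumerate its transitions with transitionsDo: and that each transition has an attributes: setter, and it reads "override" as "replace the transition's attributes", none of which is confirmed by these notes.

override: attributeSymbols
	"Replace the attributes on every transition of the receiver."
	self transitionsDo: [:transition | transition attributes: attributeSymbols asSet].
	^self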

A Word About How to Work in Smalltalk
Implement just "walkTree:" and then run with the real data. The first label you encounter will result in a halt with a message such as "walkOr: not understood". Just implement that method and run again. Even better, implement the missing method, browse down the call stack, and restart. By the time you execute without a halt, you will have implemented everything you need for that example grammar. Then try again on another grammar.