Trees That Represent Grammars

Goals for This Section

To understand how to represent grammars as trees. This can be done by hand or automated.
To understand how to build grammar objects, with production objects that have FSMs for the right parts, from the trees. We have to write a program to do this. The program is referred to as a “tree walker”.
We developed a code generator in the first lecture. We are now about to develop another one that creates grammar objects instead of generating virtual machine code.

Grammars in Tree Form

We want to understand how to represent a grammar as a tree. Why do we want to do this?

The grammar in text form:

    G {EndOfFile} -> A.
    A -> (a | b)? c*.

The same grammar in tree form (in the original figure, trees are yellow and tokens are orange):

    grammar
        production
            leftPartWithLookahead
                G
                EndOfFile
            rightPart
                A
        production
            leftPart
                A
            rightPart
                concatenation
                    zeroOrOne
                        alternation
                            a
                            b
                    zeroOrMore
                        c

The point of the tree is to be equivalent to the text.

Recall What Compilers Do?
A program written in text form is text in some programming language. This is in concrete syntax.
The equivalent version in tree form is a tree equivalent to the text. This is an abstract syntax tree. It’s much, much easier to manipulate than the text form.

Examples

We are the Compiler Writers
We design the trees. We could just change some labels: production becomes <-, concatenation becomes . (dot), zeroOrOne becomes ?, zeroOrMore becomes *, and alternation becomes |.

    grammar
        <-
            leftPartWithLookahead
                G
                EndOfFile
            rightPart
                A
        <-
            leftPart
                A
            rightPart
                .
                    ?
                        |
                            a
                            b
                    *
                        c

We are the Compiler Writers
We can also change the structure of the trees. Do we need left and right parts? Without them, the tree becomes:

    grammar
        <-
            G
            EndOfFile
            A
        <-
            A
            .
                ?
                    |
                        a
                        b
                *
                    c

But we do have to settle on some design. I have changed the design about a dozen times so far.

There are 2 Ways to Construct The Trees
Do it by hand. It’s doable for small examples. But surely, we don’t need this just on paper. We actually need to construct tree objects (we used trees in the first assignment).
Write a program to do it. It can also handle large examples.

Constructing Trees by Hand
To aid the process, use LISP lists. The tree for the example grammar becomes:

    (grammar
        (production
            (leftPartWithLookahead G EndOfFile)
            (rightPart A))
        (production
            (leftPart A)
            (rightPart
                (concatenation
                    (zeroOrOne (alternation a b))
                    (zeroOrMore c)))))

Then Write Some Code To Convert
You may not recall, but trees in the assignments had leaves that were tokens. A Tree class method:

    fromArray: anObject
        "This assumes the array encodes the tree in LISP notation;
        e.g. (root child1 child2 child3 ...)."
        (anObject isKindOf: Array) ifTrue: [
            ^Tree new
                label: anObject first;
                children: (anObject rest collect: [:object | self fromArray: object])].
        (anObject isKindOf: Number) ifTrue: [
            ^Token new label: #Integer; symbol: anObject printString].
        (anObject isKindOf: Character) ifTrue: [
            ^Token new label: #Character; symbol: (String with: anObject)].
        (anObject isKindOf: Symbol) ifTrue: [
            ^Token new label: #Identifier; symbol: anObject]. "or Symbol"
        (anObject isKindOf: String) ifTrue: [
            ^Token new label: #String; symbol: anObject].
        "We can't distinguish a symbol from an identifier because all
        identifiers become symbols in array literals."
        ^self halt: 'Tree fromArray: did not expect the object ', anObject printString

A complete spec of the design is in the next assignment for those of you who wish to build your trees by hand.
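For readers who want to experiment outside Smalltalk, the same conversion can be sketched in Python. This is only an analogue under stated assumptions: `Tree`, `Token` and `from_list` are names invented for this sketch, nested Python lists stand in for LISP lists, and because Python has no separate Symbol or Character types, only integers and strings are handled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    label: str   # kind of token, e.g. "Identifier"
    symbol: str  # the token's text


@dataclass
class Tree:
    label: str
    children: List[Union["Tree", Token]] = field(default_factory=list)


def from_list(obj):
    """Convert a nested list (root child1 child2 ...) into Tree and Token objects."""
    if isinstance(obj, list):
        # First element is the root label; the rest are converted recursively.
        return Tree(label=obj[0], children=[from_list(child) for child in obj[1:]])
    if isinstance(obj, int):
        return Token("Integer", str(obj))
    if isinstance(obj, str):
        return Token("Identifier", obj)
    raise TypeError(f"from_list did not expect the object {obj!r}")


# The example grammar from the slides, written as nested lists.
grammar = from_list(
    ["grammar",
     ["production",
      ["leftPartWithLookahead", "G", "EndOfFile"],
      ["rightPart", "A"]],
     ["production",
      ["leftPart", "A"],
      ["rightPart",
       ["concatenation",
        ["zeroOrOne", ["alternation", "a", "b"]],
        ["zeroOrMore", "c"]]]]])
```

The leaves come back as tokens and every inner list comes back as a tree, which mirrors the shape the Smalltalk method produces.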

Trees Constructed By A Program
Having a Program Construct Trees
You already have one; it’s the classes associated with scanner/parser. What you are missing are the scanner and parser tables. I’ll give you the tables. The tables were built from a description of the “grammar language”. Wilf: Open a browser on grammarTables.txt

Summary of What We Saw
The grammar for “grammars” describes:
What the language looks like.
How to build trees for text written in that language.
It is in fact a spec for what the trees look like too.

What’s Different With The Notes
After iterating the descriptions, we decided that it might be better to create trees that are easier for a program to work with. We renamed all the tree and token labels with alphabetic names prefixed by “walk”. Instead of a tree labelled “grammar”, it’s actually “walkGrammar”. Instead of a token labelled “identifier”, it’s actually “walkIdentifier”. This way, we don’t need a mapping from the actual labels to preferred ones. WILF TODO: Need to make sure I finished doing this.

The Tree Walker

In a grammar builder class, we need an instance method

    walkTree: aTree ….

along with a host of methods for walking specific trees:

    walkIdentifier: aTree ….
    walkStar: aTree ….
    walkPlus: aTree ….

The Generic “walkTree” Method
It’s a simple switch; i.e., it switches to the appropriate method to execute, JUST like a switch statement but done in an object-oriented style.

    walkTree: aTree
        ^self perform: aTree label with: aTree

Each method is free to use recursion to walk the children. All it has to worry about is what to expect back and what to return in the method.
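The same label-driven dispatch can be sketched in Python, where `getattr` plays the role of `perform:with:`. The labels `walkLeaf` and `walkGroup` and the leaf-counting behaviour are hypothetical, chosen only to keep the sketch small.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str  # doubles as the name of the method to call
    children: List["Node"] = field(default_factory=list)


class Walker:
    def walk_tree(self, tree):
        # Look the method up by name at run time; this is the whole "switch".
        return getattr(self, tree.label)(tree)

    def walkLeaf(self, tree):
        return 1

    def walkGroup(self, tree):
        # Recursion goes back through walk_tree so each child picks its own method.
        return sum(self.walk_tree(child) for child in tree.children)


tree = Node("walkGroup", [Node("walkLeaf"),
                          Node("walkGroup", [Node("walkLeaf"), Node("walkLeaf")])])
leaf_count = Walker().walk_tree(tree)
```

A label with no matching method raises `AttributeError`, the Python counterpart of a “not understood” halt.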

Example 1: Star
How it’s described in parserGrammar.txt:

    RepetitionOption -> Expression "*" => walkStar:

The resulting tree is labelled walkStar: and has one child, which could be anything. (The original version used to have '*' as the label.)

    walkStar: aTree
        | fsm |
        fsm := self walkTree: (aTree child: 1).
        ^fsm star

I know this kind of tree has 1 child. What I would like to have when it gets back is an FSM. But I need to build a star of that FSM.
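To try the star case without the course's FiniteStateMachine class, here is a Python sketch in which a regular expression stands in for the FSM. That substitution is an assumption made purely for illustration, as are the `Fsm` class, its `accepts` method, and the `walkLiteral` label.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    symbol: str = ""  # only meaningful for leaf tokens


class Fsm:
    """Stand-in for a finite state machine: wraps a regular expression."""

    def __init__(self, pattern):
        self.pattern = pattern

    def star(self):
        # Returns a new object; the course's FSM operations are destructive instead.
        return Fsm(f"(?:{self.pattern})*")

    def accepts(self, text):
        return re.fullmatch(self.pattern, text) is not None


class Builder:
    def walk_tree(self, tree):
        return getattr(self, tree.label)(tree)

    def walkLiteral(self, tree):
        # Hypothetical leaf case: an FSM accepting exactly the token's text.
        return Fsm(re.escape(tree.symbol))

    def walkStar(self, tree):
        fsm = self.walk_tree(tree.children[0])  # exactly one child; expect an FSM back
        return fsm.star()


c_star = Builder().walk_tree(Node("walkStar", [Node("walkLiteral", symbol="c")]))
```

Here `c_star` plays the role of the FSM for `c*` in the example grammar.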

Example 2: Minus
How it’s described in parserGrammar.txt:

    RepetitionOption -> Expression "-" Expression => walkMinus:

The resulting tree is labelled walkMinus: and has two children, each of which could be anything. (The original version used to have '-' as the label.)

    walkMinus: aTree
        | fsm1 fsm2 |
        fsm1 := self walkTree: (aTree child: 1).
        fsm2 := self walkTree: (aTree child: 2).
        ^fsm1 minus: fsm2

I know this kind of tree has 2 children. What I would like to have when they get back are FSMs. But I need to subtract the two FSMs.

Example 3: Or
How it’s described in parserGrammar.txt:

    Alternation
        -> Concatenation
        -> Concatenation ("|" Concatenation)+ => walkAlternation: .

The resulting tree has an arbitrary number of children, each of which could be anything. (The original version used to have '|' as the label.)

    walkOr: aTree
        | fsms |
        fsms := aTree children collect: [:child | self walkTree: child].
        ^FiniteStateMachine or: fsms

I know this kind of tree has an arbitrary number of children. What I would like to have when they get back are independent FSMs. But I need to or ALL the FSMs. The or: method had better work for 0 FSMs too…
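The remark about zero FSMs is easy to overlook, so here is a Python sketch of an n-ary “or” that handles the empty case. As an assumption for illustration, a regular expression stands in for the FSM, and `Fsm`, `any_of` and `accepts` are invented names. The empty case returns a machine that accepts nothing, which is the identity element for alternation.

```python
import re


class Fsm:
    """Stand-in for a finite state machine: wraps a regular expression."""

    def __init__(self, pattern):
        self.pattern = pattern

    @classmethod
    def any_of(cls, fsms):
        fsms = list(fsms)
        if not fsms:
            # "(?!)" is a lookahead that always fails, so nothing is accepted.
            return cls(r"(?!)")
        return cls("|".join(f"(?:{fsm.pattern})" for fsm in fsms))

    def accepts(self, text):
        return re.fullmatch(self.pattern, text) is not None


nothing = Fsm.any_of([])
a_or_b = Fsm.any_of([Fsm("a"), Fsm("b")])
```

Without the special case, joining zero patterns would give the empty pattern, which accepts the empty string rather than accepting nothing.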

Example 4: production
The tree for the example grammar, in LISP notation:

    (grammar
        (production (leftPartWithLookahead G EndOfFile) (rightPart A))
        (production (leftPart A) (rightPart (. (? (| a b)) (* c)))))

If walking the nodes under a production returns FSMs, then I shouldn’t use recursion to get the left-part name (G or A), which is needed as a plain symbol. But I could require that walking “leftPartWithLookahead” and “leftPart” returns a pair instead: the left part and its lookahead.

WAIT A MINUTE: EndOfFile IS A SIMPLE CASE. I’ve seen situations where that portion looked like “All - alphabetics” where each part was a macro. OK. I’m ready to implement.

Example 4: production (walkProduction:)
How it’s described in parserGrammar.txt: too long to show, but the node is walkProduction: instead of production.

    walkProduction: aTree
        | leftPartArray rightPartFSM |
        leftPartArray := self walkTree: (aTree child: 1). "expected leftPart/lookahead pair"
        rightPartFSM := self walkTree: (aTree child: 2).
        ^Production new
            leftPart: leftPartArray first;
            lookahead: leftPartArray second;
            rightPart: rightPartFSM

Should it be stored in the grammar builder now (or when walkGrammar: is done)?

    walkMacro: aTree
        "Left for you to do"

Should you share common code with the above? Presumably it should get stored as a macro…

Example 4: production (walkRightPart:, walkLeftPart:, walkLeftPartWithLookahead:)

    walkRightPart: aTree
        ^self walkTree: (aTree child: 1)

    walkLeftPart: aTree
        | token |
        token := aTree child: 1.
        ^Array with: token symbol with: nil

    walkLeftPartWithLookahead: aTree
        | token fsm |
        token := aTree child: 1.
        fsm := self walkTree: (aTree child: 2). "use recursion, expect an FSM"
        ^Array with: token symbol with: fsm allInitialStateTransitionSymbol
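The pair convention used by the production methods can be sketched in Python as follows. Everything here is an assumption for illustration: the `Node`, `Production` and `Builder` classes, the tuple `("fsm", symbol)` that stands in for a real FSM, and the simplification that the lookahead is stored as that stand-in rather than as its initial transition symbols.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    symbol: str = ""  # only meaningful for leaf tokens


@dataclass
class Production:
    left_part: str
    lookahead: Optional[object]
    right_part: object


class Builder:
    def walk_tree(self, tree):
        return getattr(self, tree.label)(tree)

    def walkIdentifier(self, tree):
        # Stand-in: a real builder would return an FSM for this symbol.
        return ("fsm", tree.symbol)

    def walkLeftPart(self, tree):
        # No recursion: we want the token's symbol itself, not an FSM.
        return (tree.children[0].symbol, None)

    def walkLeftPartWithLookahead(self, tree):
        name = tree.children[0].symbol
        lookahead = self.walk_tree(tree.children[1])  # recursion: expect an FSM
        return (name, lookahead)

    def walkRightPart(self, tree):
        return self.walk_tree(tree.children[0])

    def walkProduction(self, tree):
        left_part, lookahead = self.walk_tree(tree.children[0])  # expect a pair
        right_part = self.walk_tree(tree.children[1])            # expect an FSM
        return Production(left_part, lookahead, right_part)


# G {EndOfFile} -> A.
tree = Node("walkProduction", [
    Node("walkLeftPartWithLookahead", [Node("walkIdentifier", symbol="G"),
                                       Node("walkIdentifier", symbol="EndOfFile")]),
    Node("walkRightPart", [Node("walkIdentifier", symbol="A")]),
])
production = Builder().walk_tree(tree)
```

The only method that needs to know about the pair is the one that unpacks it, so the two left-part variants can differ freely in how they compute the lookahead.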

Example 5: walkString:
Recall:

    fromParserTransitionLabel: aTransitionName
    fromScannerCollection: aCollection attributes: attributes

    walkString: aTree
        | string name |
        "the tree must be a token"
        string := aTree symbol.
        ^self isParser
            ifTrue: [
                name := TransitionName new symbol: string; attributes: self parserDefaults.
                ^self class fromParserTransitionLabel: name]
            ifFalse: [
                ^self class fromScannerCollection: string attributes: self scannerDefaults]

Knowledge about whether we are processing for a scanner or a parser has to be available to the object doing the processing (the sponsor/user); for us, the grammar builder.

Example 6: walkIdentifier:
How is walkIdentifier: different?

    walkIdentifier: aTree
        "Code not shown"

What if it’s a macro? The grammar builder should know; it’s been storing them. If it is, it should have an FSM for it already, and it could be complicated. You know that most of your operations are destructive. So if you do a “star” of it in one place and it’s part of a “minus” somewhere else, the original gets MUCKED UP. How do you avoid that? You make a copy.
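The copy-before-use idea can be sketched in Python. The `Fsm` and `Builder` classes, the `macros` dictionary and the `starred` flag are all hypothetical; the only point carried over from the notes is that a destructive operation must be applied to a copy of the stored macro, never to the stored FSM itself.

```python
import copy


class Fsm:
    def __init__(self, symbols):
        self.symbols = set(symbols)
        self.starred = False

    def star(self):
        # Destructive on purpose: mutates the receiver, as the notes warn.
        self.starred = True
        return self


class Builder:
    def __init__(self):
        self.macros = {}  # macro name -> stored FSM

    def define_macro(self, name, fsm):
        self.macros[name] = fsm

    def walk_identifier(self, name):
        if name in self.macros:
            # Hand out a copy so later destructive operations leave the macro intact.
            return copy.deepcopy(self.macros[name])
        return Fsm([name])


builder = Builder()
builder.define_macro("letter", Fsm(["a", "b"]))
used = builder.walk_identifier("letter").star()
```

After this runs, `used` has been starred while the FSM stored under `letter` is unchanged, so a later “minus” on the same macro starts from the original.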

Additional FSM Operations
Some mechanism is needed to change the attributes associated with an FSM’s transitions. Attributes are keywords written in square brackets after an expression, and braces mean “look”:

    IdentificationCode -> Expression [look node noStack].
    Expression -> … a whole bunch of things … -> {identifier}.

The noStack keyword is a token in which the label is #noStack and the symbol also is #noStack.

    walkAttributes: aTree
        | expression attributeTokens attributeSymbols fsm |
        expression := aTree child: 1.
        attributeTokens := aTree children rest. "as in Lisp's first and rest"
        attributeSymbols := attributeTokens collect: [:token | token symbol].
        fsm := self walkTree: expression.
        ^fsm override: attributeSymbols

    walkLook: aTree
        ^(self walkTree: (aTree child: 1)) override: #(#look)

New FSM operation “override” needed.
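One plausible reading of “override” is sketched below in Python. The notes do not define its semantics, so this is an assumption: each keyword replaces the competing setting on every transition and leaves unrelated settings alone. Only look, node and noStack appear in the notes; the opposite names read, noNode and stack, and the `Transition` and `Fsm` classes, are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical groups of mutually exclusive attribute keywords.
GROUPS = [{"read", "look"}, {"stack", "noStack"}, {"node", "noNode"}]


@dataclass
class Transition:
    symbol: str
    attributes: Set[str]


class Fsm:
    def __init__(self, transitions: List[Transition]):
        self.transitions = transitions

    def override(self, keywords):
        for transition in self.transitions:
            for keyword in keywords:
                for group in GROUPS:
                    if keyword in group:
                        transition.attributes -= group  # drop the competing setting
                transition.attributes.add(keyword)
        return self


fsm = Fsm([Transition("a", {"read", "stack", "noNode"}),
           Transition("b", {"read", "stack", "node"})])
fsm.override(["look", "node", "noStack"])

looked = Fsm([Transition("c", {"read", "stack", "node"})]).override(["look"])
```

Under this reading, the braces form `{identifier}` is simply `override` with the single keyword look.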

A Word About How to Work in Smalltalk
Implement just “walkTree:” and then run with the real data. The first label you encounter will result in a halt with a message such as “walkOr:” not understood. Just implement this method and run again. Even better, implement the missing method, browse down the call stack and restart. By the time you execute without a halt, you will have implemented everything you need for that example grammar. Then try again on another grammar.
