CSE 5317/4305 L5: Abstract Syntax
Leonidas Fegaras

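Abstract Syntax Tree (AST)
A parser typically generates an Abstract Syntax Tree (AST). A parse tree is not an AST: the parse tree records every grammar rule used in the derivation, while the AST keeps only the operators and their operands.
[Figure: the front-end pipeline. The scanner reads the source file one character at a time ("get next character"); the parser requests tokens from the scanner ("get token") and builds the AST.]
[Figure: for the input x+y*z, the parse tree (interior nodes E, T, F over the leaves id(x) + id(y) * id(z)) next to the AST: a + node with children x and a * node over y and z.]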

Building Abstract Syntax Trees in Java

    abstract class Exp { }

    class IntegerExp extends Exp {
        public int value;
        public IntegerExp ( int n ) { value = n; }
    }

    class TrueExp extends Exp {
        public TrueExp () { }
    }

    class FalseExp extends Exp {
        public FalseExp () { }
    }

    class VariableExp extends Exp {
        public String value;
        public VariableExp ( String n ) { value = n; }
    }

Exp (cont.)

    class BinaryExp extends Exp {
        public String operator;
        public Exp left;
        public Exp right;
        public BinaryExp ( String o, Exp l, Exp r ) { operator = o; left = l; right = r; }
    }

    class UnaryExp extends Exp {
        public String operator;
        public Exp operand;
        public UnaryExp ( String o, Exp e ) { operator = o; operand = e; }
    }

    class ExpList {
        public Exp head;
        public ExpList next;
        public ExpList ( Exp h, ExpList n ) { head = h; next = n; }
    }

Exp (cont.)

    class CallExp extends Exp {
        public String name;
        public ExpList arguments;
        public CallExp ( String nm, ExpList s ) { name = nm; arguments = s; }
    }

    class ProjectionExp extends Exp {
        public Exp value;
        public String attribute;
        public ProjectionExp ( Exp v, String a ) { value = v; attribute = a; }
    }

Exp (cont.)

    class RecordElements {
        public String attribute;
        public Exp value;
        public RecordElements next;
        public RecordElements ( String a, Exp v, RecordElements el ) {
            attribute = a; value = v; next = el;
        }
    }

    class RecordExp extends Exp {
        public RecordElements elements;
        public RecordExp ( RecordElements el ) { elements = el; }
    }

Examples
The AST for the input (x-2)+3:

    new BinaryExp("+",
                  new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                  new IntegerExp(3))

The AST for the input f(x.A,true):

    new CallExp("f",
                new ExpList(new ProjectionExp(new VariableExp("x"), "A"),
                            new ExpList(new TrueExp(), null)))
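The sketch below puts a subset of the slides' classes (IntegerExp, TrueExp, VariableExp, BinaryExp, ExpList, CallExp, ProjectionExp) into one compilable file and builds the two example trees. The show method is a hypothetical helper, not part of the slides: it walks an AST and prints it back as fully parenthesized source text, which is a quick way to check that a tree has the intended shape.

```java
class AstDemo {
    // Hypothetical helper: convert an AST back to fully parenthesized source text.
    static String show(Exp e) {
        if (e instanceof IntegerExp) return Integer.toString(((IntegerExp) e).value);
        if (e instanceof TrueExp) return "true";
        if (e instanceof VariableExp) return ((VariableExp) e).value;
        if (e instanceof BinaryExp) {
            BinaryExp b = (BinaryExp) e;
            return "(" + show(b.left) + b.operator + show(b.right) + ")";
        }
        if (e instanceof ProjectionExp) {
            ProjectionExp p = (ProjectionExp) e;
            return show(p.value) + "." + p.attribute;
        }
        if (e instanceof CallExp) {
            CallExp c = (CallExp) e;
            StringBuilder sb = new StringBuilder(c.name).append('(');
            for (ExpList l = c.arguments; l != null; l = l.next) {   // walk the linked list
                sb.append(show(l.head));
                if (l.next != null) sb.append(',');
            }
            return sb.append(')').toString();
        }
        throw new IllegalArgumentException("unknown AST node");
    }

    public static void main(String[] args) {
        Exp e1 = new BinaryExp("+",
                     new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                     new IntegerExp(3));
        Exp e2 = new CallExp("f",
                     new ExpList(new ProjectionExp(new VariableExp("x"), "A"),
                                 new ExpList(new TrueExp(), null)));
        System.out.println(show(e1)); // ((x-2)+3)
        System.out.println(show(e2)); // f(x.A,true)
    }
}

abstract class Exp { }

class IntegerExp extends Exp {
    public int value;
    public IntegerExp(int n) { value = n; }
}

class TrueExp extends Exp { }

class VariableExp extends Exp {
    public String value;
    public VariableExp(String n) { value = n; }
}

class BinaryExp extends Exp {
    public String operator;
    public Exp left, right;
    public BinaryExp(String o, Exp l, Exp r) { operator = o; left = l; right = r; }
}

class ExpList {
    public Exp head;
    public ExpList next;
    public ExpList(Exp h, ExpList n) { head = h; next = n; }
}

class CallExp extends Exp {
    public String name;
    public ExpList arguments;
    public CallExp(String nm, ExpList s) { name = nm; arguments = s; }
}

class ProjectionExp extends Exp {
    public Exp value;
    public String attribute;
    public ProjectionExp(Exp v, String a) { value = v; attribute = a; }
}
```

Note that the parentheses in the printed (x-2) come from the tree structure, not from stored parenthesis tokens: the AST itself keeps no trace of them.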

Gen
Gen is a Java package for constructing and manipulating ASTs; you are required to use Gen for your project. It is basically a Java preprocessor that adds syntactic constructs to the Java language to make the task of handling ASTs easier:
- it uses a universal class Ast to capture any kind of AST
- it supports easy construction of ASTs using the #<...> syntax
- it supports pattern matching, editing, pretty-printing, etc.
- it includes a symbol table class
Architecture: file.gen → Gen → file.java → javac → file.class

The Gen Ast Class

    abstract class Ast { }

    class Number extends Ast {
        public long value;
        public Number ( long n ) { value = n; }
    }

    class Real extends Ast {
        public double value;
        public Real ( double n ) { value = n; }
    }

    class Variable extends Ast {
        public String value;
        public Variable ( String s ) { value = s; }
    }

    class Astring extends Ast {
        public String value;
        public Astring ( String s ) { value = s; }
    }

AST Nodes are Instances of Node

    class Node extends Ast {
        public String name;
        public Arguments args;
        public Node ( String n, Arguments a ) { name = n; args = a; }
    }

    class Arguments {
        public Ast head;
        public Arguments tail;
        // signatures only: the bodies are not shown on the slide
        public Arguments ( Ast h, Arguments t );
        public final static Arguments nil;
        public Arguments append ( Ast e );
    }

Example
To construct Binop(Plus,x,Binop(Minus,y,z)) in Java, use:

    new Node("Binop",
             Arguments.nil.append(new Variable("Plus"))
                          .append(new Variable("x"))
                          .append(new Node("Binop",
                                           Arguments.nil.append(new Variable("Minus"))
                                                        .append(new Variable("y"))
                                                        .append(new Variable("z")))))

Ugly! You should never use this kind of code in your project.
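To make the construction above concrete, here is a small self-contained mock of the Ast, Variable, Node, and Arguments classes. It is a hypothetical stand-in, not Gen's actual implementation: Arguments is assumed to be an immutable list whose append returns a new list with the element added at the end, and the toString methods exist only so that the result can be printed.

```java
class GenLikeDemo {
    public static void main(String[] args) {
        Ast t = new Node("Binop", Arguments.nil.append(new Variable("Plus"))
                                               .append(new Variable("x"))
                                               .append(new Node("Binop",
                                                   Arguments.nil.append(new Variable("Minus"))
                                                                .append(new Variable("y"))
                                                                .append(new Variable("z")))));
        System.out.println(t); // Binop(Plus,x,Binop(Minus,y,z))
    }
}

abstract class Ast { }

class Variable extends Ast {
    public String value;
    public Variable(String s) { value = s; }
    @Override public String toString() { return value; }
}

class Node extends Ast {
    public String name;
    public Arguments args;
    public Node(String n, Arguments a) { name = n; args = a; }
    @Override public String toString() { return name + "(" + args + ")"; }
}

// Hypothetical immutable list: append returns a new list with e added at the end.
class Arguments {
    public static final Arguments nil = new Arguments(null, null);
    public final Ast head;
    public final Arguments tail;

    public Arguments(Ast h, Arguments t) { head = h; tail = t; }

    public Arguments append(Ast e) {
        return this == nil ? new Arguments(e, nil) : new Arguments(head, tail.append(e));
    }

    @Override public String toString() {   // comma-separated elements
        StringBuilder sb = new StringBuilder();
        for (Arguments a = this; a != nil; a = a.tail) {
            if (a != this) sb.append(',');
            sb.append(a.head);
        }
        return sb.toString();
    }
}
```

Even with the printing support, every level of nesting costs another Arguments.nil.append(...) chain, which is exactly the verbosity that Gen's bracket syntax hides.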

The #< > Brackets
When you write #<Binop(Plus,x,Binop(Minus,y,z))> in your Gen file, it generates the following Java code:

    new Node("Binop",
             Arguments.nil.append(new Variable("Plus"))
                          .append(new Variable("x"))
                          .append(new Node("Binop",
                                           Arguments.nil.append(new Variable("Minus"))
                                                        .append(new Variable("y"))
                                                        .append(new Variable("z")))))

which represents the AST Binop(Plus,x,Binop(Minus,y,z)).

Escaping a Value Using Backquote
Objects of the class Ast can be included in the form generated by the #< > brackets by "escaping" them with a backquote (`). The operand of the escape operator is expected to be an object of class Ast that provides the value to "fill in" the hole in the bracketed text at that point. (Actually, an escaped string, int, or double value is also lifted to an Ast.)
For example,

    Ast x = # ;
    Ast y = # ;
    Ast z = # ;

are equivalent to:

    Ast x = # ;
    Ast y = # ;
    Ast z = # ;

BNF of Bracketed Expressions

    bracketed ::= "#<" expr ">"                         an AST construction
                | "#[" arg "," ... "," arg "]"          an Arguments construction
    expr      ::= name                                  the representation of a variable name
                | integer                               the representation of an integer
                | real                                  the representation of a real number
                | string                                the representation of a string
                | "`" name                              escaping to the value of name
                | "`(" code ")"                         escaping to the value of code
                | name "(" arg "," ... "," arg ")"      the representation of an AST node with >= 0 children
                | "`" name "(" arg "," ... "," arg ")"  the representation of an AST node with an escaped name
                | expr opr expr                         an AST node that represents a binary infix opr
                | "`" name "[" expr "]"                 variable substitution
    arg       ::= expr                                  the representation of an expression
                | "..." name                            escaping to a list of ASTs bound to name
                | "...(" code ")"                       escaping to a list of ASTs returned by code

"..." Is for Arguments
The three dots (...) construct is used to indicate a list of children in an AST node; the name in "...name" must be an instance of the class Arguments. For example, in

    Arguments r = #[join(a,b,p),select(c,q)];
    Ast z = # ;

z will be bound to #
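To see what splicing with ...r means, here is a plain-Java sketch that does not use Gen. The Term class and the spliceNode helper are hypothetical stand-ins: the elements of r are copied into the new node's child list one by one, so each becomes a direct child, rather than r being added as a single child. The node names f, x, and y are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class SpliceDemo {
    // Build name(before..., r..., after...): every element of r becomes a
    // direct child, which is the effect of writing ...r inside a node.
    static Term spliceNode(String name, List<Term> before, List<Term> r, List<Term> after) {
        List<Term> kids = new ArrayList<>(before);
        kids.addAll(r);
        kids.addAll(after);
        return new Term(name, kids);
    }

    public static void main(String[] args) {
        // r plays the role of #[join(a,b,p),select(c,q)]
        List<Term> r = List.of(
            new Term("join", List.of(Term.leaf("a"), Term.leaf("b"), Term.leaf("p"))),
            new Term("select", List.of(Term.leaf("c"), Term.leaf("q"))));
        Term z = spliceNode("f", List.of(Term.leaf("x")), r, List.of(Term.leaf("y")));
        System.out.println(z); // f(x,join(a,b,p),select(c,q),y)
    }
}

// Hypothetical generic tree node: a name plus an ordered list of children.
class Term {
    final String name;
    final List<Term> kids;

    Term(String name, List<Term> kids) { this.name = name; this.kids = List.copyOf(kids); }

    static Term leaf(String name) { return new Term(name, List.of()); }

    @Override public String toString() {
        if (kids.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < kids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(kids.get(i));
        }
        return sb.append(')').toString();
    }
}
```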

Example
For example, # is equivalent to the following Java code:

    new Node(f, Arguments.nil.append(new Number(6))
                             .append(r)
                             .append(new Node("g", Arguments.nil.append(new Astring("ab"))
                                                                .append(k(x))))
                             .append(y))

If f="h", r=#[2,z], y=#, and k(x) returns the value #, then the above term is equivalent to #

Pattern Matching
Gen provides a case statement syntax with patterns. Patterns match Ast representations that have a similar shape. Escape operators applied to variables inside these patterns represent variable patterns, which "bind" to the corresponding subterms upon a successful match. This capability makes it particularly easy to write functions that perform source-to-source transformations.

Example
A function that simplifies arithmetic expressions:

    Ast simplify ( Ast e ) {
        #case e
        | plus(`x,0) => return x;
        | times(`x,1) => return x;
        | times(`x,0) => return #<0>;
        | _ => return e;
        #end;
    }

where the _ pattern matches any value. For example, simplify(# ) returns #
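For comparison, the same rewrite can be sketched in plain Java without Gen. This is not Gen code: it assumes a hypothetical Term class (a name plus a list of children) and represents the numbers 0 and 1 as leaves named "0" and "1". Each if statement plays the role of one #case pattern, and the final return e is the _ case.

```java
import java.util.List;

class SimplifyDemo {
    static boolean isLeaf(Term t, String name) {
        return t.kids.isEmpty() && t.name.equals(name);
    }

    // Mirrors the slide's #case: only the root of e is examined.
    static Term simplify(Term e) {
        if (e.kids.size() == 2) {
            Term x = e.kids.get(0), y = e.kids.get(1);
            if (e.name.equals("plus") && isLeaf(y, "0")) return x;               // plus(`x,0)
            if (e.name.equals("times") && isLeaf(y, "1")) return x;              // times(`x,1)
            if (e.name.equals("times") && isLeaf(y, "0")) return Term.leaf("0"); // times(`x,0)
        }
        return e;                                                               // _
    }

    public static void main(String[] args) {
        Term t1 = new Term("times", List.of(Term.leaf("z"), Term.leaf("1")));
        Term t2 = new Term("plus", List.of(t1, Term.leaf("0")));
        System.out.println(simplify(t1)); // z
        System.out.println(simplify(t2)); // times(z,1): the inner term is left alone
    }
}

// Hypothetical generic tree node: a name plus an ordered list of children.
class Term {
    final String name;
    final List<Term> kids;

    Term(String name, List<Term> kids) { this.name = name; this.kids = List.copyOf(kids); }

    static Term leaf(String name) { return new Term(name, List.of()); }

    @Override public String toString() {
        if (kids.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < kids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(kids.get(i));
        }
        return sb.append(')').toString();
    }
}
```

As the second call shows, a single pass only rewrites the root; simplifying every subterm would require applying simplify to the children first and then to the rebuilt node.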

BNF

    case_stmt ::= "#case" code case ... case "#end"
    case      ::= "|" expr guard "=>" code
    guard     ::= ":" code                              an optional condition
                |
    expr      ::= name                                  exact match with a variable name
                | integer                               exact match with an integer
                | real                                  exact match with a real number
                | string                                exact match with a string
                | "`" name                              match with the value of name
                | "`(" code ")"                         match with the value of code
                | name "(" arg "," ... "," arg ")"      match with an AST node with zero or more children
                | "`" name "(" arg "," ... "," arg ")"  match with an AST node with an escaped name
                | expr opr expr                         an AST node that represents a binary infix operation
                | "`" name "[" expr "]"                 second-order matching
                | "_"                                   match any Ast
    arg       ::= expr                                  match with an Ast
                | "..." name                            match with a list of ASTs bound to name
                | "...(" code ")"                       match with a list of ASTs returned by code
                | "..."                                 match the rest of the arguments

Examples
The pattern `f(...r) matches any Ast Node. When it is matched with #<join(a,b,c)>, it binds
  1) f to the string "join"
  2) r to the Arguments #[a,b,c]
The following function adds the terms # and # as children to any Node e:

    Ast add_arg ( Ast e ) {
        #case e
        | `f(...r) => return # ;
        | `x => return x;
        #end;
    }

Another Example
The following function switches the inputs of a binary join found as a parameter to a Node e:

    Ast switch_join_args ( Ast e ) {
        #case e
        | `f(...r,join(`x,`y),...s) => return #<`f(...r,join(`y,`x),...s)>;
        | `x => return x;
        #end;
    }
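A plain-Java rendering of the same transformation, again over a hypothetical Term class rather than Gen's Ast. The loop plays the role of the pattern `f(...r,join(`x,`y),...s): the children before the join correspond to ...r and those after it to ...s. Picking the leftmost join when several children match is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class SwitchJoinDemo {
    // `f(...r,join(`x,`y),...s) => `f(...r,join(`y,`x),...s); otherwise e is returned unchanged.
    static Term switchJoinArgs(Term e) {
        for (int i = 0; i < e.kids.size(); i++) {
            Term k = e.kids.get(i);
            if (k.name.equals("join") && k.kids.size() == 2) {
                List<Term> kids = new ArrayList<>(e.kids);   // ...r and ...s are kept as they are
                kids.set(i, new Term("join", List.of(k.kids.get(1), k.kids.get(0))));
                return new Term(e.name, kids);
            }
        }
        return e;                                            // the `x case
    }

    public static void main(String[] args) {
        Term e = new Term("select", List.of(
            new Term("join", List.of(Term.leaf("a"), Term.leaf("b"))), Term.leaf("q")));
        System.out.println(switchJoinArgs(e)); // select(join(b,a),q)
    }
}

// Hypothetical generic tree node: a name plus an ordered list of children.
class Term {
    final String name;
    final List<Term> kids;

    Term(String name, List<Term> kids) { this.name = name; this.kids = List.copyOf(kids); }

    static Term leaf(String name) { return new Term(name, List.of()); }

    @Override public String toString() {
        if (kids.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < kids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(kids.get(i));
        }
        return sb.append(')').toString();
    }
}
```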

Second-Order Pattern Matching
When `f[expr] is matched against an Ast e, it traverses the entire tree representation of e (in preorder) until it finds a tree node that matches the pattern expr:
- it fails when it does not find a match
- when it finds a match, it succeeds: it binds the variables in the pattern expr, and it binds the variable f to a list of Ast (of class Arguments) that represents the path from the root Ast to the Ast node that matched the pattern
This is best used in conjunction with the bracketed expression `f[e], which uses the path bound in f to construct a new Ast with expr replaced with e.
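The two halves of second-order matching, finding the path and rebuilding along it, can be sketched in plain Java. This is not how Gen implements it: the tree is a hypothetical Term class, the "pattern" is a Java predicate, and the path is recorded as a list of child indices instead of the list of Asts that Gen binds to f.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class SecondOrderDemo {
    // Preorder search: test the node itself, then its children left to right.
    // Returns the child-index path to the first matching node, or null (match fails).
    static List<Integer> findPath(Term t, Predicate<Term> pattern) {
        if (pattern.test(t)) return new ArrayList<>();
        for (int i = 0; i < t.kids.size(); i++) {
            List<Integer> sub = findPath(t.kids.get(i), pattern);
            if (sub != null) {
                sub.add(0, i);
                return sub;
            }
        }
        return null;
    }

    // Rebuild t with the node at the given path replaced by e (the role of `f[e]).
    static Term replaceAt(Term t, List<Integer> path, Term e) {
        if (path.isEmpty()) return e;
        int i = path.get(0);
        List<Term> kids = new ArrayList<>(t.kids);
        kids.set(i, replaceAt(kids.get(i), path.subList(1, path.size()), e));
        return new Term(t.name, kids);
    }

    public static void main(String[] args) {
        Term t = new Term("f", List.of(
            new Term("g", List.of(Term.leaf("a"),
                new Term("join", List.of(Term.leaf("x"), Term.leaf("y"))))),
            Term.leaf("h")));
        List<Integer> path = findPath(t, n -> n.name.equals("join"));
        System.out.println(path); // [0, 1]
        Term swapped = new Term("join", List.of(Term.leaf("y"), Term.leaf("x")));
        System.out.println(replaceAt(t, path, swapped)); // f(g(a,join(y,x)),h)
    }
}

// Hypothetical generic tree node: a name plus an ordered list of children.
class Term {
    final String name;
    final List<Term> kids;

    Term(String name, List<Term> kids) { this.name = name; this.kids = List.copyOf(kids); }

    static Term leaf(String name) { return new Term(name, List.of()); }

    @Override public String toString() {
        if (kids.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < kids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(kids.get(i));
        }
        return sb.append(')').toString();
    }
}
```

Only the nodes on the path are copied; every subtree off the path is shared between the old and the new tree.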

Misc
Another syntactic construct in Gen is a for-loop that iterates over Arguments:

    "#for" name "in" code "do" code "#end"

For example,

    #for v in #[a,b,c] do
        System.out.println(v);
    #end;

Adding Semantic Actions to a Parser
Grammar:

    E  ::= T E'
    E' ::= + T E'
        |  - T E'
        |  ε
    T  ::= num

Recursive descent parser (NUM is the token code for a number):

    int E () { return Eprime(T()); }

    int Eprime ( int left ) {
        if (current_token == '+') {
            read_next_token();
            return Eprime(left + T());
        } else if (current_token == '-') {
            read_next_token();
            return Eprime(left - T());
        } else return left;
    }

    int T () {
        if (current_token == NUM) {
            int n = num_value;
            read_next_token();
            return n;
        } else throw new RuntimeException("syntax error: number expected");
    }
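A runnable version of the recursive descent interpreter above, assuming a tiny hypothetical scanner over a string: it recognizes unsigned integers, + and -, skips spaces, and uses '$' to mark the end of input. Each grammar method returns the value of the expression it parsed, and Eprime receives the value computed so far in its left parameter, which is what makes 10-3-2 evaluate left-associatively as (10-3)-2.

```java
class RDInterp {
    private final String src;
    private int pos = 0;
    private char current;      // '+', '-', 'n' (a number) or '$' (end of input)
    private int numValue;      // value of the current number token

    RDInterp(String src) { this.src = src; readNextToken(); }

    private void readNextToken() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        if (pos == src.length()) { current = '$'; return; }
        char c = src.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            numValue = Integer.parseInt(src.substring(start, pos));
            current = 'n';
        } else if (c == '+' || c == '-') {
            current = c;
            pos++;
        } else {
            throw new IllegalStateException("unexpected character " + c);
        }
    }

    // E ::= T E'
    int E() { return Eprime(T()); }

    // E' ::= + T E' | - T E' | ε
    int Eprime(int left) {
        if (current == '+') { readNextToken(); return Eprime(left + T()); }
        else if (current == '-') { readNextToken(); return Eprime(left - T()); }
        else return left;
    }

    // T ::= num
    int T() {
        if (current == 'n') { int n = numValue; readNextToken(); return n; }
        throw new IllegalStateException("syntax error: number expected");
    }

    static int eval(String s) {
        RDInterp p = new RDInterp(s);
        int v = p.E();
        if (p.current != '$') throw new IllegalStateException("trailing input");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("1+5-2"));      // 4
        System.out.println(eval("10 - 3 - 2")); // 5
    }
}
```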

Table-Driven Predictive Parsers
Table-driven predictive parsers use the parse stack to push and pop both actions and grammar symbols, but they use a separate semantic stack to execute the actions:

    push(S);
    read_next_token();
    repeat
        X = pop();
        if (X is a terminal or '$')
            if (X == current_token)
                read_next_token();
            else error();
        else if (X is an action)
            perform the action;
        else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")
        {   push(Yk);
            ...
            push(Y1);
        }
        else error();
    until X == '$';

Example
We need to embed actions { code; } in the grammar rules. Suppose that pushV and popV are the functions that manipulate the semantic stack. The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:

    E  ::= T E' $ { print(popV()); }
    E' ::= + T { pushV(popV() + popV()); } E'
        |  - T { pushV(-popV() + popV()); } E'
        |  ε
    T  ::= num { pushV(num); }

For example, for 1+5-2, we have the following sequence of actions:

    pushV(1); pushV(5); pushV(popV()+popV()); pushV(2); pushV(-popV()+popV()); print(popV());
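The sketch below runs the table-driven algorithm on the interpreter grammar above and records each action it performs. Assumptions of the sketch: the parse table M is hard-coded in table(), action symbols are Runnable objects pushed on the same parse stack as the grammar symbols, and pushV(num) pushes the value of the num token just matched. One deliberate difference from the pseudocode: the loop runs until the parse stack is empty rather than stopping when '$' is popped, so that the action after $ in E ::= T E' $ { print(popV()); } still executes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PredictiveInterp {
    private final List<String> tokens = new ArrayList<>();       // "num", "+", "-", then "$"
    private final List<Integer> nums = new ArrayList<>();        // value of each num token
    private int pos = 0;                                         // index of current_token
    private int lastNum;                                         // value of the num just matched
    private final Deque<Object> parseStack = new ArrayDeque<>(); // grammar symbols and actions
    private final Deque<Integer> semStack = new ArrayDeque<>();  // the semantic stack
    final List<String> trace = new ArrayList<>();                // actions in execution order
    int printed;                                                 // value printed by print(popV())

    PredictiveInterp(String src) {
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (Character.isDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add("num");
                nums.add(Integer.parseInt(src.substring(i, j)));
                i = j - 1;
            } else if (c == '+' || c == '-') {
                tokens.add(String.valueOf(c));
                nums.add(0);
            } else if (c != ' ') {
                throw new IllegalArgumentException("unexpected character " + c);
            }
        }
        tokens.add("$");
        nums.add(0);
    }

    void pushV(int v) { semStack.push(v); }
    int popV() { return semStack.pop(); }

    // The semantic actions embedded in the grammar rules
    private final Runnable pushNum = () -> { trace.add("pushV(" + lastNum + ")"); pushV(lastNum); };
    private final Runnable add = () -> { trace.add("pushV(popV()+popV())"); pushV(popV() + popV()); };
    private final Runnable sub = () -> { trace.add("pushV(-popV()+popV())"); pushV(-popV() + popV()); };
    private final Runnable print = () -> { trace.add("print(popV())"); printed = popV(); System.out.println(printed); };

    // The parse table M[X, current_token]; null means error
    private List<Object> table(String x, String tok) {
        if (x.equals("E") && tok.equals("num")) return List.of("T", "E'", "$", print);
        if (x.equals("E'") && tok.equals("+")) return List.of("+", "T", add, "E'");
        if (x.equals("E'") && tok.equals("-")) return List.of("-", "T", sub, "E'");
        if (x.equals("E'") && tok.equals("$")) return List.of();   // E' ::= ε
        if (x.equals("T") && tok.equals("num")) return List.of("num", pushNum);
        return null;
    }

    private static boolean isTerminal(String x) {
        return x.equals("num") || x.equals("+") || x.equals("-") || x.equals("$");
    }

    int run() {
        parseStack.push("E");
        while (!parseStack.isEmpty()) {
            Object top = parseStack.pop();
            if (top instanceof Runnable) { ((Runnable) top).run(); continue; }  // X is an action
            String x = (String) top;
            String tok = tokens.get(pos);
            if (isTerminal(x)) {                                   // X is a terminal or '$'
                if (!x.equals(tok)) throw new IllegalStateException("expected " + x + ", saw " + tok);
                if (x.equals("num")) lastNum = nums.get(pos);
                if (!x.equals("$")) pos++;                         // read_next_token()
            } else {
                List<Object> rhs = table(x, tok);
                if (rhs == null) throw new IllegalStateException("no rule for " + x + " on " + tok);
                for (int i = rhs.size() - 1; i >= 0; i--) parseStack.push(rhs.get(i)); // push Yk ... Y1
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        PredictiveInterp p = new PredictiveInterp("1+5-2");
        p.run();                      // prints 4
        System.out.println(p.trace);  // the action sequence listed above
    }
}
```

In -popV() + popV(), the first call returns the right operand (2 in 1+5-2), so the subtraction comes out right only because Java evaluates the operands of + from left to right; in C the order of the two calls would be unspecified.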

Bottom-Up Parsers
Bottom-up parsers can only perform an action after a reduction. We can only have rules of the form

    X ::= Y1 ... Yn { action }

where the action is always at the end of the rule; this action is evaluated after the rule X ::= Y1 ... Yn is reduced.
How? In addition to state numbers, the parser pushes values onto the parse stack.
If we want to put an action in the middle of the rhs of a rule, we use a dummy nonterminal, called a marker. For example,

    X ::= a { action } b

is equivalent to

    X ::= M b
    M ::= a { action }

CUP
Both terminals and non-terminals are associated with typed values:
- these values are instances of the Object class (or of some subclass of Object)
- the value associated with a terminal is in most cases an Object, except for an identifier, which is a String, an integer, which is an Integer, etc.
- the typical values associated with non-terminals in a compiler are ASTs, lists of ASTs, etc.
You can retrieve the value of a symbol s on the rhs of a rule by using the notation s:x, where x is a variable name that hasn't appeared elsewhere in this rule.
The value of the non-terminal defined by a rule is called RESULT and should always be assigned a value in the action. For example, if the non-terminal E is associated with an Integer object, then

    E ::= E:n PLUS E:m {: RESULT = n+m; :}

Machinery
The parse stack elements are of type struct( state: int, value: Object ), where int is the state number and Object is the value.
When a reduction occurs, the RESULT value is calculated from the values in the stack and is pushed along with the GOTO state.
Example: after the reduction by

    E ::= E:n PLUS E:m {: RESULT = n+m; :}

the RESULT value is

    stack[top-2].value + stack[top].value

which is the new value pushed onto the stack along with the GOTO state.
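A minimal sketch of that reduction step on a parse stack of (state, value) pairs. It is not CUP's generated code: the state numbers are made up, and the GOTO table lookup is replaced by a gotoState parameter supplied by the caller. It shows where n and m are read from (stack[top-2] and stack[top]) and that the three rhs entries are replaced by a single entry carrying RESULT.

```java
import java.util.ArrayList;
import java.util.List;

class ValueStackDemo {
    // One parse-stack element: struct(state: int, value: Object)
    static final class Entry {
        final int state;
        final Object value;
        Entry(int state, Object value) { this.state = state; this.value = value; }
    }

    final List<Entry> stack = new ArrayList<>();

    void push(int state, Object value) { stack.add(new Entry(state, value)); }

    // Reduce by E ::= E:n PLUS E:m {: RESULT = n+m; :}
    // The three rhs symbols occupy stack[top-2], stack[top-1] and stack[top].
    // gotoState stands in for the GOTO table lookup, which this sketch omits.
    void reduceEPlusE(int gotoState) {
        int top = stack.size() - 1;
        Integer n = (Integer) stack.get(top - 2).value;
        Integer m = (Integer) stack.get(top).value;
        Integer result = n + m;                    // the action: RESULT = n+m
        stack.subList(top - 2, top + 1).clear();   // pop the three rhs entries
        push(gotoState, result);                   // push RESULT with the GOTO state
    }

    public static void main(String[] args) {
        ValueStackDemo p = new ValueStackDemo();
        p.push(0, null);   // hypothetical state numbers throughout
        p.push(1, 2);      // E with value 2
        p.push(4, null);   // PLUS (no value needed)
        p.push(6, 5);      // E with value 5
        p.reduceEPlusE(1);
        Entry top = p.stack.get(p.stack.size() - 1);
        System.out.println(top.state + " " + top.value); // 1 7
    }
}
```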

ASTs in CUP
We need to associate each non-terminal symbol with an AST type:

    non terminal Ast       exp;
    non terminal Arguments expl;

    exp  ::= exp:e1 PLUS exp:e2    {: RESULT = new Node(plus_exp,e1,e2); :}
          |  exp:e1 MINUS exp:e2   {: RESULT = new Node(minus_exp,e1,e2); :}
          |  id:nm LP expl:el RP   {: RESULT = new Node(call_exp,el.reverse().cons(new Variable(nm))); :}
          |  INT:n                 {: RESULT = new Number(n.intValue()); :}
          ;
    expl ::= expl:el COMMA exp:e   {: RESULT = el.cons(e); :}
          |  exp:e                 {: RESULT = nil.cons(e); :}
          ;
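The el.reverse() in the call_exp action is needed because, assuming cons adds an element at the front of the list (as the reverse() call suggests), the left-recursive rule for expl builds the argument list backwards. The sketch below replays those reductions for the argument list a, b, c with a hypothetical immutable cons list (Args) of strings standing in for Arguments and Ast; it illustrates the list-building order only and is not CUP or Gen code.

```java
class ConsListDemo {
    // Hypothetical immutable cons list standing in for Arguments; elements are strings here.
    static final class Args {
        static final Args nil = new Args(null, null);
        final String head;
        final Args tail;

        Args(String head, Args tail) { this.head = head; this.tail = tail; }

        Args cons(String e) { return new Args(e, this); }   // add e at the front

        Args reverse() {
            Args r = nil;
            for (Args a = this; a != nil; a = a.tail) r = r.cons(a.head);
            return r;
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("[");
            for (Args a = this; a != nil; a = a.tail) {
                if (a != this) sb.append(',');
                sb.append(a.head);
            }
            return sb.append(']').toString();
        }
    }

    public static void main(String[] args) {
        // Reductions for the arguments "a, b, c" of a call f(a, b, c):
        Args el = Args.nil.cons("a");   // expl ::= exp:e               {: RESULT = nil.cons(e); :}
        el = el.cons("b");              // expl ::= expl:el COMMA exp:e {: RESULT = el.cons(e); :}
        el = el.cons("c");
        System.out.println(el);                      // [c,b,a]
        System.out.println(el.reverse().cons("f"));  // [f,a,b,c], as in the call_exp action
    }
}
```

Consing at the front is constant time, so building the list backwards and reversing it once is cheaper than appending each argument at the end.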