Abstract Syntax Trees Compiler Baojian Hua

Front End
source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

Recap
Lexer: program source to token sequence.
Parser: token sequence in, yes-or-no answer out.
Today's topic: abstract syntax trees.

Abstract Syntax Trees
Parse trees encode the grammatical structure of the source program. However, they contain a lot of unnecessary information. What is essential here?
[Figure: parse tree for 15 * (3 + 4)]

Abstract Syntax Trees
To understand an expression, the compiler only needs to know the operators and operands; punctuation, parentheses, etc. are not needed. The same holds for statements, functions, etc.
[Figure: parse tree for 15 * (3 + 4)]

Abstract Syntax Trees
[Figure: the parse tree for 15 * (3 + 4) next to its abstract syntax tree, Times (Int 15, Plus (Int 3, Int 4))]
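To see that the abstract syntax tree alone carries the grouping, here is a minimal sketch in C that evaluates the tree for 15 * (3 + 4). The slides introduce their own C representation later; the names `struct node`, `eval` and `example` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

enum kind {INT, PLUS, TIMES};

struct node {
  enum kind kind;
  int value;                       /* used by INT only */
  const struct node *left, *right; /* used by PLUS and TIMES only */
};

/* Evaluate bottom-up; the grouping comes from the tree shape alone,
   so no parentheses are stored anywhere. */
int eval(const struct node *e) {
  switch (e->kind) {
  case INT:   return e->value;
  case PLUS:  return eval(e->left) + eval(e->right);
  case TIMES: return eval(e->left) * eval(e->right);
  }
  assert(!"impossible");
  return 0;
}

/* Times (Int 15, Plus (Int 3, Int 4)), i.e. the tree for 15 * (3 + 4). */
int example(void) {
  struct node n15 = {.kind = INT, .value = 15};
  struct node n3 = {.kind = INT, .value = 3};
  struct node n4 = {.kind = INT, .value = 4};
  struct node plus = {.kind = PLUS, .left = &n3, .right = &n4};
  struct node times = {.kind = TIMES, .left = &n15, .right = &plus};
  return eval(&times);
}
```

Calling `example()` yields 105: the addition is performed first because it sits below the multiplication in the tree.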

Concrete and Abstract Syntax
Concrete syntax is needed for parsing. It includes punctuation symbols, factoring, and elimination of left recursion, and it depends on the format of the input.
Abstract syntax is a simpler, more convenient internal representation. It provides a clean interface between the parser and the later phases of the compiler.

Concrete and Abstract Syntax
E ::= E + T | T
T ::= T * F | F
F ::= id | num | ( E )
[Figure: parse tree of 2 + 3 * x under this grammar]

Concrete and Abstract Syntax
E ::= id | num | E + E | E * E | ( E )
[Figure: abstract syntax tree of 2 + 3 * x, Plus (Int 2, Times (Int 3, Id x))]

AST Data Structures
In the compiler, abstract syntax uses the data structures of the implementation language to represent the grammatical structure of the program. The design depends heavily on both the target language and the implementation language; it is more art than science.

AST in SML
E ::= id | num | E + E | E * E | ( E )

    (* data structures *)
    datatype exp
      = Int of int
      | Id of string
      | Add of exp * exp
      | Times of exp * exp

    (* to encode "2+3*x" *)
    val prog = Add (Int 2, Times (Int 3, Id "x"))

    (* Compile "2+3*x". To be covered later… *)
    val x86 = compile (prog)

AST in SML

    (* calculate number of nodes in an ast *)
    fun numNodes e =
      case e of
        Int _ => 1
      | Id _ => 1
      | Add (e1, e2) => 1 + numNodes e1 + numNodes e2
      | Times (e1, e2) => 1 + numNodes e1 + numNodes e2

    (* Note this may be too inefficient, why? *)

AST in SML

    (* tail-recursion with an accumulator n; call as numNodes (e, 0) *)
    fun numNodes (e, n) =
      case e of
        Int _ => 1 + n
      | Id _ => 1 + n
      | Add (e1, e2) =>
          let val n' = numNodes (e1, n)
          in numNodes (e2, 1 + n')
          end
      | Times (e1, e2) => … (* similar *)

AST in SML

    (* yet another version using a reference; read the count with !nodes *)
    val nodes = ref 0;
    val op ++ = fn x => x := !x + 1

    fun numNodes e =
      case e of
        Int _ => ++ nodes
      | Id _ => ++ nodes
      | Add (e1, e2) => (numNodes e1; ++ nodes; numNodes e2)
      | Times (e1, e2) => … (* similar *)

AST in C
E ::= id | num | E + E | E * E | ( E )

    /* data structures */
    typedef struct exp *exp;
    enum expKind {INT, ID, ADD, TIMES};
    struct exp {
      enum expKind kind;
      union {
        int i;
        char *id;
        struct {exp e1; exp e2;} add;
        struct {exp e1; exp e2;} times;
      } u;
    };

AST in C
E ::= id | num | E + E | E * E | ( E )

    /* sample program "2+3*x" */
    exp e1 = malloc (sizeof (*e1));
    e1->kind = INT;
    e1->u.i = 3;
    exp e2 = malloc (sizeof (*e2));
    e2->kind = ID;
    e2->u.id = "x";
    exp e3 = malloc (sizeof (*e3));
    e3->kind = TIMES;
    e3->u.times.e1 = e1;
    e3->u.times.e2 = e2;
    …
    /* really boring and error-prone :-( */
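One common way to tame this boilerplate is a constructor function per node kind. The following is a minimal sketch assuming the `struct exp` layout from the data-structure slide; the helper names `new_node`, `mk_int`, `mk_id`, `mk_add`, `mk_times` and `sample` are hypothetical and not part of the course code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

/* Allocate a node of the given kind.
   Sketch only: real code should report allocation failure properly. */
static exp new_node(enum expKind kind) {
  exp e = malloc(sizeof(*e));
  assert(e != NULL);
  e->kind = kind;
  return e;
}

exp mk_int(int i) {
  exp e = new_node(INT);
  e->u.i = i;
  return e;
}

exp mk_id(char *id) {
  exp e = new_node(ID);
  e->u.id = id;
  return e;
}

exp mk_add(exp left, exp right) {
  exp e = new_node(ADD);
  e->u.add.e1 = left;
  e->u.add.e2 = right;
  return e;
}

exp mk_times(exp left, exp right) {
  exp e = new_node(TIMES);
  e->u.times.e1 = left;
  e->u.times.e2 = right;
  return e;
}

/* "2+3*x" in one expression, mirroring the SML value
   Add (Int 2, Times (Int 3, Id "x")). */
exp sample(void) {
  return mk_add(mk_int(2), mk_times(mk_int(3), mk_id("x")));
}
```

With such helpers the C encoding reads almost like the SML one, and each field assignment is written once rather than at every construction site.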

AST in C

    /* number of nodes again */
    int numNodes (exp e)
    {
      switch (e->kind) {
      case INT:
        return 1;
      case ID:
        return 1;
      case ADD:
      case TIMES:
        /* add and times have the same field layout */
        return 1 + numNodes (e->u.add.e1)
                 + numNodes (e->u.add.e2);
      default:
        error ("impossible");
      }
    }

Aha, the C compiler is stupid!
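The recursive numNodes uses call-stack space proportional to the depth of the tree, which may be one answer to the earlier question about efficiency. As a hedged sketch, the count can also be computed with an explicit, heap-allocated worklist; the function name `num_nodes_iter` and the initial capacity of 16 are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

/* Count nodes without recursion: keep the subtrees that still have to
   be visited on an explicit stack that grows on demand. */
int num_nodes_iter(exp root) {
  size_t cap = 16, top = 0;
  exp *stack = malloc(cap * sizeof(*stack));
  int n = 0;
  assert(stack != NULL);
  stack[top++] = root;
  while (top > 0) {
    exp e = stack[--top];
    n++;
    if (e->kind == INT || e->kind == ID)
      continue;                       /* leaves have no children */
    if (top + 2 > cap) {              /* make room for two children */
      cap *= 2;
      stack = realloc(stack, cap * sizeof(*stack));
      assert(stack != NULL);
    }
    if (e->kind == ADD) {
      stack[top++] = e->u.add.e1;
      stack[top++] = e->u.add.e2;
    } else {
      stack[top++] = e->u.times.e1;
      stack[top++] = e->u.times.e2;
    }
  }
  free(stack);
  return n;
}
```

For the tree of "2+3*x" this returns 5, the same result as the recursive version.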

AST in OO
E ::= id | num | E + E | E * E | ( E )

    /* data structures */
    abstract class Exp {}
    class Int extends Exp {…}
    class Id extends Exp {…}
    class Add extends Exp {…}
    class Times extends Exp {…}

    /* to encode "2+3*x" */
    Exp prog = new Add (new Int (2),
                        new Times (new Int (3), new Id ("x")));

    /* Not as ugly as C, but still boring */

AST in OO

    /* number of nodes again */
    int numNodes (Exp e)
    {
      if (e instanceof Int)
        return 1;
      else if (e instanceof Id)
        return 1;
      else if (e instanceof Add) {
        Add f = (Add)e;
        return 1 + numNodes (f.e1) + numNodes (f.e2);
      }
      …
    }

AST Generation
ML-Yacc uses an attribute-grammar scheme: each nonterminal may have a semantic value associated with it.
When the parser reduces with (X ::= s1 … sn), a semantic action is executed; it uses the semantic values of the symbols si.
When parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually an abstract syntax tree.

Attribute Grammars
[Figure: bottom-up parse of an arithmetic expression, with a partial tree attached to each nonterminal on the parse stack]
Each nonterminal is associated with a tree.

Attribute Grammars

    datatype exp
      = Id of string
      | Num of int
      | Add of exp * exp
      | Times of exp * exp

    %
    e -> e PLUS e   (Add (e1, e2))
       | e TIMES e  (Times (e1, e2))
       | ID         (Id ID)
       | NUM        (Num NUM)
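The same idea, building a tree node each time a rule is recognized, can be sketched without a parser generator. The code below is not ML-Yacc and not an LR parser: it swaps in a hand-written recursive-descent parser in C over the stratified grammar E/T/F, and each place where a node is created plays the role of a semantic action. The function names (`parse`, `parse_exp`, `parse_term`, `parse_factor`, `node`, `skip_blanks`) are hypothetical, and `assert` stands in for real error reporting.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

static exp parse_exp(const char **s);

static exp node(enum expKind kind) {
  exp e = calloc(1, sizeof(*e));
  assert(e != NULL);
  e->kind = kind;
  return e;
}

static void skip_blanks(const char **s) {
  while (isspace((unsigned char)**s)) (*s)++;
}

/* F ::= num | id | ( E ) */
static exp parse_factor(const char **s) {
  exp e;
  skip_blanks(s);
  if (**s == '(') {
    (*s)++;
    e = parse_exp(s);            /* action: pass the inner tree up unchanged */
    skip_blanks(s);
    assert(**s == ')');          /* sketch: a syntax error simply aborts */
    (*s)++;
  } else if (isdigit((unsigned char)**s)) {
    char *end;
    e = node(INT);               /* action: Int n */
    e->u.i = (int)strtol(*s, &end, 10);
    *s = end;
  } else {
    size_t len = 0;
    assert(isalpha((unsigned char)**s));
    while (isalnum((unsigned char)(*s)[len])) len++;
    e = node(ID);                /* action: Id name */
    e->u.id = malloc(len + 1);
    assert(e->u.id != NULL);
    memcpy(e->u.id, *s, len);
    e->u.id[len] = '\0';
    *s += len;
  }
  return e;
}

/* T ::= F { * F }, a loop instead of the left-recursive T ::= T * F */
static exp parse_term(const char **s) {
  exp left = parse_factor(s);
  for (skip_blanks(s); **s == '*'; skip_blanks(s)) {
    exp e = node(TIMES);         /* action: Times (e1, e2) */
    (*s)++;
    e->u.times.e1 = left;
    e->u.times.e2 = parse_factor(s);
    left = e;
  }
  return left;
}

/* E ::= T { + T } */
static exp parse_exp(const char **s) {
  exp left = parse_term(s);
  for (skip_blanks(s); **s == '+'; skip_blanks(s)) {
    exp e = node(ADD);           /* action: Add (e1, e2) */
    (*s)++;
    e->u.add.e1 = left;
    e->u.add.e2 = parse_term(s);
    left = e;
  }
  return left;
}

/* The value returned for the start symbol is the abstract syntax tree. */
exp parse(const char *text) {
  exp e = parse_exp(&text);
  skip_blanks(&text);
  assert(*text == '\0');
  return e;
}
```

Note that the parenthesis case creates no node at all, which is exactly how the concrete punctuation disappears from the abstract syntax.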

Source Position
In a one-pass compiler, error messages are precise, because an error is detected while the offending text is still being read; early compilers never had to worry about this.
But in a multi-pass compiler, source positions must be stored in the AST itself.

    (* Example *)
    type pos = …
    datatype exp
      = Int of int * pos
      | Id of string * pos
      | Add of exp * exp * pos
      | Times of exp * exp * pos

Source Position

    datatype exp
      = Id of string * pos
      | Num of int * pos
      | Add of exp * exp * pos
      | Times of exp * exp * pos

    %
    e -> e PLUS e   (Add (e1, e2, PLUSleft))
       | e TIMES e  (Times (e1, e2, TIMESleft))
       | ID         (Id (ID, IDleft))
       | NUM        (Num (NUM, NUMleft))
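To illustrate why a later pass needs the stored positions, here is a minimal sketch in C. It assumes a position is a plain character offset (the slides leave `pos` unspecified) and adds a field `at` to every node; the pass `check_ids`, which reports identifiers missing from a list of known names, is a hypothetical example and not part of the lab code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int pos;   /* assumption: a character offset into the source text */

typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  pos at;          /* where this construct appeared in the source */
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

/* A later pass over the finished tree: report every identifier that is
   not listed in known[0..n-1]. The parser is long gone by now, so the
   only way to say where the problem is comes from the stored position.
   Returns the number of errors found. */
int check_ids(exp e, const char *known[], int n) {
  switch (e->kind) {
  case INT:
    return 0;
  case ID:
    for (int i = 0; i < n; i++)
      if (strcmp(e->u.id, known[i]) == 0)
        return 0;
    fprintf(stderr, "offset %d: unknown identifier %s\n", e->at, e->u.id);
    return 1;
  case ADD:
    return check_ids(e->u.add.e1, known, n)
         + check_ids(e->u.add.e2, known, n);
  case TIMES:
    return check_ids(e->u.times.e1, known, n)
         + check_ids(e->u.times.e2, known, n);
  }
  assert(!"impossible");
  return 0;
}
```

In this sketch an operator node would carry the offset of its operator token, following the PLUSleft and TIMESleft convention of the slide.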

Labs
For lab #4, your job is to produce abstract syntax trees from source programs.
We've offered a code skeleton; first familiarize yourself with it.
Your job is to understand the "layout" function etc. and glue the parser by adding semantic actions.
Test your compiler carefully to make sure it parses the source programs correctly.

Summary
Abstract syntax trees are the compiler's internal representation of source programs. They form the interface between the front end and the later parts of the compiler.
The design of abstract syntax trees is language-dependent, and more art than science.