COMPILER CONSTRUCTION

Slides:



Advertisements
Similar presentations
Chapter 6 Intermediate Code Generation
Advertisements

Intermediate Code Generation
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
8. Code Generation. Generate executable code for a target machine that is a faithful representation of the semantics of the source code Depends not only.
Chapter 8 ICS 412. Code Generation Final phase of a compiler construction. It generates executable code for a target machine. A compiler may instead generate.
1 Compiler Construction Intermediate Code Generation.
Compiler Principle and Technology Prof. Dongming LU Mar. 28th, 2014.
Semantic analysis Parsing only verifies that the program consists of tokens arranged in a syntactically-valid combination, we now move on to semantic analysis,
Intermediate Representation I High-Level to Low-Level IR Translation EECS 483 – Lecture 17 University of Michigan Monday, November 6, 2006.
Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.
Slide 1 Chapter 3 Attribute Grammars. Slide 2 Attribute Grammars Certain language structures cannot be described using EBNF. Attribute grammars are extensions.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
COMPILER CONSTRUCTION
Compiler Principle and Technology Prof. Dongming LU Apr. 29th, 2015.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
LANGUAGE TRANSLATORS: WEEK 24 TRANSLATION TO ‘INTERMEDIATE’ CODE (overview) Labs this week: Tutorial Exercises on Code Generation.
The TINY sample language and it’s compiler
Compiler Chapter# 5 Intermediate code generation.
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
Introduction CPSC 388 Ellen Walker Hiram College.
Week 6(10.7): The TINY sample language and it ’ s compiler The TINY + extension of TINY Week 7 and 8(10.14 and 10.21): The lexical of TINY + Implement.
1 Compiler & its Phases Krishan Kumar Asstt. Prof. (CSE) BPRCE, Gohana.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Code Generation CPSC 388 Ellen Walker Hiram College.
Code Generation How to produce intermediate or target code.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
1 Structure of a Compiler Source Language Target Language Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer Target Code Generator.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 10 Ahmed Ezzat.
Review A program is… a set of instructions that tell a computer what to do. Programs can also be called… software. Hardware refers to… the physical components.
Compiler Principle and Technology Prof. Dongming LU Apr. 15th, 2015.
CS510 Compiler Lecture 1. Sources Lecture Notes Book 1 : “Compiler construction principles and practice”, Kenneth C. Louden. Book 2 : “Compilers Principles,
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Chapter 1 Introduction Samuel College of Computer Science & Technology Harbin Engineering University.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Lecture 9 Symbol Table and Attributed Grammars
CS 404 Introduction to Compiler Design
Compiler Design (40-414) Main Text Book:
Introduction Chapter : Introduction.
CS510 Compiler Lecture 1.
Introduction to Compiler Construction
System Software Unit-1 (Language Processors) A TOY Compiler
Intermediate code Jakub Yaghob
A Simple Syntax-Directed Translator
Chapter 3 Context-Free Grammar and Parsing
Compiler Construction (CS-636)
Introduction.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
6.001 SICP Compilation Context: special purpose vs. universal machines
An Attribute Grammar for Tiny
Compiler Lecture 1 CS510.
Compiler Construction
Intermediate Code Generation
Code Generation.
Intermediate Representations
Chapter 6 Intermediate-Code Generation
CSE401 Introduction to Compiler Construction
Intermediate Representations
Three-address code A more common representation is THREE-ADDRESS CODE . Three address code is close to assembly language, making machine code generation.
Compiler Design 21. Intermediate Code Generation
Compiler Construction
Intermediate Code Generation
Course Overview PART I: overview material PART II: inside a compiler
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Introduction Chapter : Introduction.
Compiler Design 21. Intermediate Code Generation
Compiler Construction
Code generation and data types
COMPILER CONSTRUCTION
Presentation transcript:

COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden

8. Code Generation PART ONE

Generate executable code for a target machine that is a faithful representation of the semantics of the source code Depends not only on the characteristics of the source language but also on detailed information about the target architecture, the structure of the runtime environment, and the operating system running on the target machine

Contents Part One 8.1 Intermediate Code and Data Structure for code Generation 8.2 Basic Code Generation Techniques Other Parts 8.3 Code Generation of Data Structure Reference 8.4 Code Generation of Control Statements and Logical Expression 8.5 Code Generation of Procedure and Function calls 8.6 Code Generation on Commercial Compilers: Two Case Studies 8.7 TM: A Simple Target Machine 8.8 A Code Generator for the TINY Language 8.9 A Survey of Code Optimization Techniques 8.10 Simple Optimizations for TINY Code Generator

8.1 Intermediate Code and Data Structures for Code Generation

8.1.1 Three-Address Code

A data structure that represents the source program during translation is called an intermediate representation, or IR, for short Such an intermediate representation that resembles target code is called intermediate code Intermediate code is particularly useful when the goal of the compiler is to produce extremely efficient code; Intermediate code can also be useful in making a compiler more easily retarget-able. Study two popular forms of intermediate code: Three -Address code and P-code The most basic instruction of three-address code is designed to represent the evaluation of arithmetic expressions and has the following general form: X=y op z

Figure 8.1 Sample TINY program: { sample program in TINY language -- computes factorial } read x ; { input an integer } if 0 > x then { don’t compute if x <= 0 } fact:=1; repeat fact:=fact*x; x:=x-1 until x=0; write fact { output factorial of x } ends

The Three-address codes for above TINY program read x t1=x>0 if_false t1 goto L1 fact=1 label L2 t2=fact*x fact=t2 t3=x-1 x=t3 t4= x= =0 if_false t4 goto L2 write fact label L1 halt

8.1.2 Data Structures for the Implementation of Three-Address Code

A different implementation of three-address code is called a triple: The most common implementation is to implement three-address code as quadruple, which means that four fields are necessary: One for the operation and three for the addresses A different implementation of three-address code is called a triple: Use the instructions themselves to represent the temporaries. It requires that each three-address instruction be reference-able, either as an index in an array or as a pointer in a linked list.

Quadruple implementation for the three-address code of the previous example (rd, x , _ , _ ) (gt, x, 0, t1 ) (if_f, t1, L1, _ ) (asn, 1,fact, _ ) (lab, L2, _ , _ ) (mul, fact, x, t2 ) (asn, t2, fact, _ ) (sub, x, 1, t3 ) (asn, t3, x, _ ) (eq, x, 0, t4 ) (if_f, t4, L2, _) (wri, fact, _, _ ) (lab, L1, _ , _ ) (halt, _, _, _ )

C code defining data structures for the quadruples Typedef enum { rd, gt, if_f, asn, lab, mul, sub, eq, wri, halt, …} OpKind; Typedef enum { Empty, IntConst, String } AddrKind; Typedef struct { AddrKind kind; Union { int val; char * name; } contents; } Address { OpKind op; Address addr1, addr2, addr3; } Quad

A representation of the three-address code of the previous example as triples (0) (rd, x , _ ) (1) (gt, x, 0) (2) (if_f, (1), (11) ) (3) (asn, 1,fact ) (4) (mul, fact, x) (5) (asn, (4), fact ) (6) (sub, x, 1 ) (7) (asn, (6), x) (8) (eq, x, 0 ) (9) (if_f, (8), (4)) (10) (wri, fact, _ ) (11) (halt, _, _)

8.1.3 P-Code

It was designed to be the actual code for a hypothetical stack machine, called the P-machine, for which an interpreter was written on various actual machines Since P-code was designed to be directly executable, it contains an implicit description of a particular runtime environment, including data sizes, as well as a great deal of information specific to the P-machines.

The P-machine consists of a code memory, an unspecified data memory for named variables, and a stack for temporary data, together with whatever registers are needed to maintain the stack and support execution 2*a+(b-3) ldc 2 ; load constant 2 lod a ; load value of variable a mpi ; integer multiplication lod b ; load value of variable b ldc 3 ; load constant 3 sbi ; integer subtraction adi ; integer addition

Comparison of P-Code to Three-Address Code P-code is in many respects closer to actual machine code than three-address code. P-code instructions also require fewer addresses: One the other hand, P-code is less compact than three-address code in terms of numbers of instructions, and P-code is not “self-contained” in that the instructions operate implicitly on a stack. 2) Implementation of P-Code Historically, P-code has largely been generated as a text file, but the previous descriptions of internal data structure implementations for three-address code will also work with appropriate modification for P-code.

8.2 Basic Code Generation Techniques

8.2.1 Intermediate Code or Target Code as a Synthesized Attribute

For an example: a small subset of C expressions: Intermediate code generation can be viewed as an attribute computation. This code becomes a synthesized attribute that can be defined using an attribute grammar and generated either directly during parsing or by a post-order traversal of the syntax tree. For an example: a small subset of C expressions: Exp  id = exp | aexp Aexp  aexp + factor | factor Factor  (exp) | num | id

The expression (x=x+3)+4 has the following P-Code attribute: Grammar Rule Semantic Rules Exp1 id = exp2 exp1.pcode=”lda”|| id.strval++exp2.pcode++”stn” Exp  aexp exp.pcode=aexp.pcode Aexp1  aexp2 + factor aexp1.pcode=aexp2.pcode++factor.pcode++”adi” Aexp  factor aexp.pcode=factor.pcode Factor  (exp) factor.pcode=exp.pcode Factor  num factor.pcode=”ldc”||num.strval Factor  id factor.pcode=”lod”||id.strval The expression (x=x+3)+4 has the following P-Code attribute: Lda x Lod x Ldc 3 Adi Stn Ldc 4

2) Three-Address Code T1=x+3 x=t1 t2=t1+4 Grammar Rule Semantic Rules Exp1 id = exp2 exp1.name=exp2.name Exp1.tacode=exp2.tacode++ id.strval||”=”||exp2.name Exp  aexp exp.name=aexp.name; exp.tacode=aexp.tacode Aexp1  aexp2 + factor aexp1.name=newtemp( ) aexp1.tacode=aexp2.tacode++factor.tacode++ aexp1.name||”=”||aexp2.name||”+”||factor.name Aexp  factor aexp.name=factor.name; aexp.tacode=factor.tacode Factor  (exp) factor.name=exp.name; factor.tacode=exp.tacode Factor  num factor.namenum.strval; factor.tacode=” “ Factor  id factor.namenum.strval; factor.tacode=” “ T1=x+3 x=t1 t2=t1+4

8.2.2 Practical Code Generation

The basic algorithm can be described as the following recursive procedure Procedure gencode (T: treenode); Begin If T is not nil then Generate code to prepare for code of left child of T; Gencode(left child of T); Generate code to prepare for code of right child of T; Gencode(right child of T); Generate code to implement the action of T; End;

Typedef enum {plus, assign} optype; Typedef enum {OpKind, ConstKind, IdKind} NodeKind; Typedef struct streenode { NodeKind kind; Optype op; /* used with OpKind */ Struct streenod *lchild, *rchild; Int val; /* used with ConstKind */ Char * strval; /* used for identifiers and numbers */ } streenode; typedef streenode *syntaxtree;

A genCode procedure to generate P-code. Void genCode (SyntaxTree t) { char codestr[CODESIZE]; /* CODESIZE = max length of 1 line of P-code */ if (t !=NULL) { switch (t->kind) { case opKind: switch(t->op) { case Plus: genCode(t->lchild); genCode(t->rchild); emitCode(“adi”); break;

case Assign: sprintf(codestr, “%s %s”, “lda”, t->strval); emitcode(codestr); genCode(t->lchild); emitCode(“stn”); break; } case ConstKind: sprintf(codestr,”%s %s”,”ldc”,t->strval); emitCode(codestr);

case IdKind: sprintf(codestr,”%s %s”,”lod”,t->strval); emitCode(codestr); break; default: emitCode(“error”); }

Yacc specification for the generation P-code according to the attribute grammar of Table 8.1 %{ #define YYSTYPE char * /* make Yacc use strings as values */ /* other inclusion code … */ %} %token NUM ID %% exp : ID { sprintf (codestr, “%s %s”, “lda”, $1); emitCode ( codestr); } ‘ = ‘ exp { emitCode (“stn”); }

| aexp ; aexp : aexp ‘+’ factor {emitCode(“adi”);} | factor factor : ‘(‘ exp ‘)’ | NUM {sprintf(codestr, “%s %s”,”ldc”,$1); emitCode(codestr);} | ID {sprintf(codestr,”%s %s”, “lod”,$1); %% /*utility functions… */

8.2.3 Generation of Target Code from Intermediate Code

Code generation from intermediate code involves either or both of two standard techniques: Macro expansion and Static simulation Macro expansion involves replacing each kind of intermediate code instruction with an equivalent sequence of target code instructions. Static simulation involves a straight-line simulation of the effects of the intermediate code and generating target code to match these effects.

Consider the expression (x=x+3) +4, translate the P-code into three-address code: Lad x Lod x Ldc 3 Adi t1=x+3 Stn x=t1 Ldc 4 Adi t2=t1+4 We perform a static simulation of the P-machine stack to find three-address equivalence for the given code

We now consider the case of translating from three-address code to P-code, by simple macro expansion. A three-address instruction: a = b + c Can always be translated into the P-code sequence lda a lod b lod c adi sto

Then, the three-address code for the expression (x=x+3)+4: T1 = x + 3 X = t1 T2 = t1 + 4 Can be translated into the following P-code: Lda t1 Lod x Ldc 3 Adi Sto Lad x Lod t1 Lda t2 Ldc 4

End of Part One THANKS