Introduction to Intermediate Code, virtual machine implementation

Presentation transcript:

Introduction to Intermediate Code, virtual machine implementation Lecture 27 Prof. Fateman CS 164 Lecture 27

Where do we go from here?
[Slide diagram; labels: testing, AST generation, Type checking, Cleanup, Interpreter, in, intermediate code / assembler, Virtual machine, out]

Details of Tiger → what the VM must support
Details of the VM → what IC to generate
[Slide diagram; labels: in, intermediate code / assembler, Virtual machine, out]

What is Intermediate Code?
No single answer…
Any encoding further from the source and closer to the machine (in our case, dropping line/column numbers is one step!)
Generally relatively portable
May support several languages (C, C++, Pascal, Fortran) where the resources needed are similar.
Languages with widely differing semantics may require different IC. E.g. Java supporting Scheme is hard.

Forms of Intermediate Code
Virtual stack machine
Simplified “macro” machine
3-address code: A := B op C
Register machine models with an unlimited number of registers, mapped to real registers later
Tree form (= lisp program, more or less)
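
To make the contrast between the tree form and 3-address code concrete, here is a minimal sketch that flattens a prefix tree into instructions of the shape dest := x op y, using as many temporaries as it likes (the "unlimited registers" idea). The function name tree->3addr and the temporaries T1, T2, … are hypothetical and not part of the course code.

```lisp
;; Hypothetical helper, not from the course code: flatten a prefix tree
;; such as (+ b (* c d)) into 3-address instructions (dest := x op y).
(defun tree->3addr (tree)
  "Return two values: the name holding TREE's value, and the instruction list."
  (let ((counter 0)
        (code '()))
    (labels ((walk (e)
               (if (atom e)
                   e                    ; variables and constants name themselves
                   (let* ((left  (walk (second e)))
                          (right (walk (third e)))
                          ;; a fresh temporary for every operator node
                          (temp  (intern (format nil "T~d" (incf counter)))))
                     (push (list temp ':= left (first e) right) code)
                     temp))))
      (let ((result (walk tree)))
        (values result (reverse code))))))
```

For example, (tree->3addr '(+ b (* c d))) yields the instructions (T1 := C * D) and (T2 := B + T1), with T2 holding the result. Mapping T1 and T2 onto real registers would be a later pass.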

Using Intermediate Code
We need to make some progress towards the machine we have in mind
Subject to manipulation, “optimization”
Perhaps several passes
Or, we can generate assembler
Or, we could plop down in absolute memory locations the binary instructions needed to execute the program: a “compile-and-go” system

In practice, given a tree representation like an AST
Instead of typechecking or interpreting the code, we traverse it and put together a list of instructions (generating code) that, if run on a machine, would execute the program.
In other words, instead of typechecking a loop, or going through a loop executing it, we write it out in assembler.

The simplest example
Compiling the Tiger program 3+4
The string “3+4” parses to
  (OpExp (1 . 2) + (IntExp (1 . 1) 3) (IntExp (1 . 3) 4))
The typechecker approves and says it is of type int.
The cleanup program produces (+ 3 4).
We now want to produce code.

Just a simple stack machine
Compiling (+ 3 4):
  (comp '(+ 3 4)) → ((pushi 3) (pushi 4) (+))
The result is a list of assembly language instructions:
  pushi 3: push immediate constant 3 on the stack
  pushi 4: push immediate constant 4 on the stack
  +: take 2 args off the stack and push their sum on the stack
Conventions: the result is left on top of the stack. The location of these instructions is unspecified.
Lengthy notes in the ic file describe the virtual machine.
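
The slide shows only the output of comp. As a rough sketch of how such a code generator could work for integer constants and binary operators, assuming cleaned-up expressions of the form (op left right): the name comp-sketch is hypothetical, and the course's real comp handles far more of Tiger.

```lisp
;; A sketch only: handles integer constants and binary operators.
(defun comp-sketch (exp)
  "Compile a cleaned-up arithmetic expression into a list of stack instructions."
  (cond ((integerp exp)
         (list (list 'pushi exp)))              ; constant: push it
        ((consp exp)                            ; (op left right)
         (append (comp-sketch (second exp))     ; code that leaves left on the stack
                 (comp-sketch (third exp))      ; then right on top of it
                 (list (list (first exp)))))    ; then the operator itself
        (t (error "comp-sketch cannot handle ~s" exp))))
```

(comp-sketch '(+ 3 4)) returns ((PUSHI 3) (PUSHI 4) (+)). Because each subexpression leaves exactly one value on the stack, nested expressions such as (+ 3 (* 4 5)) need no extra bookkeeping.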

It doesn’t have to look like lisp if we write a printing program
  (show-comp '(+ 3 4)) ;; prints out…
  pushi 3
  pushi 4
  +
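
A printing routine of this kind can be very small. The following is a sketch, not the course's show-comp: the name show-code-sketch is hypothetical, and it takes an already compiled instruction list rather than an expression.

```lisp
;; Hypothetical printer: one instruction per line, lowercase, no parentheses.
(defun show-code-sketch (code &optional (stream *standard-output*))
  "Print each instruction in CODE on its own line as: opcode arg …"
  (dolist (instr code)
    ;; ~( … ~) lowercases; ~{ … ~} walks the opcode and its arguments
    (format stream "~(~{~a~^ ~}~)~%" instr)))
```

Calling (show-code-sketch '((pushi 3) (pushi 4) (+))) prints the three lines pushi 3, pushi 4 and +.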

We need to have interacting programs, so let’s make functions
  ;; We are shuffling around fns all the time in this compiler
  (defstruct fn name params code env)
  (top-comp1 '(+ 3 4))
  #S(fn :name main :params nil
        :code ((args 0) (pushi 3) (pushi 4) (+) (exit 0))
        :env (#(…)))
  (show-fn *) ;; prints this out
  args 0
  pushi 3
  pushi 4
  +
  exit 0
  nil ;; return value
Conventions: (args 0) means no params. Exit is the opcode that returns to the supervisor level of the simulator. It is not a return from a subroutine.

How do we convert to machine code?
  (setf r (top-comp1 '(+ 3 4)))
  #S(fn :name main :params nil
        :code ((args 0) (pushi 3) (pushi 4) (+) (exit 0))
        :env (#(…)))
The assembler does this… it makes the list into a vector/array.
  (assemble r) ;; MODIFIES :code part of r
  :code #((args 0) (pushi 3) (pushi 4) (+) (exit 0))
We have ignored :env. It is a pointer to an array of global functions… print, chr, ord, flush, …

A slightly more interesting example
  (setf r (top-comp1 '(IfExp 1 "yes" "no")))
  #S(fn :name main :params nil …..)
  (show-fn r) ;; slightly more sophisticated show-fn; byte counts
  0: args 0
  4: pushi 1
  8: jumpz L0003
  12: pusha ="yes"
  16: jump L0004
  L0003:
  24: pusha ="no"
  L0004:
  32: exit 0
  nil
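
The jumpz / jump pattern in this listing is the standard way to lay out a conditional. A sketch of how a compiler might stitch it together from already compiled pieces follows. The names gen-label-sketch and comp-if-sketch and the counter *label-count* are hypothetical, and the label numbering will not match the course compiler's.

```lisp
(defvar *label-count* 0
  "Counter used to make each generated label unique (hypothetical).")

(defun gen-label-sketch ()
  "Return a fresh label symbol such as L0001."
  (intern (format nil "L~4,'0d" (incf *label-count*))))

(defun comp-if-sketch (test-code then-code else-code)
  "Lay out a conditional: test, jumpz to else, then, jump to end, else, end."
  (let ((else-label (gen-label-sketch))
        (end-label  (gen-label-sketch)))
    (append test-code
            (list (list 'jumpz else-label))        ; skip the then-part if test is 0
            then-code
            (list (list 'jump end-label)           ; skip over the else-part
                  else-label)
            else-code
            (list end-label))))
```

Labels appear in the instruction list as bare symbols, which is exactly what the assembler's label-p test on the later slides relies on.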

That example, assembled, has no labels
  (assemble r)
  #S(fn :name main :params nil …..)
  (show-fn r)
  0: args 0
  4: pushi 1
  8: jumpz 20
  12: pusha ="yes"
  16: jump 24
  20: pusha ="no"
  24: exit 0
  nil

The 2-pass assembler… (50 lines of lisp)
  (in-package :tiger)
  (defun label-p (x) (symbolp x)) ; is x a label?
  (defun opcode (instr) (if (label-p instr) :label (first instr)))
  (defun args (instr) (if (listp instr) (rest instr)))
  (defun arg1 (instr) (if (listp instr) (second instr)))
  (defun arg2 (instr) (if (listp instr) (third instr)))
  (defsetf arg1 (instr) (val) `(setf (second ,instr) ,val))
  ;; defsetf method makes (setf (arg1 x) foo)
  ;; mean the same as (setf (second x) foo)
  ;; we want to change arg1 in (opcode arg1 …)

The 2-pass assembler…
  (defun assemble (fn)
    "Turn a list of instructions in a function into a vector."
    ;; count up the instructions and keep track of the labels
    ;; make a note of where the labels are (program counter
    ;; relative to start of function). Create a vector of instructions.
    ;; The transformed "assembled" program can support fast
    ;; jumps forward and back.
    (multiple-value-bind (length labels) ; extract 2 items from 1st pass
        (asm-first-pass (fn-code fn))
      (setf (fn-code fn) ;; put vector back instead of list
            (asm-second-pass (fn-code fn) length labels))
      fn))

The 1st pass counts up locations
  (defun asm-first-pass (code)
    ;; return 2 values: the total code length (in instructions)
    ;; and the labels assoc list (label . byte offset)
    (let ((length 0) (labels nil))
      (dolist (instr code)
        (if (label-p instr)
            (push (cons instr length) labels)
            ;; assume instructions are 4 bytes
            (incf length 4)))
      (values (/ length 4) labels)))

The 2nd pass resolves labels, makes vector
  (defun asm-second-pass (code length labels)
    ;; Put code into code-vector, adjusting for labels.
    ;; For each instruction that refers to a label,
    ;; put the position in for the label.
    (let ((addr 0) (code-vector (make-array length)))
      (dolist (instr code)
        (unless (label-p instr)
          (if (is instr '(jump jumpn jumpz save))
              ;; a setf method that changes label to number
              (setf (arg1 instr) (cdr (assoc (arg1 instr) labels))))
          (setf (aref code-vector addr) instr)
          (incf addr)))
      code-vector)) ; returns a vector of code!
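
Since the slide code depends on helpers that are not shown (is, the fn structure), here is a condensed, self-contained sketch of the same two-pass idea that can be run on its own. The name resolve-labels-sketch is hypothetical. Unlike the slide version, it builds new jump instructions rather than modifying the originals in place.

```lisp
(defun resolve-labels-sketch (code)
  "Two passes over CODE: record label byte offsets, then build a vector
in which jump targets are replaced by those offsets."
  (let ((offset 0)
        (table '()))
    ;; Pass 1: a bare symbol is a label; every instruction takes 4 bytes.
    (dolist (instr code)
      (if (symbolp instr)
          (push (cons instr offset) table)
          (incf offset 4)))
    ;; Pass 2: copy the instructions into a vector, patching label operands.
    (let ((vec (make-array (/ offset 4)))
          (index 0))
      (dolist (instr code vec)
        (unless (symbolp instr)
          (setf (aref vec index)
                (if (member (first instr) '(jump jumpn jumpz save))
                    (list (first instr)
                          (cdr (assoc (second instr) table)))
                    instr))
          (incf index))))))
```

Applied to the IfExp listing from the earlier slide, L0003 resolves to byte offset 20 and L0004 to 24, which reproduces the assembled form shown there.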

The Tiger Virtual Machine
The easy parts

It is a stack-based architecture simulated by a lisp program
The machine is started up running a particular function, f.
The program counter in that function is set to 0.
In general the environment will look like a stack of arrays. This is a peculiarly lispish way of doing things, and could be done in other, more efficient ways.
On calling a new function, a “macro” operation ARGS is used to copy arguments from the stack to the environment. This makes calling rather pricey. However, it makes return very neat.
n-args is the number of args supplied to this function.
instr is a temp variable.
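
What ARGS does can be pictured with a short sketch. This is an illustration under assumptions, not the course's implementation: args-sketch is a hypothetical name, the environment is taken to be a list of arrays with the newest frame first, and arguments are assumed to have been pushed left to right so the last one is on top.

```lisp
(defun args-sketch (n stack env)
  "Move the top N stack entries into a fresh frame.
Return two values: the remaining stack and the extended environment."
  (let ((frame (make-array n)))
    ;; The last argument is on top of the stack, so fill the frame backwards.
    (loop for i from (1- n) downto 0
          do (setf (aref frame i) (pop stack)))
    (values stack (cons frame env))))
```

For example, with the stack (4 3 99) and n = 2, the new frame is #(3 4) and the stack left behind is (99). Returning is then neat because dropping the frame restores the caller's environment.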

We set up an update-program-counter / execute loop
  (defun machine (f)
    (let* ((code (fn-code f)) ; an array of instructions
           (pc 0)
           (env (fn-env f))
           (stack nil)
           (n-args 0) ; main program has no arguments
           (instr nil))
      (loop
        ;; pick out the instruction at the given pc
        (setf instr (elt code (ash pc -2))) ; pc is a byte address; shifting right by 2 divides by 4
        (if *debug-pc*
            (format t "~%Executing ~s at pc ~s; code is ~s" (fn-name f) pc instr))
        (incf pc 4) ;; we assume 4 bytes per instruction
        (case (opcode instr)
          ;; ALL THE OPCODES GO HERE

Sample Opcodes
  (defun machine (f)
    ;;; … stuff left out
    (case (opcode instr)
      ;; Variable/stack manipulation instructions:
      (lvar ; (lvar arg1 arg2)
       ;; take the value at offset arg2 of the environment
       ;; array at offset arg1, and push it on the computation stack
       (push (elt (elt env (arg1 instr)) (arg2 instr)) stack))
      (lset
       (setf (elt (elt env (arg1 instr)) (arg2 instr)) (top stack)))
      (pop (pop stack))….
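
The two-level indexing used by lvar and lset is easier to see in isolation. In this sketch the names lvar-sketch and lset-sketch are hypothetical, and the environment is assumed to be a list of arrays in which frame 0 is the innermost one.

```lisp
(defun lvar-sketch (env frame slot)
  "Value at SLOT of the FRAME-th array in ENV."
  (elt (elt env frame) slot))

(defun lset-sketch (env frame slot value)
  "Store VALUE at SLOT of the FRAME-th array in ENV; return VALUE."
  (setf (elt (elt env frame) slot) value))
```

With env bound to (list (vector 10 20) (vector 'print 'chr)), (lvar-sketch env 0 1) is 20 and (lvar-sketch env 1 0) is PRINT. In the machine itself the fetched value is pushed on the stack, and lset stores the top of the stack without popping it.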

Architecture for arithmetic is based on underlying lisp arithmetic, e.g. + maps to #’+
  (defun machine (f)
    ;;; … stuff left out
    (case (opcode instr)
      ;; … more stuff left out
      ;; arithmetic: use defn from the interpreter
      ((+ - * / < > <= >= = =)
       (setf stack (cons (funcall (cadr (assoc (opcode instr) tig-init-env))
                                  (second stack) (first stack))
                         (cddr stack))))
      ;; Other: exit provides exit code..
      ((exit nil) (return (top stack)))
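
Putting the fetch / increment / dispatch loop together with a handful of opcodes gives a machine small enough to run by hand. This is a sketch under assumptions, not the course's machine: run-sketch is a hypothetical name, ARGS is a no-op, there is no environment, arithmetic calls the lisp function of the same name directly instead of looking it up in tig-init-env, and jumpz is assumed to pop the value it tests.

```lisp
(defun run-sketch (code)
  "Execute a vector of assembled instructions; return the top of stack at EXIT."
  (let ((pc 0)
        (stack '()))
    (loop
      (let ((instr (aref code (ash pc -2))))   ; byte address / 4 = vector index
        (incf pc 4)                             ; 4 bytes per instruction
        (case (first instr)
          (args nil)                            ; argument copying omitted here
          ((pushi pusha) (push (second instr) stack))
          ((+ - *)
           (let ((right (pop stack))
                 (left  (pop stack)))
             ;; reuse the lisp function with the same name as the opcode
             (push (funcall (first instr) left right) stack)))
          (jump  (setf pc (second instr)))
          (jumpz (when (eql (pop stack) 0)      ; assumption: the test value is popped
                   (setf pc (second instr))))
          (exit  (return (first stack)))
          (t (error "unknown opcode in ~s" instr)))))))
```

Running the assembled (+ 3 4) program, (run-sketch (vector '(args 0) '(pushi 3) '(pushi 4) '(+) '(exit 0))), returns 7, and the assembled IfExp example returns "yes".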

Machine + assembler in assemble.cl
250 lines of code, including comments
  (defun run (exp) ;; exp is (cleanup ast)
    "compile and run tiger code"
    (machine (assemble (top-comp1 exp))))