Implementing an Interpreter Dan Sugalski

Slides:



Advertisements
Similar presentations
Chapt.2 Machine Architecture Impact of languages –Support – faster, more secure Primitive Operations –e.g. nested subroutine calls »Subroutines implemented.
Advertisements

Chapter 16 Java Virtual Machine. To compile a java program in Simple.java, enter javac Simple.java javac outputs Simple.class, a file that contains bytecode.
Fabián E. Bustamante, Spring 2007 Machine-Level Programming II: Control Flow Today Condition codes Control flow structures Next time Procedures.
Instruction Set Design
Goal: Write Programs in Assembly
1 Lecture 10 Intermediate Representations. 2 front end »produces an intermediate representation (IR) for the program. optimizer »transforms the code in.
Dr. Ken Hoganson, © August 2014 Programming in R COURSE NOTES 2 Hoganson Language Translation.
1 Lecture 4: Procedure Calls Today’s topics:  Procedure calls  Large constants  The compilation process Reminder: Assignment 1 is due on Thursday.
Lecture 8: MIPS Instruction Set
Computer Organization and Architecture
1 Starting a Program The 4 stages that take a C++ program (or any high-level programming language) and execute it in internal memory are: Compiler - C++
Parrot in a nutshell1 Parrot in a Nutshell Dan Sugalski
Computer Organization and Architecture
Gary MarsdenSlide 1University of Cape Town Statements & Expressions Gary Marsden Semester 2 – 2000.
CPSC Compiler Tutorial 8 Code Generator (unoptimized)
Chapter 9 Subprogram Control Consider program as a tree- –Each parent calls (transfers control to) child –Parent resumes when child completes –Copy rule.
Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.
Chapter 3 Assembly Language: Part 1. Machine language program (in hex notation) from Chapter 2.
CS 106 Introduction to Computer Science I 02 / 12 / 2007 Instructor: Michael Eckmann.
1. 2 FUNCTION INLINE FUNCTION DIFFERENCE BETWEEN FUNCTION AND INLINE FUNCTION CONCLUSION 3.
Guide To UNIX Using Linux Third Edition
1 Further OO Concepts II – Java Program at run-time Overview l Steps in Executing a Java Program. l Loading l Linking l Initialization l Creation of Objects.
COMP 14: Intro. to Intro. to Programming May 23, 2000 Nick Vallidis.
Inline Function. 2 Expanded in a line when it is invoked Ie compiler replace the function call with function code To make a function inline the function.
CSc 453 Interpreters & Interpretation Saumya Debray The University of Arizona Tucson.
CH12 CPU Structure and Function
Assembly & Machine Languages
Computer Science 210 Computer Organization The Instruction Execution Cycle.
Language Systems Chapter FourModern Programming Languages 1.
CIS Computer Programming Logic
MIPS coding. SPIM Some links can be found such as:
Instruction Set Architecture
JIT in webkit. What’s JIT See time_compilation for more info. time_compilation.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode.
Programming With C.
Java Programming, Second Edition Chapter One Creating Your First Java Program.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
April 23, 2001Systems Architecture I1 Systems Architecture I (CS ) Lecture 9: Assemblers, Linkers, and Loaders * Jeremy R. Johnson Mon. April 23,
Execution of an instruction
LANGUAGE SYSTEMS Chapter Four Modern Programming Languages 1.
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
Programming Languages
12/8/2015\course\cpeg323-07Fs\Topic2b-323.ppt1 Topic 2b High-Level languages and System Software (Languages) Introduction to Computer Systems Engineering.
Parrot in a nutshell1 Parrot in a Nutshell Dan Sugalski
Code Generation CPSC 388 Ellen Walker Hiram College.
Language Implementation Overview John Keyser Spring 2016.
LECTURE 3 Translation. PROCESS MEMORY There are four general areas of memory in a process. The text area contains the instructions for the application.
LLVM IR, File - Praakrit Pradhan. Overview The LLVM bitcode has essentially two things A bitstream container format Encoding of LLVM IR.
WCET 2007, Pisa, page 1 of 27 Tid rum WCET'2007 Analysing Switch-Case Tables by Partial Evaluation Niklas Holsti Tidorum Ltd
Design issues for Object-Oriented Languages
Lecture 3 Translation.
Assembly language.
Computer Architecture Instruction Set Architecture
Compiler Construction (CS-636)
Naming and Binding A computer system is merely a bunch of resources that are glued together with names Thus, much of system design is merely making choices.
The University of Adelaide, School of Computer Science
Computer Science 210 Computer Organization
Computer Science 210 Computer Organization
Lecture 4: MIPS Instruction Set
CSc 453 Interpreters & Interpretation
Lesson Objectives Aims Key Words Compiler, interpreter, assembler
Instruction Set Architectures Continued
Computer Organization and Design Assembly & Compilation
ECE 352 Digital System Fundamentals
Program Execution in Linux
Program Execution.
CSc 453 Interpreters & Interpretation
Design of the Parrot Virtual Machine The OO-ish bits
Dynamic Binary Translators and Instrumenters
Presentation transcript:

Implementing an Interpreter Dan Sugalski

Or: Going fast without writing code Dan Sugalski June 17, 2004

Basic Parrot Overview Bytecode-driven, register-based virtual machine Written in C with some platform- specific assembly Lots of platform and situation-specific code

The rest of the talk in a nutshell Powerful text processing is your friend Domain-specific languages are very handy Writing compilers happens when you least expect it Make friends with perl, Parse::RecDescent, Text::Balanced, or their equivalents

The program in memory Two main types: Graph walking Bytecode interpretation Perl & Ruby walk graphs Python, Parrot, & Z-Machine interpret bytecode

Program as graph Print “foo” A = A + 1 A < 10 True path False Path

Bunch of connected nodes Function pointer Parameter pointer (maybe) Next node pointer Back pointer (for conditionals) The odd flag and whatnot

Graphs are handy Cheap to build Generally a mildly cleaned up parse tree Don’t freeze to disk too well, though

Program as bytecode print “foo” add a, a, 1 lt a, 10, -2

Just a bunch of words Instruction words Parameters (inline constants, registers (maybe), constant table offsets) Branch offsets Absolute addresses (occasionally)

Bytecode’s handy Freezes to disk well Conceptually identical to machine code Bit more expensive to generate, though

Lots of ways to walk regardless Direct function calls Indirect function calls Big switch statement TIL translation Computed gotos JITting Translating to C

Direct function calls Function pointer embedded in instruction stream Often used with graph walkers Very rarely used with bytecode walking Simple loop: Fetch pointer Call function through pointer Get next pointer

Indirect function calls Look function up in a table Common for bytecode walkers, rare for graph walkers Simple loop: Fetch function number Look function up Call function Get next function number

Switch statement Like indirect functions, without the calling All function bodies in big switch statement Simple loop: Fetch function number Hit switch statement Get next function number

TIL code generation Preprocess code to build up executable Two stages. First: Look up function pointer Add call function code to current code block Lather, rinse, repeat Next, jump into newly built executable code

Computed Goto Like indirect function calls, without the function overhead Like the switch statement, without the switch Less simple loop: Fetch function number Look up destination address in table goto address GCC-specific

JITting Big table of code segments (or code macros), one per operation Build up chunk of executable code Jump into code Essentially compiling without bothering with object files

Translating to C Like JIT, only substitute opcode body source instead of machine code Generates a C file Compile, link, execute

Advantages to each Function calls are easy and overridable Switch is pretty fast TIL is faster, though platform dependent Computed gotos are fast but GCC dependent JITting is a lot of work Translating to C can be a pain, plus adds an extra compile step

What does Parrot do? Basically all of them From one set of sources Each core style has its advantages Parrot also pre-expands ops

Simple Op: Addition add I2, I4, 6 or I2 = add I4, 6 or I2 = I4 + 6

Simple Op: Addition inline op add(out INT, in INT, in INT) :base_core { $1 = $2 + $3; goto NEXT(); }

Op rules: out parameter mean new data in destination register in parameter mean incoming register or constant inout parameter mean change to register Previous add has four permutations: Register = (register|const) + (register|const)

Generated: Function Parrot_add_i_i_i (opcode_t *cur_opcode, Interp * interpreter) { #line 164 "ops/math.ops" IREG(1) = IREG(2) + IREG(3); return (opcode_t *)cur_opcode + 4; }

Generated: Switch case 464: /* Parrot_pred_add_i_i_i */ case 465: /* Parrot_pred_add_i_ic_i */ case 466: /* Parrot_pred_add_i_i_ic */ case 467: /* Parrot_pred_add_i_ic_ic */ { #line 164 "ops/math.ops" (*(INTVAL *)cur_opcode[1]) = (*(INTVAL *)cur_opcode[2]) + (*(INTVAL *)cur_opcode[3]); { cur_opcode += 4; goto SWITCH_AGAIN; }; }

Generated: Computed goto PC_464: /* Parrot_add_i_i_i */ { #line 164 "ops/math.ops" IREG(1) = IREG(2) + IREG(3); goto *core_cg_ops_addr[*(cur_opcode += 4)]; }

Generated: JIT Well, sort of First version of JIT compiled core C source to assembly Then parsed out the.s file and autogenerated the JIT pieces New JIT combo of hand-rolled assembly and TIL

Generated: C source IREG(2) = IREG(4) + 6;

Can add extra things too Bounds checking Automatic event checking Sanity assertions

Multiple op loops too Loops determine what happens between ops Tracing, bounds checking, profiling, quota checking Again, autogenerated Most deployed parrots will have two or three (Fastest, Safe, and trace/debug)

PMCs Like ops, heavily preprocessed PMC class files define Parent class for simple static single inheritance PMC properties Vtable functions Default MMD functions

PMC source structure C source pmclass Name properties { Vtable functions Default MMD functions } POD format docs interspersed in C comments

PMC preprocessor Generates.h file Generates.c file with: C source Post-processed vtable & MMD function source Vtable construction and registration code MMD function registration

Before preprocessing void set_number_native(FLOATVAL value) { PMC_int_val(SELF) = value; }

After preprocessing void Parrot_Integer_set_number_native(Parrot_Interp interpreter, PMC* pmc, FLOATVAL value) { PMC_int_val(pmc) = value; }

Preprocessor provides INTERP, SELF, and SUPER macros Parameter massaging (most implied) Function validation Name mangling Ability to do wholesale structural changes with no source changes

Lessons learned? Got me People shouldn’t write boilerplate code Only write what you really mean Get source as close to level of design as possible It’s not indecision, it’s flexibility! Heed your inner sloth

Questions? ?