1 Lecture 5 Implementing Coroutines regexes with coroutines, compile AST to bytecode, bytecode interpreter Ras Bodik Mangpo and Ali Hack Your Language!

Slides:



Advertisements
Similar presentations
Type Checking, Inference, & Elaboration CS153: Compilers Greg Morrisett.
Advertisements

Lecture 08a – Backpatching & Recap Eran Yahav 1 Reference: Dragon 6.2,6.3,6.4,6.6.
1 Compiler Construction Intermediate Code Generation.
1 Programming Languages (CS 550) Lecture Summary Functional Programming and Operational Semantics for Scheme Jeremy R. Johnson.
Intermediate Representation I High-Level to Low-Level IR Translation EECS 483 – Lecture 17 University of Michigan Monday, November 6, 2006.
ISBN Chapter 10 Implementing Subprograms.
1 Storage Registers vs. memory Access to registers is much faster than access to memory Goal: store as much data as possible in registers Limitations/considerations:
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
Fundamentals of Computer Science Lecture 14: Recursion Instructor: Evan Korth New York University.
Chapter 2: Algorithm Discovery and Design
CS 536 Spring Code generation I Lecture 20.
Cse321, Programming Languages and Compilers 1 6/19/2015 Lecture #18, March 14, 2007 Syntax directed translations, Meanings of programs, Rules for writing.
Run time vs. Compile time
Environments and Evaluation
4/6/08Prof. Hilfinger CS164 Lecture 291 Code Generation Lecture 29 (based on slides by R. Bodik)
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
Slide 1 Vitaly Shmatikov CS 345 Scope and Activation Records.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
1 Lecture 4 Building Control Abstractions with Coroutines iterators, tables, comprehensions, coroutines, lazy iterators, composing iterators Ras Bodik.
ITEC 352 Lecture 11 ISA - CPU. ISA (2) Review Questions? HW 2 due on Friday ISA –Machine language –Buses –Memory.
Fundamentals of Python: From First Programs Through Data Structures Chapter 14 Linear Collections: Stacks.
Prof. Bodik CS 164 Lecture 51 Building a Parser I CS164 3:30-5:00 TT 10 Evans.
COP4020 Programming Languages
Stacks & Recursion. Stack pushpop LIFO list - only top element is visible top.
Functions Part I (Syntax). What is a function? A function is a set of statements which is split off into a separate entity that can be used like a “new.
1 Lecture 5 Implementing Coroutines compile AST to bytecode, bytecode interpreter Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction.
CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode.
Compiler course 1. Introduction. Outline Scope of the course Disciplines involved in it Abstract view for a compiler Front-end and back-end tasks Modules.
224 3/30/98 CSE 143 Recursion [Sections 6.1, ]
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
Formal Semantics Chapter Twenty-ThreeModern Programming Languages, 2nd ed.1.
CS 153: Concepts of Compiler Design October 5 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
Activation Records (in Tiger) CS 471 October 24, 2007.
1.  10% Assignments/ class participation  10% Pop Quizzes  05% Attendance  25% Mid Term  50% Final Term 2.
Topic #1: Introduction EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 0401 Debugging Hints and Programming Quirks for Java by John C. Ramirez University of Pittsburgh.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 11: Functions and stack frames.
The Interpreter Pattern (Behavioral) ©SoftMoore ConsultingSlide 1.
CS412/413 Introduction to Compilers Radu Rugina Lecture 13 : Static Semantics 18 Feb 02.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
BY: JAKE TENBERG & CHELSEA SHIPP PROJECT REVIEW: JGIBBERISH.
Prof. Necula CS 164 Lecture 171 Operational Semantics of Cool ICOM 4029 Lecture 10.
1 Lecture 14 Data Abstraction Objects, inheritance, prototypes Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Operational Semantics of Scheme
Chapter 14 Functions.
Advanced Computer Systems
Compiler Design (40-414) Main Text Book:
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Interpreters Study Semantics of Programming Languages through interpreters (Executable Specifications) cs7100(Prasad) L8Interp.
CS 536 / Fall 2017 Introduction to programming languages and compilers
Introduction to C++ Recursion
Emily Leland (Not Nick) Spring 2017
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
The Metacircular Evaluator
CSE401 Introduction to Compiler Construction
The Metacircular Evaluator (Continued)
Adapted from slides by Nicholas Shahan and Dan Grossman
6.001 SICP Explicit-control evaluator
6.001 SICP Variations on a Scheme
Adapted from slides by Nicholas Shahan, Dan Grossman, and Tam Dang
Nicholas Shahan Spring 2016
6.001 SICP Interpretation Parts of an interpreter
Assignments and Procs w/Params
Rehearsal: Lazy Evaluation Infinite Streams in our lazy evaluator
Presentation transcript:

1 Lecture 5 Implementing Coroutines regexes with coroutines, compile AST to bytecode, bytecode interpreter Ras Bodik Mangpo and Ali Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2013 UC Berkeley

Administrativia Exam dates: midterm 1: March 19 (80 min, during lecture) midterm 2: May 2 (80 min, during last lecture) final exam (poster and demo): May 15, Wed, 11:30am, the Woz, (3 hours) 2

What you will learn today Regexes with coroutines -backtracking in regex matching is concise w/ coroutines Implement Asymmetric Coroutines –why a recursive interpreter with implicit stack won’t do Compiling AST to bytecode -this is your first compiler; compiles AST to “flat” code Bytecode Interpreter -bytecode can be interpreted without recursion -hence no need to keep interpreter state on the call stack 3

PA2 PA2 was released today, due Monday –bytecode interpreter of coroutines –after PA2, you will be able to implement iterators –in PA3, you will build Prolog on top of your coroutines Before PA2, you may want some c/r exercises 1)lazy list concatenation 2)regexes Solve at least the lazy list problem before you start on PA2 To find links to these problems, look for slides titled Test Yourself. 4

“Honors” cs164? I am looking for a few Über-hacker teams to do a variant of the cs164 project. Not really a harder variant, just one for which we don’t have a full reference implementation. In PA2: instead of an coroutine interpreter, you’d build a compiler. Another, simpler variant: implement cs164 project in JS, not Python. 5

Review of Coroutines 6

Review of L4: Why Coroutines Loop calls iterator function to get the next item eg the next token from the input stream The iterator maintains state between two such calls eg pointer to the input stream. Maintenance of that state may be difficult see the permutation iterator in L4 See also Python justification for coroutinesjustification called generators in Python 7

Review of L4: Three coroutine constructs co = coroutine(body)lambda --> handle creates a coroutine whose body is the argument lambda, returns a handle to the coroutine, which is ready to run yv = resume(co, rv)handle x value --> value resumes the execution into the coroutine co, passing rv to co’s yield expression (rv becomes the value of yield) rv = yield(yv)value --> value transfers control to the coroutine that resumed into the current coroutine, passing yv to resume 8

Example f(1,2) def f(x,y) { cg=coroutine(g) resume(cg,x,y) } 9 def g(x,y) { ck=coroutine(k) h(x,y) def h(a,b) { 3 + yield(resume(ck,a,b)) } def k(x,y) { yield(x+y) } The call f(1,2) evaluates to 3.

Corner-case contest Identify cases for which we haven’t defined behavior: 1) yield executed in the main program let’s define main as a coroutine; yield at the top level thus behaves as exit() 2) resuming to itself * illegal; can only resume to coroutines waiting in yield statements or at the beginning of their body 3) return statement the (implicit) return statement yields back to resumer and terminates the coroutine *exercise: test your PA2 on such a program 10

States of a coroutine How many states do we want to distinguish? suspended, running, terminated Why do we want to distinguish them? to perform error checking at runtime: -do not resume into a running coroutine -do not resume into a terminated coroutine 11

The timeline cg=coroutine(g) v=resume(cg,1,2) resume continues print 12 def g(x,y) { coroutine(k) resume(ck,a,b) resume continues yield } def k(x,y){ yield(x+y) }

Are coroutines like calls? Which of resume, yield behaves like a call? resume, for two reasons: -like in regular call, control is guaranteed to return to resume (unless the program runs into an infinite loop) -we can specify the target of the call (via corou. handle) yield is more like return: -no guarantee that the coroutine will be resumed (eg, when a loop decides it need not iterate further) -yield cannot control where it returns (always returns to its resumer’s resume expression) 13

Test yourself Implement lazy list concatenation (link to the excercise)link to the excercise 14

Implement regular expression with c/r (abc|de)f parses into what AST? which we compile into this Lua code match("def", seq(alt(prim("abc"),prim("de")),prim("f"))) 15

Overview of c/r-based regex matching The nested function calls in the expression seq(alt(prim("abc"), prim("de")), prim("f")) create a closure that is the body of a coroutine C. C will be resumed into by the matcher’s top loop with the goal of iterating over all possible ways in which the pattern (abc|de)f matches the input “def”. The matcher loop will stop when a full match (from beginning to the end) has been found. For more details, see section 5.3 in Revisiting Coroutines.Revisiting Coroutines 16

Regexes with coroutines match("def", seq(alt(prim("abc"),prim("de")),prim("f"))) def match(S, patt) { // wrap turns a lambda into c/r iterator def m=coroutine.wrap(lambda(){ patt(S,0) }) for (pos in m) { // for all matches if (pos==len(S)) { // matched all input? return true } } return false } 17

prim -- matching a string literal (primitive goal) def prim(str) { // this lmbda will be the body of c/r lambda(S,pos) { deflen = len(str) // does str match S[pos:pos+len]? if (sub(S,pos,pos+len-1)==str) { yield(pos+len) } } } 18

Let’s implement alt What we came up with in the lecture: def alt(patt1,patt2) { lambda(S,pos) { def m = coroutine.wrap(lambda(){ patt1(S,pos) }) for (pos in m) { // for all matches yield pos } m = coroutine.wrap(lambda(){ patt2(S,pos) }) for (pos in m) { // for all matches yield pos } } } The code on next slide is semantically equivalent (and simpler). 19

alt -- alternative patterns (disjunction) def alt(patt1,patt2) { lambda(S,pos) { patt1(S,pos) patt2(S,pos) } 20

seq -- sequence of sub-patterns (conjunction) def seq(patt1,patt2) { lambda(S,pos) { def btpoint=coroutine.wrap( lambda(){ patt1(S,pos) } ) for npos in btpoint { patt2(S,npos) } 21

Language design question (1) Since resume behaves like a call, do we need to introduce a separate construct to resume into a coroutine? Could we just do this? co = coroutine(…) // creates a coroutine co(arg) // resume to the coroutine Yes, we could. But we will keep resume for clarity. Compare: in Python generators, resume is a method call.next() in a generator object. Do you like the fact that resume appears to be a call? 22

Language design question (2) In 164/Lua, a coroutine is created with corutine(lam). In Python, a coroutine is created by a call to function that syntactically contains a yield: def fib(): # function fib is a generator/corou a, b = 0, 1 while 1: yield b # … because it contains a yield a, b = b, a+b it = fib() # create a coroutine print it.next(); # resume to it with.next() print it.next(); print it.next() 23

Language design question (2, cont’d) Python creates coroutine by calling a function with yield? How does it impact programming? What if the function with yield needs to call itself recursively, as in permgen from Lecture 4? def permgen(a,n) {... yield(a)... permgen(a,n-1)... } def permutations(a) { def co = coroutine( permgen ) lambda () { resume(co, a) } } You should know a workaround from this limitation That is, how would you write permgen in Python? See how Python writes recursive tree generatorsrecursive tree generators 24

Exercise on Lua vs. Python A recursive Lua Fibonacci iterator: def fib(a,b) { yield b fib(b,a+b) } def fibIterator() { def co = coroutine( fib ) lambda () { resume(co, 0, 1) } } Rewrite it into a recursive Python generator. 25

Symmetric vs. asymmetric coroutines Symmetric: one construct (yield) –as opposed to resume and yield –yield(co,v): has the power to transfer to any coroutine –all transfers happen with yield (no need for resume) Asymmetric: –resume transfers from resumer (master) to corou (slave) –yield(v) always returns to its resumer 26

A language sufficient to explain coroutines Language with functions of two arguments: P ::= D*, E sequence of declarations followed by an expression D ::= def f(ID,ID) { P } E ::= n | ID(E,E) | def ID = coroutine(ID) | resume(ID,E,E) | yield(E) 27

Implementation notes Stackless Python is a controversial rethinking of the Python core (PEP 255)PEP

Summary Coroutines support backtracking. In match, backtracking happens in seq, where we pick a match for patt1 and try to extend it with a match for patt2. If we fail, we backtrack and pick the next match for patt1, and again see if it can be extended with a match for patt2. 29

Test yourself Draw the state of our coroutine interpreter (link to an excercise)link to an excercise 30

Why a recursive interpreter cannot implement (deep) coroutines 31

The plan We will attempt to extend PA1 to support c/r treating resume/yield as call/return This extension will fail, motivating a non-recursive interpreter architecture This failure should be unsurprising, but may clarify the contrast between PA1 and PA2 32

Pseudocode of PA1 recursive interpreter def eval(node) { switch (node.op) { case ‘n’: // integer lieteral return node.val case ‘call’: // E(E,E) def fn = eval(node.fun) def t1 = eval(node.arg1) def t2 = eval(node.arg2) return call(fn, t1, t2) } def call(fun, v1, v2) { // set up the new environment, mapping arg vals to params eval(body of fun, new_env) } 33

The recursive PA1 interpreter Program is represented as an AST –evaluated by walking the AST bottom-up State (context) of the interpreter: -control: what AST nodes to evaluate next (the “PC”) -data: intermediate values (arguments to operators, calls) Where is this state maintained? -control: on interpreter’s calls stack -data: in interpreter’s variables (eg, t1, t2) 34

An attempt for a recursive corou. interpreter def eval(node) { switch (node.op) { case ‘resume’: // resume E, E evaluate the expression that gives c/r hadle -- OK def co = eval(n.arg1) def t1 = eval(n.arg2) now resume to c/r by evaluating its body (resumes in the right point only on first resume to co) call(co.body, t1) case ‘yield’: // yield E return eval(node.arg) oops: this returns to eval() in the same cr, rather than to the resumer c/r as was desired } } 35

Draw env and call stacks at this point f(1,2) def f(x,y) { cg=coroutine(g) resume(cg,1,2) } 36 def g(x,y) { ck=coroutine(k) h(x,y) def h(a,b) { yield(m(a,b)+resume(ck,a,b)) } def k(x,y) { yield(x+y) } Draw the environment frames and the calls stacks in the interpreter: The interpreter's call stack contains the recursive eval() calls. - the control context (what code to execute next) The interpreter's environment has variables t1 and t2 (see prev slide). - the data context (values of intermediate values)

Summary A recursive PA1-like interpreter cannot yield because it maintains its control stack on the host call stack If it exits when yielding, it loses this context this exit would need to pop all the way from recursion The interpreter for a coroutine c must return –when c encounters a yield The interpreter will be called again –when some coroutine invokes resume(c, …) 37

The plan 38

Must re-architect the interpreter We need to create a non-recursive interpreter -non-recursive ==> yield can become a return to resumer all the context stored in separate data structures -explicit stack data structure is OK 39

The plan Linearize the AST so that the control context can be kept as a program counter (PC) a stack of these PCs will be used to implement call/return Store intermediate values in program’s variables in PA1, they were in the interpreter’s variables These two goals are achieved by translating the AST into register-based bytecode 40

AST-to-bytecode compiler 41

AST vs. bytecode Abstract Syntax Tree: values of sub-expressions do not have explicit names Bytecode: –list of instructions of the form x = y + z –x, y, z can be temporary variables invented by compiler –temporaries store values that would be in the variables of a recursive interpreter Example: AST (x+z)*y translates to bytecode: $1 = x+z // $1, $2: temp vars $2 = $1*y // return value of AST is in $2 42

Compile AST to bytecode Traverse AST, emiting code rather than evaluating when the generated code is executed, it evaluates the AST (see L3 for discussion of “non-standard” evaluation) What’s the type of translation function b? b : AST -> (Code, RegName) The function produces code c and register name r s.t.: when c is executed, the value of tree resides in register r 43

Compile AST to bytecode (cont) def b(tree): switch tree.op: +:(c1,r1) = b(t.left) (c2,r2) = b(t.right) def r = freshReg() def instr = “%s = %s + %s\n” % (r, r1, r2) return (c1 + c2 + instr, r) n: complete this so that you can correctly compile AST =: complete this so that you can compile AST x=1+3+4; y=x+2 44

Summary (1) Bytecode compiler: two tasks -translates AST into flat code (aka straight-line code) -benefit: interpreter does not need to remember which part of the tree to evaluate next Our bytecode is a “register-based bytecode” -it stores intermediate values in registers -these are virtual, not machine registers -we implement them as local variables We had seen stack-based bytecode -intermediate values were on data stack -see “Compiler” in Lecture 3 45

Summary (2) Bytecode translation recursively linearizes the AST -it tells the sub-translations in which temp variable the generated code should store the result -alternatively, the sub-translation picks a fresh temp variable and returns its name along with generated code 46

Test yourself Complete the bytecode copiler on slide

Bytecode interpreter 48

Bytecode interpreter Data (ie, environments for lexical scoping): same as in PA1 Control: coroutine: each c/r is a separate interpreter -these (resumable) interpreters share data environments -each has own control context (call stack + next PC) more on control in two slides … first, let’s examine the typical b/c interpreter loop it’s a loop, not a recursive call 49

The bytecode interpreter loop Note: this loop does not yet support resume/yield def eval(bytecode_array) { // program to be executed pc = 0 while (true) { curr_instr = bytecode_array[pc] switch (curr_instr.op) { case ‘+’: … // do + on args (as in PA1) pc++ // jump to next bc instr … } 50

Control, continued call: push PC to call stack set PC to the start of the target function return: pops the PC from call stack sets PC to it 51

Control, continued yield v: return from interpreter passing v to resume -PC after yield remains stored in c/r control context resume c, v: restart the interpreter of c pass v to c 52

Discussion Not quite a non-recursive interpreter while calls are handled with the interpreter loop, each resume recursively calls an interpreter instance What do we need to store in the c/r handle? -a pointer to an interpreter instance, which stores -the call stack -the next PC 53

Summary (1) Control context: -index into a bytecode array Call / return: do not recursively create a sub-interpeter -call: add the return address to the call stack -return: jump to return address popped from call stack 54

Summary (2) Create a coroutine: create a sub-interpreter -initialize its control context and environment -so that first resume start evaluating the coroutine -the coroutine is suspended, waiting for resume Resume / yield -resume: restarts the suspended coroutine -yield: store the context, go to suspend mode, return yield value 55

Test yourself Extend the bytecode interpreter on slide 50 so that it supports yield and resume. Hint: Note that this interpreter always starts at pc=0. Change eval so that it can start at any pc. 56

Reading Required: see the reading for Lecture 4 Recommended: Implementation of Python generators Fun: continuations via continuation passing style 57

Summary Implement Asymmetric Coroutines –why a recursive interpreter with implicit stack won’t do Compiling AST to bytecode -btw, compilation to assembly is pretty much the same Bytecode Interpreter -can exit without losing its state 58