Implementation of the Python Bytecode Compiler
Jeremy Hylton, Google

What to expect from this talk
Intended for developers
Explain key data structures and control flow
Lots of code on slides

The New Bytecode Compiler
Rewrote the compiler from scratch for 2.5
  – Emphasizes modularity
  – Work was almost done for Python 2.4
  – Still uses the original parser, pgen
Traditional compiler abstractions
  – Abstract Syntax Tree (AST)
  – Basic blocks
Goals
  – Ease maintenance, extensibility
  – Expose AST to Python programs

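Compiler Architecture
Source text → Tokenizer → Tokens → Parser → Parse tree → AST converter → AST → Code generator → Blocks → Assembler → Peephole optimizer → bytecode
The __future__ features and the symbol table are computed from the AST and feed the code generator.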
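
For readers who want to poke at these stages, modern CPython exposes most of them from the standard library. The sketch below assumes Python 3.8 or later: `ast.parse` covers tokenizing, parsing, and AST conversion, and `compile` runs the remaining stages on the AST. The source string is made up for illustration, and today's opcodes differ from the 2.5 listings in this talk.

```python
import ast
import dis
import types

source = "y = x * 2\n"

# Tokenizer, parser, and AST converter: source text -> AST.
tree = ast.parse(source, "<pipeline>", "exec")

# Symbol table, code generator, assembler, optimizer: AST -> code object.
code = compile(tree, "<pipeline>", "exec")
print(isinstance(tree, ast.Module), isinstance(code, types.CodeType))

# Final bytecode; the opcode names vary between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```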

Compiler Organization (lines of code)
compile.c                  4,200
  – infrastructure           700
  – code generator         2,400
  – assembler                500
  – peephole optimizer       600
asdl.c, .h                  <100
pyarena.c                    100
future.c                     100
ast.c                      3,000
symtable.c                 1,400
Python-ast.c, .h           1,900 (generated)
Total                     10,800

Tokenize, Parse, AST
Simple, hand-coded tokenizer
  – Synthesizes INDENT and DEDENT tokens
pgen: parser generator
  – Input in Grammar/Grammar
  – Extended LL(1) grammar
AST conversion
  – Collapses the parse tree into abstract form
  – Future: extend pgen to generate the AST directly
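
The standard-library `tokenize` module is not the compiler's hand-coded C tokenizer, but it emits the same synthesized INDENT and DEDENT tokens, so it is a convenient way to see them. A small sketch, assuming Python 3; `token_names` is a hypothetical helper:

```python
import io
import tokenize
from typing import List


def token_names(source: str) -> List[str]:
    """Return the names of the token types produced for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# The indented line opens a block (INDENT); returning to column 0 closes it (DEDENT).
names = token_names("if x:\n    y = 1\nz = 2\n")
print(names)
```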

Grammar vs. Abstract Syntax
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | …
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
test: and_test ('or' and_test)* | lambdef
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
stmt = For(expr target, expr iter, stmt* body, stmt* orelse)
     | If(expr test, stmt* body, stmt* orelse)
     | …
expr = BinOp(expr left, operator op, expr right)
     | Compare(expr left, cmpop* ops, expr* comparators)
     | Call(expr func, expr* args, keyword* keywords, expr? starargs, expr? kwargs)
     | …
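
The abstract grammar survives in Python 3's `ast` module, where each constructor becomes a class whose `_fields` follow the ASDL declaration. A quick check, assuming Python 3.8 or later. Two differences from the 2.5 definitions above: `Call` lost `starargs` and `kwargs` in 3.5 (they are now ordinary `Starred` arguments and keywords), and `For` gained a trailing `type_comment` field in 3.8.

```python
import ast

# Each ASDL constructor is an ast.AST subclass; _fields lists its children in order.
for node_type in (ast.For, ast.If, ast.BinOp, ast.Compare, ast.Call):
    print(node_type.__name__, node_type._fields)
```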

AST node types
Modules (mod)
Statements (stmt)
Expressions (expr)
  – Expressions allowed on the LHS have a context slot
Extras
  – Slots, comprehension, excepthandler, arguments
  – Operator types
FunctionDef is complex
  – Children in two namespaces
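
The context slot is visible from Python: the same `Name` node type appears with a `Store` context as an assignment target and a `Load` context when read. This sketch assumes Python 3.8 or later and uses a made-up one-line module:

```python
import ast

module = ast.parse("x = y\n")
assign = module.body[0]

# The assignment target carries a Store context; the value being read is a Load.
target_ctx = type(assign.targets[0].ctx).__name__
value_ctx = type(assign.value.ctx).__name__
print(target_ctx, value_ctx)

# The top-level categories from the slide are abstract base classes.
print(isinstance(module, ast.mod), isinstance(assign, ast.stmt),
      isinstance(assign.value, ast.expr))
```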

Example Code
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)

Concrete Syntax Example
(if_stmt, (1, 'if'), (test, (and_test, (not_test, (comparison, (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (1, 'x')))))))))), (comp_op, (21, '>')), (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (2, '5')))))))))))))), (11, ':'), …

Abstract Syntax Example
For(Name('x', Store),
    Call(Name('range', Load), [Num(10)]),
    [If(Compare(Name('x', Load), [Gt], [Num(5)]),
        [Expr(Call(Attribute(Name('L', Load), 'append', Load),
                   [BinOp(Name('x', Load), Mult, Num(2))]))],
        [Expr(Call(Attribute(Name('L', Load), 'append', Load),
                   [BinOp(Name('x', Load), Add, Num(2))]))])],
    [])
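
Parsing the example with Python 3's `ast` module confirms the shape of the tree. Current versions spell literals as `Constant` rather than `Num`, so `ast.dump` output will not match the listing character for character; the sketch below assumes Python 3.8 or later.

```python
import ast

SOURCE = """\
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)
"""

module = ast.parse(SOURCE)
loop = module.body[1]      # body[0] is the assignment L = []
branch = loop.body[0]      # the if statement inside the loop

print(ast.dump(loop.target))
print(type(branch.test.ops[0]).__name__, ast.dump(branch.body[0]))
```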

Our Goal: Bytecode
  2           0 BUILD_LIST               0
              3 STORE_FAST               1 (L)
  3           6 SETUP_LOOP              71 (to 80)
              9 LOAD_GLOBAL              1 (range)
             12 LOAD_CONST               1 (10)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                57 (to 79)
             22 STORE_FAST               0 (x)
  4          25 LOAD_FAST                0 (x)
             28 LOAD_CONST               2 (5)
             31 COMPARE_OP               4 (>)
             34 JUMP_IF_FALSE           21 (to 58)
             37 POP_TOP
  5          38 LOAD_FAST                1 (L)
             41 LOAD_ATTR                3 (append)
             44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (2)
             50 BINARY_MULTIPLY
             51 CALL_FUNCTION            1
             54 POP_TOP
             55 JUMP_ABSOLUTE           19
        >>   58 POP_TOP
  7          59 LOAD_FAST                1 (L)
             62 LOAD_ATTR                3 (append)
             65 LOAD_FAST                0 (x)
             68 LOAD_CONST               3 (2)
             71 BINARY_ADD
             72 CALL_FUNCTION            1
             75 POP_TOP
             76 JUMP_ABSOLUTE           19
        >>   79 POP_BLOCK
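
Disassembling the same code on a current interpreter shows how much the bytecode has changed since 2.5 (for example, `SETUP_LOOP` was removed in Python 3.8), while the overall shape of build list, get iterator, and loop with `FOR_ITER` remains. The sketch wraps the code in a hypothetical function `f` so that `L` and `x` become fast locals, as in the listing above; it assumes Python 3.8 or later.

```python
import dis


def f():
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)


dis.dis(f)  # exact opcodes and offsets depend on the Python version

instructions = list(dis.get_instructions(f))
opnames = {ins.opname for ins in instructions}
print(sorted(opnames & {"BUILD_LIST", "GET_ITER", "FOR_ITER", "SETUP_LOOP"}))
```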

Strategy for Compilation
Module-wide analysis
  – Check future statements
  – Build symbol table
      For each variable: is it local, global, or free?
  – Makes two passes over the block structure
Compile one function at a time
  – Generate basic blocks
  – Assemble bytecode
  – Optimize generated code (out of order)
  – Code object stored in the parent's constant pool
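
The last point can be observed directly: compiling a module produces one code object whose constants contain the code object of each top-level function, and each of those holds the code objects of its own nested functions. A sketch assuming Python 3; `nested_code` is a hypothetical helper and the source is made up.

```python
import types

SOURCE = """\
def outer():
    def inner():
        return None
    return inner
"""


def nested_code(parent: types.CodeType, name: str) -> types.CodeType:
    """Return the code object called `name` from `parent`'s constant pool."""
    for const in parent.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)


module_code = compile(SOURCE, "<strategy>", "exec")
outer_code = nested_code(module_code, "outer")  # stored in the module's constants
inner_code = nested_code(outer_code, "inner")   # stored in outer's constants, not the module's
print(outer_code.co_name, inner_code.co_name)
```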

Symbol Table
Collect basic facts about symbols and blocks
  – Variables assigned and used; parameters, global statements
  – Check for import *, unqualified exec, yield
  – Other tricky details
Identify free and cell variables in a second pass
  – Parent passes bound names down
  – Child passes free variables up
  – Implicit vs. explicit global variables
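
Python's `symtable` module exposes the results of this analysis. The sketch below, assuming Python 3, uses a made-up source string to show a variable that is local to the parent and free in the child, plus the explicit versus implicit global distinction; `child` is a hypothetical helper.

```python
import symtable

SOURCE = """\
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder

total = 0

def bump():
    global total
    total = total + len([])
"""

top = symtable.symtable(SOURCE, "<demo>", "exec")


def child(table: symtable.SymbolTable, name: str) -> symtable.SymbolTable:
    """Return the nested block (function or class) called `name`."""
    return next(t for t in table.get_children() if t.get_name() == name)


make_adder = child(top, "make_adder")
adder = child(make_adder, "adder")
bump = child(top, "bump")

# Bound in make_adder, used in adder: local to the parent, free in the child.
print(make_adder.lookup("x").is_local(), adder.lookup("x").is_free())
print(adder.get_frees())

# Explicit global (global statement) vs. implicit global (never bound locally).
print(bump.lookup("total").is_declared_global())
print(bump.lookup("len").is_global(), bump.lookup("len").is_declared_global())
```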

Name operations
Five different load-name opcodes
  – LOAD_FAST: array access for function locals
  – LOAD_GLOBAL: dict lookups for globals, builtins
  – LOAD_NAME: dict lookups for locals, globals, builtins
  – LOAD_DEREF: load free variable
  – LOAD_CLOSURE: loads cells to make closure
Cells
  – Separate allocation for mutable variable
  – Stored in flat closure list
  – Separately garbage collected
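
The choice of opcode can be checked with `dis`. The sketch below, assuming Python 3.8 or later and a made-up module, collects the `LOAD_*` opcodes used for each name with a hypothetical helper `load_opnames`. `LOAD_CLOSURE` is left out because its encoding has changed in recent releases, and newer versions may report a `LOAD_FAST` variant rather than plain `LOAD_FAST`.

```python
import dis

SOURCE = """\
counter = 0

def outer(a):
    b = a
    def inner():
        return b + counter
    return inner

snapshot = counter
"""


def load_opnames(code_or_func, name):
    """Names of the LOAD_* opcodes that load `name` in the given code."""
    return {ins.opname for ins in dis.get_instructions(code_or_func)
            if ins.opname.startswith("LOAD") and ins.argval == name}


module_code = compile(SOURCE, "<names>", "exec")
namespace = {}
exec(module_code, namespace)
outer = namespace["outer"]
inner = outer(1)

print(load_opnames(module_code, "counter"))  # module level: dict lookup
print(load_opnames(outer, "a"))              # function local: array slot
print(load_opnames(inner, "b"))              # free variable: read through a cell
print(load_opnames(inner, "counter"))        # global referenced inside a function
```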

Class namespaces
class Spam:
    id = id(1)
  1           0 LOAD_GLOBAL              0 (__name__)
              3 STORE_NAME               1 (__module__)
  2           6 LOAD_NAME                2 (id)
              9 LOAD_CONST               1 (1)
             12 CALL_FUNCTION            1
             15 STORE_NAME               2 (id)
             18 LOAD_LOCALS
             19 RETURN_VALUE
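
The same behavior holds in Python 3: the class body is compiled as a separate code object whose names go through `LOAD_NAME`/`STORE_NAME`, so `id` on the right-hand side still finds the builtin. A sketch assuming Python 3.8 or later:

```python
import dis

SOURCE = """\
class Spam:
    id = id(1)
"""

module_code = compile(SOURCE, "<class>", "exec")
# The class body is compiled to its own code object in the module's constants.
body = next(c for c in module_code.co_consts
            if getattr(c, "co_name", None) == "Spam")

# Opcodes that touch the name `id` inside the class body.
id_ops = [ins.opname for ins in dis.get_instructions(body) if ins.argval == "id"]
print(id_ops)

# At run time LOAD_NAME falls through to the builtin id() because the class
# attribute of the same name does not exist yet.
namespace = {}
exec(module_code, namespace)
print(isinstance(namespace["Spam"].id, int))
```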

Closures
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder
Disassembly of make_adder:
  2           0 LOAD_FAST                0 (n)
              3 STORE_DEREF              0 (x)
  3           6 LOAD_CLOSURE             0 (x)
              9 LOAD_CONST               1 (<code object adder>)
             12 MAKE_CLOSURE             0
             15 STORE_FAST               2 (adder)
  5          18 LOAD_FAST                2 (adder)
             21 RETURN_VALUE
Disassembly of adder:
  4           0 LOAD_DEREF               0 (x)
              3 LOAD_FAST                0 (y)
              6 BINARY_ADD
              7 RETURN_VALUE
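
The cells are visible from Python: the enclosing function's code lists `x` in `co_cellvars`, the inner function lists it in `co_freevars`, and the returned function carries the cell objects in `__closure__`. A sketch assuming Python 3, reusing the example above:

```python
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder


add10 = make_adder(10)

# x is a cell variable of make_adder and a free variable of adder.
print(make_adder.__code__.co_cellvars, add10.__code__.co_freevars)

# The closure tuple holds one cell per free variable; the cell wraps the value.
print(add10.__closure__[0].cell_contents, add10(5))
```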

Code generation input
Discriminated unions
  – One for each AST type
  – Struct for each option
  – Constructor functions
Literals
  – Stored as PyObject*
  – Parsed during the AST pass
Identifiers
  – Also PyObject*
  – A string
typedef struct _stmt *stmt_ty;
struct _stmt {
    enum { ..., For_kind=8, While_kind=9, If_kind=10, ... } kind;
    union {
        struct {
            expr_ty target;
            expr_ty iter;
            asdl_seq *body;
            asdl_seq *orelse;
        } For;
        struct {
            expr_ty test;
            asdl_seq *body;
            asdl_seq *orelse;
        } If;
    } v;
    int lineno;
};
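
The effect of parsing literals in the AST pass is visible from Python: by the time the tree exists, `0x1F` is already the integer 31 and adjacent string literals have been joined, while identifiers are plain `str` objects. A sketch assuming Python 3.8 or later, where literals are `ast.Constant` nodes; the variable names are made up.

```python
import ast

module = ast.parse("mask = 0x1F\ngreeting = 'hel' 'lo'\n")
mask_assign, greeting_assign = module.body

# Literals: the token text has already been converted to Python objects.
print(repr(mask_assign.value.value), repr(greeting_assign.value.value))

# Identifiers: plain str objects stored on the Name node.
name = mask_assign.targets[0].id
print(type(name).__name__, name)
```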

Code generation output
Basic blocks
  – Start with a jump target
  – End at a jump
  – A function is a graph of blocks
Instructions
  – Opcode + argument
  – Jump targets are pointers
Helper functions
  – Create new blocks
  – Add an instruction to the current block
struct instr {
    unsigned char i_opcode;
    int i_oparg;
    struct basicblock_ *i_target;
    int i_lineno;
    // plus some one-bit flags
};
struct basicblock_ {
    int b_iused;
    int b_ialloc;
    struct instr *b_instr;
    struct basicblock_ *b_next;
    int b_startdepth;
    int b_offset;
    // several details elided
};
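
CPython creates blocks directly as it emits code, so it never has to split a flat instruction stream. To make the block boundaries concrete, the sketch below runs the problem in reverse with the textbook leader-based partition, a standard technique rather than code from compile.c: a block starts at the first instruction, at every jump target, and right after every jump (returns are ignored, matching the slide's definition). The `(offset, opname, target)` tuples are a hypothetical stand-in for `struct instr`, with targets as offsets rather than block pointers, and the listing is made up in the shape an if/else might take with 2.5 offsets.

```python
from typing import List, Optional, Tuple

# Hypothetical flat listing: (offset, opname, jump target offset or None).
Instr = Tuple[int, str, Optional[int]]


def split_blocks(code: List[Instr]) -> List[List[Instr]]:
    """Partition a flat instruction list into basic blocks using leaders."""
    leaders = {code[0][0]} if code else set()
    for i, (_offset, _opname, target) in enumerate(code):
        if target is not None:
            leaders.add(target)              # a jump target starts a block
            if i + 1 < len(code):
                leaders.add(code[i + 1][0])  # so does whatever follows a jump
    blocks: List[List[Instr]] = []
    for instr in code:
        if instr[0] in leaders:
            blocks.append([])
        blocks[-1].append(instr)
    return blocks


listing = [
    (0, "LOAD_NAME", None), (3, "JUMP_IF_FALSE", 17), (6, "POP_TOP", None),
    (7, "LOAD_NAME", None), (10, "CALL_FUNCTION", None), (13, "POP_TOP", None),
    (14, "JUMP_FORWARD", 25), (17, "POP_TOP", None), (18, "LOAD_NAME", None),
    (21, "CALL_FUNCTION", None), (24, "POP_TOP", None), (25, "LOAD_CONST", None),
    (28, "RETURN_VALUE", None),
]

blocks = split_blocks(listing)
print([block[0][0] for block in blocks])
```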

Code generation
One visitor function for each AST type
  – Switch on kind enum
  – Emit bytecodes
  – Return immediately on error
Heavy use of C macros
  – ADDOP(), ADDOP_JREL(), …
  – VISIT(), VISIT_SEQ(), …
  – Hides control flow (the error returns happen inside the macros)
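
The visitor-per-node-type pattern maps naturally onto Python's `ast.NodeVisitor`, which dispatches on the node class much as the C code switches on `kind`. The sketch below is a hypothetical toy, assuming Python 3.8 or later, that compiles arithmetic expressions into stack-machine instructions named after the 2.5 opcodes in this talk; `generic_visit` raises so that unsupported nodes fail immediately.

```python
import ast
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[object]]

# Opcode names borrowed from the Python 2.5 listings in this talk.
BINARY_OPS = {ast.Add: "BINARY_ADD", ast.Sub: "BINARY_SUBTRACT", ast.Mult: "BINARY_MULTIPLY"}


class ExprCodegen(ast.NodeVisitor):
    """Toy stack-machine code generator with one visit_* method per node type."""

    def __init__(self) -> None:
        self.code: List[Instr] = []

    def visit_Constant(self, node: ast.Constant) -> None:
        self.code.append(("LOAD_CONST", node.value))

    def visit_Name(self, node: ast.Name) -> None:
        self.code.append(("LOAD_NAME", node.id))

    def visit_BinOp(self, node: ast.BinOp) -> None:
        # Post-order: push both operands, then apply the operator.
        self.visit(node.left)
        self.visit(node.right)
        opname = BINARY_OPS.get(type(node.op))
        if opname is None:
            raise NotImplementedError(type(node.op).__name__)
        self.code.append((opname, None))

    def generic_visit(self, node: ast.AST) -> None:
        # Any node type without a visit_* method is an error, not a silent skip.
        raise NotImplementedError(type(node).__name__)


def compile_expr(source: str) -> List[Instr]:
    gen = ExprCodegen()
    gen.visit(ast.parse(source, mode="eval").body)
    return gen.code


print(compile_expr("x * 2 + 1"))
```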

Code generation example
static int
compiler_if(struct compiler *c, stmt_ty s)
{
    basicblock *end, *next;
    if (!(end = compiler_new_block(c)))
        return 0;
    if (!(next = compiler_new_block(c)))
        return 0;
    VISIT(c, expr, s->v.If.test);
    ADDOP_JREL(c, JUMP_IF_FALSE, next);
    ADDOP(c, POP_TOP);
    VISIT_SEQ(c, stmt, s->v.If.body);
    ADDOP_JREL(c, JUMP_FORWARD, end);
    compiler_use_next_block(c, next);
    ADDOP(c, POP_TOP);
    if (s->v.If.orelse)
        VISIT_SEQ(c, stmt, s->v.If.orelse);
    compiler_use_next_block(c, end);
    return 1;
}
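
The `POP_TOP` after each branch exists because the 2.5 `JUMP_IF_FALSE` left the test value on the stack. Python 2.7 and 3.1 added popping conditional jumps such as `POP_JUMP_IF_FALSE`, so current CPython no longer needs that extra instruction. A quick check, assuming Python 3.8 or later; the exact jump opcode name varies between 3.x releases, so the sketch only looks for the `POP_JUMP` prefix.

```python
import dis

code = compile("if x:\n    a()\nelse:\n    b()\n", "<if>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Modern versions use a conditional jump that pops the test value itself
# (POP_JUMP_IF_FALSE, or a variant such as POP_JUMP_FORWARD_IF_FALSE in 3.11).
cond_jumps = [op for op in opnames if op.startswith("POP_JUMP")]
print(cond_jumps)
```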

Assembler
Lots of fiddly details
  – Linearize code
  – Compute stack space needed
  – Compute line number table (lnotab)
  – Compute jump offsets
  – Call PyCode_New()
Peephole optimizer
  – Integrated at the wrong end of the assembler
  – Constant folding, simplify jumps
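
Two of the assembler's jobs, linearizing blocks and computing jump offsets, fit in a short sketch. It assumes the 2.5 encoding (one byte per opcode plus two for an argument), lays blocks out in list order, and ignores `EXTENDED_ARG`, line numbers, and stack depth; the block labels and instruction tuples are hypothetical. Relative jumps count from the end of the jump instruction, which matches the bytecode listing earlier in the talk: `SETUP_LOOP 71` at offset 6 lands at 6 + 3 + 71 = 80. The final lines check that current CPython still folds constants, though modern versions do it outside the peephole pass.

```python
from typing import Dict, List, Tuple, Union

# A hypothetical instruction: (opname, arg). For jumps, arg is a block label.
Arg = Union[int, str, None]
Instr = Tuple[str, Arg]

RELATIVE_JUMPS = {"JUMP_IF_FALSE", "JUMP_FORWARD", "FOR_ITER", "SETUP_LOOP"}
ABSOLUTE_JUMPS = {"JUMP_ABSOLUTE"}


def instr_size(arg: Arg) -> int:
    """2.5-style encoding: 1-byte opcode, plus a 2-byte argument if present."""
    return 1 if arg is None else 3


def assemble(blocks: List[Tuple[str, List[Instr]]]) -> List[Tuple[int, str, Arg]]:
    """Linearize labeled blocks in list order and resolve jump arguments."""
    # Pass 1: compute the byte offset at which each block starts.
    block_offset: Dict[str, int] = {}
    offset = 0
    for label, instrs in blocks:
        block_offset[label] = offset
        offset += sum(instr_size(arg) for _op, arg in instrs)

    # Pass 2: emit (offset, opname, arg), replacing labels with numeric arguments.
    out: List[Tuple[int, str, Arg]] = []
    offset = 0
    for _label, instrs in blocks:
        for opname, arg in instrs:
            size = instr_size(arg)
            if opname in RELATIVE_JUMPS:
                arg = block_offset[arg] - (offset + size)  # relative to the next instruction
            elif opname in ABSOLUTE_JUMPS:
                arg = block_offset[arg]
            out.append((offset, opname, arg))
            offset += size
    return out


# Blocks in the shape compiler_if produces for: if x: a()  else: b()
blocks = [
    ("entry", [("LOAD_NAME", "x"), ("JUMP_IF_FALSE", "next"), ("POP_TOP", None),
               ("LOAD_NAME", "a"), ("CALL_FUNCTION", 0), ("POP_TOP", None),
               ("JUMP_FORWARD", "end")]),
    ("next", [("POP_TOP", None), ("LOAD_NAME", "b"), ("CALL_FUNCTION", 0),
              ("POP_TOP", None)]),
    ("end", [("LOAD_CONST", 0), ("RETURN_VALUE", None)]),
]

for offset, opname, arg in assemble(blocks):
    print(offset, opname, "" if arg is None else arg)

# Modern CPython still folds constant expressions at compile time.
folded = compile("seconds = 60 * 60 * 24", "<fold>", "exec")
print(86400 in folded.co_consts)
```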

AST transformation
Expose AST to Python programmers
  – Simplify analysis of programs
  – Generate code from modified AST
Example:
  – Implement with statement as AST transform
Ongoing work
  – BOF this afternoon at 3:15, Preston Trail
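
A toy version of this workflow, assuming Python 3.8 or later: an `ast.NodeTransformer` (here a hypothetical `AddToMult`) rewrites the tree, `ast.fix_missing_locations` fills in positions for any new nodes, and `compile` generates code from the modified AST.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Toy transform: rewrite every `a + b` into `a * b`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("(1 + 2) + 4", mode="eval")
tree = ast.fix_missing_locations(AddToMult().visit(tree))
code = compile(tree, "<transformed>", "eval")
print(eval(code))  # prints 8, i.e. (1 * 2) * 4
```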

Loose ends
compiler package
  – Should revise to support new AST types
  – Tricky compatibility issue
Revise pgen to generate AST directly
Develop toolkit for AST transforms
Extend analysis, e.g. PEP 267