National Energy Research Scientific Computing Center (NERSC)


Optimizations in Python-based HEP Analysis
Sebastien Binet, Wim Lavrijsen
National Energy Research Scientific Computing Center (NERSC)
CHEP '07 – Victoria, September 2007

Outline
- Introduction
  - Target areas for optimization
- Code Analyzers and Generators
  - Psyco, ShedSkin, PyPy
- Prototyping Results
  - Optimization of PyROOT-based code
- Conclusions
  - Future plans
- Resources

Introduction

Typical HEP Analysis
- Configuration, Component Loading (flexible/dynamic)
  - user interface
  - string manipulation
  - file/shell access
- Event Processing (repetitive, modular; algorithmic) => target for optimizations
  - for loops
  - cuts and predicates
  - ROOT calls
  - math calls
  - arrays, vectors, matrices
- Creation of Plots (flexible/dynamic)
  - selection, ordering, etc. of graphs
  - string manipulation
  - file/shell access

What to Optimize?
- Flexibility is great, if it is needed
  - Without language support you have to write it yourself
  - You're unlikely to do better than the language
- Flexibility is a burden, if unneeded
  - You would pay for what you don't use
  - C++: many complex trade-offs possible
  - Python: typically one, straightforward way
- In inner loops / algorithmic code:
  - No flexibility needed
  - Constrained code blocks => Optimization

Mock Example
- Normal, straightforward Python code
  - A.k.a. “Fortran-style” Python
  - With (readable) tricks: could gain ~20%

    >>> from ROOT import gRandom, TCanvas, TH1F
    >>> c1 = TCanvas('c1','Example',200,10,700,500)
    >>> hpx = TH1F('hpx','px',100,-4,4)
    >>> for i in xrange(25000):
    ...     px = gRandom.Gaus()
    ...     hpx.Fill(px)
    ...
    >>> hpx.Draw()
    >>> c1.Update()

- The for loop is the inner loop of algorithmic code
=> Note: example chosen for clarity, real code uses TH1F::FillRandom( 'gaus', 25000 )
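One plausible reading of the "readable tricks" mentioned above is hoisting attribute lookups out of the loop. The sketch below is Python 3 and runs without ROOT: `FakeHist` is a hypothetical stand-in for `TH1F`, and `random.Random.gauss` replaces `gRandom.Gaus`. The ~20% figure comes from the slide and is not measured here.

```python
import random


class FakeHist:
    """Hypothetical stand-in for ROOT's TH1F: counts entries per bin."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.bins = [0] * nbins

    def Fill(self, x):
        # Under- and overflow are ignored to keep the sketch short.
        if self.lo <= x < self.hi:
            idx = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self.bins[min(idx, self.nbins - 1)] += 1


def fill_plain(hist, n, rng):
    # Straightforward loop: two attribute lookups per iteration.
    for _ in range(n):
        hist.Fill(rng.gauss(0.0, 1.0))


def fill_hoisted(hist, n, rng):
    # Same loop, with the bound methods looked up once outside the loop.
    fill, gauss = hist.Fill, rng.gauss
    for _ in range(n):
        fill(gauss(0.0, 1.0))
```

Both functions fill the histogram identically for the same random seed; only the number of lookups inside the loop differs.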

Issues

    >>> from ROOT import gRandom, TCanvas, TH1F
    >>> c1 = TCanvas('c1','Example',200,10,700,500)
    >>> hpx = TH1F('hpx','px',100,-4,4)
    >>> for i in xrange(25000):
    ...     px = gRandom.Gaus()
    ...     hpx.Fill(px)
    ...
    >>> hpx.Draw()
    >>> c1.Update()

Annotations on the inner loop:
- Iterator temp => dynamic body
- Method lookups
- Object temporary
- Bound method temporary

=> Python creates many temporaries
=> Types are considered dynamic
=> No block-level optimizations applied
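The "bound method temporary" can be observed directly in CPython. The following is a small sketch with a hypothetical `Hist` class standing in for a bound extension type; it relies on CPython's behaviour of building a new bound-method object on each attribute access.

```python
class Hist:
    """Hypothetical minimal class; stands in for a bound extension type."""

    def Fill(self, x):
        return None


h = Hist()

# Each attribute access builds a fresh bound-method object ...
first, second = h.Fill, h.Fill
distinct_objects = first is not second

# ... although the two temporaries compare equal (same function, same self).
equal_methods = first == second

# The function itself is found on the class, not on the instance,
# so every h.Fill involves a lookup plus a temporary.
found_on_class = "Fill" in Hist.__dict__ and "Fill" not in h.__dict__
```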

Code Analyzers and Generators

Available Tools
- “Integrators”
  - Pyrex, weave, PyROOT (using ACLiC), etc.
  - C/C++ code (text) blocks in Python with easy sharing of variables
  - Always visible to the end-user
- “Compilers”
  - Psyco, ShedSkin, PyPy, etc.
  - Analyze code for algorithmic blocks; translate to lower level language
  - Fully automatable

PyCompilers and HEP code
- “Compiler” only handles pure Python
  - Often does include standard libraries
  - Thus, math is usually covered
  - Notably, extension modules are opaque
- HEP code makes use of bindings
  - Quite commonly dictionary based
  - I.e. large fraction of code not handled
  - And hence not optimized ...
=> Idea: feed dict info into compiler

Tool of Choice: PyPy
- Python interpreter written in Python
  - Makes Python code first-class objects
  - Allows for analysis and manipulation of code, including the full interpreter
- Mature and strong development support
  - Psyco developer is one of the main authors
  - EU “Information Society” project
- Translation framework
  - Extensible with external types
  - Fully customizable with new back-ends

PyPy Translation Framework
- Python code is translated into lower level, more static code

[Diagram: .py -> Annotator -> Optimizer -> Generator -> .c, .cli, .class, .mfl; type systems: LLTypeSystem, OOTypeSystem]
- Annotator: builds flow graphs and code blocks; derives static types
- Optimizer: uses flow graphs to optimize calls and reduce temporaries
- Generator: employs specific backends to write code or VM bytecode

Annotation
- Start with known or given types
  - Constants and built-ins (known)
  - Function arguments (given [when called])
- Calculate flow graphs
  - Locate join points
  - Divide code in blocks
- Fill in all known types
  - Derived from initial known/given ones
  - Add information from dictionary
=> Result: graph structure of possible flows and outcomes

Annotation cont.
- Type deduction need not be complete
  - Python (e.g. through C-API) as fallback
  - Still allows partial translation/optimization
- Annotation process fully automatable
  - Nominally at entry of function call
  - Start (argument) types encountered before?
    - If yes: dispatch to existing translated version
    - If no: build new translated version
  - Can be forced / pre-built semi-manually
    - E.g. by parsing sample code
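The dispatch-on-argument-types scheme described above can be sketched in a few lines. This is not PyPy's implementation: `Specializer` and `fake_translate` are hypothetical names, and the "translation" step is a placeholder that returns the function unchanged while recording that a build happened.

```python
class Specializer:
    """Hypothetical dispatcher: one translated version per argument-type tuple."""

    def __init__(self, func, translate):
        self.func = func
        self.translate = translate  # callable(func, types) -> specialized callable
        self.versions = {}          # type tuple -> translated version

    def prebuild(self, types):
        # Force a version ahead of time, e.g. from sample code.
        key = tuple(types)
        if key not in self.versions:
            self.versions[key] = self.translate(self.func, key)
        return self.versions[key]

    def __call__(self, *args):
        # At function entry: have these argument types been seen before?
        key = tuple(type(a) for a in args)
        return self.prebuild(key)(*args)


def fake_translate(func, types):
    # Placeholder for annotation and code generation.
    fake_translate.built.append(types)
    return func


fake_translate.built = []


def scale(x, factor):
    return x * factor


fast_scale = Specializer(scale, fake_translate)
```

Calling `fast_scale(2, 3)` and then `fast_scale(4, 5)` triggers a single build for `(int, int)`; a later `fast_scale(1.5, 2)` adds a second version for `(float, int)`.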

Annotation Example

User:

    >>> def doFillMyHisto( h, val ):
    ...     x = ROOT.gRandom.Gaus() * val
    ...     return h.Fill( x )
    ...
    >>> t = Translation( doFillMyHisto )
    >>> t.annotate( [ TH1F, int ] )

  (Types are given explicitly in translation or at runtime; different (non-)choices can coexist)

Behind the scenes:

    -- type(h) is TH1F and type(val) is int
    -- type(x) is float, because:
         Gaus is TRandom::Gaus() which yields (C++)double
         mul( (python)float, int ) yields (python)float
    -- result is None, because:
         Fill is TH1F::Fill which yields (C++)void

User:

    >>> doFillMyHisto = t.compile_c()
    >>> h, val = TH1F('hpx','px',100,-4,4), 10
    >>> doFillMyHisto( h, val )    # normal call
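The deduction shown behind the scenes can be mimicked with a toy type propagator. This is only an illustration of the idea, not PyPy's annotator: the `DICTIONARY` and `ARITHMETIC` tables, the `annotate` helper, and the step encoding are all hypothetical.

```python
# Hypothetical "dictionary": result types of bound C++ methods.
DICTIONARY = {
    ("TRandom", "Gaus"): "float",  # C++ double maps to Python float
    ("TH1F", "Fill"): "None",      # C++ void maps to None
}

# Result types of arithmetic on Python types.
ARITHMETIC = {
    ("mul", "float", "int"): "float",
}


def annotate(steps, known):
    """Propagate types through straight-line code.

    steps: list of (target, operation, operands) tuples
    known: dict of variable name -> type name (the given start types)
    """
    types = dict(known)
    for target, op, operands in steps:
        if op == "call":
            obj, method = operands
            types[target] = DICTIONARY[(types[obj], method)]
        else:
            left, right = operands
            types[target] = ARITHMETIC[(op, types[left], types[right])]
    return types


# doFillMyHisto(h, val): x = gRandom.Gaus() * val; return h.Fill(x)
steps = [
    ("tmp", "call", ("gRandom", "Gaus")),
    ("x", "mul", ("tmp", "val")),
    ("result", "call", ("h", "Fill")),
]
types = annotate(steps, {"h": "TH1F", "val": "int", "gRandom": "TRandom"})
```

Starting from `h: TH1F` and `val: int`, the propagation yields `x: float` and `result: None`, matching the deductions listed above.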

- Function inlining
  - Equivalent to its C++ brother
  - May allow further optimizations
- Malloc removal
  - Use values rather than objects
  - Esp. for loop variables and iterators
  - Dict special case: remove method objects
- Escape analysis and stack allocation
  - Use stack for scope-lifetime objects
=> For free from PyPy!

Generation
- Two type systems: “Low Level” and “Object Oriented”
  - Low Level: result = method( self, *args )
  - Object Oriented: result = self.method( *args )
- Two kinds of back-ends
  - Code generation (e.g. C, JS, Lisp)
  - VM bytecodes (e.g. CLI/.Net and Java)
    - With to-be-compiled code for boundary cases
- Customizable with hand-written helpers
  - Cover cross-language calls etc.
  - Automatic based on dictionary info
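The two call forms have a direct analogue in plain Python, which may help to picture the difference between the type systems. The `Hist` class below is hypothetical and only serves to contrast the two forms.

```python
class Hist:
    """Hypothetical class used only to contrast the two call forms."""

    def __init__(self):
        self.entries = []

    def Fill(self, x):
        self.entries.append(x)
        return len(self.entries)


h = Hist()

# "Object Oriented" form: the method is looked up through the object.
oo_result = h.Fill(1.0)

# "Low Level" form: a plain function taking the object as first argument.
ll_result = Hist.Fill(h, 2.0)
```

Both calls run the same function body; the low-level form simply makes `self` an explicit argument, which is how a C back-end would see it.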

Prototyping Results

Prototype
- PyPy's extfunc & type support limited
  - Primarily targets POSIX functions in C
  - Type support not fully implemented
  => Perform type registration from dictionary
  => Minor logic changes inside PyPy annotator
- Some changes in PyROOT needed
  - Missing conventional functional variables
    - Used by PyPy for descriptive purposes
  - Currently handled by modding PyPy instead

Back-end Cheat
- Use helper funcs to call C++ from C

    extern "C" {
    double TRandomGaus( void* self ) {
       return ((TRandom*)self)->Gaus();
    }
    void TH1FFill( void* self, double x ) {
       ((TH1F*)self)->Fill( x );
    }
    }

- To be resolved by targeting LLVM
  - Low Level Virtual Machine
  - PyPy supported; can resolve C++ calls
  - http://llvm.org

Timing Results
- Simple linear timing
  - Code example from slide 6 (the Mock Example)
  - Loop is part of the optimization
- Preliminary results:

    Version         Real time   CP time (s)
    Python          0:06:00     360.790
    PyPy            0:00:14      14.550
    C++             0:00:10      10.930
    C++ w/ stubs    0:00:14      14.200

=> Very promising!
Note: C++ w/ stubs (equal to the “Back-end Cheat”) added for reference only
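To put the preliminary numbers in proportion, the ratios can be worked out from the CP times quoted above. The translated version comes out roughly 25 times faster than interpreted Python and within a few percent of C++ with stubs; the variable names are illustrative only.

```python
# CP times in seconds, copied from the preliminary results above.
cp_time = {
    "Python": 360.790,
    "PyPy": 14.550,
    "C++": 10.930,
    "C++ w/ stubs": 14.200,
}

# Speed-up of each version relative to interpreted Python.
speedup_vs_python = {name: cp_time["Python"] / t for name, t in cp_time.items()}

# How far the translated code is from hand-written C++.
pypy_over_cpp = cp_time["PyPy"] / cp_time["C++"]
pypy_over_stubs = cp_time["PyPy"] / cp_time["C++ w/ stubs"]

for name, factor in speedup_vs_python.items():
    print(f"{name:>13}: {factor:5.1f}x faster than Python")
print(f"PyPy / C++:          {pypy_over_cpp:.2f}")
print(f"PyPy / C++ w/ stubs: {pypy_over_stubs:.2f}")
```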

Conclusions

Success! Demonstrated a proof of concept: enhancing a Python compiler with dictionary information substantially improves the performance of HEP algorithmic code written in Python.

Future Plans
- Target LLVM
  - Remove need for C to C++ stubs
- Package code outside PyPy
  - Derive from and replace PyPy core classes
- Automation, user-friendliness
  - Installation (incl. PyPy and LLVM)
  - Auto-selection of optimization points
- (Py)ROOT specific optimizations
  - Unroll pythonizations (e.g. TTree, STL)

Resources
- Code repository: http://nest.lbl.gov
- Packages and documentation
  - http://codespeak.net/pypy/
  - http://llvm.org/
  - http://www.scipy.org/
  - http://shed-skin.blogspot.com/
  - http://cern.ch/wlav/pyroot/

Bonus Material

TTree Ease-of-Use
- Treat TTree as Python-style container
  - Loop over TTree entries with for-loop ✔
  - Bounds and read-sanity checks ✔
  - Select slices (ranges)
- Treat TTree as C-style struct
  - Access branches as data members ✔
- Optimize Python access and I/O
  - Judicious use of caches
  - Read branches only on-demand
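The container-style access with on-demand reads and caching can be illustrated without ROOT. The classes `LazyTree` and `LazyEntry` below are hypothetical stand-ins, not the PyROOT implementation; "reading a branch" is simulated by a list lookup with a counter.

```python
class LazyEntry:
    """Hypothetical tree entry: branches are read on first access, then cached."""

    def __init__(self, reader, index):
        self._reader = reader  # callable(branch_name, index) -> value
        self._index = index

    def __getattr__(self, name):
        # Only called when `name` is not yet an instance attribute.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._reader(name, self._index)
        except KeyError:
            raise AttributeError(name) from None
        setattr(self, name, value)  # cache: later accesses skip the reader
        return value


class LazyTree:
    """Hypothetical container offering for-loop, len() and slice access."""

    def __init__(self, columns):
        self._columns = columns  # dict: branch name -> list of values
        self.reads = 0           # counts simulated I/O operations

    def _read(self, branch, index):
        self.reads += 1
        return self._columns[branch][index]

    def __len__(self):
        return len(next(iter(self._columns.values())))

    def __getitem__(self, item):
        if isinstance(item, slice):
            return [LazyEntry(self._read, i) for i in range(*item.indices(len(self)))]
        if not 0 <= item < len(self):
            raise IndexError(item)  # bounds check; also terminates for-loops
        return LazyEntry(self._read, item)


tree = LazyTree({"px": [0.1, 0.2, 0.3], "py": [1.0, 2.0, 3.0]})
total_px = sum(entry.px for entry in tree)  # "py" is never read
```

After the loop, `tree.reads` equals the number of entries, since only the `px` branch was touched and each entry read it once.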

Optimization in PyROOT?
- Main ingredients in inner loop codes:
  - Numeric: math + builtins + arrays
    - Use NumPy with Psyco (JIT, x86 only)
  - ROOT extension library calls
    - Use dictionary info for static optimization
- Dynamically change internal call settings
  - No autocast, return in local buffer, remove checks, etc.
  - Need fixed block of Python code
    - User defined, different optimization levels
    - But also such as THn::Fit and TFn::Draw

Prototyping Results (ROOT'07)
- Effectiveness varies wildly ...
  - Optimizations on builtins: no effect
  - Shortcircuit call stack: 45%
  - No check/cast on object ptrs: 10%
  - Drop bound method alloc: 25%
- Effects are independent, so cumulative
  - Total gain of about a factor of 3 in speed
- Perhaps can be offered outside blocks
  - Define configuration interface (or API)
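The slide does not say how the individual percentages combine. One reading, assumed here, is that each figure is the fraction of the remaining call time saved, so independent effects multiply. Under that assumption the combined speed-up comes out near 2.7, in line with the quoted "about a factor of 3".

```python
# Fractions of call time saved by each change, as quoted above.
savings = {
    "short-circuit call stack": 0.45,
    "no check/cast on object pointers": 0.10,
    "drop bound method allocation": 0.25,
}

# Assumption: each saving applies to the time left after the previous ones.
remaining = 1.0
for fraction in savings.values():
    remaining *= 1.0 - fraction

speedup = 1.0 / remaining
print(f"remaining time fraction: {remaining:.3f}, speed-up: {speedup:.2f}x")
```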

Resources
- Documentation
  - Chapter 20 of the ROOT User's Guide
  - http://cern.ch/wlav/pyroot
- Examples
  - $ROOTSYS/tutorials/pyroot/*.py
- Code repository
  - root.cern.ch/viewcvs/pyroot
- Python optimizations
  - http://wiki.python.org/moin/PythonSpeed/
  - http://www.scipy.org/