

Optimizing Python with Pyrex

Pyrex is cool but not magic
Pyrex very seldom makes pure-Python code faster just through a recompile. But if you understand how Pyrex works, you can dramatically improve Python program performance:
- use static type checking,
- static binding,
- C calling conventions, etc.

Case Study 1: Bisect Module

Bisect module
Originally coded in Python. Recoded in C recently for performance. The pure Python version is between 2 and 3 times slower than the C version. Recompiling it as Pyrex improves performance by only a few percentage points.

Representative function
    def bisect_left(a, x, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        while lo < hi:
            mid = (lo+hi)/2
            if a[mid] < x:
                lo = mid+1
            else:
                hi = mid
        return lo
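For readers who want to run the representative function today, here is a minimal sketch ported to Python 3. The only change to the logic is `//` in place of `/`, since the slide's code relies on Python 2 integer division. The cross-check against the standard library's bisect module is an addition for illustration, not part of the original slides.

```python
import bisect


def bisect_left(a, x, lo=0, hi=None):
    """Return the leftmost index at which x can be inserted into sorted a."""
    if hi is None:
        hi = len(a)
    while lo < hi:
        # Floor division: the slide's `/` relied on Python 2 integer semantics.
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


data = [1, 3, 3, 5, 8, 13]
# Cross-check against the C-accelerated standard library version.
for probe in range(15):
    assert bisect_left(data, probe) == bisect.bisect_left(data, probe)
```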

Optimization step 1: Static types
    cdef int internal_bisect_left(a, x, int lo, int hi) except -1:
        cdef int mid
        while lo < hi:
            mid = (lo+hi)/2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

Why are we still slower?
Pyrex calls PyObject_GetItem/SetItem rather than PySequence_GetItem/SetItem. You can see the difference just in the type signatures!
    PyObject *PyObject_GetItem(PyObject *o, PyObject *key);
    PyObject *PySequence_GetItem(PyObject *o, int i);
The generic call needs the index boxed as a Python object; the sequence call takes a plain C int.
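The two entry points can be poked at from plain Python, as a rough illustration of the signature difference. This sketch assumes CPython (it uses `ctypes.pythonapi`) and a current interpreter, where the sequence index is a `Py_ssize_t` rather than the `int` shown on the slide. It is not how Pyrex itself calls these functions.

```python
import ctypes

api = ctypes.pythonapi  # CPython's own C API, exposed through ctypes

# Generic protocol: the key is an arbitrary Python object.
api.PyObject_GetItem.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_GetItem.restype = ctypes.py_object

# Sequence protocol: the index is a C integer (Py_ssize_t in current
# CPython, plain int in the era of these slides).
api.PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PySequence_GetItem.restype = ctypes.py_object

items = ["a", "b", "c"]
by_object = api.PyObject_GetItem(items, 1)     # index passed as a Python int object
by_c_index = api.PySequence_GetItem(items, 1)  # index passed as a C integer
assert by_object == by_c_index == "b"

# Only the generic call accepts non-integer keys such as dict keys.
assert api.PyObject_GetItem({"k": 42}, "k") == 42
```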

Optimization step 2: Cheat
Import and use PySequence_GetItem directly from Python.h:
    if PySequence_GetItem(a, mid) < x:
        lo = mid + 1
    else:
        hi = mid

Can we automate this?
I propose to change Pyrex to have first-class support for “sequence” types. I wrote a patch that does this.

Pyrex with sequence types
Now my code looks like this:
    cdef int internal_bisect_left(sequence a, x, int lo, int hi) except -1:
        cdef int mid
        while lo < hi:
            mid = (lo+hi)/2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

One more twist
The C bisect module has this code:
    if (PyList_Check(list)) {
        if (PyList_Insert(list, index, item) < 0)
            return NULL;
    } else {
        if (PyObject_CallMethod(list, "insert", "iO", index, item) == NULL)
            return NULL;
    }

Pyrex can do that too!
    if PyList_Check(a):
        PyList_Insert(a, lo, x)
    else:
        a.insert(lo, x)
By the way, note how two lines of Pyrex equal approximately 6 of C. And yet they do all of the same error and reference checking!
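The same two-path dispatch can be sketched in plain Python. This is only an analogue, assuming that `isinstance(a, list)` stands in for `PyList_Check` and `list.insert(a, ...)` for `PyList_Insert`. The `Recorder` class is a hypothetical non-list sequence invented for the illustration.

```python
import bisect


def insort_left(a, x):
    """Insert x into sorted sequence a, mirroring the C module's two paths."""
    lo = bisect.bisect_left(a, x)
    if isinstance(a, list):
        # Analogue of PyList_Check + PyList_Insert: go straight to the
        # list implementation, bypassing any overridden insert() method.
        list.insert(a, lo, x)
    else:
        # Analogue of PyObject_CallMethod(list, "insert", ...).
        a.insert(lo, x)


class Recorder:
    """Hypothetical non-list sequence used only for this illustration."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

    def insert(self, i, x):
        self.calls += 1
        self.items.insert(i, x)


plain = [1, 3, 5]
insort_left(plain, 4)      # takes the list fast path

recorder = Recorder([1, 3, 5])
insort_left(recorder, 4)   # falls back to the generic method call
```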

Result
Even so, Pyrex seems to come out ~25% slower than C. :(
- But half as many lines of code!
- No decrefs!
- No goto statements!
- Automatic exception propagation!
Seems like a good trade-off to me!

Case Study 2: Fibonacci
A digression on benchmarking

Be careful
As always, you have to be careful benchmarking and optimizing. For instance: GCC compiler flags can make all of the difference. Furthermore, applying them is not always straightforward.

Fibonacci optimizations
Opt Level      Pyrex        Plain C
None           0m25.790s    0m18.180s
-O             0m13.990s    0m13.370s
-O2            0m15.450s    0m9.440s
-O3            0m9.720s     0m5.840s
-O3 -fun…*     0m7.430s     0m4.730s
(* -fun… = -funroll-loops)
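To make the gap easier to read, the timings above can be turned into Pyrex-to-C ratios. This is a small sketch that only re-computes the figures from the slide. The dictionary layout and variable names are illustrative, and the slide does not show the benchmarked Fibonacci code itself.

```python
# Timings copied from the slide, converted to seconds: (Pyrex, plain C).
timings = {
    "None": (25.790, 18.180),
    "-O": (13.990, 13.370),
    "-O2": (15.450, 9.440),
    "-O3": (9.720, 5.840),
    "-O3 -funroll-loops": (7.430, 4.730),
}

# Ratio > 1 means Pyrex took longer than plain C at that optimization level.
ratios = {flags: pyrex / plain_c for flags, (pyrex, plain_c) in timings.items()}

for flags, ratio in ratios.items():
    print(f"{flags:20s} Pyrex/C = {ratio:.2f}")
```

By these numbers the two are nearly level at -O, while at -O2 and above the C version pulls ahead by roughly 60%. Pyrex's -O2 time is also slower than its -O time, which is the kind of surprise the slide warns about.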

Case Study 3: Heap Queue

Heap queue
Similar story to Bisect. But even with sequence types added, Pyrex loses out. heapq.c uses some highly optimized PyList APIs:
    #define PyList_GET_ITEM(op, i) (((PyListObject *)(op))->ob_item[i])

Pyrex can cheat too!
    cdef extern from "Python.h":
        int PyList_Check(object PyObj)
        int PyList_GET_SIZE(object PyList)
        cdef void *PyList_GET_ITEM(object PyList, int idx)
    …
    lastelt = PyList_GET_ITEM(lst, length-1)

Why does this work?
Pyrex compiles to C, so the C preprocessor sees the calls to PyList_GET_ITEM in the generated code and expands the macro. Note that you have to be careful about refcounts: PyList_GET_ITEM returns a borrowed reference.

Results
With every cheat I could think of…
- Crushed Python (as much as 6 times faster depending on the operation)
- Didn’t quite rival C (30-60% slower).
- Maybe somebody smarter than me can find some optimizations I missed…
But no code like this:
    tmp = PyList_GET_ITEM(heap, childpos);
    Py_INCREF(tmp);
    Py_DECREF(PyList_GET_ITEM(heap, pos));
    PyList_SET_ITEM(heap, pos, tmp);
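For comparison, here is a rough Python-level view of what those four C lines do. This is a sketch assuming CPython, where `sys.getrefcount` reports reference counts. The names `heap`, `pos` and `childpos` follow the C snippet, and `moved` is introduced only for the illustration.

```python
import sys

heap = [object(), object()]
pos, childpos = 0, 1

moved = heap[childpos]
before = sys.getrefcount(moved)

# One statement does the work of the four C lines: the list takes a new
# reference to the child item and drops its reference to the old item.
heap[pos] = heap[childpos]

after = sys.getrefcount(moved)
assert heap[0] is heap[1]
assert after - before == 1  # the list now holds one more reference
```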

Case Study 4: Py(e)xpat

Real-world example
Years ago I was one of several people to help wrap Expat in C. It wasn’t rocket science but it was a pain. There were many niggly details.
- E.g. a couple of 40 line functions to allow stack traces to propagate from callbacks through Python???
- Ugly macros everywhere
- Careful reference counting
- Plus all of the usual C extension ugliness
I tried again with Pyrex, yielding pyxpat.

Results
The Pyrex version is consistently faster than the C version, but it may cheat a bit:
- the C version implements some optional features that the Pyrex version does not.
- in C, the benchmark doesn’t use those features
- in Pyrex, the features don’t even exist.
- that means a few fewer conditionals and branches.
- either way, Pyrex is impressively fast!

Statistics
I parsed a file with the following statistics:
- 400 MB of data
- 5 million elements
- 10 million startElement/endElement callbacks into pure Python code
- it took 3 minutes on my 700 MHz PowerBook.
- this exercise is not I/O bound; if it were, Pyrex couldn’t consistently beat the C version.
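The callback pattern being measured can be sketched with the standard library's `xml.parsers.expat` module, the C wrapper around Expat that ships with Python. The slides do not show pyxpat's own interface, so this is only an illustration of start/end callbacks into pure Python code. The tiny document and the `counts` dictionary are made up for the example.

```python
import xml.parsers.expat

counts = {"start": 0, "end": 0}


def start_element(name, attrs):
    # Called by the parser once for every opening tag.
    counts["start"] += 1


def end_element(name):
    # Called by the parser once for every closing tag.
    counts["end"] += 1


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element

# One root element plus five children: six elements in total.
document = "<root>" + "<item id='1'>text</item>" * 5 + "</root>"
parser.Parse(document, True)

total_callbacks = counts["start"] + counts["end"]
```

Each element triggers one start and one end callback, which is how 5 million elements become 10 million calls into Python.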

One step further
Pyxpat can explicitly expose a callback interface that uses Pyrex/C calling conventions rather than Python’s. Without changing our logic or interface, we can slash the time in half! It turns out that half of Pyexpat’s time is spent on the Python function call dance. And that’s even ignoring type coercion issues.

Conclusion

Where does Pyrex fit in?
Pyrex code isn’t quite as fast as hand-coded C. But you can focus on algorithms rather than pointers and buffers. (Sound familiar?)
Pyrex code can live on a spectrum
- from “almost as abstract as Python”
- to “almost as efficient as C.”
The next best thing to the best of both worlds.

What are the obstacles?
You will only get performance out of Pyrex if you understand implementation details. Pyrex should allow programmers to declare information relevant to the optimizer. Pyrex ought to generate more efficient code by default.

Optimizing Pyrex itself
Pyrex itself is ripe for optimization:
- More knowledge about Python container types
- Better string interning
- Static binding of globals and builtins (yes, Pyrex looks up len() in __builtins__)
- Bypass PyArg_ParseTuple when there are no arguments
- PyObject_NEW instead of PyObject_CallObject
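The point about `len()` can be seen from plain Python. The sketch below shows that a builtin name is resolved at call time rather than when the function is defined, which is why static binding would be an optimization and why it needs a guarantee that nobody rebinds the name. It temporarily patches `builtins.len` purely for demonstration; `count_items` is a hypothetical helper.

```python
import builtins


def count_items(seq):
    # `len` is not bound when this function is compiled; it is looked up
    # in the module globals and then in builtins on every call.
    return len(seq)


original_len = builtins.len
results = [count_items("abc")]

# Rebinding the builtin changes what the already-defined function calls.
builtins.len = lambda seq: original_len(seq) + 100
try:
    results.append(count_items("abc"))
finally:
    builtins.len = original_len  # always restore the real builtin

results.append(count_items("abc"))
```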

My thoughts
Given how easy Pyrex is, I predict that Python+Pyrex programs will typically go faster than Python+C programs given the same amount of programmer effort. If you are building a system that needs some performance…try Pyrex. It is probably fast enough for anything short of a database or network stack.