
1 Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries
Bin Ren, Gagan Agrawal

2 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

3 Background
Programming landscape: productivity vs. performance
Traditional single-core programming models and languages: C/C++, Java, C#, Python
Multi-core and many-core programming models and languages: OpenMP, MPI, CUDA, OpenCL

4 Existing Ways to Improve Productivity
High-level parallel programming models/libraries: MapReduce/FREERIDE, CUBLAS/Tensor Contractor
High-level parallel programming languages: Chapel, X10, Pig Latin, Sawzall

5 Our Focus
Start with an existing, popular language: (pure) Python. It is popular across many communities, has an easy learning curve and very high productivity, but low performance.
Utilize existing libraries: more domain oriented, written in low-level languages for multi-core/many-core environments, and optimized for high performance, but with relatively low productivity.

6 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

7 Motivating Application
DGEMM (Double-precision General Matrix Multiplication)
Python: nested lists (sketched below)
Generated C++: nested vector-based containers
Manual C: one-/multi-dimensional arrays
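A minimal sketch, not taken from the slides, of the nested-list Python style this motivating application refers to; the function and variable names are illustrative:

    # DGEMM over nested Python lists: the pointer-based style the framework
    # starts from (illustrative sketch, not the authors' benchmark code).
    def dgemm(A, B, n):
        # A and B are n x n matrices stored as lists of lists of floats
        C = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(n):
                    s += A[i][k] * B[k][j]
                C[i][j] = s
        return C

Every access A[i][k] dereferences an outer list and then an inner list; the manual C version avoids this with dense arrays, while the Shedskin-generated C++ still keeps nested containers.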

8 Possible Factors That Decrease Efficiency
Interpreted programming language
Dynamic type inference at run time
Dynamic, pointer-based data structures: list, dictionary, set, …

9 Nested List Data Structure
[Figure: a nested list data[0..l-1]; each data[i] points to a separately allocated inner list (e.g. b1[0..n-1]), whose entries in turn point to innermost lists (e.g. a1[0..m-1]) holding the actual values, so every level is a distinct pointer-based heap object.]
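A minimal sketch, assuming the figure depicts a three-level nested list (outer length l, middle lengths n, inner lengths m); in Python every level is a separately allocated list object reached through a pointer:

    # Illustrative three-level nested list like the one in the figure;
    # l, n, m and the element values are made up.
    l, n, m = 4, 3, 2
    data = [[[float(i * n * m + j * m + k) for k in range(m)]
             for j in range(n)]
            for i in range(l)]
    # data[i] and data[i][j] are distinct heap objects, so data[i][j][k]
    # chases two pointers instead of indexing one dense buffer.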

10 Analysis of Python Efficiency
Possible approach to improve performance: compile Python to a low-level programming language, handle the typing issues during compilation, and flatten the dynamic data structures into dense memory buffers.
Challenges:
- Flattening nested dynamic data structures is not trivial
- Reducing the overhead incurred by the flattening operations
- Only homogeneous data can be stored in a contiguous array

11 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

12 Overview of the framework
Three components: Data Transformation, Transformation Insertion, and Homo-Decision (homogeneity checking).

13 Contributions
Linearization algorithm: transforms a data set from a pointer-intensive structure into a dense memory buffer
Two-stage Insertion algorithm: a lightweight, demand-driven IPRE to reduce the data transformation overhead
Homo-Decision algorithm: decides whether the elements of a data set all have the same type

14 Data Transformation Algorithms
[Figure: the Linearizing algorithm flattens the nested list data[0..l-1] into a single dense buffer Linear_data[ ] holding the innermost elements contiguously together with the level sizes (m, n, l); the Mapping algorithm relates the flat buffer back to the original nested structure.]
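A minimal sketch of one plausible linearization, not the paper's exact algorithm: the nested list is walked once, the innermost elements are packed into a dense buffer, and the per-level lengths are recorded so a mapping back to the nested shape remains possible:

    # Illustrative linearization: flatten a nested list into a dense value
    # buffer plus length metadata (pre-order). Not the paper's exact algorithm.
    def linearize(nested):
        values, lengths = [], []
        def walk(node):
            if isinstance(node, list):
                lengths.append(len(node))
                for child in node:
                    walk(child)
            else:
                values.append(node)
        walk(nested)
        return values, lengths

Applied to the three-level list sketched earlier, this yields one contiguous list of l*n*m floats plus the recorded lengths, which is the kind of dense buffer a multi-core or many-core library can consume.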

15 The Insertion Algorithm
Two-stage algorithm:
- Insert the linearization function immediately before every use of the dynamic data structure
- Optimize the generated code with an inter-procedural partial redundancy elimination (IPRE) algorithm
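A minimal sketch of what stage one might produce on a k-means-like routine, reusing the illustrative linearize from above; kmeans_step, assign, and update are hypothetical names, not the framework's generated code:

    # Hypothetical compute kernels, stubbed so the sketch stands alone.
    def assign(flat, lens, clusters): pass
    def update(flat, lens, clusters): pass

    # Stage 1: a linearize call is inserted immediately before every use
    # of the dynamic data structure.
    def kmeans_step(points, clusters):
        flat, lens = linearize(points)    # inserted before the first use
        assign(flat, lens, clusters)
        flat, lens = linearize(points)    # inserted before the second use;
        update(flat, lens, clusters)      # redundant if points is unchanged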

16 The Insertion Algorithm
Basic PRE algorithm:
- Eliminates duplicated evaluations of expressions
- Eliminates duplicated calls to the linearization function
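Continuing the same illustrative sketch, the redundancy elimination would keep a single linearization when the list is not modified between its uses:

    # After PRE: the duplicated linearize call is gone; the one remaining
    # result is reused (illustrative, not generated code).
    def kmeans_step_opt(points, clusters):
        flat, lens = linearize(points)    # single evaluation kept
        assign(flat, lens, clusters)
        update(flat, lens, clusters)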

17 Design of the framework
[Figure: the IPRE algorithm illustrated on a call graph with the procedures main, kmeans, kmeans_reduction, and update_clusters.]

18 IPRE Overview
Based on a simpler PRE algorithm developed by Paleri et al.
Demand-driven and lightweight: analyzes only a small number of procedures
Assumption: inter-procedural pointer analysis and alias analysis are available
Pull-out strategy: if there is no modification to list li between a procedure's entry and linearize(li), the call can be pulled outside the procedure
Propagate strategy: information about a modification of list li inside a procedure is propagated to its parent procedure
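A minimal illustration of the pull-out strategy under the assumptions above; the procedure names follow the call-graph figure, but the bodies are made up:

    def do_update(flat, lens, clusters): pass   # hypothetical kernel stub

    # Before pull-out: the callee re-linearizes li on every call.
    def update_clusters(li, clusters):
        flat, lens = linearize(li)     # li is not modified before this point
        do_update(flat, lens, clusters)

    # After pull-out: the caller linearizes once and passes the dense buffer
    # down, so IPRE can merge this with other linearizations of li.
    def update_clusters_opt(flat, lens, clusters):
        do_update(flat, lens, clusters)

    def kmeans(li, clusters):
        flat, lens = linearize(li)
        update_clusters_opt(flat, lens, clusters)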

19 IPRE
[Figure: the IPRE transformation illustrated on the k-means example, involving the points and clusters lists.]

20 The Data Flow Analysis to Check Homogeneity
Primitive way: check the type of the elements in the data set one by one. Shortcoming: time consuming.
Our method: start from the program itself and check whether data of different types could be assigned into the structure. A simple data flow analysis gives high efficiency.
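A toy sketch of the idea, not the paper's analysis: rather than inspecting every element at run time, examine the program text and see whether values of more than one type can flow into the list. Here only constant arguments to append are classified; a real analysis would need type inference for arbitrary expressions:

    import ast

    # Toy static check: collect the types of constants appended to a list
    # with the given name anywhere in the source.
    def append_constant_types(source, target):
        types_seen = set()
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "append"
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == target
                    and node.args
                    and isinstance(node.args[0], ast.Constant)):
                types_seen.add(type(node.args[0].value).__name__)
        return types_seen

    code = "myList = []\nfor i in range(10):\n    myList.append(1)\nmyList.append(2.5)\n"
    print(append_constant_types(code, "myList"))  # int and float both flow in,
                                                  # so myList is heterogeneous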

21 Example
    myList = [ ]
    loop header:
        if (someCon):
            myList.append(a)
        myList.append(b)
    … = myList
The analysis looks at the append sites: myList can be treated as homogeneous only if a and b are guaranteed to have the same type.

22 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

23 Experiment setup
Environment:
- K-means and PCA (FREERIDE): Glenn system at OSC
- DGEMM and TM (CUBLAS & Tensor Contractor generated CUDA code): Bale system at OSC
Code versions:
- Python: Python code
- Gen C++: Shedskin-generated C++ code
- WOPRE: generated code without IPRE
- WPRE: generated code with IPRE
- OPT: generated code with further optimization
- Manual: hand-written C code

24 K-Means
800M data set; k = 100; iter = 1 (left) & iter = 10 (right)
With IPRE, the overhead is reduced, especially in the multi-iteration version.
The performance of the final version is very close to the manual version; the overhead is around 10%.
For a smaller data set (8M), Python needs … sec and … sec; for the same data set (800M), Gen C++ needs … sec and … sec.

25 PCA
800M data set; rows = 1000; columns = 100,000
For a smaller data set (8M), Python runs … sec; for the same data set (800M), Gen C++ needs 3280 sec.
IPRE has to be applied; the performance of the final version is very close to the manual version, and the overhead is around 10%~20%.

26 Linear Algebra Applications
DGEMM (left) & Tensor Multiplication (right)
The performance of DGEMM for 1000*1000 matrices was reported in the previous section.
The performance of TM: for configuration 1, Python runs … sec and Gen C++ runs … sec.
IPRE reduces the overhead by around 50%.

27 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

28 Conclusion
Proposed a compilation framework that compiles pure Python to invoke multi-core and many-core libraries.
Designed three novel algorithms: an inter-procedural PRE algorithm, a homogeneity checking algorithm, and a Linearization-Mapping scheme.
Good experimental performance: the generated code is only 10%-20% slower than hand-written C.

29 Related Work
Improving the efficiency of Python:
- Using extension libraries: NumPy & SciPy, PyCUDA & PyOpenCL, Copperhead, …
- Compiling to a low-level programming language: Cython, Pyrex, Shedskin, …
Improving the efficiency of pointer-based data structures:
- Exploring the nature of irregular algorithms
- Automatic pool allocation
- Transformation of recursive data/computation structures
- Data reordering

30 Thank you for your attention!
Any Questions?

