
1 Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries
Bin Ren, Gagan Agrawal

2 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

3 Background
Programming landscape: productivity vs. performance
Traditional single-core programming models and languages: C/C++, Java, C#, Python
Multi-core and many-core programming models and languages: OpenMP, MPI, CUDA, OpenCL

4 Existing Ways to Improve Productivity
High-level parallel programming models/libraries: MapReduce/FREERIDE, CUBLAS/Tensor Contractor
High-level parallel programming languages: Chapel, X10, Pig Latin, Sawzall

5 Our Focus
Start with an existing, popular language: (pure) Python. It is popular across many communities, has an easy learning curve and very high productivity, but low performance.
Utilize existing libraries: more domain oriented, written in low-level languages for multi-core/many-core environments, and optimized for high performance, but with relatively low productivity.

6 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

7 Motivating Application
DGEMM (Double-precision General Matrix Multiplication)
Python: nested lists (sketched below)
Generated C++: nested vector-based containers
Manual C: one-/multi-dimensional arrays
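A minimal sketch, not taken from the slides, of the nested-list Python style this motivating application refers to; the function and variable names are illustrative:

    # DGEMM over nested Python lists: the pointer-based style the framework
    # starts from (illustrative sketch, not the authors' benchmark code).
    def dgemm(A, B, n):
        # A and B are n x n matrices stored as lists of lists of floats
        C = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(n):
                    s += A[i][k] * B[k][j]
                C[i][j] = s
        return C

Every access A[i][k] dereferences an outer list and then an inner list; the manual C version avoids this with dense arrays, while the Shedskin-generated C++ still keeps nested containers.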

8 Possible Factors That Decrease Efficiency
Interpreted programming language
Dynamic type inference at run time
Dynamic, pointer-based data structures: list, dictionary, set, …

9 Nested List Data Structure
[Figure: a nested list data[0..l-1]; each data[i] points to a separately allocated inner list (e.g. b1[0..n-1]), whose entries in turn point to innermost lists (e.g. a1[0..m-1]) holding the actual values, so every level is a distinct pointer-based heap object.]
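A minimal sketch, assuming the figure depicts a three-level nested list (outer length l, middle lengths n, inner lengths m); in Python every level is a separately allocated list object reached through a pointer:

    # Illustrative three-level nested list like the one in the figure;
    # l, n, m and the element values are made up.
    l, n, m = 4, 3, 2
    data = [[[float(i * n * m + j * m + k) for k in range(m)]
             for j in range(n)]
            for i in range(l)]
    # data[i] and data[i][j] are distinct heap objects, so data[i][j][k]
    # chases two pointers instead of indexing one dense buffer.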

10 Analysis of Python Efficiency
Possible approach to improve performance: compile Python to a low-level programming language, handle the typing issues during compilation, and flatten the dynamic data structures into dense memory buffers.
Challenges:
- Flattening nested dynamic data structures is not trivial
- Reducing the overhead incurred by the flattening operations
- Only homogeneous data can be stored in a contiguous array

11 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

12 Overview of the framework
Three components: Data Transformation, Transformation Insertion, and Homo-Decision (homogeneity checking).

13 Contributions
Linearization algorithm: transforms a data set from a pointer-intensive structure into a dense memory buffer
Two-stage Insertion algorithm: a lightweight, demand-driven IPRE to reduce the data transformation overhead
Homo-Decision algorithm: decides whether the elements of a data set all have the same type

14 Data Transformation Algorithms
[Figure: the Linearizing algorithm flattens the nested list data[0..l-1] into a single dense buffer Linear_data[ ] holding the innermost elements contiguously together with the level sizes (m, n, l); the Mapping algorithm relates the flat buffer back to the original nested structure.]
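A minimal sketch of one plausible linearization, not the paper's exact algorithm: the nested list is walked once, the innermost elements are packed into a dense buffer, and the per-level lengths are recorded so a mapping back to the nested shape remains possible:

    # Illustrative linearization: flatten a nested list into a dense value
    # buffer plus length metadata (pre-order). Not the paper's exact algorithm.
    def linearize(nested):
        values, lengths = [], []
        def walk(node):
            if isinstance(node, list):
                lengths.append(len(node))
                for child in node:
                    walk(child)
            else:
                values.append(node)
        walk(nested)
        return values, lengths

Applied to the three-level list sketched earlier, this yields one contiguous list of l*n*m floats plus the recorded lengths, which is the kind of dense buffer a multi-core or many-core library can consume.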

15 The Insertion Algorithm
Two-stage algorithm:
- Insert the linearization function immediately before every use of the dynamic data structure
- Optimize the generated code with an inter-procedural partial redundancy elimination (IPRE) algorithm
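A minimal sketch of what stage one might produce on a k-means-like routine, reusing the illustrative linearize from above; kmeans_step, assign, and update are hypothetical names, not the framework's generated code:

    # Hypothetical compute kernels, stubbed so the sketch stands alone.
    def assign(flat, lens, clusters): pass
    def update(flat, lens, clusters): pass

    # Stage 1: a linearize call is inserted immediately before every use
    # of the dynamic data structure.
    def kmeans_step(points, clusters):
        flat, lens = linearize(points)    # inserted before the first use
        assign(flat, lens, clusters)
        flat, lens = linearize(points)    # inserted before the second use;
        update(flat, lens, clusters)      # redundant if points is unchanged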

16 The Insertion Algorithm
Basic PRE algorithm:
- Eliminates duplicated evaluations of expressions
- Eliminates duplicated calls to the linearization function
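Continuing the same illustrative sketch, the redundancy elimination would keep a single linearization when the list is not modified between its uses:

    # After PRE: the duplicated linearize call is gone; the one remaining
    # result is reused (illustrative, not generated code).
    def kmeans_step_opt(points, clusters):
        flat, lens = linearize(points)    # single evaluation kept
        assign(flat, lens, clusters)
        update(flat, lens, clusters)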

17 Design of the framework
[Figure: the IPRE algorithm illustrated on a call graph with the procedures main, kmeans, kmeans_reduction, and update_clusters.]

18 IPRE Overview
Based on a simpler PRE algorithm developed by Paleri et al.
Demand-driven and lightweight: analyzes only a small number of procedures
Assumption: inter-procedural pointer analysis and alias analysis are available
Pull-out strategy: if there is no modification to list li between a procedure's entry and linearize(li), the call can be pulled outside the procedure
Propagate strategy: information about a modification of list li inside a procedure is propagated to its parent procedure
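A minimal illustration of the pull-out strategy under the assumptions above; the procedure names follow the call-graph figure, but the bodies are made up:

    def do_update(flat, lens, clusters): pass   # hypothetical kernel stub

    # Before pull-out: the callee re-linearizes li on every call.
    def update_clusters(li, clusters):
        flat, lens = linearize(li)     # li is not modified before this point
        do_update(flat, lens, clusters)

    # After pull-out: the caller linearizes once and passes the dense buffer
    # down, so IPRE can merge this with other linearizations of li.
    def update_clusters_opt(flat, lens, clusters):
        do_update(flat, lens, clusters)

    def kmeans(li, clusters):
        flat, lens = linearize(li)
        update_clusters_opt(flat, lens, clusters)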

19 IPRE
[Figure: the IPRE transformation illustrated on the k-means example, involving the points and clusters lists.]

20 The Data Flow Analysis to Check Homogeneity
Primitive way: check the type of the elements in the data set one by one. Shortcoming: time consuming.
Our method: start from the program itself and check whether data of different types could be assigned into the structure. A simple data flow analysis gives high efficiency.
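A toy sketch of the idea, not the paper's analysis: rather than inspecting every element at run time, examine the program text and see whether values of more than one type can flow into the list. Here only constant arguments to append are classified; a real analysis would need type inference for arbitrary expressions:

    import ast

    # Toy static check: collect the types of constants appended to a list
    # with the given name anywhere in the source.
    def append_constant_types(source, target):
        types_seen = set()
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "append"
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == target
                    and node.args
                    and isinstance(node.args[0], ast.Constant)):
                types_seen.add(type(node.args[0].value).__name__)
        return types_seen

    code = "myList = []\nfor i in range(10):\n    myList.append(1)\nmyList.append(2.5)\n"
    print(append_constant_types(code, "myList"))  # int and float both flow in,
                                                  # so myList is heterogeneous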

21 Example
    myList = [ ]
    loop header:
        if (someCon):
            myList.append(a)
        myList.append(b)
    … = myList
The analysis looks at the append sites: myList can be treated as homogeneous only if a and b are guaranteed to have the same type.

22 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

23 Experiment setup
Environment:
- K-means and PCA (FREERIDE): Glenn system at OSC
- DGEMM and TM (CUBLAS & Tensor Contractor generated CUDA code): Bale system at OSC
Code versions:
- Python: Python code
- Gen C++: Shedskin-generated C++ code
- WOPRE: generated code without IPRE
- WPRE: generated code with IPRE
- OPT: generated code with further optimization
- Manual: hand-written C code

24 K-Means
800M data set; k = 100; iter = 1 (left) & iter = 10 (right)
With IPRE, the overhead is reduced, especially in the multi-iteration version.
The performance of the final version is very close to the manual version; the overhead is around 10%.
For a smaller data set (8M), Python needs … sec and … sec; for the same data set (800M), Gen C++ needs … sec and … sec.

25 PCA
800M data set; rows = 1000; columns = 100,000
For a smaller data set (8M), Python runs … sec; for the same data set (800M), Gen C++ needs 3280 sec.
IPRE has to be applied; the performance of the final version is very close to the manual version, and the overhead is around 10%~20%.

26 Linear Algebra Applications
DGEMM (left) & Tensor Multiplication (right)
The performance of DGEMM for 1000*1000 matrices was reported in the previous section.
The performance of TM: for configuration 1, Python runs … sec and Gen C++ runs … sec.
IPRE reduces the overhead by around 50%.

27 Outline
Background; Analysis of Python Efficiency; Design of the Framework (Linearization, Insertion, Homo-Decision); Experimental Evaluation; Conclusion; Related Work

28 Conclusion
Proposed a compilation framework that compiles pure Python to invoke multi-core and many-core libraries.
Designed three novel algorithms: an inter-procedural PRE algorithm, a homogeneity checking algorithm, and a Linearization-Mapping scheme.
Good experimental performance: the generated code is only 10%-20% slower than hand-written C.

29 Related Work
Improving the efficiency of Python:
- Using extension libraries: NumPy & SciPy, PyCUDA & PyOpenCL, Copperhead, …
- Compiling to a low-level programming language: Cython, Pyrex, Shedskin, …
Improving the efficiency of pointer-based data structures:
- Exploring the nature of irregular algorithms
- Automatic pool allocation
- Transformation of recursive data/computation structures
- Data reordering

30 Thank you for your attention!
Any Questions?

