Reflections on Dynamic Languages and Parallelism
David Padua, University of Illinois at Urbana-Champaign



Parallel Programming

- Parallel programming is
  - Sometimes for productivity
    - Because some problems are more naturally solved in parallel, e.g., some simulations and reactive programs.
  - Most often for performance
    - Serial machines are not powerful enough.
    - For scalability across machine generations. Scalability is more important than absolute performance for the microprocessor industry.
- Parallel programming can be
  - Implicit: library/runtime/compiler
  - Explicit: threading, multiprocessing, parallel loops
  - Shared memory
  - Distributed memory
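To make the implicit/explicit distinction concrete, here is a minimal Python sketch (not from the talk; the work function f and the pool size are illustrative). The explicit version creates workers and distributes the loop itself, while the implicit version calls a library whose internals may or may not run in parallel.

    # Sketch: explicit vs. implicit parallelism in Python (illustrative only).
    from multiprocessing import Pool
    import numpy as np

    def f(x):
        # Some CPU-bound work on one element.
        return x * x

    def explicit_map(data):
        # Explicit parallelism: the programmer creates workers and splits the loop.
        with Pool(processes=4) as pool:
            return pool.map(f, data)

    def implicit_map(data):
        # Implicit parallelism: a single library call; any parallelism (threads,
        # SIMD, multithreaded kernels) is hidden inside the library/runtime.
        a = np.asarray(list(data))
        return (a * a).tolist()

    if __name__ == "__main__":
        print(explicit_map(range(8)))
        print(implicit_map(range(8)))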

Dynamic Languages

- Dynamic languages are for
  - Productivity. They "make programmers super productive."
  - Not performance.
    - DLs are typically slow: (1000?) times slower than the corresponding C or Fortran.
    - Sufficiently fast for many problems and excellent for prototyping in all cases.
    - But the prototype must be manually rewritten if performance is needed.
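A minimal sketch of the gap being described, in Python (our illustration, not from the talk; the actual slowdown depends on machine and workload): the same reduction written as an interpreted loop and as a call into a compiled kernel.

    # Sketch: the interpreted-loop vs. compiled-kernel gap (illustrative).
    import time
    import numpy as np

    def dot_loop(a, b):
        # Pure-Python loop: every iteration pays interpreter overhead.
        s = 0.0
        for x, y in zip(a, b):
            s += x * y
        return s

    n = 1_000_000
    a = np.random.rand(n)
    b = np.random.rand(n)

    t0 = time.perf_counter()
    dot_loop(a, b)
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    float(a @ b)                     # the "rewritten" version: one compiled kernel
    t3 = time.perf_counter()

    print(f"loop: {t1 - t0:.3f}s   numpy: {t3 - t2:.5f}s")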

Parallel Programming with Dynamic Languages

- Not always accepted by the DL community
  - Hearsay: JavaScript designers are unwilling to add parallel extensions.
  - Some in the Python community prefer not to remove the GIL; serial computing simplifies matters.
- Not (always) great for performance
  - Not much effort is made toward a highly efficient, effective form of parallelism.
  - For example, Python's GIL and its implementation.
  - In MATLAB, programmer-controlled communication from desktop to worker.
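A small sketch of why the GIL matters here (our example; timings vary with machine and CPython version): CPU-bound threads take roughly as long as serial execution because the GIL admits one thread at a time, while separate processes actually run in parallel.

    # Sketch: CPU-bound threads vs. processes under CPython's GIL (illustrative).
    import time
    from threading import Thread
    from multiprocessing import Process

    def burn(n=10_000_000):
        s = 0
        for i in range(n):
            s += i

    def run(worker_cls):
        workers = [worker_cls(target=burn) for _ in range(4)]
        t0 = time.perf_counter()
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return time.perf_counter() - t0

    if __name__ == "__main__":
        print("threads:  ", run(Thread))    # roughly serial: the GIL serializes the threads
        print("processes:", run(Process))   # roughly parallel: each process has its own GIL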

Parallel Programming with Dynamic Languages (cont.)

- Not (always) great at facilitating the expression of parallelism (productivity)
  - In some cases (e.g., MATLAB) parallel programming constructs were not part of the language from the beginning.
  - Sharing of data is not always possible.
    - In Python, it seems that arrays can be shared between processes, but not other classes of data.
    - In MATLAB, there is no shared memory.
  - Message passing is the preferred form of communication.
    - Process to process in the case of Python.
    - Client to worker in the case of MATLAB.
  - MATLAB's parfor has complex design rules.
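The asymmetry between arrays and other data can be illustrated with a hedged Python sketch using multiprocessing.Array, which allocates a typed buffer in shared memory; arbitrary objects instead travel by message passing (pickled through a Queue). The function names here are ours.

    # Sketch: a shared numeric array vs. message-passed objects in Python (illustrative).
    from multiprocessing import Array, Process, Queue

    def scale(shared, factor):
        # The child writes directly into the shared buffer: no copy, true sharing.
        for i in range(len(shared)):
            shared[i] *= factor

    def send(queue):
        # Arbitrary objects (dicts, class instances) are pickled and sent as messages.
        queue.put({"status": "done"})

    if __name__ == "__main__":
        data = Array('d', [1.0, 2.0, 3.0])   # 'd' = C double, allocated in shared memory
        p = Process(target=scale, args=(data, 10.0))
        p.start(); p.join()
        print(list(data))                    # [10.0, 20.0, 30.0]

        q = Queue()
        p = Process(target=send, args=(q,))
        p.start()
        print(q.get())                       # a copy of the dict, not shared state
        p.join()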

Why Parallel Dynamic Language Programs?

- There are reasons to improve the current situation
  - Parallelism might be necessary for dynamic languages to have a future in the multicore era.
    - Lack of parallelism would mean no performance improvement across machine generations.
    - DLs are not totally performance oblivious. They are enabled by very powerful machines.
  - When parallelism is explicit
    - For some problems it helps productivity.
    - It enables prototyping of high-performing parallel codes.
  - Super productive parallel programming?
  - Can parallelism be used to close the performance gap with conventional languages?

Detour. The real answer

- But if you want performance, you don't need parallelism; all you need is a little...

MaJIC (MATLAB Just-In-Time Compiler)

MaJIC Results
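MaJIC itself is not reproduced here; as a rough present-day analogue of the "a little compilation" idea in a dynamic language, the sketch below uses Numba to type-specialize and natively compile a Python loop (assuming Numba is installed; this is an illustration, not the talk's system).

    # Sketch: "a little compilation" in a dynamic language, using Numba as an
    # analogue of what MaJIC did for MATLAB (illustrative; not the talk's system).
    import numpy as np
    from numba import njit

    @njit
    def dot_jit(a, b):
        # Same loop as the interpreted version, but type-specialized and
        # compiled to native code on the first call.
        s = 0.0
        for i in range(a.shape[0]):
            s += a[i] * b[i]
        return s

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)
    dot_jit(a, b)          # first call triggers compilation
    print(dot_jit(a, b))   # subsequent calls run at near-native speed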

How to introduce parallelism 1. Autoparallelization

- Parallelism is automatic, via the compiler/interpreter.
  - Perfect productivity.
  - But the technology does not work in all cases. Not even for scientific programs.
- The next slide shows a simple experiment on vectorization
  - Three compilers and a few simple loops.
  - The technology is not there, not even for vectorization.

How to introduce parallelism 1. Autoparallelization (cont.)

(Figure: results of the vectorization experiment with three compilers and a few simple loops.)

How to introduce parallelism 1. Autoparallelization (cont.)

- Would dynamic compilation improve the situation?
  - NO

How to introduce parallelism 2. Libraries of parallel kernels

- Again, the programmer does not need to do anything.
  - Again, perfect from a productivity point of view.
  - This is the performance model of MATLAB.
- Good performance if most of the computation is represented in terms of kernels.
  - Parallelism if most of the computation is represented in terms of parallel kernels.
- But not all programs can be written in terms of library routines.
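In Python the same model shows up when a program is phrased as calls to precompiled kernels; a hedged sketch follows (whether these kernels actually run multithreaded depends on the BLAS that NumPy was built against).

    # Sketch: the "libraries of parallel kernels" model in Python (illustrative).
    import numpy as np

    n = 2000
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    # The whole computation is expressed as kernel calls; if NumPy is linked
    # against a multithreaded BLAS (OpenBLAS, MKL), these run in parallel with
    # no change to the program.
    C = A @ B                            # matrix multiply kernel
    norms = np.linalg.norm(C, axis=0)    # reduction kernel
    print(norms[:3])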

How to introduce parallelism 3. (Data) Parallel Operators

- Data parallel programs have good properties
  - Can be analyzed using sequential semantics.
  - Parallelism is encapsulated.
  - Can be used to enforce determinacy.
- For scientific codes, array notation produces highly compact and (sometimes) readable programs. Array notation was introduced before parallelism (APL, ca. 1960).
- Recent (re)emergence of data parallel languages (e.g., Ct, ...)
- But they are explicitly parallel
  - More complex program development than their sequential counterparts.
- Not all forms of parallelism can be nicely represented
  - Pipelining
  - General tasks
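A small Python illustration of sequential semantics with encapsulated parallelism (our example, not from the talk): the array expression has the same meaning as the explicit loop, so an implementation is free to evaluate it serially, with SIMD, or across cores without changing the answer.

    # Sketch: a data-parallel operator with sequential semantics (illustrative).
    import numpy as np

    a = np.arange(10, dtype=float)
    b = np.ones(10)

    # One array expression replaces an explicit loop.  Its meaning is the
    # elementwise result; how it is evaluated is hidden from the programmer.
    c = np.where(a > 5, a * b, -a)

    # The equivalent explicit loop, for comparison:
    d = np.empty_like(a)
    for i in range(len(a)):
        d[i] = a[i] * b[i] if a[i] > 5 else -a[i]

    assert np.allclose(c, d)
    print(c)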

3. (Data) Parallel Operators
Extending MATLAB: Hierarchically Tiled Arrays

- Blocking/tiling is crucial for locality and parallel programming.
- Our approach makes tiles first-class objects.
  - Referenced explicitly.
  - Manipulated using array operations such as reductions, gather, etc.
- Joint work with IBM Research: G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. Fraguela, M. Garzarán, D. Padua, and C. von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. PPoPP, March 2006.

3. (Data) Parallel Operators
Extending MATLAB: Hierarchically Tiled Arrays

(Figure: a hierarchically tiled array. The outer 2 x 2 tiles map to distinct modules of a cluster; the inner 4 x 4 tiles are used to enhance locality in the L1 cache.)

3. (Data) Parallel Operators
Extending MATLAB: Hierarchically Tiled Arrays

(Figure: addressing tiles of an HTA h. h{1,1:2} selects a row of tiles, h{2,1} selects a single tile, and h{2,1}(1,2) selects an element within that tile.)

3. (Data) Parallel Operators
Sequential MMM in MATLAB with HTAs

Tiled (blocked) MMM with plain arrays:

    for I=1:q:n
      for J=1:q:n
        for K=1:q:n
          for i=I:I+q-1
            for j=J:J+q-1
              for k=K:K+q-1
                C(i,j) = C(i,j) + A(i,k)*B(k,j);
              end
            end
          end
        end
      end
    end

The same computation with HTAs, where {i,j} names a tile:

    for i=1:m
      for j=1:m
        for k=1:m
          C{i,j} = C{i,j} + A{i,k}*B{k,j};
        end
      end
    end

3. (Data) Parallel Operators
Parallel MMM in MATLAB with HTAs

    function C = summa(A, B, C)
      for k=1:m
        T1 = repmat(A{:, k}, 1, m);          % repmat: broadcast block column k of A
        T2 = repmat(B{k, :}, m, 1);          % repmat: broadcast block row k of B
        C = C + matmul(T1{:,:}, T2{:,:});    % parallel computation on each tile
      end

(Figure: the tiles T1{:,:} and T2{:,:} hold the replicated blocks of A and B consumed by matmul.)
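For readers who do not use MATLAB, the sketch below emulates the same SUMMA block structure serially with NumPy tiles (our illustration; in the real HTA version the broadcasts and the per-tile multiplies are distributed across processors).

    # Sketch: the SUMMA block structure of the HTA code above, emulated serially
    # with NumPy tiles (illustrative; no actual parallelism here).
    import numpy as np

    m, q = 4, 8                     # m x m grid of q x q tiles
    n = m * q
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    C = np.zeros((n, n))

    def tile(M, i, j):
        # View of tile (i, j) of matrix M.
        return M[i*q:(i+1)*q, j*q:(j+1)*q]

    for k in range(m):
        # "Broadcast" block column k of A and block row k of B, then let every
        # tile (i, j) accumulate its local product; each tile update is independent.
        for i in range(m):
            for j in range(m):
                tile(C, i, j)[:] += tile(A, i, k) @ tile(B, k, j)

    assert np.allclose(C, A @ B)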

How to introduce parallelism 4. General mechanisms

- This means
  - Fork, join
  - Parallel loops
  - Synchronization, semaphores, monitors
- Already in many languages. May need improvement, but no conceptual difficulty.
- Maximum flexibility / maximum complexity.
- But goodbye, super productivity!
  - Race conditions
  - Tuning
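A small Python sketch of the productivity cost being described (our example): with general threading mechanisms, even a trivial shared counter needs an explicit lock to avoid a race, and forgetting it can produce silently wrong answers.

    # Sketch: general mechanisms invite races; the fix is explicit synchronization.
    from threading import Thread, Lock

    counter = 0
    lock = Lock()

    def unsafe_add(n):
        global counter
        for _ in range(n):
            counter += 1          # read-modify-write: not atomic across threads

    def safe_add(n):
        global counter
        for _ in range(n):
            with lock:            # the programmer must find and protect the critical section
                counter += 1

    def run(worker, n=100_000, nthreads=4):
        global counter
        counter = 0
        ts = [Thread(target=worker, args=(n,)) for _ in range(nthreads)]
        for t in ts:
            t.start()
        for t in ts:
            t.join()
        return counter

    print("unsafe:", run(unsafe_add))   # may be less than 400000: lost updates
    print("safe:  ", run(safe_add))     # always 400000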

Conclusions

- DLs of the future are likely to accommodate parallelism better than they do today.
- There are several possible approaches to introducing parallelism, but none is perfect.
- When (if?) parallelism becomes the norm:
  - Performance will have a more important role in DL programming.
  - Therefore, at least for some classes of problems, super productivity will suffer.
  - Advances in compilers, libraries, and language extensions will help recover some of the lost ground.