Parallelism without Concurrency
Charles E. Leiserson, MIT

Parallelism versus Concurrency

Parallel computing: a form of computing in which computations are broken into many pieces that are executed simultaneously.

Concurrent computing: a form of computing in which computations are designed as collections of interacting processes.

Eschew Concurrency

 Concurrent computing is hard because interacting processes are inherently nondeterministic and risk concurrency anomalies such as deadlock and races.
 Nondeterministic programs are difficult to understand and even more difficult to get right [Lee 2006, Bocchino et al. 2009].
 To exploit multicore computers, average application programmers desperately need the field of parallel programming to move away from concurrency* and toward determinism (simplicity).

*Concurrent computing is still essential for programming parallel and distributed workloads, as well as for implementing concurrency platforms for parallel programming. This work is best done by experts in concurrency, however, not by average programmers.
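To make the nondeterminism concrete, here is a minimal C++ sketch (an illustration added here, not part of the talk) of the simplest concurrency anomaly, a data race: two threads increment a shared counter without synchronization, so updates are lost and the result varies from run to run.

#include <iostream>
#include <thread>

int counter = 0;  // shared, unprotected state

void bump() {
    for (int i = 0; i < 1000000; ++i)
        ++counter;  // unsynchronized read-modify-write: a data race
}

int main() {
    std::thread t1(bump), t2(bump);
    t1.join();
    t2.join();
    // A race-free program would print 2000000; lost updates make the
    // actual output nondeterministic, differing from run to run.
    std::cout << counter << '\n';
    return 0;
}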

Theory Success: Cilk Technology

 Cilk encapsulates the nondeterminism of scheduling, allowing average programmers to write deterministic parallel code using only three keywords to indicate logical parallelism.
 Cilk has a provably good scheduler that lets programmers analyze scalability, both theoretically and in practice, using work/span analysis.
 Cilk's serial semantics factors parallelism apart from the other issues of performance engineering (e.g., Cilk is cache-friendly).
 The Cilkscreen race detector offers provable guarantees of determinism by certifying the absence of determinacy races.
 Cilk's reducer hyperobjects encapsulate the nondeterminism of updates to nonlocal variables, providing deterministic behavior for parallel updates.
 Cilk is embedded in Intel's C++ compiler, and Intel has released an open-source GCC implementation:
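As a concrete sketch (added here, not on the slide), assuming the Intel Cilk Plus spellings of the three keywords (cilk_spawn, cilk_sync, cilk_for) and the stock reducer_opadd hyperobject:

#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <cstdio>

// Fork-join parallelism: cilk_spawn exposes logical parallelism and
// cilk_sync waits for the spawned children. Eliding the keywords
// leaves a valid serial C++ program (Cilk's serial semantics).
long fib(long n) {
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  // may run in parallel with the next line
    long y = fib(n - 2);
    cilk_sync;                       // wait for the spawned call
    return x + y;
}

int main() {
    // Reducer hyperobject: each strand updates a private view, and the
    // views are combined deterministically, so this parallel loop has
    // no determinacy race on `sum`.
    cilk::reducer_opadd<long> sum(0);
    cilk_for (long i = 1; i <= 1000; ++i)
        sum += i;
    std::printf("fib(30) = %ld, sum = %ld\n", fib(30), sum.get_value());
    return 0;
}

The sketch assumes a Cilk-enabled compiler, e.g., Intel's icc or a GCC build with -fcilkplus.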

Proposed “Dangerous” Research Agenda

Applications still exist that cannot be easily programmed with Cilk's fork-join programming model except by resorting to concurrency, e.g., software pipelining, task-graph execution, etc.

1. Find a parallel-programming pattern where experts use locks.
2. Encapsulate the pattern with a linguistic construct that provides serial semantics.
3. Provide algorithmically sound runtime support and productivity tools.
Iterate until done.
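As one hypothetical instance of step 1 (my illustration, not the talk's), a two-stage software pipeline is a pattern that experts today typically code with an explicit mutex- and condition-variable-protected queue — exactly the concurrency a serial-semantics construct would hide:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Stage 1 produces items; stage 2 consumes them. The lock and the
// condition variable are the concurrency an average programmer
// would rather not have to write.
std::queue<int> q;
std::mutex m;
std::condition_variable cv;
bool done = false;

void produce() {
    for (int i = 0; i < 10; ++i) {
        std::lock_guard<std::mutex> lk(m);
        q.push(i * i);  // stage-1 work
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_one();
}

void consume() {
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !q.empty() || done; });
        if (q.empty()) return;  // producer finished and queue drained
        int v = q.front(); q.pop();
        lk.unlock();
        std::cout << v << '\n';  // stage-2 work
    }
}

int main() {
    std::thread t1(produce), t2(consume);
    t1.join();
    t2.join();
    return 0;
}

A linguistic construct with serial semantics would let the programmer express the two stages without ever naming the mutex.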

Potential Recipe for Disaster?

 Can a parallel-programming model provide a small set of linguistic mechanisms that completely eliminates the need for concurrency in parallel programming?
 Can the programming model be designed to support effective theoretical performance models, especially considering the vagaries of modern hardware?
 Will the various linguistic mechanisms synergize or conflict with each other?
 What kind of memory model should the programming model export (e.g., support for priority writes and priority reserve)?
 Can theoretically sound productivity tools be created to aid the development of parallel programs?
 How large a codebase can the parallel-programming model support?

Scalability

 Algorithmic theory teaches us to study how software scales with problem size.
 Parallel computing demands that we study how software scales with the number of processors.
 The real challenge will be how parallel-computing mechanisms scale with legacy codebase size.
 Concurrency is a major obstacle.

Source lines of code     Software artifact
< 1,000                  Classroom examples
1,000–10,000             Benchmarks
10,000–100,000           Libraries
100,000–1,000,000        Applications
> 1,000,000              Software platforms
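For the processor axis, the work/span analysis mentioned above is the standard quantitative model; a brief summary (added here, not on the slide), with $T_1$ the work and $T_\infty$ the span:

\[
T_P \;\ge\; \max\!\left(\frac{T_1}{P},\; T_\infty\right),
\qquad
T_P \;\le\; \frac{T_1}{P} + O(T_\infty)
\ \text{(work stealing, in expectation)},
\]
so the achievable speedup on $P$ processors obeys
\[
\frac{T_1}{T_P} \;\le\; \min\!\left(P,\; \frac{T_1}{T_\infty}\right).
\]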