Parallel programs
Inf-2202 Concurrent and Data-intensive Programming, Fall 2016
Lars Ailo Bongo

Course topics
- Parallel programming
  - The parallelization process
  - Optimization of parallel programs
- Performance analysis
- Data-intensive computing

Parallel programs
- Supercomputing
  - Scientific applications
  - Parallel programming was hard, and parallel architectures were expensive
  - Still important!
- Data-intensive computing
  - Will return to this topic
- Server applications
  - Databases, web servers, app servers, etc.
- Desktop applications
  - Games, image processing, etc.
- Mobile phone applications
  - Multimedia, sensor-based, etc.
- GPU and hardware accelerator applications

Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

Parallel architectures
A parallel computer is "a collection of processing elements that communicate and cooperate to solve large problems fast" (Almasi and Gottlieb, 1989):
- Conventional computer architecture
- + communication among processes
- + coordination among processes

Communication architecture
- Hardware/software boundary?
- User/system boundary?
- Defines:
  - Basic communication operations
  - Organizational structures to realize these operations

Parallel architectures
- Shared address space
- Message passing
- Data parallel processing
  - Bulk synchronous processing (Valiant, 1990)
  - Google's Pregel (Malewicz et al., 2010)
  - MapReduce (Dean & Ghemawat, 2010) and Spark (Zaharia et al., 2012)
- Dataflow architectures (wikipedia1, wikipedia2)
  - VHDL, Verilog, Linda, Yahoo Pipes (?), Galaxy (?)
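
To make the first two models concrete, here is a minimal Go sketch (my own illustration, not from the lecture) that computes the same sum in a shared address space (a mutex-protected variable updated by all goroutines) and with message passing (partial results sent over a channel). Function names and data sizes are arbitrary.

package main

import (
	"fmt"
	"sync"
)

// sharedMemorySum: goroutines update one shared variable under a mutex.
func sharedMemorySum(values []int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	sum := 0
	for _, v := range values {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			sum += v
			mu.Unlock()
		}(v)
	}
	wg.Wait()
	return sum
}

// messagePassingSum: goroutines communicate only by sending values over a
// channel, so there is no shared mutable state and no lock.
func messagePassingSum(values []int) int {
	ch := make(chan int)
	for _, v := range values {
		go func(v int) { ch <- v }(v)
	}
	sum := 0
	for range values {
		sum += <-ch
	}
	return sum
}

func main() {
	data := []int{1, 2, 3, 4, 5}
	fmt.Println(sharedMemorySum(data), messagePassingSum(data)) // 15 15
}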

Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

Fundamental design issues
- Communication abstraction
- Programming model requirements
- Naming
- Ordering
- Communication and replication
- Performance

Communication abstractions
- Well-defined operations
- Suitable for optimization
- Communication abstractions in Pthreads? In Go?

Programming model
- One or more threads of control operating on data:
  - What data can be named by which threads
  - What operations can be performed on the named data
  - What ordering exists among those operations
- Programming model for a uniprocessor?
- Pthreads programming model?
- Go programming model?
- Why the need for explicit synchronization primitives? (See the sketch below.)
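
A small sketch of why explicit synchronization is needed in the Pthreads and Go models (my own example, not from the slides): two goroutines increment a shared counter, and without the mutex the program has a data race and typically loses updates.

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100000
	counter := 0
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				mu.Lock() // remove the lock and `go run -race` reports a data race
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 200000 with the mutex; unpredictable without it
}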

Naming
- Critical at each level of the architecture

Operations
- Operations that can be performed on the data
- Pthreads? Go? More exotic?

Ordering
- Important at all layers in the architecture
- Performance tricks
- If implicit ordering is not enough, synchronization is needed:
  - Mutual exclusion
  - Events / condition variables
    - Point-to-point
    - Global
  - Channels?
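
In Go terms, these mechanisms roughly map onto mutexes, channels, and WaitGroups. A minimal sketch (illustrative only, not the lecture's code) of a point-to-point event and a global "wait for all" ordering:

package main

import (
	"fmt"
	"sync"
)

func main() {
	ready := make(chan struct{}) // point-to-point event: one signaller, one waiter
	var wg sync.WaitGroup        // global ordering: main waits for all workers

	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ready // blocks until the producer signals the event
		fmt.Println("consumer runs strictly after the producer")
	}()

	fmt.Println("producer prepares data")
	close(ready) // signal the event

	wg.Wait() // do not let main exit before the consumer has finished
}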

Communication and replication
- Related to each other
- Caching
- IPC
- Binding of data:
  - Write
  - Read
  - Data transfer
  - Data copy
  - IPC

Performance
- Data types, addressing modes, and communication abstractions specify naming, ordering, and synchronization for shared objects
- Performance characteristics determine how they are actually used
- Metrics:
  - Latency: the time for an operation
  - Bandwidth: the rate at which operations are performed
  - Cost: the impact on execution time
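
As a rough illustration of the latency and bandwidth metrics (an assumption-laden microbenchmark, not part of the lecture), the sketch below times one million unbuffered channel send/receive pairs in Go and reports the average time per operation and the resulting operations per second. Absolute numbers vary by machine and include loop overhead.

package main

import (
	"fmt"
	"time"
)

func main() {
	const ops = 1000000
	ch := make(chan int)
	go func() {
		for i := 0; i < ops; i++ {
			ch <- i // each send blocks until main receives it
		}
		close(ch)
	}()

	start := time.Now()
	for range ch {
		// receive only; the work measured is the communication itself
	}
	elapsed := time.Since(start)

	fmt.Printf("latency per op: %v\n", elapsed/ops)
	fmt.Printf("bandwidth: %.0f ops/s\n", float64(ops)/elapsed.Seconds())
}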

Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

The Basic Local Alignment Search Tool (BLAST)
- BLAST finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
- Popular to use
- Popular to parallelize
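
One common way such a search is parallelized (a hedged sketch of the idea only; real BLAST uses a far more sophisticated alignment and scoring algorithm than the substring test used here) is to partition the sequence database across workers and let each worker scan its partition independently:

package main

import (
	"fmt"
	"strings"
	"sync"
)

// search returns the indices of database sequences that contain the query.
// strings.Contains is a stand-in for the actual alignment and scoring step.
func search(query string, db []string, workers int) []int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var hits []int

	chunk := (len(db) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > len(db) {
			hi = len(db)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				if strings.Contains(db[i], query) {
					mu.Lock()
					hits = append(hits, i)
					mu.Unlock()
				}
			}
		}(lo, hi)
	}
	wg.Wait()
	return hits
}

func main() {
	db := []string{"ACGTACGT", "TTGGCC", "GGACGTA", "CCCCC"}
	fmt.Println(search("ACGT", db, 2)) // indices 0 and 2, in nondeterministic order
}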

Nearest neighbor equation solver
- Example from chapter 2.3 in Parallel Computer Architecture: A Hardware/Software Approach, David Culler, J.P. Singh, Anoop Gupta, Morgan Kaufmann
- Common matrix-based computation
- Well-known parallel benchmark (SOR)
- Exercise
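
A sequential sketch of the solver kernel (my reading of the book's example; details such as the red-black ordering and convergence handling are simplified): each interior grid point is replaced by a weighted average of itself and its four nearest neighbours, and sweeps repeat until the accumulated change falls below a tolerance. Parallelizing this loop nest is the exercise.

package main

import (
	"fmt"
	"math"
)

// sweep performs one in-place update pass and returns the total change.
func sweep(grid [][]float64) float64 {
	n := len(grid)
	diff := 0.0
	for i := 1; i < n-1; i++ {
		for j := 1; j < n-1; j++ {
			old := grid[i][j]
			grid[i][j] = 0.2 * (grid[i][j] + grid[i-1][j] + grid[i+1][j] +
				grid[i][j-1] + grid[i][j+1])
			diff += math.Abs(grid[i][j] - old)
		}
	}
	return diff
}

func main() {
	const n, tol = 16, 1e-3
	grid := make([][]float64, n)
	for i := range grid {
		grid[i] = make([]float64, n)
		grid[i][0] = 1.0 // fixed boundary values along the left edge
	}
	for sweep(grid) > tol {
	}
	fmt.Println(grid[n/2][n/2])
}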

Deduplication
- Mandatory assignment 2
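
A hedged, sequential sketch of the general idea (an assumption; the actual assignment may differ in chunking strategy, hashing, and pipeline structure): split the input into chunks, fingerprint each chunk with a cryptographic hash, and keep a chunk only the first time its fingerprint is seen.

package main

import (
	"crypto/sha256"
	"fmt"
)

// dedup keeps only the first occurrence of each fixed-size chunk.
func dedup(data []byte, chunkSize int) [][]byte {
	seen := make(map[[32]byte]bool)
	var unique [][]byte
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		h := sha256.Sum256(chunk) // fingerprint of the chunk
		if !seen[h] {
			seen[h] = true
			unique = append(unique, chunk)
		}
	}
	return unique
}

func main() {
	data := []byte("abcdabcdabcdxyz")
	fmt.Println(len(dedup(data, 4))) // prints 2: "abcd" once plus the "xyz" tail
}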

Outline
- Parallel architectures
- Fundamental design issues
- Case studies
- Parallelization process
- Examples

Parallelization process
- Goals:
  - Good performance
  - Efficient resource utilization
  - Low developer effort
- May be done at any layer

Parallelization process (2)
- Task: piece of work
- Process/thread: entity that performs the work
- Processor/core: physical processor cores

Parallelization process (3)
1. Decomposition of the computation into tasks
2. Assignment of tasks to processes
3. Orchestration of necessary data access, communication, and synchronization among processes
4. Mapping of processes to cores
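
The steps can be illustrated on a trivial computation (a sketch of my own, not the lecture's code): summing a slice. Decomposition splits the sum into per-block tasks, assignment gives each worker goroutine one block, orchestration passes partial results back over a channel, and mapping of goroutines onto cores is left to the Go runtime and the OS scheduler (bounded by GOMAXPROCS).

package main

import (
	"fmt"
	"runtime"
)

func parallelSum(values []int, workers int) int {
	results := make(chan int, workers) // orchestration: partial sums flow over a channel
	blockSize := (len(values) + workers - 1) / workers

	for w := 0; w < workers; w++ { // decomposition + static assignment of blocks
		lo, hi := w*blockSize, (w+1)*blockSize
		if hi > len(values) {
			hi = len(values)
		}
		if lo > hi {
			lo = hi
		}
		go func(block []int) { // mapping: the runtime schedules goroutines onto cores
			partial := 0
			for _, v := range block {
				partial += v
			}
			results <- partial
		}(values[lo:hi])
	}

	total := 0
	for w := 0; w < workers; w++ {
		total += <-results
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i
	}
	fmt.Println(parallelSum(data, runtime.NumCPU())) // 499500
}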

Steps in the parallelization process

Decomposition
- Split computation into a collection of tasks
- Algorithmic
- Task granularity limits parallelism → Amdahl's law
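
For reference, Amdahl's law in its standard form (not written out on the slide): if a fraction p of the execution can be parallelized over n processors, the speedup is bounded by

\[
  S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\]

so even as n grows, the speedup never exceeds 1/(1 - p); overly coarse task granularity can be seen as lowering the parallel fraction p.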

Assignment
- Algorithmic
- Goal: load balancing
  - All processes should do an equal amount of work
  - Important for performance
- Goal: reduce communication volume
  - Send the minimum amount of data
- Two types (contrasted in the sketch below):
  - Static
  - Dynamic
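
A minimal Go sketch (illustrative, not from the slides) contrasting the two: static assignment fixes which tasks each worker gets before execution starts, while dynamic assignment lets idle workers pull the next task from a shared queue, which balances load when task costs vary.

package main

import (
	"fmt"
	"sync"
)

// staticAssignment: worker w always processes tasks w, w+workers, w+2*workers, ...
func staticAssignment(tasks []int, workers int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(tasks); i += workers {
				_ = tasks[i] * tasks[i] // stand-in for real work
			}
		}(w)
	}
	wg.Wait()
}

// dynamicAssignment: workers pull tasks from a channel as they become free.
func dynamicAssignment(tasks []int, workers int) {
	queue := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range queue {
				_ = t * t // stand-in for real work
			}
		}()
	}
	for _, t := range tasks {
		queue <- t
	}
	close(queue)
	wg.Wait()
}

func main() {
	tasks := make([]int, 100)
	for i := range tasks {
		tasks[i] = i
	}
	staticAssignment(tasks, 4)
	dynamicAssignment(tasks, 4)
	fmt.Println("both runs completed")
}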

Orchestration
- Specific to computer architecture, programming model, and programming language
- Goals:
  - Reduce communication cost
  - Reduce synchronization cost
  - Locality of data
  - Efficient scheduling
  - Reduce overhead

Mapping
- Specific to the system or programming environment:
  - Parallel system resource allocator
  - Queuing systems
  - OS scheduler

Goals of parallelization process
- Decomposition (architecture dependent? mostly no)
  - Expose enough concurrency, but not too much
- Assignment (architecture dependent? mostly no)
  - Balance workload
  - Reduce communication volume
- Orchestration (architecture dependent? yes)
  - Reduce noninherent communication via data locality
  - Reduce communication and synchronization cost as seen by the processor
  - Reduce serialization to shared resources
  - Schedule tasks to satisfy dependencies early
- Mapping (architecture dependent? yes)
  - Put related threads on the same core if necessary
  - Exploit locality in chip and network topology

Summary
- Fundamental design issues for parallel systems
- How to write a parallel program
- Examples