CISC 879: Software Support for Multicore Architectures
John Cavazos, Dept. of Computer & Information Sciences, University of Delaware
www.cis.udel.edu/~cavazos/cisc879

Lecture 6: Patterns for Parallel Programming

Lecture 6 Overview
Design Patterns for Parallel Programs:
- Finding Concurrency
- Algorithmic Structure
- Supporting Structures
- Implementation Mechanisms

Algorithm Structure Patterns
Given a set of concurrent tasks, what's next? Important questions based on the target platform:
- How many cores will your algorithm support? Consider the order of magnitude.
- How expensive is sharing? Architectures have different communication costs.
- Is the design constrained to the hardware? Software typically outlives hardware, so the design should be flexible enough to adapt to different architectures.
- Does the algorithm map well to the programming environment? Consider the languages and libraries available.

How to Organize Concurrency
The major organizing principle implied by the concurrency:
- Organization by task: Task Parallelism, Divide and Conquer.
- Organization by data decomposition: Geometric Decomposition, Recursive Data.
- Organization by flow of data: Pipeline, Event-Based Coordination.

Organize by Tasks
- Linear: Task Parallelism
- Recursive: Divide and Conquer

Task Parallelism
Tasks are linear (no structure or hierarchy):
- They can be completely independent (embarrassingly parallel).
- They can have some dependencies.
Common factors:
- Tasks are associated with loop iterations.
- All tasks are known at the beginning.
- All tasks must complete.
However, there are exceptions to all of these.
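As a minimal sketch of the task parallelism pattern (assuming C with OpenMP; independent_work is a hypothetical per-iteration function, not from the slides), tasks correspond to loop iterations with no dependencies between them:

#include <math.h>
#include <omp.h>

/* Hypothetical per-iteration work: each iteration writes only out[i],
   so the iterations are completely independent tasks. */
static void independent_work(int i, double *out)
{
    out[i] = sin((double)i) * cos((double)i);
}

void run_tasks(int n, double *out)
{
    /* All tasks are known at the beginning (one per iteration) and all
       must complete before the loop ends; with no dependencies, the
       runtime is free to spread iterations across cores. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        independent_work(i, out);
}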

Task Parallelism (Examples)
- Ray tracing: each ray is separate and independent.
- Molecular dynamics: vibrational, rotational, and nonbonded forces are independent for each atom.
- Branch-and-bound computations: repeatedly divide into smaller solution spaces until a solution is found; tasks are weakly dependent through a queue.

Task Parallelism: Three Key Elements
- Task definition: is it adequate? Consider the number of tasks and their computation.
- Schedule: load balancing.
- Dependencies: removable, or separable by replication (see the sketch below).
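As one concrete illustration of a dependency that is separable by replication (a minimal sketch assuming C with OpenMP, not taken from the slides): a shared accumulator would serialize the loop, but the reduction clause gives each thread a private copy and combines the copies at the end.

#include <omp.h>

/* The shared variable sum would normally be a loop-carried dependency;
   reduction(+:sum) replicates it per thread and combines the private
   copies afterwards, removing the dependency. */
double sum_array(const double *a, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}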

Schedule
Not all schedules of tasks are equal in performance.
Slide source: Introduction to Parallel Computing, Grama et al.
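To make the scheduling point concrete, here is a sketch assuming C with OpenMP and a hypothetical work(i) whose cost grows with i: an evenly split static schedule would leave some threads idle, while a dynamic schedule hands out chunks on demand and balances the load.

#include <omp.h>

/* Hypothetical task whose cost grows with i, so equal-sized static
   chunks would give the last threads much more work than the first. */
static double work(int i)
{
    double x = 0.0;
    for (int k = 0; k < i; k++)
        x += (double)k * 0.5;
    return x;
}

void run_scheduled(int n, double *out)
{
    /* schedule(dynamic, 16): threads grab 16 iterations at a time as
       they finish, balancing non-uniform task costs; schedule(static)
       would be the better choice if all tasks cost the same. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++)
        out[i] = work(i);
}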

Divide and Conquer
Recursive program structure:
- Each subproblem generated by a split becomes a task.
- Subproblems may not be uniform, so load balancing is required.
Slide source: Dr. Rabbah, IBM, MIT Course IAP 2007
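A minimal divide-and-conquer sketch (assuming C with OpenMP tasks; the cutoff constant is an assumption, not from the slides): each subproblem generated by the split becomes a task, and small subproblems fall back to sequential code so they do not pay tasking overhead.

#include <omp.h>

#define CUTOFF 1024   /* below this size, solve sequentially */

/* Recursive sum: the split produces two subproblems; one becomes a
   child task, the other is handled by the current task. */
static double dc_sum(const double *a, int n)
{
    if (n < CUTOFF) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }
    double left;
    #pragma omp task shared(left)
    left = dc_sum(a, n / 2);
    double right = dc_sum(a + n / 2, n - n / 2);
    #pragma omp taskwait       /* join the child task before combining */
    return left + right;
}

double parallel_sum(const double *a, int n)
{
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = dc_sum(a, n);
    return result;
}

When subproblems are not uniform (as in quicksort), the runtime's task scheduler provides the load balancing mentioned above.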

Organize by Data Decomposition
- Linear: Geometric Decomposition
- Recursive: Recursive Data

Organize by Data
Operations on a core data structure:
- Geometric Decomposition
- Recursive Data

Geometric Decomposition
- Arrays and other linear structures: divide into contiguous substructures.
- Example: matrix multiply. A data-centric algorithm on a linear data structure (an array) implies geometric decomposition.
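A geometric decomposition sketch for the matrix-multiply example (assuming C with OpenMP and row-major square matrices): the rows of the output are divided into contiguous blocks, and each block is an independent chunk of work.

#include <omp.h>

/* C = A * B for n x n row-major matrices. schedule(static) splits the
   rows of C into contiguous blocks, one block per thread; each block
   writes only its own rows of C, so the chunks are independent. */
void matmul_rows(const double *A, const double *B, double *C, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
    }
}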

Recursive Data
- Lists, trees, and graphs: structures where you would use divide-and-conquer.
- It may seem that you can only move sequentially through the data structure, but there are ways to expose concurrency.

Recursive Data Example
Find the root: given a forest of directed trees, find the root of each node.
Parallel approach (pointer jumping):
- For each node, replace its successor with its successor's successor.
- Repeat until no changes.
- O(log n) steps instead of O(n).
Slide source: Dr. Rabbah, IBM, MIT Course IAP 2007
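A pointer-jumping sketch for the find-the-root example (assuming C with OpenMP; succ[i] holds the successor of node i and a root points to itself): each round replaces every node's successor with its successor's successor, so the distance to the root roughly halves and the loop finishes in O(log n) rounds.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

/* succ[i] is the successor (parent) of node i; a root has succ[i] == i.
   On return, succ[i] is the root of the tree containing node i. */
void find_roots(int *succ, int n)
{
    int *next = malloc(n * sizeof *next);
    bool changed = true;
    while (changed) {
        changed = false;
        /* One pointer-jumping round: every node looks two steps ahead.
           Writing into a separate array keeps the round synchronous. */
        #pragma omp parallel for reduction(||:changed)
        for (int i = 0; i < n; i++) {
            next[i] = succ[succ[i]];
            if (next[i] != succ[i])
                changed = true;
        }
        memcpy(succ, next, n * sizeof *succ);
    }
    free(next);
}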

Organize by Flow of Data
- Regular: Pipeline
- Irregular: Event-Based Coordination

Organize by Flow of Data
Computation can be viewed as a flow of data going through a sequence of stages.
- Pipeline: one-way, predictable communication.
- Event-Based Coordination: unrestricted, unpredictable communication.

Pipeline Performance
- Concurrency is limited by pipeline depth.
- Balance computation and communication (architecture dependent).
- Stages should be equally computationally intensive: the slowest stage creates the bottleneck, so combine lightly loaded stages or decompose heavily loaded ones.
- Time to fill and drain the pipe should be small.
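A pipeline sketch over a stream of chunks (assuming C with OpenMP 4.0 task dependences; stage1 and stage2 are hypothetical stage functions, not from the slides): the dependence on each chunk's buffer keeps the stage order within a chunk, while later chunks can enter stage 1 while earlier chunks are still in stage 2, which is what fills the pipe.

#include <math.h>
#include <omp.h>

#define NCHUNKS 64
#define CHUNK   4096

/* Hypothetical stages that each transform one chunk in place. */
static void stage1(double *c) { for (int i = 0; i < CHUNK; i++) c[i] = sqrt(c[i] + 1.0); }
static void stage2(double *c) { for (int i = 0; i < CHUNK; i++) c[i] *= 2.0; }

void run_pipeline(double (*buf)[CHUNK])
{
    #pragma omp parallel
    #pragma omp single
    for (int c = 0; c < NCHUNKS; c++) {
        /* buf[c][0] serves as a dependence token for the whole chunk:
           stage2 of chunk c waits for stage1 of chunk c, but different
           chunks flow through the two stages concurrently. */
        #pragma omp task depend(inout: buf[c][0])
        stage1(buf[c]);
        #pragma omp task depend(in: buf[c][0])
        stage2(buf[c]);
    }
}

This sketch assumes stateless stages; if a stage carries state across chunks and must be serialized, it would also need a dependence on a per-stage token.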