CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879.


Lecture 2: A General Discussion on Parallelism

Lecture 2: Overview
Flynn’s Taxonomy of Architectures
Types of Parallelism
Parallel Programming Models
Commercial Multicore Architectures

Flynn’s Taxonomy of Architectures
SISD - Single Instruction/Single Data
SIMD - Single Instruction/Multiple Data
MISD - Multiple Instruction/Single Data
MIMD - Multiple Instruction/Multiple Data
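
The four categories above differ only in whether there is one or more than one instruction stream and one or more than one data stream. A minimal sketch of that naming rule follows; the function name `flynn_category` and its parameters are hypothetical and do not come from the lecture.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category name for the given stream counts.

    Hypothetical helper for illustration: "S" means a single stream,
    "M" means more than one.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


# One instruction stream applied to eight data elements at once.
print(flynn_category(1, 8))
# Four cores, each with its own instruction stream and its own data.
print(flynn_category(4, 4))
```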

Single Instruction/Single Data
The typical machine you’re used to (before multicores).
Slide Source: Wikipedia, Flynn’s Taxonomy

Single Instruction/Multiple Data
Processors that execute same instruction on multiple pieces of data.
Slide Source: Wikipedia, Flynn’s Taxonomy

Single Instruction/Multiple Data
Each core executes same instruction simultaneously
Vector-style of programming
Natural for graphics and scientific computing
Good choice for massively multicore

SISD versus SIMD
SIMD very often requires compiler intervention.
Slide Source: ars technica, Peakstream article

Multiple Instruction/Single Data
Mostly a theoretical machine. Rarely, if ever, implemented in practice.
Slide Source: Wikipedia, Flynn’s Taxonomy

Multiple Instruction/Multiple Data
Many mainstream multicores fall into this category.
Slide Source: Wikipedia, Flynn’s Taxonomy

Multiple Instruction/Multiple Data
Each core works independently, simultaneously executing different instructions on different data
Each core has its own upper levels of cache and may share a lower level of cache or keep its caches coherent with the others
Cores can have SIMD extensions (e.g., Cell B.E.)
Programmed with a variety of models (OpenMP, MPI, pthreads, etc.)
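
The MIMD idea of different instructions on different data can be sketched with one thread per task, where each thread runs a different function on its own input. This is only a structural model: the task functions and `run_mimd` are hypothetical names, and in standard CPython builds the global interpreter lock limits how much CPU-bound work truly runs at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    """Task A: add up a sequence of numbers."""
    return sum(values)


def longest(words):
    """Task B: find the longest word (first one wins on ties)."""
    return max(words, key=len)


def count_even(values):
    """Task C: count the even numbers."""
    return sum(1 for v in values if v % 2 == 0)


def run_mimd(tasks):
    """Run each (function, data) pair on its own worker thread.

    Results come back in the same order as the tasks were given.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        return [future.result() for future in futures]


results = run_mimd([
    (total, [1, 2, 3, 4]),
    (longest, ["core", "cache", "simd"]),
    (count_even, range(10)),
])
print(results)
```

Each worker here plays the role of one core with its own instruction stream; a real multicore program would more likely use OpenMP, MPI, or pthreads as the slide lists.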

Lecture 2: Overview
Flynn’s Taxonomy of Architectures
Types of Parallelism
Parallel Programming Models
Commercial Multicore Architectures

Types of Parallelism
Instructions:
Slide Source: S. Amarasinghe, MIT 6189 IAP 2007

Pipelining
Corresponds to SISD architecture.
Slide Source: S. Amarasinghe, MIT 6189 IAP 2007

Instruction-Level Parallelism
Dual instruction issue superscalar model. Again, corresponds to SISD architecture.
Slide Source: S. Amarasinghe, MIT 6189 IAP 2007
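
To see why pipelining and dual issue speed up a single instruction stream, it can help to count cycles in an idealized model. The sketch below assumes one cycle per stage, no stalls, and fully independent instructions; the five-stage depth and the function names are assumptions for illustration, not figures from the slides.

```python
import math


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, issue_width: int = 1) -> int:
    """Ideal pipeline: the first group fills the pipe, then one group
    of `issue_width` instructions completes per cycle."""
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)
    return n_stages + groups - 1


n, stages = 8, 5
print(unpipelined_cycles(n, stages))               # no overlap at all
print(pipelined_cycles(n, stages))                 # single-issue pipeline
print(pipelined_cycles(n, stages, issue_width=2))  # dual-issue superscalar
```

With `issue_width=1` this models the pipelining slide, and with `issue_width=2` the dual-issue superscalar slide. Real processors fall short of these counts because of dependences, branches, and memory stalls.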

Data-Level Parallelism
Data Stream or Array Elements
What architecture model from Flynn’s Taxonomy does this correspond to?
Slide Source: Arch. of a Real-time Ray-Tracer, Intel

Data-Level Parallelism
Data Stream or Array Elements
Corresponds to SIMD architecture.
Slide Source: Arch. of a Real-time Ray-Tracer, Intel

Data-Level Parallelism
One operation (e.g., +) produces multiple results. X, Y, and result are arrays.
Slide Source: Klimovitski & Macri, Intel
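
The array addition described above can be modeled in plain Python by processing the arrays a fixed number of lanes at a time. This is a conceptual sketch only: `simd_add` and the lane width of 4 are assumptions, and the inner loop stands in for what real SIMD hardware would do with a single vector instruction.

```python
def simd_add(x, y, lanes=4):
    """Add two equal-length arrays, `lanes` elements per step.

    Returns the result array and the number of vector steps taken.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    steps = 0
    for start in range(0, len(x), lanes):
        xs = x[start:start + lanes]
        ys = y[start:start + lanes]
        # One "vector operation": every lane applies + to its own pair.
        result.extend(a + b for a, b in zip(xs, ys))
        steps += 1
    return result, steps


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 10, 10, 10, 10, 10, 10, 10]
result, steps = simd_add(x, y)
print(result, steps)
```

A scalar (SISD) loop over the same eight elements would take eight add steps; with four lanes the model takes two.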

Thread-Level Parallelism
P1 P2 P3 P4 P5 P6
Program partitioned into four threads. Multicore with 6 cores. Four threads each executed on separate cores.
What architecture from Flynn’s Taxonomy does this correspond to?
Slide Source: SciDAC Review, Threadstorm pic.

Thread-Level Parallelism
P1 P2 P3 P4 P5 P6
Program partitioned into four threads. Multicore with 6 cores. Four threads each executed on separate cores.
Corresponds to MIMD architecture.
Slide Source: SciDAC Review, Threadstorm pic.
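
Partitioning a program into four threads can be sketched as splitting the input into four chunks and giving each chunk to its own worker. The helper names `partition` and `parallel_sum` are hypothetical, the summation workload is chosen only for illustration, and the operating system (not this code) decides which cores the threads land on.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split `data` into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, n_threads=4):
    """Sum `data` by summing one chunk per thread, then combining."""
    chunks = partition(data, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


print(parallel_sum(list(range(1, 101))))
```

On the six-core machine in the slide, four threads would leave two cores idle; choosing the thread count to match the available cores is a common starting point.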

Lecture 2: Overview
Flynn’s Taxonomy of Architectures
Types of Parallelism
Parallel Programming Models
Commercial Multicore Architectures

Multicore Programming Models
Message Passing Interface (MPI)
OpenMP
Threads: Pthreads, Cell threads
Parallel Libraries: Intel’s Threading Building Blocks (TBB), Microsoft’s Task Parallel Library, SWARM (GTech), Charm++ (UIUC), STAPL (Texas A&M)
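
The models listed above fall roughly into two communication styles: threads that update shared memory (as with OpenMP or Pthreads) and workers that exchange messages (as with MPI). The sketch below contrasts the two styles using only Python's standard `threading` and `queue` modules. It does not use or imitate any MPI, OpenMP, or Pthreads API, and both function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: workers add into one variable under a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # protect the shared update from races
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Message-passing style: workers send partial sums to a collector."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # "send" the partial result

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    # "Receive" exactly one message per worker, then combine.
    total = sum(inbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return total


chunks = [[1, 2], [3, 4], [5, 6]]
print(shared_memory_sum(chunks), message_passing_sum(chunks))
```

In the shared-memory version correctness depends on the lock; in the message-passing version no state is shared, so the only synchronization is the queue itself.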

GPU Programming Models
CUDA (Nvidia): C/C++ extensions
Brook+ (AMD/ATI): AMD-enhanced implementation of Brook
Brook (Stanford): Language extensions
RapidMind platform: Library and language extensions; works on multicores; commercialization of Sh (Waterloo)

Lecture 2: Overview
Flynn’s Taxonomy of Architectures
Types of Parallelism
Parallel Programming Models
Commercial Multicore Architectures

Generalized Multicore
Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

Cell B.E. Architecture
Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

NVIDIA GPU Architecture G80
Slide Source: Michael McCool, Rapid Mind, SuperComputing, 2007

Commercial Multicores
Slide Source: Dave Patterson, Manycore and Multicore Computing Workshop, 2007

Intel Teraflops Video

Next Time
More Parallel Programming Models
Cell (Playstation 3) Architecture and Programming Models
Read Chapters 1-3 and 5-7 of