
HPC Technology Track: Foundations of Computational Science
Lecture 2
Dr. Greg Wettstein, Ph.D.
Research Support Group Leader, Division of Information Technology
Adjunct Professor, Department of Computer Science
North Dakota State University

What is High Performance Computing?
Definition: The solution of problems involving degrees of computational complexity or data analysis high enough to require specialized hardware and software systems.

What is Parallel Computing?
Definition: A strategy of decreasing the time to solution of a computational problem by carrying out multiple elements of the computation at the same time.

Does HPC imply Parallel Computing?
Typically, but not always. HPC solutions may require specialized systems due to memory and/or I/O performance issues. Conversely, parallel computing does not necessarily imply high performance computing.

Flynn's Taxonomy: Classification Strategy for Concurrent Execution
- SISD: Single Instruction, Single Data
- MISD: Multiple Instruction, Single Data
- SIMD*: Single Instruction, Multiple Data
- MIMD*: Multiple Instruction, Multiple Data
* = relevant to HPC

SIMD: The Origin of HPC
The architectural model at the heart of 'vector processors'.
The performance enhancement in the machines at the origin of HPC:
- CDC STAR-100 and Cray-1
Its utility is predicated on the fact that mathematical operations on vectors or vector spaces are at the heart of linear algebra.

Vector Processing Diagram
[Diagram: a vector of length 8 'words'; parallel mathematical operations (+, -, *, /) are applied across all vector elements simultaneously.]
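
To make the diagram concrete, here is a minimal sketch (not from the original slides) of the element-wise arithmetic a vector processor performs: on a scalar machine the loop body runs once per element, while a vector unit applies the operation to all eight 'words' at once.

```cpp
#include <cstdio>

int main() {
    const int VLEN = 8;                    // vector length: 8 'words'
    float a[VLEN] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[VLEN] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[VLEN];

    // On a scalar machine this loop executes 8 times; a vector processor
    // (or a vectorizing compiler) performs all 8 additions in parallel.
    for (int i = 0; i < VLEN; ++i)
        c[i] = a[i] + b[i];

    for (int i = 0; i < VLEN; ++i)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```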

Current SIMD Examples
Embedded in modern x86 and x86_64 architectures:
- primarily focused on graphics/signal processing
- MMX, PNI, SSE2-4, AVX
The foundation for the current trend in 'GPGPU computing':
- NVIDIA Tesla architecture
Also a component of the Larrabee architecture.

SSE Implementation
[Diagram: a 128-bit XMM register holds the vector elements; the stride length is the width of the register; 100+ parallel operations are available (SSE4).]
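
A minimal sketch of the layout described above, using the SSE intrinsics from <immintrin.h> (the arrays and their sizes are invented for illustration): each 128-bit XMM register holds four 32-bit floats, so the stride length for float data is four elements.

```cpp
#include <immintrin.h>   // SSE intrinsics
#include <cstdio>

int main() {
    // One 128-bit XMM register holds four 32-bit floats,
    // so the stride length for float data is 4 elements.
    alignas(16) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(16) float c[8];

    for (int i = 0; i < 8; i += 4) {       // 8 elements / stride 4 = 2 iterations
        __m128 va = _mm_load_ps(&a[i]);    // load 4 floats into an XMM register
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&c[i], _mm_add_ps(va, vb));  // 4 additions, one instruction
    }

    for (float x : c) printf("%g ", x);
    printf("\n");
    return 0;
}
```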

MIMD: Multiple Instruction, Multiple Data
Characterized by multiple execution threads operating on separate data elements.
Threads may operate in shared-memory or disjoint (distributed-memory) configurations.
Implementation example:
- SMP (Symmetric Multi-Processing)
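
A minimal shared-memory MIMD sketch using C++11 std::thread (the scale() work function is invented for illustration): two independent instruction streams, each operating on its own half of the data in a shared address space, as on an SMP machine.

```cpp
#include <thread>
#include <vector>
#include <cstdio>

// Each thread runs its own instruction stream over its own data range.
static void scale(std::vector<double>& v, size_t lo, size_t hi, double f) {
    for (size_t i = lo; i < hi; ++i)
        v[i] *= f;
}

int main() {
    std::vector<double> data(1000, 1.0);
    size_t mid = data.size() / 2;

    // Two threads sharing one address space: MIMD on an SMP machine.
    std::thread t1(scale, std::ref(data), 0, mid, 2.0);
    std::thread t2(scale, std::ref(data), mid, data.size(), 3.0);
    t1.join();
    t2.join();

    printf("data[0]=%g data[999]=%g\n", data[0], data[999]);
    return 0;
}
```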

SPMD: The Basis for Modern HPC
Defined as multiple processes executing a common program, each at its own point in the program.
Differs from SIMD in that execution does not proceed in lockstep.
Common implementations:
- shared memory: OpenMP, Pthreads
- distributed memory: MPI
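
A minimal distributed-memory SPMD sketch using MPI (assuming an MPI installation; compile with a wrapper such as mpicxx and launch with mpirun): every rank executes the same program but, depending on its rank, works at a different point on different data.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's identity
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes

    // Same program, different data: each rank sums its own block of [0, N).
    // For brevity this assumes size divides N evenly.
    const long N = 1000000;
    long lo = rank * (N / size), hi = (rank + 1) * (N / size);
    double local = 0.0;
    for (long i = lo; i < hi; ++i)
        local += (double)i;

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %g\n", total);

    MPI_Finalize();
    return 0;
}
```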

Characteristics of MD (Multiple Data) Models
MIMD/SPMD requires active participation by the programmer to implement 'orthogonalization'.
SIMD requires active participation by the compiler, with consideration by the programmer, to support orthogonalization.
Orthogonalization, definition: the isolation of a problem into discrete elements capable of being independently resolved.

The Real World - A Continuum
Practical programs do not exhibit strict model partitioning.
A more pragmatic model is to consider the 'dimensions' of parallelism available to a program.
Currently a total of four dimensions of parallelism are exploitable.

Dimensions of Parallelism
First dimension:
- standard sequential programming with processor-supplied ILP (Instruction-Level Parallelism)
- referred to as 'free' or 'invisible' parallelism
Second dimension:
- SIMD or OpenMP loop parallelism (see the sketch after this list)
- characterized by isolation of the problem to a single system image
- primarily supported by the programming language or compiler
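
The sketch referenced in the second-dimension item above: OpenMP loop parallelism on a single system image (assuming an OpenMP-capable compiler; build with -fopenmp or equivalent). The programmer only marks the loop; the compiler and runtime distribute the iterations.

```cpp
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const int N = 1000000;
    std::vector<double> a(N), b(N, 1.0);

    // The compiler/runtime splits the iterations across the cores of a
    // single system image; the programmer only marks the loop parallel.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[0]=%g, max threads=%d\n", a[0], omp_get_max_threads());
    return 0;
}
```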

Dimensions of Parallelism - cont.
Third dimension, with two subtypes:
- use of MPI to partition the problem into orthogonal elements; this partitioning is frequently implemented across multiple system images
- MIMD threading on a single system image: separate threads are dispatched to handle separate tasks which can execute asynchronously; a common HPC example is to 'thread' computation and Input/Output (I/O), as sketched below
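
A minimal sketch of the compute/I/O threading example mentioned above (the file name and the compute() stand-in are invented for illustration): the main thread computes the next block while a second thread writes the previous block to disk, the two executing asynchronously.

```cpp
#include <thread>
#include <vector>
#include <cstdio>

// Stand-in for one step of the real computation.
static void compute(std::vector<double>& block, int step) {
    for (size_t i = 0; i < block.size(); ++i)
        block[i] = step + 0.001 * static_cast<double>(i);
}

int main() {
    FILE* out = fopen("results.bin", "wb");   // hypothetical output file
    if (!out) return 1;

    const size_t N = 1 << 20;
    std::vector<double> buf[2] = {std::vector<double>(N), std::vector<double>(N)};
    std::thread io;

    // Double buffering: while the I/O thread writes one buffer,
    // the main thread computes into the other.
    for (int step = 0; step < 4; ++step) {
        std::vector<double>* cur = &buf[step % 2];
        compute(*cur, step);              // main thread computes this block
        if (io.joinable()) io.join();     // previous block is now on disk
        io = std::thread([cur, out] {     // I/O thread writes this block...
            fwrite(cur->data(), sizeof(double), cur->size(), out);
        });                               // ...while the next one is computed
    }
    if (io.joinable()) io.join();
    fclose(out);
    return 0;
}
```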

Dimensions of Parallelism - cont.
Fourth dimension:
- partitioning of the problem into orthogonal elements which can be dispatched to a heterogeneous instruction architecture
- examples: GPGPU/CUDA, PowerXcell SPU, FPGA

Depth of Parallelism
A measure of the complexity of the parallelism implemented.
The simplest metric is the number of programmer-implemented dimensions of parallelism on a single system image.
Example:
- MPI implementation with SIMD loop vectorization on each node
- parallelism depth is two

Parallelism Analysis Examples
Process-based MIMD application:
- depth = 1
MPI simulation with OpenMP loop vectorization:
- depth = 2
MPI partitioning with CUDA PTree offload and SIMD loop vectorization:
- depth = 3
A sketch of the depth-2 case follows.
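
A minimal sketch of the depth-2 case (assuming MPI and OpenMP are available; build with an MPI compiler wrapper plus -fopenmp): MPI partitions the problem across ranks while OpenMP parallelizes the loop within each rank.

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // First depth level: MPI partitions the data across ranks
    // (assumes size divides N evenly, for brevity).
    const int N = 1 << 20;
    std::vector<double> local(N / size, 1.0);

    // Second depth level: OpenMP parallelizes the loop within each rank.
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < (int)local.size(); ++i)
        sum += 2.0 * local[i];

    double total = 0.0;
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %g (parallelism depth = 2)\n", total);

    MPI_Finalize();
    return 0;
}
```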

Escalation of Complexity
Architectural decisions must be based on a cost/benefit analysis of the performance returns.
[Diagram: complexity grows from least to most as the dimension/depth of parallelism increases from 1 to N.]

Exercise
Verify you have the changeset which adds experimental code for SSE/SIMD-based boolean PTree operators.
Study the class methods implementing the AND and OR operators.
Review and understand how the vector and stride lengths affect the number of times a loop needs to be executed; a standalone sketch of the idea follows.
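
A standalone sketch of the idea behind the exercise (illustrative only; this is not the course's PTree class): an SSE boolean AND over two bitmaps. With a 64-byte vector and a 16-byte (128-bit) stride, the loop executes 64 / 16 = 4 times.

```cpp
#include <immintrin.h>
#include <cstdio>
#include <cstring>

int main() {
    // Two 64-byte bitmaps: vector length 64 bytes, stride 16 bytes (128 bits),
    // so the loop executes 64 / 16 = 4 times.
    alignas(16) unsigned char a[64], b[64], c[64];
    memset(a, 0xFF, sizeof a);
    memset(b, 0x0F, sizeof b);

    const size_t stride = sizeof(__m128i);          // 16 bytes per XMM register
    for (size_t i = 0; i < sizeof a; i += stride) {
        __m128i va = _mm_load_si128((const __m128i*)&a[i]);
        __m128i vb = _mm_load_si128((const __m128i*)&b[i]);
        _mm_store_si128((__m128i*)&c[i], _mm_and_si128(va, vb));  // 128 ANDs at once
    }

    printf("c[0] = 0x%02X, iterations = %zu\n", c[0], sizeof(a) / stride);
    return 0;
}
```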

goto skills_lecture1;