
Yulia Newton CS 147, Fall 2009 SJSU

What is it? “Parallel processing is the ability of an entity to carry out multiple operations or tasks simultaneously. The term is used in the contexts of both human cognition and machine computation.” - Wikipedia

Humans The human body is a parallel processing architecture The human body is the most complex machine known to man We perform parallel processing all the time

Why? We want to do MORE We want to do it FASTER

Limitations of a Single Processor Physical: heat, electromagnetic interference Economic: cost

Parallel Computing Divide bigger tasks into smaller tasks (sound like the divide-and-conquer concept?) Perform the calculations for those tasks simultaneously (in parallel)

With a Single CPU Parallel processing is possible on single-CPU, single-core computers It requires very sophisticated software, called distributed processing software Most parallel computing, however, is done with multiple CPUs

Speedup from parallelization More processors = faster computing? Speed-up from parallelization is not linear Ideally, one processor does a single task in a unit of time, two processors do twice as many tasks in the same amount of time, and four processors do four times as many; in practice this ideal is rarely reached

Speedup Definition: Speedup is how much a parallel algorithm is faster than a corresponding sequential algorithm.
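In symbols (the standard definition, not spelled out on the original slide): if T_sequential is the running time of the sequential algorithm and T_parallel is the running time of the parallel algorithm, then

Speedup S = T_sequential / T_parallel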

Gene Myron Amdahl Norwegian-American computer architect and high-tech entrepreneur

Amdahl's Law Formulated in the 1960s Gives the potential speed-up of a parallel computing system Even a small portion of the work that cannot be parallelized will limit the speed-up achievable from the parallelizable portion

Amdahl's Law (cont’d) S = 1 / ((1 − P) + P / N) where: S is the overall speed-up of the system, P is the fraction that is parallelizable, and N is the number of processors

Amdahl's Law Alternative Form S = 1 / ((1 − f) + f / s) where f is the fraction of the program that is enhanced and s is the speedup of the enhanced portion

Amdahl's Law (cont’d)

Amdahl's Law Example A program has four parts: Code 1, Loop 1, Loop 2, Code 2 Part 1 = 11%, Part 2 = 18%, Part 3 = 23%, Part 4 = 48%

Amdahl's Law Example (cont’d) Let’s say: P1 is not sped up, so S1 = 1 or 100% P2 is sped up 5×, so S2 = 500% P3 is sped up 20×, so S3 = 2000% P4 is sped up 1.6×, so S4 = 160%

Amdahl's Law Example (cont’d) Using P1/S1 + P2/S2 + P3/S3 + P4/S4: 0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.11 + 0.036 + 0.0115 + 0.3 = 0.4575

Amdahl's Law Example (cont’d) 0.4575 is a little less than ½ of the original running time, which is 1 The overall speed boost is 1 / 0.4575 = 2.186, or a little more than double the original speed Notice how the 20× and 5× speedups don't have much effect on the overall speed boost and running time when 11% of the program is not sped up at all and 48% is sped up by only 1.6×
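A minimal Python sketch (not part of the original slides) that reproduces the arithmetic of this example; the part fractions and per-part speedups are the ones listed above:

# Amdahl's Law example check: fraction of runtime spent in each part
# of the program, and the speedup applied to that part.
parts = [0.11, 0.18, 0.23, 0.48]     # P1..P4 as fractions of total runtime
speedups = [1.0, 5.0, 20.0, 1.6]     # S1..S4

new_time = sum(p / s for p, s in zip(parts, speedups))   # sum of Pi / Si
overall_speedup = 1.0 / new_time

print(f"new running time = {new_time:.4f} of the original")   # 0.4575
print(f"overall speedup  = {overall_speedup:.3f}x")           # ~2.186x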

Amdahl's Law Limitations Based on a fixed workload or fixed problem size (strong or hard scaling) Treats machine size as irrelevant: the problem is not assumed to grow as more processors become available

John L. Gustafson American computer scientist and businessman

Gustafson's Law Formulated in 1988 Closely related to Amdahl's Law Addresses a shortcoming of Amdahl's law, which cannot scale to match the availability of computing power as the machine size increases

Gustafson's Law (cont’d) S(P) = P − α · (P − 1) where P is the number of processors, S is the speedup, and α is the non-parallelizable part of the process

Gustafson's Law (cont’d) Proposes a fixed-time concept, which leads to scaled speed-up for larger problem sizes Uses weak or soft scaling
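A short Python sketch (not in the original slides) contrasting the two laws for the same serial fraction; it uses the formulas reconstructed above, with alpha as the non-parallelizable fraction and n as the number of processors:

# Compare Amdahl's fixed-size speedup with Gustafson's scaled speedup.
def amdahl(alpha, n):
    # Fixed problem size: S = 1 / (alpha + (1 - alpha) / n)
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson(alpha, n):
    # Problem grows with the machine: S = n - alpha * (n - 1)
    return n - alpha * (n - 1)

alpha = 0.05   # 5% of the work is serial
for n in (4, 16, 64, 256):
    print(n, round(amdahl(alpha, n), 1), round(gustafson(alpha, n), 1))
# Amdahl's speedup flattens out near 1/alpha = 20, while
# Gustafson's scaled speedup keeps growing with the machine.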

Gustafson's Law Limitations Some problems do not have fundamentally larger datasets Example: a dataset with one data point per world citizen grows at only a few percent per year Nonlinear algorithms may make it hard to take advantage of the parallelism “exposed” by Gustafson's law

Parallel Architectures Superscalar VLIW Vector Processors Interconnection Networks Shared Memory Multiprocessors Distributed Computing Dataflow Computing Neural Networks Systolic Arrays

Superscalar Architecture Implements instruction-level parallelism (multiple instructions are executed simultaneously in each cycle) Net effect is the same as pipelining Additional hardware is required Contains a specialized instruction fetch unit that retrieves multiple instructions simultaneously from memory Relies on both hardware and the compiler

Superscalar Architecture Analogous to adding a lane to a highway – more physical resources are required but in the end more cars can pass through

Superscalar Architecture Examples: Pentium x86 processors Intel i960CA AMD series

VLIW Architecture Stands for Very Long Instruction Word Similar to the superscalar architecture Relies entirely on the compiler Packs independent instructions into one long instruction Results in a bigger compiled-code size

VLIW Architecture Examples: Intel’s Itanium, IA-64 Floating Point Systems' FPS164

Vector Processors Many supercomputers are vector processor systems Specialized, heavily pipelined processors Efficient operations on entire vectors and matrices

Vector Processors Heavily pipelined so that arithmetic operations can be overlapped

Vector Processors (example) Instructions on a traditional processor:

for i = 0 to VectorLength
    V3[i] = V1[i] + V2[i]

Instructions on a vector processor:

LDV  V1, R1
LDV  V2, R2
ADDV R3, R1, R2
STV  R3, V3
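As a software analogy (not from the original slides), the same element-by-element addition can be written either as an explicit loop or as a single whole-vector operation, which is roughly what the vector instructions above do in hardware; NumPy is used here purely for illustration:

import numpy as np

v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([10.0, 20.0, 30.0, 40.0])

# "Traditional" style: one element at a time
v3 = np.empty_like(v1)
for i in range(len(v1)):
    v3[i] = v1[i] + v2[i]

# "Vector" style: one operation over the whole vector
v3_vec = v1 + v2

assert np.array_equal(v3, v3_vec)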

Vector Processors Applications: Weather forecasting Medical diagnosis systems Image processing

Vector Processors Examples: Cray series of supercomputers Xtrillion 3.0

Interconnection Networks MIMD (Multiple instruction stream, multiple data stream) type architecture Each processor has its own memory Processors are allowed to access other processors’ memory via network

Interconnection Networks Can be: Dynamic – allow paths between two components to change from one communication to another Static – do not allow a change of path between communications

Interconnection Networks Can be: Non-blocking – allow new connections in the presence of other simultaneous connections Blocking – do not allow simultaneous connections

Interconnection Networks Sample interconnection networks:

Shared Memory Architecture MIMD (Multiple instruction stream, multiple data stream) type architecture Single memory pool accessed by all processors Can be categorized by how the memory access is performed: UMA (Uniform Memory Access) – all processors have equal access time to the entire memory NUMA (Non-Uniform Memory Access) – each processor sits close to its own piece of memory, so access time depends on which part of memory is referenced

Shared Memory Architecture Diagram of a Shared Memory system:
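As a programming-level illustration (not from the original slides), shared-memory parallelism means several workers read and write a single memory pool; this minimal sketch uses Python's multiprocessing shared Value purely as an example:

from multiprocessing import Process, Value, Lock

def add_many(counter, lock, n):
    # Every worker updates the same shared-memory location.
    for _ in range(n):
        with lock:
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)   # one integer living in shared memory
    lock = Lock()
    workers = [Process(target=add_many, args=(counter, lock, 10000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)      # 40000: all workers saw the same memory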

Distributed Computing Very loosely coupled multicomputer system, connected by buses or through a network Main advantage is cost Main disadvantage is slow communication due to the physical distance between computing units

Distributed Computing Example: The SETI (Search for Extra-Terrestrial Intelligence) group at UC Berkeley analyzes data from radio telescopes To help this project, PC users can install the SETI screen saver on their home computers Data is analyzed during the processor's idle time Half a million years of CPU time accumulated in about 18 months!

Data Flow Computing Von Neumann machines exhibit sequential control flow; data and instructions are segregated In dataflow computing, control of the program is directly tied to the data The order of the instructions does not matter Each instruction is considered a separate process Execution is driven by the arrival of data tokens rather than by a program counter

Data Flow Computing Data Flow Graph for y=(c+3)*f
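A tiny Python sketch (not from the original slides) of the dataflow idea for y = (c + 3) * f: each node fires as soon as all of its input tokens have arrived, so no program counter dictates the order:

# Minimal dataflow-style evaluation of y = (c + 3) * f.
class Node:
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs      # names of the tokens this node needs
        self.fired = False

    def try_fire(self, tokens):
        # Fire only once, and only when every input token is present.
        if not self.fired and all(name in tokens for name in self.inputs):
            self.fired = True
            return self.op(*(tokens[name] for name in self.inputs))
        return None

tokens = {"c": 4, "f": 10, "three": 3}            # initial data tokens
add = Node(lambda a, b: a + b, ["c", "three"])    # produces the "sum" token
mul = Node(lambda a, b: a * b, ["sum", "f"])      # produces the "y" token

while "y" not in tokens:
    if (r := add.try_fire(tokens)) is not None:
        tokens["sum"] = r
    if (r := mul.try_fire(tokens)) is not None:
        tokens["y"] = r

print(tokens["y"])   # (4 + 3) * 10 = 70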

Data Flow Computing Data flow system diagram:

Neural Networks Good for massively parallel applications, fault tolerance, and adapting to changing circumstances Based on the parallel architecture of the human brain Composed of a large number of simple processing elements Trainable

Neural Networks Model the human brain The perceptron is the simplest example
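A minimal perceptron sketch in Python (not from the original slides): a single trainable processing element learning the logical AND function, just to make "simple processing element" and "trainable" concrete:

# Single perceptron trained on the AND function.
inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for epoch in range(20):
    for (x1, x2), t in zip(inputs, targets):
        out = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
        err = t - out
        # Perceptron learning rule: nudge weights toward the target output.
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        b    += lr * err

print([1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0 for x1, x2 in inputs])  # [0, 0, 0, 1]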

Neural Networks Applications: Quality control Financial and economic forecasting Oil and gas exploration Speech and pattern recognition Health care cost reduction Bankruptcy prediction

Systolic Arrays Networks of processing elements that rhythmically compute data by circulating it through the system A variation of the SIMD architecture

Systolic Arrays Chain of processors in a systolic array system:
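A toy Python sketch (not from the original slides) of the systolic idea: data is pumped through a chain of simple processing elements, each doing one small step on every clock beat; the add-one stages are purely illustrative:

# Toy systolic chain: each processing element adds 1 to the value
# arriving from its left neighbour on every clock beat.
def systolic_chain(stream, num_elements=4):
    cells = [None] * num_elements                    # value held in each PE
    out = []
    beats = list(stream) + [None] * num_elements     # extra beats flush the pipeline
    for x in beats:                                  # one iteration = one beat
        produced = cells[-1]                         # value leaving the chain
        for i in range(num_elements - 1, 0, -1):     # shift right, processing as we go
            cells[i] = None if cells[i - 1] is None else cells[i - 1] + 1
        cells[0] = None if x is None else x + 1
        if produced is not None:
            out.append(produced)
    return out

print(systolic_chain([0, 10, 20]))   # [4, 14, 24]: each value gains +1 per PE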

To Recap We completed an overview of the following parallel processing architectures: Superscalar VLIW Vector Processors Interconnection Networks Shared Memory Multiprocessors Distributed Computing Dataflow Computing Neural Networks Systolic Arrays

References and Sources Null, Linda and Julia Lobur. “Computer Organization and Architecture”

Questions? Thank you!