PARALLEL COMPUTERS - 3
Computer Engg, IIT(BHU), 3/12/2013

Computing Technology
● Hardware generations
  ➢ Vacuum tubes, relay memory
  ➢ Discrete transistors, core memory
  ➢ Integrated circuits, pipelined CPUs
  ➢ VLSI microprocessors, solid-state memory
● Languages and software generations
  ➢ Machine / assembly languages
  ➢ Algol / Fortran with compilers, batch-processing OS
  ➢ C, multiprocessing, timesharing OS
  ➢ C++ / Java, parallelizing compilers, distributed OS

Computing Technology
● The driving force behind these technology advances: the ever-increasing demand for computing power
  ✔ Scientific computing (e.g. large-scale simulations)
  ✔ Commercial computing (e.g. databases)
  ✔ 3D graphics and realistic animation
  ✔ Multimedia Internet applications

Challenge Problems
● Simulation of the earth's climate
  ➢ Resolution: 10 kilometers; period: 1 year; simple ocean and biosphere models
  ➢ Total requirement: roughly 10^16 floating-point operations
  ➢ On a supercomputer capable of 10 Giga FLOPS, this takes about 10 days
● Real-time processing of 3D graphics
  ➢ Number of data elements: 10^9 (1024 in each dimension)
  ➢ Number of operations per element: 200; update rate: 30 times per second
  ➢ Total requirement: 6.4 x 10^12 operations per second
  ➢ With processors capable of 10 Giga IOPS, we need 640 of them
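A quick back-of-the-envelope check of the graphics figures (the 10^16 climate figure is inferred from the quoted 10-day runtime at 10 Giga FLOPS, since 10^16 / 10^10 = 10^6 seconds, about 11.6 days):

```python
# Back-of-the-envelope check of the 3D-graphics requirement above.
elements = 1024 ** 3              # ~1.07e9 data elements (1024 per dimension)
ops_per_element = 200
updates_per_second = 30

ops_per_second = elements * ops_per_element * updates_per_second
print(f"required: {ops_per_second:.2e} ops/s")   # ~6.44e12, the 6.4 x 10^12 above

per_cpu = 10e9                    # one 10 Giga IOPS processor
print(f"processors: {ops_per_second / per_cpu:.0f}")  # ~644; the slide rounds to 640
```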

Motivations for Parallelism
● Conventional computers are sequential:
  ➢ a single CPU
  ➢ a single stream of instructions
  ➢ executing one instruction at a time (not completely true in modern practice)
● A single-CPU processor has a performance limit
● Moore's Law can't go on forever

Motivations for Parallelism
● How to increase computing power?
  ➢ Better processor design: more transistors, larger caches, advanced architectures
  ➢ Better system design: faster / larger memory, faster buses, better OS
  ➢ Scale up the computer (parallelism): replicate hardware at the component or whole-computer level
● A parallel processor's power is virtually unlimited (assuming 500 Mega FLOPS per processor):
  ➢ 10 processors x 500 Mega FLOPS each = 5 Giga FLOPS
  ➢ 100 processors x 500 Mega FLOPS each = 50 Giga FLOPS
  ➢ 1,000 processors x 500 Mega FLOPS each = 500 Giga FLOPS

Motivations for Parallelism
● Additional motivations
  ➢ Solving bigger problems
  ➢ Lowering cost

Terminology
● Hardware
  ➢ Multicomputers: tightly networked systems of multiple uniform computers
  ➢ Multiprocessors: tightly networked systems of multiple uniform processors with additional memory units
  ➢ Supercomputers: general-purpose, high-performance machines; nowadays almost always parallel
  ➢ Clusters: loosely networked commodity computers

Terminology
● Programming (a sketch of the data-parallel style follows below)
  ➢ Pipelining: divide a computation into stages (segments) and assign a separate functional unit to each stage
  ➢ Data parallelism: multiple (uniform) functional units apply the same operation simultaneously to different elements of a data set
  ➢ Control parallelism: multiple (specialized) functional units apply distinct operations to data elements concurrently
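A minimal sketch of the data-parallel style, with a Python process pool standing in for uniform functional units (the pool is an illustration, not hardware lockstep; control parallelism would instead launch distinct operations concurrently):

```python
from multiprocessing import Pool

def square(x: int) -> int:
    # The same operation applied independently to each element:
    # this independence is what makes the loop data-parallel.
    return x * x

if __name__ == "__main__":
    data = list(range(16))
    with Pool(processes=4) as pool:
        # Each worker handles a slice of the data set.
        print(pool.map(square, data))
```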

Terminology
● Performance
  ➢ Throughput: number of results per unit time
  ➢ Speedup:

        time needed by the most efficient sequential algorithm
    S = ------------------------------------------------------
        time needed on a pipelined / parallel machine
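As a worked instance of the ratio (using the sieve timings that appear later in this deck):

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Speedup S = sequential time / parallel time."""
    return t_seq / t_par

# Timings from the sieve example later in the deck (n = 1,000):
print(speedup(1411, 706))  # control-parallel, p = 2  -> ~2.00
print(speedup(1411, 499))  # control-parallel, p = 3  -> ~2.83
```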

Terminology
● Scalability
  ➢ An algorithm is scalable if its available parallelism increases at least linearly with problem size
  ➢ An architecture is scalable if it gives the same performance per processor as the number of processors and the size of the problem are both increased
  ➢ Data-parallel algorithms tend to be more scalable than control-parallel algorithms

Example
● Problem
  ➢ Find all primes less than or equal to some positive integer n
● Method (the sieve of Eratosthenes)
  ➢ Write down all the integers from 1 to n
  ➢ Cross out from the list all multiples of 2, 3, 5, 7, ... up to sqrt(n)

Example
● Sequential implementation
  ➢ A boolean array representing the integers from 1 to n
  ➢ A buffer holding the current prime
  ➢ An index for the loop iterating through the array
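A minimal sketch of the sequential version as just described (boolean array, current-prime variable, loop index):

```python
import math

def sieve(n: int) -> list[int]:
    # Boolean array for the integers 0..n (True = "not yet crossed out").
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False

    prime = 2                          # buffer holding the current prime
    while prime <= math.isqrt(n):
        # Index loop: strike every multiple of the current prime.
        for i in range(prime * prime, n + 1, prime):
            is_prime[i] = False
        # Advance to the next uncrossed number.
        prime += 1
        while not is_prime[prime]:
            prime += 1
    return [i for i, p in enumerate(is_prime) if p]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```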

Example
● Control-parallel approach
  ➢ Different processors strike out multiples of different primes
  ➢ The boolean array and the current prime are shared; each processor has its own private copy of the loop index
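A sketch of this scheme with Python threads standing in for processors. The shared array and shared current-prime counter mirror the description above; the lock is an added assumption that keeps the "next prime" handout atomic:

```python
import math
import threading

n = 1_000
shared = bytearray([1] * (n + 1))   # shared boolean array (1 = candidate)
shared[0] = shared[1] = 0
current = 2                         # shared "current prime"
lock = threading.Lock()             # assumption: serializes the prime handout

def worker():
    global current
    while True:
        with lock:
            # Skip numbers already struck out; stop past sqrt(n).
            while current <= math.isqrt(n) and not shared[current]:
                current += 1
            if current > math.isqrt(n):
                return
            prime = current
            current += 1
        # A worker may occasionally grab a composite whose multiples
        # haven't been struck yet; that wastes work but is never wrong.
        for i in range(prime * prime, n + 1, prime):  # private loop index
            shared[i] = 0

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sum(shared))  # 168 primes up to 1,000
```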

Example
● Data-parallel approach
  ➢ Each processor is responsible for a unique range of the integers and does all the striking within that range
  ➢ Processor 1 (which holds the low end of the range and hence finds the primes up to sqrt(n)) is responsible for broadcasting its findings to the other processors
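A sketch of the partitioned scheme, with a process pool again standing in for the processors; the "broadcast" is simulated by handing processor 1's primes to every worker as an argument:

```python
import math
from multiprocessing import Pool

def strike_range(args):
    lo, hi, small_primes = args        # small_primes arrives via the "broadcast"
    flags = bytearray([1] * (hi - lo + 1))
    for p in small_primes:
        # First multiple of p inside [lo, hi], never below p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi + 1, p):
            flags[m - lo] = 0
    return [lo + i for i, f in enumerate(flags) if f]

if __name__ == "__main__":
    n = 1_000
    # "Processor 1" finds the primes up to sqrt(n) and broadcasts them.
    small = [p for p in range(2, math.isqrt(n) + 1)
             if all(p % q for q in range(2, p))]
    chunks = [(lo, min(lo + 249, n), small) for lo in range(2, n + 1, 250)]
    with Pool(4) as pool:
        primes = [p for chunk in pool.map(strike_range, chunks) for p in chunk]
    print(len(primes))  # 168
```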

Example
● Performance analysis
  ➢ Sequential algorithm
    Cost of sieving multiples of 2: ⌈(n-3)/2⌉
    Cost of sieving multiples of 3: ⌈(n-8)/3⌉
    Cost of sieving multiples of 5: ⌈(n-24)/5⌉
    ...
    For n = 1,000, T = 1,411
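Each term follows the pattern ⌈(n - (p² - 1)) / p⌉ for a sieving prime p up to sqrt(n), and summing over those primes reproduces the quoted total (the helper name here is hypothetical):

```python
import math

def sequential_cost(n: int) -> int:
    primes = [p for p in range(2, math.isqrt(n) + 1)
              if all(p % q for q in range(2, p))]
    # Cost of sieving multiples of p: ceil((n - (p*p - 1)) / p),
    # computed with integer arithmetic to avoid float rounding.
    return sum((n - (p * p - 1) + p - 1) // p for p in primes)

print(sequential_cost(1_000))  # 1411, matching the slide
```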

Example
● Performance analysis (continued)
  ➢ Control-parallel algorithm
    For p = 2, n = 1,000: T = 706
    For p = 3, n = 1,000: T = 499
    For p = 4, n = 1,000: T = 499
    (A fourth processor brings no gain: striking the multiples of 2 alone costs ⌈(n-3)/2⌉ = 499 steps, so the processor assigned to 2 bounds the total time from below.)

Example
● Performance analysis (continued)
  ➢ Data-parallel algorithm
    Cost of broadcasting: k(p-1), where k is the number of primes up to sqrt(n)
    Cost of striking: ⌈(n/p)/2⌉ + ⌈(n/p)/3⌉ + ... + ⌈(n/p)/p_k⌉, where p_k is the largest prime up to sqrt(n)
    For p = 2, n = 1,000: T ≈ 781
    For p = 3, n = 1,000: T ≈ 471
    For p = 4, n = 1,000: T ≈ 337