
1 Parallel Processing: Architecture Overview. Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Melbourne, Australia. www.gridbus.org

2 Serial vs. Parallel. (cartoon: a single queue, "Q Please", served by one counter, versus the same queue served by COUNTER 1 and COUNTER 2 in parallel.)

3 Overview of the Talk
- Introduction
- Why Parallel Processing?
- Parallel System H/W Architecture
- Parallel Operating Systems

4 Computing Elements. (diagram: a layered view of a multi-processor computing system; applications and programming paradigms at the top, then a threads interface, then the operating system built around a microkernel, and finally the hardware; processes and threads map onto the processors P.)

5 Two Eras of Computing. (timeline, 1940 to 2030: the sequential era and then the parallel era each pass through architectures, system software/compilers, applications and problem-solving environments (PSEs), moving from R&D to commercialization to commodity.)

6 History of Parallel Processing. The notion of parallel processing can be traced to a tablet dated around 100 BC. The tablet has three calculating positions capable of operating simultaneously. From this we can infer that they were aimed at "speed" or "reliability".

7 Motivating Factors. Just as we learned to fly not by constructing a machine that flaps its wings like a bird, but by applying aerodynamic principles demonstrated by nature, parallel processing has been modeled on biological computation. The aggregate speed with which (billions of) neurons carry out complex calculations demonstrates the feasibility of PP, even though the response time of an individual neuron is slow (milliseconds).

8 Why Parallel Processing?
- Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (e.g., earthquakes), etc.
- Silicon-based (sequential) architectures are reaching physical limits in processing speed, constrained by the speed of light and thermodynamics.

9 Human Architecture! (chart of growth against age, 5 to 45: human growth is vertical, in performance, early in life, and horizontal thereafter.)

10 Computational Power Improvement. (chart: computational power improvement against number of processors; the multiprocessor curve keeps rising with processor count, while the uniprocessor line stays flat.)

11 Why Parallel Processing?
- Hardware improvements such as pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them.
- Techniques such as vector processing work well only for certain kinds of problems.

12 Why Parallel Processing?
- Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing.
- Parallel processing technology is mature and is being exploited commercially.

13 Parallel Programs. Parallel programs consist of multiple active "processes" simultaneously solving a given problem. The communication and synchronization between these parallel processes form the core of the parallel programming effort. A minimal sketch follows.
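As an illustration of those two concerns (my sketch, not from the deck; POSIX threads stand in for parallel processes): two workers solve halves of one problem concurrently and synchronize with a mutex when combining their results.

    /* Parallel program sketch: two active workers solve one problem
       (summing an array) and synchronize when merging partial results.
       Compile with: cc -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    static long total = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        int start = *(int *)arg;            /* each worker takes one half */
        long partial = 0;
        for (int i = start; i < start + N / 2; i++)
            partial += data[i];
        pthread_mutex_lock(&lock);          /* synchronization point */
        total += partial;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        int s1 = 0, s2 = N / 2;
        pthread_create(&t1, NULL, worker, &s1);
        pthread_create(&t2, NULL, worker, &s2);
        pthread_join(t1, NULL);             /* wait for both workers */
        pthread_join(t2, NULL);
        printf("total = %ld\n", total);     /* prints total = 36 */
        return 0;
    }

Without the mutex the two updates to total could interleave and lose a partial sum; that is exactly the synchronization problem the slide alludes to.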

14 Types of Parallel Systems
Tightly Coupled Systems:
- Shared Memory Parallel: smallest extension to existing systems; program conversion is incremental.
- Distributed Memory Parallel: completely new systems; programs must be reconstructed.
Loosely Coupled Systems:
- Clusters: built using commodity systems; centralised management.
- Grids: aggregation of distributed systems; decentralised management.

15 Processing Elements Architecture

16 Processing Elements. Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously:
- SISD (Single Instruction, Single Data): conventional computers.
- SIMD (Single Instruction, Multiple Data): data-parallel, vector computing machines.
- MISD (Multiple Instruction, Single Data): systolic arrays.
- MIMD (Multiple Instruction, Multiple Data): general-purpose machines.

17 SISD: A Conventional Computer. Speed is limited by the rate at which the computer can transfer information internally. (diagram: instructions and a single data stream flow through one processor, data in, data out.) Examples: PCs, workstations.

18 The MISD Architecture. More of an intellectual exercise than a practical configuration: few have been built, and none are commercially available. (diagram: one data input stream passes through processors A, B and C, each driven by its own instruction stream, to one data output stream.)

19 SIMD Architecture. A single instruction stream drives multiple processors, each operating on its own data stream: C[i] = A[i] * B[i]. Examples: Cray vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support). (diagram: one instruction stream feeds processors A, B and C; each has its own data input and data output stream.) A C sketch follows.
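A sketch of that data-parallel operation in plain C (my example; on real SIMD hardware, or with an auto-vectorizing compiler, one vector instruction performs several of these multiplications at once):

    /* SIMD-style data parallelism: one operation, many data elements.
       C[i] = A[i] * B[i] across the whole arrays. */
    #include <stdio.h>

    #define N 8

    int main(void) {
        float A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        float B[N] = {8, 7, 6, 5, 4, 3, 2, 1};
        float C[N];

        for (int i = 0; i < N; i++)    /* a compiler can vectorize this loop */
            C[i] = A[i] * B[i];

        for (int i = 0; i < N; i++)
            printf("%.0f ", C[i]);     /* 8 14 18 20 20 18 14 8 */
        printf("\n");
        return 0;
    }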

20 MIMD Architecture. Unlike SISD and MISD machines, a MIMD computer works asynchronously. Two classes: shared memory (tightly coupled) MIMD and distributed memory (loosely coupled) MIMD. (diagram: processors A, B and C each have their own instruction stream and their own data input and output streams.)

21 Shared Memory MIMD Machine. Communication: the source PE writes data to global memory and the destination PE retrieves it.
- Easy to build; conventional operating systems for SISD machines can easily be ported.
- Limitations: reliability and expandability. A memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.
Example: Silicon Graphics supercomputers. (diagram: processors A, B and C connect through memory buses to a shared global memory system.) A sketch of the write-then-retrieve pattern follows.
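A minimal sketch of that write-then-retrieve communication, assuming C11 threads and atomics as one possible realization (the slide does not prescribe any API): the source writes a value into shared "global memory" and publishes a flag; the destination waits on the flag and then reads the value.

    /* Shared-memory MIMD communication sketch: source PE writes to global
       memory, destination PE retrieves it. C11 <threads.h> and <stdatomic.h>. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <threads.h>

    static int global_mem;               /* the shared "global memory" cell */
    static atomic_bool ready = false;    /* flags that the data is published */

    static int source_pe(void *arg) {
        global_mem = 123;                             /* write data to GM */
        atomic_store_explicit(&ready, true,
                              memory_order_release);  /* then publish it  */
        return 0;
    }

    static int dest_pe(void *arg) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                         /* wait for the flag */
        printf("destination read %d\n", global_mem);  /* retrieve from GM  */
        return 0;
    }

    int main(void) {
        thrd_t src, dst;
        thrd_create(&src, source_pe, NULL);
        thrd_create(&dst, dest_pe, NULL);
        thrd_join(src, NULL);
        thrd_join(dst, NULL);
        return 0;
    }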

22 Distributed Memory MIMD.
- Communication: IPC (inter-process communication) over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared-memory MIMD, it is easily and readily expandable, and highly reliable: a CPU failure does not affect the whole system.
(diagram: processors A, B and C each pair with their own memory system and communicate over IPC channels.) A message-passing sketch follows.
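As an illustration of IPC between distributed-memory processors, here is a minimal MPI sketch (MPI is not named on the slide; it is used here only as a widely known message-passing example): rank 0 holds a value in its local memory and ships it to rank 1 over the network.

    /* Distributed-memory IPC sketch using MPI point-to-point messages.
       Build and run (typical): mpicc msg.c -o msg && mpirun -np 2 ./msg */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* lives only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);   /* copied over the IPC channel */
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }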

23 Laws of Caution.
- Speed of computation is proportional to the square root of system cost, i.e. Speed ∝ √Cost.
- Speedup achieved by a parallel computer increases as the logarithm of the number of processors, i.e. Speedup = log2(no. of processors).
(charts: speed S against cost C, and speedup S against number of processors P.)
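As a worked illustration of the second law (an arithmetic consequence of its log2 rule, not a measured figure): a 16-processor machine would give a speedup of log2(16) = 4, while quadrupling it to 64 processors raises the speedup only to log2(64) = 6. Adding processors yields rapidly diminishing returns, hence the caution.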

24 Caution. Very fast developments in network computing and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, etc. At the user level, even well-defined distinctions such as shared memory versus distributed memory are disappearing due to new advances in technology. Good tools for parallel program development and debugging are yet to emerge.

25 Caution. There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, … all have a role to play.

26 Operating Systems for High Performance Computing

27 Types of Parallel Systems
- Shared Memory Parallel: smallest extension to existing systems; program conversion is incremental.
- Distributed Memory Parallel: completely new systems; programs must be reconstructed.
- Clusters: a form of distributed-memory system with slower communication.

28 Operating Systems for PP. MPP systems with thousands of processors require an OS radically different from current ones. Every CPU needs an OS:
- to manage its resources
- to hide its details
Traditional operating systems are heavy and complex, and not suitable for MPP.

29 Operating System Models. A framework that unifies the features, services and tasks performed. Three approaches to building an OS:
- Monolithic OS
- Layered OS
- Microkernel-based (client-server) OS: suitable for MPP systems.
Simplicity, flexibility and high performance are crucial for such an OS.

30 Monolithic Operating System. (diagram: application programs run in user mode; system services and the hardware interface form one block in kernel mode.)
- Better application performance.
- Difficult to extend.
Example: MS-DOS.

31 Layered OS.
- Easier to enhance.
- Each layer of code accesses the next lower-level interface.
- Low application performance.
(diagram: application programs in user mode; system services, memory and I/O device management, and process scheduling layered over the hardware in kernel mode.)
Example: UNIX.

32 Traditional OS. (diagram: the OS designer supplies a fixed OS that sits between the application programs in user mode and the hardware in kernel mode.)

33 New Trend in OS Design. (diagram: a small microkernel runs in kernel mode over the hardware; servers and application programs run above it in user mode.)

34 Microkernel/Client-Server OS (for MPP Systems).
- A tiny OS kernel provides the basic primitives (processes, memory, IPC).
- Traditional services become subsystems (servers).
- Application performance is competitive with a monolithic OS.
- OS = microkernel + user subsystems.
(diagram: a client application with a thread library exchanges send/reply messages, via the microkernel, with file, network and display servers running in user mode over the hardware.)
Examples: Mach, PARAS, Chorus, etc. A sketch of the pattern follows.
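To make the send/reply pattern concrete, here is a hypothetical, self-contained C sketch (all names here, msg_t, kernel_send, file_server, are invented for illustration; no real microkernel API is being quoted): a client builds a request message, the "kernel" merely routes it to the right server subsystem, and the server fills in a reply.

    /* Hypothetical client-server microkernel sketch: the kernel's only job
       is routing messages; the service itself lives in a server subsystem. */
    #include <stdio.h>
    #include <string.h>

    typedef struct { int op; char data[64]; } msg_t;  /* invented message type */

    enum { FILE_SERVER };   /* one server subsystem in this sketch */

    /* Server subsystem: handles a request and fills in the reply. */
    static void file_server(const msg_t *req, msg_t *reply) {
        reply->op = req->op;
        snprintf(reply->data, sizeof reply->data, "contents of %s", req->data);
    }

    /* "Microkernel": routes the message to the addressed server. */
    static void kernel_send(int server, const msg_t *req, msg_t *reply) {
        if (server == FILE_SERVER)
            file_server(req, reply);
    }

    int main(void) {
        msg_t req = { .op = 1 }, reply;
        strcpy(req.data, "readme.txt");
        kernel_send(FILE_SERVER, &req, &reply);  /* send ...          */
        printf("client got: %s\n", reply.data);  /* ... and the reply */
        return 0;
    }

In a real microkernel the client and each server sit in separate address spaces, and kernel_send would cross a protection boundary; this sketch only shows the shape of the interaction.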

35 A Few Popular Microkernel Systems: Mach (CMU), PARAS (C-DAC), Chorus, QNX, (Windows).

