Introduction to Parallel Computing: Architectures, Systems, and Programming Prof. Rajkumar Buyya Cloud Computing and Distributed Systems (CLOUDS) Lab.

Introduction to Parallel Computing: Architectures, Systems, and Programming Prof. Rajkumar Buyya Cloud Computing and Distributed Systems (CLOUDS) Lab. The University of Melbourne, Australia

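Serial vs. Parallel
[Figure: a queue (Q) of customers served by a single COUNTER (serial), compared with the same queue served by COUNTER 1 and COUNTER 2 (parallel).]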
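The serial vs. parallel counter picture can be made concrete with a small simulation. This is a sketch under assumptions that are not on the slide: customers are served in queue order, each goes to whichever counter becomes free first, and the service times are made-up numbers. The helper name `makespan` is hypothetical.

```python
import heapq

def makespan(service_times, counters):
    """Time at which the last customer finishes when customers are served in
    queue order, each going to the counter that becomes free first."""
    free_at = [0] * counters            # time at which each counter becomes free
    heapq.heapify(free_at)
    finish = 0
    for t in service_times:
        start = heapq.heappop(free_at)  # earliest-available counter
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

queue_times = [3, 2, 4, 1]              # hypothetical service times, in minutes
print(makespan(queue_times, counters=1))  # 10 (serial: one counter)
print(makespan(queue_times, counters=2))  # 6  (parallel: two counters)
```

With two counters the same queue clears in 6 minutes instead of 10; the gain is less than 2x because the work cannot be split perfectly evenly.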

Overview of the Talk
- Introduction
- Why Parallel Processing?
- Parallel System H/W Architecture
- Parallel Operating Systems
- Parallel Programming Models
- Summary

Computing Elements
[Figure: a multi-processor computing system in layers: hardware with processors (P), a microkernel operating system, a threads interface, and applications with programming paradigms on top; processes and threads are mapped onto the processors.]

Two Eras of Computing
[Figure: the Sequential Era and the Parallel Era, each advancing through Architectures, System Software/Compiler, Applications, and P.S.Es (problem-solving environments), and each moving from R & D to Commercialization to Commodity.]

History of Parallel Processing
The notion of parallel processing can be traced to a tablet dated around 100 BC. The tablet has three calculating positions capable of operating simultaneously. From this we can infer that they were aimed at "speed" or "reliability".

Motivating Factor: Human Brain
The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction between all cells and their parallel processing makes the brain's abilities possible. Individual neurons respond slowly (on the order of milliseconds), yet the aggregate speed with which (billions of) neurons carry out complex calculations demonstrates the feasibility of parallel processing.

Why Parallel Processing?
- Computation requirements are ever increasing: simulations, scientific prediction (earthquakes), distributed databases, weather forecasting (will it rain tomorrow?), search engines, e-commerce, Internet service applications, data center applications, finance (investment risk analysis), oil exploration, mining, etc.
- Silicon-based (sequential) architectures are reaching their limits in processing capability (clock speed), as they are constrained by the speed of light and thermodynamics.

Human Architecture! Growth Performance
[Figure: growth plotted against age, with vertical growth early in life and horizontal growth later.]

Computational Power Improvement
[Figure: computational power improvement (C.P.I.) plotted against the number of processors, for a multiprocessor and a uniprocessor.]

Why Parallel Processing?
- Hardware improvements such as pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them.
- Techniques such as vector processing work well for certain kinds of problems.

Why Parallel Processing?
- Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing.
- Parallel processing technology is now mature and is being exploited commercially.
- All computers (including desktops and laptops) are now based on parallel processing (e.g., multicore) architectures.

Processing Elements Architecture

Processing Elements
Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously. They are:
- SISD (Single Instruction and Single Data): conventional computers
- SIMD (Single Instruction and Multiple Data): data parallel, vector computing machines
- MISD (Multiple Instruction and Single Data): systolic arrays
- MIMD (Multiple Instruction and Multiple Data): general-purpose machines
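Flynn's scheme is a two-way split on two counts, so it can be written as a tiny lookup. The function name `flynn_class` is hypothetical; it only encodes the four categories listed above.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Classify a machine by Flynn's taxonomy from the number of instruction
    and data streams it can process simultaneously."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine has at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple instruction
    d = "S" if data_streams == 1 else "M"          # single or multiple data
    return f"{i}I{d}D"

print(flynn_class(1, 1))  # SISD: conventional uniprocessor
print(flynn_class(1, 4))  # SIMD: one instruction over four data elements
print(flynn_class(3, 1))  # MISD
print(flynn_class(3, 3))  # MIMD
```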

SISD: A Conventional Computer
- Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single processor receives one instruction stream and transforms a data input stream into a data output stream.]
Ex: PCs, workstations

The MISD Architecture
- More of an intellectual exercise than a practical configuration. A few were built, but none is commercially available.
[Figure: a single data input stream passes through Processors A, B, and C, each driven by its own instruction stream (A, B, C), producing a single data output stream.]

SIMD Architecture
Ex: CRAY machines (vector processing), Thinking Machines CM*, Intel MMX (multimedia support)
$C_i \leftarrow A_i \times B_i$
[Figure: a single instruction stream drives Processors A, B, and C; each processor has its own data input stream (A, B, C) and data output stream (A, B, C).]
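The slide's $C_i \leftarrow A_i \times B_i$ can be modeled as one instruction applied to every lane of the data. This is a conceptual sketch: plain Python evaluates the lanes one after another, so it shows SIMD semantics (one instruction, many data elements), not vector hardware. `simd_apply` is a hypothetical helper.

```python
import operator

def simd_apply(op, *vectors):
    """Model of one SIMD step: the single instruction `op` is issued once and
    every processing element (lane i) applies it to its own elements."""
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError("all input vectors must have the same length")
    return [op(*lane) for lane in zip(*vectors)]

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
C = simd_apply(operator.mul, A, B)   # C[i] = A[i] * B[i] for every lane i
print(C)  # [5, 12, 21, 32]
```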

MIMD Architecture
Unlike SISD and MISD machines, an MIMD computer works asynchronously.
- Shared memory (tightly coupled) MIMD, e.g., multicore
- Distributed memory (loosely coupled) MIMD
[Figure: Processors A, B, and C, each with its own instruction stream (A, B, C), data input stream (A, B, C), and data output stream (A, B, C).]
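A minimal sketch of the MIMD idea, assuming Python threads as stand-ins for processors: each worker runs a different instruction stream (function) on its own data stream, and the workers proceed independently. Under CPython's GIL the threads interleave rather than run truly in parallel, so this illustrates the structure, not a speedup. The functions and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def processor_a(data):   # instruction stream A: sum
    return sum(data)

def processor_b(data):   # instruction stream B: maximum
    return max(data)

def processor_c(data):   # instruction stream C: count of even values
    return sum(1 for x in data if x % 2 == 0)

# Each processor gets its own instruction stream and its own data stream.
jobs = [
    (processor_a, [1, 2, 3]),
    (processor_b, [7, 4, 9]),
    (processor_c, [2, 5, 8, 10]),
]

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    # Jobs are submitted independently and run asynchronously.
    futures = [pool.submit(fn, data) for fn, data in jobs]
    results = [f.result() for f in futures]

print(results)  # [6, 9, 3]
```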

Shared Memory MIMD Machine
- Communication: the source PE writes data to global memory (GM) and the destination PE retrieves it.
- Easy to build; conventional OSes of SISD can easily be ported.
- Limitation: reliability and expandability. A memory component or any processor failure affects the whole system.
- Increasing the number of processors leads to memory contention.
Ex.: Silicon Graphics supercomputers and now multicore systems
[Figure: Processors A, B, and C, each connected through a memory bus to a global memory system.]
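The two properties on this slide, communication through global memory and contention for it, can be sketched with Python threads. As assumptions for illustration, a dictionary stands in for global memory, an `Event` signals that the source PE has written, and a `Lock` serializes concurrent updates; all names and values are hypothetical.

```python
import threading

# A dict stands in for global memory shared by all processing elements (PEs).
global_memory = {}
data_ready = threading.Event()
received = []

def source_pe():
    global_memory["x"] = 42   # source PE writes data to global memory
    data_ready.set()          # and signals that the data is available

def destination_pe():
    data_ready.wait()                    # block until the source has written
    received.append(global_memory["x"])  # destination PE retrieves the data

pes = [threading.Thread(target=destination_pe), threading.Thread(target=source_pe)]
for t in pes:
    t.start()
for t in pes:
    t.join()
print(received)  # [42]

# Contention: every PE updates the same memory location, so access is serialized.
counter = {"hits": 0}
lock = threading.Lock()

def increment(n):
    for _ in range(n):
        with lock:                 # only one PE may update at a time
            counter["hits"] += 1

workers = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(counter["hits"])  # 4000
```

The more PEs contend for the lock, the more time each spends waiting, which is the memory-contention limit the slide mentions.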

Distributed Memory MIMD
- Communication: IPC (Inter-Process Communication) via a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared memory MIMD, it is:
  - easily/readily expandable
  - highly reliable (a CPU failure does not affect the whole system)
[Figure: Processors A, B, and C, each with its own memory system (A, B, C) on a private memory bus, connected to each other by IPC channels.]
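To contrast with shared memory, the sketch below assumes each node owns private data and talks to other nodes only by sending messages. Threads stand in for networked nodes and `queue.Queue` objects stand in for IPC channels; the ring topology is chosen for brevity (the slide mentions tree, mesh, and cube networks). Node values are hypothetical.

```python
import queue
import threading

N_NODES = 4
# One inbox per node models the IPC channel that delivers messages to it.
inboxes = [queue.Queue() for _ in range(N_NODES)]
results = [None] * N_NODES   # filled in only so the main thread can print them

def node(rank):
    local_value = rank * 10                   # private data in this node's memory
    right = (rank + 1) % N_NODES
    inboxes[right].put((rank, local_value))   # send to the right-hand neighbor
    sender, value = inboxes[rank].get()       # receive from the left-hand neighbor
    results[rank] = (sender, value)

nodes = [threading.Thread(target=node, args=(r,)) for r in range(N_NODES)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
print(results)  # [(3, 30), (0, 0), (1, 10), (2, 20)]
```

Because the queues are unbounded, every node can send before it receives without deadlocking.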

Types of Parallel Systems
- Tightly Coupled Systems:
  - Shared memory parallel: smallest extension to existing systems; program conversion is incremental
  - Distributed memory parallel: completely new systems; programs must be reconstructed
- Loosely Coupled Systems:
  - Clusters (now Clouds): built using commodity systems; centralized management
  - Grids: aggregation of distributed systems; decentralized management

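Laws of Caution.....
- Speed of computation is proportional to the square root of system cost, i.e., $S \propto \sqrt{C}$, where $S$ is speed and $C$ is cost.
- Speedup by a parallel computer increases as the logarithm of the number of processors, i.e., $\text{Speedup} = \log_2 P$, where $P$ is the number of processors.
[Figure: two plots, speed S against cost C (a square-root curve) and speedup S against processors P (a logarithmic curve).]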
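A worked example of the two laws as stated on the slide, assuming a proportionality constant `k` (hypothetical, defaulting to 1) for the cost law: quadrupling the cost only doubles the speed, and under the logarithmic law 1024 processors yield a speedup of only 10.

```python
import math

def speed_from_cost(cost, k=1.0):
    """First law of caution: speed grows as the square root of system cost.
    k is a hypothetical proportionality constant."""
    return k * math.sqrt(cost)

def speedup_log(processors):
    """Second law of caution: speedup grows as log2 of the processor count."""
    return math.log2(processors)

print(speed_from_cost(4) / speed_from_cost(1))  # 2.0 (4x the cost, 2x the speed)
for p in (2, 16, 1024):
    print(p, speedup_log(p))  # prints "2 1.0", then "16 4.0", then "1024 10.0"
```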

Caution....
- Very fast development in network computing and related areas has blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, Cloud computing, etc.
- At the user level, even well-defined distinctions such as shared memory and distributed memory are disappearing due to new advances in technologies.
- Good tools for parallel application development and debugging are yet to emerge.

Caution....
There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, … All have a role to play.

Operating Systems for High Performance Computing

Operating Systems for PP
- MPP systems with thousands of processors require an OS radically different from current ones.
- Every CPU needs an OS:
  - to manage its resources
  - to hide its details
- Traditional systems are heavy, complex, and not suitable for MPP.

Operating System Models
- A framework that unifies the features, services, and tasks performed.
- Three approaches to building an OS:
  - Monolithic OS
  - Layered OS
  - Microkernel-based (client-server) OS, suitable for MPP systems
- Simplicity, flexibility, and high performance are crucial for the OS.

Monolithic Operating System
- Better application performance
- Difficult to extend
Ex: MS-DOS
[Figure: application programs in user mode call system services in kernel mode, which run directly on the hardware.]

Layered OS
- Easier to enhance
- Each layer of code accesses the interface of the layer below it
- Lower application performance
Ex: UNIX
[Figure: application programs in user mode above kernel-mode layers for system services, memory & I/O device management, and process scheduling, on top of the hardware.]

Traditional OS
[Figure: application programs run in user mode on top of a single OS, built by the OS designer, which runs in kernel mode directly on the hardware.]

New Trend in OS Design
[Figure: application programs and OS servers both run in user mode; only the microkernel runs in kernel mode on the hardware.]

Microkernel/Client Server OS (for MPP Systems)
- Tiny OS kernel providing basic primitives (process, memory, IPC)
- Traditional services become subsystems
- Application performance competitive with a monolithic OS
- OS = Microkernel + User Subsystems
[Figure: in user mode, client applications (each with a thread library) exchange Send/Reply messages with a file server, a network server, and a display server; the microkernel runs in kernel mode on the hardware.]
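The Send/Reply interaction in the microkernel figure can be sketched as message passing between a client and a user-level file server. This is a hypothetical model, not the API of MACH, QNX, or any real microkernel: a `queue.Queue` plays the role of the kernel's IPC primitive, and the server alone owns the file table, so clients reach it only through messages. All names are illustrative.

```python
import queue
import threading

# The "microkernel" here only delivers messages: each server has a request port.
file_server_port = queue.Queue()

def file_server():
    files = {"readme.txt": "hello"}   # state owned by the file server alone
    while True:
        request, reply_port = file_server_port.get()   # receive a request
        if request is None:                            # shutdown message
            break
        op, name = request
        if op == "read":
            reply_port.put(files.get(name, "<not found>"))
        else:
            reply_port.put("<unsupported>")

def send(port, request):
    """Client side: Send a request, then block until the Reply arrives."""
    reply_port = queue.Queue(maxsize=1)
    port.put((request, reply_port))
    return reply_port.get()

server = threading.Thread(target=file_server)
server.start()
reply1 = send(file_server_port, ("read", "readme.txt"))
reply2 = send(file_server_port, ("read", "missing.txt"))
file_server_port.put((None, None))   # ask the server to shut down
server.join()
print(reply1, reply2)  # hello <not found>
```

Adding a network or display server would mean adding another port and server loop; the kernel-side primitive stays the same, which is the extensibility argument for microkernels.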

A Few Popular Microkernel Systems: MACH (CMU), PARAS (C-DAC), Chorus, QNX, (Windows)

Parallel Programs
- Consist of multiple active "processes" simultaneously solving a given problem.
- The communication and synchronization between them (the parallel processes) form the core of parallel programming efforts.
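A minimal parallel program in this sense: several workers each solve part of one problem (summing a list), communicate partial results through a shared total, and synchronize with a lock and a final join. The function `parallel_sum` and its chunking scheme are illustrative assumptions; CPython threads show the structure rather than a speedup.

```python
import threading

def parallel_sum(values, n_workers=4):
    """Split `values` into up to n_workers chunks; each worker sums its chunk
    and adds the partial result to a shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # independent computation on private data
        with lock:                # synchronization on the shared result
            total += partial      # communication via shared memory

    size = max(1, -(-len(values) // n_workers))   # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait for all workers before reading total
    return total

print(parallel_sum(list(range(1, 101))))  # 5050
```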

Parallel Programming Models
- Shared Memory Model
  - DSM
  - Threads/OpenMP (enabled for clusters)
  - Java threads (HKU JESSICA, IBM cJVM)
- Message Passing Model
  - PVM
  - MPI
- Hybrid Model
  - Mixing shared and distributed memory models
  - Using OpenMP and MPI together
- Object and Service Oriented Models
  - Wide area distributed computing technologies
  - OO: CORBA, DCOM, etc.
  - Services: Web Services-based service composition
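The hybrid model can be sketched in miniature, assuming threads stand in for separate nodes: each node splits its local data across a small thread pool (the shared-memory level, in the spirit of OpenMP) and then sends its partial result over a channel to the root (the message-passing level, in the spirit of MPI). This is not OpenMP or MPI code; `split` and `node` are hypothetical helpers.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def split(data, k):
    """Split data into at most k contiguous chunks."""
    size = max(1, -(-len(data) // k))   # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def node(rank, local_data, channel, threads_per_node=2):
    # Shared-memory level: threads inside one node work on its local data.
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        partial = sum(pool.map(sum, split(local_data, threads_per_node)))
    # Message-passing level: send this node's partial result to the root.
    channel.put((rank, partial))

data = list(range(1, 101))
n_nodes = 2
channel = queue.Queue()
nodes = [threading.Thread(target=node, args=(r, chunk, channel))
         for r, chunk in enumerate(split(data, n_nodes))]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
# Root: receive one message per node and combine the partial sums.
total = sum(channel.get()[1] for _ in nodes)
print(total)  # 5050
```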

Summary/Conclusions
- Parallel processing has become a reality: e.g., SMPs are used extensively as (Web) servers.
- The threads concept is utilized everywhere.
- Clusters have emerged as popular data centers and processing engines, e.g., the Google search engine.
- The emergence of commodity high-performance CPUs, networks, and OSs has made parallel computing applicable to enterprise and consumer applications, e.g., Oracle {9i, 10g} databases on clusters/grids, and Facebook and Twitter running on clouds.