Networks of Workstations Prabhaker Mateti Wright State University

Prabhaker Mateti, Networks of Workstations 2 Overview Parallel computers Concurrent computation Parallel Methods Message Passing Distributed Shared Memory Programming Tools Cluster configurations

Prabhaker Mateti, Networks of Workstations 3 Granularity of Parallelism Fine-Grained Parallelism Medium-Grained Parallelism Coarse-Grained Parallelism NOWs (Networks of Workstations)

Prabhaker Mateti, Networks of Workstations 4 Fine-Grained Machines Tens of thousands of processors. Processors are slow (bit serial) and small (K bits of RAM), with distributed memory. Interconnection networks use message passing. Single Instruction Multiple Data (SIMD)

Prabhaker Mateti, Networks of Workstations 5 Sample Meshes Massively Parallel Processor (MPP) TMC CM-2 (Connection Machine) MasPar MP-1/2

Prabhaker Mateti, Networks of Workstations 6 Medium-Grained Machines Typical Configurations –Thousands of processors –Processors have power between coarse- and fine-grained Either shared or distributed memory Traditionally: Research Machines Single Code Multiple Data (SCMD)

Prabhaker Mateti, Networks of Workstations 7 Medium-Grained Machines Ex: Cray T3E Processors –DEC Alpha EV5 (600 MFLOPS peak) –Max of 2048 Peak Performance: 1.2 TFLOPS 3-D Torus Memory: 64 MB - 2 GB per CPU

Prabhaker Mateti, Networks of Workstations 8 Coarse-Grained Machines Typical Configurations –Hundreds of Processors Processors –Powerful (fast CPUs) –Large (cache, vectors, multiple fast buses) Memory: Shared or Distributed-Shared Multiple Instruction Multiple Data (MIMD)

Prabhaker Mateti, Networks of Workstations 9 Coarse-Grained Machines SGI Origin 2000: –PEs (MIPS R10000): Max of 128 –Peak Performance: 49 Gflops –Memory: 256 GBytes –Crossbar switches for interconnect HP/Convex Exemplar: –PEs (HP PA-RISC 8000): Max of 64 –Peak Performance: 46 Gflops –Memory: Max of 64 GBytes –Distributed crossbar switches for interconnect

Prabhaker Mateti, Networks of Workstations 10 Networks of Workstations Exploit inexpensive Workstations/PCs Commodity network The NOW becomes a “distributed memory multiprocessor” Workstations send+receive messages C and Fortran programs with PVM, MPI, etc. libraries Programs developed on NOWs are portable to supercomputers for production runs

Prabhaker Mateti, Networks of Workstations 11 “Parallel” Computing Concurrent Computing Distributed Computing Networked Computing Parallel Computing

Prabhaker Mateti, Networks of Workstations 12 Definition of “Parallel” S1 begins at time b1, ends at e1 S2 begins at time b2, ends at e2 S1 || S2 –Begins at min(b1, b2) –Ends at max(e1, e2) –Equiv to S2 || S1

Prabhaker Mateti, Networks of Workstations 13 Data Dependency The sequence x := a + b; y := c + d; the parallel composition x := a + b || y := c + d; and the reversed sequence y := c + d; x := a + b; are all equivalent: x depends only on a and b, y depends only on c and d, and a, b, c, d are assumed to be independent.

Prabhaker Mateti, Networks of Workstations 14 Types of Parallelism Result Specialist Agenda

Prabhaker Mateti, Networks of Workstations 15 Perfect Parallelism Also called –Embarrassingly Parallel –Result parallel Computations that can be subdivided into sets of independent tasks that require little or no communication –Monte Carlo simulations –F(x, y, z)

Prabhaker Mateti, Networks of Workstations 16 MW (Manager/Worker) Model Manager –Initiates computation –Tracks progress –Handles workers' requests –Interfaces with user Workers –Spawned and terminated by manager –Make requests to manager –Send results to manager

Prabhaker Mateti, Networks of Workstations 17 Reduction Combine several sub-results into one Reduce r1 r2 … rn with op Becomes r1 op r2 op … op rn
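
A minimal sketch of such a reduction written with MPI (one of the message-passing libraries named later in these slides); the choice of MPI, the addition operator, and the variable names are illustrative assumptions, not part of the original slide. Compile with mpicc and run under mpirun.

/* Sketch: sum one value per process with MPI_Reduce; the result arrives at rank 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int local = rank + 1;   /* plays the role of r1, r2, ..., rn */
    int total = 0;

    /* total = r1 op r2 op ... op rn, with op = addition here */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %d\n", nprocs, total);

    MPI_Finalize();
    return 0;
}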

Prabhaker Mateti, Networks of Workstations 18 Data Parallelism Also called –Domain Decomposition –Specialist Same operations performed on many data elements simultaneously –Matrix operations –Compiling several files

Prabhaker Mateti, Networks of Workstations 19 Control Parallelism Different operations performed simultaneously on different processors E.g., Simulating a chemical plant; one processor simulates the preprocessing of chemicals, one simulates reactions in first batch, another simulates refining the products, etc.

Prabhaker Mateti, Networks of Workstations 20 Process communication Shared Memory Message Passing

Prabhaker Mateti, Networks of Workstations 21 Shared Memory Process A writes to a memory location Process B reads from that memory location Synchronization is crucial Excellent speed

Prabhaker Mateti, Networks of Workstations 22 Shared Memory Needs hardware support: –multi-ported memory Atomic operations: –Test-and-Set –Semaphores
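
As an illustration of the test-and-set idea on a single shared-memory node, here is a minimal spin-lock sketch using the GCC atomic builtins __sync_lock_test_and_set and __sync_lock_release; the builtin names are a compiler-specific assumption added here, not something the slide prescribes.

/* Sketch: a test-and-set spin lock.
   __sync_lock_test_and_set atomically stores 1 and returns the previous value. */
static volatile int lock = 0;

static void acquire(void)
{
    while (__sync_lock_test_and_set(&lock, 1))
        ;   /* spin until the previous value was 0, i.e., the lock was free */
}

static void release(void)
{
    __sync_lock_release(&lock);   /* atomically store 0 */
}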

Prabhaker Mateti, Networks of Workstations 23 Shared Memory Semantics: Assumptions Global time is available, in discrete increments. Shared variable s = vi at time ti, i = 0, … Process A: s := v1 at time t1. Assume no other assignment occurs after t1. Process B reads s at time t and gets value v.

Prabhaker Mateti, Networks of Workstations 24 Shared Memory: Semantics Value of the shared variable –v = v1, if t > t1 –v = v0, if t < t1 –v = ??, if t = t1 ± a discrete quantum Next update of the shared variable –Occurs at t2 –t2 = t1 + ?

Prabhaker Mateti, Networks of Workstations 25 Condition Variables and Semaphores Semaphores –V(s) ::= ⟨s := s + 1⟩ –P(s) ::= ⟨await s > 0 do s := s – 1⟩ Condition variables –C.wait() –C.signal()
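
A minimal sketch of P and V using POSIX semaphores; mapping P to sem_wait and V to sem_post is the standard correspondence, but the POSIX API itself is an assumption added here for illustration, not part of the slide.

/* Sketch: P blocks until s > 0 and then decrements; V increments. */
#include <semaphore.h>

static sem_t s;

void init_sem(void) { sem_init(&s, 0 /* shared between threads */, 1 /* initial value */); }
void P(void)        { sem_wait(&s); }
void V(void)        { sem_post(&s); }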

Prabhaker Mateti, Networks of Workstations 26 Distributed Shared Memory A common address space that all the computers in the cluster share. Difficult to describe semantics.

Prabhaker Mateti, Networks of Workstations 27 Distributed Shared Memory: Issues Distributed –Spatially –LAN –WAN No global time available

Prabhaker Mateti, Networks of Workstations 28 Messages Messages are sequences of bytes moving between processes The sender and receiver must agree on the type structure of values in the message “Marshalling” of data

Prabhaker Mateti, Networks of Workstations 29 Message Passing Process A sends a data buffer as a message to process B. Process B waits for a message from A, and when it arrives copies it into its own local memory. No memory shared between A and B.

Prabhaker Mateti, Networks of Workstations 30 Message Passing Obviously, –Messages cannot be received before they are sent. –A receiver waits until there is a message. Asynchronous –Sender never blocks, even if infinitely many messages are waiting to be received –Semi-asynchronous is a practical version of the above, with a large but finite amount of buffering

Prabhaker Mateti, Networks of Workstations 31 Message Passing: Point to Point Q: send(m, P) –Send message m to process P P: recv(x, Q) –Receive a message from process Q, and place it in variable x The message data –Type of x must match that of m –As if x := m
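
A hedged sketch of this point-to-point exchange written with MPI; the ranks, tag, and message value are illustrative assumptions. Rank 0 plays the role of Q (sender) and rank 1 the role of P (receiver).

/* Sketch: send(m, P) and recv(x, Q) expressed as MPI_Send and MPI_Recv. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, m = 42, x = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&m, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* send(m, P) */
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                                /* recv(x, Q) */
        printf("received %d\n", x);                                 /* as if x := m */
    }

    MPI_Finalize();
    return 0;
}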

Prabhaker Mateti, Networks of Workstations 32 Broadcast One sender, multiple receivers Not all receivers may receive at the same time

Prabhaker Mateti, Networks of Workstations 33 Types of Sends Synchronous Asynchronous

Prabhaker Mateti, Networks of Workstations 34 Synchronous Message Passing Sender blocks until receiver is ready to receive. Cannot send messages to self. No buffering.

Prabhaker Mateti, Networks of Workstations 35 Message Passing: Speed Speed not so good –Sender copies message into system buffers. –Message travels the network. –Receiver copies message from system buffers into local memory. –Special virtual memory techniques help.

Prabhaker Mateti, Networks of Workstations 36 Message Passing: Programming Less error-prone than shared-memory programming

Prabhaker Mateti, Networks of Workstations 37 Message Passing: Synchronization Synchronous MP: –Sender waits until receiver is ready. –No intermediary buffering

Prabhaker Mateti, Networks of Workstations 38 Barrier Synchronization Processes wait until “all” arrive
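
In MPI this is a single call; the sketch below assumes the MPI library used in the earlier examples.

MPI_Barrier(MPI_COMM_WORLD);   /* no process continues past this call until all processes have reached it */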

Prabhaker Mateti, Networks of Workstations 39 Parallel Software Development Algorithmic conversion by compilers

Prabhaker Mateti, Networks of Workstations 40 Development of Distributed+Parallel Programs New code + algorithms Old programs rewritten –in new languages that have distributed and parallel primitives –With new libraries Parallelize legacy code

Prabhaker Mateti, Networks of Workstations 41 Conversion of Legacy Software Mechanical conversion by software tools Reverse engineer its design, and re-code

Prabhaker Mateti, Networks of Workstations 42 Automatically parallelizing compilers Compilers analyze programs and parallelize (usually loops). Easy to use, but with limited success

Prabhaker Mateti, Networks of Workstations 43 OpenMP on Networks of Workstations OpenMP is an API for shared-memory architectures. The user gives hints to the compiler as directives
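
A minimal sketch of such a directive, assuming a C compiler with OpenMP support (for example gcc -fopenmp); the function and variable names are illustrative.

/* Sketch: the pragma is the hint that the loop iterations are independent
   and may be run in parallel. */
#include <omp.h>

void vector_add(int n, double *a, const double *b, const double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}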

Prabhaker Mateti, Networks of Workstations 44 Message Passing Libraries Programmer is responsible for data distribution, synchronizations, and sending and receiving information Parallel Virtual Machine (PVM) Message Passing Interface (MPI) BSP

Prabhaker Mateti, Networks of Workstations 45 BSP: Bulk Synchronous Parallel model Divides computation into supersteps In each superstep a processor can work on local data and send messages. At the end of the superstep, a barrier synchronization takes place and all processors receive the messages which were sent in the previous superstep

Prabhaker Mateti, Networks of Workstations 46 BSP Library Small number of subroutines to implement –process creation, –remote data access, and –bulk synchronization. Linked to C, Fortran, … programs
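
A hedged sketch of one superstep, assuming the Oxford BSPlib interface (bsp_begin, bsp_push_reg, bsp_put, bsp_sync); other BSP libraries may use different names, and this is an illustration rather than usage the slide prescribes.

/* Sketch: each process writes its id into its right neighbour's variable;
   the bsp_sync barrier ends the superstep and makes the puts visible. */
#include <bsp.h>
#include <stdio.h>

int main(void)
{
    bsp_begin(bsp_nprocs());
    int p = bsp_pid();
    int n = bsp_nprocs();

    int left = -1;
    bsp_push_reg(&left, sizeof(int));   /* make `left` remotely writable */
    bsp_sync();

    bsp_put((p + 1) % n, &p, &left, 0, sizeof(int));
    bsp_sync();                         /* barrier: all puts from this superstep arrive */

    printf("process %d: left neighbour is %d\n", p, left);
    bsp_pop_reg(&left);
    bsp_end();
    return 0;
}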

Prabhaker Mateti, Networks of Workstations 47 Parallel Languages Shared-memory languages Parallel object-oriented languages Parallel functional languages Concurrent logic languages

Prabhaker Mateti, Networks of Workstations 48 Tuple Space: Linda Atomic Primitives –In (t) –Read (t) –Out (t) –Eval (t) Embedded in a host language; e.g., JavaSpaces realizes the same model in Java

Prabhaker Mateti, Networks of Workstations 49 Data Parallel Languages Data is distributed over the processors as arrays Entire arrays are manipulated: –A(1:100) = B(1:100) + C(1:100) Compiler generates parallel code –Fortran 90 –High Performance Fortran (HPF)

Prabhaker Mateti, Networks of Workstations 50 Parallel Functional Languages Erlang SISAL PCN (Argonne)

Prabhaker Mateti, Networks of Workstations 51 Clusters

Prabhaker Mateti, Networks of Workstations 52 Buildings-Full of Workstations 1. Distributed OSs have not taken a foothold. 2. Powerful personal computers are ubiquitous. 3. Mostly idle: more than 90% of the up-time? 4. 100 Mb/s LANs are common. 5. Windows and Linux are the top two OSs in terms of installed base.

Prabhaker Mateti, Networks of Workstations 53 Cluster Configurations NOW -- Networks of Workstations COW -- Clusters of Dedicated Nodes Clusters of Come-and-Go Nodes Beowulf clusters

Prabhaker Mateti, Networks of Workstations 54 Beowulf Collection of compute nodes Full trust in each other –Login from one node into another without authentication –Shared file system subtree Dedicated

Prabhaker Mateti, Networks of Workstations 55 Closed Cluster Configuration [Diagram: compute nodes and a front-end joined by a high-speed network, reached from the external network only through a gateway node; one variant adds a service network, the other a file server node]

Prabhaker Mateti, Networks of Workstations 56 Open Cluster Configuration [Diagram: compute nodes, a front-end, and a file server node on a high-speed network, with the nodes also exposed to the external network]

Prabhaker Mateti, Networks of Workstations 57 Interconnection Network Most popular: Fast Ethernet Network topologies –Mesh –Torus Switch vs. hub

Prabhaker Mateti, Networks of Workstations 58 Software Components Operating System –Linux, FreeBSD, … Parallel programming –PVM, MPI Utilities, … Open source

Prabhaker Mateti, Networks of Workstations 59 Software Structure of PC Cluster [Diagram: each node has a hardware layer and an OS layer; the nodes are connected by a high-speed network, a parallel virtual machine layer spans all nodes, and the parallel program runs on top of it]

Prabhaker Mateti, Networks of Workstations 60 Single System View Single system view –Common filesystem structure view from any node –Common accounts on all nodes –Single software installation point Benefits –Easy to install and maintain system –Easy to use for users

Prabhaker Mateti, Networks of Workstations 61 Installation Steps Install Operating system Setup a Single System View –Shared filesystem –Common accounts –Single software installation point Install parallel programming packages such as MPI, PVM, BSP Install utilities, libraries, and applications
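
One common way to realize the shared-filesystem step is NFS; the sketch below is a hypothetical configuration with a file server named server and compute nodes named node01, node02, …, none of which appear in the original slides.

# On the file server: /etc/exports (hypothetical host names)
/home             node*(rw,sync)
/usr/local/apps   node*(ro,sync)

# On each compute node: /etc/fstab entry mounting the shared subtree
server:/home   /home   nfs   defaults   0 0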

Prabhaker Mateti, Networks of Workstations 62 Linux Installation Linux has many distributions: Red Hat, Caldera, SuSE, Debian, … Caldera is easy to install Red Hat, Caldera, SuSE, and Mandrake upgrade with RPM package management; Debian uses its own package format Mandrake and SuSE come with a very complete set of software

Prabhaker Mateti, Networks of Workstations 63 Clusters with Part Time Nodes Cycle Stealing: Running jobs that do not belong to the workstation's owner on that workstation. Definition of Idleness: E.g., no keyboard and no mouse activity Tools/Libraries –Condor –PVM –MPI
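
With Condor, a cycle-stealing job is described in a submit file handed to condor_submit; the sketch below uses hypothetical file names and shows only the most basic keywords.

# simulate.sub -- minimal Condor submit description (hypothetical job)
universe   = vanilla
executable = simulate
arguments  = input.dat
output     = simulate.out
error      = simulate.err
log        = simulate.log
queue

# Submit with: condor_submit simulate.sub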

Prabhaker Mateti, Networks of Workstations 64 Migration of Jobs Policies –Immediate-Eviction –Pause-and-Migrate Technical Issues –Checkpointing: Preserving the state of the process so it can be resumed. –Migrating from one architecture to another

Prabhaker Mateti, Networks of Workstations 65 Summary Parallel –computers –computation Parallel Methods Communication primitives –Message Passing –Distributed Shared Memory Programming Tools Cluster configurations