CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen Chapter 1: Parallel Computers.


2 Concurrency vs. True Parallelism
Concurrency is used in systems where more than one user is using a resource (e.g., the CPU or database information) at the same time.
In true parallelism, multiple processors work simultaneously on one application problem.

3 Flynn's Taxonomy – Classification by Control Mechanism
A classification of parallel systems from a "flow of control" perspective:
SISD – single instruction, single data
SIMD – single instruction, multiple data
MISD – multiple instruction, single data
MIMD – multiple instruction, multiple data

4 SISD
Single instruction, single data.
Sequential programming with one processor, just like you've always done.

5 SIMD
Single instruction, multiple data.
One control unit issues the same instruction to multiple CPUs, which operate simultaneously on their own portions of the data.
Lock-step, synchronized execution.
Vector and matrix computations lend themselves to a SIMD implementation.
Examples of SIMD computers: Illiac IV, MPP, DAP, CM-2, and MasPar MP-2.
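As a concrete illustration (mine, not the slide's), the kind of computation that suits a SIMD machine is an element-wise array operation. In the loop below, a SIMD computer would give each processing element one slice of the arrays while the single control unit broadcasts the add instruction in lock-step:

```c
/* Element-wise vector add: the classic SIMD-friendly computation.
 * On a SIMD machine, one control unit broadcasts the "add" instruction and
 * every processing element applies it to its own portion of a, b, and c. */
#include <stdio.h>

#define N 8

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    for (int i = 0; i < N; i++)   /* conceptually, all i at once in lock-step */
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++) printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```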

6 MIMD
Multiple instruction, multiple data.
Each processor is "doing its own thing."
Processors synchronize either by passing messages or by writing values to shared memory addresses.
Subcategories:
SPMD – single program, multiple data (e.g., MPI on a Linux cluster)
MPMD – multiple program, multiple data (e.g., PVM)
Examples of MIMD computers: BBN Butterfly, iPSC/1 and iPSC/2, IBM SP1 and SP2.
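A minimal SPMD sketch using MPI (the slide mentions MPI on a Linux cluster, but this particular program is illustrative, not from the course): every processor runs the same program, and each one "does its own thing" based on its rank.

```c
/* spmd_hello.c -- compile with mpicc, run with mpirun -np 4 ./spmd_hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* every process runs this same program */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* ...but each learns its own identity  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        printf("I am the master of %d processes\n", size);
    else
        printf("I am worker %d, doing my own thing\n", rank);

    MPI_Finalize();
    return 0;
}
```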

7 MISD
Multiple instruction, single data.
Doesn't really exist in practice, unless you consider pipelining to be an MISD configuration.

8 Comparison of SIMD and MIMD
It takes a specially designed computer to do SIMD computing, since one control unit controls multiple processors.
SIMD requires only one copy of the program; MIMD systems have a copy of the program and operating system at each processor.
SIMD computers quickly become obsolete, while MIMD systems can be pieced together from the most up-to-date components available.

9 Classification by Communication Mechanism
Shared-address-space systems ("multiprocessors"): each processor has its own control unit; virtual memory makes all memory addresses look like they come from one consistent space, but they don't necessarily; processors communicate with reads and writes.
Message-passing systems ("multicomputers"): separate processors and separate memory address spaces; processors communicate with message passing.

10 Shared Memory Address Space
Interprocess communication is done through the memory interface, with reads and writes.
A virtual memory address maps to a real address.
Different processors may have memory locally attached to them.
An access could be to a processor's own memory or to the memory attached to a different processor, so different memory accesses can take different amounts of time. Collisions are possible.
UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory).
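As an illustration of communicating through reads and writes to shared addresses (my example, not the slide's), here is a small POSIX-threads sketch in which two threads update the same counter; the mutex stands in for the system's handling of colliding accesses:

```c
/* shared_counter.c -- compile with: gcc -pthread shared_counter.c */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                    /* lives in the shared address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);          /* serialize colliding writes */
        counter++;                          /* communication = a plain write */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);     /* main reads what the workers wrote */
    return 0;
}
```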

11 Message Passing System
Interprocess communication is done at the program level, using sends and receives.
Reads and writes refer only to a processor's local memory.
Data can be packed into long messages before being sent, to compensate for latency.
Global scheduling of messages can help avoid message collisions.
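For contrast, a minimal message-passing sketch (again MPI, and again my example rather than the course's): process 0 packs data into one long message and sends it, process 1 receives it, and all ordinary reads and writes touch only each process's local buffer.

```c
/* sendrecv.c -- run with at least 2 processes: mpirun -np 2 ./sendrecv */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, data[100];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < 100; i++) data[i] = i;        /* pack one long message */
        MPI_Send(data, 100, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d ... %d\n", data[0], data[99]);
    }

    MPI_Finalize();
    return 0;
}
```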

12 Basic Architecture Terms – Clock Speed and Bandwidth
Clock speed of a processor is the maximum number of times per second that the device can "say something new."
Bandwidth of a transmission medium (e.g., a telephone line or cable line) is the maximum rate at which the medium can change a signal. It is measured in cycles per second, or hertz.
Bandwidth is determined by the physical properties of the transmission medium, including the material of which it is composed.

13 Basic Architecture Terms – Clock Speed and Bandwidth
Data rate is a measure of the amount of data that can be sent across a transmission medium per unit time.
Data rate is determined by two things: (1) the bandwidth, and (2) the potential number of different things that can be conveyed each time the signal changes (which, in the case of a bus, is based on the number of parallel data lines).

14 Basic Architecture Terms – Bus
A bus is a communication medium to which all processors are connected.
Only one communication at a time is allowed on the bus.
There is only one step from any source to any destination.
Bus data rate (sometimes loosely called "bandwidth") is the clock speed times the number of bits transmitted at each clock pulse.
A bus is low-cost, but you can't attach very many processors to it.
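As a quick worked example of the "clock speed times bits per pulse" formula, using the PCI and PCI-X figures that appear on slide 18 (the code itself is just an illustration of mine):

```c
/* Bus data rate = clock speed (pulses/sec) * bits transmitted per pulse. */
#include <stdio.h>

int main(void) {
    double pci  = 33e6  * 32;   /* classic PCI: 33 MHz clock, 32 bits wide */
    double pcix = 133e6 * 64;   /* PCI-X: 133 MHz clock, 64 bits wide      */
    printf("PCI:   %.0f Mbit/s = %.0f MB/s\n", pci / 1e6,  pci / 8 / 1e6);
    printf("PCI-X: %.0f Mbit/s ~ %.1f GB/s\n", pcix / 1e6, pcix / 8 / 1e9);
    return 0;
}
```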

15 Bus on a Motherboard
The bus transports data among the CPU, memory, and other components.
It consists of electrical circuits called traces, plus adapters or expansion cards.
There's a main motherboard bus, and then separate buses for the CPU, memory, SCSI connections, and USB.

16 Types of Buses
Original IBM PC bus – 8-bit parallel, 4.77 MHz clock speed.
The IBM AT (1984) introduced the ISA (Industry Standard Architecture) bus – 16-bit parallel with expansion slots, still compatible with 8-bit cards, 8 MHz clock speed.
The IBM PS/2 used the MCA (Micro Channel Architecture) bus – 32-bit parallel but not backward compatible, 10 MHz clock speed; it didn't catch on.

17 Types of Buses
Compaq and other IBM rivals introduced the EISA (Extended Industry Standard Architecture) bus in 1988 – 32-bit parallel, 8.2 MHz clock speed; it didn't catch on either.
VL-Bus (VESA Local Bus) – 32-bit parallel, runs close to the clock speed of the CPU, to which it is tied directly.
The trend moved to specialized buses with higher clock speeds, closer to the CPU's clock speed and separate from the system bus – e.g., PCI (Peripheral Component Interconnect).

18 PCI Bus
The PCI bus can exist side by side with the ISA bus and the system bus; in this sense it's a "local" bus.
Originally 33 MHz, 32 bits.
PCI-X is 133 MHz, 64 bits, for a 1 GB/sec data transfer rate.
Supports Plug and Play.

19 Ethernet Bus-Based Network
All nodes branch off a common line.
Each device has an Ethernet address, also known as a MAC address.
All computers receive all data transmissions (in packets); each looks to see whether a packet is addressed to it, and reads it only if it is.
When a computer wants to transmit data, it waits until the line is free.
The CSMA/CD protocol (carrier-sense multiple access with collision detection) is used.

20 Basic Architecture Terms – Ethernet
Ethernet is actually an OSI layer 2 communication protocol. It does not dictate the type of connectivity – it could run over copper, fiber, or wireless links.
Today's Ethernet is full-duplex, i.e., it has separate lines for send and receive.
Ethernet is an IEEE standard (802.3) and comes in 10, 100, and 1000 Mb/sec (1 Gb/sec) speeds.

21 Basic Architecture Terms – Hub
Hubs connect computers in a network. They operate on a broadcast model: when n computers are connected to a hub, the hub simply passes all network traffic through to each of the n computers.

22 Basic Architecture Terms – Switch
Unlike hubs, switches can look at data packets as they are received, determine the source and destination devices, and forward each packet appropriately.
By delivering a message only to the device it was intended for, switches conserve network bandwidth.

23 Basic Architecture Terms – Myrinet
Myrinet is a packet communication and switching technology that is faster than Ethernet. It offers a full-duplex 2+2 Gb/sec data rate and low latency, and it is used in Linux clusters.
Only 16 of the nodes of WFU's clusters are connected with Myrinet; the rest are connected with Ethernet, for cost reasons.

24 Classification by Interconnection Network
Static network: direct links between computers; a bus-based network can be static (if no switches are involved). Examples include completely connected, line/ring, mesh, tree (regular and fat), and hypercube networks.
Dynamic network: uses switches; connections change according to whether a switch is open or closed. The switches can be arranged in stages (a multistage network, e.g., the Omega network).

25 Hypercube
A d-dimensional hypercube has 2^d nodes.
Each node has a d-bit address, and neighboring nodes differ in exactly one bit.
A routing algorithm is needed; we'll try one in class.
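One standard hypercube routing algorithm, and a plausible candidate for the in-class exercise (the slide doesn't say which algorithm is used), is dimension-ordered or "e-cube" routing: XOR the current and destination addresses and correct the differing bits one dimension at a time. A minimal C sketch:

```c
/* E-cube (dimension-ordered) routing in a d-dimensional hypercube:
 * at each step, flip the lowest-order bit in which the current node
 * still differs from the destination. */
#include <stdio.h>

void route(unsigned src, unsigned dst, int d) {
    unsigned node = src;
    printf("%u", node);
    for (int bit = 0; bit < d; bit++) {
        if ((node ^ dst) & (1u << bit)) {   /* addresses differ in this bit */
            node ^= (1u << bit);            /* move along that dimension    */
            printf(" -> %u", node);
        }
    }
    printf("\n");
}

int main(void) {
    route(0u /* 000 */, 6u /* 110 */, 3);   /* 3-cube: 000 -> 010 -> 110 */
    return 0;
}
```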

26 Multistage Networks
See the notes on the Omega network from class.
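Since those class notes aren't reproduced here, the following is an assumption about their content: Omega-network routing is usually presented as destination-tag routing, where a message passes through log2(n) stages of 2x2 switches and stage i examines the i-th most significant bit of the destination address (0 = take the upper output, 1 = take the lower output). A sketch:

```c
/* Destination-tag routing through an Omega network with n = 2^k inputs:
 * stage i examines the i-th most significant bit of the destination. */
#include <stdio.h>

void omega_route(unsigned dest, int k) {
    for (int stage = 0; stage < k; stage++) {
        int bit = (dest >> (k - 1 - stage)) & 1;   /* most significant bit first */
        printf("stage %d: take %s output\n", stage, bit ? "lower" : "upper");
    }
}

int main(void) {
    omega_route(5u /* destination 101 */, 3);      /* 8-input network, 3 stages */
    return 0;
}
```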

27 Properties of Network Communication
Diameter of a network – the minimum number of links between the two farthest nodes.
Bisection width of a network – the number of links that must be cut to divide the network into two equal parts.
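For example, a d-dimensional hypercube with 2^d nodes has diameter d and bisection width 2^(d-1), while a ring of p nodes has diameter p/2 (rounded down) and bisection width 2.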

28 Properties of Network Communication
Message latency – the time taken to prepare a message to be sent (software overhead).
Network latency – the time taken for a message to pass through the network.
Communication latency – the total time taken to send a message, including both message latency and network latency.
Deadlock – occurs when packets cannot be forwarded because they are waiting for each other in a circular fashion.
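The slide defines the terms but not a formula; a common way to model total communication latency (an assumption on my part, not something the course states) is a fixed startup term plus a per-data-item term. A small sketch, which also shows why packing data into long messages helps:

```c
/* A hedged sketch of a common linear communication-cost model:
 *   t_comm = t_startup + n * t_data
 * The constants below are made-up illustrative values, not measurements. */
#include <stdio.h>

double comm_latency(double t_startup, double t_data, long n_items) {
    return t_startup + n_items * t_data;   /* total time to send n_items */
}

int main(void) {
    /* e.g., 50 us of startup (software) overhead, 0.01 us per byte */
    printf("one 100 KB message: %.0f us\n", comm_latency(50.0, 0.01, 100000));
    printf("100 x 1 KB messages: %.0f us\n", 100 * comm_latency(50.0, 0.01, 1000));
    return 0;
}
```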

29 Memory Hierarchy
Global memory
Local memory
Cache
Moving from global memory toward the cache, each level is faster but more expensive.
Cache coherence must be maintained.

30 Communication Methods
Circuit switching
Packet switching
Wormhole routing

31 Properties of a Parallel Program
Granularity
Speedup
Overhead
Efficiency
Cost
Scalability
Gustafson's law
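The slide only lists the terms; the usual textbook definitions of several of them (an assumption here, not quoted from the course) can be summarized in a few lines of C, with T1 the one-processor time, Tp the p-processor time, and alpha the serial fraction used by Gustafson's law:

```c
/* A minimal sketch of the usual textbook definitions (an assumption here --
 * the slide only lists the terms). */
#include <stdio.h>

int main(void) {
    double T1 = 100.0, Tp = 14.0;   /* illustrative timings, not measurements */
    int    p  = 8;                  /* number of processors                   */
    double alpha = 0.05;            /* serial fraction of the parallel run    */

    double speedup    = T1 / Tp;            /* how much faster than one processor */
    double efficiency = speedup / p;        /* fraction of the ideal p-fold speedup */
    double cost       = p * Tp;             /* processor-time product */
    /* Gustafson's law: scaled speedup when the problem grows with p */
    double gustafson  = p - alpha * (p - 1);

    printf("speedup=%.2f efficiency=%.2f cost=%.1f scaled speedup=%.2f\n",
           speedup, efficiency, cost, gustafson);
    return 0;
}
```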