Supercomputers Parallel Processing


Supercomputers Parallel Processing By: Lecturer Aisha Dawood

Network Topologies: Completely Connected Network Each processor is connected to every other processor. While performance scales very well, the hardware complexity (p(p − 1)/2 links) is not realizable for large values of p. In this sense, these networks are static counterparts of crossbars. This network is ideal in the sense that any node can send a message to any other node directly in a single step.
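To make the scaling problem concrete, here is a tiny Python sketch (mine, not from the slides) of the link count p(p − 1)/2:

```python
def complete_links(p: int) -> int:
    """Number of links in a completely connected network of p processors."""
    return p * (p - 1) // 2   # one dedicated link per pair of processors

for p in (8, 64, 1024):
    print(p, complete_links(p))   # 8 -> 28, 64 -> 2016, 1024 -> 523776
```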

Network Topologies: Completely Connected and Star Connected Networks Examples: (a) a completely connected network of eight nodes; (b) a star-connected network of nine nodes.

Network Topologies: Star Connected Network Every node is connected only to a common node at the center. The distance between any pair of nodes is O(1) (at most two hops). However, the central node becomes a bottleneck. In this sense, star connected networks are static counterparts of buses.

Network Topologies: Linear Arrays, Meshes, and k-d Meshes In a linear array, each node has two neighbors, one to its left and one to its right. If the nodes at either end are also connected, we refer to it as a 1-D torus or a ring. A generalization to two dimensions has nodes with four neighbors, to the north, south, east, and west; this is a two-dimensional mesh. A further generalization to d dimensions has nodes with 2d neighbors. A special case of a d-dimensional mesh is a hypercube, in which there are two nodes along each dimension, so d = log p.

Network Topologies: Linear Arrays Linear arrays: (a) with no wraparound links; (b) with wraparound link.

Network Topologies: Two- and Three Dimensional Meshes Two and three dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.

Network Topologies: Hypercubes and their Construction Construction of hypercubes from hypercubes of lower dimension.

Network Topologies: Properties of Hypercubes The distance between any two nodes is at most log p. Each node has log p neighbors. The distance between two nodes is given by the number of bit positions at which the two nodes differ.
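As an illustration (a Python sketch of my own, with a hypothetical function name; node labels are assumed to be the usual d-bit integers), the hop count between two hypercube nodes is the Hamming distance of their labels:

```python
def hypercube_distance(a: int, b: int) -> int:
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")

# In a 3-dimensional hypercube (p = 8 nodes, labels 0..7):
assert hypercube_distance(0b000, 0b101) == 2   # labels differ in two bit positions
assert hypercube_distance(0b000, 0b111) == 3   # at most log p = 3 hops
```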

Network Topologies: Tree-Based Networks Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.

Network Topologies: Tree Properties The distance between any two nodes is no more than 2 log p. Links higher up the tree potentially carry more traffic than those at the lower levels; for this reason, a variant called a fat tree fattens the links as we go up the tree. To route a message in a tree, the source node sends the message up the tree until it reaches the root of the smallest subtree containing both the source and the destination; from there the message is routed down to the destination.
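A small Python sketch (my own; it assumes the communicating nodes are leaves of a complete binary tree, labeled 0, 1, 2, ... within the bottom level) of this up-then-down routing rule:

```python
def tree_route_length(src: int, dst: int) -> int:
    """Hops between two leaves: up to the root of the smallest common subtree, then down."""
    up = 0
    while src != dst:      # climb one level on both sides until the subtrees merge
        src //= 2
        dst //= 2
        up += 1
    return 2 * up          # the downward path has the same number of hops

assert tree_route_length(4, 5) == 2   # siblings: one hop up, one hop down
assert tree_route_length(1, 6) == 6   # opposite sides of an 8-leaf tree meet only at the root
```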

Network Topologies: Fat Trees A fat tree network of 16 processing nodes.

Evaluating Static Interconnection Networks Diameter: the distance between the two farthest nodes in the network. The diameter of a linear array is p − 1, that of a 2-D mesh is 2(√p − 1), that of a complete binary tree is at most 2 log p, that of a hypercube is log p, and that of a completely connected network is 1. Bisection width: the minimum number of wires that must be cut to divide the network into two equal parts. The bisection width of a linear array and of a tree is 1, that of a 2-D mesh is √p, and that of a hypercube is p/2. Bisection bandwidth: the minimum volume of communication allowed between any two halves of the network; it is sometimes referred to as cross-section bandwidth.

Evaluating Static Interconnection Networks Connectivity: a measure of the multiplicity of paths between any two processing nodes; high connectivity is desirable. One measure of connectivity is the minimum number of arcs that must be removed to break the network into two disconnected parts (arc connectivity). Cost: the number of links or switches is a meaningful measure of the cost; bisection bandwidth is also used to measure the cost. However, a number of other factors, such as the ability to lay out the network, the length of the wires, etc., also factor into the cost.

Evaluating Static Interconnection Networks
Network                    Diameter           Bisection width   Arc connectivity   Cost (No. of links)
Completely-connected       1                  p^2/4             p - 1              p(p - 1)/2
Star                       2                  1                 1                  p - 1
Complete binary tree       2 log((p+1)/2)     1                 1                  p - 1
Linear array               p - 1              1                 1                  p - 1
2-D mesh, no wraparound    2(√p - 1)          √p                2                  2(p - √p)
2-D wraparound mesh        2⌊√p/2⌋            2√p               4                  2p
Hypercube                  log p              p/2               log p              (p log p)/2
Wraparound k-ary d-cube    d⌊k/2⌋             2k^(d-1)          2d                 dp
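The diameter and bisection-width columns are simple functions of the number of processors p. The following Python sketch (my own, assuming p is a power of two and, for the mesh, a perfect square) encodes a few of them as a check:

```python
import math

def diameter(topology: str, p: int) -> int:
    side = math.isqrt(p)                     # mesh side length, assuming p is a perfect square
    return {"completely connected": 1,
            "linear array": p - 1,
            "2-D mesh": 2 * (side - 1),
            "hypercube": int(math.log2(p))}[topology]

def bisection_width(topology: str, p: int) -> int:
    side = math.isqrt(p)
    return {"completely connected": p * p // 4,
            "linear array": 1,
            "2-D mesh": side,
            "hypercube": p // 2}[topology]

# For p = 16: mesh diameter 2(√16 - 1) = 6, hypercube diameter log 16 = 4 and bisection width 8.
assert diameter("2-D mesh", 16) == 6
assert diameter("hypercube", 16) == 4 and bisection_width("hypercube", 16) == 8
```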

Evaluating Dynamic Interconnection Networks Diameter: the maximum distance between any two nodes, where both processing and switching nodes are counted. Connectivity: the minimum number of nodes that must be removed to fragment the network into two parts. Bisection width: the minimum number of edges that must be removed to fragment the network into two equal parts with an identical number of processing nodes.

Evaluating Dynamic Interconnection Networks The same metrics (diameter, bisection width, arc connectivity, and cost in number of links) can be tabulated for dynamic networks such as the crossbar, the omega network, and the dynamic tree.

Cache Coherence in Multiprocessor Systems Interconnects provide basic mechanisms for data transfer. In the case of shared address space machines, additional hardware is required to coordinate access to data that might have multiple copies in the network. The underlying technique must provide some guarantees on the semantics. Keeping caches in multiprocessor systems coherent is more difficult than in uniprocessor systems, because in addition to multiple data copies there are multiple processors.

Cache Coherence in Multiprocessor Systems When the value of a variable is changed, all its copies must either be invalidated or updated. Cache coherence in multiprocessor systems: (a) the invalidate protocol; (b) the update protocol for shared variables.

Cache Coherence in Multiprocessor Systems Update protocol: whenever a data item is written, all copies must be updated. As a result, if a processor reads a data item once and never uses it again, subsequent updates to this item at other processors still cause network overhead. Invalidate protocol: the data item is invalidated on the first update at a remote processor, so no subsequent updates need to be propagated. Both protocols suffer from false-sharing overheads (two words that are not shared but happen to lie on the same cache line). Most current machines use invalidate protocols.
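The read-once example can be made concrete with a small Python sketch (my own; the trace, function names, and message counting are illustrative assumptions, not a hardware model):

```python
def update_messages(trace):
    holders, msgs = set(), 0
    for proc, op in trace:
        if op == "read":
            if proc not in holders:
                msgs += 1                      # fetch a copy on the first read
                holders.add(proc)
        else:                                  # write: push the new value to every other copy
            holders.add(proc)
            msgs += len(holders) - 1
    return msgs

def invalidate_messages(trace):
    holders, msgs = set(), 0
    for proc, op in trace:
        if op == "read":
            if proc not in holders:
                msgs += 1                      # (re)fetch on the first read or after an invalidation
                holders.add(proc)
        else:                                  # write: invalidate every other copy
            msgs += len(holders - {proc})
            holders = {proc}
    return msgs

# Processor 1 reads the item once and never touches it again; processor 0 then writes it ten times.
trace = [(1, "read")] + [(0, "write")] * 10
print(update_messages(trace), invalidate_messages(trace))   # 11 vs. 2
```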

Cache Coherence: Update and Invalidate Protocols An important factor that affects these protocols is false sharing, the situation in which different processors update different parts of the same cache line. In an invalidate protocol, when a processor updates its part, the other copies are invalidated; when another processor needs its part, the line must be fetched from the remote processor. In an update protocol the situation is better, since all reads are local and only writes must be propagated. The trade-off between invalidate and update schemes is communication overhead (on updates) versus idling (stalling on invalidated lines).

Maintaining Coherence Using Invalidate Protocols Each copy of a data item is associated with a state. One example of such a set of states is shared, invalid, and dirty. In the shared state, there are multiple valid copies of the data item. When a processor changes the data item, it marks all other copies as invalid and its own copy as dirty. All subsequent accesses to this item will be served by this processor.
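A minimal Python sketch (my own; the class and state names are illustrative) of this three-state protocol for a single data item shared by several processors:

```python
SHARED, DIRTY, INVALID = "shared", "dirty", "invalid"

class Item:
    def __init__(self, nprocs: int):
        self.state = [INVALID] * nprocs           # per-processor state of one data item

    def read(self, p: int):
        if self.state[p] == INVALID:              # fetch a valid copy (from memory or the dirty owner)
            self.state = [SHARED if s == DIRTY else s for s in self.state]
            self.state[p] = SHARED
        # shared and dirty copies are read locally

    def write(self, p: int):
        for q in range(len(self.state)):          # invalidate every other copy
            if q != p:
                self.state[q] = INVALID
        self.state[p] = DIRTY                     # the writer now owns the only valid copy

item = Item(nprocs=3)
item.read(0); item.read(1)                        # both hold shared copies
item.write(1)                                     # copy at 1 becomes dirty, copy at 0 is invalidated
assert item.state == [INVALID, DIRTY, INVALID]
item.read(2)                                      # 2 fetches the value; the dirty copy reverts to shared
assert item.state == [INVALID, SHARED, SHARED]
```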

Maintaining Coherence Using Invalidate Protocols State diagram of a simple three-state coherence protocol.

Maintaining Coherence Using Invalidate Protocols Example of parallel program execution with the simple three-state coherence protocol.

Reading tasks (pages 49–53).