CS668 - Lecture 2 - Sept. 30


Today’s topics
- Parallel Architectures (Chapter 2)
- Memory Hierarchy
- Busses and Switched Networks
- Interconnection Network Topologies
- Multiprocessors / Multicomputers
- Flynn’s Taxonomy
- Analysis of Interconnection Networks

Theoretic Computer Architectures
- Turing Machine
- Von Neumann Architecture
- Fetch/Execute Cycle
- Memory Models
- RAM model
- PRAM model extension
- Shared Memory vs. Distributed Shared Memory vs. Distributed Memory

Processors and the Memory Hierarchy
- Registers (1 clock cycle, 100s of bytes)
- 1st level cache (3-5 clock cycles, 100s of KBytes)
- 2nd level cache (~10 clock cycles, MBytes)
- Main memory (~100 clock cycles, GBytes)
- Disk (milliseconds, 100 GB to ginormous)
[Figure: CPU with registers, separate 1st level instruction and data caches, and a unified 2nd level cache (instructions & data)]

IBM Dual Core From Intel® 64 and IA-32 Architectures Optimization Reference Manual

Shared Memory Multiprocessor
- One or more memories
- Global address space (all system memory visible to all processors)
- Transfer of data between processors is usually implicit: just read from (write to) a given address (OpenMP)
- Complex cache-coherency protocols maintain consistency between processors
[Figure: Uniform-memory-access (UMA) shared-memory system: memories and CPUs attached to an interconnection network]

Distributed Shared Memory
- Single address space with implicit communication
- Hardware support for read/write to non-local memories, cache coherency
- Latency for a memory operation is greater when accessing non-local data than when accessing data within a CPU’s own memory
[Figure: Non-uniform-memory-access (NUMA) shared-memory system: memories and CPUs attached to an interconnection network]

Distributed Memory / Message Passing
- Each processor has access to its own memory only
- Data transfer between processors is explicit: the user calls message passing functions
- Common libraries for message passing: MPI, PVM
- User has complete control/responsibility for data placement and management
[Figure: memories and CPUs attached to an interconnection network]
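To make "explicit data transfer" concrete, here is a toy sketch. It does not use MPI or PVM; it stands in for them with Python's standard threading and queue modules, and the names worker and parallel_sum are made up for this illustration. Python threads do share memory in reality, so the sketch only imitates the discipline: each worker touches nothing except what arrives in its inbox.

```python
import queue
import threading

def worker(inbox, outbox):
    """A stand-in 'processor': it only sees data that arrives as a message."""
    local_data = inbox.get()        # explicit receive
    outbox.put(sum(local_data))     # explicit send of the partial result

def parallel_sum(chunks):
    """Distribute chunks to workers by message, then gather partial sums."""
    links = [(queue.Queue(), queue.Queue()) for _ in chunks]
    threads = [threading.Thread(target=worker, args=link) for link in links]
    for t in threads:
        t.start()
    # Data placement is the caller's job: decide which chunk goes where.
    for (inbox, _), chunk in zip(links, chunks):
        inbox.put(chunk)
    partials = [outbox.get() for _, outbox in links]
    for t in threads:
        t.join()
    return sum(partials)

total = parallel_sum([[1, 2], [3, 4], [5]])
```

In a shared-memory program the workers would simply read the list in place; here every byte a worker uses was sent to it on purpose.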

Hybrid Systems
- Distributed memory system with multiprocessor shared memory nodes
- Most common architecture for current generation of parallel machines
[Figure: nodes, each with CPUs, a memory, and a network interface, attached to an interconnection network]

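Flynn’s Taxonomy (figure 2.20 from Quinn)
- SISD (single instruction stream, single data stream): uniprocessor
- SIMD (single instruction stream, multiple data streams): processor arrays, pipelined vector processors
- MISD (multiple instruction streams, single data stream): systolic array
- MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers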

Analysis of Switch Network Topologies
- View switched network as a graph
  - n vertices = processors or switches
  - m edges = communication paths
- Two kinds of topologies
  - Direct: ratio of switches to processors is 1:1
  - Indirect: ratio of switches to processors is greater than 1:1

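Evaluating Switch Topologies
- Diameter (largest distance between two switch nodes)
- Bisection width (minimum number of edges that must be removed to divide the network into two halves)
- Number of edges / node (d = degree)
- Constant edge length? (yes/no)
  - Layout area/wire length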
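Two of these criteria can be computed directly once a network is written down as a graph. The sketch below is one way to do it, assuming the network is given as a dictionary of adjacency lists. The function names and the 4-node ring are illustrative choices, not from the lecture. Diameter is taken as the largest shortest-path hop count between any two vertices.

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop counts from src to every reachable vertex (breadth-first search)."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def diameter(adj):
    """Largest shortest-path distance between any pair of vertices."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def max_degree(adj):
    """Largest number of edges incident on any single vertex."""
    return max(len(neighbors) for neighbors in adj.values())

# Illustrative example: four switches connected in a ring.
ring = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
ring_diameter = diameter(ring)
ring_degree = max_degree(ring)
```

Bisection width is left out on purpose: finding the minimum cut that halves an arbitrary graph is much harder than a BFS, which is why the slides quote it per topology.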

2-D Mesh Network
- Direct topology
- Switches arranged into a 2-D lattice
- Communication allowed only between neighboring switches
- Variants allow wraparound connections between switches on edge of mesh

2-D Meshes

Evaluating 2-D Meshes
- Diameter: Θ(√n)
- Bisection width: Θ(√n)
- Number of edges per switch: 4
- Constant edge length? Yes
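These figures can be checked on a small case. The sketch below assumes a square k x k mesh without wraparound links (so n = k*k) and an even k. The function names are illustrative. It measures the diameter by BFS and counts the links crossing a cut down the middle of the mesh.

```python
from collections import deque

def mesh_2d(k):
    """Adjacency lists for a k x k mesh of switches, no wraparound links."""
    adj = {}
    for r in range(k):
        for c in range(k):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(rr, cc) for rr, cc in candidates
                           if 0 <= rr < k and 0 <= cc < k]
    return adj

def farthest_hops(adj, src):
    """Hop count from src to the vertex farthest from it (BFS)."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return max(dist.values())

def mesh_diameter(k):
    adj = mesh_2d(k)
    return max(farthest_hops(adj, s) for s in adj)

def middle_cut_width(k):
    """Links crossing a vertical cut that splits an even-k mesh in half."""
    adj = mesh_2d(k)
    half = k // 2
    return sum(1 for (r, c), nbrs in adj.items() if c < half
               for (_, cc) in nbrs if cc >= half)
```

For k = 4 (n = 16) this gives a diameter of 6, which is 2(√n - 1), and a middle cut of 4 links, which is √n. Both grow as Θ(√n), and no switch has more than 4 links.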

Binary Tree Network
- Indirect topology
- n = 2^d processor nodes, n - 1 switches

Evaluating Binary Tree Network
- Diameter: 2 log n
- Bisection width: 1
- Edges / node: 3
- Constant edge length? Yes/No?
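The 2 log n diameter comes from the worst case of climbing from a leaf to the root and back down. A small sketch, assuming heap-style numbering (an assumption made here for convenience, not something the slides specify): the root switch is node 1, the children of node i are 2i and 2i+1, switches are nodes 1 to n-1, and processors are the leaves n to 2n-1.

```python
def tree_hops(a, b):
    """Links traversed between nodes a and b under heap-style numbering."""
    hops = 0
    while a != b:
        # The larger index is never shallower, so move it up one level.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops

def tree_diameter(d):
    """Worst-case hop count between two of the n = 2**d processor leaves."""
    n = 2 ** d
    leaves = range(n, 2 * n)
    return max(tree_hops(a, b) for a in leaves for b in leaves)
```

With d = 3 (n = 8), the leftmost and rightmost leaves are 6 hops apart, matching 2 log n. The bisection width of 1 reflects that cutting a single link below the root already separates half of the processors from the other half.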

Hypertree Network
- Indirect topology
- Shares low diameter of binary tree
- Greatly improves bisection width
- From “front” looks like k-ary tree of height d
- From “side” looks like upside down binary tree of height d

Hypertree Network

Evaluating 4-ary Hypertree
- Diameter: log n
- Bisection width: n / 2
- Edges / node: 6
- Constant edge length? No

Butterfly Network
- Indirect topology
- n = 2^d processor nodes connected by n(log n + 1) switching nodes

Butterfly Network Routing

Evaluating Butterfly Network
- Diameter: log n
- Bisection width: n / 2
- Edges per node: 4
- Constant edge length? No

Hypercube
- Direct topology
- 2 x 2 x … x 2 mesh
- Number of nodes a power of 2
- Node addresses 0, 1, …, 2^k - 1
- Node i connected to k nodes whose addresses differ from i in exactly one bit position

Hypercube Addressing

Evaluating Hypercube Network
- Diameter: log n
- Bisection width: n / 2
- Edges per node: log n
- Constant edge length? No
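The hypercube numbers follow from the addressing rule: neighbors differ in exactly one bit, so the distance between two nodes is the number of bit positions in which their addresses differ. A minimal sketch, with illustrative function names, for a hypercube of n = 2^k nodes:

```python
def hypercube_neighbors(i, k):
    """The k nodes whose addresses differ from i in exactly one bit."""
    return [i ^ (1 << bit) for bit in range(k)]

def hypercube_distance(i, j):
    """Fewest hops between i and j: the count of differing address bits."""
    return bin(i ^ j).count("1")

def hypercube_diameter(k):
    """Largest distance over all pairs of the 2**k nodes."""
    n = 2 ** k
    return max(hypercube_distance(i, j) for i in range(n) for j in range(n))

# Node 5 (binary 101) in a 3-dimensional hypercube of 8 nodes.
neighbors_of_5 = hypercube_neighbors(5, 3)
```

Node 5 is linked to nodes 4, 7, and 1, so it has log n = 3 edges. The farthest pair is any address and its bitwise complement, giving a diameter of log n.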

Shuffle-exchange
- Direct topology
- Number of nodes a power of 2
- Nodes have addresses 0, 1, …, 2^k - 1
- Two outgoing links from node i
  - Shuffle link to node LeftCycle(i)
  - Exchange link to node [xor (i, 1)]
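Both link rules are short bit manipulations. The sketch below assumes LeftCycle means a left rotation of the k-bit address, with the top bit wrapping around to the bottom. The function names are illustrative.

```python
def left_cycle(i, k):
    """Rotate the k-bit address i left by one position (shuffle link)."""
    mask = (1 << k) - 1
    return ((i << 1) | (i >> (k - 1))) & mask

def exchange(i):
    """Flip the lowest address bit (exchange link)."""
    return i ^ 1

# Node 5 (binary 101) in an 8-node network, k = 3.
shuffle_target = left_cycle(5, 3)    # 101 rotates to 011
exchange_target = exchange(5)        # 101 becomes 100
```

Following the shuffle link k times returns to the starting node, and nodes 0 and 2^k - 1 shuffle to themselves, since rotating all zeros or all ones changes nothing.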

Shuffle-exchange Illustrated

Shuffle-exchange Addressing

Evaluating Shuffle-exchange
- Diameter: 2 log n - 1
- Bisection width: ≈ n / log n
- Edges per node: 2
- Constant edge length? No

Comparing Networks
- All have logarithmic diameter except 2-D mesh
- Hypertree, butterfly, and hypercube have bisection width n / 2
- All have constant edges per node except hypercube
- Only 2-D mesh keeps edge lengths constant as network size increases
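To see the comparison with numbers, the per-topology formulas from the evaluation slides can be tabulated for a specific n. This is only a sketch: the function name is made up, n is assumed to be a power of 2 (and a perfect square and power of 4 where the mesh and 4-ary hypertree need it), and the 2-D mesh entries use √n as a stand-in because the slides give only Θ(√n), not exact constants.

```python
import math

def network_metrics(n):
    """(diameter, bisection width) per topology, from the slide formulas."""
    log_n = n.bit_length() - 1    # exact log2 when n is a power of 2
    root_n = math.isqrt(n)        # order-of-growth stand-in for the mesh
    return {
        "2-D mesh": (root_n, root_n),    # Theta(sqrt n); constants omitted
        "binary tree": (2 * log_n, 1),
        "4-ary hypertree": (log_n, n // 2),
        "butterfly": (log_n, n // 2),
        "hypercube": (log_n, n // 2),
        "shuffle-exchange": (2 * log_n - 1, n / log_n),  # width approximate
    }

metrics_64 = network_metrics(64)
```

For n = 64 the tree has diameter 12 but bisection width 1, while the hypertree, butterfly, and hypercube reach diameter 6 with bisection width 32.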