Spring 2006 EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture, Lecture 22: Distributed Computers and Interconnection Networks

Symmetric multiprocessor (SMP)
[Block diagram: CPUs and memory connected through a memory controller; an I/O hub/bridge to keyboard, mouse, monitor, BIOS, Ethernet; power supply and cooling fan]
One address space, uniform access time

Coherency requirements
1. Memory operations occur in the order they were issued
2. All reads return the most current value

MESI protocol
[State diagram: the cache controller observes and acts on bus/snoop-generated and processor-generated events]

How can an SMP go faster?
- More CPUs
- Faster CPUs
- Bigger cache
- Optimize the OS
- Better scheduling

Speedup vs. processors
[Plot: speedup vs. number of processors; actual speedup falls below the ideal linear curve]

What limits SMP speedup?
- Competing for resources: memory
- Bus performance
- Sequential portions of the program

Distributed system
Goal
- Connect processors (nodes) together to create a fast computer
- Each node works on part of the problem
Node
- SMP processor, cache coherent
- No global or shared memory
Interconnect: network for node-to-node communication

Distributed computer
[Diagram: processor/memory (P, M) nodes attached to an interconnect]
No global or shared memory

Distributed application
- Identify parallelism: often the same algorithm runs on each node
- Map the data set to node IDs
- Determine communication requirements
  - Synchronization
  - Data exchange of intermediate results
  - No shared data, so must use message-passing paradigms
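For example, summing a data set split across nodes with no shared memory: each node computes a partial result over its own chunk and the results are combined by exchanging messages. A minimal sketch (the function names are illustrative, and a queue stands in for the network):

```python
import queue

def node_work(node_id, chunk, out_q):
    """Each node sums its own chunk and 'sends' the partial result as a message."""
    out_q.put((node_id, sum(chunk)))

data = list(range(100))
nodes = 4
out_q = queue.Queue()
for i in range(nodes):
    node_work(i, data[i::nodes], out_q)   # map the data set to node IDs by striding

# Combine the exchanged intermediate results
total = sum(partial for _, partial in (out_q.get() for _ in range(nodes)))
print(total)  # 4950
```

The strided slice `data[i::nodes]` is one simple way to map the data set onto node IDs; block partitioning would work just as well.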

Parallel programming
Four key steps: done by the programmer and the OS

How do you communicate fast?
- Auctioneer
- Direct bus between nodes
- Guess the response prior to receiving it

Message passing
[Timing diagram: P1 sends a request message and blocks; P2 receives it and sends a reply message; P1's blocked interval is the latency]
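The blocking request/reply exchange can be sketched with two threads and queues standing in for the two nodes and the network (a toy model, not a real message-passing library):

```python
import threading
import queue

req_q, rep_q = queue.Queue(), queue.Queue()

def p2():
    """P2: receive the request, compute, send the reply back."""
    msg = req_q.get()        # blocks until the request message arrives
    rep_q.put(msg * 2)       # reply with some result

t = threading.Thread(target=p2)
t.start()

req_q.put(21)                # P1: send the request message
result = rep_q.get()         # P1 blocks here until the reply arrives (the latency)
t.join()
print(result)  # 42
```

`rep_q.get()` models the blocked interval in the diagram: P1 can do no useful work between send and reply unless the program overlaps communication with computation.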

Message latency
- O_s = overhead to inject the message
- L = time to traverse the network
- O_r = overhead to extract the message
Total one-way latency = O_s + L + O_r

Interconnection network
Interconnect is critical to communication
[Diagram: processor/memory (P, M) nodes attached to an interconnect]

Interconnect goals?
- Low latency
- High bandwidth
- Direct routes
- Low cost, low power
- Reliable

Interconnect properties
- Routing distance: number of links on a route
- Diameter: maximum routing distance
- Average distance: average number of links on a route
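These properties can be computed for any topology by breadth-first search over the network graph. A small sketch (the function names are illustrative; the graph is an adjacency-list dictionary):

```python
from collections import deque

def distances_from(adj, src):
    """BFS hop counts (routing distances) from src in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter_and_average(adj):
    """Diameter (max routing distance) and average distance over all ordered node pairs."""
    nodes = list(adj)
    diam, total, pairs = 0, 0, 0
    for s in nodes:
        d = distances_from(adj, s)
        for t in nodes:
            if t != s:
                diam = max(diam, d[t])
                total += d[t]
                pairs += 1
    return diam, total / pairs

# 4-node ring: 0-1-2-3-0
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter_and_average(ring4))  # diameter 2, average 4/3
```

The same two functions apply unchanged to meshes, tori, trees, and hypercubes once their adjacency lists are built.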

Interconnect performance
- Latency (ms): overhead, routing delay, capacity, contention
- Bandwidth (B/s)
  - Total: sum over all links
  - Bisection: sum over the links cut when the network is split in half

Interconnect topologies
- Linear arrays and rings
- Meshes and tori
- Trees
- Hypercubes

Linear array and rings
[Diagrams: linear array; ring; ring laid out with short links]

Find the following? Ring, n nodes
1. Diameter: n/2
2. Average distance: n/4
3. Bisection BW: 2
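A quick numeric check of the ring formulas, computing shortest hop distances directly (a sketch; by symmetry the distances from one node suffice):

```python
def ring_metrics(n):
    """Exact diameter and average distance for an n-node ring (shortest way around)."""
    dists = [min(k, n - k) for k in range(1, n)]  # distances from any one node
    return max(dists), sum(dists) / len(dists)

for n in (8, 16, 64):
    diam, avg = ring_metrics(n)
    print(n, diam, avg)  # diameter is n/2; average approaches n/4 as n grows
```

For n = 8 this gives diameter 4 (= n/2) and average 16/7 ≈ 2.29; the slide's n/4 is the large-n approximation of the exact value n²/(4(n−1)).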

Mesh and torus
[Diagrams: 2-D mesh; 2-D torus]
Both fairly easy to implement

Example mesh
Paragon: 1824 nodes, 16 x 114 mesh

Find the following? Square mesh, n nodes
1. Diameter: 2(√n − 1)
2. Average distance: √n − 1
3. Bisection BW: √n
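The mesh answers follow directly from the geometry: the diameter is the Manhattan distance between opposite corners, and the bisection is the set of links cut by a straight split down the middle. A sketch in closed form (function name is illustrative):

```python
def mesh_metrics(side):
    """Diameter and bisection width for a side x side 2-D mesh (n = side**2 nodes)."""
    diameter = 2 * (side - 1)   # corner to opposite corner: (side-1) hops in each axis
    bisection = side            # one link per row is cut by a vertical middle split
    return diameter, bisection

print(mesh_metrics(4))  # (6, 4): a 16-node mesh, sqrt(n) = 4
```

With n = side², these are exactly the 2(√n − 1) and √n of the slide.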

Multi-dimensional meshes
- 6 x 3 x 2, n = 36
- 3 x 3 x 3 x 3, n = 81
May require long wires between nodes
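The node count of a multi-dimensional mesh is the product of its side lengths, and its diameter is the sum of the per-dimension corner-to-corner distances. A sketch checking the two examples above (function name is illustrative):

```python
from math import prod

def md_mesh(dims):
    """Node count and diameter of a multi-dimensional mesh with the given side lengths."""
    return prod(dims), sum(d - 1 for d in dims)

print(md_mesh((6, 3, 2)))     # (36, 8)
print(md_mesh((3, 3, 3, 3)))  # (81, 8)
```

Note that both examples reach 8 hops of diameter with quite different node counts, which is why higher-dimensional meshes are attractive despite the long physical wires they can require.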

How can an SMP go faster?
- Faster clock (smaller feature size)
- More processors, 64-bit
- Larger, faster shared memory
- Larger, faster local cache memory
- Wider, faster, shorter shared bus

What limits SMP speedup?
- Physics
  - IC technology (speed)
  - Time of flight across the bus
- Cost
  - Power
  - Memory
  - Disks
Key limiter

How do you communicate fast?
- Low latency between nodes
- High bandwidth between nodes
- Low OS overhead

Interconnect goals?
- Latency as small as possible
- As many concurrent transfers as possible
  - Operation bandwidth
  - Data bandwidth
- Cost as low as possible

Find the following? Ring, n nodes
1. Diameter: n/2
2. Average distance: n/4
3. Bisection BW: 2

Find the following? Square mesh, n nodes
1. Diameter: 2(√n − 1)
2. Average distance: √n − 1
3. Bisection BW: √n