ECE669 L11: Static Routing Architectures March 4, 2004 ECE 669 Parallel Computer Architecture Lecture 11 Static Routing Architectures.

Outline
° Programming models
  - Data parallel
  - Shared memory
  - Message passing
° Communication requirements
  - Examining the network
  - Available bandwidth
  - Run-time versus compile-time
° Models of communication

Communication Approaches
° Circuit switched
° Store and forward
  - On-line (dynamic routing)
  - Off-line (static routing)
° Special-purpose architectures created for static routing
° Schedule all communication at compile time
° Can lead to faster overall communication (no headers)
° Can reduce congestion
° Doesn't handle data dependency well
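The idea of scheduling all communication at compile time can be sketched as a small offline scheduler. This is an illustration only, not the scheduler used by any machine in the lecture: the 2D mesh coordinates, the dimension-ordered `xy_path` routing, and the greedy "delay at the source until the whole path is free" policy are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def xy_path(src: Node, dst: Node) -> List[Node]:
    """Dimension-ordered route (assumed): move along x first, then along y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def schedule(messages: List[Tuple[Node, Node]]) -> Dict[int, List[Tuple[int, Link]]]:
    """Reserve a (cycle, link) slot for every hop of every message, offline.

    A message waits at its source until its whole path is conflict-free,
    so no link carries two words in the same cycle and nothing has to be
    decided (or carried in a header) at run time.
    """
    busy = set()  # (cycle, link) pairs already reserved
    plan: Dict[int, List[Tuple[int, Link]]] = {}
    for msg_id, (src, dst) in enumerate(messages):
        path = xy_path(src, dst)
        links = list(zip(path, path[1:]))
        start = 0
        while any((start + i, link) in busy for i, link in enumerate(links)):
            start += 1
        plan[msg_id] = [(start + i, link) for i, link in enumerate(links)]
        busy.update(plan[msg_id])
    return plan


if __name__ == "__main__":
    msgs = [((0, 0), (2, 0)), ((0, 0), (2, 1)), ((1, 0), (2, 0))]
    for msg_id, hops in schedule(msgs).items():
        print(msg_id, hops)
```

Because the sources and destinations must be known before the program runs, a sketch like this also shows why data-dependent communication is hard to handle statically.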

Interconnection Topology
° Diamond lattice has a desirable structure
° Each node has four neighbors
° Space filling: nodes can be packed close together
° Can embed other topologies
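The four-neighbor property can be checked with a small coordinate model. The sketch below uses the standard integer-coordinate description of the diamond cubic lattice (sites have x, y, z of equal parity and a coordinate sum of 0 or 1 modulo 4); the function names are hypothetical, and how NuMesh actually numbered its nodes is not stated in the slides.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# Bond directions for sites whose coordinate sum is 0 mod 4.
# Sites whose sum is 1 mod 4 use the negated directions.
EVEN_OFFSETS = [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)]


def is_site(p: Coord) -> bool:
    """True if p is a point of the diamond lattice in this coordinate model."""
    x, y, z = p
    same_parity = (x % 2) == (y % 2) == (z % 2)
    return same_parity and (x + y + z) % 4 in (0, 1)


def neighbors(p: Coord) -> List[Coord]:
    """Return the four nearest neighbors of lattice site p."""
    if not is_site(p):
        raise ValueError(f"{p} is not a diamond lattice site")
    sign = 1 if sum(p) % 4 == 0 else -1
    return [(p[0] + sign * dx, p[1] + sign * dy, p[2] + sign * dz)
            for dx, dy, dz in EVEN_OFFSETS]


if __name__ == "__main__":
    print(neighbors((0, 0, 0)))
```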

Interconnection Topology
° Need to implement in three dimensions
° Bottom and top of circuit boards have connectors
° A node can configure its neighbors

Communication Finite State Machine
° Each node has a processing part and a communications part
° Interface to the local processor is a FIFO
° Communication to near neighbors is pipelined

Statically Programmed Communication
° Data is transferred one node per cycle
° An inter-processor path may require multiple cycles
° Heavy arrows represent local transfers
° Grey arrows represent non-local transfers
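A statically programmed path can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the NuMesh CFSM design: nodes sit in a single row, each node has one register driving its east neighbor, and the per-node `schedule` tables, the port names "proc", "west", and "east", and the `run` function are all invented for the example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str]  # (source port, destination port)


def run(num_nodes: int,
        schedule: List[Dict[int, Op]],
        inject: Dict[int, List[int]],
        cycles: int) -> List[deque]:
    """Simulate a row of nodes that follow a fixed per-cycle transfer table."""
    to_net = [deque() for _ in range(num_nodes)]    # processor -> network FIFOs
    from_net = [deque() for _ in range(num_nodes)]  # network -> processor FIFOs
    for node, words in inject.items():
        to_net[node].extend(words)

    east: List[Optional[int]] = [None] * num_nodes  # word driven to east neighbor
    for t in range(cycles):
        nxt: List[Optional[int]] = [None] * num_nodes
        for n in range(num_nodes):
            op = schedule[n].get(t)
            if op is None:
                continue  # node is idle this cycle
            src, dst = op
            if src == "proc":
                word = to_net[n].popleft()
            else:  # "west": take what the west neighbor drove last cycle
                word = east[n - 1] if n > 0 else None
            if dst == "east":
                nxt[n] = word
            else:  # "proc": deliver into the local processor's FIFO
                from_net[n].append(word)
        east = nxt  # every transfer moves exactly one node per cycle
    return from_net


if __name__ == "__main__":
    # One word travels from node 0 to node 3, one hop per cycle.
    table = [{0: ("proc", "east")}, {1: ("west", "east")},
             {2: ("west", "east")}, {3: ("west", "proc")}]
    print(run(4, table, {0: [42]}, cycles=4))
```

The word needs four cycles to reach node 3, which matches the point that an inter-processor path may take multiple cycles even though each hop takes one.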

Prototype NuMesh Node - CFSM
° Transceivers used to buffer inter-node data
° FIFOs buffer paths to/from the local processor
° One node per board

Prototype NuMesh System
° Initial topology was a mesh
° Some nodes in the mesh could be unpopulated
° Special-purpose nodes could be populated along the system periphery

NuMesh Parallelization
° System appears like a two-dimensional pipeline
° FIFOs allow processors to run at different speeds
° Rational clocking allows clocks to be distributed

NuMesh Multigrid Results
° Multigrid is hierarchical
° Processor utilization indicates periodic reduced activity
° All communication is scheduled statically

NuMesh Summary
° Communication determined at compile time
° Fast near-neighbor communication
° Diamond lattice provides routing benefits
° Appropriate for applications like multigrid

Key Issues
° Communication
  - Broadcast, near neighbor, tree
° Synchronization
  - Producer-consumer, barrier, locks
° Partitioning
  - Grain size: division of work, what to run as a thread
  - Mapping: where to run
° Scheduling
  - When to run
° Various computing styles differ in how the above are supported:
  - Whether hardware support is provided
  - Whether the programmer deals with it
  - Whether it is ignored
° Key: previous machines focused heavily on hardware; once software enters the picture, distinctions become hard to make

Historically
° Build the machine (paper wt.?)
° Low-level programming: some use
° Better abstractions: much better
  - All programming
  - Low-level performance hacks
  - Body of theory (low-level machine style pervades every higher level, even theory!)
° Low-level machine organization clearly visible and "exploited" at higher levels!
° Sometimes machines evolve: application > machine (or language)

Another more common evolutionary approach...
° Language > Machine
° Fortran, C, ...: shared memory
  - View: a, b "reside" somewhere
  - Perform operations and store values back
  - Notion of "location"
  - Specify ops that can go on in parallel
° Algorithmic model: PRAM variants
  - CRCW: multiple simultaneous reads and writes
  - CREW: concurrent reads, exclusive writes only
  - EREW: exclusive reads and writes
° Processes
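The difference between the three PRAM variants comes down to which same-address accesses are allowed within one step. The checker below is a minimal sketch: the `Access` tuple layout and the function name are assumptions for illustration, and it does not model how a CRCW machine resolves conflicting writes.

```python
from collections import Counter
from typing import Iterable, Tuple

Access = Tuple[int, str, int]  # (processor id, "R" or "W", address)


def step_is_legal(model: str, accesses: Iterable[Access]) -> bool:
    """Check whether one PRAM step obeys the access rules of the given model."""
    accesses = list(accesses)
    reads = Counter(addr for _, kind, addr in accesses if kind == "R")
    writes = Counter(addr for _, kind, addr in accesses if kind == "W")
    concurrent_read = any(count > 1 for count in reads.values())
    concurrent_write = any(count > 1 for count in writes.values())

    if model == "CRCW":
        return True  # any mix is allowed; write-conflict policy not modeled
    if model == "CREW":
        return not concurrent_write
    if model == "EREW":
        return not (concurrent_read or concurrent_write)
    raise ValueError(f"unknown PRAM model: {model}")


if __name__ == "__main__":
    broadcast_read = [(0, "R", 5), (1, "R", 5)]
    for model in ("CRCW", "CREW", "EREW"):
        print(model, step_is_legal(model, broadcast_read))
```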

Object-oriented Programming
° Smalltalk, variants of Scheme, C++
° Message-passing machines
° E.g. 1: bank account object A holding a balance; messages: deposit, withdraw, balance?
° E.g. 2: Jacobi relaxation across objects A, B, C; message: "send my peripheral values"
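The Jacobi example can be sketched with objects that own a block of a grid and exchange only their peripheral values. The sketch assumes a 1D grid with fixed boundary values and the usual two-point averaging update; the `Block` class, its method names, and `jacobi_step` are hypothetical and stand in for whatever the original figure showed.

```python
from typing import List, Tuple


class Block:
    """An object that owns a contiguous piece of a 1D grid."""

    def __init__(self, values: List[float], left_bc: float, right_bc: float):
        self.values = list(values)
        self.left_halo = left_bc    # overwritten by messages from a left neighbor
        self.right_halo = right_bc  # overwritten by messages from a right neighbor

    def peripheral(self) -> Tuple[float, float]:
        """The only values other objects ever need: the two edge cells."""
        return self.values[0], self.values[-1]

    def receive(self, side: str, value: float) -> None:
        if side == "left":
            self.left_halo = value
        else:
            self.right_halo = value

    def relax(self) -> None:
        """One Jacobi update: each cell becomes the mean of its two neighbors."""
        padded = [self.left_halo] + self.values + [self.right_halo]
        self.values = [(padded[i - 1] + padded[i + 1]) / 2
                       for i in range(1, len(padded) - 1)]


def jacobi_step(blocks: List[Block]) -> None:
    # Phase 1: every object sends its peripheral values to its neighbors.
    edges = [b.peripheral() for b in blocks]
    for i, b in enumerate(blocks):
        if i > 0:
            b.receive("left", edges[i - 1][1])
        if i < len(blocks) - 1:
            b.receive("right", edges[i + 1][0])
    # Phase 2: every object updates its own cells using local data only.
    for b in blocks:
        b.relax()


if __name__ == "__main__":
    blocks = [Block([0.0] * 3, 0.0, 0.0), Block([0.0] * 3, 0.0, 1.0)]
    for _ in range(10):
        jacobi_step(blocks)
    print([v for b in blocks for v in b.values])
```

Only two values per object cross an object boundary in each step, which is why this kind of computation maps well onto near-neighbor message passing.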

Shared-memory style
° Communication, synchronization: via memory
° Partitioning: user, coarse to fine
° Scheduling: system, dynamic
° Processes: parallel control flows over a shared memory

Message-passing style
° Communication, synchronization: via messages
° Partitioning: user, coarse
° Scheduling: system, dynamic
° Parallel control flows, each with its own local memory, exchanging messages (msg)

Data Parallel
° Communication
° Partitioning: fine-grain, system
° Scheduling: user, static
° Only one control thread, multiple data
° Synchronization: every instruction, like a barrier
[Figure: Control issuing instr. to Memory ...]

Systolic
° Communication: data values
° Synchronization: completely static (none), pre-compiled
[Figure: memory units (M) and data streams (D)]
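A systolic computation can be illustrated with a linear array that computes a matrix-vector product. This is a generic textbook-style sketch, not a design from the lecture: the cell layout, the choice that cell i keeps row i of the matrix and a running sum, and the function name are all assumptions. The point it shows is that which operand meets which cell in which cycle is fixed in advance, so no run-time synchronization is needed.

```python
from typing import List, Optional


def systolic_matvec(A: List[List[float]], x: List[float]) -> List[float]:
    """Compute y = A x on a simulated linear systolic array.

    Cell i keeps row i of A and a running sum. The elements of x enter
    at cell 0 and shift one cell per cycle, so x[j] reaches cell i at
    cycle i + j, a schedule known before the computation starts.
    """
    n, m = len(A), len(x)
    acc = [0.0] * n
    pipe: List[Optional[float]] = [None] * n  # x value held by each cell
    for t in range(m + n - 1):
        # Shift the x stream by one cell; the next element enters cell 0.
        pipe = [x[t] if t < m else None] + pipe[:-1]
        for i, xv in enumerate(pipe):
            if xv is not None:
                j = t - i  # index of the x element now sitting in cell i
                acc[i] += A[i][j] * xv
    return acc


if __name__ == "__main__":
    print(systolic_matvec([[1, 2], [3, 4], [5, 6]], [10, 1]))
```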

Vector
° Similar to data parallel
° Only 1 processor (chaining?)
° But exploits data parallelism
[Figure: MEMORY, OP, and data streams (D)]