12/13/99 Page 1 IRAM Network Interface Ioannis Mavroidis IRAM retreat January 12-14, 2000.

Presentation transcript:

12/13/99 Page 2 Outline
IRAM Network Interface Goals
–VIRAM-1 prototype board and Application characteristics
–NI Requirements and Design decisions
NI Architecture and Design Overview
–Rough diagram of the whole datapath
–What has been implemented so far:
»Packet descriptor
»DMA engine
»Queue Manager

12/13/99 Page 3 VIRAM-1 prototype board

12/13/99 Page 4 Application Characteristics
Streaming Multimedia/DSP computations or problems too large or too slow on a single IRAM:
–FFT, MPEG, Sorting, Sparse matrix computations, N-body computations, Speech kernels, Rasterization or other graphics
Bulk synchronous communication, mostly messages 100s of bytes long.
High bandwidth is more important than low latency.
Programming model and OS support similar to:
–MPI (message send/receive)
–Titanium (remote read/write)

12/13/99 Page 5 NI Requirements
Message Passing support
User-Level Access (mem-mapped device)
Flow Control (no packets dropped)
Routing/Bridging
Multiple DMA descriptors per packet
Should not under-utilize available link bandwidth
Keep it simple. Focus on prototype board and apps.
–~8 chips on same board
–Applications: High bandwidth, Latency tolerant

12/13/99 Page 6 NI Design Decisions
Packet is segmented into 32-byte flits
Route once per packet
–Advantage: Routing each flit separately would:
»Need more buffer space at the receiving node.
»Consume more bandwidth for routing info overhead.
–Disadvantage: implies that flits from different packets should not be interleaved. Better not have page-fault/MEM exception in the middle of a packet…
»SW will have to guarantee this OR
»For our prototype, apps are highly likely to fit in main MEM
Credit-based flow-control per flit
–Do not have to allocate buffer for whole packet.
Error detection/correction codes per flit
–Helps to reduce power consumption with low-swing interconnect
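The segmentation and per-flit credit scheme on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions: the 32-byte flit size comes from the slide, while `segment`, `CreditLink`, `transfer`, the zero-padding of the last flit, and the receive-buffer depth are hypothetical names and choices made for illustration, not the NI's actual design.

```python
FLIT_BYTES = 32  # flit size stated on the slide


def segment(packet, flit_bytes=FLIT_BYTES):
    """Split a packet into fixed-size flits (last flit zero-padded: an assumption)."""
    flits = []
    for off in range(0, len(packet), flit_bytes):
        flits.append(packet[off:off + flit_bytes].ljust(flit_bytes, b"\x00"))
    return flits


class CreditLink:
    """Hypothetical model of one link with per-flit credits.

    The sender holds one credit per free flit slot in the receiver's
    buffer, so the receiver never has to buffer a whole packet and
    never has to drop a flit.
    """

    def __init__(self, rx_buffer_flits):
        self.credits = rx_buffer_flits
        self.rx_buffer = []

    def try_send(self, flit):
        if self.credits == 0:
            return False          # no credit: the sender stalls, nothing is dropped
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def consume(self):
        flit = self.rx_buffer.pop(0)
        self.credits += 1         # freeing a slot returns one credit
        return flit


def transfer(packet, link):
    """Send every flit of a packet in order; drain the receiver on a stall."""
    received, stalls = [], 0
    for flit in segment(packet):
        while not link.try_send(flit):
            stalls += 1
            received.append(link.consume())
    while link.rx_buffer:
        received.append(link.consume())
    return received, stalls
```

With a 100-byte message and a two-flit receive buffer, the message becomes four flits and the sender stalls twice, yet all flits arrive in order and none is dropped.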

12/13/99 Page 7 NI architecture

12/13/99 Page 8 Packet Descriptor
Msg send is a 2-phase process: describe and launch
64 memory-mapped registers for packet description.
–16 max per packet.
Launch is atomic.
–Description of one msg can start immediately after previous is launched.
Misc registers:
–head, tail (circular buffer)
–space_avail (max pct descriptor)
–save_len (for context-switch)
–error (illegal op_len/desc_len)
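The describe-and-launch sequence over a circular register file can be sketched in software as follows. The register count (64) and the 16-descriptor limit come from the slide. Everything else is an assumption made for illustration: the class `DescriptorRing`, the `used` and `pending` counters, the way `space_avail` is computed, and the `consume_packet` helper are hypothetical and are not taken from the NI's register specification.

```python
NUM_REGS = 64         # memory-mapped descriptor registers (from the slide)
MAX_PER_PACKET = 16   # descriptor limit per packet (from the slide)


class DescriptorRing:
    """Hypothetical model of describe-and-launch over a circular buffer."""

    def __init__(self):
        self.regs = [None] * NUM_REGS
        self.head = 0       # next launched descriptor the NI will consume
        self.tail = 0       # first slot after the last launched descriptor
        self.used = 0       # launched descriptors not yet consumed
        self.pending = 0    # described but not yet launched
        self.error = False

    def space_avail(self):
        # Assumed meaning: descriptors that may still be written for the
        # packet currently being described.
        free = NUM_REGS - self.used - self.pending
        return min(MAX_PER_PACKET - self.pending, free)

    def describe(self, desc):
        """Phase 1: write one descriptor; invisible to the NI until launch."""
        if self.space_avail() == 0:
            self.error = True
            return False
        self.regs[(self.tail + self.pending) % NUM_REGS] = desc
        self.pending += 1
        return True

    def launch(self):
        """Phase 2: publish all pending descriptors in one step."""
        n = self.pending
        self.tail = (self.tail + n) % NUM_REGS
        self.used += n
        self.pending = 0
        return n

    def consume_packet(self, n):
        """NI side: take n launched descriptors from the head."""
        descs = [self.regs[(self.head + i) % NUM_REGS] for i in range(n)]
        self.head = (self.head + n) % NUM_REGS
        self.used -= n
        return descs
```

In this model the tail pointer only moves at launch, which is one way to make the launch atomic while letting the description of the next message begin immediately afterwards.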

12/13/99 Page 9 DMA engine
Supported operations
–Sequential DMA (word aligned)
–Strided DMA (words/doubles)
Address generator:
–Allocates buffer to receive data when it arrives from memory.
–Generates addresses.
–Remembers pending requests.
Data receiver:
–Communicates with memory through a 32-bit bus.
–Generates mux_sel signal to read mem data, according to pending requests.
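The address streams produced for the two supported operations can be sketched as below. This is an illustrative model only: the 32-bit bus width and the word/double element sizes come from the slide, while the function names, the argument layout, and the choice to issue a double as two consecutive word requests are assumptions.

```python
WORD = 4  # bytes per 32-bit bus transfer (bus width from the slide)


def sequential_dma(base, length_bytes):
    """Word addresses for a contiguous, word-aligned transfer."""
    if base % WORD or length_bytes % WORD:
        raise ValueError("sequential DMA must be word aligned")
    return [base + off for off in range(0, length_bytes, WORD)]


def strided_dma(base, count, stride_bytes, elem_bytes=WORD):
    """Word addresses for `count` elements spaced `stride_bytes` apart.

    elem_bytes is 4 (word) or 8 (double). A double is assumed to be
    fetched as two back-to-back word requests over the 32-bit bus.
    """
    if elem_bytes not in (4, 8):
        raise ValueError("elements must be words (4) or doubles (8)")
    if base % WORD or stride_bytes % WORD:
        raise ValueError("base and stride must be word aligned")
    addrs = []
    for i in range(count):
        start = base + i * stride_bytes
        for off in range(0, elem_bytes, WORD):
            addrs.append(start + off)
    return addrs
```

For example, `strided_dma(0x100, 3, 16)` yields `[0x100, 0x110, 0x120]`, and the same call with `elem_bytes=8` yields two consecutive word addresses per element.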

12/13/99 Page 10 Queue Manager (1)
Manages multiple FIFO queues in one shared memory.
5 queues:
–1 x 4 output links
–1 free list with all empty flits
Each queue is represented as a linked list with head/tail/next pointers
Supported operations:
–enqueue (list, data)
–data = dequeue (list)
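The linked-list organisation described on this slide can be modelled in a few lines. The five lists (four output links plus a free list) and the enqueue/dequeue operations come from the slide. The class name, the `NIL` sentinel, the array layout, and the behaviour when a list is empty or the free list is exhausted are assumptions made for this sketch.

```python
NIL = -1
NUM_LINKS = 4       # one queue per output link (from the slide)
FREE = NUM_LINKS    # index of the free list, the fifth queue


class QueueManager:
    """Several FIFO queues sharing one memory, each a linked list of slots."""

    def __init__(self, num_slots):
        # num_slots is assumed to be at least 1.
        self.data = [None] * num_slots
        self.next = list(range(1, num_slots)) + [NIL]
        self.head = [NIL] * (NUM_LINKS + 1)
        self.tail = [NIL] * (NUM_LINKS + 1)
        # Initially every slot is chained on the free list.
        self.head[FREE] = 0
        self.tail[FREE] = num_slots - 1

    def _pop(self, q):
        slot = self.head[q]
        if slot == NIL:
            return NIL
        self.head[q] = self.next[slot]
        if self.head[q] == NIL:
            self.tail[q] = NIL
        return slot

    def _push(self, q, slot):
        self.next[slot] = NIL
        if self.tail[q] == NIL:
            self.head[q] = slot
        else:
            self.next[self.tail[q]] = slot
        self.tail[q] = slot

    def enqueue(self, q, data):
        """Take a slot from the free list, fill it, append it to queue q."""
        slot = self._pop(FREE)
        if slot == NIL:
            return False          # shared memory is full
        self.data[slot] = data
        self._push(q, slot)
        return True

    def dequeue(self, q):
        """Remove the oldest entry of queue q and recycle its slot."""
        slot = self._pop(q)
        if slot == NIL:
            return None           # queue q is empty
        data = self.data[slot]
        self._push(FREE, slot)
        return data
```

Because all queues draw slots from the same free list, a busy output link can use most of the memory while idle links use none, which is the usual motivation for sharing one memory among several FIFOs.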

12/13/99 Page 11 Queue Manager (2)
2 cycles per operation
–Head/Tail Read
–MEM, WB Head/Tail
Pipelining: 1 op/cycle
–Problem: Complexity due to data hazards.
–Solution: Do not allow 2 consecutive ops of the same kind to avoid most hazards.
–Timing
»Enqueue: Write 64 bits
»Dequeue: Read 64 bits -> Port 0
»Enqueue: Write 64 bits
»Dequeue: Read 64 bits -> Port 1
»Enqueue: Write 64 bits
»Dequeue: Read 64 bits -> Port 2
»Enqueue: Write 64 bits
»Dequeue: Read 64 bits -> Port 3
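The timing pattern on this slide alternates enqueues and dequeues and rotates the dequeues over the four ports. A minimal sketch that generates the same slot sequence is shown below. The function names and the cycle numbering (starting at cycle 0 with an enqueue) are assumptions, and the model only reproduces the slot order, not the pipeline hardware.

```python
NUM_PORTS = 4  # output ports 0..3, as listed in the timing on the slide


def schedule(num_cycles):
    """Return the operation issued in each cycle.

    Even cycles issue an enqueue (64-bit write), odd cycles issue a
    dequeue (64-bit read) for the next port in round-robin order.
    """
    ops = []
    for cycle in range(num_cycles):
        if cycle % 2 == 0:
            ops.append(("enqueue", None))
        else:
            ops.append(("dequeue", (cycle // 2) % NUM_PORTS))
    return ops


def no_back_to_back(ops):
    """Check the slide's rule: no two consecutive ops of the same kind."""
    return all(a[0] != b[0] for a, b in zip(ops, ops[1:]))
```

Under this schedule the pattern repeats every 8 cycles, so each port is served by one dequeue per 8 cycles.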