Network-on-FPGA
Aleksander Ślusarczyk

Network-on-FPGA
Network
– topologies
– routing
Data processor
– mMIPS
– network interface
[Figure: node with uP (microprocessor), Mem IF (memory interface) and NI (network interface)]

Network
Easy to implement
Easy to use
– no software assistance required
– reliable
– no scheduling/routing to be done by the user

Dally’s network
Torus topology
E-cube routing
Unidirectional links
– deadlock-free (2 virtual channels per link)

Router

Sub-router
[Figure: sub-router diagram with labels H, 16b, D and T]

Dally’s network
Guaranteed delivery, deadlock-free
– no software required, reliable out of the box
Fixed route
– congestion avoidance and load balancing are impossible
– no timing guarantees

Topologies - Mesh
Bidirectional links (doubling the number of connections)
Asymmetric at the edges

Topologies - Tree
Only one route between any two nodes
Bidirectional links
Top-level nodes overloaded

Routing
E-cube
Interval
– a range of addresses is assigned to each output port
– deadlock-free labellings exist for many topologies
[Figure: example interval labelling; port intervals [1,1], [2,5], [1,2], [3,5], [1,2], [3,5], [1,4], [4,5]]

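Route tables
[Figure: router with inputs I1, I2, I3 and outputs O1, O2, O3, and a slot table (time slot t versus output o) connecting I1 in slot t1, I2 in slot t2 and I1 in slot t3]
Compile-time fixed
Scheduling required
Contention-free
Guaranteed timing
Time slots
– in a time slot, one connection is active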

Routing - Dynamic
Header contains routing information
– e.g. street signs: “goto x, turn left, goto y, turn right, … ”
– determined by the user application or the Network Interface (e.g. from a routing table)
Intermediate router determines the best route

Data processor
Starting point: mMIPS, developed for OGO
– pipelined
– 28 instructions
– separate data/instruction memories
– synthesizable SystemC

Network interfacing
Memory-mapped network device
[Figure: mMIPS connected to IM, DM and NI; NI registers “Data: 0x” and “Ctl: 0x”, with the fields address, send, data_rdy and send_rdy]

Memory
Data and instruction cache
– currently: local main memory
– plan: network access to memory
[Figure: mMIPS with IM, DM and NI; extended node with I$, D$, MEMIF, RAM and NI+]

Implementation
mMIPS: 600 slices
Cache: 2 x 300 slices
Router: 500 slices
N.I.: 100 slices
Total: 1800 slices
Virtex: 15,000 slices
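Adding up the per-node figures gives a rough upper bound on the number of nodes that fit in the device. This assumes the 15,000 slices are the Virtex part's total capacity and that each node needs one mMIPS, two caches, one router and one NI; block RAM, clocking and any other glue logic are ignored, so the result is an upper bound only.

$$
S_{\text{node}} = 600 + 2 \times 300 + 500 + 100 = 1800 \ \text{slices}
$$

$$
N_{\max} = \left\lfloor \frac{15\,000}{1800} \right\rfloor = \lfloor 8.3 \rfloor = 8 \ \text{nodes}
$$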

Software
LCC compiler for mMIPS (Sander Stuijk)
Communication library (Mathijs Visser)
– C send/receive primitives (blocking/non-blocking)
– networked JPEG

Software for the Network-on-FPGA Mathijs Visser (student E) January 2004, version 1.0

Introduction
Goals:
– create a communications library for C
– improve the programmability of the mMips network
– create and test a multiprocessor application
– verify HW and SW correctness
Context: courses for twaio’s Network-on-Chip flagship

Overview
1. Current software tools
– the C compiler (lcc)
– C communications library
– the simulator (SystemC)
– simple C debugging library
2. Multiprocessor applications
– two examples
– design process & FPGA demonstration
3. Summary

C compiler (LCC)
Advantages
+ designed for retargetability
+ ported to the mMips by Sander Stuijk
+ different memory layouts supported without recompilation
Disadvantages
– ANSI/POSIX libraries not implemented
– no debugging information
– testing still in progress

mMips communication revisited
Memory-mapped communication
[Figure: 32-bit memory map from 0x0000 to the maximum physical address, with Status_word and Data_word]
Status_word
– request transmission of Data_word
– check whether Data_word is valid
– set the destination node address
Data_word
– contains received data
– location to write outgoing data to

C communications library
Goal: simplify inter-processor communication for the C programmer (= user)
Constraints
– time: design and test in around 40 hours
– interface: easy to use, encapsulates HW details
– ROM memory: should require less than 1 Kbyte
– adhere to a well-known standard

C communications library
Possible communication scheme: message passing
– blocking send and receive
– non-blocking send (= try) and receive (= peek)
Possible implementation:
– sc_send_word() and sc_receive_word(): send or receive exactly 4 bytes
– sc_send() and sc_receive(): send or receive any number of bytes
– retry count as an optional parameter

C communications library Advantages of Message Passing Directly supported by hardware  Small code base (meets memory constraints)  Easy to implement (meets time constraints) Forms basis for more complex protocols  Only two operations (meets constraints for simplicity)  Uses message passing (= a standard, as required)

Send and receive primitives
int sc_send(const int address, const void *data, const int size_in_bytes)
int sc_receive(void *data, const int size_in_bytes)
– address: relative address of the destination node
– data: pointer to the source/destination data
– size_in_bytes: number of bytes to send or receive
– return value: number of bytes actually sent or received

Simulator (SystemC)
System-level design tool
– C++ class libraries for hardware constructs, such as adders
– SystemC model of the mMips network (Alex)
– a standalone executable can be generated

Simulator (SystemC)
Important debugging tool
– VCD traces
– memory dumps (ROM & RAM)
– spy module:
  spy on the instruction pointer (IP) and communication
  watch reads/writes on specific addresses
  stop the simulation when the IP reaches a specific address
  additional options…

C library for debugging
Desirable because:
– LCC cannot generate debugging info
– no CRT/console, so no printf()

C library for debugging
Solution to the debugging problem?
– implements a printf() variant
– writes output to memory
– useful for both the simulator and the FPGA implementation
[Figure: FPGA memory map with boundaries 0x0000, 0x4000 and 0x8000; regions: instructions, reserved, program data and stack; “Output of printf() is stored here”]

Multiprocessor applications (for the mMips network)
– two examples
– design process & FPGA demonstration

Multiprocessor applications
Two applications were developed:
1. Multiprocessor JPEG decoder
2. “Gossip”: a small message circulates through the network
Both led to improvements in the compiler and the mMips
The “Gossip” application and design process will be demonstrated
Next slide: some words on the JPEG decoder

JPEG decoder
Input: JPEG image
Output: bitmap image
[Figure: 2x2 mMips network]

JPEG decoder
Not finished yet…
– large: about 500 lines of code
– limited debugging facilities
– long simulation times: 2 hours for a 16x16 image
– discovery of compiler or hardware issues

JPEG decoder
Finish the JPEG decoder, because…
– this complex algorithm is a good test case
– it is a good example of a realistic application

JPEG decoder mapped on 3 nodes
Phase 1:
* variable-length decoding
* zigzag scan
* dequantization
Phase 2:
* IDCT (inverse discrete cosine transform)
Phase 3:
* color conversion
* reordering
[Figure: 2x2 mMips network, one phase per node; the fourth node is unused]

Demonstration
Hardware
– network layout: 2-by-2 network (4 nodes)
– memory (per node): 16 Kbyte ROM, 16 Kbyte RAM
“Gossip” application (send a short message over the network)
[Figure: Node 0 (x0y0), Node 1 (x1y0), Node 2 (x0y1), Node 3 (x1y1)]
Message (18 bytes): “I know something!”

“Gossip”: from idea to hardware
[Figure: node memory holding program code, user data (from a file with user data, e.g. the node ID), and program data and stack]
1. Create the C program
– all nodes are identical except for their node ID
– node ID: pointer to an address in the user_data segment
2. Compilation
– compile one node (lcc)
– separate code and data using a shell script
– insert user_data

“Gossip”: from idea to hardware
3. Use the SystemC simulator to test & debug
4. Upload to and run in the FPGA

Summary
o C communications library (message passing) implemented & tested
o Test applications have led to improvements in the compiler, the debugging facilities and the hardware
o Future work:
  – a working JPEG decoder
  – improved debugging capabilities