1 Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Instructor: Evgeny Fiksman Students: Meir.


Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Instructor: Evgeny Fiksman. Students: Meir Cohen, Daniel Marcovitch. Spring 2009

Table of Contents
Introduction/definition
New HW modules
Testing and debug
Application
Performance
Summary/conclusions

In the previous semester…
Problem definition:
1. Implement a parallel processing system composed of several NoCs, each chip containing several sub-networks of processors. The PC is part of the network, connected via PCI.
2. Write an application that utilizes parallel processing.
3. Measure system performance.
In the previous semester we took the existing "router" and ported it to the Altera platform. We also prepared the system architecture and microarchitecture.

This semester we:
1. Implemented the HW modules needed for larger-scale routing: a 5th port on all routers/switches, the fabric router, the inter-chip (IC) gateway and the PC gateway.
2. Implemented asynchronous MPI commands (for both the Nios processors and the PC).
3. Wrote an example application that uses the 64 processors to solve a heat-transfer problem.
4. Measured system performance.

Putting it all together – a general view of the topology
1. Each local cluster (comm) has 4 processors.
2. Each chip has 4 clusters (comms).
3. The Gidel board has 4 chips, for a total of 64 processors.
4. The PC is also part of the network; switching between the 4 FPGAs is done in software, i.e. the PC forms a "virtual switch".
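The three-level hierarchy above gives 4 × 4 × 4 = 64 processors. The following is a minimal Python sketch of how a flat processor index could be mapped to a (chip, comm, rank) triple and back. The chip-major ordering and the function names are assumptions for illustration, not necessarily the numbering the project used.

```python
# Sizes taken from the topology slide: 4 chips x 4 comms x 4 ranks.
CHIPS = 4
COMMS_PER_CHIP = 4
RANKS_PER_COMM = 4
TOTAL = CHIPS * COMMS_PER_CHIP * RANKS_PER_COMM  # 64 processors


def to_coords(global_id: int) -> tuple[int, int, int]:
    """Split a flat processor index into (chip, comm, rank), chip-major (assumed order)."""
    if not 0 <= global_id < TOTAL:
        raise ValueError(f"processor index out of range: {global_id}")
    chip, rest = divmod(global_id, COMMS_PER_CHIP * RANKS_PER_COMM)
    comm, rank = divmod(rest, RANKS_PER_COMM)
    return chip, comm, rank


def to_global(chip: int, comm: int, rank: int) -> int:
    """Inverse of to_coords: rebuild the flat index from its three fields."""
    return (chip * COMMS_PER_CHIP + comm) * RANKS_PER_COMM + rank


if __name__ == "__main__":
    print(TOTAL)          # 64
    print(to_coords(37))  # (2, 1, 1)
```

With this ordering, processors that share a comm get consecutive indices, which keeps the most local traffic (same cluster, routed by rank) between neighbouring IDs.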

New HW modules (1) – Fabric router
In the "local router", forwarding is done by rank, i.e. rank = port.
In the "fabric router", a forwarding table is implemented.

Routing tables
Address fields: PC | CC (chip) | FF (fabric, i.e. comm) | LL (local, i.e. rank)
Local router: destinations in the same comm are routed by rank; destinations in other comms are sent to the 5th port.
Other routers: routing by comm/chip only. The (myComm, myChip) entry is used for PC routing.
Implemented using VHDL's "generate" statement to reuse existing modules. A hex file is created for each router and loaded into ROM via a parameter.
Grouping (i.e. sub-network prefixes) allows us to use a small routing table (only 8 entries).
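A minimal Python sketch of the two forwarding styles described above, assuming a hypothetical 7-bit address (1 PC bit, then 2 bits each for chip, comm and rank) read off the slide's "PC CC FF LL" label. The port numbers, the table keys and the split of the 8 entries (4 comms on this chip + 3 other chips + 1 PC) are one plausible grouping chosen for illustration, not the project's actual hex-file contents.

```python
# Hypothetical 7-bit address layout inferred from the slide's "PC CC FF LL" label:
#   bit 6 = PC flag, bits 5-4 = chip (CC), bits 3-2 = comm (FF), bits 1-0 = rank (LL)
FIFTH_PORT = 4  # assumed: ports 0-3 reach local Nios ranks, port 4 leads to the fabric router


def pack(pc: int, chip: int, comm: int, rank: int) -> int:
    return (pc << 6) | (chip << 4) | (comm << 2) | rank


def unpack(addr: int) -> tuple[int, int, int, int]:
    return (addr >> 6) & 1, (addr >> 4) & 3, (addr >> 2) & 3, addr & 3


def local_route(addr: int, my_chip: int, my_comm: int) -> int:
    """Local router: same comm -> port = rank; anything else -> 5th port."""
    pc, chip, comm, rank = unpack(addr)
    if not pc and chip == my_chip and comm == my_comm:
        return rank
    return FIFTH_PORT


def build_fabric_table(my_chip: int, ic_port: dict[int, int], pc_port: int) -> dict:
    """Fabric-router table keyed by prefix (hypothetical grouping):
    4 entries for the comms on this chip + 3 for other chips + 1 for the PC = 8."""
    table = {("comm", c): c for c in range(4)}  # assumed: comm c sits on fabric port c
    for chip, port in ic_port.items():
        if chip != my_chip:
            table[("chip", chip)] = port        # a whole remote chip is one prefix
    table[("pc",)] = pc_port
    return table


def fabric_route(addr: int, my_chip: int, table: dict) -> int:
    """Route by chip/comm prefix only; the rank field is ignored at this level."""
    pc, chip, comm, _rank = unpack(addr)
    if pc:
        return table[("pc",)]
    if chip == my_chip:
        return table[("comm", comm)]
    return table[("chip", chip)]
```

For example, `build_fabric_table(0, {1: 4, 2: 4, 3: 5}, pc_port=6)` yields an 8-entry table in which traffic for chips 1 and 2 leaves through a hypothetical neighbour-bus port 4 and traffic for chip 3 through a hypothetical main-bus port 5. Grouping by prefix is what keeps the table this small: no entry depends on the destination rank.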

New HW modules (2) – IC GW
"Primary"/"Secondary" indicates connectivity rather than implementation.
The inter-chip interface has increased latency, so we use buffers and credits to ensure there is no FIFO overrun.
The credit counter is initialized with the FIFO size (i.e. 32) as the initial number of credits.
Since the FIFO size is larger than the end-to-end latency, the block gives 100% throughput.
[Figure: local buffer, FIFO, remote buffer and credit counter (inc/dec), with local and remote credit-release signals]
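The credit mechanism above can be illustrated with a small cycle-level Python sketch. It assumes a single fixed one-way `latency` (a hypothetical parameter) for both data words and returned credits, so a credit comes back one round trip (2 × latency) after its word was sent; in credit-based flow control, full throughput requires the credit count (here the FIFO size) to cover that round trip. The sketch also shows the no-overrun property: remote FIFO occupancy can never exceed the initial credit count, even when the consumer is slow.

```python
from collections import deque


def simulate_credit_link(fifo_size: int, latency: int, cycles: int, drain_every: int = 1):
    """Cycle-level sketch of a credit-based link.

    fifo_size   -- depth of the remote FIFO, also the initial credit count
    latency     -- assumed one-way delay (cycles) for data words and for returned credits
    drain_every -- the remote side pops one word every `drain_every` cycles
    Returns (throughput in words/cycle, peak remote FIFO occupancy).
    """
    credits = fifo_size          # credit counter starts at the FIFO size
    data_in_flight = deque()     # arrival cycles of words travelling to the remote FIFO
    credits_in_flight = deque()  # arrival cycles of credits travelling back
    remote_fifo = 0
    sent = 0
    peak = 0
    for t in range(cycles):
        # Remote credit release arrives: increment the counter.
        while credits_in_flight and credits_in_flight[0] <= t:
            credits_in_flight.popleft()
            credits += 1
        # Data words land in the remote FIFO.
        while data_in_flight and data_in_flight[0] <= t:
            data_in_flight.popleft()
            remote_fifo += 1
        peak = max(peak, remote_fifo)
        # Remote consumer pops a word and sends its credit back.
        if remote_fifo and t % drain_every == 0:
            remote_fifo -= 1
            credits_in_flight.append(t + latency)
        # Sender pushes a word only while it holds a credit: decrement the counter.
        if credits > 0:
            credits -= 1
            data_in_flight.append(t + latency)
            sent += 1
    return sent / cycles, peak
```

With `simulate_credit_link(32, 8, 1000)` the sender never runs out of credits and the link carries one word per cycle (throughput 1.0). Shrinking the FIFO to 4 entries with the same latency leaves the sender idle most of the time, waiting for credits (about 0.25 words per cycle).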

New HW modules (2) – IC routing
The IC connectivity itself uses Gidel's fastest buses:
1. Neighbour buses between chips 1-2, 2-3
2. Main bus between chips 1-4
Both buses are wide enough to support bidirectional traffic. Interface: 32-bit data, ctrl, credit_release, push/pop (total: 35 bits × 2).
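The interface width above adds up as 32 + 1 + 1 + 1 = 35 signals per direction, or 70 for both directions. A minimal Python sketch of packing one direction's signals into a 35-bit word follows; the bit positions and function names are hypothetical, since the slide gives only the signal list and the total width.

```python
# Hypothetical bit assignment for one direction of the 35-bit IC interface:
#   bits 31..0 = data, bit 32 = ctrl, bit 33 = credit_release, bit 34 = push/pop
DATA_BITS = 32
CTRL_BIT = 32
CREDIT_BIT = 33
PUSH_BIT = 34
WIDTH = 35


def pack_ic_word(data: int, ctrl: bool, credit_release: bool, push: bool) -> int:
    """Combine the data bus and the three control signals into one 35-bit value."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 32 bits")
    return (data
            | (int(ctrl) << CTRL_BIT)
            | (int(credit_release) << CREDIT_BIT)
            | (int(push) << PUSH_BIT))


def unpack_ic_word(word: int) -> tuple[int, bool, bool, bool]:
    """Split a 35-bit value back into (data, ctrl, credit_release, push)."""
    return (word & ((1 << DATA_BITS) - 1),
            bool((word >> CTRL_BIT) & 1),
            bool((word >> CREDIT_BIT) & 1),
            bool((word >> PUSH_BIT) & 1))
```

Carrying credit_release alongside the data lets each direction return credits for the opposite direction's traffic without extra wires.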

New HW modules (3) – PC GW
[Figure: ToPC GW and FromPC GW]
The PC gateway is needed for three reasons:
1. FromPCGw adds the start/finish "ctrl" signal (it parses the MPI header for the "size" field).
2. It handles PCI idiosyncrasies (minimum message length).
3. It uses Gidel's simple (req/ack) FIFO protocol rather than Altera's FIFO protocol (push/pop).
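Reasons 1 and 2 above fit together: a short message may be padded to the PCI minimum length, so the gateway needs the header's size field to find the real end of the message and mark it. A minimal Python sketch follows. The header layout (size in the low 16 bits of word 0, counted in payload words), the value of `MIN_PCI_WORDS` and both function names are hypothetical assumptions, not the project's MPI header format.

```python
# Hypothetical framing for the PC gateway. Assumptions (not from the slides):
#   - word 0 of an MPI message is its header, and the low 16 bits of the
#     header hold the payload size in 32-bit words;
#   - the PCI side must transfer at least MIN_PCI_WORDS words, so short
#     messages are padded with zeros.
MIN_PCI_WORDS = 4


def pad_for_pci(message: list[int]) -> list[int]:
    """Pad a message up to the PCI minimum transfer length."""
    return message + [0] * max(0, MIN_PCI_WORDS - len(message))


def frame_from_pc(transfer: list[int]) -> list[tuple[int, int]]:
    """FromPC direction: use the header's size field to drop padding and tag
    each word with a ctrl bit that marks the start and finish of the message."""
    size = transfer[0] & 0xFFFF
    message = transfer[: 1 + size]  # header + payload, padding discarded
    if len(message) < 1 + size:
        raise ValueError("transfer shorter than the size field claims")
    last = len(message) - 1
    return [(w, 1 if i in (0, last) else 0) for i, w in enumerate(message)]
```

For example, a one-word payload `[1, 0xAB]` travels over PCI as `[1, 0xAB, 0, 0]`, and `frame_from_pc` turns it back into `[(1, 1), (0xAB, 1)]`, with ctrl set on both the first and last word.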

Testing and debug
Since the project is multi-layered, debugging can be split into several types: HW (component) issues, connectivity, and SW (Nios/PC).
Component testing: small testbenches, each encompassing a single block.
Connectivity: before running the main application, we ran a connectivity application to check that all Nios processors can communicate with each other. We also wrote a Specman-E simulation that emulates the router's operation while loading and parsing the real hex files.

Testing and debug – SW/Nios
ModelSim was used for logical simulation.
Since the system was large and debugging was difficult and multi-layered (debugging an application running on Nios), we added special debug registers. Each Nios writes to these registers (PIO – parallel I/O) during the application run, publishing its "state". In addition, debug registers were attached to the main FIFOs to indicate traffic flow (performance counters).
[Figure: PIO state registers and FIFO counters]
When running on the chip itself, these registers are sampled and displayed during the application run to indicate the system state.

Application
A parallel Jacobi algorithm for approximating the solution of the heat-transfer equation:
1. Distribute the matrix among the CPUs.
2. CPUs communicate with their neighbors.
3. Computation and communication are overlapped.
4. The run is managed by the host PC.
Each iteration: compute the interior while sending/receiving the boundary, then compute the boundary.
[Figure: matrix distribution]
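The iteration structure above can be sketched in Python as a row-strip decomposition simulated serially. The sketch assumes a 5-point Jacobi stencil with fixed boundary rows and columns, and it models "send/receive boundary" as copying each neighbour's edge row into a halo. On the real system this exchange would use the asynchronous MPI commands, which is what allows the interior rows (which need no neighbour data) to be computed while the exchange is in flight. The function names are illustrative.

```python
def jacobi_serial(grid, iters):
    """Reference Jacobi sweep; the outer rows and columns are fixed boundary values."""
    g = [row[:] for row in grid]
    n, m = len(g), len(g[0])
    for _ in range(iters):
        new = [row[:] for row in g]
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                new[i][j] = 0.25 * (g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1])
        g = new
    return g


def jacobi_distributed(grid, iters, nworkers):
    """Row strips, one per worker. Each iteration: (1) exchange edge rows with
    neighbours, (2) compute interior rows (no neighbour data needed, so this can
    overlap the exchange), (3) compute boundary rows using the received halos."""
    n, m = len(grid), len(grid[0])
    if not 1 <= nworkers <= n:
        raise ValueError("need between 1 and len(grid) workers")
    base, extra = divmod(n, nworkers)
    bounds, lo = [], 0
    for w in range(nworkers):
        hi = lo + base + (1 if w < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    blocks = [[row[:] for row in grid[lo:hi]] for lo, hi in bounds]

    for _ in range(iters):
        # (1) send/receive boundary: copies of the neighbours' edge rows.
        halos = [(blocks[w - 1][-1][:] if w > 0 else None,
                  blocks[w + 1][0][:] if w < nworkers - 1 else None)
                 for w in range(nworkers)]
        new_blocks = []
        for w, (lo, _hi) in enumerate(bounds):
            block, (up, down) = blocks[w], halos[w]
            new = [row[:] for row in block]

            def row_at(k):
                # Local row k; -1 and len(block) map to the halo rows.
                if k == -1:
                    return up
                if k == len(block):
                    return down
                return block[k]

            def update(k):
                if lo + k in (0, n - 1):
                    return  # global top/bottom rows are fixed boundary values
                above, cur, below = row_at(k - 1), block[k], row_at(k + 1)
                for j in range(1, m - 1):
                    new[k][j] = 0.25 * (above[j] + below[j] + cur[j - 1] + cur[j + 1])

            for k in range(1, len(block) - 1):     # (2) interior rows
                update(k)
            for k in sorted({0, len(block) - 1}):  # (3) boundary rows
                update(k)
            new_blocks.append(new)
        blocks = new_blocks
    return [row for block in blocks for row in block]
```

Because every worker reads only old values (its own block plus the halos captured before any update), the distributed result matches the serial sweep exactly, whatever the number of strips.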

Performance – application time vs. number of iterations
Measurements were done on a dual-core Pentium processor running at 2.4 GHz.
The constant offset indicates PCI latency.
The running length is #Iterations × (communication + calculation).
The result is linear, as expected: time = #Iterations × (communication + calculation) + PCI offset.
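Given the linear relation above, the per-iteration cost and the constant PCI offset can be separated by fitting a straight line to (iterations, time) measurements. Below is a minimal ordinary-least-squares sketch in Python. The function name is illustrative, and no measured values from the project are included.

```python
def fit_time_model(iterations: list[float], times: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of time = slope * iterations + offset.

    slope  ~ per-iteration cost (communication + calculation)
    offset ~ constant overhead (attributed on the slide to PCI latency)
    """
    n = len(iterations)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (iterations, time) pairs")
    mean_x = sum(iterations) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in iterations)
    if sxx == 0:
        raise ValueError("iteration counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iterations, times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The slope is the quantity an optimization of communication or calculation would change. The offset is paid once per run, so it matters only for short runs.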

Performance – throughput vs. injection rate
At low injection rates, routing isn't a bottleneck, so the output rate is almost identical to the input rate.
As the injection rate increases, the router becomes the bottleneck.
Once the router's maximum throughput is reached, the throughput stays constant.

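Performance – simplified model – delay vs. congestion
D(p) – delay as a function of p, the number of packets in the system
R – average router delay
L – system latency
λ – injection rate
$D(p) = R \cdot p + L$
$p = \lambda \cdot D(p)$ (Little's law)
Substituting the second equation into the first and solving for the delay:
$D = \dfrac{L}{1 - \lambda R}, \qquad p = \lambda D = \dfrac{\lambda L}{1 - \lambda R} \qquad (\lambda R < 1)$
Model parameters: R = 50, L = 80.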
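The model above can be evaluated directly. Solving D = R·p + L together with Little's law p = λ·D gives D = L / (1 − λR), which grows without bound as λ approaches 1/R (0.02 with R = 50). The Python sketch below uses the slide's R and L values. The optional `p_max` cap is a hypothetical parameter that mirrors the finite-FIFO saturation described on the following slides: once the number of packets is capped, D = R·p + L cannot exceed R·p_max + L.

```python
from typing import Optional

R = 50  # average router delay (value from the slide)
L = 80  # system latency (value from the slide)


def model_delay(lam: float, r: float = R, l: float = L,
                p_max: Optional[float] = None) -> float:
    """Delay from D = r*p + l and Little's law p = lam*D, i.e. D = l / (1 - lam*r).

    p_max (hypothetical) caps the number of packets in the system, as finite
    FIFOs do; with the cap, the delay never exceeds r*p_max + l.
    """
    if lam < 0:
        raise ValueError("injection rate must be non-negative")
    cap = None if p_max is None else r * p_max + l
    if lam * r >= 1:
        if cap is None:
            raise ValueError("lam * r >= 1: the uncapped model has no steady state")
        return cap
    d = l / (1 - lam * r)
    return d if cap is None else min(d, cap)


def packets_in_system(lam: float, r: float = R, l: float = L) -> float:
    """p = lam * D = lam * l / (1 - lam * r), valid for lam * r < 1."""
    return lam * model_delay(lam, r, l)
```

With R = 50 and L = 80, the delay doubles from 80 at λ = 0 to 160 at λ = 0.01, where the model holds 1.6 packets in the system on average.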

Performance – packet delay vs. number of injection stubs
With few injecting stubs there is almost no congestion, so the delay is constant.
As we approach the maximum throughput, congestion increases and the delay increases.
At very high injection rates we approach system saturation: since the FIFO sizes are finite (32 entries), there is a maximum number of packets in the system at any given moment.

Performance – packet delay vs. injection rate
At low injection rates there is almost no congestion, so the delay is constant.
We again see a steep increase, which levels off due to system saturation.

Summary/conclusions:
1. The original router was robust and easily expanded to support a 5th port and routing tables.
2. Debugging software written for this system posed a serious challenge and required a certain measure of innovation.
3. Despite being on-chip, communication between processors is still a significant factor. Therefore, overall system performance will improve as the communication/calculation ratio decreases.
4. For similar reasons, the network can be used better if locality between nodes is exploited.
Next steps:
1. Compare topologies (mesh / fat tree).
2. Develop software that automatically creates topologies out of building blocks.
3. Simplify the router and increase its throughput.

Questions