Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Instructor: Evgeny Fiksman. Students: Meir Cohen, Daniel Marcovitch.

Presentation transcript:

1 Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Instructor: Evgeny Fiksman. Students: Meir Cohen, Daniel Marcovitch. Winter 2009.

2 Table of Contents
- Project goals
- Previous router
- Our routers
- Software design
- Obstacles
- Testing
- Time tables

3 Project goals
- Implementing a parallel processing system which contains several NoCs, each chip containing several sub-networks of processors.
- Converting the existing router to support the Altera platform.
- Expanding the router to enable communication between similar sub-networks.
- Implementing a processor network which supports communication with the PC, enabling:
  - Use of the PC's CPU as part of the processing network.
  - Simple I/O between the PC and the rest of the processing network.

4 Top-level structure of the expanded network
- Each white square represents a single FPGA on the Gidel board.
- FPGA-FPGA and FPGA-PC routes go via designated routers (GW).
- The GWs' design and protocols are the same as those of the internal routers.

5 Router from previous project
Two main units:
- Permission Unit
- Port FSM
Features:
- Time-limited Round Robin arbiter
- Port-to-port and broadcasting
- Smart connectivity: R-R, R-Core
- Modular design

6 Permission process
- Round Robin arbiter: service order according to a loop counter.
- Check that DEST is not busy.
- Permit for a 'time slot'.
- If the port is not requesting, service the next requesting port.
- BUSY and LAST writing ports are saved.
- Check the message's COMM and direct it to the relevant port according to the table.
- Broadcast priority, to enable only one broadcast at a time.
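The permission steps above can be sketched in C as a software model. This is only an illustration, not the project's hardware description: the type `arbiter_t`, the function `arbiter_grant`, the field names and the port count of 4 are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 4   /* assumption: port count of the original router */

/* Hypothetical arbiter state; names are illustrative only. */
typedef struct {
    int  next;                   /* loop counter: first port to consider */
    bool requesting[NUM_PORTS];  /* port has a message waiting */
    int  dest[NUM_PORTS];        /* destination port of that message */
    bool busy[NUM_PORTS];        /* destination is currently being written */
} arbiter_t;

/* Scan the ports in round-robin order starting at the loop counter.
 * Grant the first requesting port whose destination is not busy,
 * mark that destination busy, and advance the loop counter.
 * Returns the granted port, or -1 if no port can be served. */
int arbiter_grant(arbiter_t *a)
{
    for (int i = 0; i < NUM_PORTS; i++) {
        int p = (a->next + i) % NUM_PORTS;
        if (!a->requesting[p])
            continue;                 /* not requesting: try next port */
        int d = a->dest[p];
        assert(d >= 0 && d < NUM_PORTS);
        if (a->busy[d])
            continue;                 /* DEST busy: try next port */
        a->busy[d] = true;            /* permit for one time slot */
        a->next = (p + 1) % NUM_PORTS;
        return p;
    }
    return -1;
}
```

In this sketch the caller is expected to clear `busy[d]` when the time slot expires; broadcast priority is left out to keep the example short.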

7 Our changes for the router
Changes:
- Fifth port
- Routing table
- Broadcast table
New router types:
- Local router (LR)
- Fabric router (FR)
- Primary/secondary interchip router (P/S-ICR)
- PC router (PCR)

8 Fifth port
[Figure: router ring with a 5th port]
Just adding another port module to the ring…

9 Routing
[Figure: address layout with fields labelled PC, CC, FF, LL; labels: local, fabric, chip; rank, comm]
Local router:
- Same comm: routing by rank.
- Other comms: to the 5th port.
Other routers: routing by comm only.
Result: smaller routing tables.
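The two routing rules can be sketched as follows. The slide's address figure does not survive in the transcript, so the 16-bit address, the 8-bit rank field, the table size and all identifiers (`local_router_t`, `upper_router_t`, `local_route`, `upper_route`) are assumptions chosen only to make the example concrete.

```c
#include <assert.h>
#include <stdint.h>

#define RANK_BITS   8    /* assumption: low bits hold the rank */
#define FIFTH_PORT  4    /* assumption: index of the added fifth port */
#define TABLE_SIZE 16    /* assumption: entries per routing table */

/* Hypothetical local router: knows its own comm, routes by rank. */
typedef struct {
    uint8_t own_comm;
    uint8_t port_by_rank[TABLE_SIZE];
} local_router_t;

/* Hypothetical higher-level router: routes by comm only. */
typedef struct {
    uint8_t port_by_comm[TABLE_SIZE];
} upper_router_t;

static uint8_t addr_comm(uint16_t addr)
{
    return (uint8_t)(addr >> RANK_BITS);
}

static uint8_t addr_rank(uint16_t addr)
{
    return (uint8_t)(addr & ((1u << RANK_BITS) - 1u));
}

/* Local router: same comm is routed by rank, any other comm
 * leaves through the fifth port. */
int local_route(const local_router_t *r, uint16_t addr)
{
    if (addr_comm(addr) != r->own_comm)
        return FIFTH_PORT;
    uint8_t rank = addr_rank(addr);
    assert(rank < TABLE_SIZE);
    return r->port_by_rank[rank];
}

/* Other routers: the rank is ignored, only the comm is looked up. */
int upper_route(const upper_router_t *r, uint16_t addr)
{
    uint8_t comm = addr_comm(addr);
    assert(comm < TABLE_SIZE);
    return r->port_by_comm[comm];
}
```

Because each router indexes its table by a single address field, every table holds one entry per rank or per comm rather than one per full address, which is where the smaller tables come from.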

10 Routing
Non-existing components to be added.

Broadcast table
- Broadcasting only to spanning tree branches.
- The table tags branch ports with a ‘1’ value.
- Connected to the “Port FSM” unit of each port.
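A broadcast table of this kind can be modelled as a bitmask with one bit per port. The sketch below is an illustration under assumptions: the function name `bcast_out_ports`, the five-port width, and the rule that a broadcast is not echoed back to the port it arrived on are not stated on the slide.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 5   /* assumption: four ring ports plus the fifth port */

/* bcast_table has bit p set to 1 when port p is a spanning-tree branch.
 * Returns the set of ports a broadcast arriving on in_port should be
 * forwarded to: every tagged branch except the incoming one
 * (excluding the incoming port is an assumption of this sketch). */
uint8_t bcast_out_ports(uint8_t bcast_table, int in_port)
{
    assert(in_port >= 0 && in_port < NUM_PORTS);
    uint8_t all_ports = (uint8_t)((1u << NUM_PORTS) - 1u);
    return (uint8_t)(bcast_table & all_ports & ~(1u << in_port));
}
```

For example, with branches on ports 1, 2 and 4 (table `0x16`), a broadcast arriving on port 1 is forwarded to ports 2 and 4 only.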

12 Software design
Software layers:
- Application layer: MPI functions interface.
- Network layer: hardware-independent implementation of these functions.
- Data layer: relies on command bit fields.
- Physical layer: designed for the FSL bus.
Adjustments:
- Adjust to conform with the Altera interface, using DMA transfers.
- Add asynchronous functions.
- Adjusted for the new comm size.

13 Message Passing Flow
[Figure: source buffer, network, auxiliary receive buffer (constant), destination buffer; DMA transfer]
Sending:
- MPI_Isend only adds a send request (destination, tag, buffer address, size) to the sending list.
- DMA sends the data asynchronously.
Receiving:
- MPI_Irecv only adds a receive request (source, tag, buffer address, size) to the receiving list.
- DMA receives the data asynchronously.
- Data is transferred into the buffer in the background.
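The idea that the non-blocking calls only enqueue a request, leaving the transfer to the DMA, can be sketched with a small fixed-size request list. This is not the project's MPI layer: `request_t`, `request_list_t`, `request_post`, `request_take` and the capacity of 8 are hypothetical names and values, and the DMA itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8   /* assumption: capacity of each request list */

/* One pending request, carrying the four fields listed on the slide. */
typedef struct {
    int    peer;   /* destination (send list) or source (receive list) */
    int    tag;
    void  *buf;    /* buffer address */
    size_t size;
} request_t;

/* Hypothetical FIFO of pending requests (sending or receiving list). */
typedef struct {
    request_t items[MAX_PENDING];
    size_t    head;
    size_t    count;
} request_list_t;

/* What an Isend/Irecv-style call would do in this sketch: record the
 * request and return at once. Returns 0 on success, -1 if the list is full. */
int request_post(request_list_t *l, int peer, int tag, void *buf, size_t size)
{
    assert(l != NULL);
    if (l->count == MAX_PENDING)
        return -1;
    size_t slot = (l->head + l->count) % MAX_PENDING;
    l->items[slot] = (request_t){ peer, tag, buf, size };
    l->count++;
    return 0;
}

/* What a DMA-completion handler would do: take the oldest request to
 * start the next transfer. Returns 0 on success, -1 if the list is empty. */
int request_take(request_list_t *l, request_t *out)
{
    assert(l != NULL && out != NULL);
    if (l->count == 0)
        return -1;
    *out = l->items[l->head];
    l->head = (l->head + 1) % MAX_PENDING;
    l->count--;
    return 0;
}
```

Keeping the posting call this cheap is what lets the processor continue computing while the transfer runs in the background.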

14 Obstacle 1 - Memory bottleneck
- Each Nios uses ~13Kb of on-chip memory.
- The FPGA has only ~70Kb of on-chip memory, so only 5 processors fit (70 / 13 ≈ 5.4).
Solutions:
- Off-chip memory: slow.
- Reducing the program footprint.
- Using a bigger FPGA for the whole network.

15 Obstacle 2 - Cache coherency
[Figure: memory and cache, showing a DMA buffer relative to a cache line]
Cache flush is necessary but not enough!
- Incoherency in unaligned cache lines.
Solutions:
- Not using the cache: makes the asynchronous system ineffective.
- Disabling the cache in the buffer area: the cache cannot be used after the DMA transfer.
- Aligning DMA buffers to cache lines (using memalign).
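The alignment fix can be sketched as below. The slide names `memalign`; this example swaps in the standard C11 `aligned_alloc` so that it compiles on any hosted toolchain. The 32-byte line size and the helper names `round_up_to_line` and `dma_buffer_alloc` are assumptions, and the real line size depends on how the processor's data cache is configured.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 32u   /* assumption: data cache line size in bytes */

/* Round a byte count up to a whole number of cache lines, so the
 * buffer's last line is not shared with unrelated data. */
size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE;
}

/* Allocate a DMA buffer that starts on a cache-line boundary and
 * occupies only whole cache lines. Returns NULL on failure.
 * The caller releases it with free(). */
void *dma_buffer_alloc(size_t size)
{
    assert(size > 0);
    return aligned_alloc(CACHE_LINE, round_up_to_line(size));
}
```

With both the start and the length aligned, flushing or invalidating the buffer's cache lines cannot disturb neighbouring variables, which is the incoherency the slide attributes to unaligned lines.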

16 Local router testing
[Figure: local router, NiosII processors with PIOs, simple FIFOs, and the testing program on the PC; * marks a PIO to FIFO connector]
- The PIOs output debug information, data sent/received and results.
- The test program prints the PIO data on screen.
- In simulation, the PIOs can be read directly from the waveform.

17 Application
Multiple matrix multiplication.

19 Questions