Published by Shon Fields. Modified over 9 years ago.

Technion – Israel Institute of Technology
Department of Electrical Engineering
High Speed Digital Systems Lab
Instructor: Evgeny Fiksman
Students: Meir Cohen, Daniel Marcovitch
Winter 2009

Table of Contents
- Project goals
- Previous router
- Our routers
- Software design
- Obstacles
- Testing
- Time tables

Project goals
- Implement a parallel processing system that contains several NoCs, each chip containing several sub-networks of processors.
- Convert the existing router to support the Altera platform.
- Expand the router to enable communication between similar sub-networks.
- Implement a processor network that supports communication with the PC, enabling:
  - use of the PC's CPU as part of the processing network;
  - simple I/O between the PC and the rest of the processing network.

Top-level structure of the expanded network
- Each white square represents a single FPGA on the Gidel board.
- FPGA-to-FPGA and FPGA-to-PC routes go via designated routers (GW).
- The GWs' design and protocols are the same as those of the internal routers.

Router from previous project
- Two main units: Permission Unit and Port FSM
- Time-limited round-robin arbiter
- Port-to-port transfers and broadcasting
- Smart connectivity: R-R and R-Core
- Modular design

Permission process
- Round-robin arbiter: ports are serviced in the order given by a loop counter.
- Check that DEST is not busy.
- Grant permission for one 'time slot'.
- If the current port is not requesting, service the next requesting port.
- The BUSY and LAST writing ports are saved.
- Check each message's COMM and direct the message to the relevant port according to the table.
- Broadcast priority ensures that only one broadcast is in progress at a time.

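The permission steps listed here can be sketched in C as a small software model. This is only an illustration under stated assumptions, not the project's hardware design: the five-port count, the `arbiter_t` type and the function names are hypothetical, and the time-slot timer is reduced to an explicit release call.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 5   /* assumption: port count is illustrative */

/* Hypothetical arbiter state; field names are not from the original design. */
typedef struct {
    int  counter;           /* loop counter: port considered first     */
    bool busy[NUM_PORTS];   /* true while a port is being written to   */
} arbiter_t;

/*
 * Pick the next port to serve, starting at the loop counter.
 * request[p] is true if port p wants to send; dest[p] is its destination.
 * Returns the granted port, or -1 if no request can be served right now.
 */
int arbiter_grant(arbiter_t *a, const bool request[NUM_PORTS],
                  const int dest[NUM_PORTS])
{
    for (int i = 0; i < NUM_PORTS; i++) {
        int p = (a->counter + i) % NUM_PORTS;
        if (!request[p])
            continue;                       /* not requesting: try next port */
        assert(dest[p] >= 0 && dest[p] < NUM_PORTS);
        if (a->busy[dest[p]])
            continue;                       /* DEST is busy: cannot permit   */
        a->busy[dest[p]] = true;            /* reserve DEST for a time slot  */
        a->counter = (p + 1) % NUM_PORTS;   /* resume after the granted port */
        return p;
    }
    return -1;
}

/* Called when the time slot expires or the transfer ends. */
void arbiter_release(arbiter_t *a, int dest_port)
{
    assert(dest_port >= 0 && dest_port < NUM_PORTS);
    a->busy[dest_port] = false;
}
```

Advancing the counter past the granted port is what keeps one requester from being served twice in a row while others are waiting.
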
Our changes for the router
Changes:
- Fifth port
- Routing table
- Broadcast table
New router types:
- Local router (LR)
- Fabric router (FR)
- Primary/secondary interchip router (P/S-ICR)
- PC router (PCR)

Fifth port
- Just adding another port module to the ring.

Routing
[Figure: address format. Recoverable labels: PC, CC, FF, LL, local, fabric, chip, rank, comm.]
- Local router: messages with the same comm are routed by rank; messages for other comms go to the 5th port.
- Other routers: routing by comm only.
- Result: smaller routing tables.

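The local router's decision can be sketched as follows. This is a minimal model, assuming that the comm identifies a sub-network and the rank a processor within it; the field widths, the `address_t` type, the port numbering and the function name are hypothetical and are not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define FIFTH_PORT 4   /* assumption: ports 0-3 are local, port 4 is the fifth port */

/* Hypothetical address split; field widths are illustrative only. */
typedef struct {
    uint16_t comm;  /* identifies the sub-network                  */
    uint8_t  rank;  /* identifies the processor inside that comm   */
} address_t;

/*
 * Local router: a destination in the router's own comm is routed by rank
 * through a small per-rank table; anything else leaves via the fifth port.
 */
int local_route(uint16_t my_comm, const uint8_t rank_to_port[],
                int num_ranks, address_t dst)
{
    if (dst.comm != my_comm)
        return FIFTH_PORT;          /* other comm: hand off upward */
    assert(dst.rank < num_ranks);
    return rank_to_port[dst.rank];  /* same comm: route by rank    */
}
```

In this model the local table needs only one entry per local rank, which is one way to read the "smaller routing tables" result.
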
Routing
- Non-existing components are to be added.

Broadcast table
- Broadcasts are forwarded only along spanning-tree branches.
- The table tags branch ports with the value '1', for example 01101.
- The table is connected to the "Port FSM" unit of each port.

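A broadcast table of this kind can be modelled as a bitmask with one bit per port. The sketch below is an assumption-laden illustration: the bit ordering (bit p for port p), the function name, and the choice to exclude the arrival port are not stated on the slide.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 5

/*
 * Return the set of ports a broadcast should be copied to.
 * Bit p of bcast_table is 1 when port p lies on a spanning-tree branch.
 * Excluding the arrival port is an assumption made here so the message
 * is not echoed back along the branch it came from.
 */
uint8_t broadcast_targets(uint8_t bcast_table, int in_port)
{
    assert(in_port >= 0 && in_port < NUM_PORTS);
    uint8_t port_mask = (uint8_t)((1u << NUM_PORTS) - 1u);
    return (uint8_t)(bcast_table & port_mask & ~(1u << in_port));
}
```

With the table value 01101 (0x0D), a broadcast arriving on port 2 would be forwarded to ports 0 and 3 under these assumptions.
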
Software design
Software layers:
- Application layer: MPI functions interface.
- Network layer: hardware-independent implementation of these functions.
- Data layer: relies on command bit fields.
- Physical layer: designed for the FSL bus.
Changes:
- Adjust to conform with the Altera interface.
- Use DMA transfers.
- Add asynchronous functions.
- Adjust for the new comm size.

Message Passing Flow
Sending:
- MPI_Isend only adds a send request (destination, tag, buffer address, size) to the sending list.
- DMA sends the data asynchronously from the source buffer to the network.
Receiving:
- MPI_Irecv only adds a receive request (source, tag, buffer address, size) to the receiving list.
- DMA receives the data asynchronously into the auxiliary receive buffer (constant).
- The data is transferred into the destination buffer in the background.

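The split between posting a request and completing it later can be sketched in plain C. This is not the project's MPI implementation: `request_t`, `post_request` and `complete_receive` are hypothetical names, the list capacity is arbitrary, and a `memcpy` stands in for the background copy out of the auxiliary receive buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQUESTS 8   /* assumption: list capacity is illustrative */

/* One pending transfer, with the fields named on the slide. */
typedef struct {
    int    peer;     /* destination (send) or source (receive) */
    int    tag;
    void  *buffer;   /* buffer address */
    size_t size;
    int    done;     /* set once the transfer has completed */
} request_t;

typedef struct {
    request_t items[MAX_REQUESTS];
    int       count;
} request_list_t;

/* Non-blocking post: only records the request and returns immediately. */
request_t *post_request(request_list_t *list, int peer, int tag,
                        void *buffer, size_t size)
{
    assert(list != NULL);
    if (list->count == MAX_REQUESTS)
        return NULL;                       /* list is full */
    request_t *r = &list->items[list->count++];
    r->peer = peer;
    r->tag = tag;
    r->buffer = buffer;
    r->size = size;
    r->done = 0;
    return r;
}

/*
 * Stand-in for the background step on the receive side: data that arrived
 * in the auxiliary buffer is copied to the first matching pending request.
 * Returns 1 if a request was completed, 0 if none matched.
 */
int complete_receive(request_list_t *list, int source, int tag,
                     const void *aux, size_t size)
{
    for (int i = 0; i < list->count; i++) {
        request_t *r = &list->items[i];
        if (!r->done && r->peer == source && r->tag == tag && size <= r->size) {
            memcpy(r->buffer, aux, size);
            r->done = 1;
            return 1;
        }
    }
    return 0;
}
```

The caller keeps the returned request handle and checks `done` later, which mirrors how a non-blocking call returns before the data has moved.
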
Obstacle 1 - Memory bottleneck
- Each Nios uses ~13Kb of on-chip memory.
- The FPGA has only ~70Kb of on-chip memory.
- Only 5 processors fit (70 / 13 ≈ 5.4).
Solutions:
- Off-chip memory: slow.
- Reducing the program footprint.
- Using a bigger FPGA for the whole network.

Obstacle 2 - Cache coherency
[Figure: DMA buffer and cache line, shown in memory and in cache.]
- A cache flush is necessary but not enough!
- Incoherency occurs when DMA buffers are not aligned to cache lines.
Solutions:
- Not using the cache: the asynchronous system is then not effective.
- Disabling the cache in the buffer area: the cache cannot be used after the DMA transfer.
- Aligning DMA buffers to cache lines (using memalign).

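The alignment fix can be sketched as an allocation helper. The slide names `memalign`; this sketch swaps in the C11 standard function `aligned_alloc` so that it stays portable, and the 32-byte line size, `round_up_to_line` and `dma_buffer_alloc` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 32u  /* assumption: data cache line size in bytes */

/* Round a size up to a whole number of cache lines. */
size_t round_up_to_line(size_t size)
{
    return (size + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE;
}

/*
 * Allocate a DMA buffer that starts on a cache-line boundary and covers
 * whole lines, so no cache line is shared with unrelated data.
 * Returns NULL on failure or for a zero size. Release with free().
 */
void *dma_buffer_alloc(size_t size)
{
    if (size == 0)
        return NULL;
    void *p = aligned_alloc(CACHE_LINE, round_up_to_line(size));
    assert(p == NULL || (uintptr_t)p % CACHE_LINE == 0);
    return p;
}
```

Padding the size to whole lines matters as much as aligning the start: otherwise the last line of the buffer could still hold neighbouring variables. The cache flush around each DMA transfer is still required, as the slide notes.
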
Local router testing
[Figure: test setup. Labels: local router, NiosII (x4), PIO (x4), simple FIFO, PC, testing program; * = PIO-to-FIFO connector.]
- The PIOs output debug information, data sent/received, and results.
- The test program prints the PIO data on screen.
- In simulation, the PIOs can be read directly from the waveform.

Application
- Multiple matrix multiplication.

Questions