Network On Chip Cache Coherency. Final presentation, Part A. Students: Zemer Tzach, Kalifon Ethan. Instructor: Walter Isaschar. Spring 2008.
Agenda: Project’s general concepts. Design architecture of the router. Implementation of the router (using HDL Designer). Router’s simulations. Demonstration of our NoC.
General Background: Modern CPUs are built as chip multiprocessors (CMPs), i.e. multi-core processors. Improved performance is achieved through distribution and parallelism. Cores interact over a NoC (Network on Chip).
[Figure: NoC’s General Diagram]
NoC’s Characteristics: Wormhole routing. Packets follow X-Y routing. Units can communicate simultaneously. Reduced power consumption. Scalability.
Cache Coherency: By definition, all CMP cores use only up-to-date data. Originally, cache coherency in a CMP was achieved by using a central memory control unit.
[Figure: Cache Coherency Protocol Nowadays]
Problem Description: Prior cache coherency protocols are not applicable, because a NoC has no central unit. Adding such a unit would damage both the NoC’s scalability and its parallelism.
Solution Requirements: Must not affect the NoC’s main characteristics (e.g. scalability). Avoid hot spots and bottlenecks. Minimize use of the NoC’s resources.
Solution: Distribute memory control among a number of units, each responsible for part of the memory space. Place the control units as part of the NoC.
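One way to picture "memory control distributed according to memory spaces" is a function that maps an address to the control unit owning it. The sketch below is only an illustration: the slides do not say how the address space is partitioned, so the equal contiguous ranges, the name `home_unit`, the default of 4 units, and the 16-bit address width are all assumptions.

```python
def home_unit(address: int, num_units: int = 4, address_bits: int = 16) -> int:
    """Return the index of the (hypothetical) control unit that owns `address`.

    Assumes the memory space is split into equal contiguous ranges,
    one per control unit; the slides do not specify the real scheme.
    """
    size = 1 << address_bits
    if not 1 <= num_units <= size:
        raise ValueError("num_units must be between 1 and the memory size")
    if not 0 <= address < size:
        raise ValueError("address outside the modelled memory space")
    region = size // num_units  # size of each contiguous range
    # Clamp so any remainder (size not divisible by num_units) goes to the last unit.
    return min(address // region, num_units - 1)


if __name__ == "__main__":
    for addr in (0x0000, 0x4000, 0xFFFF):
        print(hex(addr), "-> control unit", home_unit(addr))
```

Because every core can compute the owner locally, a request travels straight to the responsible unit and no single central controller sees all traffic.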
[Figure: Solution Diagram]
Project’s Goals: Primary goal: design and implement a cache coherency protocol for a CMP. Implement the NoC (including the NoC’s router). Assemble a CMP based on the NoC.
NoC Packet’s Structure: A packet is divided into flits. There are four flit types: Start, Body, End and Idle.
Flit’s Structure: A flit contains two fields: Data and Type.
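The packet and flit structure can be sketched in a few lines of Python. This is a minimal model, not the project's HDL: the numeric encoding of the Type field, the idea that the Start flit carries the destination, and the helper name `packetize` are assumptions made for illustration. The 8-bit Data field follows the data bus width given later in the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FlitType(Enum):
    # The four flit types named on the slides; the numeric values are assumed.
    IDLE = 0
    START = 1
    BODY = 2
    END = 3


@dataclass(frozen=True)
class Flit:
    data: int        # 8-bit Data field
    type: FlitType   # Type field


def packetize(dest: int, payload: bytes) -> List[Flit]:
    """Split a payload into flits (hypothetical helper).

    Assumption: the START flit carries the destination, BODY flits carry
    payload bytes, and the END flit carries the last payload byte.
    """
    if not payload:
        raise ValueError("payload must contain at least one byte")
    if not 0 <= dest <= 0xFF:
        raise ValueError("destination must fit in the 8-bit Data field")
    flits = [Flit(dest, FlitType.START)]
    flits += [Flit(b, FlitType.BODY) for b in payload[:-1]]
    flits.append(Flit(payload[-1], FlitType.END))
    return flits


if __name__ == "__main__":
    for flit in packetize(5, b"\x01\x02\x03"):
        print(flit.type.name, flit.data)
```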
5-Port Router: Directs packets according to X-Y routing. 5 ports: North, East, West, South and Processing Unit. Processing units use the network’s communication protocol. 2 Virtual Channels (VCs) per port.
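The routing decision made by each router can be sketched as follows. X-Y routing moves a packet along the X axis until the column matches, then along the Y axis. The coordinate convention (x grows toward East, y grows toward North), the port names as strings, and the function names are assumptions for illustration; the slides do not give the project's actual port encoding.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the grid


def xy_next_port(current: Coord, dest: Coord) -> str:
    """Pick the output port for one hop: correct X first, then Y."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "PU"  # arrived: deliver to the local Processing Unit


def xy_path(src: Coord, dest: Coord) -> List[str]:
    """List the output port chosen at every router from src to dest."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    ports = []
    pos = src
    while True:
        port = xy_next_port(pos, dest)
        ports.append(port)
        if port == "PU":
            return ports
        pos = (pos[0] + step[port][0], pos[1] + step[port][1])


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Since the decision depends only on the current and destination coordinates, every router can route independently, with no shared routing table.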
[Figure: 5-Port Router Structure]
Input Port: Receives flits from a router or from a processing unit. Analyzes and saves the current packet’s direction. Switches between Virtual Channels.
[Figure: Input Port Structure]
Output Port: Transmits flits to a router or to a processing unit. Each Virtual Channel saves the currently serviced input port (CSIP). Switches between Virtual Channels.
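The role of the CSIP can be illustrated with a small behavioural model. In wormhole routing a channel stays bound to one packet from its Start flit to its End flit, so each Virtual Channel remembers which input port it is serving. The class names, the string flit types, and the "first free VC" allocation policy below are assumptions; the sketch also assumes at most one in-flight packet per input port at a given output port.

```python
from typing import List, Optional


class OutputVC:
    """One virtual channel of an output port (behavioural sketch)."""

    def __init__(self) -> None:
        self.csip: Optional[int] = None  # currently serviced input port

    def accept(self, input_port: int, flit_type: str) -> bool:
        """Return True if this VC takes the flit."""
        if flit_type == "IDLE":
            return False
        if self.csip is None:
            if flit_type != "START":
                return False          # only a START flit may claim a free VC
            self.csip = input_port    # bind the VC to this input port
            return True
        if self.csip != input_port or flit_type == "START":
            return False              # VC is busy with another packet
        if flit_type == "END":
            self.csip = None          # packet finished: release the VC
        return True


class OutputPort:
    """Output port with 2 VCs, as on the slides; allocation policy is assumed."""

    def __init__(self, num_vcs: int = 2) -> None:
        self.vcs: List[OutputVC] = [OutputVC() for _ in range(num_vcs)]

    def accept(self, input_port: int, flit_type: str) -> Optional[int]:
        """Return the index of the VC that took the flit, or None if blocked."""
        for index, vc in enumerate(self.vcs):
            if vc.accept(input_port, flit_type):
                return index
        return None


if __name__ == "__main__":
    port = OutputPort()
    print(port.accept(0, "START"), port.accept(3, "START"), port.accept(2, "START"))
```

With two VCs, a second packet heading to the same output can claim the free channel instead of waiting behind the first one, while a third packet is held back until a VC is released.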
[Figure: Output Port Structure]
Cross Bar: Transfers flits from an input port to the matching output port. Consists of 5 controllers, one for every output port.
[Figure: Cross Bar Structure]
Network’s Characteristics: The data bus is 8 bits wide. Each port buffer holds at most 4 flits. The NoC is composed of 9 routers, placed in a 3x3 grid.
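A port buffer with these dimensions behaves like a bounded FIFO. The sketch below models one in Python; the class name `FlitBuffer` and the choice to refuse a flit when the buffer is full (back-pressure) are assumptions, since the slides state only the capacity of 4 flits.

```python
from collections import deque
from typing import Deque, Optional


class FlitBuffer:
    """Bounded FIFO holding at most `capacity` flits (behavioural sketch)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._queue: Deque[int] = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.capacity

    def push(self, flit: int) -> bool:
        """Store a flit; return False (assumed back-pressure) when full."""
        if self.is_full():
            return False
        self._queue.append(flit)
        return True

    def pop(self) -> Optional[int]:
        """Remove and return the oldest flit, or None when empty."""
        return self._queue.popleft() if self._queue else None


if __name__ == "__main__":
    buf = FlitBuffer()
    print([buf.push(value) for value in range(5)])  # the fifth push is refused
    print(buf.pop())
```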
[Figure: Input’s VC Implementation]
[Figure: Output’s VC Implementation]
[Figure: Input Port Implementation]
[Figure: Output Port Implementation]
[Figure: Output’s Controller Implementation]
[Figure: Crossbar_Mux Implementation]
[Figure: Cross Bar Implementation]
[Figure: Router Implementation]
Synthesis Parameters
Network’s Performance: Router latency is 2 cycles. Router throughput is 1 flit per cycle. The system clock frequency is 100 MHz. Packets can be routed simultaneously. Packets can bypass each other.
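These figures can be combined into rough back-of-the-envelope numbers. The sketch below uses the 8-bit data bus, the 2-cycle router latency, the 1 flit per cycle throughput and the 100 MHz clock from the slides. The latency formula is an assumption: it is the usual zero-load estimate for a pipelined wormhole network (no contention, link delay ignored), not a measured result of this project.

```python
CLOCK_HZ = 100_000_000        # system clock, from the slides
DATA_BITS = 8                 # Data field width per flit, from the slides
ROUTER_LATENCY_CYCLES = 2     # per-router latency, from the slides
FLITS_PER_CYCLE = 1           # router throughput, from the slides


def link_payload_bandwidth_bits_per_s() -> int:
    """Peak payload bandwidth of one link (Type bits not counted)."""
    return DATA_BITS * FLITS_PER_CYCLE * CLOCK_HZ


def zero_load_latency_cycles(routers: int, flits: int) -> int:
    """Assumed estimate: the first flit pays the latency of every router
    on the path, and the remaining flits follow one per cycle."""
    return ROUTER_LATENCY_CYCLES * routers + (flits - 1)


def cycles_to_ns(cycles: int) -> float:
    return cycles * 1e9 / CLOCK_HZ


if __name__ == "__main__":
    print(link_payload_bandwidth_bits_per_s())   # bits per second
    # Corner to corner in a 3x3 grid crosses 5 routers; a 4-flit packet:
    cycles = zero_load_latency_cycles(routers=5, flits=4)
    print(cycles, "cycles,", cycles_to_ns(cycles), "ns")
```

Under these assumptions a link carries 800 Mbit/s of payload, and a 4-flit packet crossing 5 routers takes 13 cycles, i.e. 130 ns.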
Cross Transmit: Routing two packets simultaneously: port 0 to port 2 and port 3 to port 2.
Traffic Avoidance Using VCs: The first packet, from port 0 to port 1, gets blocked in the output port. A packet from port 3 to port 1 bypasses it.
[Figure: Demonstration Diagram]
General Description: Dummy units transmit packets. The destination is set by the switch buttons. Each dummy port starts transmitting according to its push-button.
Project Schedule (1st Semester): Familiarization with the design tools (3 weeks). Familiarization with the Virtex-II Pro FPGA, application and components (4 weeks). Design and implementation of the NoC’s router (5 weeks). Assembly of the CMP using our router implementation (2 weeks).
Project Schedule (2nd Semester): Assembly of the CMP using our router implementation (4 weeks). Design of the cache coherency protocol for the CMP, based on faculty research (4 weeks). Implementation of the protocol as part of the assembled CMP (6 weeks).