Evaluation of On-Chip Interconnect Architectures for Multi-Core DSP
Students: Haim Assor, Horesh Ben Shitrit

2. Shared Bus

[Diagram: Masters #1-#4 and Slaves #1-#4 attached to a single bus controlled by one Arbiter.]

The Shared Bus is the classical way to connect components inside a chip. Components are divided into masters, which can initiate a transaction (for example cores, DMA, and peripherals such as Ethernet controllers), and slaves, which can only reply (such as memories). The 3-bus architecture (address, data, and control) is a shared transmission medium: a signal transmitted by one device is available for reception by all other devices attached to the bus. Only one device can successfully transmit at a time, so an arbitration mechanism is needed to control the transfers. In a multi-core system all cores connect to the same shared bus, which places an extra burden on it.

Shared Bus main features (a minimal arbiter sketch follows section 4 below):
- Simplicity (small design and verification effort).
- Blocking: only one transaction at a time.
- The transfer approach carries address, data, and control.
- Preemptive: can stall a low-priority transaction when a higher-priority one arrives.
- Some buses can handle out-of-order transactions (returning data in a different order than it was requested).

3. Fabric

[Diagram: Masters #1-#4 each connected by a dedicated bus to Slaves #1-#4, with an arbiter at every slave.]

In the Fabric architecture every master has a dedicated bus to each of the slaves; each bus has full bandwidth and supports every main bus feature. The Fabric interconnect enables concurrent transactions between different components. At the entry of each slave device there is an arbiter that decides which transaction takes place at a given moment. The Fabric architecture enables high performance but is expensive in terms of area.

Fabric main features (a per-slave scheduling sketch follows section 4 below):
- Connects multi-master, multi-slave systems and is therefore more complex than the Shared Bus (larger design and verification effort).
- Non-blocking: many concurrent transactions.
- The transfer approach carries address, data, and control, as in the Shared Bus.
- Preemptive.
- Like the Shared Bus, it can handle out-of-order transactions, but this feature increases design effort and should be taken into account.
- Enables memory bank interleaving.

4. Network on Chip (NoC)

[Diagram: Masters #1-#4 and Slaves #1-#4 connected through a mesh of identical Routers.]

The Network on Chip uses a packet-switched transfer approach, borrowed from computer networking, to move data between chip components. Each packet contains the destination address, the data, and the other control fields needed for correct transfer. Transactions moving through the network are out of order: packets from different initiators can interleave on the network, and re-order buffers at the target ensure proper ordering. The NoC is constructed from identical routers that form a homogeneous, scalable network, so it has high growth potential.

NoC main features (a re-order buffer sketch follows this list):
- Based on routers, which are considered simple; their major advantage is that a network is easily built from identical components.
- The routers' disadvantage is the need for re-order buffers, made necessary by the out-of-order nature of network transactions. Re-order buffers are complicated and occupy a large area, and therefore increase design effort.
- The transfer approach is packet based (each packet contains address, data, and control information).
- Non-preemptive.
- Semi non-blocking: many transactions at a time, but a transaction can stall.
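A minimal sketch (not from the poster) of the blocking, preemptive behavior described in section 2: one transaction occupies the bus per cycle, and a fixed-priority arbiter can stall a low-priority transfer when a higher-priority request arrives. The priority rule (lower master index wins) and the cycle counts are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct Request {
    int master;            // requesting master id (assumed: also its priority)
    int remaining_cycles;  // bus cycles left in this transaction
};

class SharedBusArbiter {
public:
    void request(int master, int cycles) { pending_.push_back({master, cycles}); }

    // Advance one bus cycle. Only one request progresses per cycle:
    // the pending request with the highest priority (lowest id).
    void tick() {
        if (pending_.empty()) return;
        std::size_t best = 0;
        for (std::size_t i = 1; i < pending_.size(); ++i)
            if (pending_[i].master < pending_[best].master) best = i;
        Request& r = pending_[best];
        std::printf("cycle %d: master %d holds the bus\n", cycle_, r.master);
        if (--r.remaining_cycles == 0)
            pending_.erase(pending_.begin() + static_cast<std::ptrdiff_t>(best));
        ++cycle_;
    }

    bool busy() const { return !pending_.empty(); }

private:
    std::vector<Request> pending_;
    int cycle_ = 0;
};

int main() {
    SharedBusArbiter bus;
    bus.request(/*master=*/2, /*cycles=*/3);  // low priority starts first
    bus.tick();
    bus.request(/*master=*/0, /*cycles=*/2);  // higher priority preempts it
    while (bus.busy()) bus.tick();            // master 2 resumes only afterwards
}
```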
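A minimal sketch (again an illustration, not the poster's model) of the fabric idea in section 3: each slave has its own arbiter, so requests aimed at different slaves proceed concurrently, while requests colliding on the same slave are serialized. The first-come tie-breaking rule is an assumption.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <vector>

struct Access { int master; int slave; };

// One scheduling round: at most one master wins each slave's arbiter;
// everyone else waits. Winners at distinct slaves run in parallel.
std::vector<Access> schedule(std::vector<Access>& waiting) {
    std::map<int, std::size_t> winner;  // slave -> index of winning request
    for (std::size_t i = 0; i < waiting.size(); ++i)
        winner.try_emplace(waiting[i].slave, i);  // assumed first-come priority
    std::vector<Access> granted, still_waiting;
    for (std::size_t i = 0; i < waiting.size(); ++i) {
        if (winner[waiting[i].slave] == i) granted.push_back(waiting[i]);
        else still_waiting.push_back(waiting[i]);
    }
    waiting = still_waiting;
    return granted;
}

int main() {
    std::vector<Access> waiting = {
        {0, 0}, {1, 1}, {2, 1}, {3, 2},  // masters 1 and 2 collide on slave 1
    };
    int round = 0;
    while (!waiting.empty()) {
        for (const Access& a : schedule(waiting))
            std::printf("round %d: master %d <-> slave %d\n", round, a.master, a.slave);
        ++round;  // only the slave-1 collision spills into round 1
    }
}
```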
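A minimal sketch of section 4's point about out-of-order delivery: packets carry a destination address and a sequence tag, and a re-order buffer at the target releases them in request order even when the network delivers them shuffled. The packet fields and addresses are invented for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

struct Packet {
    uint32_t dest;  // destination address (routing field)
    uint32_t seq;   // sequence tag consumed by the re-order buffer
    uint32_t data;  // payload
};

class ReorderBuffer {
public:
    // Accept a packet in whatever order the network delivered it,
    // then release the longest in-order prefix to the target.
    void receive(const Packet& p) {
        held_[p.seq] = p;
        while (!held_.empty() && held_.begin()->first == next_) {
            const Packet& q = held_.begin()->second;
            std::printf("deliver seq %u (data=%u) to 0x%x\n", q.seq, q.data, q.dest);
            held_.erase(held_.begin());
            ++next_;
        }
    }

private:
    std::map<uint32_t, Packet> held_;  // packets waiting for earlier sequence numbers
    uint32_t next_ = 0;                // next sequence number owed to the target
};

int main() {
    ReorderBuffer rob;
    // The network shuffles the packets; the buffer restores the order 0, 1, 2.
    rob.receive({0x4000, 2, 30});
    rob.receive({0x4000, 0, 10});
    rob.receive({0x4000, 1, 20});
}
```

This buffering is exactly why the poster counts re-order buffers as the routers' main cost: the buffer must hold every early packet until the missing earlier ones arrive.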
Project Number: p-2006-092
Supervisors: Dr. Shlomo Greenberg, Mr. Ori Goren, Mr. Norman Goldstein

1. Introduction - The Need for Multi-Core

More data transfer and increased algorithmic complexity demand more data processing, which drives energy consumption up; the multi-core system answers this need.

Chip manufacturers nowadays build multiple processing units inside one integrated chip. Several energy-efficient processing cores, instead of one powerful core, help reduce power consumption while increasing performance. Each core does not necessarily run as fast as the highest-performing single-core module, but the multi-core architecture improves overall performance by exploiting parallelism. Multi-core systems allow parallel processing on a chip using many small processors, or simply let communication processors handle more data streams such as communication channels. Letting several cores share the same resources creates new communication problems, such as resource unavailability and maintaining coherency. Using the classical shared-bus approach to connect the different components inside the chip leads to a major bottleneck in today's multi-core systems: only one transaction at a time is allowed on the bus, and every other component that wishes to use the bus must wait for it to become free.

[Figure: performance vs. year (2000, 2005, 2008) - "Performance through Parallelism".]

Project Goal

In this project we address the connectivity problem in a multi-core DSP. We explore and model new interconnect architectures intended to replace the classical shared bus. Our goal is to analyze these architectures and give a quantitative evaluation of multi-core performance with each architecture.

5. Modeling

To examine the performance of the different interconnects, we modeled typical systems that contain the same components but different interconnect solutions. We used the PANAMA tool, SystemC-based software that enables modeling of components and transactions; PANAMA helps architects model the behavior of a complete chip even before the RTL stage. The diagrams represent the modeled systems: each system simulates a real multi-core chip containing four cores, memories, DMA, and peripherals. Traces of several typical applications were executed on these systems. Besides the classical shared bus, a split shared bus was modeled, since it is a common solution for overcoming part of the shared bus's limitations. A specific fabric in use in Freescale's chips and a split fabric were also modeled to make the comparison complete. Results for the comparison between these four interconnect architectures are shown at the bottom; it is clear that the fabric outperforms the shared bus in this type of system. A trace-driven sketch of this modeling style follows below.

6. Asymptotic Comparison

[Table: theoretical comparison of Shared Bus, Fabric, and NoC by total area, power dissipation, and operating frequency; the values did not survive the transcript.]

The table shows a theoretical comparison of multi-core interconnect solutions at the asymptotic limit, for n ~ 100 (where n is the number of cores in the system). The cost and performance advantages of the NoC over the other interconnect solutions are clear, but the table is deceiving: the number of cores in today's most advanced chips is only 2 to 8, and the NoC suffers from a large overhead that the table hides (the coefficients of the cost and performance functions are dropped). The NoC's advantages will therefore come to fruition only in future technology generations, when the number of cores increases.

7. Conclusions and Future Research
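The following is a deliberately simplified, trace-driven sketch in the spirit of section 5; it is not PANAMA and not SystemC. It replays one invented access trace through two cost models: a shared bus that serializes everything, and a fabric where only accesses colliding on the same slave serialize. All latencies and the trace itself are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

struct Access { int master; int slave; int cycles; };

// Shared bus: transactions run strictly one after another.
int shared_bus_cycles(const std::vector<Access>& trace) {
    int total = 0;
    for (const Access& a : trace) total += a.cycles;
    return total;
}

// Fabric: accesses to different slaves overlap; each slave's own
// arbiter serializes only the accesses destined for that slave.
int fabric_cycles(const std::vector<Access>& trace) {
    std::map<int, int> per_slave;  // slave -> total cycles queued at it
    for (const Access& a : trace) per_slave[a.slave] += a.cycles;
    int worst = 0;
    for (const auto& kv : per_slave) worst = std::max(worst, kv.second);
    return worst;  // makespan is set by the busiest slave
}

int main() {
    std::vector<Access> trace = {
        {0, 0, 4}, {1, 1, 4}, {2, 1, 2}, {3, 2, 4},  // four cores, three slaves
    };
    std::printf("shared bus: %d cycles\n", shared_bus_cycles(trace));  // 14
    std::printf("fabric:     %d cycles\n", fabric_cycles(trace));      // 6
}
```

Even this toy model reproduces the poster's qualitative result: with four cores and moderate slave contention, the fabric finishes the same trace in a fraction of the shared bus's time.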
It is clear that in the long run, with the progress of technology, chips will become more complicated and will contain many cores on the same die. Under that scenario we expect chip manufacturers to choose the NoC as their interconnect solution. Meanwhile, the most complicated chips contain only a few computing units (cores), so the NoC's advantages are not yet decisive; furthermore, the NoC has many limitations when used with a small number of cores. Initial results show that for multi-core chips containing several cores, the best interconnect solution is the fabric. Our purpose is to produce quantitative conclusions that will help in choosing the best interconnect solution according to the number of cores in the system.

