Hardware Design, Synthesis, and Verification of a Multicore Communication API
Ben Meakin, Ganesh Gopalakrishnan
University of Utah School of Computing

Introduction
Modern trends in computer architecture and semiconductor scaling are leading towards chips with more and more processor cores. Highly concurrent hardware and software architectures are inevitable in future systems, and one of the greatest problems in these systems is communication. Scalable, flexible, and efficient hardware/software mechanisms must be researched and developed to ease the technical community into developing concurrent systems. This research effort aims to create such mechanisms by designing a scalable hardware implementation of a multicore communication API, as a case study of a concurrent design, synthesis, and verification flow.

MCAPI: Multicore Association Communication API
What is it?
- Lightweight message-passing interface
- Provides communication primitives
- Targeted towards embedded SoCs

Communication Architecture
- Physical communication medium is a 2-D mesh network with 9 nodes, each consisting of a modified MIPS core, a network interface unit, and an on-chip router
- RISC communication instructions designed as an extension to the MIPS ISA

ISA Benefits?
- Flexible: data can be passed as pointers to shared memory or as 16-, 32-, or 64-bit scalars
- Simple: MIPS is a well-known ISA and good compiler tools exist
- Efficient implementation of MCAPI as a C library utilizing the instructions as inline assembly code

Hardware Design Flow
- Synthesis of VHDL source with the Xilinx compiler
- Target platform: Xilinx Virtex5 FPGA
- Design Objective 1: nine-core MIPS processor running MCAPI programs on programmable logic
- Design Objective 2: platform for testing/implementing research ideas in multicore architectures

NoC Router Design and Synthesis
On-Chip Router Module
- Critical unit for minimizing latency
- Wormhole flow control
- Five physical channels with two virtual channels each
- Single-cycle datapath design
Router Arbitration
- Round-robin scheme
- Starvation free
- Single-cycle request/grant handshake protocol
Routing Function
- Dimension-order routing
- Deadlock free
Saturating Counters
- Choose the best VC to use
- Reduce worst-case latency
Advantages/Disadvantages of Single-Cycle Design
- Lowest possible latency in cycles
- Targets embedded systems, where clock rate is not as big of an issue
- Routing function taken off the critical path to improve the single-cycle clock rate

Communication Performance
Best-Case Latency (in clock cycles)
F = N + L (where N is the number of hops and L is the length of the packet)
Worst-Case Latency (in clock cycles)
F = 5 * N * Y + L (where N is the number of hops, L is the length of the packet, and Y is the maximum packet length)

Conclusions
For the worst-case conditions to occur, there must be five maximum-length packets trying to use the same virtual channel at each router along the path. This is a very rare case. The average case is expected to be much closer to the best-case latency.

References
1) "Multicore Communications API Specification V1.063," www.Multicore-Association.org
2) "Low-Latency Virtual-Channel Routers for On-Chip Networks," Mullins, West, and Moore. ISCA 2004.
3) "Communication Performance of Mesh and Ring Based NoCs," Vaclav Dvorak. 7th International Conference on Networking.
4) FPGA development board picture courtesy of www.digilentinc.com

Supported by SRC 2008-TJ-1847 and NSF CCF 0811429
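As a rough illustration of the best-case and worst-case latency formulas under Communication Performance, the sketch below computes a dimension-order route on a 3x3 mesh and evaluates both bounds for it. It is not the authors' implementation. The function names, the (x, y) node coordinates, the choice to route in X before Y, and the reading of N as the number of router-to-router links traversed are all assumptions made for illustration; the poster states only that routing is dimension-order and that N is the number of hops.

```python
MESH_DIM = 3           # 3x3 mesh, matching the poster's 9-node network
COMPETING_PACKETS = 5  # poster's worst case: five packets contend at each router


def xy_route(src, dst):
    """Return the nodes visited after src, routing fully in X and then in Y.

    Coordinates are hypothetical (x, y) pairs in the range 0..MESH_DIM-1.
    """
    for coord in (*src, *dst):
        if not 0 <= coord < MESH_DIM:
            raise ValueError("node coordinate outside the mesh")
    (x, y), (dst_x, dst_y) = src, dst
    path = []
    while x != dst_x:          # resolve the X dimension first (assumed order)
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:          # then resolve the Y dimension
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def best_case_latency(hops, packet_len):
    """F = N + L, in clock cycles."""
    return hops + packet_len


def worst_case_latency(hops, packet_len, max_packet_len):
    """F = 5 * N * Y + L, in clock cycles."""
    return COMPETING_PACKETS * hops * max_packet_len + packet_len


if __name__ == "__main__":
    path = xy_route((0, 0), (2, 2))   # corner to opposite corner
    hops = len(path)                  # assumed: one hop per link traversed
    print(path)                                               # [(1, 0), (2, 0), (2, 1), (2, 2)]
    print(best_case_latency(hops, packet_len=4))              # 8
    print(worst_case_latency(hops, packet_len=4, max_packet_len=8))  # 164
```

For the corner-to-corner route, a packet of length 4 with a maximum packet length of 8 gives 8 cycles in the best case and 164 in the worst case, which shows how far apart the two bounds sit even on a small mesh.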