Published by Anthony Hill. Modified over 9 years ago.
1
Paper Review I: Coarse Grained Reconfigurable Arrays
Presented by: Matthew Mayhew, I.D. # 0234815
ENG*6530, Tues, June 10, 2008
2
References
1. Link 2: Chapter 2: Coarse-Grained Reconfigurable Architectures
2. Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.; MorphoSys: A Coarse Grain Reconfigurable Architecture for Multimedia Applications, Euro-Par 2002 Parallel Processing, 8th International Euro-Par Conference, Proceedings (Lecture Notes in Computer Science Vol. 2400), 2002, p 844-8
3
References Cont.
3. Sadasivam, M.; Hong, S.; Application Specific Coarse-Grained FPGA for Processing Element in Real-Time Parallel Particle Filters, Proceedings 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, 2003, p 116-19
4. Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.; Custom Implementation of the Coarse-Grained Reconfigurable ADRES Architecture for Multimedia Purposes, Proceedings, 2005 International Conference on Field Programmable Logic and Applications (IEEE Cat. No. 05EX1155), 2005, p 106-11
4
Overview
- Introduction
- Basic Concepts
- Classifications
- General Architectures
- Research Architectures
  - MorphoSys
  - Architecture for Dynamically Reconfigurable Embedded System (ADRES)
  - Coarse Grained FPGA for parallel particle processing
- Project Summary
5
Problems with Fine Grained FPGAs
- Wide datapaths are constructed from bit-level elements, which are designed for processing individual bits.
- A high volume of reconfiguration data is required for the processing elements and routing switches.
- Mapping from high-level languages is difficult because of the difference in granularity.
6
Coarse Grained Architectures
- Constructed from multi-bit wide datapaths and complex operators.
- The wide datapath allows complex operators to be implemented directly, reducing routing overhead.
- Connections in CGRA processing elements are multiple bits wide, so each connection takes more area, but fewer connections are needed.
7
Classification of Architectures
Coarse grained architectures are classified based on three criteria:
- Interconnect structure: mesh-based, linear array, or crossbar
- Datapath width: a tradeoff between flexibility and area consumption
- Reconfiguration method: static or dynamic
8
Basic Architectures: Mesh-Based
Processing elements are arranged in a rectangular array with horizontal and vertical connections.
9
Mesh-Based Continued
- The structure allows for good parallelism and good use of communication resources.
- Good place-and-route tools are required.
- The arrangement encourages nearest neighbour (NN) links, but generally also provides lines for longer connections.
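The nearest neighbour structure of a mesh can be illustrated with a minimal Python sketch. The function names and the (row, column) coordinate scheme are assumptions made for illustration and are not taken from the referenced chapter; the hop count shows why dedicated longer lines help when communicating PEs are far apart.

```python
def mesh_neighbours(row, col, rows, cols):
    """Return the nearest-neighbour coordinates of PE (row, col) in a rows x cols mesh.

    Hypothetical helper: PEs are addressed by (row, col), with no wrap-around links.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only the candidates that lie inside the array boundary.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def manhattan_hops(src, dst):
    """Number of NN links a value must traverse when only NN links are available."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

A corner PE has two neighbours and an interior PE has four. Crossing a 4x4 array corner to corner takes `manhattan_hops((0, 0), (3, 3))` NN hops, which is the cost that longer lines are meant to reduce.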
10
Basic Architectures: Linear Array
Processing elements are arranged in a linear fashion, with neighbours generally connected. These arrays are generally designed for the implementation of pipelined processes.
11
Basic Architectures: Crossbar
- All processing elements are connected by a matrix of switches, allowing for arbitrary connections.
- The routing task is simple.
- Due to implementation restrictions, a reduced crossbar that connects clusters is more common than a full one.
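A rough switch count suggests why a full crossbar becomes impractical. The sketch below uses a simplified cost model that is an assumption for illustration only: one switch per crosspoint, a full local crossbar inside each cluster, and one small crossbar between clusters. Real reduced-crossbar designs differ in detail.

```python
def full_crossbar_switches(n_pes):
    """Crosspoints in a full n x n crossbar (assumed: one switch per crosspoint)."""
    return n_pes * n_pes


def clustered_crossbar_switches(n_pes, n_clusters):
    """Crosspoints in a hypothetical two-level arrangement.

    Assumed model: each cluster has a full local crossbar, and a single
    n_clusters x n_clusters crossbar links the clusters together.
    """
    if n_pes % n_clusters != 0:
        raise ValueError("n_pes must be divisible by n_clusters")
    cluster_size = n_pes // n_clusters
    local = n_clusters * cluster_size * cluster_size
    inter_cluster = n_clusters * n_clusters
    return local + inter_cluster
```

Under this model, 64 PEs need 4096 crosspoints in a full crossbar but 576 when grouped into 8 clusters, at the price of less flexible connections between clusters.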
12
MorphoSys
- Designed to handle multimedia applications.
- Because of the varied tasks and the large amount of input/output data, ASIC solutions are generally expensive to develop and general-purpose processors (GPPs) are inefficient.
- Currently in version M2, with research ongoing.
13
System Architecture
The system-level architecture of the MorphoSys system is shown below.
(Figure from Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.)
14
RC Cell Architecture
The layout of an individual reconfigurable cell is shown below.
(Figure from Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.)
15
Benefits of MorphoSys
- The combination of fine and coarse grained reconfigurable elements allows for customization and optimization depending on the application.
- The memory structure is designed to accommodate the high demand for data movement in multimedia applications.
16
Evaluation
- Tested with several operations common in multimedia and DSP applications.
- Tested against dedicated DSP boards.
(Figure from Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.)
17
ADRES
- Designed to achieve specified performance and power consumption targets for portable wireless media applications.
- The test application for the architecture was an H.264/AVC decoder.
- The ADRES architecture consists of a VLIW processor coupled with an array of coarse grained processing cells for acceleration.
18
ADRES Architecture
- The VLIW processor is optimized for load/store and control operations.
- The accelerator component is optimized for data-flow operations, with branching supported.
- Each reconfigurable cell contains a local register file, allowing for iterative data processing and data delay.
- Each reconfigurable cell can communicate with all cells in its row and column, as well as neighbouring cells within its quadrant.
19
System Level View
When running in acceleration mode, an 8x8 array can be formed by configuring the VLIW elements.
(Figure from Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.)
20
ADRES Reconfigurable Cell
While the configuration memory is assumed to be static during execution, dynamic reconfiguration is possible using a pointer.
(Figure from Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.)
21
Performance and Implementation
- ADRES was found to be 88% faster overall in a full decoding cycle than a standard VLIW processor.
- A layout study was performed using 0.13 μm technology standard cells.
- Each reconfigurable cell consumes approximately 0.196 mm².
- Configuration memory accounts for around 50% of a cell, with 83% of the area in the full implementation used for various storage elements.
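The quoted figures can be turned into two quick estimates. This is a back-of-the-envelope sketch only: the constants are the approximate values from the slide, and reading "88% faster" as a throughput ratio of 1.88 is an assumption, since the slide does not define the metric.

```python
# Approximate figures quoted on the slide.
CELL_AREA_MM2 = 0.196          # area of one reconfigurable cell
CONFIG_MEMORY_FRACTION = 0.50  # "around 50%" of a cell is configuration memory
PERCENT_FASTER = 88.0          # "88% faster" than a standard VLIW processor


def config_memory_area(cell_area_mm2, fraction):
    """Estimated area of the configuration memory inside one cell, in mm^2."""
    return cell_area_mm2 * fraction


def relative_runtime(percent_faster):
    """Runtime relative to the baseline.

    Assumption: 'X% faster' means throughput is (1 + X/100) times the baseline.
    """
    return 1.0 / (1.0 + percent_faster / 100.0)
```

With these inputs, the configuration memory takes roughly 0.098 mm² per cell, and a full decoding cycle would take about 0.53 of the VLIW-only time under the stated interpretation.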
22
Parallel Particle Filter Processor
- Particle filters are used in non-linear problems where the goal is to track or detect dynamic signals.
- The target application of the designed system is the real-time tracking of a ball-bearing, where the goal is to determine the coordinates and velocity of the target using a given input angle.
- The filter needs to generate new particles, determine appropriate weights, and resample.
23
Operations
Both the generation of new particles and the determination of the weights are performed using processing elements. This involves the calculation of w(m), which is the weight of a particle, and f(m), which is determined by the application.
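The generate, weight, and resample cycle can be sketched in software as a generic bootstrap particle filter step. This is not the authors' hardware design: treating f(m) as a caller-supplied update function is an assumption, and the 1-D random-walk model and Gaussian likelihood below are hypothetical stand-ins for the bearing-based tracking problem.

```python
import math
import random


def particle_filter_step(particles, observation, f, likelihood, rng):
    """One generate / weight / resample iteration of a bootstrap particle filter."""
    # 1. Generate new particles with the application-specific function f
    #    (assumed here to play the role of f(m) from the slides).
    predicted = [f(p, rng) for p in particles]
    # 2. Determine the weight w(m) of every particle from the observation.
    weights = [likelihood(observation, p) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # 3. Resample with replacement, in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights


def random_walk(x, rng):
    """Hypothetical update model: 1-D random walk."""
    return x + rng.gauss(0.0, 0.5)


def gaussian_likelihood(z, x, sigma=1.0):
    """Hypothetical measurement model: unnormalized Gaussian around the state."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)
```

Steps 1 and 2 act on each particle independently, which is what makes them suitable for parallel processing elements; the resampling step needs all of the weights and is the natural synchronization point.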
24
System Level Architecture
Consists of both parallel and sequential data flow, with a buffer to synchronize their behaviour.
(Figure from Sadasivam, M.; Hong, S.)
25
Sequential Flow Reconfigurable Slice (SFRS)
Responsible for the calculation of f(m), with direct access to the buffer unit.
(Figure from Sadasivam, M.; Hong, S.)
26
Parallel Flow Reconfigurable Slice (PFRS)
Handles updating, creating, and outputting the particles.
(Figure from Sadasivam, M.; Hong, S.)
27
Reconfiguration
The architecture can be altered by changing:
- The way in which particles are generated
- The way in which particles update
- The output method
The update of particles can be altered by reconfiguring the CORDIC unit used in the calculation of f(m), which also stores needed constants and MUX controls. The control unit controls the interconnects in the SFRS to implement the desired function.
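For readers unfamiliar with CORDIC, the following is a minimal floating-point sketch of the standard rotation-mode algorithm, which computes sine and cosine using only shifts, adds, and a small table of constants. It illustrates the general technique, not the specific unit in the paper; the function name and iteration count are assumptions.

```python
import math


def cordic_sin_cos(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2 with rotation-mode CORDIC."""
    # Stored constants: elementary rotation angles atan(2^-i) and the overall gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start on the x axis, pre-scaled so the result has unit length.
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        # Rotate towards the target: the sign of the residual angle picks the direction.
        d = 1.0 if z >= 0.0 else -1.0
        # In hardware the multiplications by 2^-i are plain right shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
```

The table of elementary angles and the gain are the kind of constants a CORDIC unit has to hold, and changing the mode or the table changes the function computed, which is one plausible reading of how reconfiguring the unit alters the particle update.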
28
Performance
Tested against both a DSP processor and a general purpose FPGA. It should be noted that the authors reported problems in terms of having enough logic elements to map all the required PEs on the general purpose FPGA. The results are shown in the table below for the calculation times of both f(m) and w(m).
29
Conclusions
- Coarse grained reconfigurable architectures are generally used in either calculation-heavy or I/O-heavy applications.
- There is no single best design; the architecture layout is highly dependent on design goals.
- Performance is generally favourable when compared to dedicated processors and general purpose FPGAs.
30
Project
Goal: Implementation of the Advanced Encryption Standard (AES) algorithm using VHDL.
Secondary Goal: Implement the algorithm in such a way as to reduce the area consumption and computation time.
31
Progress
- The algorithm was examined to identify where parallelism and alternative implementations can be considered.
- While individual rounds must be performed sequentially, "blocks" of data within a given operation can be acted upon in parallel.
- How the S-box and MixColumns operations are implemented is crucial to a good design.
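The parallelism within a single operation can be seen in MixColumns, where each of the four state columns is transformed independently. The sketch below is a Python reference model rather than the project's VHDL; the function names are chosen for illustration, and the state is assumed to be given as a list of four 4-byte columns.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """Apply the AES MixColumns matrix (2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2) to one column."""
    a0, a1, a2, a3 = col
    # Multiplication by 3 is xtime(a) ^ a.
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state_columns):
    """Transform all columns; no column depends on another, so hardware can do them in parallel."""
    return [mix_column(col) for col in state_columns]
```

Because `mix_column` needs only shifts and XORs, a hardware version can trade area for time by instantiating one, two, or four copies of the column logic, which is the kind of area versus computation time choice the secondary goal describes.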
32
Thank you for your time. Questions?