Published by Adam Myron Gibbs. Modified over 9 years ago.
1
A Synthesizable Datapath-Oriented Programmable Logic Core
Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton
University of British Columbia and Imperial College
2
Embedded Programmable Logic Cores
Embed a small amount of programmable logic onto an ASIC
– Postpone some decisions until late in the design cycle
– Fast upgrade path for products
– Embedded debug
3
Soft Programmable Logic Cores
4
Advantages
– Easy to integrate, reduces design time
– Very flexible, can create the exact required core
– Easy to migrate to smaller technologies
Disadvantages
– Inefficient compared to hard cores
Our thought
– Makes sense if you only want a small core (a few hundred gates)
5
This talk: A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits
6
Previous Synthesizable PLCs
Kim Bozman and Noha Kafafi:
– LUT-based
– Unique directional routing fabric
7
Synthesizable Cores
Observation 1: To make the core truly synthesizable, we must avoid combinational loops in the unprogrammed fabric
Observation 2: Each tile need not be identical
8
Previous Synthesizable PLCs
Andy Yan:
– Product-term based logic block
– Unique directional routing fabric
– Supported sequential circuits
9
Our Architecture
Use it when the PLC is connected to a bus.
Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed
10
Logic Architecture
11
Key point:
- All bitblocks within a wordblock share the same set of configuration bits
- This means all bitblocks implement the same function
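A minimal behavioral sketch of this sharing, in Python. It assumes each bitblock is a plain 3-input lookup table and ignores any carry or arithmetic logic the real bitblock may contain; the names `wordblock`, `lut_bits`, and `XOR_AB` are illustrative and do not come from the slides.

```python
def wordblock(lut_bits, a, b, c, width):
    """Apply one shared 3-input truth table to every bit position.

    lut_bits: 8 entries of 0/1, indexed by (a_bit << 2) | (b_bit << 1) | c_bit.
              This single table stands in for the shared configuration bits.
    a, b, c:  integer bus values, each `width` bits wide.
    """
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        c_bit = (c >> i) & 1
        idx = (a_bit << 2) | (b_bit << 1) | c_bit
        # Every bit position reads the same table, so every bitblock
        # implements the same function.
        result |= lut_bits[idx] << i
    return result


# Example configuration (assumed): out = a XOR b, with c ignored.
XOR_AB = [((i >> 2) ^ (i >> 1)) & 1 for i in range(8)]

word = wordblock(XOR_AB, 0b1100, 0b1010, 0b0000, width=4)
```

Because the table is stored once rather than once per bit, the configuration cost in this model does not grow with the bus width.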
12
Routing Architecture
Key point: Signals are routed as buses
13
Routing Architecture
Key point:
- Linear array of wordblocks
- Buses get wider as we go to the right
15
Routing Architecture
Key point:
- Linear array of wordblocks
- Number of buses goes up as we go to the right
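One way to picture why the bus count grows to the right is the sketch below. It rests on an assumption that the slides do not state explicitly: each wordblock may select its inputs from the primary input buses, the constant registers, the feedback paths, and the outputs of every wordblock to its left, with each wordblock contributing exactly one new bus. The function name and the example numbers are hypothetical.

```python
def selectable_buses(num_inputs, num_constants, num_feedback, position):
    """Count the buses a wordblock at `position` (0 = leftmost) could select from.

    Assumed model: every wordblock to the left adds one output bus to the
    routing, so the count grows by one per position.
    """
    return num_inputs + num_constants + num_feedback + position


# Hypothetical core: 2 input buses, 1 constant register, 1 feedback path.
counts = [selectable_buses(2, 1, 1, i) for i in range(4)]
```

Under this model signals only ever flow left to right within a cycle, which is consistent with the earlier observation about avoiding combinational loops.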
16
Datapath Architecture
17
Multipliers
- Two inputs instead of three
- Two output buses (MSB, LSB)
18
Add a Control Block
The control block is based on the P-term fine-grained synthesizable core
19
Example Mapping
Monitor two buses:
- Count the number of times each bus matches a mask (the mask includes don’t-care bits)
- Count the number of times both buses match the mask at the same time
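The monitoring behavior described above can be sketched in software as follows. This models only the function, not how it is mapped onto wordblocks and the control block. It assumes a single mask shared by both buses, expressed as a `pattern` plus a `care` word whose zero bits are the don't-care positions; all names are illustrative.

```python
def matches(value, pattern, care):
    """True when `value` equals `pattern` on every bit position set in `care`."""
    return ((value ^ pattern) & care) == 0


def monitor(samples, pattern, care):
    """Count mask matches over a sequence of (bus_a, bus_b) samples.

    Returns (matches on bus A, matches on bus B, simultaneous matches).
    """
    count_a = count_b = count_both = 0
    for bus_a, bus_b in samples:
        hit_a = matches(bus_a, pattern, care)
        hit_b = matches(bus_b, pattern, care)
        count_a += int(hit_a)
        count_b += int(hit_b)
        count_both += int(hit_a and hit_b)
    return count_a, count_b, count_both


# Match any byte whose upper nibble is 0xA; the lower nibble is don't-care.
trace = [(0xA5, 0xA0), (0xA7, 0xB5), (0x15, 0xA5)]
totals = monitor(trace, pattern=0xA0, care=0xF0)
```

The XOR, AND, and compare-to-zero steps are all bus-wide operations, which is why this kind of circuit suits a datapath-oriented fabric; the three counters are the only state.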
20
Interesting Questions:
1. How do the various architectural parameters affect density?
2. How does this compare to a fine-grained architecture?
21
Architectural Parameters
D  Number of Wordblocks (incl. multipliers)
N  Bit Width
M  Number of Input Buses
R  Number of Output Buses
F  Number of Feedback Paths
C  Number of Constant Registers
A  Number of Multipliers
P  Number of Product-Term Blocks
22
Impact of the Number of Wordblocks and Bit Width
Key result: Both bit width and number of wordblocks have a significant impact on area.
23
Impact of the Number of Multipliers
Key result: Area increases because more multipliers mean more buses in the routing
24
Impact of the Size of the Control Block
Key result: The control block can dominate if it becomes too big
25
Benchmark   Datapath (ours)   Fine-Grain (PTerm)     ASIC   Fine-Grain/Datapath   Datapath/ASIC
fbly                 68,190          132,339,335    9,300                  1940            7.33
dotv3                34,119           65,534,780    6,575                  1921            5.19
dscg                 72,178          116,271,968    9,473                  1611            7.62
fir4                 76,213          130,971,120    9,843                  1718            7.74
egcd              1,225,231           22,776,474   10,420                  18.6             117
momul               294,135           11,448,589    7,097                  38.9              41
median              142,172           10,733,962    4,420                  75.5              32
debug1               87,265            1,302,928    3,484                  14.9              25
Key result 1: Significantly better than fine-grained architecture
Key result 2: Overhead roughly the same as FPGA/ASIC
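As a quick consistency check, the two ratio columns can be recomputed from the three raw columns. The numbers below are copied from the table above; the names `RESULTS` and `ratios` are illustrative.

```python
# (datapath, fine_grain, asic) values copied from the benchmark table.
RESULTS = {
    "fbly":   (68_190, 132_339_335, 9_300),
    "dotv3":  (34_119, 65_534_780, 6_575),
    "dscg":   (72_178, 116_271_968, 9_473),
    "fir4":   (76_213, 130_971_120, 9_843),
    "egcd":   (1_225_231, 22_776_474, 10_420),
    "momul":  (294_135, 11_448_589, 7_097),
    "median": (142_172, 10_733_962, 4_420),
    "debug1": (87_265, 1_302_928, 3_484),
}


def ratios(datapath, fine_grain, asic):
    """Return (Fine-Grain/Datapath, Datapath/ASIC) for one benchmark row."""
    return fine_grain / datapath, datapath / asic


for name, row in RESULTS.items():
    fg_over_dp, dp_over_asic = ratios(*row)
    print(f"{name:7s} {fg_over_dp:8.1f} {dp_over_asic:7.2f}")
```

The recomputed values agree with the tabulated ratios to within rounding. The first four benchmarks show the largest advantage over the fine-grained core, while `egcd` shows the smallest advantage and the largest overhead relative to the ASIC.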
28
But these results aren’t fair:
- For each benchmark, we found the optimum set of architectural parameters.
- We need an architecture that works for a variety of circuits
29
Architecture Construction
Our thought:
- The number of inputs/outputs is fixed by the SoC
- The designer has an idea of the size of the programmable logic (number of wordblocks)
Fix all other parameters (as a function of the number of wordblocks)
- e.g. fixed ratio between the number of multipliers and wordblocks, fixed ratio between control logic and datapath logic, etc.
We arbitrarily chose fixed ratios based on our experience
- A full architecture study is left as future work!
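The construction rule above can be sketched as a small function that takes the quantities the designer supplies and derives the rest. The slides do not give the actual ratios, so every ratio default below is a hypothetical placeholder, as are the names `CoreParams` and `derive_params`. Treating the bit width N as designer-supplied is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CoreParams:
    D: int  # number of wordblocks (incl. multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks


def derive_params(D, N, M, R,
                  mult_ratio=0.25,      # hypothetical: multipliers per wordblock
                  pterm_ratio=0.5,      # hypothetical: P-term blocks per wordblock
                  feedback_ratio=0.25,  # hypothetical
                  const_ratio=0.25):    # hypothetical
    """Fix F, C, A and P as functions of the wordblock count D."""
    def scaled(ratio):
        # Keep at least one of each resource in this sketch.
        return max(1, int(D * ratio))

    return CoreParams(D=D, N=N, M=M, R=R,
                      F=scaled(feedback_ratio),
                      C=scaled(const_ratio),
                      A=scaled(mult_ratio),
                      P=scaled(pterm_ratio))


core = derive_params(D=16, N=32, M=2, R=2)
```

With this shape, choosing a core means choosing only D (and the bus interface); a proper architecture study would replace the placeholder ratios with tuned values.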
30
Benchmark   Datapath (ours)   Fine-Grain (PTerm)     ASIC   Fine-Grain/Datapath   Datapath/ASIC
fbly                332,091          132,339,335    9,300                   399            35.7
dotv3               225,518           65,534,780    6,575                   291            34.3
dscg                325,029          116,271,968    9,473                   358            34.3
fir4                307,154          130,971,120    9,843                   426            31.2
egcd              3,778,611           22,776,474   10,420                  6.02             363
momul               486,654           11,448,589    7,097                  23.5            68.5
median              194,654           10,733,962    4,420                  55.1              44
debug1              119,286            1,302,928    3,484                  10.9              34
Key result 1: Significantly better than fine-grained architecture
Key result 2: Overhead roughly the same as FPGA/ASIC
33
625 m
34
Conclusions
Our architecture is 6x to 426x more efficient than the fine-grained architecture.
But this is only for datapath-oriented circuits. However, this is OK:
- In an SoC, we know, when the chip is designed, whether the inputs are buses or bits
- If there are buses, use this architecture
- If there are no buses, use Andy’s PTerm architecture
Final thought: using this architecture, the overhead is similar to that of a normal FPGA. People already accept this!