Published by Annabel Bradley. Modified over 9 years ago.
1
Efficient IP Design Flow for Low-Power High-Level Synthesis
Quick & Accurate Power Analysis and Optimization Flow
JAN.20.2014
Asher Berkovitz, Yaniv Fais
2
© 2014 Freescale Semiconductor, Inc. | External Use
Authors Contact Details
Asher Berkovitz, Asher.Berkovitz@freescale.com, +972-09-9522511
Yaniv Fais, Yaniv.Fais@freescale.com, +972-09-9522179
Freescale Semiconductor Israel, Herzelia, Shenkar 3
3
Outline
Challenges
High Level Synthesis flow
Power Efficiency
− Problems at RTL
Proposed VSIM++ Flow
− Analysis
− Optimization
− Results on Networking Algorithm (Non-Abstract Version)
Conclusions
4
Challenges
IP blocks for networking applications must meet tight power budgets while also meeting aggressive performance requirements.
Changes to the micro-architecture and to other high-abstraction modeling choices can deliver the largest overall power savings.
Power is hard to measure accurately at higher abstraction levels.
Accurate power measurement at signoff comes late in the design process, when high-level changes are no longer possible.
5
High Level Synthesis Design Flow
Flow diagram (boxes): Algorithms Definition, Macro-Architecture Definition, SystemC® Model, HLS, RTL, RTL2GDSII "Normal" flow
Diagram labels: Commands (uArch); Cell library (.lib); Bit-exact SystemC® Model; SystemC®, RTL, Quick explore (Timing/Area)
Macro-architecture definition:
− Based on an accelerator base class
− Uses unified modules (FIFOs, interfaces, etc.)
SystemC® Model:
− Architecture evaluation and RTL generation
− Accurate data path description according to the macro-architecture
− Design to meet processing requirements
HLS:
− Builds pipelined data path and control logic
− Considers real timings during RTL generation
− Explore implementation tradeoffs
6
Power Dissipation
Static power: approximately test independent
Dynamic power: highly dependent on the application (signal transitions)
Signal transitions can be divided into:
− Functional changes
− Glitches (signal changes that are not captured by a sequential element)
Glitches are not visible in RTL simulation and can contribute ~20% of power dissipation.
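The split between functional changes and glitches can be illustrated with a small C++ sketch. It is not part of the original deck: the waveform representation (`samples`), the names `ToggleCounts` and `classifyToggles`, and the rule "at most one toggle per cycle is functional" are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toggle counts for one net over a simulation window (hypothetical type).
struct ToggleCounts {
    std::size_t functional = 0;  // changes that are visible at the sampling clock edge
    std::size_t glitch = 0;      // extra toggles that settle before the edge
};

// samples[c] lists the values the net takes during clock cycle c, in time order.
// The last entry of each cycle is the value a sequential element would capture.
// initial is the captured value before the first cycle.
ToggleCounts classifyToggles(bool initial,
                             const std::vector<std::vector<bool>>& samples) {
    ToggleCounts counts;
    bool captured = initial;
    for (const auto& cycle : samples) {
        assert(!cycle.empty());
        std::size_t toggles = 0;
        bool prev = captured;
        for (bool value : cycle) {
            if (value != prev) ++toggles;
            prev = value;
        }
        // Assumption: if the captured value changed, exactly one toggle was functional.
        const std::size_t functional = (prev != captured) ? 1 : 0;
        counts.functional += functional;
        counts.glitch += toggles - functional;
        captured = prev;
    }
    return counts;
}
```

An RTL simulation would only ever see the `functional` count, which is why the glitch share has to come from a gate-level simulation with delays.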
7
Fast & Accurate Power Analysis Flow (VSIM)
Quick Physical Design (PD) flow:
− Timing violations allowed
− DRC violations allowed
− Less than 100% RTL to GL equivalence
A customized test bench enables cycle-accurate gate-level simulation.
Power analysis is performed using the gate-level netlist and the parasitics file.
Power analysis results are mapped back to the RTL netlist.
Flow diagram (boxes): RTL DB, Quick PD flow, Test bench generation, GLV simulation, Power Analysis, Mapping GL 2 RTL
8
Test Bench Generation
Based on the RTL to GL mapping, force RTL values on the GLV simulation.
Advantages:
− Force the RTL value on the key point
− Short run time: simulate a selected window
− GL delay for logic cones (SDF)
Figure (flip-flop diagrams): Std' test bench compared with the "VSIM" test bench. Labels: "Timing violation!", "Force correct value @ time point X", "Values are a bit 'off'", "Correct values forced", "GL & SDF".
9
Gate level results mapping to RTL netlist
RTL netlist:
    reg [1:0] cond;
    reg [1:0] count;
    always @(posedge clk)
      if (cond == 2'b11)
        count <= count + 1;
Figure: GL netlist with key points Cond_0, Cond_1, count_0, count_1 and a clock gate, annotated with per-instance power values.
1. Map RTL 2 GL
2. For each unmapped GL instance: divide the power between its drive/load key points
3. Assign GL key point power to the RTL key point
4. The power of each RTL hierarchy is the sum of the power assigned to its key points
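The four mapping steps can be sketched in C++ as follows. This is a minimal illustration, not the authors' tool: the types and function names are hypothetical, the power of an unmapped instance is assumed to be split equally between its drive/load key points (the slide does not say how it is divided), and key point names are assumed to be '/'-separated hierarchical paths.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One gate-level instance with its reported power (hypothetical structure).
struct GateInstance {
    double power;                         // power reported by gate-level analysis
    std::string mappedKeyPoint;           // RTL key point name, empty if unmapped
    std::vector<std::string> neighbours;  // drive/load key points of an unmapped instance
};

// Steps 1-3: accumulate gate-level power on RTL key points.
std::map<std::string, double> powerPerKeyPoint(const std::vector<GateInstance>& gates) {
    std::map<std::string, double> result;
    for (const auto& gate : gates) {
        if (!gate.mappedKeyPoint.empty()) {
            result[gate.mappedKeyPoint] += gate.power;
        } else if (!gate.neighbours.empty()) {
            // Assumption: equal split between the drive/load key points.
            const double share = gate.power / static_cast<double>(gate.neighbours.size());
            for (const auto& keyPoint : gate.neighbours) result[keyPoint] += share;
        } else {
            result["<unassigned>"] += gate.power;  // keep the total power intact
        }
    }
    return result;
}

// "top/u_a/count" -> "top/u_a"; a name without '/' belongs to the empty (top) hierarchy.
std::string hierarchyOf(const std::string& keyPoint) {
    const std::size_t pos = keyPoint.rfind('/');
    return pos == std::string::npos ? std::string() : keyPoint.substr(0, pos);
}

// Step 4: the power of a hierarchy is the sum over its own key points.
std::map<std::string, double> powerPerHierarchy(const std::map<std::string, double>& keyPoints) {
    std::map<std::string, double> result;
    for (const auto& entry : keyPoints) result[hierarchyOf(entry.first)] += entry.second;
    return result;
}
```

Calling `powerPerHierarchy(powerPerKeyPoint(gates))` gives one power figure per RTL hierarchy; a real flow might additionally roll child hierarchies up into their parents.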
10
Mapping results to high-level language (VSIM++)
By annotating C++ class names, variable names, and file names/line numbers, we can map power consumption from the accurate gate level back to the C++ source.
This capability allows us to:
− Analyze and fix clock gating
− Redesign "power hungry" resources
− Consider different architectures
RTL netlist:
    reg [1:0] my_var_Ln123;
    reg [1:0] count_Ln124;
    always @(posedge clk)
      if (my_var_Ln123 == 2'b11)
        count_Ln124 <= count_Ln124 + 1;
C++ code (the slide's "Line #" column covers lines 121 to 127):
    void process() {
      …
      while (true) {
        if (my_var==3)    // line 123
          count++;        // line 124
        …
      }
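Recovering the C++ variable and line number from an annotated RTL name could look like the sketch below. The `<variable>_Ln<line>` pattern is taken from the slide's example names (`my_var_Ln123`, `count_Ln124`); whether the real flow uses exactly this encoding is an assumption, and `SourceRef` and `parseAnnotatedName` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Source location recovered from an annotated RTL name (hypothetical type).
struct SourceRef {
    bool valid = false;
    std::string variable;  // C++ variable name, e.g. "count"
    int line = 0;          // C++ source line, e.g. 124
};

// Parses names of the assumed form "<variable>_Ln<decimal line number>".
SourceRef parseAnnotatedName(const std::string& rtlName) {
    SourceRef ref;
    const std::string tag = "_Ln";
    const std::size_t pos = rtlName.rfind(tag);
    if (pos == std::string::npos || pos == 0) return ref;  // no tag or empty variable
    const std::size_t digitsStart = pos + tag.size();
    const std::size_t digitCount = rtlName.size() - digitsStart;
    if (digitCount == 0 || digitCount > 9) return ref;     // missing or oversized number
    int line = 0;
    for (std::size_t i = digitsStart; i < rtlName.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(rtlName[i]);
        if (!std::isdigit(ch)) return ref;                 // suffix is not a pure number
        line = line * 10 + (ch - '0');
    }
    ref.valid = true;
    ref.variable = rtlName.substr(0, pos);
    ref.line = line;
    return ref;
}
```

With such a parser, the power assigned to RTL key point `count_Ln124` can be reported against variable `count` at line 124 of the C++ source.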
11
Example problem identified
The tool automatically inserts "clock gating" enabler code in the RTL:
    always @(posedge clk)
      if (en)
        data[511:0] <= new_data;
Figure: C++ process condition, HLS, and a DFF with pins clk, en, new_data and data.
Problem: the gate-level implementation does not use a gated clock; because of timing violations, the enable is implemented as data logic.
Solution: simplify the clock gating enablers to meet timing constraints.
12
Clock gating enabler simplification
Original clock gating scheme: complicated enable logic, synthesized to an inefficient enabler.
Simplified clock gating scheme: enable synthesized without changes, leading to high clock gating efficiency.
Figure: the original scheme shows DFFs Hash Key (pins clk, en, new_data, data), Header and Process control; the simplified scheme shows DFFs Hash Key (pins clk, en, new_data, data) and Process control.
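The deck does not define "clock gating efficiency". One common reading, used here purely as an assumption, is the fraction of register-bit clock cycles in which the clock is gated off. Under that reading a rough estimate from per-cycle enable traces can be sketched as follows (`GatedRegister` and `clockGatingEfficiency` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A register bank with its clock-gating enable value in each simulated cycle.
struct GatedRegister {
    std::size_t bits;          // register width, e.g. 512 for data[511:0]
    std::vector<bool> enable;  // true when the clock is allowed to toggle the register
};

// Assumed metric: gated bit-cycles divided by total bit-cycles (0.0 for empty input).
double clockGatingEfficiency(const std::vector<GatedRegister>& registers) {
    std::size_t gated = 0;
    std::size_t total = 0;
    for (const auto& reg : registers) {
        for (bool en : reg.enable) {
            total += reg.bits;
            if (!en) gated += reg.bits;
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(gated) / static_cast<double>(total);
}
```

A register whose enable ends up implemented as data logic behaves, for clock power, like one whose enable is always true, so it contributes nothing to the gated share.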
13
Conclusions
Use High Level Synthesis for IP design:
− Quick and easy to explore architecture alternatives
− Quick front-end flow, including verification
Power analysis:
− Measure power on a system-level scenario
− Quick (doesn't require full physical design flow convergence)
− Accurate (done at gate level)
Analysis and optimization in the high-level design (C++):
− Manual clock gating enable setting reduced dynamic power consumption by 19.4%
Early in the design cycle: easy to change the IP architecture!
14
Backup
15
Accuracy
Measured using a similar methodology on a different design.
Si measurement compared to full T/O gate level data.
Test                                                  Dynamic power accuracy
Single core Fast Fourier Transform                    -7.59%
Single core Fast Fourier Transform, no memory miss    -8.40%
Dual core Fast Fourier Transform                      7.57%