Power Efficient Rapid System Prototyping Using CoDeL: The 2D DWT Using Lifting
Nainesh Agarwal & Nikitas Dimopoulos
University of Victoria, Canada
PacRim, August 2005
PacRim /21/2015
Outline
- Motivation
- Power Dissipation
- Clock Gating
- Hardware Description Languages
- System Level Design Languages
- CoDeL
- Power Savings Analysis Framework
- Evaluation: DWT
- Conclusion
Motivation
- Growth in battery-powered portable systems such as cell phones, PDAs, and digital cameras
- DSP techniques are needed to process data and to transmit or display it
- As processing algorithms become more complex, power requirements increase
- Higher power requirements mean:
  - Shorter battery life
  - Expensive cooling and packaging techniques, which may increase the size of the device
  - Lower circuit density
  - Shorter component life
Motivation (contd.)
- Long design cycles for hardware architectures
  - It can take a team of engineers up to a year to develop an ASIC
- Emergence of System-Level Design Languages (SLDLs)
  - They do not address power dissipation
- Designing power-efficient architectures by hand is difficult and requires even longer lead times
Power Dissipation
- CMOS circuits
  - Static dissipation
    - Steady state
    - No switching
  - Dynamic dissipation
    - Switching
    - Changes in digital state
Static Dissipation
- Ideal static dissipation = 0
- Reverse-biased diodes formed at the pn junctions
- Sub-threshold current when the gate-to-source voltage is below the threshold
- Static dissipation is becoming significant
Source: Kursun and Friedman, Sleep Switch Dual Threshold Voltage Domino Logic with Reduced Standby Leakage Current. IEEE Trans. VLSI, Vol. 12, No. 5, May 2004.
Dynamic Dissipation
- Short-circuit dissipation
  - When both the n- and p-type transistors are on for a brief moment, there is a short current pulse
  - Not significant
- Current required to charge and discharge the capacitive load
  - Significant
  - Depends on the activity factor, capacitive load, source voltage, and circuit frequency
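The four factors listed for the charge/discharge component are usually combined in the textbook relation P = a * C * V^2 * f. The slide does not state the formula, so the sketch below assumes it, and all numeric values are hypothetical. It only illustrates how each factor scales the result.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power in watts, assuming P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point: 20% activity, 1 nF switched load, 1.5 V, 100 MHz.
base = dynamic_power(0.2, 1e-9, 1.5, 100e6)

# Halving the activity factor (for example by gating the clock) halves the power.
half_activity = dynamic_power(0.1, 1e-9, 1.5, 100e6)

# Lowering the source voltage has a quadratic effect.
lower_vdd = dynamic_power(0.2, 1e-9, 1.2, 100e6)

print(f"base: {base:.4f} W")
print(f"half activity: {half_activity / base:.2f} x base")
print(f"1.2 V supply: {lower_vdd / base:.2f} x base")
```

The activity factor is the term that clock gating targets, since the other three are largely fixed by the technology and the performance target.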
Clock Gating
- Reduce dynamic power dissipation
- Reduce the clock switching activity
- Enable clock only when a useful write is needed
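A small behavioural model can make the idea concrete. The sketch below is not CoDeL output or a hardware description. It is a hypothetical Python model of one register, where `run_register`, the data values, and the write pattern are all made up for illustration. It shows that gating leaves the stored result unchanged while cutting the number of clock edges the register sees.

```python
def run_register(data, write_enable, gated):
    """Model a register over a run of cycles.

    data: value presented on the input each cycle.
    write_enable: True in cycles where a useful write occurs.
    gated: if True, the clock reaches the register only in write cycles.
    Returns (final stored value, clock edges seen by the register).
    """
    value = 0
    clock_edges = 0
    for d, enable in zip(data, write_enable):
        if gated and not enable:
            continue  # gated clock stays idle, so the register sees no edge
        clock_edges += 1
        if enable:
            value = d  # latch the input only on a useful write
    return value, clock_edges


data = [5, 7, 9, 11, 13, 15, 17, 19]
writes = [False, False, True, False, False, False, True, False]

free_value, free_edges = run_register(data, writes, gated=False)
gated_value, gated_edges = run_register(data, writes, gated=True)

print(free_value, free_edges)    # same final value, every cycle clocked
print(gated_value, gated_edges)  # same final value, only write cycles clocked
```

The model ignores the gating logic itself, which also switches and consumes power.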
Hardware Description Languages
- Describe the temporal and spatial behaviour of a circuit
- Common targets: ASIC and FPGA
- VHDL and Verilog
- Design at Register Transfer Level (RTL)
- Abstraction level too low
System Level Design Languages
- Started in the late 1990s
- Provide a high level of abstraction for system development
- Categories
  - Extend existing HDLs: SystemVerilog
  - Extend existing software languages: SystemC, SpecC, Handel-C, JHDL
  - Newly created languages: Rosetta, CoDeL
- Algorithmic level design
  - Only CoDeL and Handel-C
[Figure: abstraction ladders. Software: Assembly Language, then High Level Languages (C, Java). Hardware: HDL (RTL), then SLDL. Higher abstraction: fast development, easy to learn, platform independence.]
CoDeL – Overview
- CoDeL (Controller Description Language) targets specification and design at the behavioral level.
- The order of the statements implicitly represents the sequence of activities.
- Extracts the data and control flow from the program automatically, assigns the necessary hardware blocks, and exploits inherent parallelism.
- Similar to the C language, so easy to learn.
- Includes a library of I/O protocols that simplify (sub)system interaction.
- The compiler produces synthesizable VHDL code, which can be targeted to any technology, including FPGA or ASIC.
CoDeL – Ports and Protocols
- CoDeL abstracts module interaction through ports and protocols.
- Protocols define the sequence of events necessary to transfer information from one module to another.
CoDeL – Simple Counter
- A very simple counter
CoDeL – Clock Gating
- Example shows a write in state x
- Gate turned on in state x - 1, off in state x + 1
[Figure: timing diagram over State x - 1, State x, and State x + 1, with signals Clk, Enable, GClk, and Data Latched.]
Power Savings Analysis Framework
Power saved =
  + Power saved in avoiding useless switching
  + Power saved in avoiding clock switching
  - Power required for clock gating (overhead)
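The three terms of the framework can be written as a one-line function. The sketch below is only an illustration of the sum. The function name, the units, and every number are hypothetical and do not come from the evaluation.

```python
def net_power_saved(useless_switching_saved, clock_switching_saved, gating_overhead):
    """Net saving = both savings terms minus the cost of the gating logic."""
    return useless_switching_saved + clock_switching_saved - gating_overhead


# Hypothetical figures in milliwatts for a design that draws 100 mW ungated.
baseline_mw = 100.0
saved_mw = net_power_saved(
    useless_switching_saved=0.0,  # no redundant writes in this made-up design
    clock_switching_saved=20.0,   # clock edges removed by gating
    gating_overhead=3.0,          # power drawn by the gating logic itself
)

print(f"net saving: {saved_mw:.1f} mW ({saved_mw / baseline_mw:.0%} of baseline)")
```

A negative result would indicate that the gating logic costs more than it saves, which is why the overhead term is kept explicit.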
Evaluation: 2D DWT
- Key component in JPEG2000 image compression
  - Lossy compression using MIT 9/7 wavelet
  - Lossless compression using Le Gall 5/3 integer-to-integer wavelet
- Integer-to-integer mapping
  - No quantization needed
  - Exact recovery of input signal
DWT Structure
- Successive pairs of low-pass and high-pass filters, followed by factor-2 down-sampling
- The analysis stage decomposes the signal, while the synthesis stage reconstructs it
- h0 is the low-pass filter and h1 is the high-pass filter
- The low-pass signal is recursively decomposed for a full, dyadic transform
[Figure: two-channel filter bank. Analysis Filter Bank: input x(n) passes through h0 and h1, each followed by down-sampling by 2. Synthesis Filter Bank: up-sampling by 2 followed by g0 and g1, producing the reconstructed x̂(n).]
DWT – Lifting
- Reduction in memory and computational complexity
- In-place computation of the wavelet coefficients
- Output is identical to a direct filter bank convolution
[Figure: lifting scheme. The Lazy Transform splits the Input into Even samples and Odd samples. Predict is subtracted (-) from the odd samples to give the High-pass output. Update is added (+) to the even samples to give the Low-pass output.]
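The split, predict, and update steps can be sketched in Python for one level of the Le Gall 5/3 integer transform. This is a software illustration, not the CoDeL implementation from the talk. It assumes the commonly used 5/3 lifting equations with symmetric boundary extension and an even-length input, and the function names are hypothetical.

```python
def forward_53(x):
    """One level of a Le Gall 5/3 integer lifting transform (even-length input)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even number of samples (at least 2)")
    even, odd = x[0::2], x[1::2]  # lazy transform: split into even and odd samples
    half = len(odd)
    # Predict: subtract the floored mean of the neighbouring even samples.
    high = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
            for i in range(half)]
    # Update: add a rounded quarter of the neighbouring high-pass values.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(half)]
    return low, high


def inverse_53(low, high):
    """Undo forward_53 by running the lifting steps in reverse order."""
    half = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(half)]
    odd = [high[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd  # merge back into the original sample order
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]  # eight hypothetical integer samples
low, high = forward_53(signal)
restored = inverse_53(low, high)
print(restored == signal)  # integer lifting is exactly invertible
```

Because each step only adds or subtracts a value computed from the other half of the samples, the inverse subtracts or adds the same value, which is what gives exact recovery with integer arithmetic.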
Implementation
[Figure: block diagram of the DWT Module, Analysis Filter Bank Module, Synthesis Filter Bank Module, and Register File. Labelled signals: Start, Ready, fStart, fReady, iStart, iReady, StartPt, EndPt, Step, Forward/Inverse, M (Rows), N (Cols), Size (M*N).]
Code Complexity
- Analysis and synthesis filter bank modules
  - 120 lines of CoDeL code each
  - Generate about 1000 lines of VHDL code each
- DWT module
  - 110 lines of CoDeL code
  - Generates 560 lines of VHDL
- Synthesized on a Xilinx 2v2000ff896-4 FPGA
  - About 7% area used
  - Maximum clock frequency of 103 MHz
  - Eight-element DWT takes 3.9 μs
Power Savings Estimation
- No useless switching found
- Analysis and synthesis filter bank modules
  - 85% area
  - 17% power saved
- DWT module
  - 15% area
  - 8% power saved
- Area complexity is used as an approximation for power complexity
  - 16% total power saved
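The 16% total follows from weighting each module's saving by its share of the area. The short check below uses only the figures on the slide. Treating area share as power share is the slide's own approximation, and the dictionary layout is just for illustration.

```python
# (area share, fraction of power saved) per module group, as given on the slide.
modules = {
    "analysis and synthesis filter banks": (0.85, 0.17),
    "DWT module": (0.15, 0.08),
}

# Area share stands in for power share, so the total is an area-weighted mean.
total_saved = sum(area * saved for area, saved in modules.values())

print(f"total power saved: {total_saved:.2%}")
```

The weighted sum is 0.85 * 0.17 + 0.15 * 0.08 = 0.1565, which rounds to the 16% quoted.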
Future Work
- Clock gating
  - Verify analytical framework using simulation and ASIC implementation
  - Efficient clock gating mechanism
- CoDeL compiler
  - Automated clock gating
  - Register and state reuse
  - Allow explicit parallelism (similar to the technique used in OpenMP and Handel-C)
Questions