Published by Douglas Wilkins. Modified over 8 years ago.
1
VADA Lab., SungKyunKwan Univ. L5: Lower Power Architecture Design. 1999.8.2. Prof. Jun-Dong Cho, Sungkyunkwan University. http://vada.skku.ac.kr
2
Architectural-level Synthesis
Translate HDL models into sequencing graphs.
Behavioral-level optimization:
– Optimize abstract models independently from the implementation parameters.
Architectural synthesis and optimization:
– Create the macroscopic structure: data-path and control-unit.
– Consider area and delay information of the implementation.
Hardware compilation:
– Compile the HDL model into a sequencing graph.
– Optimize the sequencing graph.
– Generate the gate-level interconnection for a cell library.
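As a rough illustration of what a sequencing graph looks like once it has been extracted from an HDL model, the sketch below stores operations as nodes and data dependencies as edges, then computes an as-soon-as-possible (ASAP) start time for each operation. ASAP scheduling is a swapped-in textbook technique, not something the slide prescribes, and the operation names and delays are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def asap_schedule(deps, delay):
    """Return the earliest start step of every operation.

    deps  : dict mapping an operation to the operations it depends on
    delay : dict mapping an operation to its latency in control steps
    """
    start = {}
    # static_order() yields each operation after all of its predecessors.
    for op in TopologicalSorter(deps).static_order():
        start[op] = max(
            (start[p] + delay[p] for p in deps.get(op, ())),
            default=0,
        )
    return start


# Hypothetical graph for: sub = ((a * b) + (c * d)) - e
deps = {"m1": [], "m2": [], "add": ["m1", "m2"], "sub": ["add"]}
delay = {"m1": 2, "m2": 2, "add": 1, "sub": 1}  # assumed latencies
schedule = asap_schedule(deps, delay)
```

Here the two multiplications start at step 0, the addition at step 2, and the subtraction at step 3. A real synthesis flow would also bind operations to resources and account for area, which this sketch leaves out.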
3
Architecture-Level Solutions
Architecture-driven voltage scaling: choose a more parallel architecture. Lowering Vdd reduces energy but increases delay; the parallelism recovers the lost throughput.
Regularity: to minimize the power in the control hardware and the interconnection network.
Modularity: to exploit data locality through distributed processing units, memories and control.
– Spatial locality: an algorithm can be partitioned into natural clusters based on connectivity.
– Temporal locality: short average lifetimes of variables (less temporary storage; data referenced in the recent past is likely to be accessed again soon).
Few memory references: references to memories are expensive in terms of power.
Precompute the physical capacitance of the interconnect and the switching activity (number of bus accesses).
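The voltage-scaling trade-off can be sketched numerically. The model below assumes the common first-order delay relation, delay proportional to Vdd / (Vdd - Vt)^2, which is not stated on the slide; the threshold voltage of 0.7 V and the 5 V reference supply are likewise assumed values. It searches for the supply voltage at which gates become a given factor slower, which is the slack that a correspondingly more parallel architecture could absorb.

```python
def gate_delay(vdd, vt=0.7):
    """First-order gate delay in arbitrary units (assumed model)."""
    return vdd / (vdd - vt) ** 2


def voltage_for_slowdown(slowdown, vref=5.0, vt=0.7, tol=1e-6):
    """Find Vdd at which the delay is `slowdown` times the delay at vref.

    Uses bisection; the modelled delay falls monotonically as Vdd rises
    above vt, so the crossing point is unique.
    """
    target = slowdown * gate_delay(vref, vt)
    lo, hi = vt + 1e-3, vref
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid  # still too slow: raise the voltage
        else:
            hi = mid  # fast enough: try a lower voltage
    return hi


# A 2-way parallel datapath tolerates gates that are twice as slow.
v_scaled = voltage_for_slowdown(2.0)
energy_ratio = (v_scaled / 5.0) ** 2  # energy per operation scales with Vdd^2
```

Under these assumed values the supply can drop to roughly 3.1 V. The result shifts with the threshold voltage and with the delay model, so it should be read as a trend rather than a design figure.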
4
Power Measure of P
5
Architecture Trade-off: Reference Data Path
6
Parallel Data Path
7
Pipelined Data Path
8
A Simple Data Path, Result4
9
Uni-processor Implementation
10
Multi-Processor Implementation
11
Datapath Parallelization
12
Memory Parallelization
To first order, $P = C \cdot \frac{f}{2} \cdot V_{dd}^{2}$
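The first-order power expression can be used to compare architectures relative to a reference design. The sketch below evaluates it for a parallel and a pipelined variant. The capacitance and voltage scale factors are assumptions taken from the widely quoted textbook datapath illustration, not from this slide, and only the ratios are meaningful.

```python
def dynamic_power(c, f, vdd):
    """First-order dynamic power, P = C * (f / 2) * Vdd^2."""
    return c * (f / 2) * vdd ** 2


# Reference design, normalized to C = 1, f = 1, Vdd = 1.
reference = dynamic_power(1.0, 1.0, 1.0)

# Assumed scale factors (illustrative only):
#   parallel : 2.15x capacitance, half the clock rate, 0.58x supply voltage
#   pipelined: 1.15x capacitance, same clock rate,     0.58x supply voltage
parallel = dynamic_power(2.15, 0.5, 0.58)
pipelined = dynamic_power(1.15, 1.0, 0.58)

parallel_ratio = parallel / reference    # about 0.36
pipelined_ratio = pipelined / reference  # about 0.39
```

The quadratic dependence on the supply voltage is what lets both variants come out ahead despite their extra capacitance.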
13
VLIW Architecture
14
VLIW - cont.
The compiler takes responsibility for finding the operations that can be issued in parallel and for creating a single very long instruction containing these operations.
VLIW instruction decoding is easier than superscalar decoding because the format is fixed and the operations within one instruction have no dependencies on each other.
The fixed format can place more limitations on how operations may be combined.
Intel P6: CISC instructions are decoded on chip into a set of micro-operations (comparable to a long instruction word) that can be executed in parallel.
As power becomes a major issue in the design of fast microprocessors, the simpler architecture is the better one.
VLIW architectures, being simpler than N-issue machines, can be considered promising candidates for achieving high speed and low power simultaneously.
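The compiler's job of grouping independent operations into long instruction words can be sketched with a simple greedy packer. This is a minimal illustration, not how any particular VLIW compiler works: it considers only true data dependencies, treats all issue slots as interchangeable, and the operation names are hypothetical.

```python
def pack_bundles(ops, deps, width):
    """Greedily pack operations into fixed-width instruction bundles.

    ops   : operation names in program order (producers before consumers)
    deps  : dict mapping an operation to the set of operations it needs
    width : number of issue slots per long instruction word
    """
    bundle_of = {}  # operation -> index of the bundle it was placed in
    bundles = []
    for op in ops:
        # An operation must issue after every operation it depends on.
        slot = max((bundle_of[d] + 1 for d in deps.get(op, ())), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op)
        bundle_of[op] = slot
    return bundles


ops = ["ld_a", "ld_b", "mul", "ld_c", "add", "st"]
deps = {"mul": {"ld_a", "ld_b"}, "add": {"mul", "ld_c"}, "st": {"add"}}
bundles = pack_bundles(ops, deps, width=2)
# bundles == [["ld_a", "ld_b"], ["mul", "ld_c"], ["add"], ["st"]]
```

The half-empty bundles at the end show the limitation mentioned above: with a fixed format, slots that cannot be filled with independent work are wasted.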
15
Synchronous vs. Asynchronous
Synchronous system: a signal path starts from a clocked flip-flop, passes through combinational gates, and ends at another clocked flip-flop. The clock signals do not participate in computation but are required for synchronization. As technology advances, systems grow larger, and the delay on the clock wires can no longer be ignored. Clock skew is thus becoming a bottleneck for many system designers. Many gates switch unnecessarily just because they are connected to the clock, not because they have new inputs to process. The biggest gate is the clock driver itself, which must switch on every clock cycle.
Asynchronous system (self-timed): an input signal (request) starts the computation on a module, and an output signal (acknowledge) signifies the completion of the computation and the availability of the requested data. Asynchronous systems can potentially respond to transitions on any of their inputs at any time, since they have no clock with which to sample their inputs.
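The request/acknowledge behaviour of a self-timed module can be sketched as a sequence of signal transitions. The version below assumes a four-phase (return-to-zero) handshake, which the slide does not specify, and models the module as an ordinary Python function; the names are hypothetical.

```python
def self_timed_module(compute, operands):
    """Run `compute` on each operand under a four-phase handshake.

    Returns the results and a trace of (req, ack) pairs, one entry per
    signal transition.
    """
    req = ack = 0
    results, trace = [], []
    for operand in operands:
        req = 1                      # sender: data valid, raise request
        trace.append((req, ack))
        results.append(compute(operand))
        ack = 1                      # module: result ready, raise acknowledge
        trace.append((req, ack))
        req = 0                      # sender: result taken, drop request
        trace.append((req, ack))
        ack = 0                      # module: return to idle
        trace.append((req, ack))
    return results, trace


results, trace = self_timed_module(lambda x: x * x, [2, 3])
```

Each transfer contributes the four transitions (1, 0), (1, 1), (0, 1), (0, 0) to the trace. No clock appears anywhere: the module is active only between a request and its acknowledge.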
16
Asynchronous - Cont.
More difficult to implement: requires explicit synchronization between communicating blocks, without clocks.
If an asynchronous signal feeds directly into conventional gate-level circuitry, invalid logic levels could propagate throughout the system.
Glitches, which are filtered out by the clock in synchronous designs, may cause an asynchronous design to malfunction.
Asynchronous designs are not widely used because designers cannot find the supporting design tools and methodologies they need.
The error corrector of a DCC (Digital Compact Cassette) player saves 80% of the power compared to its synchronous counterpart.
Asynchronous design:
– offers more architectural options and freedom
– encourages distributed, localized control
– offers more freedom to adapt the supply voltage
S. Furber, M. Edwards. “Asynchronous Design Methodologies”. 1993
17
Asynchronous design with adaptive scaling of the supply voltage: (a) synchronous system; (b) asynchronous system with adaptive scaling of the supply voltage
18
Asynchronous Pipeline
19
Pipelined Self-Timed Microprocessor
20
Hazard-free Circuits: 6% more logic
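One way to see why hazard-free logic costs extra gates is the classic static-1 hazard in a two-level sum-of-products circuit. The sketch below is an assumption about what the slide refers to: it flags every single-input change where the output should stay at 1 but no single product term holds it there, and shows that adding a redundant consensus term removes the hazard. The example function is hypothetical.

```python
from itertools import product


def covers(term, point):
    """True if the product term evaluates to 1 at the given input point."""
    return all(point[var] == val for var, val in term.items())


def static1_hazards(terms, variables):
    """List (point, variable) pairs where toggling `variable` may glitch.

    terms: list of product terms, each a dict {variable: required value}
    """
    hazards = []
    for values in product([0, 1], repeat=len(variables)):
        point = dict(zip(variables, values))
        for var in variables:
            if point[var] == 1:
                continue  # visit each adjacent pair once, from the 0 side
            neighbor = {**point, var: 1}
            on_both = (any(covers(t, point) for t in terms)
                       and any(covers(t, neighbor) for t in terms))
            held = any(covers(t, point) and covers(t, neighbor) for t in terms)
            if on_both and not held:
                hazards.append((point, var))
    return hazards


# f = a.b + a'.c glitches when a toggles while b = c = 1.
mux = [{"a": 1, "b": 1}, {"a": 0, "c": 1}]
# Adding the consensus term b.c costs one more gate but removes the hazard.
fixed = mux + [{"b": 1, "c": 1}]

variables = ["a", "b", "c"]
before = static1_hazards(mux, variables)
after = static1_hazards(fixed, variables)
```

`before` contains the single transition on `a` with `b = c = 1`, and `after` is empty. The added term changes nothing in the truth table, which is why hazard removal shows up purely as extra logic.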