Sequential Execution Example of three micro-operations in the same clock period.

Sequential Execution: Example of three micro-operations in the same clock period.

Insertion of Latch (out): Insertion of latches at the output ports of the functional units.

Insertion of Latch (in/out): Insertion of latches at both the input and output ports of the functional units.

Overlapping Data Transfer (in): Overlapping read and write data transfers.

Overlapping Data Transfer (in/out): Overlapping data transfer with functional-unit execution.

Register Allocation Using Clique Partitioning: scheduled DFG; graph model; lifetime intervals of variables; clique-partitioning solution.
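A minimal Python sketch of clique-partitioning register allocation, under stated assumptions: each variable's lifetime is a half-open interval [birth, death) of control steps, two variables are compatible (joined by an edge in the graph model) when their lifetimes do not overlap, and each clique of mutually compatible variables can share one register. The greedy seed-and-grow heuristic below is one common way to partition the graph, not necessarily the procedure used in these slides. The function names and the example lifetimes are hypothetical.

```python
from itertools import combinations

def compatible(a, b):
    """Half-open lifetimes [s, e) are compatible if they do not overlap."""
    return a[1] <= b[0] or b[1] <= a[0]

def clique_partition(lifetimes):
    """Greedy clique partitioning of the variable compatibility graph.

    lifetimes: dict mapping variable name -> (birth, death), half-open.
    Returns a list of cliques (lists of names); each clique is one register.
    """
    names = sorted(lifetimes)
    # Graph model: an edge joins every pair of compatible variables.
    adj = {v: set() for v in names}
    for u, v in combinations(names, 2):
        if compatible(lifetimes[u], lifetimes[v]):
            adj[u].add(v)
            adj[v].add(u)

    unassigned = set(names)
    cliques = []
    while unassigned:
        # Seed with the variable that has the most unassigned compatible neighbours.
        seed = max(sorted(unassigned), key=lambda v: len(adj[v] & unassigned))
        clique = [seed]
        candidates = adj[seed] & unassigned   # compatible with every member so far
        while candidates:
            # Add the candidate that keeps the most other candidates available.
            best = max(sorted(candidates), key=lambda v: len(adj[v] & candidates))
            clique.append(best)
            candidates &= adj[best]
        cliques.append(clique)
        unassigned -= set(clique)
    return cliques
```

For the hypothetical lifetimes a:[0,2), b:[1,3), c:[2,4), d:[3,5), e:[0,1), the heuristic returns two registers, {d, b, e} and {a, c}, which matches the maximum number of variables alive in any one control step.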

Left-Edge Algorithm: Register allocation using the left-edge algorithm.

Register Allocation: Left-Edge Algorithm. Sorted variable lifetime intervals; five-register allocation result.
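The left-edge procedure can be sketched in a few lines of Python, assuming half-open lifetimes [birth, death) so that a variable may be written into a register in the same step in which the previous occupant dies. Variables are sorted by their left edge (birth time); each pass fills one register by taking, in sorted order, every variable that starts no earlier than the register's current end. The function name and example data are hypothetical.

```python
def left_edge(lifetimes):
    """Left-edge register allocation.

    lifetimes: dict name -> (birth, death), half-open [birth, death).
    Returns a list of registers, each a list of variable names in time order.
    """
    # Sort by left edge (birth time); break ties by name for determinism.
    pending = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))
    registers = []
    while pending:
        current, last_end, remaining = [], None, []
        for v in pending:
            start, end = lifetimes[v]
            if last_end is None or start >= last_end:
                current.append(v)      # fits after the register's last occupant
                last_end = end
            else:
                remaining.append(v)    # overlaps; try again in a later register
        registers.append(current)
        pending = remaining
    return registers
```

With the hypothetical lifetimes a:[0,2), b:[1,3), c:[2,4), d:[3,5), e:[0,1), this yields [["a", "c"], ["e", "b", "d"]]. For interval lifetimes without further constraints, the left-edge algorithm uses exactly as many registers as the maximum number of simultaneously live variables.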

Register Allocation: Allocation binds registers and functional modules to the variables and operations in the CDFG, and specifies the interconnection among modules and registers in terms of multiplexers (MUXes) or buses. Capacitance is reduced during allocation by minimizing the number of functional modules, registers, and multiplexers. A composite weight reflecting transition activity and capacitive load is attached to the edges of the CDFG. The edge with the highest composite weight is found and the two nodes it joins are merged, i.e., the corresponding variables are mapped to the same register. Allocation continues until no edges are left in the CDFG, with the composite weights updated after each merge. The maximum number of operations alive in any control step is set to one. Operations and variables are sequenced to enhance signal correlations.
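The merge loop described above can be sketched as follows. The slides do not give the composite-weight formula or the update rule, so both are assumptions here: weights are supplied directly per compatible variable pair, a merged node keeps an edge to a neighbour only if both merged halves were compatible with it, and the new weight is simply the sum of the two old weights (a placeholder rule). The function name and example weights are hypothetical.

```python
def merge_by_weight(variables, weights):
    """Greedy register sharing driven by composite edge weights.

    variables: iterable of variable names (strings).
    weights: dict frozenset({u, v}) -> composite weight, one entry per
             compatible pair (pairs that may share a register).
    Returns the final groups; each group is mapped to one register.
    """
    groups = {v: {v} for v in variables}      # node id -> variables merged into it
    edges = dict(weights)
    while edges:
        # Highest composite weight first; ties broken by name for determinism.
        pair = max(edges, key=lambda e: (edges[e], sorted(e)))
        a, b = sorted(pair)
        groups[a] |= groups.pop(b)            # merged node keeps the id `a`
        new_edges = {e: w for e, w in edges.items() if a not in e and b not in e}
        for x in groups:
            if x == a:
                continue
            ea, eb = frozenset((a, x)), frozenset((b, x))
            # The merged node stays compatible with x only if both halves were.
            if ea in edges and eb in edges:
                new_edges[ea] = edges[ea] + edges[eb]   # placeholder update rule
        edges = new_edges
    return sorted(sorted(g) for g in groups.values())
```

With hypothetical weights {a,b}: 5, {a,c}: 2, {b,c}: 1, {c,d}: 4, {a,d}: 1, the loop first merges a and b, then c and d, and stops with two registers: [["a", "b"], ["c", "d"]].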

Exploiting Spatial Locality for Interconnect Power Reduction: A spatially local cluster is a group of algorithm operations that are tightly connected to each other in the flowgraph representation. Two nodes are tightly connected if the shortest distance between them, measured in the number of edges traversed, is small. A spatially local assignment maps the algorithm operations to specific hardware units such that no operations in different clusters share the same hardware. Partitioning the algorithm into spatially local clusters ensures that most data transfers take place within clusters (on local buses) and relatively few occur between clusters (on global buses). The partitioning information is passed to the architecture netlist and floorplanning tools. Local: a given adder outputs data to its own inputs. Global: a given adder outputs data to another adder's inputs.
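The two definitions above can be checked mechanically. This Python sketch assumes the flowgraph's data transfers are given as (source, destination) operation pairs, a cluster label per operation, and a hardware unit per operation; all names and the example data are hypothetical.

```python
def classify_transfers(transfers, cluster_of):
    """Count transfers inside a cluster (local bus) and across clusters (global bus)."""
    local = sum(1 for src, dst in transfers if cluster_of[src] == cluster_of[dst])
    return local, len(transfers) - local

def is_spatially_local(unit_of, cluster_of):
    """True if no hardware unit is shared by operations from different clusters."""
    owner = {}
    for op, unit in unit_of.items():
        # The first operation seen on a unit fixes the cluster that owns it.
        if owner.setdefault(unit, cluster_of[op]) != cluster_of[op]:
            return False
    return True
```

For example, with transfers add1→add2, add2→mul1, mul1→add3, add3→add4 and clusters {add1, add2, mul1} and {add3, add4}, three transfers are local and one is global; mapping add3 onto the same adder as add1 would break spatial locality.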

Hardware Mapping: The last step in the synthesis process maps the allocated, assigned, and scheduled flow graph (called the decorated flow graph) onto the available hardware blocks. The result of this process is a structural description of the processor architecture (e.g., SDL input to the Lager IV silicon assembly environment). The mapping process transforms the flow graph into three structural subgraphs:
– the data path structure graph
– the controller state machine graph
– the interface graph (between the data path control inputs and the controller output signals)

Spectral Partitioning in High-Level Synthesis: The eigenvector placement forms an ordering in which nodes that are tightly connected to each other are placed close together; the relative distances are a measure of the tightness of the connections. The eigenvector ordering is used to generate several partitioning solutions. The area estimates are based on distribution graphs; a distribution graph displays the expected number of operations executed in each time slot. Local bus power: the number of local data transfers times the area of the cluster. Global bus power: the number of global data transfers times the total area.
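A minimal pure-Python sketch of the eigenvector ordering and split selection, assuming the eigenvector placement is the Fiedler vector (eigenvector of the second-smallest eigenvalue of the graph Laplacian), which is the usual spectral choice. It is computed here by power iteration on cI − L while projecting out the all-ones vector, and each split point of the ordering is scored with the bus-power proxy stated above (local transfers × cluster area, plus global transfers × total area). Per-node areas stand in for the distribution-graph area estimates and are hypothetical, as are the function names.

```python
import math

def fiedler_order(n, edges, iters=1000):
    """Order nodes 0..n-1 by their component in the Laplacian's Fiedler vector."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    c = 2 * max(len(a) for a in nbrs) + 1       # exceeds the largest Laplacian eigenvalue
    x = [float((i + 1) ** 2) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]              # project out the all-ones eigenvector
        # y = (c*I - L) x, where (L x)_i = deg(i) * x_i - sum of neighbour values
        y = [c * x[i] - (len(nbrs[i]) * x[i] - sum(x[j] for j in nbrs[i])) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sorted(range(n), key=lambda i: x[i])

def bus_power_proxy(order, k, edges, area):
    """Cost of splitting the ordering into clusters order[:k] and order[k:]."""
    side = {v: (0 if pos < k else 1) for pos, v in enumerate(order)}
    cluster_area = [0.0, 0.0]
    for v in order:
        cluster_area[side[v]] += area[v]
    local, global_count = [0, 0], 0
    for u, v in edges:
        if side[u] == side[v]:
            local[side[u]] += 1
        else:
            global_count += 1
    total_area = sum(cluster_area)
    return sum(local[i] * cluster_area[i] for i in range(2)) + global_count * total_area

def best_split(n, edges, area):
    """Try every split point of the eigenvector ordering (requires n >= 2)."""
    order = fiedler_order(n, edges)
    return min(range(1, n), key=lambda k: bus_power_proxy(order, k, edges, area))
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2–3, with unit areas, the ordering puts one triangle before the other and the best split is after the third node (cost 3·3 + 3·3 + 1·6 = 24).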

Finding a Good Partition

Interconnection Estimation: For connections within a datapath (over-the-cell routing), routing between units increases the actual height of the datapath by approximately 20-30%, and most wire lengths are about 30-40% of the datapath height. Average global bus length: the square root of the estimated chip area. The three terms of the chip-area estimate represent white space, the active area of the components, and the wiring area; the coefficients are derived statistically.
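The empirical rules above reduce to simple arithmetic. This sketch assumes the mid-points of the quoted ranges (25% routing overhead, wires at 35% of the datapath height) as defaults; the function names and the example dimensions are hypothetical, and the statistically derived chip-area coefficients are not reproduced.

```python
import math

def datapath_wiring_estimate(active_height, routing_overhead=0.25, wire_fraction=0.35):
    """Return (actual datapath height, typical wire length) for over-the-cell routing.

    Defaults are mid-points of the quoted ranges: routing adds ~20-30% to the
    height, and most wires are ~30-40% of the datapath height.
    """
    height = active_height * (1.0 + routing_overhead)
    return height, wire_fraction * height

def global_bus_length(chip_area):
    """Average global bus length: square root of the estimated chip area."""
    return math.sqrt(chip_area)
```

For a hypothetical datapath whose cells are 400 µm tall, this gives an actual height of about 500 µm and typical wires of about 175 µm; a chip estimated at 4 mm² gives an average global bus length of 2 mm.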

Incorporating into HYPER-LP

Experiments

Datapath Generation: Register file recognition and multiplexer reduction:
– Individual registers are merged as much as possible into register files. This reduces the number of bus multiplexers, the overall number of buses (since all registers in a file share the input and output buses), and the number of control signals (since a register file uses a local decoder).
– The multiplexers and I/O buses are minimized simultaneously (clique partitioning is NP-complete, so simulated annealing is used).
Data path partitioning optimizes the processor floorplan. The core idea is to grow pairs of isomorphic regions, as large as possible, from corresponding pairs of seed nodes.
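A minimal simulated-annealing sketch of clique partitioning for grouping registers into register files. The slides state only that simulated annealing is used; the cost here (the number of groups) is a simplified stand-in for the real multiplexer and I/O-bus cost, and `compatible` is a hypothetical placeholder for whatever rule decides that two registers may share a file. Every proposed move keeps each group a clique, so every visited state is feasible; the Metropolis rule occasionally accepts a move that opens a new group, which lets the search escape local minima.

```python
import math
import random

def anneal_partition(nodes, compatible, steps=5000, t0=2.0, cooling=0.998, seed=0):
    """Simulated-annealing clique partitioning (hypothetical cost model).

    nodes: list of hashable, sortable items (e.g. register names).
    compatible(u, v): True if u and v may share a group (e.g. one register file).
    Cost = number of non-empty groups.
    """
    rng = random.Random(seed)
    members = {i: {v} for i, v in enumerate(nodes)}   # group id -> nodes
    group = {v: i for i, v in enumerate(nodes)}       # node -> group id
    next_id = len(nodes)
    best = [sorted(g) for g in members.values()]
    t = t0
    for _ in range(steps):
        t = max(t * cooling, 1e-3)                    # geometric cooling with a floor
        v = rng.choice(nodes)
        old = group[v]
        if rng.random() < 0.1:
            target = None                             # propose opening a new group
            if len(members[old]) == 1:
                continue                              # v is already alone: no-op
            delta = 1
        else:
            target = rng.choice(sorted(members))
            if target == old or not all(compatible(v, u) for u in members[target]):
                continue                              # same group, or would break a clique
            delta = -1 if len(members[old]) == 1 else 0
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(-delta/t).
        if delta > 0 and rng.random() >= math.exp(-delta / t):
            continue
        members[old].discard(v)
        if not members[old]:
            del members[old]
        if target is None:
            target, next_id = next_id, next_id + 1
            members[target] = set()
        members[target].add(v)
        group[v] = target
        if len(members) < len(best):
            best = [sorted(g) for g in members.values()]
    return sorted(best)
```

A fixed `seed` makes runs reproducible; in a real flow, the cost function would count multiplexer inputs and buses rather than groups.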

Hardware Mapper

Test Example

Control Signal Assignment

- Design Automation Laboratory - Efficient High Level Synthesis Algorithm for Lower Power Design, 1998.5.19, 임세진, 조준동

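Contents
– High-level synthesis
– Existing high-level low-power methods
– Minimum-cost allocation algorithm (Minimum Cost Flow Algorithm)
– Scheduling for low power
– Register resource allocation method
– Experimental method and results
– Conclusion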

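The Need for Low-Power Design
- The power consumption of current ICs keeps increasing:
 - growing number of transistors on a single chip
 - increasing functional complexity of circuits
 - rising clock speeds
- Systems that require low power have recently emerged:
 - battery-powered products such as portable cellular phones, pagers, notebook computers, PDAs, and LCDs
 - ULSI microprocessors
 - parallel computers
- Other factors:
 - high cost of special cooling equipment and limited heat dissipation of circuits
 - slow improvement in battery life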
