Floorplanning Optimization with Trajectory Piecewise-Linear Model for Pipelined Interconnects C. Long, L. J. Simonson, W. Liao and L. He EDA Lab, EE Dept.


1 Floorplanning Optimization with Trajectory Piecewise-Linear Model for Pipelined Interconnects. C. Long, L. J. Simonson, W. Liao and L. He. EDA Lab, EE Dept., UCLA. DAC 2004

2 Outline – Motivation – Background – Trajectory piecewise-linear CPI model – CPI-aware floorplanning – Experiment results – Conclusion and discussions

3 Motivation Traditional design flow – Architecture optimization: minimize CPI – Floorplanning optimization: maximize clock frequency – Architecture optimization is separated from physical optimization under the assumption that layout does NOT change CPI. [Flow diagram labels: ISA, configuration; performance evaluation; architecture optimization; floorplanning optimization]

4 Traditional Flow A few years ago: – Clock rates were much lower: more time for a signal to reach its destination; inductance was less of a factor in delay – Interconnect delay was smaller: less resistance; lower aspect ratio meant less capacitance – Inter-module communication took less than one cycle: interconnect length was used to determine the clock period (just clock it faster until it doesn’t work) – Floorplanning had no impact on the cycle-by-cycle operation (CPI) of the processor

5 A New Interconnect Centric Reality Now: – Clock rates have increased by an order of magnitude: my P2 from 1998 runs at 400 MHz; the Prescott P4 will reach 4.0 GHz by the fourth quarter of ’04 and has 31 pipeline stages for integer operations, some of which are due exclusively to interconnect pipelining – Interconnects have longer delay with higher aspect ratio – Die size is the same – A signal can take up to ten clock cycles to travel from one corner of a chip to the opposite corner in 90nm technology – Inter-module communication is likely to take more than one cycle – Clock period is now a constraint, not an objective – An interconnect is pipelined when it cannot meet the constraint – A pipelined interconnect delays the cycle in which a signal arrives: this changes the cycle-by-cycle behavior (CPI) of the system, and it is determined by floorplanning

6 How to solve this problem? Evaluate performance during floorplanning optimization – Efficiency of the evaluation is the key – Cycle-accurate simulation is too slow for this purpose. [Flow diagram labels: ISA, configuration; performance evaluation; architecture optimization; floorplanning optimization]

7 Contributions of our work – We have pointed out that interconnect latency has a significant impact on architecture performance and that it is critical to consider it during floorplanning – We have developed an efficient table-based cycles-per-instruction (CPI) model, called the trajectory piecewise-linear (TPWL) model, with error less than 3.0% – We have integrated the TPWL CPI model with floorplan optimization to reduce CPI by up to 28.57% with a small area overhead of 5.72%

8 Background Architecture and partitioning – A superscalar implementation of the MIPS instruction set – Similar to Alpha 21264 – Twelve blocks

Block     Area (mm^2)
IALU1     1.00
IALU2     1.00
IALU3     1.00
IMULT     1.00
F_ADD     1.94
F_MULT    2.07
RUU       3.04
Decode    1.44
Branch    2.27
L2        75.6
IL1       8.99
DL1       10.03

9 Bus Latency Vectors Interface between physical level and architecture level – Twelve buses – Bus latency vector (B), e.g., B = {3, 4, 7, …} – Characterizes a floorplan as a vector containing the latency of each interconnect

Bus id    Terminals
1         IALU1, RUU
2         IALU2, RUU
3         IALU3, RUU
4         IMULT, RUU
5         FPADD, RUU
6         FPMUL, RUU
7         IL1, L2
8         DL1, L2
9         Branch, IL1
10        Decode, Branch
11        LSQ, DL1
12        Decode, RUU
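One way a floorplan could be turned into a bus latency vector is sketched below. The slides do not give the delay model, so everything here is an assumption made for illustration: wire length is taken as the Manhattan distance between block centers, delay is taken as linear in length (`delay_per_mm_ns`), and a bus that misses the clock period is pipelined into `ceil(delay / period)` cycles. The coordinates and delay numbers are placeholders, not values from the presentation.

```python
import math

def bus_latency_vector(centers, buses, delay_per_mm_ns, clock_period_ns):
    """Return one latency (in clock cycles) per bus for a given floorplan.

    centers: block name -> (x, y) center in mm (hypothetical input format)
    buses:   list of (block_a, block_b) terminal pairs, in bus-id order
    """
    latencies = []
    for a, b in buses:
        (xa, ya), (xb, yb) = centers[a], centers[b]
        length_mm = abs(xa - xb) + abs(ya - yb)   # Manhattan wire length
        delay_ns = length_mm * delay_per_mm_ns    # assumed linear in length
        # A bus that fits in one cycle has latency 1; longer ones are pipelined.
        latencies.append(max(1, math.ceil(delay_ns / clock_period_ns)))
    return latencies

# Placeholder floorplan with two of the twelve buses.
centers = {"IALU1": (0.0, 0.0), "RUU": (3.0, 1.0), "IL1": (0.0, 5.0), "L2": (9.0, 8.0)}
buses = [("IALU1", "RUU"), ("IL1", "L2")]
B = bus_latency_vector(centers, buses, delay_per_mm_ns=0.1, clock_period_ns=0.25)
print(B)
```

Under these assumptions, any block move that changes a center also changes B, which is how the floorplan reaches the architecture-level model.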

10 Miss Events and Performance Loss Types of miss events – Data cache miss – Instruction cache miss – TLB miss – Branch misprediction Other sources of performance loss – Data dependencies – Resource contention

11 Measuring Performance No hardware to measure, so we need a model of the hardware – Simulate the execution of the machine – Two types of simulation Trace-driven simulation – Shade to generate instruction and address traces, dinero to model the cache, etc. – Fast: 10s of instructions on the host machine per instruction on the target machine – Inaccurate: good for I-cache performance loss measurement, bad for D-cache performance loss measurement, poor for branch misprediction performance loss, very bad for data dependency performance loss Execution-driven simulation – State of the target hardware is maintained and updated in memory as each instruction is processed – Slow: ~1000s of instructions on the host machine per instruction on the target machine – Cycle-accurate: true to the cycle-by-cycle behavior of the hardware

12 Cycle Accurate Simulation Given B, compute CPI – Modify the architecture according to B: change the configuration file and insert buffers between modules – Measure CPI for a subset of the SPEC2000 benchmark suite: floating point benchmarks equake and mesa; integer benchmarks gzip, vortex and mcf – Take the arithmetic mean of the CPIs measured on these benchmarks as the CPI for B
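The averaging step above can be sketched as follows. The `simulate` callable is a hypothetical wrapper around a cycle-accurate simulator run; it is not an API from the presentation. The stub below only stands in for it so the example runs, and its numbers are placeholders rather than measured CPIs.

```python
from statistics import mean

BENCHMARKS = ["equake", "mesa", "gzip", "vortex", "mcf"]

def cpi_for_vector(simulate, bus_latencies, benchmarks=BENCHMARKS):
    """CPI for B = arithmetic mean of the per-benchmark CPIs."""
    return mean(simulate(bench, bus_latencies) for bench in benchmarks)

def fake_simulate(benchmark, bus_latencies):
    """Stand-in for a cycle-accurate run; values are made up for illustration."""
    base = {"equake": 1.0, "mesa": 0.8, "gzip": 0.9, "vortex": 1.1, "mcf": 2.2}
    return base[benchmark] + 0.05 * sum(bus_latencies)

cpi = cpi_for_vector(fake_simulate, [1] * 12)
print(round(cpi, 3))
```

Each call to `cpi_for_vector` costs one full simulation per benchmark, which is why the later slides try to limit how many vectors B are ever simulated.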

13 CPI Models A CPI model estimates CPI as a function of parameters of interest such as interconnect latency, architecture configuration, etc. CPI models in the literature – Static simulation [Nussbaum’01]: based on a single detailed simulation; generates a synthetic instruction trace; takes advantage of cache and branch prediction statistics – Statistical sampling of cycle-accurate simulation: sampling instead of truncating, i.e. selectively measuring in detail only an appropriate benchmark subset; configuring a systematic sampling simulation run to achieve a desired confidence in the estimates – Both are more efficient than cycle-accurate simulation but still slow, and neither considers interconnect latency

14 Traditional floorplanning Optimize floorplan via simulated annealing (SA) algorithm – Objective function: – Moves Change the position or shape of blocks – Cooling scheme Initial temperature Constant cooling rate
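A generic simulated annealing loop with an initial temperature and a constant cooling rate can be sketched as below. The objective function is not reproduced in this transcript, so it is passed in as the `cost` callable rather than guessed; `random_move`, the parameter names, and their default values are likewise assumptions for illustration, and the toy problem at the end is not a floorplan.

```python
import math
import random

def simulated_annealing(initial, cost, random_move, t_init=100.0, cooling=0.95,
                        t_min=1e-3, moves_per_temp=50, seed=0):
    """Minimize cost(state) with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_init
    while t > t_min:
        for _ in range(moves_per_temp):
            candidate = random_move(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse moves with prob. exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # constant cooling rate
    return best, best_cost

# Toy stand-in for a floorplan: an integer state, moved by +/-1.
best, best_cost = simulated_annealing(
    initial=20,
    cost=lambda x: (x - 3) ** 2,
    random_move=lambda x, rng: x + rng.choice((-1, 1)),
)
print(best, best_cost)
```

In a floorplanner, the state would be the floorplan representation and `random_move` would change the position or shape of a block; only `cost` needs to change to make the loop CPI-aware.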

15 Floorplanning considering CPI Based on simulated annealing – Objective function: Extend from traditional floorplanning framework Key is to estimate CPI efficiently – Moves and cooling schedule remain the same

16 Trajectory of SA The path that SA follows during optimization is a trajectory in the solution space – We only need to accurately estimate CPI in the area where the trajectory travels – The trajectory of SA with objective of area, wire length and CPI is close to that of area and wire length only. [Figure: two trajectories in the Bus1/Bus2 latency plane, one for "area and wire length" and one for "area, wire length and CPI"]

17 Trajectory Piecewise-linear CPI Model Build a piecewise-linear model for a small solution region around the trajectories of SA – Three phases: sampling, collecting and simulating – An example for a two-dimensional bus latency vector. [Figure: axes Latency (bus1) and Latency (bus2); label "simulation"]

18 TPWL: Sampling Sample a complete simulated annealing process with objective of area and total wire length to obtain a set of bus latency vectors (points in n-dimensional space). [Figure: axes Latency (bus1) and Latency (bus2)]

19 TPWL: Collecting Cover all the points obtained in the sampling phase with as few “balls” as possible (TPC problem). [Figure: axes Latency (bus1) and Latency (bus2)]
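The collecting phase can be illustrated with a simple greedy cover. This is a stand-in heuristic, not the authors' TPC solver, which the slides do not describe: the fixed `radius`, the Euclidean metric, and the restriction of ball centers to sampled points are all assumptions.

```python
import math

def greedy_ball_cover(points, radius):
    """Pick centers from `points` so every point lies within `radius` of a center."""
    uncovered = set(points)
    centers = []
    while uncovered:
        # Choose the candidate covering the most still-uncovered points
        # (sorted() only makes tie-breaking deterministic).
        best = max(sorted(uncovered),
                   key=lambda c: sum(math.dist(c, p) <= radius for p in uncovered))
        centers.append(best)
        uncovered = {p for p in uncovered if math.dist(best, p) > radius}
    return centers

# Two-bus example: six sampled latency vectors forming two clusters.
samples = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
ball_centers = greedy_ball_cover(samples, radius=1.0)
print(ball_centers)
```

Each returned center is one point that would need a cycle-accurate simulation, so fewer balls means fewer simulator runs.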

20 TPWL: Simulating Obtain CPI by cycle-accurate simulation for the centers of the “balls” – Build a CPI table indexed by these center points. [Figure: axes Latency (bus1) and Latency (bus2); label "simulation"]

21 CPI estimation under TPWL model – Based on each entry in the CPI table, the CPI of a target B can be estimated by first-order expansion – For each entry, a weight is calculated based on the distance between the target B and that entry – The final estimate is the weighted sum of the per-entry estimates. [Figure: target B surrounded by table entries B1 to B5 at distances d1 to d5]
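The estimation step can be sketched as below. The slides give neither the weighting function nor how the first-order term is obtained, so this sketch assumes that each table entry stores a gradient (sensitivity of CPI to each bus latency) and that weights are normalized inverse Euclidean distances. The table values are placeholders, not results from the presentation.

```python
import math

def tpwl_estimate(target, table, eps=1e-9):
    """Estimate CPI at `target` from (center, cpi, gradient) table entries."""
    estimates, weights = [], []
    for center, cpi, grad in table:
        # First-order expansion around this entry: CPI_i + g_i . (B - B_i)
        est = cpi + sum(g * (t - c) for g, t, c in zip(grad, target, center))
        d = math.dist(target, center)
        if d < eps:
            return est  # target coincides with a simulated point
        estimates.append(est)
        weights.append(1.0 / d)  # assumed weighting: closer entries count more
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total

# Two-bus example with two hypothetical table entries.
cpi_table = [
    ((2, 2), 1.0, (0.1, 0.05)),
    ((4, 2), 1.2, (0.1, 0.05)),
]
estimated_cpi = tpwl_estimate((3, 2), cpi_table)
print(round(estimated_cpi, 3))
```

A table lookup of this kind costs a few arithmetic operations per entry, which is what makes it cheap enough to call inside every SA move.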

22 CPI-aware Floorplanning: Overview Integrate the TPWL CPI model with a traditional floorplanning tool. [Flow diagram labels: Start; Floorplanning; Trajectory sampling; “Balls” to cover trajectory (solve the TPC problem); Cycle-accurate simulation; CPI table; Integrate into floorplanning; Floorplanning considering CPI]

23 Iterative TPWL model When the trajectory with objective of area and total wire length is significantly different from the trajectory with objective of area, total wire length and CPI, an iterative TPWL model is needed. [Figure: trajectories in the Bus1/Bus2 latency plane for "area and wire length" and "area, wire length and CPI", with iteration = 1 and iteration = 2]

24 Iterative TPWL Model Iteratively expand the CPI table to build an iterative TPWL (iTPWL) model – Based on the TPWL model, but from the second iteration on, the objective of SA is area, total wire length and CPI – Improves the accuracy of CPI estimation and the quality of the final floorplan. [Flow diagram labels: Start; Floorplanning; Trajectory sampling; “Balls” to cover trajectory (solve the TPC problem); Cycle-accurate simulation; CPI table; Integrate into floorplanning; Floorplanning considering CPI]

25 Summary on TPWL CPI Model Originally proposed for modeling non-linear systems [Rewienski’03] – Outperforms other techniques based on quadratic reduction TPWL model is suitable for floorplanning optimization – The trajectory of SA with objective of area, total wire length and CPI is close to that with objective of area and total wire length only – When these two trajectories are not close, iTPWL model is employed to improve the accuracy Contribution of this paper on TPWL model – Introduce the TPC problem – Expand TPWL model to iTPWL model

26 Experiment results Verification of CPI models – Error of TPWL model: 2.62%; Error of iTPWL model: 1.66%

27 Impact of models on final floorplans Comparison of the floorplans obtained by the access ratio, sensitivity rate, TPWL and iTPWL models with objective of area, total wire length and CPI – Access ratio: use the access ratio of interconnects to represent the impact on system performance – Estimate CPI based on first-order expansion around the original point

28 Floorplanning with iTPWL Model Comparison between floorplans obtained by different objectives

29 Running time SimpleScalar simulation times to build up the TPWL and iTPWL models

30 Conclusion and discussion – Proposed an accurate CPI model with less than 3.0% error – The CPI-aware floorplanner reduces CPI by up to 28.57% with a small area overhead of 5.72% – Expanded the TPWL model to improve the accuracy of estimation – The accuracy of the iTPWL model leads to floorplanning solutions of high quality and enables us to develop good heuristics, such as access ratio, to minimize CPI without explicit CPI calculation – Plan to apply this model to architecture changes

