Copyright 2004 David J. Lilja1 Design of Experiments Goals Terminology Full factorial designs m-factor ANOVA Fractional factorial designs Multi-factorial designs
Copyright 2004 David J. Lilja2 Recall: One-Factor ANOVA Separates total variation observed in a set of measurements into: 1. Variation within one system Due to random measurement errors 2. Variation between systems Due to real differences + random error Is variation(2) statistically > variation(1)? One-factor experimental design
Copyright 2004 David J. Lilja3 ANOVA Summary
Copyright 2004 David J. Lilja4 Generalized Design of Experiments Goals Isolate effects of each input variable. Determine effects of interactions. Determine magnitude of experimental error Obtain maximum information for given effort Basic idea Expand 1-factor ANOVA to m factors
Copyright 2004 David J. Lilja5 Terminology Response variable Measured output value E.g. total execution time Factors Input variables that can be changed E.g. cache size, clock rate, bytes transmitted Levels Specific values of factors (inputs) Continuous (~bytes) or discrete (type of system)
Copyright 2004 David J. Lilja6 Terminology Replication Completely re-run experiment with same input levels Used to determine impact of measurement error Interaction Effect of one input factor depends on level of another input factor
Copyright 2004 David J. Lilja7 Two-factor Experiments Two factors (inputs) A, B Separate total variation in output values into: Effect due to A Effect due to B Effect due to interaction of A and B (AB) Experimental error
Copyright 2004 David J. Lilja8 Example – User Response Time A = degree of multiprogramming B = memory size AB = interaction of memory size and degree of multiprogramming B (Mbytes) A
Copyright 2004 David J. Lilja9 Two-factor ANOVA Factor A – a input levels Factor B – b input levels n measurements for each input combination abn total measurements
Copyright 2004 David J. Lilja10 Two Factors, n Replications Factor A 12…j…a Factor B 1……………… 2……………… ………………… i………y ijk …… ………………… b……………… n replications
Copyright 2004 David J. Lilja11 Recall: One-factor ANOVA Each individual measurement is composition of Overall mean Effect of alternatives Measurement errors
Copyright 2004 David J. Lilja12 Two-factor ANOVA Each individual measurement is composition of Overall mean Effects Interactions Measurement errors
Copyright 2004 David J. Lilja13 Sum-of-Squares As before, use sum-of-squares identity SST = SSA + SSB + SSAB + SSE Degrees of freedom df(SSA) = a – 1 df(SSB) = b – 1 df(SSAB) = (a – 1)(b – 1) df(SSE) = ab(n – 1) df(SST) = abn - 1
Copyright 2004 David J. Lilja14 Two-Factor ANOVA
Copyright 2004 David J. Lilja15 Need for Replications If n=1 Only one measurement of each configuration Can then be shown that SSAB = SST – SSA – SSB Since SSE = SST – SSA – SSB – SSAB We have SSE = 0
Copyright 2004 David J. Lilja16 Need for Replications Thus, when n=1 SSE = 0 → No information about measurement errors Cannot separate effect due to interactions from measurement noise Must replicate each experiment at least twice
Copyright 2004 David J. Lilja17 Example Output = user response time (seconds) Want to separate effects due to A = degree of multiprogramming B = memory size AB = interaction Error Need replications to separate error B (Mbytes) A
Copyright 2004 David J. Lilja18 Example B (Mbytes) A
Copyright 2004 David J. Lilja19 Example
Copyright 2004 David J. Lilja20 Conclusions From the Example 77.6% (SSA/SST) of all variation in response time due to degree of multiprogramming 11.8% (SSB/SST) due to memory size 9.9% (SSAB/SST) due to interaction 0.7% due to measurement error 95% confident that all effects and interactions are statistically significant
Copyright 2004 David J. Lilja21 Generalized m-factor Experiments Effects for 3 factors: A B C AB AC BC ABC
Copyright 2004 David J. Lilja22 Degrees of Freedom for m- factor Experiments df(SSA) = (a-1) df(SSB) = (b-1) df(SSC) = (c-1) df(SSAB) = (a-1)(b-1) df(SSAC) = (a-1)(c-1) … df(SSE) = abc(n-1) df(SSAB) = abcn-1
Copyright 2004 David J. Lilja23 Procedure for Generalized m-factor Experiments 1. Calculate (2 m -1) sum of squares terms (SSx) and SSE 2. Determine degrees of freedom for each SSx 3. Calculate mean squares (variances) 4. Calculate F statistics 5. Find critical F values from table 6. If F(computed) > F(table), (1- α) confidence that effect is statistically significant
Copyright 2004 David J. Lilja24 A Problem Full factorial design with replication Measure system response with all possible input combinations Replicate each measurement n times to determine effect of measurement error m factors, v levels, n replications → n v m experiments m = 5 input factors, v = 4 levels, n = 3 → 3(4 5 ) = 3,072 experiments!
Copyright 2004 David J. Lilja25 Fractional Factorial Designs: n2 m Experiments Special case of generalized m-factor experiments Restrict each factor to two possible values High, low On, off Find factors that have largest impact Full factorial design with only those factors
Copyright 2004 David J. Lilja26 n2 m Experiments
Copyright 2004 David J. Lilja27 Finding Sum of Squares Terms Sum of n measurements with (A,B) = (High, Low) Factor AFactor B y AB High y Ab HighLow y aB LowHigh y ab Low
Copyright 2004 David J. Lilja28 n2 m Contrasts
Copyright 2004 David J. Lilja29 n2 m Sum of Squares
Copyright 2004 David J. Lilja30 To Summarize -- n2 m Experiments
Copyright 2004 David J. Lilja31 Contrasts for n2 m with m = 2 factors -- revisited MeasurementsContrast wawa wbwb w ab y AB +++ y Ab +-- y aB -+- y ab --+
Copyright 2004 David J. Lilja32 Contrasts for n2 m with m = 3 factors MeasContrast wawa wbwb wcwc w ab w ac w bc w abc y abc y Abc y aBc ……………………
Copyright 2004 David J. Lilja33 n2 m with m = 3 factors df(each effect) = 1, since only two levels measured SST = SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABC df(SSE) = (n-1)2 3 Then perform ANOVA as before Easily generalizes to m > 3 factors
Copyright 2004 David J. Lilja34 Important Points Experimental design is used to Isolate the effects of each input variable. Determine the effects of interactions. Determine the magnitude of the error Obtain maximum information for given effort Expand 1-factor ANOVA to m factors Use n2 m design to reduce the number of experiments needed But loses some information
Copyright 2004 David J. Lilja35 Still Too Many Experiments with n2 m ! Plackett and Burman designs (1946) Multifactorial designs Effects of main factors only Logically minimal number of experiments to estimate effects of m input parameters (factors) Ignores interactions Requires O(m) experiments Instead of O(2 m ) or O(v m )
Copyright 2004 David J. Lilja36 Plackett and Burman Designs PB designs exist only in sizes that are multiples of 4 Requires X experiments for m parameters X = next multiple of 4 ≥ m PB design matrix Rows = configurations Columns = parameters’ values in each config High/low = +1/ -1 First row = from P&B paper Subsequent rows = circular right shift of preceding row Last row = all (-1)
Copyright 2004 David J. Lilja37 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect
Copyright 2004 David J. Lilja38 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect
Copyright 2004 David J. Lilja39 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect
Copyright 2004 David J. Lilja40 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect65
Copyright 2004 David J. Lilja41 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect65-45
Copyright 2004 David J. Lilja42 PB Design Matrix ConfigInput Parameters (factors)Response ABCDEFG Effect
Copyright 2004 David J. Lilja43 PB Design Only magnitude of effect is important Sign is meaningless In example, most → least important effects: [C, D, E] → F → G → A → B
Copyright 2004 David J. Lilja44 PB Design Matrix with Foldover Add X additional rows to matrix Signs of additional rows are opposite original rows Provides some additional information about selected interactions
Copyright 2004 David J. Lilja45 PB Design Matrix with Foldover ABCDEFGExec. Time
Copyright 2004 David J. Lilja46 Case Study #1 Determine the most significant parameters in a processor simulator. [Yi, Lilja, & Hawkins, HPCA, 2003.]
Copyright 2004 David J. Lilja47 Determine the Most Significant Processor Parameters Problem So many parameters in a simulator How to choose parameter values? How to decide which parameters are most important? Approach Choose reasonable upper/lower bounds. Rank parameters by impact on total execution time.
Copyright 2004 David J. Lilja48 Simulation Environment SimpleScalar simulator sim-outorder 3.0 Selected SPEC 2000 Benchmarks gzip, vpr, gcc, mesa, art, mcf, equake, parser, vortex, bzip2, twolf MinneSPEC Reduced Input Sets Compiled with gcc (PISA) at O3
Copyright 2004 David J. Lilja49 Functional Unit Values ParameterLow ValueHigh Value Int ALUs14 Int ALU Latency2 Cycles1 Cycle Int ALU Throughput1 FP ALUs14 FP ALU Latency5 Cycles1 Cycle FP ALU Throughputs1 Int Mult/Div Units14 Int Mult Latency15 Cycles2 Cycles Int Div Latency80 Cycles10 Cycles Int Mult Throughput1 Int Div ThroughputEqual to Int Div Latency FP Mult/Div Units14 FP Mult Latency5 Cycles2 Cycles FP Div Latency35 Cycles10 Cycles FP Sqrt Latency35 Cycles15 Cycles FP Mult ThroughputEqual to FP Mult Latency FP Div ThroughputEqual to FP Div Latency FP Sqrt ThroughputEqual to FP Sqrt Latency
Copyright 2004 David J. Lilja50 Memory System Values, Part I ParameterLow ValueHigh Value L1 I-Cache Size4 KB128 KB L1 I-Cache Assoc1-Way8-Way L1 I-Cache Block Size16 Bytes64 Bytes L1 I-Cache Repl PolicyLeast Recently Used L1 I-Cache Latency4 Cycles1 Cycle L1 D-Cache Size4 KB128 KB L1 D-Cache Assoc1-Way8-Way L1 D-Cache Block Size16 Bytes64 Bytes L1 D-Cache Repl PolicyLeast Recently Used L1 D-Cache Latency4 Cycles1 Cycle L2 Cache Size256 KB8192 KB L2 Cache Assoc1-Way8-Way L2 Cache Block Size64 Bytes256 Bytes
Copyright 2004 David J. Lilja51 Memory System Values, Part II ParameterLow ValueHigh Value L2 Cache Repl PolicyLeast Recently Used L2 Cache Latency20 Cycles5 Cycles Mem Latency, First200 Cycles50 Cycles Mem Latency, Next0.02 * Mem Latency, First Mem Bandwidth4 Bytes32 Bytes I-TLB Size32 Entries256 Entries I-TLB Page Size4 KB4096 KB I-TLB Assoc2-WayFully Assoc I-TLB Latency80 Cycles30 Cycles D-TLB Size32 Entries256 Entries D-TLB Page SizeSame as I-TLB Page Size D-TLB Assoc2-WayFully-Assoc D-TLB LatencySame as I-TLB Latency
Copyright 2004 David J. Lilja52 Processor Core Values ParameterLow ValueHigh Value Fetch Queue Entries432 Branch Predictor2-LevelPerfect Branch MPred Penalty10 Cycles2 Cycles RAS Entries464 BTB Entries16512 BTB Assoc2-WayFully-Assoc Spec Branch UpdateIn CommitIn Decode Decode/Issue Width4-Way ROB Entries864 LSQ Entries0.25 * ROB1.0 * ROB Memory Ports14
Copyright 2004 David J. Lilja53 Determining the Most Significant Parameters 1. Run simulations to find response With input parameters at high/low, on/off values ConfigInput Parameters (factors)Response ABCDEFG …………………… Effect
Copyright 2004 David J. Lilja54 Determining the Most Significant Parameters 2. Calculate the effect of each parameter Across configurations ConfigInput Parameters (factors)Response ABCDEFG …………………… Effect65
Copyright 2004 David J. Lilja55 Determining the Most Significant Parameters 3. For each benchmark Rank the parameters in descending order of effect (1=most important, …) ParameterBenchmark 1Benchmark 2Benchmark 3 A3128 B29422 C267 …………
Copyright 2004 David J. Lilja56 Determining the Most Significant Parameters 4. For each parameter Average the ranks ParameterBenchmark 1Benchmark 2Benchmark 3Average A B C2675 ……………
Copyright 2004 David J. Lilja57 Most Significant Parameters NumberParametergccgzipartAverage 1ROB Entries L2 Cache Latency Branch Predictor Accuracy Number of Integer ALUs L1 D-Cache Latency L1 I-Cache Size L2 Cache Size L1 I-Cache Block Size Memory Latency, First LSQ Entries Speculative Branch Update
Copyright 2004 David J. Lilja58 General Procedure Determine upper/lower bounds for parameters Simulate configurations to find response Compute effects of each parameter for each configuration Rank the parameters for each benchmark based on effects Average the ranks across benchmarks Focus on top-ranked parameters for subsequent analysis
Copyright 2004 David J. Lilja59 Case Study #2 Determine the “big picture” impact of a system enhancement.
Copyright 2004 David J. Lilja60 Determining the Overall Effect of an Enhancement Problem: Performance analysis is typically limited to single metrics Speedup, power consumption, miss rate, etc. Simple analysis Discards a lot of good information
Copyright 2004 David J. Lilja61 Determining the Overall Effect of an Enhancement Find most important parameters without enhancement Using Plackett and Burman Find most important parameters with enhancement Again using Plackett and Burman Compare parameter ranks
Copyright 2004 David J. Lilja62 Example: Instruction Precomputation Profile to find the most common operations 0+1, 1+1, etc. Insert the results of common operations in a table when the program is loaded into memory Query the table when an instruction is issued Don’t execute the instruction if it is already in the table Reduces contention for function units
Copyright 2004 David J. Lilja63 The Effect of Instruction Precomputation Average Rank ParameterBeforeAfterDifference ROB Entries 2.77 L2 Cache Latency 4.00 Branch Predictor Accuracy 7.69 Number of Integer ALUs 9.08 L1 D-Cache Latency10.00 L1 I-Cache Size10.23 L2 Cache Size10.62 L1 I-Cache Block Size11.77 Memory Latency, First12.31 LSQ Entries12.62
Copyright 2004 David J. Lilja64 The Effect of Instruction Precomputation Average Rank ParameterBeforeAfterDifference ROB Entries 2.77 L2 Cache Latency 4.00 Branch Predictor Accuracy Number of Integer ALUs L1 D-Cache Latency L1 I-Cache Size L2 Cache Size L1 I-Cache Block Size Memory Latency, First LSQ Entries
Copyright 2004 David J. Lilja65 The Effect of Instruction Precomputation Average Rank ParameterBeforeAfterDifference ROB Entries L2 Cache Latency Branch Predictor Accuracy Number of Integer ALUs L1 D-Cache Latency L1 I-Cache Size L2 Cache Size L1 I-Cache Block Size Memory Latency, First LSQ Entries
Copyright 2004 David J. Lilja66 Case Study #3 Benchmark program classification.
Copyright 2004 David J. Lilja67 Benchmark Classification By application type Scientific and engineering applications Transaction processing applications Multimedia applications By use of processor function units Floating-point code Integer code Memory intensive code Etc., etc.
Copyright 2004 David J. Lilja68 Another Point-of-View Classify by overall impact on processor Define: Two benchmark programs are similar if – They stress the same components of a system to similar degrees How to measure this similarity? Use Plackett and Burman design to find ranks Then compare ranks
Copyright 2004 David J. Lilja69 Similarity metric Use rank of each parameter as elements of a vector For benchmark program X, let X = (x 1, x 2,…, x n-1, x n ) x 1 = rank of parameter 1 x 2 = rank of parameter 2 …
Copyright 2004 David J. Lilja70 Vector Defines a Point in n-space (y 1, y 2, y 3 ) Param #3 (x 1, x 2, x 3 ) Param #2 Param #1 D
Copyright 2004 David J. Lilja71 Similarity Metric Euclidean Distance Between Points
Copyright 2004 David J. Lilja72 Most Significant Parameters NumberParametergccgzipart 1ROB Entries412 2L2 Cache Latency244 3Branch Predictor Accuracy5227 4Number of Integer ALUs8329 5L1 D-Cache Latency778 6L1 I-Cache Size1612 7L2 Cache Size691 8L1 I-Cache Block Size Memory Latency, First LSQ Entries Speculative Branch Update28816
Copyright 2004 David J. Lilja73 Distance Computation Rank vectors Gcc = (4, 2, 5, 8, …) Gzip = (1, 4, 2, 3, …) Art = (2, 4, 27, 29, …) Euclidean distances D(gcc - gzip) = [(4-1) 2 + (2-4) 2 + (5-2) 2 + … ] 1/2 D(gcc - art) = [(4-2) 2 + (2-4) 2 + (5-27) 2 + … ] 1/2 D(gzip - art) = [(1-2) 2 + (4-4) 2 + (2-27) 2 + … ] 1/2
Copyright 2004 David J. Lilja74 Euclidean Distances for Selected Benchmarks gccgzipartmcf gcc gzip art098.6 mcf0
Copyright 2004 David J. Lilja75 Dendogram of Distances Showing (Dis-)Similarity
Copyright 2004 David J. Lilja76 Final Benchmark Groupings GroupBenchmarks IGzip,mesa IIVpr-Place,twolf IIIVpr-Route, parser, bzip2 IVGcc, vortex VArt VIMcf VIIEquake VIIIammp
Copyright 2004 David J. Lilja77 Important Points Multifactorial (Plackett and Burman) design Requires only O(m) experiments Determines effects of main factors only Ignores interactions Logically minimal number of experiments to estimate effects of m input parameters Powerful technique for obtaining a big-picture view of a lot of data
Copyright 2004 David J. Lilja78 Summary Design of experiments Isolate effects of each input variable. Determine effects of interactions. Determine magnitude of experimental error m-factor ANOVA (full factorial design) All effects, interactions, and errors
Copyright 2004 David J. Lilja79 Summary n2 m designs Fractional factorial design All effects, interactions, and errors But for only 2 input values high/low on/off
Copyright 2004 David J. Lilja80 Summary Plackett and Burman (multi-factorial design) O(m) experiments Main effects only No interactions For only 2 input values (high/low, on/off) Examples – rank parameters, group benchmarks, overall impact of an enhancement