1
Using Sampled and Incomplete Profiles
David Kaeli
Department of Electrical and Computer Engineering
Northeastern University
Boston, MA
kaeli@ece.neu.edu
2
Overview
– Trace-based simulation is expensive (caches are getting larger, CPUs and networks are getting faster)
– Approximate results are often sufficient
– How can we utilize a reduced or sampled profile and still obtain accurate modeling/simulation results?
3
How does sampling affect our metrics for evaluating trace collection methodologies?
– Speed – sampled profiles reduce speed requirements
– Memory – sampled profiles may take less space
– Accuracy – sampled profiles are less accurate
– Intrusiveness – sampling is less intrusive
– Completeness – no change
– Granularity – may affect our ability to capture fast events (Nyquist)
– Flexibility – no change
– Portability – clock speed may affect sampling accuracy
– Capacity – sampling should reduce this
– Cost – potentially less cost and less time
4
Application Areas
– Memory system performance – cache simulation, working set models, temporal and spatial locality
– CPU pipeline modeling – instruction frequencies, instruction sequences (2-at-a-time, 3-at-a-time, pipeline snapshots)
– Network simulation – input traffic distribution, queue lengths, throughput, burstiness
5
Memory Systems
– Temporal locality – recently referenced addresses are likely to be referenced again soon
– Spatial locality – addresses near recently referenced ones are likely to be referenced next
– Working set models
  – Belady (1966) – virtual memory page replacement algorithms (optimal replacement defined)
  – Denning (1980) – the pattern of page access over the execution of the program
  – Thiebaut and Stone (1986) – the number of misses incurred due to task switches can be modeled as a binomial distribution
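The working set measure can be made concrete with a short sketch (ours, not from the deck): W(t, τ) is simply the set of distinct pages touched in the most recent τ references. The sketch below assumes the trace is just an in-memory list of page numbers.

```python
from collections import Counter

def working_set_sizes(page_trace, tau):
    """Size of the working set W(t, tau): the number of distinct pages
    referenced in the window of the most recent tau references."""
    window = Counter()          # page -> reference count within the current window
    sizes = []
    for t, page in enumerate(page_trace):
        window[page] += 1
        if t >= tau:            # slide the window: the reference at t - tau falls out
            old = page_trace[t - tau]
            window[old] -= 1
            if window[old] == 0:
                del window[old]
        sizes.append(len(window))
    return sizes

# Toy page reference string (pages identified by small integers).
print(working_set_sizes([1, 2, 1, 3, 2, 4, 1, 1, 2, 5], tau=4))
```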
6
Memory Systems
– Cold start – how do we model behavior when a program starts execution for the first time?
– Warm start – how do we model behavior when a program resumes execution?
– How does memory organization (e.g., set associativity) and typical program behavior affect our ability to utilize sampled profiles effectively?
– Do we sample in time, or can we also sample in space (e.g., a set)?
7
Memory Systems
If we only capture a subset of the important addresses, can we reproduce the full trace?
– Abstract Execution (Larus 1990) – basic block traces
– Trace reduction (Smith 1977, Puzak 1985) – generate a reduced trace that contains the exact same number of misses and writebacks as the original trace (a type of filtering is performed, similar to Agarwal's Block Filter)
– Trace compaction (Samples 1989) – perform a diff on sequential addresses and only capture the important diffs
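As a rough illustration of the diff idea behind trace compaction, here is a delta-encoding sketch (an illustration only, not Samples' actual tool or format): sequential access patterns collapse into runs of small, highly compressible deltas.

```python
def delta_encode(addresses):
    """Replace each address (after the first) with its difference from the
    previous address; sequential accesses become runs of small deltas."""
    deltas = [addresses[0]]
    for prev, curr in zip(addresses, addresses[1:]):
        deltas.append(curr - prev)
    return deltas

def delta_decode(deltas):
    """Invert delta_encode to recover the original address trace."""
    addresses = [deltas[0]]
    for d in deltas[1:]:
        addresses.append(addresses[-1] + d)
    return addresses

trace = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004]
enc = delta_encode(trace)
assert delta_decode(enc) == trace
print(enc)   # [4096, 4, 4, 4088, 4]
```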
8
Can we do something simple? (Laha 1988, Fu 1994)
[Diagram: the reference stream is divided into alternating regions – ignore, sample n, sample n+1, ignore, … – with the sampling interval and the sample size marked]
sampling ratio = sample size / sampling interval
Two types of errors:
1. Sampling errors – is the ratio optimal?
2. Accurately predicting the effects of the ignored portions
But this only suggests when to sample, not what to sample…
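A minimal sketch of the periodic scheme in the diagram, assuming the trace is an in-memory list of references (the function and parameter names are ours):

```python
def periodic_samples(trace, sample_size, sampling_interval):
    """Keep the first `sample_size` references out of every `sampling_interval`
    references and ignore the rest, so the sampling ratio is
    sample_size / sampling_interval."""
    return [trace[start:start + sample_size]
            for start in range(0, len(trace), sampling_interval)]

samples = periodic_samples(list(range(100_000)),
                           sample_size=1_000, sampling_interval=10_000)
print(len(samples), len(samples[0]))   # 10 samples of 1000 references each
```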
9
Sampling Rate: What is the Nyquist Frequency for a Program?
– We must sample a program at a rate of at least twice the frequency of the event of interest
– Half the sampling frequency is termed the Nyquist frequency
– Sampling at a rate lower than twice the frequency of the event of interest can cause aliasing and distortion
– The biggest problem is that not all events of interest in a program exhibit a nice periodic pattern
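As a reminder of why the 2× rate matters: a purely periodic event sampled too slowly "folds" to a lower apparent frequency. The helper below computes that folded frequency (a textbook frequency-folding identity, nothing specific to program profiling):

```python
def apparent_frequency(f_event, f_sample):
    """Frequency at which a purely periodic event of frequency f_event appears
    after sampling at f_sample; it equals f_event only when
    f_sample > 2 * f_event (i.e., when f_event is below the Nyquist frequency)."""
    folded = f_event % f_sample
    return min(folded, f_sample - folded)

print(apparent_frequency(100.0, 250.0))   # 100.0 -> faithfully captured
print(apparent_frequency(100.0, 150.0))   # 50.0  -> aliased to a lower frequency
```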
10
Example: gprof( ) Produces 3 things – A listing of the total execution times and call counts for each of the functions in the program, sorted by decreasing time – The functions sorted according to the time they represent, including the time of their call graph descendents –Total execution in a cycle and the members in that cycle (a cycle is a back edge) gprof samples a program’s execution –Obtains exact call statistics –Does not obtain exact time measurements –Accuracy is obtained through statistical sampling –Sampling reduces the associated overhead
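gprof itself instruments calls and takes timer-driven program-counter samples; the toy sketch below only illustrates the statistical part, assuming we already have a per-tick log of which function was executing (the log and the function names are made up for illustration):

```python
import random
from collections import Counter

def sampled_time_fractions(tick_log, sampling_period):
    """Approximate each function's share of execution time by looking at the
    'currently executing function' only at every sampling_period-th tick,
    instead of timing every call exactly."""
    hits = Counter(tick_log[::sampling_period])
    total = sum(hits.values())
    return {fn: count / total for fn, count in hits.items()}

# Hypothetical per-tick log: compute dominates the run.
ticks = ["main"] * 100 + ["compute"] * 800 + ["io"] * 100
random.shuffle(ticks)
print(sampled_time_fractions(ticks, sampling_period=10))   # roughly 0.1 / 0.8 / 0.1
```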
11
When to sample: Cold start vs. Warm start
12
Sampling dimensions
We can sample in both time and space:
– Time (periodic sampling)
– Filtered sampling (e.g., using address ranges)
Time
– Periodic
– Random
– #misses, #instructions, #loads/stores
Space
– Address ranges – may not be representative of all ranges
– Cache sets – may limit the utility of the sample
– Statically tagged events – focuses on particular instructions and data of interest
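One concrete way to sample in space is set sampling: simulate only a subset of cache sets and treat their miss rate as an estimate for the whole cache. Below is a minimal LRU sketch under simple modulo indexing (the names and parameters are ours):

```python
def sampled_set_miss_rate(trace, n_sets, block_size, assoc, sampled_sets):
    """Simulate only the cache sets in `sampled_sets` (LRU replacement) and
    report the miss rate over the references that map to them."""
    lru = {s: [] for s in sampled_sets}     # per-set LRU stack of block numbers
    refs = misses = 0
    for addr in trace:
        block = addr // block_size
        s = block % n_sets
        if s not in lru:
            continue                        # reference maps outside the sampled sets
        refs += 1
        stack = lru[s]
        if block in stack:
            stack.remove(block)             # hit: move block to the MRU position
        else:
            misses += 1                     # miss: insert block, evict LRU if set is full
            if len(stack) == assoc:
                stack.pop()
        stack.insert(0, block)
    return misses / refs if refs else 0.0

# Toy stream: four sweeps over a 128-block footprint (8 KB with 64 B blocks).
trace = [i * 64 for i in range(128)] * 4
print(sampled_set_miss_rate(trace, n_sets=64, block_size=64, assoc=4,
                            sampled_sets=range(0, 64, 8)))   # 0.25: only the first sweep misses
```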
13
How do we account for unknown references?
– Assume that this behavior does not affect past/future behavior
– Assume that some percentage of past behavior is overwritten by the unknown reference behavior
  – Decay model
  – Footprints in the cache
  – MRU model
– Assume that all of the past behavior is overwritten by the unknown reference behavior
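A simple alternative to the models named above (our sketch, not one of those models) is to bound the answer: classify first-touch ("cold") references, whose residency before the sample is unknown, separately from definite misses, then report the miss rate twice, counting the cold references first as hits and then as misses.

```python
def bounded_miss_rate(sample, capacity_blocks, block_size):
    """Bound the miss rate of a sample whose initial cache contents are unknown,
    using a fully associative LRU cache for simplicity: 'sure' misses were
    brought in during the sample and later evicted; cold first touches are
    ambiguous, so count them as hits for the lower bound and as misses for
    the upper bound."""
    stack, seen = [], set()
    sure_misses = cold_refs = 0
    for addr in sample:
        block = addr // block_size
        if block in stack:
            stack.remove(block)             # definite hit within the sample
        elif block in seen:
            sure_misses += 1                # touched earlier in the sample, since evicted
        else:
            cold_refs += 1                  # first touch: prior residency is unknown
            seen.add(block)
        stack.insert(0, block)
        if len(stack) > capacity_blocks:
            stack.pop()
    n = len(sample)
    return sure_misses / n, (sure_misses + cold_refs) / n

low, high = bounded_miss_rate([i * 64 for i in range(100)] * 3,
                              capacity_blocks=128, block_size=64)
print(low, high)   # lower bound 0.0, upper bound ~0.33
```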
14
How do we model the effects of multiprogramming?
– Flush all tables (caches, branch predictors, TLBs, load buffers, etc.)
– Estimate interference using a model
  – Invalidate some percentage of all entries based on the relative time since last execution
  – Utilize working set models to estimate the effect of the interference
– Allow aliasing to occur where appropriate (branch predictors, but not caches)
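As one deliberately simple reading of the "invalidate some percentage of entries based on time since last execution" option, here is a sketch in which each resident entry survives a context switch with exponentially decaying probability (the decay_rate constant is a made-up knob, not from the slides):

```python
import math
import random

def invalidate_fraction(cache_tags, time_since_last_run, decay_rate):
    """Model multiprogramming interference by invalidating each cache entry
    with a probability that grows with the time the task was switched out
    (an exponential-decay style model)."""
    survive_prob = math.exp(-decay_rate * time_since_last_run)
    return [tag if random.random() < survive_prob else None
            for tag in cache_tags]

# Hypothetical: 1024 resident blocks, task switched out for 5 time units.
tags = list(range(1024))
after = invalidate_fraction(tags, time_since_last_run=5.0, decay_rate=0.2)
print(sum(t is None for t in after), "of", len(after), "entries invalidated")
```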
15
Sampled Instruction Execution Profiles
– Instruction frequencies
  – For SPEC92int programs on an IA32 CPU: 43% ALU, 22% loads, 12% stores, 23% control flow (H&P AQA)
– Instruction sequences
  – Top pairs
  – Top triples
  – Continue up to the average basic block size
– Branches in the pipeline
  – Sampled versus modeled
16
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
– We can capture distributions of the distance between events
– We can then compute the probability of different events occurring in a pipeline of length n (think of n as a window of execution)
– We can also compute the conditional probabilities of multiple events occurring in the pipeline of length n
– We can then assign weights to each of these multiple events to compute the throughput (IPC) of a pipeline
– Our model uses a random marked point process (time between events) and produces very accurate estimates of pipeline throughput
17
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
Some constraints are assumed for the sake of simplicity:
– Inter-branch times are independent
– Inter-branch times are identically distributed
– The delay due to taken branches is constant
18
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
19
– An analytical formula is used to compute approximate CPI and speedup measures for an n-stage pipelined processor
– Traces are captured from benchmark execution
– The distribution of the number of instructions between successive taken branches is computed using a “window of execution” filter
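The following is not the paper's marked-point-process derivation, only a first-order sketch under the same simplifying assumptions (IID inter-branch distances, constant taken-branch delay c): each taken branch adds c stall cycles, so CPI ≈ 1 + c / E[instructions per taken branch]. The trace format (a per-instruction taken-branch flag) and the window cap are assumptions of ours.

```python
from collections import Counter

def interbranch_distribution(taken_flags, window):
    """Distribution of the number of instructions between successive taken
    branches; distances longer than `window` are lumped into the last bucket
    (a crude 'window of execution' cap)."""
    gaps = Counter()
    since_last = 0
    for is_taken_branch in taken_flags:
        since_last += 1
        if is_taken_branch:
            gaps[min(since_last, window)] += 1
            since_last = 0
    total = sum(gaps.values())
    return {gap: count / total for gap, count in gaps.items()}

def approx_cpi(gap_distribution, taken_branch_delay):
    """First-order CPI estimate: one cycle per instruction plus a constant
    delay per taken branch, amortized over the mean inter-branch distance."""
    mean_gap = sum(gap * p for gap, p in gap_distribution.items())
    return 1.0 + taken_branch_delay / mean_gap

# Toy trace: a taken branch retires roughly every 6th instruction.
flags = [(i % 6) == 5 for i in range(6000)]
dist = interbranch_distribution(flags, window=8)
print(approx_cpi(dist, taken_branch_delay=3))   # 1.5
```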
20
Analytical Models of Workload Behavior Squillante and Kaeli, 1997
21
– For a pipeline length of 8, the obtained results are quite precise for three out of the five benchmarks
– For bubblesort and prime, the IID assumption is violated, introducing some inaccuracies in our model
– Future work looks at handling these inaccuracies by incorporating multi-level conditional probabilities into our model
22
Capturing n-length Instruction Sequences
– Sampling over n sequential instructions
– Capturing the most frequently executed sequences
– Utilizing these sequences to drive pipeline design
– Capturing longer profiles may allow us to design hardware trace caches
– Gonzalez, Tubella and Molina describe a mechanism for profiling both instructions and operand values
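A small sketch of harvesting the most frequent length-n sequences from a sampled instruction stream (the opcode mnemonics below are made up; n = 2 gives pairs, n = 3 triples):

```python
from collections import Counter

def top_sequences(opcodes, n, k):
    """Most frequent length-n opcode sequences (n-grams) in a sampled
    instruction stream."""
    grams = Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return grams.most_common(k)

# Hypothetical sampled opcode stream.
stream = ["load", "add", "store", "load", "add", "store", "branch",
          "load", "add", "add", "store", "branch"]
print(top_sequences(stream, n=2, k=3))   # e.g. [(('load', 'add'), 3), ...]
```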