Analytic Evaluation of Shared-Memory Systems with ILP Processors
  D.J. Sorin, V.S. Pai, S.V. Adve, M.K. Vernon, D.A. Wood
  Presented by Bogdan Romanescu
Introduction
  Motivation: simulating shared-memory systems with ILP processors takes painfully long
  Hypothesis: the system can be described by a set of equations that
    have simple parameters
    capture system details
  Method: view the memory system as a system of queues and delay centers
  Metric: processor throughput
System under test
  Cache-coherent shared-memory multiprocessor
  Mesh interconnection
  Processor
    multiple issue
    out-of-order scheduling
    non-blocking loads
    speculative execution
  L1 and L2 caches
    state tracking
    miss status holding registers (MSHRs)
  Interleaved memory and directory
Model parameters
  Architecture parameters
    number of nodes
    number of MSHRs
    network interface (NI), bus and switch occupancies
  Application parameters
    ILP parameters: , CV, fsynch-write, fM
    Directory coherence parameters: Pread, Pwrite, Pupgrade, Pwb, PL|x, PM|x,y, P3hop|x¬-memory, H, X
Estimating parameters
  Non-ILP-dependent parameters: fast simulators for multiprocessors with single-issue, in-order processors
  ILP-dependent parameters: FastILP simulator
    Timestamping
    Division into "eras"
    Trace-driven simulation
Analytical model
  Output measure: system throughput (IPC) as f(input parameters, system architecture)
  Iterates between two submodels
    Synchronous blocking (SB) model: processor stalled on loads and read-modify-writes
    MSHR blocking (MB) model: processor stalled because all MSHRs are full
  Mean value analysis (MVA) equations used to compute delays
  Synchronization (locks and barriers) accounted for separately
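
To make the MVA step concrete, here is a minimal sketch of the textbook exact MVA recursion for a single-class closed network of queueing centers plus a delay center. It is not the paper's customized equations. The function name, argument names and the example numbers are all hypothetical and chosen only for illustration.

```python
def exact_mva(service_demands, delay, n_customers):
    """Textbook exact MVA for a single-class closed queueing network.

    service_demands: mean service demand at each queueing center
                     (hypothetical stand-ins for NI, bus, memory, ...)
    delay:           total mean time spent at delay centers
    n_customers:     number of circulating customers (at least 1)
    Returns (throughput, residence_times, queue_lengths).
    """
    if n_customers < 1:
        raise ValueError("n_customers must be at least 1")

    queue_lengths = [0.0] * len(service_demands)
    throughput = 0.0
    residence_times = [0.0] * len(service_demands)

    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        residence_times = [d * (1.0 + q)
                           for d, q in zip(service_demands, queue_lengths)]
        # Little's law applied to the whole network.
        throughput = n / (delay + sum(residence_times))
        # Little's law applied to each center.
        queue_lengths = [throughput * r for r in residence_times]

    return throughput, residence_times, queue_lengths


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    x, r, q = exact_mva([1.0], delay=1.0, n_customers=2)
    print(x, r, q)
```

The recursion builds the solution one customer at a time, which is why the cost grows only linearly in the population size and the number of centers.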
Equations
  Average round-trip time (SB)
  Total average residence time at the NI out queue
  Total mean delay for each type of synchronous transaction at the local NI
  Utilization of the local NI queue
  Average waiting time at the local NI queue due to traffic from remote nodes
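
Quantities of this kind are usually assembled from two textbook operational laws. As a reminder only, and not as the paper's own equations, they are stated here with generic symbols that are assumptions of this sketch: \lambda is the rate of requests arriving at a queue, S the mean service time (occupancy) per request, U the utilization, R the mean residence time (waiting plus service), and N the mean number of requests present.

```latex
U = \lambda \, S \qquad \text{(utilization law)}

N = \lambda \, R \qquad \text{(Little's law)}
```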
Model validations
  Better approximation for the residual life
  Account for significant fsynch-write
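
For reference, the standard renewal-theory result for the mean residual life of a service time with mean S and coefficient of variation c is S(1 + c^2)/2. The sketch below evaluates only that textbook formula. It does not reproduce the paper's improved approximation, and the function and argument names are hypothetical.

```python
def mean_residual_life(mean_service, cv):
    """Mean residual service time seen by a random arrival.

    Standard result: E[S^2] / (2 E[S]) = mean * (1 + cv^2) / 2,
    where cv is the coefficient of variation of the service time.
    """
    if mean_service < 0 or cv < 0:
        raise ValueError("mean_service and cv must be non-negative")
    return 0.5 * mean_service * (1.0 + cv ** 2)


if __name__ == "__main__":
    # Deterministic service (cv = 0) gives half the service time;
    # exponential service (cv = 1) gives the full mean.
    print(mean_residual_life(10.0, 0.0))
    print(mean_residual_life(10.0, 1.0))
```

The two printed cases bracket the range between fixed and exponentially distributed occupancies, which is where the choice of approximation matters.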
Applications
  Insights into application behavior
    fM: ability to exploit ILP to overlap read memory requests
    CV: degree of burstiness
  Evaluation of the impact of the number of MSHRs
  Benefits of coupled/decoupled memory and directories
  Analysis of the impact of programmable coherence controllers
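
To illustrate why CV reads as burstiness, here is a small sketch that assumes CV is the coefficient of variation (standard deviation divided by mean) of the time between memory requests. The function name and the sample intervals are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(intervals):
    """Population standard deviation divided by the mean of the samples."""
    m = mean(intervals)
    if m == 0:
        raise ValueError("mean interval must be non-zero")
    return pstdev(intervals) / m


if __name__ == "__main__":
    steady = [5, 5, 5, 5]    # evenly spaced requests
    bursty = [1, 1, 1, 17]   # three back-to-back requests, then a long gap
    print(coefficient_of_variation(steady))
    print(coefficient_of_variation(bursty))
```

Both sample streams have the same mean interval, so only the CV distinguishes the evenly spaced stream from the clustered one.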
Questions
  Is "mean time" a representative measure? How misleading can it be?
  Residual life: even with interpolation, is it accurate enough?
  Why do the errors go up even after using the two accuracy-increasing observations?