
Evaluation of SimPoint for Specific Architectural Studies
Veynu Narasiman, Aater Suleman
May 3, 2005

Outline
  Motivation
  Project Goals
  Methodology
  Preliminary Results
  Analysis
  Conclusion

Motivation
  Architects need to know the performance improvement of a particular enhancement
  The sooner the better
  There is a need to reduce simulation time
  Accuracy should not be compromised
  SimPoint attempts to solve this problem
  Many architects are hesitant to use SimPoint

SimPoint
  Reduces the number of instructions to be simulated
  Divides the entire application into fixed-length slices and chooses the most representative slices
  Uses the basic block execution behavior of each slice as the selection criterion
  Detailed information can be found at: http://www.simpoint.com
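The selection idea on this slide can be sketched in C. This is a minimal illustration, not SimPoint's implementation: it assumes each slice is summarized by a basic block vector (per-block execution counts), treats all slices as a single group, and picks the slice closest to the average vector. The real tool clusters slices into several groups first; the names NUM_BLOCKS, bbv_normalize and most_representative_slice are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 4  /* hypothetical number of static basic blocks */

/* Scale one slice's basic block counts so the entries sum to 1. */
void bbv_normalize(const double *counts, double *out)
{
    double total = 0.0;
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        total += counts[b];
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        out[b] = (total > 0.0) ? counts[b] / total : 0.0;
}

/*
 * bbv holds num_slices normalized vectors stored back to back.
 * Returns the index of the slice whose vector has the smallest
 * Manhattan distance to the mean vector (single-group assumption).
 */
size_t most_representative_slice(const double *bbv, size_t num_slices)
{
    assert(num_slices > 0);

    double centroid[NUM_BLOCKS] = {0.0};
    for (size_t s = 0; s < num_slices; s++)
        for (size_t b = 0; b < NUM_BLOCKS; b++)
            centroid[b] += bbv[s * NUM_BLOCKS + b] / (double)num_slices;

    size_t best = 0;
    double best_dist = -1.0;
    for (size_t s = 0; s < num_slices; s++) {
        double dist = 0.0;
        for (size_t b = 0; b < NUM_BLOCKS; b++) {
            double d = bbv[s * NUM_BLOCKS + b] - centroid[b];
            dist += (d < 0.0) ? -d : d;
        }
        if (best_dist < 0.0 || dist < best_dist) {
            best_dist = dist;
            best = s;
        }
    }
    return best;
}
```

With several groups, the same distance-to-centroid step would be applied once per group, and each chosen slice would carry a weight proportional to the size of its group.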

Project Goals
  Evaluate the accuracy of SimPoint for:
    Prefetching
      Compare the actual performance improvement to that estimated using SimPoint
    Branch Prediction
      Compare the overall actual prediction rate to that estimated using SimPoint
      Evaluate SimPoint's ability to capture branches that exhibit a certain kind of phase behavior
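The comparison in these goals can be sketched as follows. This is a rough illustration under assumptions the slides do not state: the SimPoint estimate is taken to be a weighted average of a metric measured on each chosen slice, with one weight per slice, and accuracy is reported as relative error against the full run. The function names are hypothetical. For rates such as prediction accuracy, averaging per-slice rates is a simplification; combining raw hit and miss counts would be more exact.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted average of a per-slice metric; weights need not sum to 1. */
double simpoint_estimate(const double *metric, const double *weight, size_t n)
{
    assert(n > 0);
    double weighted_sum = 0.0;
    double weight_total = 0.0;
    for (size_t i = 0; i < n; i++) {
        weighted_sum += metric[i] * weight[i];
        weight_total += weight[i];
    }
    assert(weight_total > 0.0);
    return weighted_sum / weight_total;
}

/* |estimated - actual| / |actual|, where actual comes from the full run. */
double relative_error(double actual, double estimated)
{
    assert(actual != 0.0);
    double diff = estimated - actual;
    if (diff < 0.0)
        diff = -diff;
    double magnitude = (actual < 0.0) ? -actual : actual;
    return diff / magnitude;
}
```

A positive difference between estimated and actual improvement corresponds to SimPoint overestimating the benefit of the enhancement.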

Methodology
  Use the PIN instrumentation tool
  Simulate the SPECINT suite with the reference input
  Prefetch tool
    32KB L1 cache, 1MB L2 cache
    32-byte cache line size
    32-way associative caches with round-robin replacement
    Stream prefetcher from PowerPC
    Measure L2 cache statistics
  Branch prediction tool
    GSHARE predictor with 8192-entry Pattern History Table
    Measure prediction statistics
  Slice size: 100 million instructions
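The branch prediction tool can be illustrated with a small gshare model. This is a sketch, not the authors' PIN tool: the slides give only the 8192-entry Pattern History Table, so the 2-bit saturating counters, the 13-bit global history, the initial counter value, and the use of the low PC bits without shifting are all assumptions, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PHT_ENTRIES 8192u               /* from the slide */
#define PHT_MASK    (PHT_ENTRIES - 1u)  /* 13 index bits */

typedef struct {
    uint8_t  pht[PHT_ENTRIES];  /* assumed 2-bit saturating counters, 0..3 */
    uint32_t ghr;               /* assumed 13-bit global history register */
    uint64_t branches;          /* conditional branches seen */
    uint64_t correct;           /* correct predictions */
} gshare_t;

void gshare_init(gshare_t *g)
{
    memset(g->pht, 1, sizeof g->pht);  /* assumed start: weakly not-taken */
    g->ghr = 0;
    g->branches = 0;
    g->correct = 0;
}

/* Predict one branch, train on its outcome, return true if predicted right. */
bool gshare_predict_and_update(gshare_t *g, uint64_t pc, bool taken)
{
    /* gshare: XOR the branch address with the global history. */
    uint32_t idx = ((uint32_t)pc ^ g->ghr) & PHT_MASK;
    bool predicted_taken = g->pht[idx] >= 2;

    if (taken && g->pht[idx] < 3)
        g->pht[idx]++;
    else if (!taken && g->pht[idx] > 0)
        g->pht[idx]--;

    /* Shift the outcome into the history, keeping 13 bits. */
    g->ghr = ((g->ghr << 1) | (taken ? 1u : 0u)) & PHT_MASK;

    g->branches++;
    if (predicted_taken == taken)
        g->correct++;
    return predicted_taken == taken;
}
```

Calling gshare_predict_and_update for every conditional branch in a slice and dividing correct by branches gives that slice's prediction rate, which is the statistic the methodology measures.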

Prefetching Data

Analysis of Bzip

Branch Prediction Data

Branch Behavior
  do {
      c1 = block[i1]; c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1]; s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;
      .

Conclusion
  SimPoint reduces simulation time
  Prefetching accuracy
    Improvement captured for applications with high hit ratios
    Improvement overestimated for bzip
  Branch prediction accuracy
    Overall branch prediction accuracy captured
    Individual branch phase behavior to be determined

Questions?