Dynamic Runtime Testing for Cycle-Accurate Simulators Saša Tomić, Adrián Cristal, Osman Unsal, Mateo Valero Barcelona Supercomputing Center (BSC) Universitat.

Presentation transcript:

Dynamic Runtime Testing for Cycle-Accurate Simulators
Saša Tomić, Adrián Cristal, Osman Unsal, Mateo Valero
Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (UPC)

Can we trust the simulator-based evaluations?
Typical simulator evaluation:
– Make a simulator
– REPEAT { Debug; Simulate } UNTIL the results make sense (intuition!)
– Discard and ignore the failed simulations
Are there any bugs left?

Verifying the simulators
Verification is important!
– Industry invests significant resources: testing and verification account for 50-70% of the costs, and even 90% for mission-critical applications
– Academia invests fewer resources
Why do we have bugs?
– Simulators are complex
– Proposed extensions are often complex
– The extensions may uncover existing bugs

Simulator bugs
Timing bugs
– Incorrect estimation of the execution time
– Simulation terminates without obvious errors
– Need other types of testing and verification
Functional bugs (our target)
– Incorrect implementation of functional units
– Simulations may or may not terminate correctly

Outline
– Examples of functional bugs
– An overview of the Dynamic Simulator Testing methodology
– Use cases of Dynamic Simulator Testing
– Performance evaluation
– Conclusions

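Example: a bug in the cache coherence protocol
[Diagram: a simulator of multi-level coherent caches with processor 1, processor 2, and shared memory. X is initially 0 in shared memory. Processor 1 executes X += 10 and processor 2 executes X += 20; each reads X=0, so they produce X=10 and X=20. The timeline shows X going from 0 to 30.]
Bug: X should be 30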

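Example: a bug in the HTM
Program: X = 0; Atomic { X += 10; }
[Diagram: an HTM simulator with one processor and shared memory. The processor executes X += 10 and holds X=10, but shared memory still holds X=0 after the atomic block (nothing is written back?).]
Bug: X = 10 is not committed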

Detecting functional bugs
The functionality of the simulators is often simple
– It can be emulated with simple emulators
– The emulators can be fast and stable
Can we take advantage of the emulators?

Dynamic Testing Methodology
– Add a simple, no-timing emulator
– Execute each operation in the simulator and then in the emulator
– Compare the executions (we compared only the memory accesses)
– The execution must be identical during the entire simulation
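
The methodology above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the authors' implementation: the class names ReferenceMemory and DynamicChecker and the hooks on_store/on_load are hypothetical, memory is modeled as word-sized values in a std::map, and untouched memory is assumed to read as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical no-timing emulator: a flat address -> value dictionary.
class ReferenceMemory {
public:
    // Apply a store to the reference state.
    void store(std::uint64_t addr, std::uint64_t value) { mem_[addr] = value; }

    // Value a load is expected to return; untouched memory reads as 0 (assumption).
    std::uint64_t load(std::uint64_t addr) const {
        auto it = mem_.find(addr);
        return it == mem_.end() ? 0 : it->second;
    }

private:
    std::map<std::uint64_t, std::uint64_t> mem_;
};

// Hypothetical checker that the timing simulator would call after each memory access.
class DynamicChecker {
public:
    // The simulator finished a store: replay it in the emulator.
    void on_store(std::uint64_t addr, std::uint64_t value) { ref_.store(addr, value); }

    // The simulator finished a load: compare its value with the emulator's.
    // Returns false and counts a mismatch when the two executions diverge.
    bool on_load(std::uint64_t addr, std::uint64_t simulated_value) {
        const bool ok = (ref_.load(addr) == simulated_value);
        if (!ok) ++mismatches_;
        return ok;
    }

    std::size_t mismatches() const { return mismatches_; }

private:
    ReferenceMemory ref_;
    std::size_t mismatches_ = 0;
};
```

In the cache coherence example, processor 1 would report a store of X=10 through on_store; if processor 2 then loads a stale X=0 from its cache, on_load returns false at that very access, long before the wrong final value could be noticed.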

An overview of dynamic testing
[Diagram: the timing simulator and the simple no-timing emulator receive the same input, and their outputs are compared.]
– Use the same input
– Compare the outputs
– Repeat for every operation!

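Dynamic testing of a cache coherence protocol
[Diagram: a timing simulator of multi-level coherent caches (processor 1 executing X += 10, processor 2 executing X += 20, shared memory) runs next to an STL map that acts as the emulator. The input and output of every memory access are applied to both. Processor 1 reads X=0 and writes X=10, so the map holds X=10; processor 2 then reads X=0 from the simulator.]
Check failed: should be X=10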

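Dynamic testing of an HTM
Program: TX_Begin; X += 10; TX_Commit;
[Diagram: a timing simulator of an HTM (one processor, shared memory) runs next to an emulator that keeps one STL map per transaction. The inputs and outputs of each access are applied to both. After the commit the emulator holds X=10, while the simulator's shared memory does not.]
Check failed: should commit X=10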
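
A minimal sketch of the "STL map per TX" idea follows. The names HtmEmulator, tx_begin, tx_write, tx_read, tx_commit, tx_abort, and check are hypothetical, and the emulator buffers writes until commit purely for simplicity; this choice is an assumption and says nothing about the version management of the simulated HTM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Addr = std::uint64_t;
using Word = std::uint64_t;

// Hypothetical functional HTM emulator: one write buffer (std::map) per transaction.
class HtmEmulator {
public:
    // Start a transaction with an empty write buffer.
    void tx_begin(int tx) { write_sets_[tx].clear(); }

    // Buffer a transactional write; committed memory stays unchanged.
    void tx_write(int tx, Addr a, Word v) { write_sets_[tx][a] = v; }

    // A transaction sees its own buffered writes first, then committed memory.
    Word tx_read(int tx, Addr a) const {
        auto ws = write_sets_.find(tx);
        if (ws != write_sets_.end()) {
            auto it = ws->second.find(a);
            if (it != ws->second.end()) return it->second;
        }
        return committed(a);
    }

    // Publish all buffered writes to committed memory and drop the buffer.
    void tx_commit(int tx) {
        auto ws = write_sets_.find(tx);
        if (ws == write_sets_.end()) return;
        for (const auto& kv : ws->second) memory_[kv.first] = kv.second;
        write_sets_.erase(ws);
    }

    // Discard the buffered writes.
    void tx_abort(int tx) { write_sets_.erase(tx); }

    // Committed value; untouched memory reads as 0 (assumption).
    Word committed(Addr a) const {
        auto it = memory_.find(a);
        return it == memory_.end() ? 0 : it->second;
    }

    // Compare a value taken from the simulator's shared memory with the emulator.
    bool check(Addr a, Word simulated) const { return committed(a) == simulated; }

private:
    std::map<Addr, Word> memory_;
    std::map<int, std::map<Addr, Word>> write_sets_;
};
```

For the slide's program, the emulator holds X=10 after tx_commit, so check fails if the simulator's shared memory still holds X=0.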

Other Use Cases
Out-of-order or pipelined processor
– With a processor emulator, e.g., QEMU
Complex memory hierarchy
– With an STL map
Incoherent multilevel memory hierarchy
– With multiple STL maps, one per memory hierarchy
System-on-Chip, routing protocols, etc.
– Simple emulators of the functionalities

Performance Evaluation
– Implemented on 4 HTMs with lazy and eager version management
– Implemented for a directory-based cache-coherence protocol
– Baseline: M5 full-system simulator

Performance evaluation: OS booting [figure]

Performance evaluation: applications [figure]

Our experience with Dynamic Testing
– Reduced the time spent on writing tests
– Faster debugging: detects most bugs "in minutes", and eliminating a bug takes tens of minutes instead of hours/days/weeks/…
– Shortened the total simulator development from months to 3-4 months

Conclusions
– Presented Dynamic Simulator Testing
– It detects functional bugs in cycle-accurate simulators
– It causes a modest reduction of simulator performance

Thanks!
Sasha Tomić
Dynamic Runtime Testing for Cycle-Accurate Simulators

Cache/HTM emulator implementation
[Diagram: an STL map (dictionary) that maps each address to its line data.]
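
One way to read the address-to-line-data dictionary is sketched below. The class name LineMemory, the 64-byte line size, and the byte-level interface are assumptions made for illustration; the slide only states that an STL map keyed by address holds the line data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Assumed cache line size in bytes (not given in the slides).
constexpr std::size_t kLineBytes = 64;

// Hypothetical emulator storage: line number -> line data.
class LineMemory {
public:
    // Write one byte; the enclosing line is created zero-filled on first touch.
    void write_byte(std::uint64_t addr, std::uint8_t value) {
        lines_[addr / kLineBytes][addr % kLineBytes] = value;
    }

    // Read one byte; lines that were never written read as 0 (assumption).
    std::uint8_t read_byte(std::uint64_t addr) const {
        auto it = lines_.find(addr / kLineBytes);
        return it == lines_.end() ? 0 : it->second[addr % kLineBytes];
    }

    // Number of distinct lines stored so far.
    std::size_t lines_touched() const { return lines_.size(); }

private:
    std::map<std::uint64_t, std::array<std::uint8_t, kLineBytes>> lines_;
};
```

Storing whole lines keeps the dictionary sparse: only lines the workload actually touches occupy memory, so the emulator stays small even for a large simulated address space.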