SoftSig: Software-Exposed Hardware Signatures for Code Analysis and Optimizations UIUC – ASPLOS 2008 by Evangelos Vlachos.

Motivation
- Runtime disambiguation of sets of addresses
  - Multiple hardware watchpoints
  - Inter-thread dependencies (e.g., TLS, TM)
- Compile-time analysis is too limited
- Previous solutions
  - Compare an address to an associative structure
  - Operate on sets of addresses (signatures)

Propose
- Hardware signatures
  - Perform multiple operations at a time
  - Software interface has been too simple so far (TM)
- Expose hardware signatures to software
- Software flexibility: software decides on
  - Memory accesses to collect
  - Memory accesses to disambiguate against
- Signature Register File (SRF) and a sophisticated ISA
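A minimal software sketch of what a signature is, assuming a Bloom-filter-style bit vector. The 1 Kbit width, the 64-byte line granularity, the two multiplicative hashes, and the `Signature` class name are all illustrative assumptions, not the encoding used by the SoftSig hardware.

```python
SIG_BITS = 1024      # assumed signature width (1 Kbit)
LINE_SHIFT = 6       # assumed 64-byte line granularity
MASK64 = (1 << 64) - 1
# Arbitrary odd 64-bit constants used as hypothetical hash multipliers.
MULTIPLIERS = (0x9E3779B97F4A7C15, 0xC2B2AE3D27D4EB4F)


class Signature:
    """Bloom-filter-style summary of a set of addresses (hypothetical model)."""

    def __init__(self):
        self.vector = 0  # SIG_BITS-wide bit vector held in a Python int

    def _bits(self, addr):
        """Return the mask of bit positions that encode this address."""
        line = addr >> LINE_SHIFT
        mask = 0
        for m in MULTIPLIERS:
            index = ((line * m) & MASK64) >> 54  # top 10 bits: 0..1023
            mask |= 1 << index
        return mask

    def insert(self, addr):
        """Collect one address into the signature."""
        self.vector |= self._bits(addr)

    def may_contain(self, addr):
        """False means definitely absent; True may be a false positive."""
        bits = self._bits(addr)
        return (self.vector & bits) == bits

    def may_intersect(self, other):
        """Conservative set intersection: never misses a shared address."""
        return (self.vector & other.vector) != 0
```

The point of the encoding is that inserting, testing membership, and intersecting whole sets are each a handful of bitwise operations, regardless of how many addresses were collected.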

Background
1) 2)
Memoization
- Don't compute the same result again; remember it

Examples

Design Guidelines
- G1: Minimize SR accesses and copies
  - SR size = 1 Kbit
  - Context switch: discard SRs
  - Never spill an SR to the stack
- G2: Manage the SRF through dynamic allocation
  - Limited number of SRs
  - Hard-to-predict lifetimes
- G3: Imprecision should never compromise correctness
  - Software that uses an SR must be able to overcome false positives
- G4: Manage imprecision to provide the most efficiency
  - Shorter ranges and filtering of some addresses
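G3 and G4 can be illustrated with a small sketch in which a signature conflict is treated only as a hint: a reported conflict forces a safe recomputation, so a false positive costs time but never changes the result. The modulo hash, the `bits` parameter, and the function names are hypothetical and chosen only to make imprecision easy to dial up.

```python
def make_signature(addrs, bits):
    """Encode a set of addresses into a 'bits'-wide vector (toy modulo hash)."""
    sig = 0
    for a in addrs:
        sig |= 1 << (a % bits)
    return sig


def may_conflict(sig, addr, bits):
    """True if addr might be in the set; false positives are possible."""
    return bool(sig & (1 << (addr % bits)))


def sum_with_reuse(memory, read_addrs, writes, bits):
    """Sum memory[read_addrs], apply writes, then produce the sum again.

    The first sum is reused only if no write may conflict with the
    addresses that were read; otherwise the sum is recomputed (safe path).
    Returns (second_sum, reused).
    """
    memory = dict(memory)  # work on a copy
    first = sum(memory[a] for a in read_addrs)
    sig = make_signature(read_addrs, bits)

    conflict = False
    for addr, value in writes:
        memory[addr] = value
        if may_conflict(sig, addr, bits):
            conflict = True

    if conflict:
        return sum(memory[a] for a in read_addrs), False
    return first, True
```

With a 1-bit signature every write looks like a conflict, so the sum is always recomputed: still correct, just never faster. A wider signature keeps the same results and lets unrelated writes pass without triggering the safe path, which is the efficiency side of G4.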

ISA extensions

SoftSig Architecture

Collection & Local Disambiguation
- bcollect or bdisamb.loc
  - Notify the LSQ to send addresses to the SPM
  - If there is no conflict, the instruction can retire
- ecollect and edisamb.loc
  - Stop collecting and disambiguating addresses
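The begin/end pairs above can be pictured with a toy software model. Only the instruction names come from the slides; the `SignatureRegister` and `CoreModel` classes, the single modulo hash, and the choice to record a conflict as a flag on the SR are assumptions made for illustration.

```python
class SignatureRegister:
    """Toy SR: a bit vector plus a conflict flag (hypothetical model)."""

    def __init__(self, bits=1024):
        self.bits = bits
        self.vector = 0
        self.conflict = False

    def _bit(self, addr):
        return 1 << ((addr >> 6) % self.bits)  # assumed 64-byte lines

    def add(self, addr):
        self.vector |= self._bit(addr)

    def may_contain(self, addr):
        return bool(self.vector & self._bit(addr))


class CoreModel:
    """Routes each local memory access to the SRs that are currently active."""

    def __init__(self):
        self.collecting = []      # SRs between bcollect and ecollect
        self.disambiguating = []  # SRs between bdisamb.loc and edisamb.loc

    def bcollect(self, sr):
        self.collecting.append(sr)

    def ecollect(self, sr):
        self.collecting.remove(sr)

    def bdisamb_loc(self, sr):
        self.disambiguating.append(sr)

    def edisamb_loc(self, sr):
        self.disambiguating.remove(sr)

    def access(self, addr):
        """Model one load or store issued by the local thread."""
        for sr in self.disambiguating:
            if sr.may_contain(addr):
                sr.conflict = True  # assumed reporting mechanism
        for sr in self.collecting:
            sr.add(addr)
```

A typical use under these assumptions: collect the addresses touched by one code region, stop collecting, then disambiguate later accesses against the collected set and inspect `sr.conflict` afterwards.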

Remote Disambiguation
- When is an address disambiguated?
- ICD = In-flight Conflict Detector
- What about cache displacements?

Example: Memoization Framework
- Identify redundant calls
  1. Remember inputs and outputs
  2. Collect implicit inputs and outputs
  3. Check whether implicit inputs/outputs get modified
  4. Skip the next call if no conflict is found (memoized)

Example: Memoization Framework
- Prologue: can the function call be avoided?
  - Compare explicit inputs/outputs with the memoized ones
  - See if there was a conflict
- Setup: the call cannot be avoided
  - Remember inputs/outputs
  - Allocate an SR
- Epilogue
  - Finish setting up
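The prologue/setup/epilogue flow can be sketched in software as follows. This is a simplified model under stated assumptions: `TrackedMemory`, `MemoEntry`, and `memo_call` are hypothetical names, the signature uses a toy modulo hash, and only implicit inputs (loads) are tracked, with implicit outputs left out.

```python
SIG_BITS = 1024  # assumed SR width


class MemoEntry:
    """Memoized state for one call site (hypothetical)."""

    def __init__(self):
        self.valid = False
        self.args = None
        self.result = None
        self.sig = 0           # stands in for an allocated SR
        self.conflict = False


class TrackedMemory:
    """Memory whose loads feed collection and whose stores are disambiguated."""

    def __init__(self, data):
        self.data = dict(data)
        self.collecting = None  # entry currently collecting implicit inputs
        self.watching = []      # entries being disambiguated against

    def load(self, addr):
        if self.collecting is not None:
            self.collecting.sig |= 1 << (addr % SIG_BITS)
        return self.data[addr]

    def store(self, addr, value):
        for entry in self.watching:
            if entry.sig & (1 << (addr % SIG_BITS)):
                entry.conflict = True  # may be a false positive
        self.data[addr] = value


def memo_call(entry, mem, func, *args):
    """Return (result, memoized). func takes (mem, *args)."""
    # Prologue: same explicit inputs and no conflict means the call is skipped.
    if entry.valid and entry.args == args and not entry.conflict:
        return entry.result, True
    # Setup: remember explicit inputs and start a fresh signature.
    if entry in mem.watching:
        mem.watching.remove(entry)
    entry.args, entry.sig, entry.conflict = args, 0, False
    mem.collecting = entry
    result = func(mem, *args)
    # Epilogue: stop collecting, remember the output, start watching stores.
    mem.collecting = None
    entry.result, entry.valid = result, True
    mem.watching.append(entry)
    return result, False
```

A false positive in `store` only forces an unnecessary re-execution of the function, so the memoized result is never stale.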

Evaluation