Douglas Lacy & Daniel LeCheminant CS 252 December 10, 2003

Presentation transcript:

AVID: Breaking Processors for Increased Performance & Reduced Power Consumption
Douglas Lacy & Daniel LeCheminant
CS 252
December 10, 2003

Background
- Todd Austin’s DIVA paper
- DIVA dynamically verifies all instructions, guarding against transient and permanent faults
- Austin speculated that DIVA could allow throttling of processor clock speed

Background / Motivation
- DIVA maintains correctness even with malfunctioning hardware
- Is there a way to “break” the core processor in such a way as to optimize it?
- Remove rarely-used components?
- Reduce tolerance in clock cycle, voltage, etc.?
- May be possible to dynamically alter the processor to be only as correct as necessary

Motivation
- Some components of processors exist to ensure correctness in rarer cases
- May waste resources and cycles to check these cases
- With DIVA, we can mostly ignore them
- “Rare” is variable
- Could be lazy with some computations, need to be more strict with others
- Which relaxations are possible depends on the program

Motivation

              cc1    anagram   compress
% Loads       23%    24%       21%
% Stores      14%    10%       --
(unlabeled)   37%    33%       36%

Motivation
Specifically, what can we remove/throttle?
- Memory disambiguation
- Branch prediction
- Branch checking
- Exceptions
- Long-latency operations (multiply & divide)
- Rare instructions?
- Prefetching

Proposal: AVID
- AVID: Architecture that Varies, Input Dependent
- Use a DIVA unit to provide verification, and also feedback to the core processor
- Can dynamically throttle operations from most time/power-consuming and correct to least consuming and sometimes incorrect
- Won’t require much more hardware than standard DIVA
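One way to picture the feedback loop in the proposal is a small controller that moves between operating levels according to the error rate the DIVA checker reports. The sketch below is hypothetical: the level names, the window size, the thresholds, and the choice to start at the most conservative level are all assumptions made for illustration, not details from the presentation.

```python
from dataclasses import dataclass

# Hypothetical operating levels, ordered from most conservative (always
# correct, most costly) to most aggressive (cheapest, sometimes wrong).
LEVELS = ["strict", "relaxed", "aggressive"]


@dataclass
class ThrottleController:
    """Toy feedback controller driven by checker-reported errors."""
    window: int = 100          # instructions per observation window (assumed)
    raise_below: float = 0.01  # become more aggressive under this error rate
    lower_above: float = 0.05  # back off above this error rate
    level: int = 0             # index into LEVELS; starts conservative
    seen: int = 0
    errors: int = 0

    def record(self, checker_found_error: bool) -> str:
        """Record one verified instruction and return the current level name."""
        self.seen += 1
        self.errors += int(checker_found_error)
        if self.seen == self.window:
            rate = self.errors / self.window
            if rate > self.lower_above and self.level > 0:
                self.level -= 1          # too many corrections: back off
            elif rate < self.raise_below and self.level < len(LEVELS) - 1:
                self.level += 1          # few corrections: relax further
            self.seen = 0
            self.errors = 0
        return LEVELS[self.level]
```

Because the checker still commits only verified results, a controller like this affects cost (recovery penalties, power) rather than correctness.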

AVID!

More AVID
- Branch predictors: static, bimodal, 2-level, hybrid
- Multiply/Divide: truncate inputs, run for fewer cycles
- Loads: allow them to proceed past unresolved stores
- Clock cycle throttling: start fast, reduce speed if errors crop up
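The "truncate inputs" idea for multiply can be sketched as a core that multiplies only the low bits of each operand, paired with a DIVA-style checker that recomputes the full product and overrides any mismatch. The 16-bit width, the function names, and the restriction to non-negative operands are illustrative assumptions, not details from the presentation.

```python
from typing import Tuple


def fast_multiply(a: int, b: int, bits: int = 16) -> int:
    """Hypothetical shortcut: multiply only the low `bits` bits of each operand."""
    mask = (1 << bits) - 1
    return (a & mask) * (b & mask)


def checked_multiply(a: int, b: int, bits: int = 16) -> Tuple[int, bool]:
    """Return (committed result, True if the checker had to correct the core)."""
    speculative = fast_multiply(a, b, bits)
    correct = a * b  # the checker recomputes the full product
    return correct, speculative != correct
```

Small operands commit without correction, for example `checked_multiply(3, 5)`, while an operand wider than the truncated width, such as `1 << 20`, triggers a correction. In a real pipeline each correction would presumably cost a recovery penalty, so the shortcut only pays off when wide operands are rare.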

Methodology
- Simulate in SimpleScalar
- Base architecture: standard DIVA
- Modify SimpleScalar to include a core & DIVA unit
- Modify base architecture into AVID
- DIVA catches all errors, so the processor is still functional & reliable

Comparison
- Run benchmarks on both architectures & compare performance (SPEC or similar)
- CPI: read from simulator output
- Exec. time: total cycles * cycle time
- Power consumption: total cycles * constant + branch predictions * constant for type of predictor
- Amount of hardware
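The two formulas on the comparison slide can be written out directly. In this minimal sketch the per-predictor costs and the per-cycle cost are placeholder values in arbitrary units, invented purely for illustration; the presentation does not give the constants it used.

```python
# Placeholder cost per branch prediction for each predictor type
# (arbitrary units, assumed for illustration only).
PREDICTOR_COST = {"static": 0.0, "bimodal": 1.0, "2-level": 2.0, "hybrid": 3.0}


def execution_time(total_cycles: int, cycle_time: float) -> float:
    """Exec. time = total cycles * cycle time."""
    return total_cycles * cycle_time


def power_estimate(total_cycles: int, branch_predictions: int,
                   predictor: str, cycle_cost: float = 1.0) -> float:
    """Power = total cycles * constant + branch predictions * per-type constant."""
    return total_cycles * cycle_cost + branch_predictions * PREDICTOR_COST[predictor]
```

With this model, switching to a cheaper predictor lowers the second term but may raise the first if mispredictions add cycles, which is the trade-off the comparison is meant to expose.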

Results: CPI in Best Case

                   cc1      anagram   compress
Base               1.0890   0.4985    0.5833
No Load Stalls     1.0879   0.4982    0.5815
Reduced Multiply   --       0.4984    --

SimpleScalar run with relaxed constraints without producing errors
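To put the best-case CPI numbers in perspective, the relative reduction from Base to No Load Stalls can be computed directly from the values on the slide. The helper name below is an assumption; the CPI values are copied from the results.

```python
# CPI values copied from the results slide (Base vs. No Load Stalls).
BASE = {"cc1": 1.0890, "anagram": 0.4985, "compress": 0.5833}
NO_LOAD_STALLS = {"cc1": 1.0879, "anagram": 0.4982, "compress": 0.5815}


def cpi_reduction_percent(base: float, variant: float) -> float:
    """Relative CPI reduction of `variant` versus `base`, in percent."""
    return (base - variant) / base * 100.0


for prog in BASE:
    pct = cpi_reduction_percent(BASE[prog], NO_LOAD_STALLS[prog])
    print(f"{prog}: {pct:.2f}% lower CPI")
```

The reductions come out to roughly 0.1% for cc1, 0.06% for anagram, and 0.3% for compress, all well under 1%.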

Results: Power Consumption
[Data not recovered; column headings: prog, Base, Bimodal, 2-Level, Dynamic]

Conclusions
- No long benchmarks successfully run
- Preliminary results promising in some areas, discouraging in others
- AVID may be best for reducing power consumption
- AVID could be extended for further dynamic alteration of processors, limited reconfigurable computing

Future Work
- Extension of AVID to throttle other possible components
- Further static removal of components
- Actual full SPEC benchmark comparisons of standard, DIVA, and AVID architectures
- Exploration of speculation in several ways, using AVID for verification and feedback

Questions? You know you have them! Ask! Go on, pick us apart!