
BitValue: Detecting and Exploiting Narrow Bitwidth Computations
Mihai Budiu, Carnegie Mellon University
Joint work with Majd Sakr, Kip Walker and Seth Copen Goldstein

08/29/00, Narrow Bitwidths / Euro-Par 2000
Word Size Evolution
– Word size has grown over successive CPUs, reaching 64 bits with Itanium
– The recent size increase is driven by address-space constraints
– Claim: data often does not use the whole word width
– We present a technique for static width inference

Motivation: Applications
– Media processing
– Digital signal processing
– FFT

Motivation: Applications (2)
[Figure: cumulative frequency of operations by operand width in bits, highlighting operations on fewer than 16 bits. Source: Brooks & Martonosi, HPCA '99]

Motivation: Hardware
– "MMX"-style CPU support for narrow widths
– Reconfigurable hardware
[Figure: circuit computing (a & 0xf) | (b & 0x18) from inputs a and b]

Motivation: Languages
– No programming-language support: widths are fixed by declarations such as  int a; long b;
– No compiler support: after  int a; a = (a >> 16) & 0xf0;  only bits 4 through 7 of a can be nonzero, yet a still occupies a full int
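To make the point concrete, here is a small C sketch; the helper names `shift_then_mask`, `mask_width` and `check_shift_then_mask` are hypothetical, and it uses `uint32_t` instead of `int` to avoid implementation-defined right shifts of negative values. Whatever `a` held, the result lies inside the mask 0xf0, so 8 bits suffice and the low 4 bits are always 0, although the declared type is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* The slide's statement a = (a >> 16) & 0xf0, on an unsigned 32-bit value. */
static uint32_t shift_then_mask(uint32_t a)
{
    return (a >> 16) & 0xf0u;
}

/* Bits needed to hold any value whose 1 bits all lie inside `mask`:
   the position of the highest set bit, plus one. */
static int mask_width(uint32_t mask)
{
    int w = 0;
    while (mask != 0) {
        w++;
        mask >>= 1;
    }
    return w;
}

/* For one concrete input, checks that no bit outside 0xf0 is ever set. */
static void check_shift_then_mask(uint32_t a)
{
    assert((shift_then_mask(a) & ~0xf0u) == 0);
}
```

Here `mask_width(0xf0)` is 8, and for instance `shift_then_mask(0xdeadbeef)` is 0xa0.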

Outline
– Motivation
– The width inference algorithm
– Implementations
– Results
– Conclusions

The Width Inference Algorithm
– Data flow at the bit level: infer a value for each bit of an integer
– Forward and backward propagation:
  – forward: discover constant bits
  – backward: discover don't care bits
– We use iterative data-flow analysis
– Low time and space complexity
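A minimal sketch of iterative forward analysis on one loop, `x = 1; while (...) x = (x << 1) & 0xf;`. The two-mask encoding (bit i "may be 0" / "may be 1"), the type `bits_t` and every function name are assumptions for illustration, not the authors' implementation, and only forward propagation is shown. The loop-head value is the join of the entry value and the back-edge value; since the masks only grow, the iteration terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set. */
typedef struct { uint32_t may0, may1; } bits_t;

static bits_t bits_const(uint32_t k) { bits_t r = { ~k, k }; return r; }

static bits_t bits_join(bits_t a, bits_t b)
{
    bits_t r = { a.may0 | b.may0, a.may1 | b.may1 };
    return r;
}

static int bits_equal(bits_t a, bits_t b)
{
    return a.may0 == b.may0 && a.may1 == b.may1;
}

/* x << k: the vacated low bits are constant 0. */
static bits_t bits_shl(bits_t a, unsigned k)
{
    assert(k > 0 && k < 32);
    bits_t r = { (a.may0 << k) | ((1u << k) - 1u), a.may1 << k };
    return r;
}

/* x & c for a constant c: wherever c has a 0, the result bit is 0. */
static bits_t bits_and_const(bits_t a, uint32_t c)
{
    bits_t r = { a.may0 | ~c, a.may1 & c };
    return r;
}

/* Forward analysis of  x = 1;  while (...) x = (x << 1) & 0xf;
   Returns x at the loop head once the iteration reaches a fixed point. */
static bits_t analyze_loop(void)
{
    bits_t init = bits_const(1);
    bits_t head = init;
    for (;;) {
        bits_t back = bits_and_const(bits_shl(head, 1), 0xfu);  /* loop body */
        bits_t next = bits_join(init, back);                    /* loop head */
        if (bits_equal(next, head))
            return head;
        head = next;
    }
}
```

The iteration stabilizes with bits 0 to 3 unknown and all higher bits constant 0, so x needs only 4 bits.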

Benefits of Bit Value Inference
– You don't have to implement:
  – don't care bits
  – constant bits
– Use hardware more efficiently → increased performance

The Lattices
[Figure: the bit lattice over the values x, 0, 1 and u, and the bitstring lattice L, obtained from it pointwise]
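One way to encode these lattices in C, as a hedged sketch: each bit carries two flags, "may be 0" and "may be 1", so 0 and 1 are the constants, u (both flags) is unknown and x (neither) is don't care. This assumes u is the top and x the bottom of the bit lattice; under that assumption the pointwise join of two bitstrings is a bitwise OR of the flag masks. `bits_t`, `bits_parse`, `bits_join` and `bits_print` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set.
   Both set = 'u' (unknown), neither set = 'x' (don't care). */
typedef struct { uint32_t may0, may1; } bits_t;

/* Parses a pattern such as "u0x1", most significant bit first. */
static bits_t bits_parse(const char *s)
{
    bits_t v = { 0, 0 };
    size_t n = strlen(s);
    assert(n <= 32);
    for (size_t k = 0; k < n; k++) {
        uint32_t bit = (uint32_t)1 << (n - 1 - k);
        switch (s[k]) {
        case '0': v.may0 |= bit; break;
        case '1': v.may1 |= bit; break;
        case 'u': v.may0 |= bit; v.may1 |= bit; break;
        default:  break;                       /* 'x': neither flag */
        }
    }
    return v;
}

/* Least upper bound in the bitstring lattice L, computed bit by bit. */
static bits_t bits_join(bits_t a, bits_t b)
{
    bits_t r = { a.may0 | b.may0, a.may1 | b.may1 };
    return r;
}

/* Writes the low n bits of v into out (which must hold n + 1 chars). */
static void bits_print(bits_t v, unsigned n, char *out)
{
    assert(n <= 32);
    for (unsigned k = 0; k < n; k++) {
        unsigned i = n - 1 - k;
        int m0 = (v.may0 >> i) & 1u, m1 = (v.may1 >> i) & 1u;
        out[k] = m0 && m1 ? 'u' : m0 ? '0' : m1 ? '1' : 'x';
    }
    out[n] = '\0';
}
```

For example, joining 01x with 00u gives 0uu: 0 joined with 1 is u, and joining with x leaves the other value unchanged.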

Forward (Constant) Propagation
[Figure: constant bits propagated forward through a + node; bit patterns shown: u0uuu, u00uu, u001u]
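For bitwise operators the forward transfer function can be computed exactly, bit by bit. The sketch below assumes the same kind of two-mask encoding (bit i "may be 0" / "may be 1"); `bits_make`, `bits_and` and `bits_or` are illustrative names, and addition, which needs carries, is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set. */
typedef struct { uint32_t may0, may1; } bits_t;

/* Builds a pattern: bits in `unknown` are 'u', the other bits equal `value`. */
static bits_t bits_make(uint32_t value, uint32_t unknown)
{
    assert((value & unknown) == 0);
    bits_t r = { ~value | unknown, value | unknown };
    return r;
}

/* a & b: a result bit may be 1 only if both inputs may be 1,
   and it may be 0 if either input may be 0. */
static bits_t bits_and(bits_t a, bits_t b)
{
    bits_t r = { a.may0 | b.may0, a.may1 & b.may1 };
    return r;
}

/* a | b: the dual of AND. */
static bits_t bits_or(bits_t a, bits_t b)
{
    bits_t r = { a.may0 & b.may0, a.may1 | b.may1 };
    return r;
}
```

With a = ...u0u1 (`bits_make(0x1, 0xa)`) and b = ...0011, a & b is ...00u1 and a | b is ...u011: a constant 0 in b forces the AND result, and a constant 1 forces the OR result.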

Backward (Don't Care) Propagation
[Figure: don't care bits propagated backward through a + node from Out to In; bit patterns shown: xux, xuu, xuu]
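A hedged sketch of backward propagation, representing don't care information as a "care" mask (a bit outside the mask is x). For +, carries only flow upward, so an input bit can matter only if some output bit at the same or a higher position matters. For x & c with a constant c, input bits where c is 0 never matter. For a logical x >> k, input bit i + k feeds output bit i. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Care mask for either operand of a + b, given the care mask of the sum:
   every bit at or below the highest cared output bit. */
static uint32_t care_add(uint32_t care_out)
{
    uint32_t m = care_out;
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;                 /* smear the highest set bit downward */
    return m;
}

/* Care mask for x in x & c, with c a constant. */
static uint32_t care_and_const(uint32_t care_out, uint32_t c)
{
    return care_out & c;
}

/* Care mask for x in a logical shift x >> k. */
static uint32_t care_shr(uint32_t care_out, unsigned k)
{
    assert(k < 32);
    return care_out << k;
}
```

With an output pattern xuu (care mask 0b011), `care_add` returns 0b011, so the top bit of each operand is also x. Chaining the other two rules on `(a >> 16) & 0xf0` shows that only bits 20 to 23 of a matter.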

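Transfer Functions
Given: $f : \mathit{int}^k \to \mathit{int}$
We show how to build:
– $\mathrm{Forward}(f) : L^k \to L$, mapping the bitstrings of the k inputs to a bitstring for the result
– $\mathrm{Backward}(f, \mathit{in}) : L \times L^{k-1} \to L$, mapping the bitstring of the result and those of the other k−1 inputs to a bitstring for input $\mathit{in}$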

Sample Forward Transfer Function
[Figure: best and worst cases of the forward transfer function for +, on operand patterns such as 0u, x0 and 01; result patterns shown include x1, x0 and xu]
– We resort to conservative approximations
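One simple conservative approximation for +, offered as a sketch rather than the authors' transfer function: simulate a ripple-carry adder one bit at a time over {0, 1, u}, tracking which sum and carry values are possible at each position. It is exact when both operands are constants, but it forgets correlations between the carry and the sum bits, which is where precision is lost. The two-mask encoding and the name `bits_add` are assumptions, and an x operand bit is treated as u.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set. */
typedef struct { uint32_t may0, may1; } bits_t;

/* Conservative forward transfer function for a + b (modulo 2^32). */
static bits_t bits_add(bits_t a, bits_t b)
{
    bits_t r = { 0, 0 };
    int c0 = 1, c1 = 0;                   /* carry into bit 0 is constant 0 */
    for (unsigned i = 0; i < 32; i++) {
        int a0 = (a.may0 >> i) & 1, a1 = (a.may1 >> i) & 1;
        int b0 = (b.may0 >> i) & 1, b1 = (b.may1 >> i) & 1;
        if (!a0 && !a1) a0 = a1 = 1;      /* treat x as u */
        if (!b0 && !b1) b0 = b1 = 1;
        int s0 = 0, s1 = 0, n0 = 0, n1 = 0;
        /* Enumerate every possible combination of operand and carry bits. */
        for (int va = 0; va <= 1; va++) {
            if (!(va ? a1 : a0)) continue;
            for (int vb = 0; vb <= 1; vb++) {
                if (!(vb ? b1 : b0)) continue;
                for (int vc = 0; vc <= 1; vc++) {
                    if (!(vc ? c1 : c0)) continue;
                    int sum = va + vb + vc;
                    if (sum & 1) s1 = 1; else s0 = 1;   /* possible sum bit */
                    if (sum >> 1) n1 = 1; else n0 = 1;  /* possible carry out */
                }
            }
        }
        r.may0 |= (uint32_t)s0 << i;
        r.may1 |= (uint32_t)s1 << i;
        c0 = n0;
        c1 = n1;
    }
    return r;
}
```

For example, adding ...0u1 to itself (operands 1 or 3) yields ...0uu0: the low bit is known to be 0, while the two bits above it are unknown.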

Induction Variable Analysis
– We complement the data-flow analysis with induction variable analysis
– We determine the range of linear loop induction variables
Example:
  for (i = 0; i < 5; i++) j = 2*i;
j's range is bounded by 0–10 (using i's exit value 5; the largest value actually assigned is 8), so 4 bits suffice: the pattern uuuu, with all higher bits 0, is an upper bound for its value.
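A sketch of how such a range becomes a bit width, under stated assumptions: the loop has the form `for (i = start; i < limit; i += step)` with unsigned values and no overflow, and the bound uses i's exit value, as the 0–10 range above does. `linear_iv_upper_bound` and `bits_needed` are hypothetical helpers, not part of the original tool.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent v (at least 1). */
static unsigned bits_needed(uint32_t v)
{
    unsigned n = 1;
    while (v >>= 1)
        n++;
    return n;
}

/* Upper bound for j = scale * i in  for (i = start; i < limit; i += step).
   Uses i's exit value, so the bound may exceed the largest j actually
   assigned. Assumes step > 0 and no unsigned overflow. */
static uint32_t linear_iv_upper_bound(uint32_t start, uint32_t limit,
                                      uint32_t step, uint32_t scale)
{
    assert(step > 0);
    uint32_t trips = limit > start ? (limit - start + step - 1) / step : 0;
    uint32_t i_exit = start + trips * step;
    return scale * i_exit;
}
```

For the slide's loop, `linear_iv_upper_bound(0, 5, 1, 2)` is 10, which needs 4 bits; the tighter maximum of 8 would need 4 bits as well.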

Implementation for C
– SUIF compiler passes
– Intraprocedural, no pointer analysis
– 1100 lines/second on a PIII/600
– "Validated" the algorithm through code instrumentation
– We only deal with scalars
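The slide does not say how the instrumentation worked. One plausible form, sketched here with hypothetical names, is a run-time check that each concrete value agrees with its inferred pattern: bits inferred to be constant 0 or 1 must have that value, while u and x bits may be anything.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set. */
typedef struct { uint32_t may0, may1; } bits_t;

/* Returns 1 if the concrete value v is consistent with pattern p. */
static int bits_admits(bits_t p, uint32_t v)
{
    uint32_t known0 = p.may0 & ~p.may1;   /* bits inferred to be constant 0 */
    uint32_t known1 = p.may1 & ~p.may0;   /* bits inferred to be constant 1 */
    return (v & known0) == 0 && (~v & known1) == 0;
}

/* Hypothetical hook an instrumenting pass could insert after an assignment. */
#define CHECK_BITS(pattern, value) assert(bits_admits((pattern), (value)))
```

For a pattern ...0uu1, the values 5 and 7 pass, while 4 (low bit 0) and 9 (bit 3 set) would make `CHECK_BITS` fail.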

Implementation for Reconfigurable Hardware
– Part of a standalone compiler/CAD tool for DIL, a hardware description language
– DIL allows widths to be left unspecified
– Width inference is used to bound precision and reduce hardware
– Produces smaller and faster hardware
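As a rough, hypothetical illustration of how inferred patterns could translate into hardware savings: a constant bit needs no logic (it can be tied to 0 or 1), an x bit needs nothing at all, and a value's width can be cut to just above its highest bit that may be 1. This is a sketch under those assumptions, not the DIL tool's code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i may be 0 if bit i of may0 is set; it may be 1 if bit i of may1 is set. */
typedef struct { uint32_t may0, may1; } bits_t;

/* Width needed to hold the value: one past the highest bit that may be 1. */
static unsigned bits_width(bits_t v)
{
    unsigned w = 0;
    for (uint32_t m = v.may1; m != 0; m >>= 1)
        w++;
    return w;
}

/* Number of bits whose value is unknown ('u') and therefore needs real logic. */
static unsigned bits_logic(bits_t v)
{
    unsigned n = 0;
    for (uint32_t m = v.may0 & v.may1; m != 0; m &= m - 1)
        n++;                              /* clears the lowest set bit */
    return n;
}
```

For example, (a & 0xf) | (b & 0x18) with unknown a and b has bits 0 to 4 unknown and all other bits 0, so 5 bits of logic suffice instead of 32.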

"Useless" Data (Dynamic)
[Figure: percentage of "useless" data, measured dynamically, for SPECint 95 and Mediabench, with the mean]

Size Histograms (Dynamic)

Circuit Reduction for Reconfigurable Hardware

Conclusions (1)
– Wide data values are often inappropriate
– Reducing width can lead to performance increases
– It is worth exploring architectures that can better exploit useless bits

Conclusions (2)
– Static bit-value analysis is very powerful
– An efficient data-flow algorithm for bit-value inference
– Width hints can be passed to the compiler using masks
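A hedged example of what such a hint can look like in C (the function name is hypothetical): the parameters are 32-bit, but the masks tell a bit-value analysis that only the low 8 bits of each operand are used, so the sum never needs more than 9 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The masks act as width hints: only the low 8 bits of a and b are used,
   so the result is at most 0xff + 0xff = 0x1fe and fits in 9 bits. */
static uint32_t add_bytes(uint32_t a, uint32_t b)
{
    return (a & 0xffu) + (b & 0xffu);
}
```

The hint costs nothing semantically when the caller already passes byte-sized values, yet it lets the analysis treat bits 9 and above as constant 0.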

Backup slides

Sources of Width Reduction
– Array index calculations
– Loop induction variables
– Masking and shifting