Modifying Floating-Point Precision with Binary Instrumentation Michael Lam University of Maryland, College Park Jeff Hollingsworth, Advisor.


Modifying Floating-Point Precision with Binary Instrumentation Michael Lam University of Maryland, College Park Jeff Hollingsworth, Advisor

Background
Floating-point represents real numbers as (±significand × 2^exponent)
o Sign bit
o Exponent
o Significand (“mantissa” or “fraction”)
Floating-point numbers have finite precision
o Single precision: 24 bits (~7 decimal digits)
o Double precision: 53 bits (~16 decimal digits)
[Bit layouts: IEEE single = 1 sign bit, 8-bit exponent, 23-bit significand; IEEE double = 1 sign bit, 11-bit exponent, 52-bit significand]
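The field widths above can be checked directly by pulling a double apart with bit operations — a small illustrative sketch (not part of CRAFT):

```python
import struct

def decode_double(x):
    """Split an IEEE-754 double into its sign, exponent, and significand fields."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                         # 1 sign bit
    exponent = (bits >> 52) & 0x7FF           # 11-bit biased exponent
    significand = bits & ((1 << 52) - 1)      # 52-bit fraction field
    return sign, exponent, significand

# -1.5 = (-1)^1 * 1.1b * 2^0: sign=1, biased exponent=1023, top fraction bit set
sign, exp, frac = decode_double(-1.5)
```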

Example
1/10 cannot be stored exactly in binary floating-point; the slide shows its single-precision and double-precision encodings (images courtesy of BinaryConvert.com).
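The same effect can be reproduced without the images: rounding 0.1 through single precision changes its value. A small sketch (`to_single` is an illustrative helper, not part of CRAFT):

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth_double = 0.1
tenth_single = to_single(0.1)
print(f"{tenth_double:.20f}")   # 0.10000000000000000555
print(f"{tenth_single:.20f}")   # 0.10000000149011611938
```

Neither value is exactly one tenth; the single-precision rounding is off by roughly 1.5e-9, about a hundred million times farther from 1/10 than the double.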

Motivation
Finite precision causes round-off error
o Compromises “ill-conditioned” calculations
o Hard to detect and diagnose
Increasingly important as HPC scales
o Double-precision data movement is a bottleneck
o Streaming processors are faster in single precision (~2x)
o Need to balance speed (singles) and accuracy (doubles)

Previous Work
Traditional error analysis (Wilkinson 1964)
o Forwards vs. backwards
o Requires extensive numerical analysis expertise
Interval/affine arithmetic (Goubault 2001)
o Conservative static error bounds are largely unhelpful
[Chart: high and low interval bounds diverging from the correct value over time]

Previous Work
Manual mixed-precision (Dongarra 2008)
o Requires numerical expertise
Fallback: ad-hoc experiments
o Tedious, time-consuming, and error-prone

Mixed-precision linear solver algorithm:
1: LU ← PA
2: solve Ly = Pb
3: solve Ux_0 = y
4: for k = 1, 2, ... do
5:   r_k ← b − Ax_{k−1}
6:   solve Ly = Pr_k
7:   solve Uz_k = y
8:   x_k ← x_{k−1} + z_k
9:   check for convergence
10: end for
(In the original slide, red text marks the steps performed in double precision; all other steps are single precision.)
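The structure of the algorithm above can be sketched in pure Python — a toy 2x2 version in which `struct`-based rounding stands in for a true single-precision LU solve (all names here are illustrative, not from the slide):

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def solve2x2_single(A, rhs):
    """Direct 2x2 solve (Cramer's rule) with every operation rounded to
    single precision -- playing the role of the single-precision LU solve.
    A toy stand-in, not CRAFT code."""
    (a, b), (c, d) = A
    det = f32(f32(a * d) - f32(b * c))
    x0 = f32(f32(f32(d * rhs[0]) - f32(b * rhs[1])) / det)
    x1 = f32(f32(f32(a * rhs[1]) - f32(c * rhs[0])) / det)
    return [x0, x1]

def refine(A, rhs, iters=5):
    """Iterative refinement with the slide's precision split: solves in
    single precision, residuals and updates in double precision."""
    x = solve2x2_single(A, rhs)                 # initial single-precision solve
    for _ in range(iters):
        # r_k <- b - A x_{k-1}, computed in full double precision
        r = [rhs[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        z = solve2x2_single(A, r)               # correction in single precision
        x = [x[i] + z[i] for i in range(2)]     # update in double precision
    return x
```

For a well-conditioned system, a few refinement steps recover nearly full double-precision accuracy even though every solve runs in single precision, which is the point of the mixed-precision scheme.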

Our Goal
Develop automated analysis techniques to inform developers about floating-point behavior and make recommendations regarding the precision level that each part of a computer program must use in order to maintain overall accuracy.

Framework
CRAFT: Configurable Runtime Analysis for Floating-point Tuning
Static binary instrumentation
o Controlled by configuration settings
o Replace floating-point instructions with new code
o Rewrite a modified binary
Dynamic analysis
o Run the modified program on a representative data set
o Produces results and recommendations

Advantages
Automated
o Minimize developer effort
o Ensure consistency and correctness
Binary-level
o Include shared libraries without source code
o Include compiler optimizations
Runtime
o Dataset and communication sensitivity

Implementation
Current approach: in-place replacement
o Narrowed focus: doubles → singles
o In-place downcast conversion
o Flag in the high bits to indicate replacement
[Diagram: a 64-bit double is replaced in place by a 32-bit single stored in the low bits, with 0x7FF4DEAD (a non-signalling NaN pattern) written into the high bits to flag the replacement]
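The in-place replacement scheme can be mimicked at the bit level — a sketch assuming the slide's 0x7FF4DEAD high-word tag, with hypothetical helper names (the real tool does this on machine registers and memory, not Python integers):

```python
import struct

FLAG = 0x7FF4DEAD  # high-word tag from the slide, marking a replaced double

def downcast_in_place(bits64):
    """Store a double's single-precision rounding in the low 32 bits and
    the NaN-pattern flag in the high 32 bits (illustrative, not CRAFT code)."""
    x = struct.unpack("<d", struct.pack("<Q", bits64))[0]
    single = struct.unpack("<I", struct.pack("<f", x))[0]
    return (FLAG << 32) | single

def is_replaced(bits64):
    """Check the high bits for the replacement flag."""
    return (bits64 >> 32) == FLAG

def read_value(bits64):
    """Decode either an untouched double or a flagged, replaced single."""
    if is_replaced(bits64):
        return struct.unpack("<f", struct.pack("<I", bits64 & 0xFFFFFFFF))[0]
    return struct.unpack("<d", struct.pack("<Q", bits64))[0]
```

Because the flagged pattern is a NaN, any un-instrumented code that accidentally reads a replaced value as a double sees a NaN rather than a silently wrong number.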

Example
gvec[i,j] = gvec[i,j] * lvec[3] + gvar

1: movsd 0x601e38(%rax,%rbx,8) → %xmm0
2: mulsd -0x78(%rsp) → %xmm0
3: addsd -0x4f02(%rip) → %xmm0
4: movsd %xmm0 → 0x601e38(%rax,%rbx,8)

Example (after instrumentation)
gvec[i,j] = gvec[i,j] * lvec[3] + gvar

1: movsd 0x601e38(%rax,%rbx,8) → %xmm0
   check/replace -0x78(%rsp) and %xmm0
2: mulss -0x78(%rsp) → %xmm0
   check/replace -0x4f02(%rip) and %xmm0
3: addss -0x20dd43(%rip) → %xmm0
4: movsd %xmm0 → 0x601e38(%rax,%rbx,8)

Implementation
[Diagram of an instrumented basic block: initialization, double→single conversions, the original instruction, block splits, and cleanup]

Configuration

Autoconfiguration
Helper script
o Generates and tests a variety of configurations
o Keeps a “work queue” of untested configurations
o Brute-force attempt to find maximal replacement
Algorithm:
o Initially, build individual configurations for each function and add them to the work queue
o Retrieve the next available configuration and test it
o If it passes, add it to the final configuration
o If it fails, build individual configurations for any child members (basic blocks, instructions) and add them to the queue
o Build and test the final configuration
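The queue-driven search described above can be sketched as follows; `children` and `passes` are hypothetical stand-ins for CRAFT's actual interfaces:

```python
from collections import deque

def autoconfigure(functions, children, passes):
    """Work-queue search for a maximal single-precision replacement.
    `functions` lists top-level functions; `children(item)` returns an
    item's sub-parts (basic blocks, then instructions); `passes(config)`
    runs the instrumented program and verifies its output."""
    final = []
    queue = deque([f] for f in functions)      # one candidate config per function
    while queue:
        cfg = queue.popleft()                  # retrieve the next configuration
        if passes(cfg):
            final.extend(cfg)                  # passed: add to the final config
        else:
            for item in cfg:                   # failed: refine into children
                queue.extend([c] for c in children(item))
    passes(final)                              # build and test the final config
    return final
```

The refinement step is what makes this tractable: a function that fails as a whole is not discarded outright but split into blocks and instructions, so partial replacements within it can still be found.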

NAS Benchmarks
EP (Embarrassingly Parallel)
o Generate independent Gaussian random variates using the Marsaglia polar method
CG (Conjugate Gradient)
o Estimate the smallest eigenvalue of a matrix using inverse iteration with the conjugate gradient method
FT (Fourier Transform)
o Solve a three-dimensional partial differential equation (PDE) using the fast Fourier transform (FFT)
MG (MultiGrid)
o Approximate the solution to a three-dimensional discrete Poisson equation using the V-cycle multigrid method

Results
Results shown for final configuration only. All benchmarks were 8-core versions compiled by the Intel Fortran compiler with optimization enabled. Tests were performed on the Sierra cluster at LLNL (Intel Xeon 2.8 GHz with twelve cores and 24 GB memory per node, running 64-bit Linux).

Observations
randlc / vranlc
o The random number generators depend on a 64-bit floating-point data type
Accumulators
o Multiple operations concluded with an addition
o Require double precision only for the final addition
Minimal dataset-based variation
o Larger variation due to optimization level
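The accumulator observation reflects a well-known effect that is easy to demonstrate: rounding every partial sum to single precision loses far more accuracy than keeping the accumulator itself in double precision. A small sketch (helper names are illustrative, not CRAFT code):

```python
import struct

def to_single(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_single_acc(values):
    """Accumulator rounded to single precision after every addition."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total

def sum_double_acc(values):
    """Same single-precision inputs, but a double-precision accumulator."""
    total = 0.0
    for v in values:
        total += to_single(v)
    return total

# Summing 0.1 a hundred thousand times: the double accumulator stays
# close to 10000, while the single accumulator drifts much further.
vals = [0.1] * 100000
```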

Future Work
Automated configuration tuning

Conclusion
Automated instrumentation techniques can be used to implement mixed-precision configurations for floating-point code, and there is much opportunity in the future for automated precision recommendations.

Acknowledgements
Jeff Hollingsworth, University of Maryland (Advisor)
Drew Bernat, University of Wisconsin
Bronis de Supinski, LLNL
Barry Rountree, LLNL
Matt Legendre, LLNL
Greg Lee, LLNL
Dong Ahn, LLNL