Reproducible BLAS
Wen Rui Liau, Computer Science Division, UC Berkeley

bebop.cs.berkeley.edu/reproblas/

Remember Bob’s example? (Yesterday)

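Floating Point Operations (FLOPs)
Computation on floating-point numbers, i.e., real numbers stored as a sign, a significand, and an exponent. For a fixed word size, the format is a trade-off between precision (significand bits) and range (exponent bits). Standardized by IEEE 754.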
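The snippet below makes the precision/range trade-off concrete for IEEE 754 binary64, the format behind Python's `float` on mainstream platforms. It is only an illustration and assumes Python 3.9+ for `math.ulp`. It shows that the gap between adjacent doubles grows with magnitude, which is why a small addend can vanish next to a large one.

```python
import math
import sys

# binary64 keeps 53 significand bits (precision) and an 11-bit exponent (range).
print(sys.float_info.mant_dig)   # 53
print(sys.float_info.max)        # 1.7976931348623157e+308

# One ulp (unit in the last place) is the gap to the next double; it scales with magnitude.
print(math.ulp(1.0))             # 2.220446049250313e-16
print(math.ulp(1e16))            # 2.0

# With a gap of 2.0 around 1e16, adding 1.0 is rounded away.
print(1e16 + 1.0 == 1e16)        # True
```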

Motivation
Since roundoff makes floating-point addition nonassociative, different orders of summation often give different answers. On a parallel machine, the order of summation can vary from run to run, or even from subroutine call to subroutine call, depending on how the available resources are scheduled, so answers can change. Reproducibility is important for debugging, validation, contractual obligations, …
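Order dependence is easy to reproduce with four numbers. The sketch below uses illustrative helpers: `sequential_sum` is a plain loop, and `chunked_sum` imitates a p-way parallel reduction by summing contiguous chunks and then adding the partial sums. It is not how any particular BLAS schedules work. The same four values produce three different answers depending on order. `math.fsum` gives the correctly rounded exact sum for comparison.

```python
import math

def sequential_sum(xs):
    """Left-to-right summation, like a plain `res += X[j]` loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def chunked_sum(xs, p):
    """Mimic a p-way parallel reduction: sum p contiguous chunks, then add the partial sums."""
    n = len(xs)
    bounds = [n * k // p for k in range(p + 1)]
    partials = [sequential_sum(xs[bounds[k]:bounds[k + 1]]) for k in range(p)]
    return sequential_sum(partials)

xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))                       # 1.0
print(chunked_sum(xs, 2))                       # 0.0
print(sequential_sum([1.0, 1.0, 1e16, -1e16]))  # 2.0 (same values, different order)
print(math.fsum(xs))                            # 2.0, the correctly rounded exact sum
```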

Goals (1/2)
Reproducible BLAS, and eventually higher-level libraries. Reproducibility means bitwise-identical answers on any computer, no matter what hardware resources are available or how they are scheduled, for any ordering of inputs that would give identical results in exact arithmetic. Reproducible exception handling too (return the same +∞, -∞, or NaN). Assume only a limited subset of the IEEE 754 standard.
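Exception handling belongs in the reproducibility goal because overflow can also depend on summation order. The sketch below uses plain Python floats. `struct` reads the raw 8-byte encoding, and the helper names `bits` and `left_to_right` are illustrative. The same three values sum to +∞ in one order and to a finite value in another. The last line shows that "bitwise identical" is a stricter test than `==`.

```python
import struct

def bits(x: float) -> bytes:
    """The raw IEEE 754 binary64 encoding; 'bitwise identical' compares these bytes."""
    return struct.pack("<d", x)

def left_to_right(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

big = 1e308                          # below the binary64 maximum (~1.798e308), but 2 * big overflows
a = left_to_right([big, big, -big])  # big + big -> inf, and inf - big stays inf
b = left_to_right([big, -big, big])  # no intermediate overflow: the result is 1e308
print(a, b)                          # inf 1e+308
print(bits(a) == bits(b))            # False

# +0.0 and -0.0 compare equal with == but are encoded differently.
print(0.0 == -0.0, bits(0.0) == bits(-0.0))  # True False
```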

Goals (2/2)
Performance/accuracy requirements: accuracy at least as good as conventional summation (tunable); only one read-only pass over the data; only one reduction operation; as little memory as possible, to enable tiling optimizations.

Relevance to other intern projects
The ReproBLAS project is low-level in nature and potentially useful for the application-based projects. Force, molecular dynamics: ensures consistent summation of results. Dot product: may be useful if a project requires access to lower-level functions.

Project Timeline (1/2)
Summer ’17, 1st half: Learn the basics of floating-point arithmetic, reproducible summation, and performance programming and tuning on the Blue Waters system.
Summer ’17, 2nd half: Implement reproducible BLAS on a single-node system.

Project Timeline (2/2)
Fall ’17: Continue implementing and tuning the performance of reproducible BLAS.
Spring ’18: Integrate reproducible BLAS into selected LAPACK and ScaLAPACK routines, test overall reproducibility, and do performance evaluations. Prepare findings for submission to Blue Waters.

Pre-Rounding
Rounding modes: one special mode rounds to nearest and breaks ties (values exactly halfway between two candidates) by rounding away from 0; IEEE 754 does not require it for binary formats. The four rounding modes for binary formats are round to nearest with ties to even (the usual default), always round up (toward +∞), always round down (toward -∞), and round toward 0.
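The rounding modes listed above can be tried out with Python's standard `decimal` module. It rounds decimal digits rather than bits, but the tie-breaking and directed rules are the same. This is an illustration only, and the mode labels are ours. The last line checks that binary doubles also break ties to even.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# decimal's names for the modes (its ROUND_HALF_UP breaks ties away from 0).
MODES = {
    "nearest, ties to even": ROUND_HALF_EVEN,
    "nearest, ties away from 0": ROUND_HALF_UP,
    "toward +inf (up)": ROUND_CEILING,
    "toward -inf (down)": ROUND_FLOOR,
    "toward 0": ROUND_DOWN,
}

def round_to_integer(value: str, mode: str) -> int:
    """Round a decimal string to an integer under the named rounding mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=MODES[mode]))

values = ["2.5", "3.5", "-2.5", "2.7", "-2.7"]
table = {mode: [round_to_integer(v, mode) for v in values] for mode in MODES}
for mode, rounded in table.items():
    print(f"{mode:26} {rounded}")

# Binary doubles use ties-to-even too: 2**53 + 1 and 2**53 + 3 are halfway cases.
print(float(2**53 + 1) == 2**53, float(2**53 + 3) == 2**53 + 4)  # True True
```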

Pre-Rounding

Pre-Rounding
Drawback: costs 2 or 3 reduction/broadcast steps.
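The slides only name pre-rounding, so what follows is a minimal sketch of the general idea, not the ReproBLAS code. It rests on our own assumptions:
- round-to-nearest binary64 arithmetic;
- finite inputs of moderate magnitude;
- hypothetical helper names `extract` and `pre_rounded_sum`.

The first reduction finds the largest magnitude. Every input is then split against power-of-two boundaries derived from it, so the extracted parts are multiples of a common grid and add up exactly in any order. Combining the two partial sums across processors would be the second reduction.

```python
import math

def extract(x: float, M: float) -> float:
    """Round x to a multiple of ulp(M); exact when M = 1.5 * 2**k and |x| <= 2**(k-1)."""
    return (x + M) - M

def pre_rounded_sum(xs):
    n = len(xs)
    # Reduction/broadcast 1: the largest magnitude (max does not depend on order).
    max_abs = max((abs(x) for x in xs), default=0.0)
    if max_abs == 0.0:
        return 0.0
    _, e_n = math.frexp(2.0 * n)             # 2**e_n > 2n
    _, k1 = math.frexp(2.0 * n * max_abs)    # 2**k1 > 2n * max|x|
    k2 = k1 - 53 + e_n                       # 2**k2 > 2n * max|remainder after level 1|
    M1, M2 = math.ldexp(1.5, k1), math.ldexp(1.5, k2)
    s1 = s2 = 0.0
    for x in xs:                             # single read-only pass
        q1 = extract(x, M1)                  # multiple of 2**(k1 - 52)
        q2 = extract(x - q1, M2)             # x - q1 is exact; q2 is a multiple of 2**(k2 - 52)
        s1 += q1                             # exact: partial sums stay within 2**53 grid units
        s2 += q2                             # exact for the same reason; smaller parts are dropped
    # Reduction 2 would combine (s1, s2) across processors; both are exact, so order is irrelevant.
    return s1 + s2
```

In this sketch, `pre_rounded_sum([1e16, 1.0, -1e16, 1.0])` returns 2.0 for every ordering of the four inputs. The price is the extra global step for the maximum before any summing can start.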

Indexed Summation

Indexed Summation
Bin boundaries are predetermined. The input is split into several bins, and only the top K = 2 bins are accumulated.

Indexed Summation
Only keep the top K bins; the rest are not computed (or are discarded). Only the accumulators for the top K = 2 bins seen so far need to be stored.
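The following is a simplified sketch of the indexed-summation idea, written for these notes rather than taken from ReproBLAS. Bin boundaries are fixed multiples of `W` bits in the exponent (W = 32 is our assumption). Each input is split across the top K bins seen so far, and each bin keeps an exact sum. Because the boundaries do not depend on the data, the bin sums do not depend on input order or on how partial states are merged. That independence is what allows one pass and one reduction.

The class `IndexedSum` and its methods are hypothetical. The sketch assumes finite inputs of moderate magnitude. It also has no carry/renormalization step, so it assumes at most 2**(54 - W) inputs.

```python
import math

W = 32  # assumed bin width in bits; bin b collects multiples of 2**(b*W)

def bin_index(x: float) -> int:
    """Smallest b with |x| < 2**((b+1)*W - 1); x must be finite and nonzero."""
    _, e = math.frexp(x)                 # 2**(e-1) <= |x| < 2**e
    return -(-(e + 1) // W) - 1          # ceil((e+1)/W) - 1

def extract(r: float, b: int) -> float:
    """Round r to the nearest multiple of 2**(b*W); exact for |r| <= 2**(b*W + 51)."""
    M = math.ldexp(1.5, b * W + 52)      # ulp(M) == 2**(b*W)
    return (r + M) - M

class IndexedSum:
    """Hypothetical simplified indexed accumulator keeping exact sums for the top K bins."""

    def __init__(self, K: int = 2):
        self.K = K
        self.top = None                  # highest bin index seen so far
        self.acc = [0.0] * K             # acc[j] is the exact sum of all parts in bin top - j

    def _raise_top(self, b: int) -> None:
        if self.top is None:
            self.top = b
        elif b > self.top:               # shift down; bins that leave the top K are discarded
            shift = b - self.top
            self.acc = [0.0] * min(shift, self.K) + self.acc[:max(self.K - shift, 0)]
            self.top = b

    def add(self, x: float) -> None:
        if x == 0.0:
            return
        self._raise_top(bin_index(x))
        r = x
        for j in range(self.K):          # parts below the top K bins are never computed
            q = extract(r, self.top - j)
            self.acc[j] += q             # exact: every part in this bin is a multiple of its grid
            r -= q                       # exact remainder passed to the next bin down

    def merge(self, other: "IndexedSum") -> None:
        """The one reduction step: align to the larger top index, then add bin by bin (exactly)."""
        if other.top is None:
            return
        self._raise_top(other.top)
        shift = self.top - other.top
        for j in range(shift, self.K):
            self.acc[j] += other.acc[j - shift]

    def result(self) -> float:
        total = 0.0
        for a in reversed(self.acc):     # fixed order: lowest bin first
            total += a
        return total
```

Within these assumptions, you can split the input into chunks, accumulate each chunk into its own `IndexedSum`, and merge the chunk states in any order. The result has the same bits as one sequential pass, which is the property a parallel reduction needs. Raising K keeps more bins and tunes the accuracy.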

Summation Performance
Compared to gcc -O3 applied to:
    double res = 0.0; for (int j = 0; j < N; j++) { res += X[j]; }
The reproducible sum is faster for large N! Other performance data: 3.3 to 4.2x slower than the MKL dot product for N = 2^6, ..., 2^12.