Drew Freer, Beayna Grigorian, Collin Lambert, Alfonso Roman, Brian Soumakian.



Introduction: Motivation & Problem
- Motivation: massive integer computations are useful for science and research, but are difficult to perform efficiently with standard computing resources (e.g. the GNU Multiple Precision Arithmetic Library).
- Problem: big-integer arithmetic, whose current limitations stem from the underlying computer architecture.

Introduction: Goal & Solution
- Goal: allow arithmetic on operands of unlimited size, and perform operations efficiently (parallelize!).
- Solution: utilize existing parallel programming frameworks and implement a new integer representation: BigInt.
- Supported operations: addition, subtraction, multiplication, division; bitwise AND, OR, XOR, NOT; left-shift, right-shift, equality test.

Design Overview
- Sign-and-magnitude representation: an array of integers, with each chunk's MSB used as a carry bit (overflow).
- Multiplication: shift-and-add.
- Division: shift-and-subtract.
- Key insight: delayed carry-ripple, i.e. exploit arithmetic on un-normalized values. This is mostly beneficial for addition and multiplication; normalize only when absolutely necessary (e.g. comparisons, printing the value, etc.).
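The delayed carry-ripple idea can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each 32-bit chunk keeps its digit in the low 31 bits and lets overflow accumulate in bit 31, to be rippled later in a single sequential pass. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define CHUNK_BITS 31
#define VALUE_MASK 0x7FFFFFFFu   /* low 31 bits hold the digit */

/* Lazy add: sum chunks independently; any overflow lands in bit 31 of
   each chunk and is NOT propagated yet. Every iteration is independent,
   which is what makes this step easy to parallelize. */
void lazy_add(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];        /* deferred carry accumulates in the MSB */
}

/* Normalize: ripple the deferred carry bits sequentially, clearing the
   MSB of every chunk. Assumes the array is large enough that the final
   carry out of the top chunk is zero. */
void normalize(uint32_t *x, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (x[i] & VALUE_MASK) + carry;
        carry = (x[i] >> CHUNK_BITS) + (t >> CHUNK_BITS);
        x[i] = t & VALUE_MASK;
    }
}
```

Several lazy additions can be chained before a single normalization, which is where the scheme pays off.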

Design Overview: BigInt Representation
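A plausible layout matching the slides' description (sign-and-magnitude, chunk array with a reserved carry bit) might look like this. The field names and the helper predicate are hypothetical, added here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical BigInt layout: sign-and-magnitude, with the magnitude
   stored as an array of 32-bit chunks. Each chunk keeps its digit in
   the low 31 bits; bit 31 absorbs a deferred carry between
   normalizations. */
typedef struct {
    int       sign;     /* +1 or -1 (sign-and-magnitude, not two's complement) */
    size_t    nchunks;  /* number of chunks allocated; chunks[0] is least significant */
    uint32_t *chunks;
} BigInt;

/* A BigInt is normalized when no chunk still has its carry bit set. */
bool bigint_is_normalized(const BigInt *x) {
    for (size_t i = 0; i < x->nchunks; i++)
        if (x->chunks[i] & 0x80000000u)
            return false;
    return true;
}
```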

Parallel Programming: OpenMP
- The need to compute massive amounts of data makes this an ideal problem for data decomposition, which OpenMP simplifies.
- Loop iterations are easily divided among threads (#pragma omp parallel for) with static, dynamic, or guided scheduling.
- Automated reductions.
- Easy to compare sequential vs. parallel execution.
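As a sketch of how the chunk loop parallelizes (assumed, not the project's actual code): because carries are deferred into each chunk's MSB, every iteration of a chunk-wise add is independent and a plain `#pragma omp parallel for` suffices. The schedule clause shown is illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Chunk-wise lazy add with loop iterations divided among threads.
   With carries deferred, there are no cross-iteration dependencies,
   so no synchronization is needed inside the loop. */
void parallel_lazy_add(uint32_t *dst, const uint32_t *a,
                       const uint32_t *b, size_t n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        dst[i] = a[i] + b[i];       /* overflow accumulates in bit 31 */
}
```

Swapping `schedule(static)` for `dynamic` or `guided` is a one-token change, which makes comparing strategies (and sequential vs. parallel runs) straightforward.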

Division of Tasks
Development:
- BigInt structure: Brian & Drew
- Addition & subtraction: Drew
- Multiplication & division: Collin & Beayna
- Bitwise ops & equality test: Alfonso
- Left-shift & right-shift: Brian
Testing & code review: everyone!

Progress
Basic functionality:
- Parse & print values: complete
- Bitwise ops & equality: complete
- Addition & multiplication: complete
- Left-shift & right-shift: complete
- Subtraction & division: started
Optimizations: drafted, ready for implementation.

Challenges & Bugs
Design challenges:
- Minimize memory usage (avoid unnecessary allocations)
- Aliasing issues (e.g. computing a + b into a itself)
- Minimize sequential tasks (i.e. normalization)
- Expose algorithmic shortcuts (e.g. a + 0 = a)
- Appropriate handling of boundary cases
Interesting bugs:
- Unintentional deallocation of result pointers
- Memory swapping
- Masking and shifting
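One concrete instance of the aliasing issue (a minimal sketch, not taken from the project's code): an in-place shift that copies chunks bottom-up would overwrite values before reading them, so the copy must run from the top down.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* In-place left shift by one whole chunk. Iterating upward
   (i = 0 .. n-2) would clobber x[i] before it is read when source and
   destination alias; iterating downward is safe. */
void shift_left_one_chunk(uint32_t *x, size_t n) {
    if (n == 0)
        return;
    for (size_t i = n - 1; i > 0; i--)
        x[i] = x[i - 1];
    x[0] = 0;    /* vacated least-significant chunk */
}
```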

Results
- 5000 chunks = ±2^(31·625)
- 1,000,000 chunks = ±2^(31·1250)