Computer Science 320 Reduction

Estimating π
Throw N darts at the unit square, and let C be the number of darts that land within the quadrant of the unit circle.
Then C / N should be about the same as the ratio of the quadrant's area to the square's area.
A circle's area is π * R^2, so with R = 1 the quadrant's area is π / 4, and the unit square's area is 1.
Then C / N ≈ π / 4, and π ≈ 4 * C / N.

Sequential Program PiSeq

    // Generate N random points in the unit square, count how many are in
    // the unit circle.
    count = 0;
    for (long i = 0; i < N; ++i) {
        double x = prng.nextDouble();
        double y = prng.nextDouble();
        if (x * x + y * y <= 1.0) ++count;
    }

    // Stop timing.
    time += System.currentTimeMillis();

    // Print results.
    System.out.println("pi = 4 * " + count + " / " + N + " = " + (4.0 * count / N));
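The fragment above leaves out its setup, so here is a minimal self-contained sketch of the same loop. It assumes `java.util.Random` as a stand-in for the course's PRNG class; the class name `PiSeqSketch`, the seed, and the value of `n` are illustrative choices, not taken from the slides.

```java
import java.util.Random;

class PiSeqSketch {
    /** Counts how many of n pseudorandom points in the unit square fall inside the unit circle. */
    static long countHits(long n, long seed) {
        Random prng = new Random(seed);   // assumption: java.util.Random replaces the slides' PRNG
        long count = 0;
        for (long i = 0; i < n; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            if (x * x + y * y <= 1.0) ++count;
        }
        return count;
    }

    /** Estimates pi as 4 * C / N. */
    static double estimatePi(long n, long seed) {
        return 4.0 * countHits(n, seed) / n;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;             // illustrative problem size
        long seed = 142857L;              // illustrative seed
        long time = -System.currentTimeMillis();
        long count = countHits(n, seed);
        time += System.currentTimeMillis();
        System.out.println("pi = 4 * " + count + " / " + n + " = " + (4.0 * count / n));
        System.out.println(time + " msec");
    }
}
```

The timing idiom matches the slide: `time` starts at the negated start time, so adding the end time leaves the elapsed milliseconds.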

Parallel Program PiSmp3

    new ParallelTeam().execute(new ParallelRegion() {
        public void run() throws Exception {
            execute(0, N-1, new LongForLoop() {
                // Set up per-thread PRNG and counter.
                Random prng_thread = Random.getInstance(seed);
                long count_thread = 0;

                // Extra padding to avert cache interference.
                long pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7;
                long pad8, pad9, pada, padb, padc, padd, pade, padf;

                // Parallel loop body.
                public void run(long first, long last) {
                    // Skip PRNG ahead to index first.
                    prng_thread.setSeed(seed);
                    prng_thread.skip(2 * first);

                    // Generate random points.
                    for (long i = first; i <= last; ++i) {
                        double x = prng_thread.nextDouble();
                        double y = prng_thread.nextDouble();
                        if (x * x + y * y <= 1.0) ++count_thread;
                    }
                }
                // The loop's finish() method follows on the next slide.

Reduction Step, SMP-Style

    static SharedLong count;
    ...
    public void finish() {
        // Reduce per-thread counts into shared count.
        count.addAndGet(count_thread);
    }
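The same pattern, a private counter per thread followed by a single update of a shared variable, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Parallel Java code: `AtomicLong` stands in for `SharedLong`, plain `Thread` objects stand in for `ParallelTeam`, and each thread gets its own seed (`seed + t`) instead of skipping ahead in one sequence. The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

class SmpReductionSketch {
    /** Counts hits among n darts using the given number of threads. */
    static long parallelCount(long n, int threads, long seed) {
        AtomicLong count = new AtomicLong(0);        // shared reduction variable
        Thread[] team = new Thread[threads];
        long chunk = (n + threads - 1) / threads;    // darts per thread, rounded up
        for (int t = 0; t < threads; ++t) {
            final long first = Math.min(n, t * chunk);
            final long last = Math.min(n, first + chunk);   // exclusive upper bound
            final long threadSeed = seed + t;        // assumption: one seed per thread
            team[t] = new Thread(() -> {
                Random prng = new Random(threadSeed);
                long countThread = 0;                // private counter, no sharing in the loop
                for (long i = first; i < last; ++i) {
                    double x = prng.nextDouble();
                    double y = prng.nextDouble();
                    if (x * x + y * y <= 1.0) ++countThread;
                }
                count.addAndGet(countThread);        // the reduction: one shared update per thread
            });
            team[t].start();
        }
        for (Thread worker : team) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        long n = 4_000_000L;
        long count = parallelCount(n, 4, 142857L);
        System.out.println("pi = 4 * " + count + " / " + n + " = " + (4.0 * count / n));
    }
}
```

Keeping the counter private until the end means the threads touch the shared variable once each, rather than once per dart.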

Monte Carlo Design for a Cluster
Process 0 could keep a global counter, but that would involve too many messages.
Use a reduction instead, so that message passing is minimal.
Each process has its own PRNG, with its own split of the random sequence.
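To see why splitting one sequence is attractive, the sketch below simulates K processes in a single JVM. Every process seeds its PRNG identically and then moves ahead to its own slice, so the combined count is identical to the sequential count. The skip is imitated by discarding draws one at a time, which is only meant to illustrate the idea; `java.util.Random` has no skip method, and the class and method names here are hypothetical.

```java
import java.util.Random;

class SequenceSplitSketch {
    /** Counts hits for dart indices first..last (inclusive) within one shared sequence. */
    static long countRange(long seed, long first, long last) {
        Random prng = new Random(seed);
        // Stand-in for skip(2 * first): each dart consumes two draws (x and y),
        // so discard the draws that belong to darts 0..first-1.
        for (long i = 0; i < 2 * first; ++i) prng.nextDouble();
        long count = 0;
        for (long i = first; i <= last; ++i) {
            double x = prng.nextDouble();
            double y = prng.nextDouble();
            if (x * x + y * y <= 1.0) ++count;
        }
        return count;
    }

    /** Sums the counts of k simulated processes, each handling one contiguous slice of 0..n-1. */
    static long splitCount(long seed, long n, int k) {
        long total = 0;
        for (int rank = 0; rank < k; ++rank) {
            long first = rank * n / k;
            long last = (rank + 1) * n / k - 1;
            total += countRange(seed, first, last);
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 100_000L;
        long seed = 99L;
        System.out.println("sequential count = " + countRange(seed, 0, n - 1));
        System.out.println("split count      = " + splitCount(seed, n, 8));
    }
}
```

Because the slices do not overlap and together cover the whole sequence, the result does not depend on how many processes are used.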

Reduction vs Gather
One option is to allocate an array of K cells for results, where the ith process's result goes in the ith cell, gather these into process 0, and let process 0 reduce them to the final result.
Instead, the reduce method employs all processes in computing the reduction.

Reduction in Cluster
Concentrate the data into fewer and fewer processes.
When K = 8:
- processes 4-7 send their data to processes 0-3
- processes 2-3 send their results to processes 0-1
- process 1 sends its results to process 0
The reduction takes only log2(K) rounds of messages (K - 1 messages in total, with each round's messages sent in parallel).
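The halving pattern can be traced with a small single-JVM simulation in which an array cell plays the role of each process's data and an addition plays the role of a received message. This is a sketch for power-of-two K only; the class name and the returned triple are illustrative choices, and no real message passing takes place.

```java
class TreeReductionSketch {
    /**
     * Sums the values with a binary-tree reduction.
     * Returns {sum, rounds, messages}. The number of values must be a power of two.
     */
    static long[] reduce(long[] values) {
        if (Integer.bitCount(values.length) != 1) {
            throw new IllegalArgumentException("number of processes must be a power of two");
        }
        long[] data = values.clone();   // data[rank] is what process `rank` currently holds
        long rounds = 0;
        long messages = 0;
        for (int half = data.length / 2; half >= 1; half /= 2) {
            // In this round, processes half..2*half-1 each send to process (rank - half).
            for (int rank = half; rank < 2 * half; ++rank) {
                data[rank - half] += data[rank];
                ++messages;
            }
            ++rounds;
        }
        return new long[] { data[0], rounds, messages };
    }

    public static void main(String[] args) {
        long[] result = reduce(new long[] { 1, 2, 3, 4, 5, 6, 7, 8 });
        System.out.println("sum = " + result[0]
                + ", rounds = " + result[1]
                + ", messages = " + result[2]);
    }
}
```

For K = 8 the rounds have `half` equal to 4, 2, and 1, which reproduces the three steps listed above.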

Reduction Tree for K = 8
Messages are sent in parallel at each level, starting at the bottom.
Once a level's results have been computed, messages are sent from the next level.

Example: Add the Results
[Figure panels: initial state; after first set of messages]

Example: Add the Results
[Figure panels: after second set of messages; after third set of messages]

It’s Automatic: reduce

    world.reduce(0, buf, LongOp.SUM);

    // Compute the count in each processor
    ...
    // Perform the reduction step
    LongItemBuf buf = new LongItemBuf();
    buf.item = count;
    world.reduce(0, buf, LongOp.SUM);
    count = buf.item;
    ...
    if (rank == 0)
        // Output the count and the estimate of PI

Reduction in Mandelbrot Histogram

    int[] histogram = new int[maxiter + 1];
    ...
    world.reduce(0, IntegerBuf.buffer(histogram), IntegerOp.SUM);
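When the buffer wraps an array, a sum reduction combines the arrays element by element. The sketch below simulates that in one JVM, assuming each row of `perProcess` is the histogram computed by one process; the class name, method name, and sample values are made up for illustration, and no Parallel Java calls are involved.

```java
import java.util.Arrays;

class HistogramReductionSketch {
    /** Element-wise sum of the per-process histograms, as process 0 would hold it after the reduction. */
    static int[] reduceSum(int[][] perProcess) {
        int[] result = perProcess[0].clone();
        for (int rank = 1; rank < perProcess.length; ++rank) {
            for (int i = 0; i < result.length; ++i) {
                result[i] += perProcess[rank][i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int maxiter = 3;                       // illustrative; bins are indexed 0..maxiter
        int[][] perProcess = {
            { 1, 0, 2, 0 },                    // hypothetical histogram from process 0
            { 0, 3, 1, 1 },                    // hypothetical histogram from process 1
        };
        assert perProcess[0].length == maxiter + 1;
        System.out.println(Arrays.toString(reduceSum(perProcess)));
    }
}
```

The array length of `maxiter + 1` leaves room for every iteration count from 0 through `maxiter`.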
