Sending a few stragglers home


Sending a few stragglers home
Laxmikant Kalé and PPL (UIUC) – Parallel Migratable Objects

- Consider a situation in which a chare array is sorted
  - Values are long integers between 0 and M
- The data distribution is uniform, so we decide that chare i will hold values in the range [i*M/P, (i+1)*M/P - 1]
  - where P is the number of chares in the array
  - without worrying too much about exact data balance
- Now, each chare generates a few new items
  - Their values may be anywhere in the range 0..M
  - Let's assume the "few" is really small, like 5 or 10
  - and that P is large (say > 10,000)
  - Also, the total data on each chare is large, but that's immaterial
- How can we send them to their correct home places? (The home calculation is sketched below.)
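The home of a value follows directly from the partitioning above. Here is a minimal sketch, assuming values lie in [0, M) and chare i owns [i*M/P, (i+1)*M/P - 1]; homeChare is a hypothetical helper name, not part of Charm++:

```cpp
#include <cstdint>

// Hypothetical helper: map a value in [0, M) to the index of the
// chare that owns it, given the uniform partitioning above.
inline int homeChare(int64_t value, int64_t M, int P) {
  // floor(value * P / M); the 128-bit intermediate (a GCC/Clang
  // extension) avoids overflow when value * P exceeds 64 bits.
  int64_t idx = (int64_t)(((__int128)value * P) / M);
  return idx >= P ? P - 1 : (int)idx;  // clamp, in case value == M
}
```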

Send a few stragglers home: Ideas

- Easy enough: just send a message with each new value to its home (it is easy to calculate the home)
- Optimize: don't send messages to yourself
- Optimize?: combine messages going to the same processor?
  - Stragglers are rare, so we will ignore this for now
- The problem: how do we know when we are done?
  - So that each chare can start the next phase of the computation, for example
- Idea 1: send a message to every chare, even if it is empty
- Idea 2: do a reduction on a vector v of size P, where v[i] is the count of messages going to the chare with index i, with a broadcast callback, so every chare knows the result (sketched below)
  - Now everyone knows how many messages to expect
- Both ideas are expensive (remember: only a few stragglers were generated): Idea 1 sends P messages per chare, and Idea 2 reduces and broadcasts a vector of size P
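A hedged Charm++ sketch of Idea 2. The class and member names (Sorter, Item, newItems, keep, receiveItem, countsReady, expected) are illustrative, not from the slides; contribute, CkReduction::sum_int, and CkReductionTarget are the actual Charm++ reduction API:

```cpp
// Illustrative sketch of Idea 2; Item would need PUP support so it
// can be marshalled into the receiveItem entry method.
void Sorter::sendStragglers() {
  std::vector<int> counts(P, 0);           // counts[i] = messages sent to chare i
  for (const Item &it : newItems) {
    int dest = homeChare(it.value, M, P);  // helper from the earlier sketch
    if (dest == thisIndex) {
      keep(it);                            // optimization: no message to yourself
    } else {
      counts[dest]++;
      thisProxy[dest].receiveItem(it);     // send the straggler to its home
    }
  }
  // Element-wise sum of the count vectors; binding the callback to the
  // whole array proxy broadcasts the result to every chare.
  CkCallback cb(CkReductionTarget(Sorter, countsReady), thisProxy);
  contribute(P * sizeof(int), counts.data(), CkReduction::sum_int, cb);
}

// Reduction target: every chare learns how many messages to expect.
void Sorter::countsReady(int n, int *totals) {
  expected = totals[thisIndex];
  // ... proceed to the next phase once `expected` items have arrived.
}
```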

Quiescence Detection

- In Charm++, quiescence is defined as the state in which no processor is executing an entry method, no messages are awaiting processing, and there are no messages in flight
- Charm++ provides a method to detect quiescence: from any chare, invoke CkStartQD(callback);
- The system is always doing the background bookkeeping it needs to detect quiescence
  - The call just starts a low-overhead distributed algorithm that detects the condition
  - It runs concurrently with your application
  - So, call CkStartQD as late as possible to reduce the overhead
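A minimal sketch of the call; CkStartQD and CkCallback are the actual Charm++ API, while Main, startPhase, quiescenceReached, and mainProxy are illustrative names:

```cpp
void Main::startPhase() {
  // ... trigger the sends for this phase ...
  // The callback fires once no entry method is executing and no
  // messages remain queued or in flight anywhere in the system.
  CkStartQD(CkCallback(CkIndex_Main::quiescenceReached(), mainProxy));
}

void Main::quiescenceReached() {
  // Safe to begin the next phase of the computation.
}
```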

Quiescence Detection applied to stragglers

- For our problem, we can have one of the chares (say the one with index 0) call CkStartQD after it has done its sending
  - with a callback that broadcasts to every chare in the array that quiescence has been attained
- This means all the stragglers have reached their home places (see the sketch below)
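A sketch using the same illustrative names as before; binding the callback to the whole array proxy makes it a broadcast, so every element learns that quiescence has been attained:

```cpp
void Sorter::sendStragglers() {
  // ... send each new item to its home chare, as in the earlier sketch ...
  if (thisIndex == 0) {
    // Broadcast callback: allStragglersHome() runs on every array
    // element once the system reaches quiescence.
    CkStartQD(CkCallback(CkIndex_Sorter::allStragglersHome(), thisProxy));
  }
}

void Sorter::allStragglersHome() {
  // All sends have been delivered; start the next phase.
}
```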

Histogram sort

- Idea: extend the median-finding algorithm
- If we have P chares, we need to find P-1 separator keys
  - i.e., values that act as (or define) boundaries between chares
  - such that every chare gets an (approximately) equal number of values
- We can make a guess (called a probe)
  - Collect a histogram (counts)
  - Correct the guesses and repeat (one refinement step is sketched below)
- When adequate separators are identified:
  - Everyone sends the data to where it belongs
  - Use quiescence detection to make sure all the data has been received
  - Sort locally
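A hedged, serial sketch of one probe-refinement step at the root, assuming histogram[k] holds the global count of keys below probe[k] after the reduction; all names here, and the simple bisection strategy, are illustrative:

```cpp
#include <cstdint>
#include <vector>

// One refinement step: separator k is adequate when the number of keys
// below it matches its target rank (k+1)*totalKeys/P (in practice,
// within some tolerance). lo/hi bracket each separator's search
// interval and shrink every round.
std::vector<int64_t> refineProbes(const std::vector<int64_t> &probe,
                                  const std::vector<int64_t> &histogram,
                                  int64_t totalKeys, int P,
                                  std::vector<int64_t> &lo,
                                  std::vector<int64_t> &hi) {
  std::vector<int64_t> next(P - 1);
  for (int k = 0; k < P - 1; ++k) {
    int64_t target = (int64_t)(k + 1) * totalKeys / P;
    if (histogram[k] == target) {      // separator k already adequate
      next[k] = probe[k];
      continue;
    }
    if (histogram[k] < target) lo[k] = probe[k];  // guess was too low
    else                       hi[k] = probe[k];  // guess was too high
    next[k] = lo[k] + (hi[k] - lo[k]) / 2;        // bisect the interval
  }
  return next;  // broadcast as the next probe; stop once all are adequate
}
```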

Histogram sort: interesting optimizations

- Some optimizations to this algorithm exploit Charm++'s message-driven execution
- E.g., some chares' separators may be found early on:
  - Everyone can start sending those values in parallel with the histogramming for the other separators
- Histogramming and the initial local sorting may be overlapped
- The histogram may be decomposed into multiple sections so that it can be pipelined:
  - While the root is preparing the next guess for one section, the other sections are doing their distributed histogramming