Coarse Grained Parallel Selection


1 Coarse Grained Parallel Selection
Laurence Boxer
Department of Computer and Information Sciences, Niagara University
Department of Computer Science and Engineering, SUNY at Buffalo
Presentation at the University of the District of Columbia, November 2018

2 Selection Problem
Given an array x[1..n] of real numbers and an index k, 1 ≤ k ≤ n, find an index j such that x_j is the k-th smallest entry of x; more generally, find the record in a list of records that has the k-th smallest key value.
So what does this generalize?
For k = 1, a solution to the Selection Problem computes the minimum.
For k = n, a solution to the Selection Problem computes the maximum.
For k = ⌈n/2⌉ or k = ⌊n/2⌋, the problem is to find the median.
Naïve but non-optimal solution: sort the array in ascending order; the entry at index k is then the desired value. (A one-line sketch follows below.)
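
A minimal sketch of the naïve sort-based solution described above (Python; the function name naive_select is illustrative, not from the slides):

    def naive_select(x, k):
        # k-th smallest entry of x (1-indexed k), found by sorting: θ(n log n) time
        return sorted(x)[k - 1]

    print(naive_select([7, 2, 9, 4, 1], 3))   # prints 4, the 3rd smallest entry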

3 Principles of Analysis of Algorithms*
Ignore machine-dependent constants. There is no concern with how fast an individual processor executes a machine instruction, as this says nothing about the quality of the algorithm.
Look at the growth of resources as n → ∞. The interest is in T(n), the running time of an algorithm for large n, where n is typically the size of the data input to the algorithm.
The following are ignored when expressing asymptotic analysis:
low-order terms
multiplicative constant factors
Example: the function 3n³ + 10n² + n + 17 is said to grow as n³. That is, for large values of n, the quadratic, linear, and constant terms (respectively, 10n², n, and 17) are insignificant compared with the cubic term when considering the growth rate of the function.
* This slide copied from the PowerPoints for Miller and Boxer, 2013

4 Technical details
Let f(n), g(n) be positive-valued functions of an integer variable (in the back of your mind, think of them as measures of algorithms' running times for a problem of size n).
f(n) = O(g(n)) if there exist c > 0, n₀ > 0 such that n ≥ n₀ implies f(n) ≤ c·g(n).
f(n) = θ(g(n)) if there exist c₂ > c₁ > 0, n₀ > 0 such that n ≥ n₀ implies c₁·g(n) ≤ f(n) ≤ c₂·g(n).

5 Example for the uninitiated: 1,000,000n + 1,000 = O(n²)
How do we know this? (I.e., proof:)
Well, for n ≥ 1,000,000 (= n₀) we have
1,000,000 ≤ n    (1)
and we multiply both sides of (1) by n to get
1,000,000n ≤ n²    (2)
and
1,000 < 1,000,000 ≤ n = n × 1 ≤ n × n = n²    (3)
so by adding (2) and (3) we get
1,000,000n + 1,000 < n² + n² = 2n²    (take c = 2).

6 Naïve non-optimality, sequential version

    n          log₂ n    n log₂ n
    1,024      10        10,240
    2,048      11        22,528
    4,096      12        49,152
    8,192      13        106,496
    16,384     14        229,376
    32,768     15        491,520
    65,536     16        1,048,576
    131,072    17        2,228,224
    262,144    18        4,718,592
    524,288    19        9,961,472

The minimum and maximum problems have sequential solutions based on semigroup operations (a simple scan of the data) that run in θ(n) time.
Sorting takes θ(n log n) time. Therefore, the naïve algorithm takes θ(n log n) time.
Since Selection generalizes Minimum and Maximum, we want Selection to be solved in the same asymptotic time, θ(n), as Minimum and Maximum.
A θ(n)-time sequential solution exists [Blum et al.; Miller & Boxer]. (A sketch of such an approach follows below.)
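
A sequential worst-case θ(n) selection sketch in the spirit of the median-of-medians algorithm of Blum et al. (Python; the function name select and details such as the handling of short groups are illustrative, not taken from the slides). Later sketches in this transcript reuse this select routine.

    def select(a, k):
        # k-th smallest element of a (1-indexed k), worst-case linear time
        if len(a) <= 5:
            return sorted(a)[k - 1]
        # Median of each group of (at most) 5, then the median of those medians is the pivot.
        groups = [a[i:i + 5] for i in range(0, len(a), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        pivot = select(medians, (len(medians) + 1) // 2)
        low = [v for v in a if v < pivot]
        mid = [v for v in a if v == pivot]
        high = [v for v in a if v > pivot]
        if k <= len(low):
            return select(low, k)                        # answer lies among the smaller values
        if k <= len(low) + len(mid):
            return pivot                                 # answer equals the pivot
        return select(high, k - len(low) - len(mid))     # answer lies among the larger values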

7 Application example - awarding merit scholarships
E.g., the top 5% of applicants in a scholarship program are to receive awards. Determine the awardees without taking the time to sort the scores.
Solution, analyzed sequentially:
Determine k = (100% − 5%)·n = 95%·n (n = number of applicants) - θ(1) time.
Use a Selection algorithm to find the criterion score, the k-th lowest - θ(n) time.
Scan the scores to determine the awardees, those with score ≥ the k-th lowest - θ(n) time.
Total sequential time: θ(n).
This is asymptotically optimal, since you must consider all n applicants - why? Otherwise, you could overlook someone who should be a winner (whose record should be part of the output).
If desired, then sort the awardees alphabetically by name or numerically by score - but now only 5% of the original data is being sorted.
Although more steps are described above than in the alternative of sorting and taking the top 5% of the sorted list, the running time, θ(n), is asymptotically faster than the θ(n log n) needed for sorting. (A sketch follows below.)
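
A sketch of the scholarship computation, reusing the select routine sketched earlier (illustrative; rounding and handling of tied scores are glossed over):

    def scholarship_awardees(scores, top_fraction=0.05):
        # Return the scores of (roughly) the top 5% without sorting all of the data.
        n = len(scores)
        k = max(1, int((1 - top_fraction) * n))     # criterion index, θ(1)
        cutoff = select(list(scores), k)            # k-th lowest score, θ(n)
        return [s for s in scores if s >= cutoff]   # scan for awardees, θ(n)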

8 Coarse grained parallel multicomputer CGM(n, p)
Much early research in parallel computing was fine grained - assume lots of processors, typically n processors when processing n data. This makes the theory easier, but is generally wildly impractical.
A CGM(n, p) has p processors of approximately the same capabilities (speed, memory) for processing n data, where p ≪ n. Typically one assumes p² ≤ n or p² log₂ p ≤ n; also, Ω(n/p) memory per processor.
This is more realistic, though the theory is often more challenging than for fine grained parallelism. Most modern computers are coarse grained parallel, i.e., they have more than 1 but not many processors.
Ideal: solution times satisfy T_par(n, p) = θ(T_seq(n)/p).
The ideal is not always achievable: even if the problem is parallelizable, communications between processors, for example, take time.
Note to students - this hints at the challenges of parallel computing, and at the opportunities. (A small numeric illustration follows below.)
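
A small numeric illustration (Python; the choice of n is arbitrary) of the processor restriction p² log₂ p ≤ n and of the ideal per-processor work T_seq(n)/p:

    import math

    def max_processors(n):
        # Largest p satisfying p**2 * log2(p) <= n (the restriction quoted above)
        p = 1
        while (p + 1) ** 2 * math.log2(p + 1) <= n:
            p += 1
        return p

    n = 10**9                 # one billion data items
    p = max_processors(n)     # largest p the restriction allows
    print(p, n / p)           # ideal parallel work per processor, θ(T_seq(n)/p), constants ignored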

9 Saukas-Song coarse grained parallel Selection algorithm
Published in 1999.
Restriction on the number of processors: p² log₂ p ≤ n.
The paper analyzed the algorithm in terms of local computation (processors compute in parallel, independently of each other) and "communication rounds" (processors exchange data with each other).
Local computing time: θ(n/p).
Communication rounds: O(log p).
In 1999, the asymptotic time of communications had not been studied. Now the asymptotic time of communications can be evaluated [Boxer & Miller 2004].
The algorithm runs in θ((n log p)/p) time - efficient but not ideal; e.g., with p = 1,024 processors this is a factor of log₂ 1,024 = 10 above the ideal θ(n/p).

10 Boxer coarse grained parallel Selection algorithm - assuming data uniformly distributed on an interval, WLOG [0, 1]
Guess a small interval [U, V] containing the expected value of the k-th smallest entry of the list, E(X_k) = k/(n+1), in θ(1) time.
In parallel, each processor scans its data values; accumulate the count S of those x_i < U and the count M of those with U ≤ x_i ≤ V, keeping track of the set M′ of values satisfying the latter inequalities. With high probability, i.e., with probability tending to 1 as n → ∞, we have S < k ≤ S + M and M ≤ n/p. (Proofs of the probability limits follow from Chebyshev's inequality.) Time: θ(n/p).
If these conditions are realized, then:
Gather M′ to one processor - O(n/p) time (Boxer & Miller, 2004).
Use sequential selection to find the (k − S)-th smallest member of M′ - O(n/p) time.
Thus, the solution finishes in θ(n/p) time.
Else:
Apply the Saukas-Song algorithm to solve in worst-case θ((n log p)/p) time.
The probabilities computed for the previous steps enable us to compute the expected running time: θ(n/p)·θ(1) + θ((n log p)/p)·O(1/log p) = θ(n/p). (A sequential sketch of the filtering idea follows below.)
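
A sequential sketch of the interval-filtering idea (Python; this simulates the logic on a single processor, and the interval width and the fallback are illustrative choices, not taken from the paper):

    import random

    def select_uniform(x, k, width=0.01):
        # Selection for data assumed uniform on [0, 1], via a guessed interval around E(X_k) = k/(n+1)
        n = len(x)
        center = k / (n + 1)                      # expected value of the k-th smallest entry
        U, V = center - width, center + width     # guessed small interval [U, V]
        S = sum(1 for v in x if v < U)            # count of entries below U
        M_prime = [v for v in x if U <= v <= V]   # entries falling in [U, V]
        if S < k <= S + len(M_prime):             # the k-th smallest lies in [U, V]
            return sorted(M_prime)[k - S - 1]     # (k - S)-th smallest of M'; only |M'| values left to process
        return sorted(x)[k - 1]                   # fallback (stand-in for Saukas-Song in the parallel setting)

    data = [random.random() for _ in range(100_000)]
    print(select_uniform(data, 50_000))           # close to 0.5, the median of Uniform(0, 1)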

11 References
L. Boxer, Coarse Grained Parallel Selection, submitted. Available at
L. Boxer and R. Miller, Coarse grained gather and scatter operations with applications, Journal of Parallel and Distributed Computing 64 (11) (2004).
M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, and R.E. Tarjan, Time bounds for selection, Journal of Computer and System Sciences 7 (1973).
R. Miller and L. Boxer, Algorithms Sequential and Parallel: A Unified Approach, 3rd ed., Cengage Learning, Boston, 2013.
E.L.G. Saukas and S.W. Song, A note on parallel selection on coarse-grained multicomputers, Algorithmica 24 (1999).

