The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops

Francis Dang, Hao Yu, and Lawrence Rauchwerger
Department of Computer Science, Texas A&M University
Parasol Laboratory. IPDPS 2002.
Motivation

- To maximize performance, extract the maximum available parallelism from loops.
- Static compiler methods may be insufficient:
  - Access patterns may be too complex.
  - Required information is only available at run time.
- Run-time methods are needed to extract loop parallelism:
  - Inspector/executor
  - Speculative parallelization
Speculative Parallelization: The LRPD Test

Main idea:
- Execute the loop as a DOALL.
- Record memory references during execution.
- Check for data dependences.
- If there was a dependence, re-execute the loop sequentially.

Disadvantages:
- A single data dependence invalidates the entire speculative parallelization.
- The slowdown is proportional to the speculative parallel execution time.
- Partial parallelism is not exploited.
Partially Parallel Loop Example

  do i = 1, 8
    z = A[K[i]]
    A[L[i]] = z + C[i]
  end do

  K[1:8] = [1,2,3,1,4,2,1,1]
  L[1:8] = [4,5,5,4,3,5,3,3]

Access pattern of A (R = read, W = write):

  A() \ iter   1  2  3  4  5  6  7  8
   1           R        R        R  R
   2              R           R
   3                 R     W     W  W
   4           W        W  R
   5              W  W        W
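As a minimal sketch of what the LRPD test records, the Python below (a stand-in for the original Fortran run-time code, not the paper's implementation) marks each element of A with the iterations that read and write it, then applies a deliberately conservative check: any element that is written and also accessed in a different iteration flags a potential dependence. The real test additionally recognizes privatizable and reduction accesses, which this sketch omits.

```python
def mark_accesses(K, L, n):
    """Shadow-mark A: record (iteration, 'R'/'W') per element of A."""
    marks = {}
    for i in range(1, n + 1):
        marks.setdefault(K[i - 1], []).append((i, 'R'))  # z = A[K[i]]
        marks.setdefault(L[i - 1], []).append((i, 'W'))  # A[L[i]] = z + C[i]
    return marks

def maybe_dependent(marks):
    """Conservative check: flag any element written in one iteration
    and accessed (read or written) in a different one."""
    for accesses in marks.values():
        writes = [i for i, kind in accesses if kind == 'W']
        if writes and any(i != w for i, _ in accesses for w in writes):
            return True
    return False

# The example above: A(4) is written in iterations 1 and 4 and read in
# iteration 5, so the loop cannot run as a plain DOALL.
K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 3, 5, 3, 3]
print(maybe_dependent(mark_accesses(K, L, 8)))  # True
```

Under the plain LRPD test, this single detected dependence forces a full sequential re-execution, which is the disadvantage the R-LRPD addresses.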
The Recursive LRPD Test

Main idea:
- Transform a partially parallel loop into a sequence of fully parallel, block-scheduled loops.
- Iterations before the first data dependence are correct and are committed.
- Re-apply the LRPD test to the remaining iterations.

Worst case:
- Sequential execution time plus the testing overhead.
Algorithm

1. Initialize the test data structures.
2. Checkpoint the arrays under test.
3. Execute the remaining iterations as a DOALL.
4. Analyze the recorded references.
5. On success: commit.
   On failure: restore from the checkpoint, reinitialize, and restart on the uncommitted iterations.
Implementation

Implemented as a run-time pass in Polaris, with additional hand-inserted code:
- Privatization with copy-in/copy-out for the arrays under test.
- Replicated buffers for reductions.
- Backup arrays for checkpointing.
Recursive LRPD Example

  do i = 1, 8
    z = A[K[i]]
    A[L[i]] = z + C[i]
  end do

  K[1:8] = [1,2,3,1,4,2,1,1]
  L[1:8] = [4,5,5,4,2,5,3,3]

First stage (block-scheduled, two iterations per processor):

  proc    P1    P2    P3    P4
  iter   1-2   3-4   5-6   7-8
  A(1)     R     R           R
  A(2)     R           W
  A(3)           R           W
  A(4)     W     W     R
  A(5)     W     W     W

A(4) is written on P1 and P2 but read first on P3, so the test fails at P3: iterations 1-4 are committed, and the test is re-applied to iterations 5-8.

Second stage (P3 and P4 keep their blocks):

  proc    P1    P2    P3    P4
  iter               5-6   7-8
  A(1)                       R
  A(2)                 W
  A(3)                       W
  A(4)                 R
  A(5)                 W

No cross-processor dependences remain, so the second stage succeeds.
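One stage of the test can be modeled in Python (a sketch, not the paper's implementation): each processor executes a contiguous block, a read is safe if the element was last written by the same processor (its privatized copy) or not written at all, and the stage fails at the first processor whose read needs a value written by an earlier processor.

```python
def first_failing_proc(K, L, blocks):
    """blocks: per-processor iteration lists, in loop order.
    Return the index of the first processor whose read touches an
    element written by an earlier processor (a cross-processor flow
    dependence), or None if the stage succeeds. Write-write and
    write-only sharing is assumed privatizable with copy-out."""
    written_earlier = set()
    for p, iters in enumerate(blocks):
        local_writes = set()
        for i in iters:
            a = K[i - 1]                        # read A[K[i]]
            if a in written_earlier and a not in local_writes:
                return p
            local_writes.add(L[i - 1])          # write A[L[i]]
        written_earlier |= local_writes
    return None

K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
# First stage: fails at P3 (0-indexed processor 2), so iters 1-4 commit.
print(first_failing_proc(K, L, [[1, 2], [3, 4], [5, 6], [7, 8]]))  # 2
# Second stage on the remaining iterations succeeds.
print(first_failing_proc(K, L, [[5, 6], [7, 8]]))  # None
```

Because the first processor of a stage can never conflict with an earlier one, each stage commits at least one block, which guarantees progress.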
Heuristics

- Work redistribution
- Sliding-window approach
- Data dependence graph extraction
Work Redistribution

- Redistribute the remaining iterations across all processors.
- The execution time of each stage decreases.

Disadvantages:
- May uncover new dependences across processors.
- May incur remote cache misses from data redistribution.

[Figure: stage-by-stage schedules on p1-p4, with and without redistribution.]
Work Redistribution Example

(Same loop as the Recursive LRPD example, with K[1:8] = [1,2,3,1,4,2,1,1] and L[1:8] = [4,5,5,4,2,5,3,3].)

First stage: identical to before; the test fails at P3 and iterations 1-4 commit.

Second stage (iterations 5-8 redistributed, one per processor):

  proc    P1   P2   P3   P4
  iter     5    6    7    8
  A(1)               R    R
  A(2)     W    R
  A(3)               W    W
  A(4)     R
  A(5)          W

A(2) is written on P1 (iteration 5) and read on P2 (iteration 6): redistribution has uncovered a new cross-processor dependence, so only iteration 5 commits.

Third stage (iterations 6-8 redistributed):

  proc    P1   P2   P3   P4
  iter     6    7    8
  A(1)          R    R
  A(2)     R
  A(3)          W    W
  A(4)
  A(5)     W

The writes to A(3) on P2 and P3 are write-only and thus privatizable; no flow dependences remain and the stage succeeds.
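The staged execution with redistribution can be sketched end to end (a Python stand-in: one iteration per processor after each failure, dependence detection at iteration granularity, and commit modeled by sequential re-execution rather than parallel execution with copy-out):

```python
def r_lrpd_redistributed(K, L, C, A, n):
    """Run A[L[i]] = A[K[i]] + C[i], i = 1..n, in speculative stages.
    Each stage commits the prefix of remaining iterations up to (but
    not including) the first flow dependence; returns the stage count."""
    start, stages = 1, 0
    while start <= n:
        written, dep = set(), None
        for i in range(start, n + 1):
            if K[i - 1] in written:   # reads an earlier iteration's write
                dep = i
                break
            written.add(L[i - 1])
        stop = n + 1 if dep is None else dep
        for i in range(start, stop):  # commit the dependence-free prefix
            A[L[i - 1]] = A[K[i - 1]] + C[i - 1]
        stages += 1
        start = stop                  # restart at the dependent iteration
    return stages

K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
C = list(range(1, 9))
A = [0] * 6                      # A(1)..A(5); index 0 unused
stages = r_lrpd_redistributed(K, L, C, A, 8)
print(stages)                    # 3, matching the example: 1-4, then 5, then 6-8
```

The first iteration of a stage can never be flagged, so the stage count is bounded by the number of flow dependences and the worst case degenerates to sequential execution plus testing overhead, as stated earlier.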
Redistribution Model

- Redistribution may not always be beneficial.
- Stop redistributing when the cost of data redistribution outweighs the benefit of work redistribution.
- A synthetic loop was used to model this adaptive method.
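The stopping rule can be written as a one-line predicate. The cost terms below (per-iteration data-movement cost, per-iteration work) are illustrative placeholders, not quantities from the paper's synthetic-loop model:

```python
def should_redistribute(remaining_iters, nproc, busy_procs,
                        move_cost_per_iter, work_per_iter):
    """Redistribute only if the time saved by spreading the remaining
    iterations over all nproc processors exceeds the cost of moving
    their data. All cost parameters are hypothetical."""
    time_without = (remaining_iters / busy_procs) * work_per_iter
    time_with = (remaining_iters / nproc) * work_per_iter \
                + remaining_iters * move_cost_per_iter
    return time_with < time_without

# Cheap data movement: redistribution wins.
print(should_redistribute(1000, 16, 2, 0.01, 1.0))  # True
# Expensive data movement: keep the current assignment.
print(should_redistribute(1000, 16, 2, 1.0, 1.0))   # False
```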
[Figure: redistribution-model results on the synthetic loop.]
Sliding Window R-LRPD

- The R-LRPD can degenerate to a sequential schedule for long dependence distributions.
- Strip-mine the speculative execution: apply the R-LRPD to a contiguous window of iterations at a time.
- Only dependences within the current window cause failures.
- Adds more global synchronizations and test overhead.

[Figure: sliding-window schedule on two processors across stages.]
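Strip-mining changes only the driver: each stage speculates on a fixed-size window of the remaining iterations instead of all of them, so a dependence late in the loop cannot fail work far ahead of it. A sketch under the same iteration-granularity assumptions as before:

```python
def sliding_window_r_lrpd(K, L, C, A, n, window):
    """Speculate on at most `window` iterations per stage; only flow
    dependences inside the window can cause a (partial) failure."""
    start, stages = 1, 0
    while start <= n:
        end = min(start + window - 1, n)
        written, dep = set(), None
        for i in range(start, end + 1):
            if K[i - 1] in written:   # flow dependence inside the window
                dep = i
                break
            written.add(L[i - 1])
        stop = end + 1 if dep is None else dep
        for i in range(start, stop):  # commit the dependence-free prefix
            A[L[i - 1]] = A[K[i - 1]] + C[i - 1]
        stages += 1
        start = stop
    return stages

K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
C = list(range(1, 9))
A = [0] * 6
stages = sliding_window_r_lrpd(K, L, C, A, 8, window=2)
print(stages)  # 5
```

With window = 2 on the earlier example this takes 5 stages: smaller windows bound the work lost to a restart at the price of one global synchronization per window, which is exactly the trade-off noted above.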
DDG Extraction

- The R-LRPD can generate sequential schedules for complex dependence distributions.
- Use the sliding-window R-LRPD scheme to extract the data dependence graph (DDG).
- Generate an optimized schedule from the DDG.
- Obtains the DDG for loops from which a proper inspector cannot be extracted.

[Figure: sliding-window stages and the extracted DDG edges.]
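The extraction step can be sketched as: record the last writer of every element as iterations retire, and add an edge writer-to-reader for each read that consumed another iteration's value. This Python stand-in runs over the example access pattern used earlier; the edges are iteration pairs for that example, not the slide's exact graph:

```python
def extract_ddg(K, L, n):
    """Flow-dependence edges (i, j): iteration j reads the value
    written by the last preceding writer i of that element."""
    last_writer = {}
    edges = set()
    for j in range(1, n + 1):
        a = K[j - 1]                  # read A[K[j]]
        if a in last_writer:
            edges.add((last_writer[a], j))
        last_writer[L[j - 1]] = j     # write A[L[j]]
    return edges

K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
print(sorted(extract_ddg(K, L, 8)))  # [(4, 5), (5, 6)]
```

A scheduler can then run all iterations with no unsatisfied incoming edges in parallel, instead of the strictly staged prefix-commit schedule.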
Performance Issues

- Blocked scheduling is a potential cause of load imbalance.
- Checkpointing can be expensive.

Feedback-guided blocked scheduling:
- Use the timing information from the previous instantiation (Bull, EuroPar '98).
- Estimate the per-processor chunk sizes that minimize load imbalance.

On-demand checkpointing:
- Checkpoint only the data modified during execution.
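The feedback-guided idea can be sketched as: take each processor's measured time from the previous instantiation, infer its effective speed, and size the next chunks proportionally so all processors finish together. The numbers below are hypothetical; the actual scheme follows Bull (EuroPar '98):

```python
def feedback_chunks(prev_chunk_sizes, prev_times, n_iters):
    """Chunk sizes for the next instantiation, proportional to each
    processor's iterations-per-second in the previous one."""
    rates = [c / t for c, t in zip(prev_chunk_sizes, prev_times)]
    total = sum(rates)
    chunks = [round(n_iters * r / total) for r in rates]
    chunks[-1] += n_iters - sum(chunks)   # absorb rounding error
    return chunks

# Processor 2 was twice as slow last time, so it gets a smaller chunk.
print(feedback_chunks([25, 25, 25, 25], [1.0, 2.0, 1.0, 1.0], 100))
```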
Experiments

Setup:
- 16-processor HP V-Class
- 4 GB memory
- HP-UX 11.0

Codes and loops:

  Code        Loops
  TRACK       NLFILT_do300, EXTEND_do400, FPTRAK_do300
  SPICE 2G6   DCDCMP_do15, DCDCMP_do70, BJT
  FMA3D       Quadrilateral Loop
Experimental Results

[Figures: input profiles; TRACK; sliding window; FMA3D; SPICE 2G6.]
Conclusion

Contributions:
- Any loop can now be speculatively parallelized.
- The concern becomes how best to optimize the parallelization, not whether to parallelize.

Future work:
- Use dependence distribution information for adaptive redistribution and scheduling.