Slide 1: Lecture 6 Objectives
Communication complexity analysis
Collective operations
–Reduction
–Binomial trees
–Gather and scatter operations
Review: communication analysis of Floyd's algorithm
Slide 2: Parallel Reduction Evolution
Slide 3: Binomial Trees (a binomial tree is a subgraph of the hypercube)
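The subgraph claim can be checked directly: in the usual binomial-tree construction, each non-root node's parent differs from it in exactly one bit, which is precisely the condition for a hypercube edge. A minimal sketch (the function names `parent` and `hamming` are my own, not from the lecture):

```python
def parent(i):
    """Parent of node i in a binomial tree rooted at 0:
    clear the lowest set bit of i."""
    return i & (i - 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

d = 4                      # hypercube dimension: 2**d nodes
nodes = range(1, 2 ** d)   # every node except the root has a parent
# every binomial-tree edge is also a hypercube edge
assert all(hamming(i, parent(i)) == 1 for i in nodes)
```

Because clearing a single set bit changes exactly one bit, the check holds for any dimension d.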
Slide 4: Finding Global Sum. The 16 starting values, one per process:
 4   2   0   7
-3   5  -6  -3
 8   1   2   3
-4   4   6  -1
Slide 5: Finding Global Sum. After the first exchange, 8 partial sums:
 1   7  -6   4
 4   5   8   2
Slide 6: Finding Global Sum. After the second exchange, 4 partial sums:
 8  -2
 9  10
Slide 7: Finding Global Sum. After the third exchange, 2 partial sums: 17 and 8.
Slide 8: Finding Global Sum. The final exchange yields the global sum, 25; the communication pattern is a binomial tree.
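The four reduction steps above can be simulated sequentially. A minimal sketch (no actual message passing; partners are paired 2·step apart rather than in exactly the layout drawn on the slides, but the number of steps and the result are the same):

```python
def binomial_tree_sum(values):
    """Simulate a binomial-tree reduction of len(values) processes.

    values[i] is the local value on process i; the process count
    must be a power of two.  At each step, process i receives from
    process i + step, so half the processes drop out per step and
    process 0 holds the total after log2(p) steps.
    """
    partial = list(values)
    p = len(partial)
    step = 1
    while step < p:
        for i in range(0, p, 2 * step):
            partial[i] += partial[i + step]   # i + step sends to i
        step *= 2
    return partial[0]

# the 16 values from the slides; the global sum is 25
values = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
assert binomial_tree_sum(values) == 25 == sum(values)
```

With 16 processes the reduction finishes in log2(16) = 4 communication steps, matching the four slides.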
Slide 9: Agglomeration
Slide 10: (figure: each agglomerated group of values is reduced to a local sum)
Slide 11: Gather
Slide 12: All-gather
Slide 13: Complete Graph for All-gather
Slide 14: Hypercube for All-gather
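The hypercube all-gather can be sketched as a sequential simulation: in dimension k, every process exchanges everything it currently holds with its neighbor across that dimension, so each process's data doubles per step and all p items reach every process in log2(p) steps. (The function name is my own; this is not MPI code.)

```python
def hypercube_allgather(items):
    """Simulate all-gather on a hypercube of p = 2**d processes.

    items[i] is the single item process i starts with; returns the
    set of items each process holds at the end.
    """
    p = len(items)
    data = [{x} for x in items]
    step = 1
    while step < p:
        new = [set(s) for s in data]
        for i in range(p):
            new[i] |= data[i ^ step]   # pairwise exchange across one dimension
        data = new
        step <<= 1
    return data

result = hypercube_allgather(list("abcdefgh"))
# after log2(8) = 3 steps, every process holds all eight items
assert all(s == set("abcdefgh") for s in result)
```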
Slide 15: Analysis of Communication
λ (lambda) is the latency: the message delay, i.e. the overhead to send one message.
β (beta) is the bandwidth: the number of data items that can be transferred per unit time.
Sending a message with n data items therefore costs λ + n/β.
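The cost model on this slide is a one-liner; a minimal sketch using the lecture's example parameters (λ = 250 μs, β = 10^7 items/s; the function name is my own):

```python
def message_time(lam, beta, n):
    """Time to send one message carrying n data items:
    latency lam plus n / beta transmission time."""
    return lam + n / beta

# 250 us latency, 10**7 items/s bandwidth, 1000-item message
t = message_time(250e-6, 1e7, 1000)
assert abs(t - 350e-6) < 1e-12   # 250 us latency + 100 us transmission
```

Note that for short messages the latency term dominates, which is why the collective algorithms below try to minimize the number of messages, not just the volume.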
Slide 16: Communication Time for All-Gather
Hypercube: ⌈log p⌉ · λ + n(p-1)/(pβ)  (log p message start-ups; each process receives the other n(p-1)/p items in total)
Complete graph: (p-1) · (λ + n/(pβ))  (p-1 exchanges of one n/p-item block each)
Slide 17: Adding Data Input
Slide 18: Scatter
Slide 19: Scatter in log p Steps
Step 0: one process holds all eight blocks: 1 2 3 4 5 6 7 8
Step 1: it sends half to a partner: 1 2 3 4 | 5 6 7 8
Step 2: each holder again sends half: 1 2 | 3 4 | 5 6 | 7 8
Step 3: after log 8 = 3 steps, every process holds exactly one block.
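The halving pattern above can be simulated directly; a minimal sequential sketch (no MPI; the function name is my own, and partners are taken 2^k positions apart, which may differ from the exact node labels drawn on the slide):

```python
def scatter_log_steps(items, p):
    """Simulate scatter in log2(p) steps; p must be a power of two
    and len(items) == p.

    Process 0 starts with all p blocks.  At each step, every process
    that holds data sends the upper half of its blocks to a partner,
    so after log2(p) steps process i holds exactly block i.
    """
    data = [[] for _ in range(p)]
    data[0] = list(items)
    step = p // 2
    while step >= 1:
        for i in range(0, p, 2 * step):
            half = len(data[i]) // 2
            data[i + step] = data[i][half:]   # send upper half away
            data[i] = data[i][:half]          # keep lower half
        step //= 2
    return data

result = scatter_log_steps([1, 2, 3, 4, 5, 6, 7, 8], 8)
assert result == [[1], [2], [3], [4], [5], [6], [7], [8]]
```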
Slide 20: Communication Time for Scatter
Hypercube: ⌈log p⌉ · λ + n(p-1)/(pβ)  (log p steps; in total n(p-1)/p items leave the root)
Complete graph: (p-1) · (λ + n/(pβ))
Slide 21: Recall Parallel Floyd's Computational Complexity
Innermost loop has complexity Θ(n)
Middle loop executed at most ⌈n/p⌉ times
Outer loop executed n times
Overall computational complexity: Θ(n³/p)
Slide 22: Floyd's Communication Complexity
No communication in inner loop
No communication in middle loop
Broadcast in outer loop: complexity Θ(n log p)
Executed n times, so total communication complexity is Θ(n² log p)
Slide 23: Execution Time Expression (1)
T1 = n · ⌈n/p⌉ · n · χ + n · ⌈log p⌉ · (λ + 4n/β)
The first product is iterations of the outer loop (n) × iterations of the middle loop (⌈n/p⌉) × iterations of the inner loop (n) × cell update time (χ). The second is iterations of the outer loop (n) × messages per broadcast (⌈log p⌉) × message-passing time (λ + 4n/β, at 4 bytes per matrix element).
Slide 24: Accounting for Computation/Communication Overlap
Note that after the first broadcast, all the wait times overlap the computation time of process 0.
Slide 25: Execution Time Expression (2)
The computation term is unchanged: n · ⌈n/p⌉ · n · χ. With overlap, the communication term keeps only what cannot be hidden: the per-message latency, n · ⌈log p⌉ · λ, plus the message-transmission time 4n/β for the messages that are not overlapped by computation.
Slide 26: Predicted vs. Actual Performance (χ = 25.5 nsec, λ = 250 μsec, β = 10 MB/sec, n = 1000)

Processes | Predicted (sec) | Actual (sec)
        1 |           25.54 |
        2 |           13.02 |        13.89
        3 |            9.01 |         9.60
        4 |            6.89 |         7.29
        5 |            5.86 |         5.99
        6 |            5.01 |         5.16
        7 |            4.40 |         4.50
        8 |            3.94 |         3.98
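The predicted column can be approximated from the slide's parameters. The exact expression on the slide did not survive extraction, so the formula below is a reconstruction consistent with the term labels of Expression (2): the computation term plus the unhidden broadcast latency; the transmission term is assumed overlapped and omitted. Variable names are my own.

```python
from math import ceil, log2

# parameters from the slide
chi  = 25.5e-9   # cell update time (s)
lam  = 250e-6    # message latency (s)
n    = 1000      # matrix order

def predicted(p):
    """Reconstructed model for parallel Floyd's algorithm:
    n * ceil(n/p) * n * chi computation, plus n broadcasts paying
    only ceil(log2 p) message latencies each (transmission time is
    assumed hidden by computation/communication overlap)."""
    comp = n * ceil(n / p) * n * chi
    comm = n * ceil(log2(p)) * lam
    return comp + comm

# reproduces the slide's predicted column to within a few hundredths
table = {1: 25.54, 2: 13.02, 3: 9.01, 4: 6.89,
         5: 5.86, 6: 5.01, 7: 4.40, 8: 3.94}
assert all(abs(predicted(p) - t) < 0.05 for p, t in table.items())
```

The model tracks the measured times closely for larger p, where communication overlap matters most; the small residual gap at low p suggests the slide's χ is itself a rounded measured value.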
Slide 27: Summary
Two matrix decompositions
–Rowwise block striped
–Columnwise block striped
Blocking send/receive functions
–MPI_Send
–MPI_Recv
Overlapping communication with computation