
1 Performance Evaluation of the Parallel Fast Multipole Algorithm Using the Optimal Effectiveness Metric Ioana Banicescu and Mark Bilderback Department of Computer Science NSF/ERC for Computational Field Simulation Mississippi State University

2 Overview
Scientific Applications. Performance Evaluation. Scalability Analysis. Optimal Effectiveness Metric. Parallel Fast Multipole Algorithm. Experimental Results. Conclusions and Future Work.

3 Scientific Applications
Large, computationally intensive, irregular. Parallel implementation (various algorithms). Performance degradation factors: communication and load imbalance (architecture independent, architecture dependent).

4 Architecture Independent Factors
Problem characteristics: nonuniformity of input data. Algorithmic: serial section, communication patterns, local / non-local dependencies.

5 Architecture Dependent Factors
Architectural characteristics. Language, OS. Interconnection network. Characteristics of each component: processor speed, memory, etc.

6 Performance Evaluation
Parallel applications. Scalability: algorithm, architecture, mapping. Evaluation: isolated to particular applications; different types of performance metrics. Performance metric characteristics: relevant, consistent, quantitative, predictive.

7 Performance Metrics
Commonly used (time, speedup, efficiency, cost). Speedup [Amdahl ‘67]. Scaled Speedup [Gustafson ‘88]. Fixed time size-up [Sun and Gustafson ‘91]. Isoefficiency [Gupta & Kumar ‘93]. Optimal effectiveness [Luke, Banicescu, Li ‘98].
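
The commonly used metrics listed on this slide can be sketched in a few lines. This is a minimal illustration using the textbook definitions, not code from the presentation; the function names and the `serial_fraction` parameter are assumptions made for the example.

```python
def parallel_metrics(t_serial, t_parallel, p):
    """Classic metrics for one run on p processors (textbook definitions)."""
    speedup = t_serial / t_parallel      # S = T_1 / T_p
    efficiency = speedup / p             # E = S / p
    cost = p * t_parallel                # C = p * T_p (processor-seconds)
    return {"speedup": speedup, "efficiency": efficiency, "cost": cost}


def amdahl_speedup(serial_fraction, p):
    """Fixed-size speedup bound; serial_fraction is the serial share of T_1."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_speedup(serial_fraction, p):
    """Scaled speedup; serial_fraction is the serial share of the parallel run."""
    return p - serial_fraction * (p - 1)
```

With a serial fraction of 0.1 on 10 processors, the fixed-size bound is about 5.26 while the scaled speedup is 9.1, which shows how strongly the choice of metric shapes the conclusion.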

8 Isoefficiency
Algorithms that can add processors at a faster rate are able to achieve higher performance. Isoefficiency does not identify the number of processors required before an algorithm becomes an effective option. It discounts valuable parallel algorithms for which an isoefficiency function does not exist.
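
For reference, the usual textbook formulation of isoefficiency can be written as follows. The slide itself gives no formula, so the symbols here are assumptions: $W$ is the problem size measured as serial work, $T_o(W, P)$ is the total parallel overhead on $P$ processors, and $E$ is the efficiency to be held constant.

```latex
E = \frac{1}{1 + T_o(W, P) / W}
\quad\Longrightarrow\quad
W = \frac{E}{1 - E}\, T_o(W, P) = K\, T_o(W, P)
```

Solving this relation for $W$ as a function of $P$ gives the isoefficiency function. When no growth rate of $W$ can keep $E$ constant, the function does not exist, which is the case the slide refers to.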

9 Performance - Cost Tradeoffs
High performance applications seek a performance-cost balance. Scalability analysis: theoretical, experimental. Optimal effectiveness [Luke, Banicescu, Li ‘98]: similar to (E*S)max [Tang, Li ‘90]; asymptotic relationship between isoefficiency and (E*S)max.

10 Optimal Effectiveness
Cost Effectiveness: Optimal Effectiveness:
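
The formulas on this slide did not survive in the transcript. The following is a hedged reconstruction, assuming that effectiveness is defined as speedup per unit cost, which is consistent with the earlier remark that the metric is similar to (E*S)max. Here $T_1$ is the serial time, $T_P$ the time on $P$ processors, $S$ the speedup, $E$ the efficiency, and $C$ the cost.

```latex
S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}, \qquad C(P) = P\, T_P

\Gamma(P) = \frac{S(P)}{C(P)} = \frac{E(P)}{T_P} = \frac{E(P)\, S(P)}{T_1}

\Gamma_{opt} = \max_{P} \Gamma(P), \qquad P_{opt} = \arg\max_{P} \Gamma(P)
```

Under this assumption, $\Gamma(P)$ differs from $E \cdot S$ only by the constant factor $1/T_1$, so both are maximized at the same number of processors.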

11 Optimal Effectiveness (contd.)
Compare the performance of different parallel algorithms. Identify specific conditions of problem size and number of processors that characterize crossover points and intervals where one algorithm becomes more cost effective than another. Prescribe the number of processors relevant to a particular problem size: Popt.
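
A minimal sketch of how Popt could be read off a set of measured timings, assuming cost effectiveness is speedup divided by cost (P times the parallel time). The function names are hypothetical and the timings in `example_timings` are made up for illustration; they are not results from the presentation.

```python
def cost_effectiveness(t_serial, t_parallel, p):
    """Assumed definition: Gamma(P) = speedup / cost = (T_1 / T_P) / (P * T_P)."""
    speedup = t_serial / t_parallel
    return speedup / (p * t_parallel)


def optimal_effectiveness(t_serial, timings):
    """timings maps processor count P to measured time T_P.

    Returns (P_opt, Gamma_opt): the processor count that maximizes
    cost effectiveness and the value reached there.
    """
    gamma = {p: cost_effectiveness(t_serial, t, p) for p, t in timings.items()}
    p_opt = max(gamma, key=gamma.get)
    return p_opt, gamma[p_opt]


# Made-up timings in seconds, for illustration only.
example_timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0, 16: 11.0, 32: 9.0}

if __name__ == "__main__":
    p_opt, gamma_opt = optimal_effectiveness(100.0, example_timings)
    print(p_opt, gamma_opt)
```

In this made-up data set the run time keeps falling up to 32 processors, but the effectiveness peaks at 16: beyond that point the extra processors cost more than the speedup they buy.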

12 The N-body Problem
(Figure label: Resulting force.) Problem: simulate the evolution of N particles over time (given initial positions and velocities). Compute new positions and velocities of the N particles after one time step. Applications: astrophysics, molecular dynamics. Naive algorithm: O(N^2).
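
The naive O(N^2) approach can be sketched as a direct sum over all particle pairs. This is an illustrative sketch, not the presentation's code: it assumes a gravitational-style inverse-square force with unit constant `g`, an optional `softening` term, and a simple Euler-type update, since the slides do not specify an integrator.

```python
def direct_forces(positions, masses, g=1.0, softening=0.0):
    """Pairwise inverse-square forces in 3D; visits every pair once, so O(N^2)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening
            scale = g * masses[i] * masses[j] / (r2 * r2 ** 0.5)
            for k in range(3):
                forces[i][k] += scale * d[k]   # pull on i toward j
                forces[j][k] -= scale * d[k]   # equal and opposite on j
    return forces


def euler_step(positions, velocities, masses, dt):
    """One time step: update velocities from forces, then positions (assumed scheme)."""
    forces = direct_forces(positions, masses)
    new_v = [[v[k] + dt * f[k] / m for k in range(3)]
             for v, f, m in zip(velocities, forces, masses)]
    new_x = [[x[k] + dt * v[k] for k in range(3)]
             for x, v in zip(positions, new_v)]
    return new_x, new_v
```

The double loop is what the approximation algorithms on the next slide are designed to avoid.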

13 Approximation Algorithms
O(N) [Appel85]. O(N log N) [Barnes-Hut86]. O(N) Fast Multipole Algorithm (FMA) [Greengard87a]: particle interaction approximation within a specified accuracy (Zhao, Board, Pringle,..). O(N) Adaptive Fast Multipole Algorithm (AFMA) [Greengard87b] (Singh et al., Nyland et al., etc.).

14 The Greengard Algorithm
Two traversals: upward, downward. 2D: quad-tree. 3D: oct-tree.
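
The tree structure behind both traversals can be sketched with integer cell coordinates. This is one possible representation, assumed here for illustration: the domain is the unit box, a cell at a given level is a tuple of integers, and the same code covers the quad-tree (2 coordinates) and the oct-tree (3 coordinates). All function names are hypothetical.

```python
from itertools import product


def cell_of(point, level):
    """Cell containing `point` in the unit box, which has 2**level cells per axis."""
    n = 1 << level
    return tuple(min(int(c * n), n - 1) for c in point)


def parent_of(cell):
    """Cell one level up (used when traversing upward)."""
    return tuple(c // 2 for c in cell)


def children_of(cell):
    """Cells one level down: 4 in a quad-tree, 8 in an oct-tree."""
    return [tuple(2 * c + o for c, o in zip(cell, offsets))
            for offsets in product((0, 1), repeat=len(cell))]
```

For example, `cell_of((0.6, 0.1, 0.9), 2)` gives `(2, 0, 3)`, whose parent is `(1, 0, 1)`.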

15 Traversing the Tree Upwards
Multipole expansion: computing combined field effects of particles in regions. (Figure labels: evaluation point, well-separated, equivalent particle, group of particles.)
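
The idea of replacing a well-separated group by an equivalent particle can be sketched with only the lowest-order term of the expansion (total mass placed at the centre of mass). The actual FMA keeps higher-order terms to meet a specified accuracy, so this is a simplified illustration. The function names, the 1/r kernel with unit constants, and the adjacency test on integer cell coordinates are assumptions for the example.

```python
import math


def potential(point, positions, masses):
    """Exact 1/r potential at `point` from every particle (unit constants)."""
    return sum(m / math.dist(point, x) for x, m in zip(positions, masses))


def equivalent_particle(positions, masses):
    """Centre of mass and total mass of a group: the lowest-order expansion term."""
    total = sum(masses)
    dims = len(positions[0])
    centre = tuple(sum(m * x[k] for x, m in zip(positions, masses)) / total
                   for k in range(dims))
    return centre, total


def merge_equivalents(equivalents):
    """Upward step: combine child-cell equivalents into the parent's equivalent."""
    centres = [c for c, _ in equivalents]
    totals = [m for _, m in equivalents]
    return equivalent_particle(centres, totals)


def well_separated(cell_a, cell_b):
    """Same-level cells (integer coordinates) count as well separated if not adjacent."""
    return max(abs(a - b) for a, b in zip(cell_a, cell_b)) > 1
```

For a far evaluation point the equivalent particle reproduces the group's potential closely, and the approximation degrades as the point approaches the group, which is why it is applied only to well-separated regions.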

16 Traversing the Tree Downwards
(Figure labels: higher level, lower level.)

17 Implementation
3D-PFMA, LB [Duke], Fractiling. KSR-1, IBM-SP2, SuperMSPARC. Pthreads, MPI. Uniform, nonuniform (Gaussian, Corner). processors, 1k - 100k particles.

18 3-d Cost: nonuniform (corner) (KSR1)
Plots: lightly packed (50K6), densely packed (50K5). Axes: cost in seconds vs. number of processors. LB better for 4-16 processors.

19 3-d Cost (IBM-SP2)

20 3-d Cost (SuperMSPARC)

21-27 Optimal Effectiveness (KSR-1)

28-31 Optimal Effectiveness (IBM-SP2)

32-41 Optimal Effectiveness (SuperMSPARC)

42 Cost vs. Cost Effectiveness
10k nonuniform corner: Fractiling cost < LB cost < PFMA cost (regardless of number of processors). The IDEAL number of processors to use for a cost effective execution is unknown. Allocate only Popt processors and leave the rest for other simultaneously executing applications.

43 Cost

44 Optimal Effectiveness

45 Conclusions
Cost effectiveness analysis: novel approach. Qualitative and quantitative characteristics. Optimal effectiveness derived from cost effectiveness curves. Measurement of Γopt gives the exact number of processors relevant to a particular problem size.

46 Conclusions (contd.)
Cost effectiveness / Optimal effectiveness: Quantifies specific conditions that make a particular algorithm optimal. Capability to compare any set of algorithms regardless of the existence of the isoefficiency. Γopt shows the point at which using one of the algorithms is more advantageous than using another.

47 Conclusions (contd.)
Cost effectiveness / Optimal effectiveness: Allows intelligent allocation of available processors to other applications. Improved throughput for the entire system. Captures the impact and tradeoff in complexity of the conditions that dictate performance.

