Published by Ferdinand Spencer. Modified over 9 years ago.
1
Working Group on Methodology for Optimizing Multilevel Parallelism
Fialho, Gimenez, Tallent, Welton, Morris, Malony, Montoya and Browne
2
Working Assumptions: “Optimal” Parallelism = Optimum Productivity
- Formulate the performance optimization problem as finding the “optimal” parallelism
- Best possible balance of the several modes of parallelism:
  - Intra-core
  - Intra-chip
  - Intra-node
  - Inter-node
- Multiple interacting factors, each with many options:
  - Intra-chip memory access
  - Intra-node memory access
  - Concurrency (threading, vectorization, acceleration)
  - Internode communication
  - Load balance
- Optimization must take the interactions between factors into account
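A toy sketch of what “finding the optimal parallelism” can mean in practice: enumerate the ways to split a node's cores between processes (ranks) and threads, then pick the split that minimizes a modeled runtime. The node size, the function names and the cost model `modeled_runtime` (including all of its coefficients) are hypothetical, chosen only to show that factors interact: more ranks raise communication cost while more threads raise threading overhead, so the best split depends on the node count.

```python
from itertools import product

CORES_PER_NODE = 32  # hypothetical node size


def modeled_runtime(nodes, ranks_per_node, threads_per_rank, work=1.0e3):
    """Hypothetical cost model; the coefficients are illustrative only."""
    total_workers = nodes * ranks_per_node * threads_per_rank
    compute = work / total_workers
    # Inter-node communication grows with the number of ranks exchanging data.
    comm = 0.05 * nodes * ranks_per_node if nodes > 1 else 0.0
    # Threading overhead (synchronization, NUMA effects) grows with threads per rank.
    threading = 0.02 * threads_per_rank
    return compute + comm + threading


def best_configuration(nodes):
    """Return the (ranks_per_node, threads_per_rank) pair with the lowest modeled runtime."""
    candidates = [
        (r, t)
        for r, t in product(range(1, CORES_PER_NODE + 1), repeat=2)
        if r * t == CORES_PER_NODE  # use every core exactly once
    ]
    return min(candidates, key=lambda rt: modeled_runtime(nodes, *rt))
```

Under this made-up model a single node prefers pure processes, `(32, 1)`, while four nodes prefer a hybrid split, `(2, 16)`: the optimum for one mode of parallelism shifts as another mode changes.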
3
Current Status of Tools
- Separate tools for optimizing each factor
- Separate tools for optimizing each mode of parallelism
- Several different tools are available for each factor or mode of parallelism
- Frameworks for integrating tools and/or creating “workflows” are available
- How do we determine appropriate and consistent workflows or framework instances from the tools?
4
Apply a Conceptual Process
1. Specify what is to be optimized
2. Specify the metrics needed to diagnose the bottleneck and recommend the optimization
3. Define the algorithms for diagnosing bottlenecks and recommending optimizations in terms of the metrics
4. Determine the information needed to evaluate those metrics
5. Specify how to obtain the information
Generate a methodology (workflow) from the conceptual process
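One possible way to make steps 2 to 4 concrete, sketched under assumptions: each diagnosis rule is written purely in terms of named metrics, and the information a tool must collect is derived from the rules rather than chosen up front. The rule set, metric names and thresholds below are hypothetical placeholders, not measurements or recommendations from any specific tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class DiagnosisRule:
    """Step 3: a bottleneck diagnosis expressed only in terms of metrics."""
    bottleneck: str
    required_metrics: List[str]       # steps 2/4: what must be measured
    test: Callable[[Metrics], bool]   # fires when the bottleneck is present
    recommendation: str


# Hypothetical rules and thresholds, for illustration only.
RULES = [
    DiagnosisRule("load imbalance", ["load_balance_efficiency"],
                  lambda m: m["load_balance_efficiency"] < 0.8,
                  "rebalance work or change affinity mapping"),
    DiagnosisRule("poor vectorization", ["vector_fraction"],
                  lambda m: m["vector_fraction"] < 0.5,
                  "restructure inner loops for vectorization"),
    DiagnosisRule("remote NUMA traffic", ["remote_access_ratio"],
                  lambda m: m["remote_access_ratio"] > 0.2,
                  "use first-touch allocation and pin threads"),
]


def diagnose(metrics: Metrics, rules=RULES):
    """Return (bottleneck, recommendation) for each rule whose metrics exist and whose test fires."""
    findings = []
    for rule in rules:
        if all(name in metrics for name in rule.required_metrics) and rule.test(metrics):
            findings.append((rule.bottleneck, rule.recommendation))
    return findings


def information_needed(rules=RULES):
    """Step 4: the union of metrics that the rules depend on."""
    return sorted({name for rule in rules for name in rule.required_metrics})
```

Rules whose metrics were not collected are skipped rather than guessed, which keeps step 5 (how to obtain the information) driven by `information_needed()`.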
5
Two Cases
- “Optimize” the parallelism of an application for a given execution environment and input data set with only “local” restructuring
  - Only “local” source code changes
  - No algorithm changes
- Restructure/re-engineer the application to attain “optimal” parallelism on (possible) execution environments
  - Componentize the code
  - Choose different algorithms
  - Evaluate different component parts and optimize across “components”
- The workflows are different for each case, but they certainly overlap
6
Optimization Information Requirements
- Need to incorporate multiple types of information
- “Optimize” with only “local” modification:
  - Source code
  - Execution environment
  - Runtime behavior
- Optimize with restructuring:
  - Domain
  - Algorithm
  - Source code/execution environment/runtime behavior
7
Conceptual Workflow
Local (Inside-Out) Optimization Workflow
Assumptions: application structure, execution environment and initial conditions/inputs are fixed
1. Ensure load balance and choose optimal affinity mappings, etc.
2. Maximize intra-node efficiency
   1. Intra-core: maximize vectorization and core-local memory access
   2. Intra-chip: optimize chip-local memory access
   3. Intra-node: minimize NUMA accesses
   4. Intra-node: choose the optimal number of tasks/threads
3. Minimize internode communication cost
4. If nodes are at the “roofline” for computation or memory bandwidth, then optimize internode communication
5. If nodes are bottlenecked on neither computation nor memory bandwidth, then reallocate data to minimize the number of nodes used
6. Go to step 2 and repeat
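A minimal sketch of the decision points in this workflow, assuming per-node measurements of achieved FLOP/s and memory traffic plus known hardware peaks. It uses the standard roofline bound (attainable performance = min(peak compute, arithmetic intensity × peak bandwidth)) for steps 4 and 5, and mean/max busy time as the load-balance check for step 1. The class and function names, the 0.8 and 0.9 cutoffs, and the fallback for mixed cases are all hypothetical choices, not part of the original workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeProfile:
    flops_per_s: float        # measured compute rate
    bytes_per_s: float        # measured memory traffic
    peak_flops_per_s: float   # hardware compute peak
    peak_bytes_per_s: float   # hardware memory-bandwidth peak


def load_balance_efficiency(busy_times: List[float]) -> float:
    """Step 1 check: mean/max of per-task busy time (1.0 means perfectly balanced)."""
    return (sum(busy_times) / len(busy_times)) / max(busy_times)


def roofline_bound(p: NodeProfile) -> float:
    """Attainable FLOP/s = min(peak compute, arithmetic intensity * peak bandwidth)."""
    intensity = p.flops_per_s / p.bytes_per_s  # FLOPs per byte moved
    return min(p.peak_flops_per_s, intensity * p.peak_bytes_per_s)


def at_roofline(p: NodeProfile, threshold: float = 0.8) -> bool:
    """True if the node reaches `threshold` of its roofline bound (hypothetical cutoff)."""
    return p.flops_per_s >= threshold * roofline_bound(p)


def next_action(nodes: List[NodeProfile], busy_times: List[float]) -> str:
    if load_balance_efficiency(busy_times) < 0.9:  # hypothetical cutoff
        return "step 1: rebalance load / adjust affinity"
    if all(at_roofline(n) for n in nodes):
        return "step 4: optimize internode communication"
    if not any(at_roofline(n) for n in nodes):
        return "step 5: reallocate data onto fewer nodes"
    # Mixed case (assumption): keep improving intra-node efficiency on lagging nodes.
    return "step 2: improve intra-node efficiency"
```

For example, a node with peaks of 100 GFLOP/s and 50 GB/s that achieves 90 GFLOP/s while moving 10 GB/s has intensity 9 FLOP/byte, a bound of 100 GFLOP/s, and therefore counts as being at the roofline.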
8
Questions for Further Discussion
- What is the model for restructuring applications to attain “optimal” parallelism?
- Can we construct “roofline” analytical models for factors such as vectorization, threading and communication?
- How can we combine software restructuring tools with performance optimization tools to get an “optimal” restructuring workflow?
- What are the roles for offline and online optimization?