Department of Computer Science, Johns Hopkins University Lecture 7 Finding Concurrency EN 600.320/420 Instructor: Randal Burns 26 February 2014.

Lecture 7: Finding Concurrency Big Picture Recipes (patterns) for turning a problem into a parallel program in four steps – Find concurrency – Choose algorithmic structure – Identify data structures – Implement Based on an analysis of problem domain – And comparison of the effectiveness of different patterns

Lecture 7: Finding Concurrency First Questions Should I solve this problem with a parallel program? – Implementation time versus solution time – Does the problem parallelize? Identify computationally expensive parts – Parallelism is for a reason: to improve performance – Only optimize the expensive parts
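
One way to make "only optimize the expensive parts" concrete is Amdahl's law, which the slides do not name but which bounds the speedup when only a fraction of the run time is parallelized. The sketch below is illustrative only; the function name `amdahl_speedup` and its parameters are assumptions, not part of the lecture.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when `parallel_fraction` of the run time
    is spread perfectly over `workers` and the rest stays sequential.
    (Hypothetical helper; not from the lecture.)"""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / workers)
```

For example, `amdahl_speedup(0.5, 2)` is about 1.33: parallelizing half of the run time on two workers cannot double performance, which is why the expensive parts are the ones worth targeting.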

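Lecture 7: Finding Concurrency Three Steps to Finding Concurrency – Decomposition (task, data) – Dependency analysis (group tasks, order tasks, data sharing) – Design evaluation Consider all options at each stage And iterate among steps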

Lecture 7: Finding Concurrency What’s a good decomposition? Flexible – Independent of architecture, programming language, problem size, etc. Efficient – Scalable: generates enough tasks to keep PEs busy – Each task represents enough work to amortize startup – Tasks are minimally dependent (often competes w/ scalable) Simple – Can be implemented, debugged, maintained, and is reusable

Lecture 7: Finding Concurrency Task Decomposition Divide problem into groups of operations/instructions – Most natural/intuitive approach Identify tasks – Tasks should be independent or nearly independent – Recall, tasks are groups of operations/instructions – Consider all tasks sequentially Idioms for identifying tasks – Loops – Functions (with no side effects) = functional decomposition – Higher-level concepts (not software-derived), e.g. trajectories in medical imaging
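
A minimal sketch of the loop and side-effect-free function idioms, assuming each loop iteration becomes one task. The names `score` and `run_tasks` are hypothetical. `ThreadPoolExecutor` is used only to keep the example portable; for CPU-bound work in CPython a process pool would typically be substituted.

```python
from concurrent.futures import ThreadPoolExecutor


def score(trajectory):
    """Pure function: the result depends only on the argument,
    so calls on different trajectories are independent tasks."""
    return sum(x * x for x in trajectory)


def run_tasks(trajectories, workers=4):
    """Run one task per loop iteration; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, trajectories))
```

Because `score` touches no shared state, the tasks need no ordering constraints or locks, which is the property the "independent or nearly independent" bullet asks for.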

Lecture 7: Finding Concurrency Data Decomposition Divide problem based on partitioning, distributing or replicating the data Works well when: – Compute intensive work manipulates a large data structure – Similar operations are applied to different parts of the data (SPMD) Idioms for data decomposition – Sequential (arrays) or spatial division – Recursive division of data – Clusters
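
The sequential (array) division idiom might look like the following sketch: the data are split into contiguous chunks and the same operation runs on each chunk. The helper names `partition` and `parallel_sum_of_squares` are assumptions made for illustration, and threads stand in for whatever execution units the target platform provides.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split `data` into `parts` contiguous chunks whose sizes
    differ by at most one element."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The single operation applied to every chunk (SPMD style)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, parts=4):
    """Apply the same operation to each chunk, then combine the partial results."""
    chunks = partition(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

Note that the data decomposition also fixes the tasks: there is exactly one task per chunk.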

Lecture 7: Finding Concurrency Tasks or Data or Both Decompositions are not independent – Task decomposition derives a data decomposition – Data decomposition implies a task decomposition Iteration leads to hybrid designs – Not purely either – For embarrassingly parallel problems, task and data decomposition are identical. Why?

Lecture 7: Finding Concurrency Example Problem Streaming surface reconstruction Tasks = solve Poisson equation in each cell

Lecture 7: Finding Concurrency Example Problem: Tasks Multiple streaming passes – Could decompose by pass – Limited parallelism and sequential data dependencies

Lecture 7: Finding Concurrency Example Problem: Tasks II Iterations of the solver – Same problems

Lecture 7: Finding Concurrency Example Problem: Data Quad tree

Lecture 7: Finding Concurrency Example Problem: Decomposition Data decomposition; hierarchy of streams – Replicate highest level streams – Partition lower level streams

Lecture 7: Finding Concurrency Example Problem: Comments End up with tasks defined by data – Update solution in each partition Multiple parallel programs – For each phase in the processing pipeline

Lecture 7: Finding Concurrency Dependency Analysis Help! I decomposed my problem and the tasks are not independent. How do the decomposed data depend on each other? – Data are used by multiple tasks – Replication (overlap) – Read/write dependencies How do tasks depend on each other? – Shared data – Sequential/ordering constraints

Lecture 7: Finding Concurrency Group Tasks For complex problems, not all tasks are the same Natural groups – Share a temporal constraint (satisfy constraint once) – Share a data dependency (meet dependency once) – Independent tasks (non-intuitive, but this is a group that shares no dependencies and allows for maximum concurrency) Grouping results in: – Simplified dependency analysis – Identification of concurrency

Lecture 7: Finding Concurrency Order Tasks Find and account for dependencies – Temporal – Concurrent – Independence Build an execution graph Design principles – Must meet all constraints (correct) – Minimally (to not interfere with concurrency) Example: merge sort (recursive parallelism)
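
An execution graph can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The function name `execution_waves` and the task names in the usage note are hypothetical; the point is that only the stated dependencies constrain the order, and everything ready at the same time may run concurrently.

```python
from graphlib import TopologicalSorter


def execution_waves(deps):
    """`deps` maps each task to the set of tasks it must wait for.
    Returns a list of waves; tasks within one wave have no ordering
    constraint among them and could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if constraints are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a four-way merge sort, `execution_waves({"merge_all": {"merge_left", "merge_right"}, "merge_left": {"sort_a", "sort_b"}, "merge_right": {"sort_c", "sort_d"}})` yields three waves: the four sorts, then the two partial merges, then the final merge.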

Lecture 7: Finding Concurrency Merge Sort http://www.toves.org/books/distalg/#4.2 http://blogs.msdn.com/b/pfxteam/archive/2011/06/07/10171827.aspx Dependencies (L) and parallelism (R)
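
A minimal sketch of recursive parallelism in merge sort, not taken from either linked page: the two halves are independent tasks, and the merge is the only ordering constraint. The `depth` parameter is an assumption added here to cap how many threads are spawned. In CPython, threads illustrate the structure rather than deliver a speedup for this CPU-bound work.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(items, depth=2):
    """Sort `items`; spawn a thread per split for the first `depth` levels."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth <= 0:
        # Below the cutoff, recurse sequentially.
        return merge(merge_sort(items[:mid], 0), merge_sort(items[mid:], 0))
    result = {}

    def sort_left():
        result["left"] = merge_sort(items[:mid], depth - 1)

    worker = threading.Thread(target=sort_left)
    worker.start()                                 # left half runs concurrently
    right = merge_sort(items[mid:], depth - 1)     # right half in this thread
    worker.join()                                  # merge depends on both halves
    return merge(result["left"], right)
```

The `join()` before `merge` is the minimal constraint needed for correctness; nothing else is serialized.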

Lecture 7: Finding Concurrency Data Sharing What group and order are to task decomposition, data sharing is to data decomposition Several types of shared data – Local data partitioned to tasks (no dependencies) – Local data transferred from task to task (associated with dependencies) – Global read-only data (can be replicated, no dependencies) – Global shared data structure Map data sharing dependencies onto group/order Principles: – Minimize data sharing associated with dependencies – Think about sharing frequency and granularity

Lecture 7: Finding Concurrency Example Problem Dependency analysis: all neighbors in the tree at all levels in the hierarchy! – Replicate or share the root partition Group tasks – By data partitions – Into pipeline phases Order tasks: in sweep order? top to bottom? – We relaxed these constraints – But, ordered by iteration

Lecture 7: Finding Concurrency Example Problem Data sharing – Neighbors across each partition boundary (read/write) – Replicated root partition (read/write) – Replicated quad-tree structure (read-only) Separate replicated data by access type
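
Sharing across a partition boundary can be illustrated with a much simpler stand-in than the quad-tree solver: one Jacobi averaging sweep over a 1D grid, where each partition reads a copy of its neighbors' edge cells (a halo). The function name, the 1D grid, and the boundary handling are all assumptions made for this sketch. The loop over partitions is written sequentially to keep the data flow visible.

```python
def jacobi_step_partitioned(values, parts):
    """One Jacobi averaging sweep over a 1D grid split into `parts`
    contiguous partitions. Assumes `values` is a non-empty list."""
    n = len(values)
    bounds = [(k * n // parts, (k + 1) * n // parts) for k in range(parts)]
    new = list(values)
    for lo, hi in bounds:  # each partition could be updated by its own task
        # Halo: read-only copies of the cells just outside this partition.
        # At the ends of the grid, the edge cell's own value is reused.
        left_halo = values[lo - 1] if lo > 0 else values[0]
        right_halo = values[hi] if hi < n else values[n - 1]
        local = [left_halo] + values[lo:hi] + [right_halo]
        for i in range(1, len(local) - 1):
            # Write only inside the partition; read only old values.
            new[lo + i - 1] = (local[i - 1] + local[i + 1]) / 2
    return new
```

Because every task reads the old values and writes only its own cells, the result does not depend on the number of partitions; the only shared data are the halo cells, exchanged once per sweep.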

Lecture 7: Finding Concurrency Design Evaluation Revisiting flexibility, efficiency and simplicity Suitability for the target platform? – #PEs available and #UEs produced by problem design – How are data structures shared? Is the granularity suitable for the target architecture? (cache alignment, #messages, message size) How much concurrency did the design produce? – Compare useful work to interference/synchronization Can the design be parameterized? – To different problem sizes – To different numbers of processors Is the design architecture independent? – Rarely does one answer yes

Lecture 7: Finding Concurrency Design Evaluation II More specific questions: – Do the task decomposition and ordering allow for load balancing? Or are there too many temporal constraints? – Are the tasks regular? Or heterogeneous in size? – Can the tasks run asynchronously? (How many barriers?) – Does the decomposition allow for overlapped computation with communication and I/O? And then redo the whole thing
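
The load-balancing question can be explored with a small simulation, assuming task costs are known in advance (rarely true in practice). Both function names are hypothetical. The first assigns contiguous blocks of tasks to workers up front; the second hands each task to whichever worker becomes free first.

```python
import heapq


def static_makespan(task_costs, workers):
    """Finish time when worker k gets a fixed contiguous block of tasks."""
    n = len(task_costs)
    return max(
        sum(task_costs[k * n // workers:(k + 1) * n // workers])
        for k in range(workers)
    )


def dynamic_makespan(task_costs, workers):
    """Finish time when each task, in order, goes to the worker
    that becomes free earliest."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)
```

With heterogeneous costs such as `[8, 1, 1, 1, 1, 1, 1, 2]` on two workers, the static split finishes at time 11 while the dynamic assignment finishes at time 8, the ideal for 16 units of work.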
