Thinking in Parallel – Domain Decomposition
New Mexico Supercomputing Challenge in partnership with Intel Corp. and NM EPSCoR
Copyrights and Acknowledgments Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. New Mexico EPSCoR Program is funded in part by the National Science Foundation award # and the State of New Mexico. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. For questions about the Supercomputing Challenge, a 501(c)3 organization, contact us at: challenge.nm.org
Agenda
Definition review of domain decomposition
Steps involved in domain decomposition
Conceptual example of domain decomposition
Hands-on domain decomposition parallel activity
Methodology
Domain decomposition – Used when the same operation is performed on a large number of similar data items.
Task decomposition – Used when some different operations can be performed at the same time.
Pipelining – Used when there are sequential operations on a large amount of data.
Domain Decomposition
Decide how data items should be divided into groups.
Assign each group to a processor.
Determine how each processor will report results from its data group, and how those results will be combined.
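The first step above, dividing the data items into groups, can be sketched in Python. This is only an illustration: the function name split_into_groups and the choice of contiguous, nearly equal groups are assumptions made here, not something the slides prescribe.

```python
def split_into_groups(items, num_groups):
    """Divide items into num_groups contiguous groups of nearly equal size.

    Hypothetical helper for illustration; other ways of dividing the
    data (for example, dealing items out one at a time) work as well.
    """
    base, extra = divmod(len(items), num_groups)
    groups = []
    start = 0
    for i in range(num_groups):
        # The first `extra` groups take one additional item each.
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


# Ten data items divided among four processors.
groups = split_into_groups(list(range(10)), 4)
print(groups)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Each group would then be assigned to one processor, which works only on its own group.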
Domain Decomposition Example Problem: Find the largest element of an array
Domain Decomposition Example Divide the data into groups
Domain Decomposition Example Assign each group to a processor
Domain Decomposition Example Each processor will send answer to central
Domain Decomposition Example Scan 1st data element and record value
Domain Decomposition Example Scan 2nd data element and record if higher
Domain Decomposition Example Scan 3rd data element and record if higher
Domain Decomposition Example Scan 4th data element and record if higher
Domain Decomposition Example Scan 5th data element and record if higher
Domain Decomposition Example Scan 6th data element and record if higher
Domain Decomposition Example CPU sends highest value to central
Domain Decomposition Example CPU sends highest value to central which keeps highest
Domain Decomposition Example CPU sends highest value to central which keeps highest
Domain Decomposition Example CPU sends highest value to central which keeps highest
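The example above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the data values, the names scan_group and parallel_max, and the use of a thread pool to stand in for the processors are all choices made for this illustration. In standard CPython, threads show the structure of the decomposition but do not speed up CPU-bound work; a process pool would be one way to get real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: 24 elements, so 4 processors get 6 elements each.
DATA = [12, 7, 31, 5, 18, 22, 9, 44, 3, 27, 16, 8,
        35, 2, 19, 41, 6, 13, 29, 11, 38, 4, 25, 17]


def scan_group(group):
    """One processor: record the first value, then record any higher value."""
    highest = group[0]
    for value in group[1:]:
        if value > highest:
            highest = value
    return highest


def parallel_max(data, num_processors=4):
    """Find the largest element of a non-empty list by domain decomposition."""
    size = -(-len(data) // num_processors)  # ceiling division
    # Divide the data into groups.
    groups = [data[i:i + size] for i in range(0, len(data), size)]
    # Assign each group to a processor (here, a worker thread).
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        group_maxima = list(pool.map(scan_group, groups))
    # Central receives each processor's answer and keeps the highest.
    return scan_group(group_maxima)


print(parallel_max(DATA))  # 44
```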
Activity 1 – Addition with Race Conditions Teams will explore how to use domain decomposition to accomplish the job of adding a set of numbers together. Through this activity, team members will see what can happen when changes are made to shared memory in parallel without regard for possible conflicts. * See handout for detailed instructions
Activity 1 – Setup
1. Divide the numbered cards and 16 Shared Sum cards equally (or roughly so) among the 4 or 5 team members (processors). For example, with 4 processors, each processor will have 4 numbered cards and 4 Shared Sum cards. (The numbered cards should be face down in a pile.)
2. Give a pencil to each processor.
3. Write the number 0 on one of the Shared Sum cards, and place it in the middle of the table, visible to and within reach of all of the processors.
Activity 1 – Execution
1. After all of the processors are given the instructions, start the timer.
2. Each processor should perform these steps as quickly as possible:
   a. Read the value of the current Shared Sum in the middle of the table.
   b. Turn over the numbered card on the top of your pile.
   c. Add the value shown on the numbered card to the value read from the Shared Sum card in step a, and write the result on a new Shared Sum card.
   d. Place the new Shared Sum card on the top of the pile forming in the middle of the table. (Don’t worry if others have placed new Shared Sum cards on the pile while you were processing.)
   e. Repeat steps a - d until you are out of numbered cards.
3. Stop the timer.
Activity 1 – Debrief Is the Shared Sum (i.e. the value on the top card of the Shared Sum pile) equal to 46? If so, discuss reasons why. If not, discuss reasons why.
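One way the Shared Sum can come out wrong is sketched below in Python. The card values are hypothetical (chosen only so that they add up to 46), and the simulation assumes one particular interleaving: in every round, all processors read the Shared Sum before any of them writes a new card. Real runs will interleave differently, so the wrong answer will vary.

```python
# Hypothetical piles: 4 processors, 4 numbered cards each, totaling 46.
PILES = [
    [1, 2, 3, 4],
    [5, 1, 2, 3],
    [4, 5, 1, 2],
    [3, 4, 5, 1],
]


def racy_sum(piles):
    """Simulate Activity 1 for equal-sized piles with no protection."""
    shared_sum = 0
    # zip(*piles) yields one card per processor for each round.
    for round_cards in zip(*piles):
        # Step a: every processor reads the same Shared Sum.
        value_read = shared_sum
        for card in round_cards:
            # Steps c and d: each processor writes value_read + card on a
            # new card and puts it on top, covering the previous result.
            shared_sum = value_read + card
    return shared_sum


print(sum(sum(pile) for pile in PILES))  # 46, the correct total
print(racy_sum(PILES))                   # 13, most additions were lost
```

Only the last card placed in each round survives, so the other processors' additions are overwritten.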
Activity 2 – Addition with a Critical Section Teams will explore how to improve accuracy by protecting the shared memory, so that one processor’s changes don’t overwrite those of other processors.
Activity 2 – Setup
1. Divide the numbered cards equally (or roughly so) among the 4 or 5 team members (processors).
2. Write the number 0 on a Shared Sum card, and place it in the middle of the table, visible to and within reach of all of the processors.
3. Place the marker next to the Shared Sum card; this will be the only writing instrument used in this activity.
Activity 2 – Execution
1. After all of the processors are given the instructions, start the timer.
2. Each processor should perform these steps as quickly as possible:
   a. Pick up the marker. (Obviously, only one processor can do this at a time.)
   b. Read the current value of the Shared Sum in the middle of the table.
   c. Turn over the numbered card on the top of your pile.
   d. Add the value shown on the numbered card to the value read from the Shared Sum card in step b, and write the result on the Shared Sum card, crossing out the previous value.
   e. Return the marker to the center of the table.
   f. Repeat steps a - e until all processors are out of numbered cards.
3. Stop the timer.
Activity 2 – Debrief
1. Is the Shared Sum equal to 46?
   a. If so, discuss possible reasons for the improved accuracy.
   b. If not, discuss possible reasons.
2. Compare times required for processing in activities 1 and 2. Discuss reasons for any difference observed.
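The marker in Activity 2 behaves much like a lock around a critical section. The Python sketch below assumes the same hypothetical card values totaling 46; the names marker and locked_sum are invented for this illustration, and threading.Lock stands in for the single marker.

```python
import threading

# Hypothetical piles: 4 processors, 4 numbered cards each, totaling 46.
PILES = [
    [1, 2, 3, 4],
    [5, 1, 2, 3],
    [4, 5, 1, 2],
    [3, 4, 5, 1],
]


def locked_sum(piles):
    """Simulate Activity 2: only the marker holder may update the sum."""
    shared = {"sum": 0}
    marker = threading.Lock()  # only one processor can hold the marker

    def processor(pile):
        for card in pile:
            with marker:                           # pick up the marker
                value_read = shared["sum"]         # read the Shared Sum
                shared["sum"] = value_read + card  # write the new value
            # Leaving the with-block returns the marker to the table.

    threads = [threading.Thread(target=processor, args=(pile,))
               for pile in piles]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["sum"]


print(locked_sum(PILES))  # 46
```

The result is correct because no processor can read the Shared Sum while another is in the middle of updating it. The cost is that processors spend time waiting for the marker.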
Activity 3 – Addition with Reduction Teams will explore how to improve the efficiency of addition by using subtotals.
Activity 3 – Setup
1. Divide the numbered cards equally (or roughly so) among the 4 or 5 team members (processors).
2. Give each processor a Local Memory card, with the number 0 written on it.
3. Give each processor a pencil.
4. Designate one processor as the Master Processor; give a Shared Sum card with the number 0 to this processor.
Activity 3 – Execution
1. After all of the processors are given the instructions, start the timer.
2. Each processor should perform these steps as quickly as possible:
   a. Read the current value of the Local Memory card.
   b. Turn over the numbered card on the top of your pile.
   c. Add the value shown on the numbered card to the value read from the Local Memory card in step a, and write the new result on the Local Memory card, crossing out the previous value.
   d. Repeat steps a - c until out of numbered cards.
   e. Hand the Local Memory card to the Master Processor.
3. The Master Processor adds the values of the Local Memory cards to the Shared Sum card.
4. Stop the timer.
Activity 3 – Debrief
1. Is the Shared Sum equal to 46?
   a. If so, discuss possible reasons.
   b. If not, discuss possible reasons.
2. Compare times required for processing in activities 2 and 3. Discuss reasons for any difference observed.
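Activity 3 can be sketched in Python as a reduction: each processor builds a subtotal in its own local memory, and the Master Processor combines the subtotals. The card values are again hypothetical (chosen to total 46), the names local_total and reduced_sum are invented here, and worker threads stand in for the processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical piles: 4 processors, 4 numbered cards each, totaling 46.
PILES = [
    [1, 2, 3, 4],
    [5, 1, 2, 3],
    [4, 5, 1, 2],
    [3, 4, 5, 1],
]


def local_total(pile):
    """One processor: add its own cards onto its Local Memory card."""
    local_memory = 0
    for card in pile:
        local_memory += card  # no other processor touches this value
    return local_memory


def reduced_sum(piles):
    """Simulate Activity 3: local subtotals, then one combining step."""
    with ThreadPoolExecutor(max_workers=len(piles)) as pool:
        subtotals = list(pool.map(local_total, piles))
    # Master Processor adds each Local Memory value to the Shared Sum.
    shared_sum = 0
    for subtotal in subtotals:
        shared_sum += subtotal
    return shared_sum


print(reduced_sum(PILES))  # 46
```

Here the Shared Sum is updated once per processor rather than once per card, and only by the Master Processor, so no marker is needed and processors do not wait on each other while adding their own cards.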
Activities Review
1. Describe a race condition.
2. What does a critical section do?
3. What is reduction?