CS 8625 High Performance and Parallel, Dr. Hoganson
Copyright © 2005, 2006 Dr. Ken Hoganson

CS8625-June
Homework & Midterm Review
CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson
Balance Point
The basis for the argument against "putting all your (speedup) eggs in one basket" is Amdahl's Law:
$S(N) = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$
where $\alpha$ is the parallel fraction of the workload and $N$ is the number of processors.
Note the balance point in the denominator, where the serial term $(1-\alpha)$ and the parallel term $\frac{\alpha}{N}$ are equal. Increasing N beyond this point can at best halve the denominator (the parallel term shrinks toward zero, but the serial term remains), and so can at best double the speedup.
Balance Point Heuristic
Increasing N (number of processors) beyond the balance point can at best halve the denominator, and double the speedup.
Setting $(1-\alpha) = \frac{\alpha}{N}$:
Solved for N: $N = \frac{\alpha}{1-\alpha}$
Solved for α: $\alpha = \frac{N}{N+1}$
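A minimal Python sketch of the balance-point heuristic, assuming α is given as a fraction between 0 and 1. The function names (`balance_processors`, `balance_alpha`, `amdahl_speedup`) are illustrative only and are not part of any course tool.

```python
def amdahl_speedup(alpha: float, n: float) -> float:
    """Amdahl's Law: 1 / ((1 - alpha) + alpha / n)."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def balance_processors(alpha: float) -> float:
    """N at which the parallel term alpha/N equals the serial term (1 - alpha)."""
    return alpha / (1.0 - alpha)


def balance_alpha(n: float) -> float:
    """Parallel fraction alpha that balances the denominator for n processors."""
    return n / (n + 1.0)


if __name__ == "__main__":
    alpha = 0.90
    n = balance_processors(alpha)
    # Compare speedup at the balance point with the limit as N grows without bound.
    print(f"balance N = {n:.2f}")
    print(f"speedup at balance = {amdahl_speedup(alpha, n):.2f}")
    print(f"limit as N -> infinity = {1.0 / (1.0 - alpha):.2f}")
```

The limit printed last is exactly twice the balance-point speedup, which is the "at best double" argument in numeric form.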
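Balance Point Example
Parallel fraction α = 90% (10% serial).
Solved for N: $N = \frac{\alpha}{1-\alpha} = \frac{0.90}{0.10} = 9$, so the speedup at the balance point is $\frac{1}{0.10 + 0.90/9} = \frac{1}{0.20} = 5$.
N        α/N     1−α     Speedup
9        0.10    0.10    5
∞        0       0.10    10
Far beyond the balance point the returns diminish: a much larger finite N still reaches a speedup of only 8.77, and even N → ∞ yields just $\frac{1}{1-\alpha} = 10$, double the balance-point speedup.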
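A minimal Python sketch that computes the columns of this table (N, α/N, 1−α, Speedup) from Amdahl's Law for α = 0.90. The list of N values is an assumption chosen for illustration (powers of two plus the balance point N = 9); with it, N = 64 gives a speedup of about 8.77, consistent with the 8.77 figure on this slide.

```python
import math


def amdahl_speedup(alpha: float, n: float) -> float:
    # Amdahl's Law; n = math.inf gives the limiting speedup 1 / (1 - alpha)
    return 1.0 / ((1.0 - alpha) + alpha / n)


def speedup_table(alpha: float, ns):
    """Rows of (N, alpha/N, 1 - alpha, speedup) for each processor count in ns."""
    return [(n, alpha / n, 1.0 - alpha, amdahl_speedup(alpha, n)) for n in ns]


if __name__ == "__main__":
    # Hypothetical choice of N values for illustration.
    ns = [1, 2, 4, 8, 9, 16, 32, 64, math.inf]
    print(f"{'N':>5} {'alpha/N':>8} {'1-alpha':>8} {'Speedup':>8}")
    for n, par, ser, s in speedup_table(0.90, ns):
        print(f"{n:>5} {par:>8.4f} {ser:>8.2f} {s:>8.2f}")
```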
Example
Example: A workload has an average α of 94%. How many processors can reasonably be applied to speed up this workload?
Solved for N: $N = \frac{\alpha}{1-\alpha}$
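Applying the balance-point heuristic directly (a worked step, treating the heuristic as a rule of thumb rather than an exact limit):

```latex
N = \frac{\alpha}{1-\alpha} = \frac{0.94}{0.06} \approx 15.7,
\qquad
S \approx \frac{1}{0.06 + 0.06} \approx 8.3
```

So on the order of 16 processors can reasonably be applied; beyond that, each added processor can at best close the remaining gap to the limit of 1/0.06 ≈ 16.7.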
Example
Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
Solved for α: $\alpha = \frac{N}{N+1}$
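Applying the heuristic solved for α (a worked step under the same balance-point assumption):

```latex
\alpha = \frac{N}{N+1} = \frac{32}{33} \approx 0.970,
\qquad
S = \frac{1}{\frac{1}{33} + \frac{32/33}{32}} = \frac{33}{2} = 16.5
```

That is, the workload should be roughly 97% parallel; at that fraction the 32 processors deliver about half of the limiting speedup of 33.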
Multi-Bus Multiprocessors
Shared-memory multiprocessors are very fast:
–Low latency to memory on the bus
–Low communication overhead through shared memory
Scalability problems:
–Length of the bus slows signals (propagation at roughly 0.75 of the speed of light)
–Contention for the bus reduces performance
–Requires cache to reduce contention
[Figure: CPU and MEM modules attached to a shared bus]
Bus Contention
Multiple devices (processors, etc.) compete for access to a bus. Only one device can use a bus at a time, limiting performance and scalability.
With N processors, each requesting the bus with probability r:
P(blocked) = 1 − P(zero requests) − P(exactly one request) = $1 - (1-r)^N - N r (1-r)^{N-1}$
= probability of 2 or more requests (at least one blocked request).
Performance degrades as requests are blocked. Resubmitted blocked requests degrade performance even further than the blocking probabilities in this table suggest.
[Table: for N = 4, 8, 16, rows give r, $(1-r)^N$, $N r (1-r)^{N-1}$, and Blocked]
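A minimal Python sketch of the blocking calculation, assuming each of the N processors independently requests the bus with the same probability r in a given bus cycle (the binomial model behind the formula). The value r = 0.20 in the demo is hypothetical, chosen only to fill the three N columns.

```python
def p_no_request(r: float, n: int) -> float:
    """Probability that none of the n processors requests the bus: (1 - r)^n."""
    return (1.0 - r) ** n


def p_one_request(r: float, n: int) -> float:
    """Probability that exactly one processor requests the bus: n r (1 - r)^(n - 1)."""
    return n * r * (1.0 - r) ** (n - 1)


def p_blocked(r: float, n: int) -> float:
    """Probability of two or more simultaneous requests (at least one blocked)."""
    return 1.0 - p_no_request(r, n) - p_one_request(r, n)


if __name__ == "__main__":
    r = 0.20  # hypothetical per-processor request probability
    for n in (4, 8, 16):
        print(f"N={n:>2}  (1-r)^N={p_no_request(r, n):.4f}  "
              f"Nr(1-r)^(N-1)={p_one_request(r, n):.4f}  "
              f"blocked={p_blocked(r, n):.4f}")
```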
Clearly, the probability that a processor's access to a shared bus will be denied increases with both:
–the number of processors sharing the bus, and
–the probability that a processor needs access to the bus.
What can be done? What is the "universal band-aid" for performance problems?
If cache greatly reduces accesses to memory, then the blocking rate on the bus is much lower.
[Table: for N = 4, 8, 16 at a lower request probability, rows give r, $(1-r)^N$, $N r (1-r)^{N-1}$, and Blocked]
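To see how cache lowers bus blocking, here is a sketch under a simplifying, hypothetical model: a processor uses the bus only on a cache miss, so its request probability is r = mem_ref_rate × (1 − hit rate), with `mem_ref_rate` (an illustrative parameter) defaulting to one memory reference per cycle. The hit rates in the demo are also illustrative.

```python
def p_blocked(r: float, n: int) -> float:
    """P(two or more of n processors request the bus) = 1 - (1-r)^n - n r (1-r)^(n-1)."""
    return 1.0 - (1.0 - r) ** n - n * r * (1.0 - r) ** (n - 1)


def bus_request_rate(hit_rate: float, mem_ref_rate: float = 1.0) -> float:
    """Hypothetical model: only cache misses reach the bus."""
    return mem_ref_rate * (1.0 - hit_rate)


if __name__ == "__main__":
    for hit in (0.50, 0.90, 0.98):
        r = bus_request_rate(hit)
        row = "  ".join(f"N={n}: {p_blocked(r, n):.4f}" for n in (4, 8, 16))
        print(f"hit rate {hit:.0%} (r={r:.2f})  {row}")
```

Raising the hit rate shrinks r, and because the blocking probability is dominated by the chance of two or more simultaneous requests, it falls much faster than r itself.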
Two approaches to improving shared-memory/bus machine performance:
–Invest in large amounts, and multiple levels, of cache, plus a connection network that allows caches to synchronize their contents.
–Invest in multiple buses and independently accessible blocks of memory.
Combining both may be the best strategy.
Homework
Your project is to explore how interconnection-network contention affects the performance of a shared-memory, bus-based multiprocessor. You will do some calculations, use the HPPAS simulator, and write a report of a couple of pages to turn in.
Task 1
For a machine whose processors include on-chip cache yielding a 90% cache hit rate, determine the maximum number of processors that can share a single bus while still maintaining at least 98% acceptance of requests. Use the calculations shown in the lecture to zero in on the correct answer, recording them in a table for your report. Show each step of the calculation as was done in the lecture/ppt. Your results should "bracket" the maximum.
Task 1
Use the formulas in the table to find the maximum N.
[Table: columns N = 4, 8, 16, and N = ?; R = 10%; r: 0.90; rows $(1-r)^N$, $N r (1-r)^{N-1}$, and Blocked = 1 − P(0 requests) − P(1 request)]
Task 2
Use the maximum number of processors (from Task 1) and Amdahl's Law at the balance point to determine the workload parallel fraction that balances the denominator. Determine the theoretical speedup that will be obtained.
Solved for α: $\alpha = \frac{N}{N+1}$
Task 3
Use the data values developed so far to run the HPPAS simulation system, and record the speedup it reports. If the result differs markedly from the theoretical value, check all the settings, rerun the simulation, and explain any variation from the theoretical expected value. Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.
Dates
The current plan:
–Make the midterm available on Friday, June 23. The due date will be July 10 (after the conference and after the July 4th weekend).
–Conference week: complete the homework (due on July 3 by .) and work on the midterm exam.
–No class lecture on June 27 and 29.
–No class on July 4.
–Next live class is Wed July 6.
Topic Overview
Overview of topics for the exam:
–Five parallel levels
–Problems to be solved for parallelism
–Limitations to parallel speedup
–Amdahl's Law: theory, implications
–Limiting factors in realizing parallel performance
–Pipelines and their performance issues
–Flynn's classification
–SIMD architectures
–SIMD algorithms
–Elementary analysis of algorithms
–MIMD: Multiprocessors and Multicomputers
–Balance point and heuristic (from Amdahl's Law)
–Bus contention and analysis of a single shared bus
–Use of the online HPPAS tool
–Specific multiprocessor clustered architectures:
  –Compaq
  –DASH
  –Dell Blade Cluster
End of Lecture
End of today's lecture.