National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Evaluating the Tera MTA Allan Snavely, Wayne Pfeiffer et.


1 National Partnership for Advanced Computational Infrastructure, San Diego Supercomputer Center
Evaluating the Tera MTA
Allan Snavely, Wayne Pfeiffer, et al.

Architectural features:
- Massive hardware multithreading
- Flat, randomized memory (no data cache)
- Support for automatic parallelization
- Single programming model for 1 or many processors
- Designed to scale

Goals of Architecture:
- Cover memory and other operational latencies
- Ease burden on programmer
- Exploit multiple levels of parallelism
- Scale

Goals of SDSC Evaluation:
- Funded by NSF to evaluate the MTA for the purposes of scientific computing
- PIs: Wayne Pfeiffer, Larry Carter

2 Evaluating the Tera MTA: Executive Summary

A few kernels and applications have been found for which the MTA achieves higher performance than other SDSC machines.
- Such codes have these characteristics:
  - They do not vectorize well.
  - They are difficult to parallelize on conventional machines.
  - They contain substantial parallelism.
- Examples are codes that involve:
  - Integer sorting.
  - Dynamic, irregular meshes or dynamic, non-uniform workloads within a regular mesh.
  - Parallel operations (such as a general gather/scatter) with poor data locality.

Single-processor performance of the multithreaded Tera MTA (with a 260-MHz clock) is typically lower than that of the vector Cray T90 (with a 440-MHz clock).
- The T90 is faster than the MTA processor for 4 out of 7 kernels and 2 out of 3 applications compared.
- The MTA processor is appreciably faster for one kernel, which does an integer sort.

Single-processor performance of the MTA is typically higher than that of cache-based workstation processors.
- An MTA processor is substantially faster than a workstation processor for 8 out of 9 applications compared.
- This indicates the effectiveness of multithreading as compared to cache utilization.

Scalability on the MTA is good up to 8 processors in many instances and better for kernels than for larger applications.
- Very good scalability (parallel efficiency between 0.80 and 1.00 on 8 processors) has been achieved for 6 out of 7 kernels and 5 out of 11 applications studied.
- Compared to kernels, the applications have more sections of code that must be tuned to achieve good performance.

The MTA is faster, processor for processor, than the IBM Blue Horizon (with a 220-MHz clock).
- Scaling sometimes favors one machine and sometimes the other.
- Codes that put pressure on the I-cache suffer degraded scaling on the MTA.
- Recall that the IBM has 1152 processors, an advantage for large problems that scale well.
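The "very good scalability" threshold above is stated in terms of parallel efficiency. The slides do not spell out the formula, so the sketch below assumes the usual definition: speedup (one-processor time divided by p-processor time) divided by p. The function names and the timings in the usage lines are hypothetical and are not measurements from the evaluation.

```python
def parallel_efficiency(t1, tp, p):
    """Return speedup / p, assuming the usual definition of parallel efficiency.

    t1: wall-clock time on 1 processor
    tp: wall-clock time on p processors
    p:  number of processors
    """
    if t1 <= 0 or tp <= 0 or p < 1:
        raise ValueError("times must be positive and p must be at least 1")
    speedup = t1 / tp
    return speedup / p


def is_very_good_scaling(t1, tp, p, low=0.80, high=1.00):
    """Check the slide's band: efficiency between 0.80 and 1.00."""
    return low <= parallel_efficiency(t1, tp, p) <= high


# Hypothetical timings (seconds), for illustration only.
eff = parallel_efficiency(100.0, 14.0, 8)
print(f"efficiency on 8 processors: {eff:.2f}")
print("very good scaling:", is_very_good_scaling(100.0, 14.0, 8))
```

Under this definition, a code that runs only 4 times faster on 8 processors has an efficiency of 0.50 and falls outside the band.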

3 MTA vs. IBM Blue Horizon

4 MTA vs. T90

5 Scalability

6 Symbiosis and Congestion Pricing on MTA

Allan Snavely's Ph.D. thesis (Fall 2000); advisor: Larry Carter.

- Symbiosis: A term from biology meaning "the living together of distinct organisms in close proximity". We adapt the term to refer to the improvement in throughput and job turnaround that can occur when jobs are coscheduled on a multithreaded machine.
- Congestion Pricing: An area of economics concerned with pricing a "congestion externality" so that users take into account the impact their usage has on others.
- Key Observation: Resource sharing among coscheduled jobs on a multithreaded machine such as the MTA or an SMT processor is very intimate.
- Thesis: Job schedulers that take symbiosis into account, when combined with principles of congestion pricing, deliver significant throughput and turnaround gains and maximize global user utility when deployed on multithreaded machines.

See www.sdsc.edu/~allans
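As a rough illustration of the throughput side of symbiosis, the sketch below compares running a set of jobs back to back against coscheduling them. The metric and the name `coschedule_gain` are assumptions made for this example; they are not the definition used in the thesis, and the timings are hypothetical.

```python
def coschedule_gain(solo_times, coscheduled_makespan):
    """Illustrative metric (an assumption, not the thesis's definition).

    solo_times: time each job takes when it runs alone on the machine
    coscheduled_makespan: time until all jobs finish when run together

    Returns the ratio of back-to-back time to coscheduled time. A value
    above 1.0 means the coschedule finished the same work sooner.
    """
    if not solo_times or coscheduled_makespan <= 0:
        raise ValueError("need at least one job and a positive makespan")
    return sum(solo_times) / coscheduled_makespan


# Hypothetical timings (seconds): two jobs take 10 s and 20 s alone,
# and 24 s in total when coscheduled.
print(coschedule_gain([10.0, 20.0], 24.0))  # 30 / 24 = 1.25
```

A ratio below 1.0 would indicate that the jobs interfere with each other enough that coscheduling hurts throughput, which is the case a symbiosis-aware scheduler would try to avoid.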

