
1 CS 267: Applications of Parallel Computers
Final Project Suggestions
James Demmel
www.cs.berkeley.edu/~demmel/cs267_Spr06

2 Outline
- Kinds of projects
  - Evaluating and improving the performance of a parallel application ("application" could be a full scientific application, or an important kernel)
  - Parallelizing a sequential application (other kinds of performance improvements are possible too, e.g. memory hierarchy tuning)
  - Devising a new parallel algorithm for some problem
  - Porting a parallel application or systems software to a new architecture
- Examples of previous projects (all on-line)
- Upcoming guest lecturers (see their previous lectures, or contact them, for project ideas)
- Suggested projects

3 CS267 Class Projects from 2004
- BLAST Implementation on BEE2 — Chen Chang
- PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers — Fabrizio Bisetti
- Parallel Pattern Matcher — Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
- Parallel Simulation in Metropolis — Guang Yang
- A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation — Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
- Parallelization of oopd1 — Jeff Hammel
- Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code — Amir Kamil, Ben Schwarz, and Jimmy Su

4 CS267 Class Projects from 2004 (cont.)
- Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids — C. Zambrana Rojas and Mark Hoemmen
- Parallelization of Phylogenetic Tree Construction — Michael Tung
- UPC Implementation of the Sparse Triangular Solve and NAS FT — Christian Bell and Rajesh Nishtala
- Widescale Load Balanced Shared Memory Model for Parallel Computing — Sonesh Surana, Yatish Patel, and Dan Adkins

5 Planned Guest Lecturers
- Katherine Yelick (UPC, heart modeling)
- David Anderson (volunteer computing)
- Kimmen Sjolander (phylogenetic analysis of proteins – SATCHMO – Bonnie Kirkpatrick)
- Julian Borrill (astrophysical data analysis)
- Wes Bethel (graphics and data visualization)
- Phil Colella (adaptive mesh refinement)
- David Skinner (tools for scaling up applications)
- Xiaoye Li (sparse linear algebra)
- Osni Marques and Tony Drummond (ACTS Toolkit)
- Andrew Canning (computational neuroscience)
- Michael Wehner (climate modeling)

6 Suggested projects (1)
- Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
- Contribute to the upcoming ScaLAPACK release (JD)
  - Proposal and talk at www.cs.berkeley.edu/~demmel; ask me for the latest
  - Performance evaluation of existing parallel algorithms (ex: new eigensolvers based on successive band reduction)
  - Improved implementations of existing parallel algorithms (ex: use UPC to overlap communication and computation)
  - Many serial algorithms to be parallelized; see the following slides

7 Missing Drivers in Sca/LAPACK

                    Method                   LAPACK     ScaLAPACK
Linear Equations    LU                       xGESV      PxGESV
                    Cholesky                 xPOSV      PxPOSV
                    LDL^T                    xSYSV      missing
Least Squares (LS)  QR                       xGELS      PxGELS
                    QR + pivot               xGELSY     missing
                    SVD/QR                   xGELSS     missing
                    SVD/D&C                  xGELSD     missing (intent?)
                    SVD/MRRR                 missing    missing
                    QR + iterative refine.   missing    missing
Generalized LS      LS + equality constr.    xGGLSE     missing
                    Generalized LM           xGGGLM     missing
                    Above + iterative ref.   missing    missing
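To make concrete what a "driver" routine is, here is a minimal C sketch that calls the LAPACK LU driver DGESV (the xGESV entry above, with x = D for double precision) through the usual Fortran calling convention. The example system and the trailing-underscore symbol name are illustrative assumptions, not part of the slide.

```c
#include <stdio.h>

/* Prototype for the Fortran LAPACK driver DGESV. The trailing underscore is
 * the common, but compiler-dependent, Fortran name mangling; all arguments
 * are passed by reference and the matrix is column-major. Link with -llapack. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void) {
    /* Solve the 2x2 system A*x = b in one driver call.
     * A = [3 1; 1 2] (column-major), b = (9, 8), exact solution x = (2, 3). */
    int n = 2, nrhs = 1, lda = 2, ldb = 2, info;
    int ipiv[2];
    double A[4] = { 3.0, 1.0,    /* column 1 */
                    1.0, 2.0 };  /* column 2 */
    double b[2] = { 9.0, 8.0 };  /* overwritten with the solution */

    dgesv_(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);
    if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}
```

The ScaLAPACK counterpart PDGESV plays the same role but additionally requires a BLACS process grid and array descriptors, which is part of what makes filling in the "missing" entries nontrivial.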

8 More missing drivers

                              Method                 LAPACK      ScaLAPACK
Symmetric EVD                 QR / Bisection+Invit   xSYEV / X   PxSYEV / X
                              D&C                    xSYEVD      PxSYEVD
                              MRRR                   xSYEVR      missing
Nonsymmetric EVD              Schur form             xGEES / X   missing driver
                              Vectors too            xGEEV / X   missing
SVD                           QR                     xGESVD      PxGESVD
                              D&C                    xGESDD      missing (intent?)
                              MRRR                   missing     missing
                              Jacobi                 missing     missing
Generalized Symmetric EVD     QR / Bisection+Invit   xSYGV / X   PxSYGV / X
                              D&C                    xSYGVD      missing (intent?)
                              MRRR                   missing     missing
Generalized Nonsymmetric EVD  Schur form             xGGES / X   missing
                              Vectors too            xGGEV / X   missing
Generalized SVD               Kogbetliantz           xGGSVD      missing (intent)
                              MRRR                   missing     missing

9 Suggested projects (2)
- Contribute to sparse linear algebra (JD & KY)
  - Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  - Typical methods (e.g. CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so the communication cost is O(# iterations)
  - Our goal: make the latency cost O(1)! This requires reorganizing algorithms drastically, including replacing SpMV by the new kernel [Ax, A^2x, A^3x, ..., A^kx], which can be done with O(1) messages (a sequential sketch of this kernel appears after this slide)
- Projects
  - Study scalability bottlenecks of current CG on real, large matrices
  - Optimize [Ax, A^2x, A^3x, ..., A^kx] on sequential machines
  - Optimize [Ax, A^2x, A^3x, ..., A^kx] on parallel machines
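As a point of reference for the optimization projects, here is a minimal, unoptimized sequential sketch of the [Ax, A^2x, ..., A^kx] kernel using a hypothetical CSR matrix type (the type and field names are illustrative). It simply performs k ordinary SpMVs and therefore streams A through memory k times; the research question is how to reorganize it so that A and the vectors are read or communicated only O(1) times.

```c
#include <stddef.h>

/* Minimal CSR sparse matrix; not from any particular library. */
typedef struct {
    int n;          /* matrix dimension (square) */
    int *rowptr;    /* row pointers, length n+1 */
    int *colind;    /* column indices, length nnz */
    double *val;    /* nonzero values, length nnz */
} csr_t;

/* y = A*x: one ordinary sparse matrix-vector multiply (SpMV). */
static void spmv(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (int j = A->rowptr[i]; j < A->rowptr[i + 1]; j++)
            sum += A->val[j] * x[A->colind[j]];
        y[i] = sum;
    }
}

/* Reference matrix powers kernel: V[j*n .. j*n+n-1] = A^(j+1) * x for
 * j = 0, ..., k-1. This naive version calls SpMV k times; the project is
 * to restructure the computation so A (and the vectors) are read and
 * communicated O(1) times instead of O(k). */
void matrix_powers(const csr_t *A, const double *x, double *V, int k) {
    const double *prev = x;
    for (int j = 0; j < k; j++) {
        double *cur = V + (size_t)j * (size_t)A->n;
        spmv(A, prev, cur);
        prev = cur;
    }
}
```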

10 Suggested projects (3)
- Evaluate new languages on applications (KY)
  - UPC or Titanium
  - UPC for asynchrony, overlapping communication & computation (a sketch of the overlap pattern, using MPI rather than UPC, appears after this slide)
  - ScaLAPACK in UPC
  - Use the UPC-based 3D FFT in your application
  - Optimize the existing 1D FFT in UPC to use 3D techniques
- Porting and evaluating parallel systems software (KY)
  - Port UPC to RAMP
  - Port GASNet to Blue Gene, evaluate performance
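The overlap item above is about a pattern rather than a particular language. Below is a minimal sketch of that pattern written with nonblocking MPI instead of UPC (UPC would express the same idea with one-sided puts/gets and split-phase synchronization). The 1D halo-exchange setting and the compute_interior/compute_boundary hooks are hypothetical placeholders, not part of the slide.

```c
#include <mpi.h>

/* Placeholder application hooks: work on points that do not need the
 * incoming halo, and work on points that do. Real kernels would go here. */
static void compute_interior(double *u, int n) { (void)u; (void)n; }
static void compute_boundary(double *u, int n, const double *halo, int h) {
    (void)u; (void)n; (void)halo; (void)h;
}

/* One step of a 1D halo exchange overlapped with computation: post the
 * nonblocking receive/send, do the halo-independent work while messages
 * are in flight, then wait and finish the boundary work. */
void exchange_and_compute(double *u, int n,
                          double *sendbuf, double *recvbuf, int h,
                          int left, int right, MPI_Comm comm) {
    MPI_Request reqs[2];

    MPI_Irecv(recvbuf, h, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(sendbuf, h, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    compute_interior(u, n);                 /* overlapped with communication */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    compute_boundary(u, n, recvbuf, h);     /* needs the received halo */
}
```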

