Nek5000 preliminary discussion for petaflops apps project.

1 nek5000 preliminary discussion for petaflops apps project

2 General facts about nek5000
- Research variant of a commercial code developed by Fischer, Ho, and Ronquist in the late '80s; subsequently modified by Fischer and Tufo.
- Solves the incompressible Navier-Stokes equations using the spectral element method.
- Used by several external research groups (Duke, Brown).

3 nek5000 language issues
- About 80,000 lines of mostly pure, old-style f77 with a little C.
- C is called from Fortran with decent portability strategies.
- Shell scripts are provided to simplify job management; these are mostly Jazz-specific.
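One common portability strategy for calling C from f77 is to hide the compiler's external-name convention behind a macro and to take every argument by pointer, since Fortran passes arguments by reference. The sketch below assumes the widespread lowercase-plus-trailing-underscore convention; the macro `FORTRAN_NAME`, the build flags `UPPERCASE_NAMES` and `NO_UNDERSCORE`, and the routine `vsum` are hypothetical illustrations, not nek5000's actual code.

```c
#include <assert.h>

/* Hypothetical name-mangling macro. Many Unix f77 compilers expect
   lowercase external names with one trailing underscore; other
   conventions are selected at build time with hypothetical -D flags. */
#if defined(UPPERCASE_NAMES)
#define FORTRAN_NAME(lower, upper) upper
#elif defined(NO_UNDERSCORE)
#define FORTRAN_NAME(lower, upper) lower
#else
#define FORTRAN_NAME(lower, upper) lower##_
#endif

/* Hypothetical routine callable from Fortran as
       CALL VSUM(S, X, N)
   Every argument arrives as a pointer because Fortran passes by
   reference. Assumes INTEGER maps to C int and that REAL has been
   promoted to double precision by the Fortran compiler. */
void FORTRAN_NAME(vsum, VSUM)(double *s, const double *x, const int *n)
{
    assert(s != 0 && x != 0 && n != 0);
    double acc = 0.0;
    for (int i = 0; i < *n; ++i)
        acc += x[i];
    *s = acc;
}
```

With the default convention the C symbol is `vsum_`, which is what a typical Unix f77 compiler emits for `CALL VSUM(...)`; switching conventions only requires a different `-D` flag, not source edits.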

4 nek5000 portability issues
- Has been run on a wide range of architectures: Power3, Pentium, Alpha, SGI, etc.
- Focus is on the PGF compiler, but portability looks pretty good: I ran it on SGI and Fujitsu-i386 fairly easily, as well as on Jazz with PGF.

5 portability, cont.
- Relies on hacking a somewhat generalized makefile.
- No configure script and no pre-existing machine-specific makefiles.
- The compiler must be able to promote REAL to double precision.
- Some non-standard f77 (e.g., resized common blocks).

6 Software process
- No repository; thus no versioning, no release schedule, no bug tracking, etc.
- Test problems exist, but there is no auto-verification-type test suite.
- Good quick how-to guide, but very light on documentation.
- Not directly downloadable (e.g., from a web server).

7 Performance
- Exhaustively studied and optimized; Gordon Bell Prize winner.
- Serial part: dominated by matrix-matrix products with smallish vector lengths.
- A homemade routine makes much better use of cache and does much better than BLAS, giving very high flop rates on non-vector machines.
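The kernel in question is a dense product C = A·B in which the dimensions are small. A typical starting point for a hand-tuned version is a triple loop ordered so that the innermost loop walks a column with unit stride. This is a sketch assuming Fortran-style column-major storage; the name `mxm_small` and its argument order are hypothetical, and nek5000's own routine is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small-matrix kernel:
   C (n1 x n3) = A (n1 x n2) * B (n2 x n3), all column-major as in
   Fortran, so element (i,j) of an m-row matrix is at index i + m*j. */
void mxm_small(const double *a, int n1, const double *b, int n2,
               double *c, int n3)
{
    assert(n1 > 0 && n2 > 0 && n3 > 0);
    for (int j = 0; j < n3; ++j) {
        double *cj = c + (size_t)n1 * j;            /* column j of C */
        for (int i = 0; i < n1; ++i)
            cj[i] = 0.0;
        for (int k = 0; k < n2; ++k) {
            const double bkj = b[k + (size_t)n2 * j];
            const double *ak = a + (size_t)n1 * k;  /* column k of A */
            /* Unit-stride inner loop: C(:,j) += A(:,k) * B(k,j). */
            for (int i = 0; i < n1; ++i)
                cj[i] += ak[i] * bkj;
        }
    }
}
```

In a spectral element code, for example, applying a 1D operator of size N×N along the first index of an element's N×N×N nodal values is one such product with shapes N×N times N×N², which is why the matrices stay small; tuned versions typically go further by unrolling for the common values of N.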

8 Performance, cont.
- Communication patterns:
  - Nearest neighbor (~10%)
  - Vector reduction (~10%)
  - Coarse grid solve (small)
- Not communication-bound (yet).
- Has scaled nicely to thousands of processors on ASCI Red and Seaborg (SP3).
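The vector reduction is an all-reduce (for example, of a dot product). A standard way to implement it is recursive doubling: in each round every process exchanges its partial sum with the partner whose rank differs in one bit, so after log2 P rounds every process holds the global sum. The sketch below simulates that exchange for P ranks inside a single process; it assumes P is a power of two, the function name is hypothetical, and it does not claim to show how nek5000 performs its reduction (a real code would use `MPI_Allreduce` or hand-written message passing).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simulation of a recursive-doubling all-reduce (sum)
   over p "ranks". val[r] holds rank r's local contribution on entry
   and the global sum on exit. Assumes p is a power of two.
   Returns the number of communication rounds, log2(p). */
int allreduce_recursive_doubling(double *val, int p)
{
    assert(p > 0 && (p & (p - 1)) == 0);
    double *recv = malloc((size_t)p * sizeof *recv);
    assert(recv != NULL);
    int rounds = 0;
    for (int mask = 1; mask < p; mask <<= 1, ++rounds) {
        /* Each rank r "receives" the partial sum of partner r XOR mask. */
        for (int r = 0; r < p; ++r)
            recv[r] = val[r ^ mask];
        for (int r = 0; r < p; ++r)
            val[r] += recv[r];
    }
    free(recv);
    return rounds;
}
```

The round count grows only as log2 P: 10 rounds for 1,024 ranks versus roughly 17 for 100,000, each round costing at least one message latency.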

9 Performance issues
Outstanding performance questions:
- Serial
  - Efficient use of cache for different parameter regimes (different vector sizes)
  - How will it perform on new vector hardware?
  - No "spike" in the performance histogram, so it is hard to optimize further.
- Parallel
  - Nearest-neighbor exchange could become a bottleneck for slowly converging Helmholtz solves.
  - Scaling to 100,000 processors may depend on an improved vector reduction implementation.

10 What am I doing now?
Software process:
- Creating a CVS repository
- Establishing a license agreement
- Creating a simple web page with info/releases
- Creating some self-testing scripts
- Convincing Paul to add some documentation
- Posting a page of benchmarks
- Creating a release script
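At its core, a self-testing script runs a test problem and compares selected output values against stored reference values within a tolerance, reporting pass or fail instead of leaving the comparison to a person. The C sketch below shows only that comparison step; the function name, the mixed relative/absolute tolerance rule, and the sample values in the check are hypothetical, not taken from nek5000's test problems.

```c
#include <assert.h>
#include <stdio.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical verification check: entry i passes if
   |got - ref| <= atol + rtol * |ref|. Prints each failure and
   returns the number of failing entries (0 means the test passed). */
int verify_against_reference(const char *label, const double *got,
                             const double *ref, int n,
                             double rtol, double atol)
{
    assert(n >= 0 && rtol >= 0.0 && atol >= 0.0);
    int failures = 0;
    for (int i = 0; i < n; ++i) {
        double err = abs_d(got[i] - ref[i]);
        /* Written as !(err <= tol) so that NaN results also fail. */
        if (!(err <= atol + rtol * abs_d(ref[i]))) {
            printf("%s[%d]: got %.15g, expected %.15g (error %.3g)\n",
                   label, i, got[i], ref[i], err);
            ++failures;
        }
    }
    return failures;
}
```

A driver script would then only need to check whether the returned count is zero for each test problem.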

11 What am I doing, cont.
Performance:
- Collecting some of my own numbers
  - PAPI is installed locally but is needed on Jazz!
  - PGF tools to access hardware counters?
- Adding better instrumentation to the code to make this easier in the future
Petaflops apps meetings:
- Posting minutes/notes from each meeting on the local web site
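Until hardware counters are available everywhere, one lightweight form of instrumentation is to wrap the dominant kernels in wall-clock timers that accumulate total time and call counts per region. The sketch below is one such layer in C; the region table and the names `timer_start`, `timer_stop`, and `timer_report` are hypothetical, and it assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC` is available. It is not nek5000's existing instrumentation.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-region timer table (zero-initialized). */
enum { MAX_REGIONS = 16 };

typedef struct {
    const char *name;
    double total;    /* accumulated seconds */
    long calls;      /* completed start/stop pairs */
    double started;  /* timestamp of the most recent start */
} region_timer;

static region_timer regions[MAX_REGIONS];

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

void timer_start(int id, const char *name)
{
    assert(id >= 0 && id < MAX_REGIONS);
    regions[id].name = name;
    regions[id].started = now_seconds();
}

void timer_stop(int id)
{
    assert(id >= 0 && id < MAX_REGIONS && regions[id].name != NULL);
    regions[id].total += now_seconds() - regions[id].started;
    regions[id].calls += 1;
}

/* Print one line per region that has been timed at least once. */
void timer_report(void)
{
    for (int id = 0; id < MAX_REGIONS; ++id)
        if (regions[id].calls > 0)
            printf("%-12s %8ld calls %12.6f s\n",
                   regions[id].name, regions[id].calls, regions[id].total);
}
```

Bracketing, say, each matrix-matrix product call with `timer_start(0, "mxm")` and `timer_stop(0)` gives a per-kernel time breakdown; counter-based data from PAPI could later be attached to the same regions.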

