
Case studies in Optimizing High Performance Computing Software
Jan Westerholm
High performance computing
Department of Information Technologies
Faculty of Technology / Åbo Akademi University

FINHPC / Åbo Akademi: Objectives
Sub-project in FINHPC
Three-year duration: 01.07.2005-30.06.2008
Objective: to improve code that individuals and research groups have written and are running on CSC machines
–faster code, in many cases with exactly the same numerical results as before
–ability to run bigger problems
Work approach: apply well-known techniques from computer science
Faster programs may imply better quality of results
Better throughput for everybody

FINHPC / Åbo Akademi: Limitations
We will use:
–parallelization techniques
–code optimization
   cache utilization (particularly L2-cache)
   microprocessor pipeline continuity
   data blocking: grid scan order
–introduction of new data structures
–replacement of very simple algorithms
   sorting (quicksort instead of bubble sort)
–open source libraries

FINHPC / Åbo Akademi: Limitations
We will not:
–introduce better physics, chemistry, etc.
–replace the chosen basic numerical technique
–replace individual algorithms unless they are clearly modularized (e.g. matrix inversion as a library routine)

3 case studies
Lattice-Boltzmann fluid simulation: 3DQ19
Protein covariance analysis: Covana
Fusion reactor simulation: Elmfire

3DQ19: Lattice Boltzmann fluid mechanics
Jyväskylä University / Jussi Timonen, Keijo Mattila; ÅA / Anders Gustafsson
Physical background:
–phase space distribution simulated in time
–Boltzmann's equation: drift term and collision term
–physical quantities = moments of distribution
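For orientation, the bullets above correspond to the standard lattice-BGK formulation, written here in generic textbook notation. The symbols are not taken from the slides: f_i is the distribution along discrete velocity c_i (i = 0..18 for a 19-velocity lattice), f_i^eq its local equilibrium, and tau the relaxation time in units of the time step. That 3DQ19 uses exactly this form is an assumption, suggested only by the function name relaxation_BGK() in the profile.

```latex
% Streaming (drift) on the left-hand side, BGK relaxation (collision) on the right
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t)
  - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]

% Physical quantities as moments of the distribution: density and momentum
\rho = \sum_{i=0}^{18} f_i ,
\qquad
\rho\,\mathbf{u} = \sum_{i=0}^{18} f_i\,\mathbf{c}_i
```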

3DQ19: Program Profiling
Flat profile:
     % cumulative     self               self    total
  time    seconds  seconds     calls  ms/call  ms/call  name
 33.96      43.65    43.65        50   873.00  1230.10  everything2to1()
 30.79      83.22    39.57        50   791.40  1148.50  everything1to2()
 27.79     118.93    35.71  49000000     0.00     0.00  relaxation_BGK()
  2.30     121.89     2.96                              shmem_msgs_available
  1.19     123.42     1.53       100    15.30    15.30  send_west()
  1.11     124.85     1.43       100    14.30    14.30  send_east()
  0.82     125.91     1.06                              recv_message
  0.45     126.49     0.58                              sock_msg_avail_on_fd
  0.37     126.97     0.48       100     4.80     4.80  per_bound_xslice()
  0.33     127.40     0.43         1   430.00   430.00  init_fluid()
  0.31     127.80     0.40         1   400.00   400.00  local_profile_y()
  0.23     128.10     0.30                              socket_msgs_available
  0.19     128.34     0.24         1   240.00   240.00  calc_mass()
  0.04     128.39     0.05                              net_recv
  0.03     128.43     0.04         1    40.00    40.00  allocation()
  0.02     128.46     0.03                              main

3DQ19: Optimizations
Parallelization: well done already!
Code optimization
–blocking: grid scan order
–anti-dependency: make blocks of code independent
–deep fluid: mark those grid points which do not have solids as neighbours
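The "deep fluid" idea can be sketched as a one-time preprocessing pass. This is a minimal illustration, not the 3DQ19 implementation: the function name mark_deep_fluid, the nested-list grid layout, and the choice to treat points outside the grid as solid are all assumptions made for the example. The neighbour set is the usual 18 non-rest directions of a 19-velocity cubic lattice.

```python
from itertools import product

# The 18 non-rest lattice directions: 6 along the axes, 12 along face diagonals.
D3Q19_NEIGHBOURS = [
    c for c in product((-1, 0, 1), repeat=3)
    if 1 <= sum(abs(k) for k in c) <= 2
]


def mark_deep_fluid(solid):
    """Return a boolean grid: True where a fluid point has no solid neighbour.

    solid[x][y][z] is True for solid points. Points outside the grid are
    treated as solid (an assumption made for this sketch).
    """
    nx, ny, nz = len(solid), len(solid[0]), len(solid[0][0])

    def is_solid(x, y, z):
        inside = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
        return (not inside) or solid[x][y][z]

    deep = [[[False] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        if not solid[x][y][z]:
            deep[x][y][z] = not any(
                is_solid(x + dx, y + dy, z + dz)
                for dx, dy, dz in D3Q19_NEIGHBOURS
            )
    return deep
```

Points marked True could then be updated by a branch-free inner loop, since no boundary handling is needed for them; only the remaining points require the general code path.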

3DQ19: Blocking
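Blocking changes the grid scan order so that all points of a small sub-block are processed before moving to the next block, which keeps the working set small enough to stay in cache. The sketch below shows only the loop structure; the name blocked_scan and the block sizes are hypothetical, and the slides do not state the block dimensions used in 3DQ19. In Python the cache benefit would not be measurable; the same loop nest in a compiled language is where the technique pays off.

```python
def blocked_scan(nx, ny, nz, bx, by, bz):
    """Yield every grid index (x, y, z) exactly once, one block at a time."""
    for x0 in range(0, nx, bx):
        for y0 in range(0, ny, by):
            for z0 in range(0, nz, bz):
                # Finish all points of the current block before moving on.
                for x in range(x0, min(x0 + bx, nx)):
                    for y in range(y0, min(y0 + by, ny)):
                        for z in range(z0, min(z0 + bz, nz)):
                            yield x, y, z


if __name__ == "__main__":
    # Visit a 4 x 4 x 4 grid in 2 x 2 x 2 blocks.
    for point in blocked_scan(4, 4, 4, 2, 2, 2):
        print(point)
```

The min() calls handle grids whose dimensions are not multiples of the block size, so partial blocks at the edges are still visited.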

3DQ19: Results on three parallel systems
                     Athlon 1800    IBMSC    AMD64
everything1to2():           18.8    19.48    10.06
everything2to1():          19.34    18.78    10.52
send_west():                 8.4     0.68     1.96
send_east():                8.31     1.17     3.14
Total time (s):            55.15    40.28    25.76
Time gained (s):           27.48    14.13    14.76
Speed up (%):                33%      26%      36%
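The speed-up percentages can be reproduced from the two time rows, under the assumption that "Total time" is the run time after optimization and that the original run time is the total plus the time gained. The slides do not state this definition; it is inferred because it matches all three reported figures. The function name is hypothetical.

```python
# (total time after optimization, time gained), both in seconds, per system.
results = {
    "Athlon 1800": (55.15, 27.48),
    "IBMSC": (40.28, 14.13),
    "AMD64": (25.76, 14.76),
}


def runtime_reduction_percent(total_after, gained):
    """Time gained as a percentage of the original run time (after + gained)."""
    original = total_after + gained
    return 100.0 * gained / original


if __name__ == "__main__":
    for system, (after, gained) in results.items():
        print(f"{system}: {runtime_reduction_percent(after, gained):.0f}%")
```

Read this way, "speed up" is the fraction of the original run time that was removed, not the ratio of old to new run time.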

2nd case study: Covana
Protein covariance analysis
Institute of Medical Technology, University of Tampere / Mauno Vihinen, Bairong Chen; ÅA / André Norrgård
Biological background
–physico-chemical groups of amino acids
–protein function from structure
   pair and triple correlations between amino acids
   web server for covariance analysis

Covana: Protein covariance analysis
Protein sequences: calculate correlations between columns of amino acids
Typical size
   50-150 sequences (rows)
   300-1500 amino acids in a sequence (columns)
>Q9XW32_CAEEL/9-307
IDVTKPTFLLTFYSIHGTFALVFNILGIFLIMK-NPKIVKMYKGFMINMQ-ILSLLADAQ
TTLLMQPVYILPIIGGYTNGLLWQVFR----LSSHIQMAMF---LLLLY---------LQ
VASIVCAIVTKYHVVSNIGKLSDRSI-LFWIF---VIVYHGCAFVITGFFSVS-CLARQ-
-EEENLIK------T-KFPNAISVFTLEN--VAIYDLQVN---KWMMITTILFAFMLTSS
IVISFY--FSVRLLKTLPSKRNTISARSFRGHQIAVTSLM-AQAT-VPFLVL---IIP--
IGTIVYLFVHVLP------NAQ-----EISNIMMAV--YSFHASLST---FVMIISTPQY
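The raw ingredient of any column-pair correlation is a table of how often each pair of residues occurs together in two columns. The sketch below builds those tables for a toy alignment. It is an illustration only: the slides do not say which correlation statistic Covana computes, and the function name column_pair_counts and the decision to skip rows with a gap in either column are assumptions.

```python
from collections import Counter
from itertools import combinations


def column_pair_counts(alignment, gap="-"):
    """Count co-occurring residue pairs for every pair of alignment columns.

    alignment is a list of equal-length strings, one aligned sequence per row.
    Returns {(i, j): Counter({(a, b): n, ...})} for columns i < j, skipping
    rows that have a gap in either column.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Transpose once so that each column is a contiguous string.
    columns = ["".join(row[c] for row in alignment) for c in range(n_cols)]
    counts = {}
    for i, j in combinations(range(n_cols), 2):
        counts[(i, j)] = Counter(
            (a, b)
            for a, b in zip(columns[i], columns[j])
            if a != gap and b != gap
        )
    return counts


if __name__ == "__main__":
    toy = ["AC-", "ACD", "GCD"]
    for pair, table in column_pair_counts(toy).items():
        print(pair, dict(table))
```

With 300-1500 columns there are roughly 45,000 to 1.1 million column pairs, so the per-pair work and the memory held per pair dominate the cost.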

Covana: Code optimization
Effective data structures: dynamic memory allocation
Effective generic algorithms: sorting
Avoid recalculations

Covana: Run time

Covana: Results
–Runtime:
   Original: 227.8 s
   Final version: 2.0 s
   Improvement: 112 times faster
–Computer memory usage:
   Original: 3250 MB
   Final version: 37 MB
   Improvement: 88 times less
–Disk space usage:
   Original: 277 MB
   Final version: 21 MB
   Improvement: 13 times less

3rd case study: Elmfire
Tokamak fusion reactor simulation
Jukka Heikkinen, Salomon Janhunen, Timo Kiviniemi / Advanced Energy Systems / HUT; ÅA / Artur Signell
Physical background:
–particle simulation with averaged gyrokinetic Larmor orbits
–turbulence and plasma modes

Elmfire: Tokamak fusion reactor simulation
Goal 1: Computer platform independence
–replacing proprietary library routines for random number generation with open source routines
–replacing proprietary library routines for distributed solution of sparse linear systems with open source library routines
Goal 2: Scalability
–Elmfire ran on at most 8 processors
–new data structures for sparse matrices were invented, which make element updates efficient
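The slides do not describe the sparse-matrix structure that was developed for Elmfire. As a generic illustration of why the choice of structure matters for element updates, the sketch below keeps one hash map per row, so that repeated accumulation into an element is cheap, and converts to compressed sparse row (CSR) arrays only when a solver needs them. The class name SparseAccumulator and its methods are hypothetical.

```python
from collections import defaultdict


class SparseAccumulator:
    """Sparse matrix that supports cheap repeated A[i, j] += v updates."""

    def __init__(self, n_rows, n_cols):
        self.n_rows, self.n_cols = n_rows, n_cols
        self._rows = [defaultdict(float) for _ in range(n_rows)]

    def add(self, i, j, value):
        """Accumulate value into element (i, j)."""
        if not (0 <= i < self.n_rows and 0 <= j < self.n_cols):
            raise IndexError("element index out of range")
        # One hash lookup per update, independent of how many elements exist.
        self._rows[i][j] += value

    def to_csr(self):
        """Return (values, col_indices, row_ptr) in compressed sparse row form."""
        values, col_indices, row_ptr = [], [], [0]
        for row in self._rows:
            for j in sorted(row):
                values.append(row[j])
                col_indices.append(j)
            row_ptr.append(len(values))
        return values, col_indices, row_ptr


if __name__ == "__main__":
    m = SparseAccumulator(3, 3)
    m.add(0, 2, 1.5)
    m.add(0, 2, 0.5)  # accumulates into the existing element
    m.add(2, 0, 4.0)
    print(m.to_csr())
```

Updating an element directly in CSR form would instead require searching the row and possibly shifting every later entry, which is what makes assembly-phase updates expensive in a solver-oriented format.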

Elmfire

Conclusions
Software can be improved!
–modern microprocessor architecture is taken into account:
   cache utilization
   pipeline
–use of well-established computer science methods

Conclusions
In 1 case out of 3, a clear impact on run time was made
In 2 cases out of 3, previously intractable results can now be obtained
Are these three cases representative of code running on CSC machines?
–the next two cases are under study!

What have we learnt?
Computer scientists with minimal prior knowledge of e.g. physical sciences can contribute to HPC
Are supercomputers needed to the extent they are used today at CSC?
Interprocess communication is often a bottleneck
–Parallel computing with 1000 processors may become routine in the future for certain types of problems
Who should do the coding?
–Should code for production use (intensive cycles of use, maintainability) be outsourced?

Co-workers:
Mats Aspnäs, Ph.D.
Anders Gustafsson, M.Sc.
Artur Signell, M.Sc.
André Norrgård
THANK YOU!
