
Case studies in Optimizing High Performance Computing Software Jan Westerholm High performance computing Department of Information Technologies Faculty.


1 Case studies in Optimizing High Performance Computing Software
Jan Westerholm
High Performance Computing, Department of Information Technologies
Faculty of Technology / Åbo Akademi University

2 FINHPC / Åbo Akademi: Objectives
Sub-project in FINHPC
Three-year duration: 01.07.2005-30.06.2008
Objective: to improve code that individuals and research groups have written and are running on CSC machines
– faster code, in many cases with exactly the same numerical results as before
– the ability to run bigger problems
Work approach: apply well-known techniques from computer science
Faster programs may allow better-quality results
Better throughput for everybody

3 FINHPC / Åbo Akademi: Limitations
We will use:
– parallelization techniques
– code optimization:
    cache utilization (particularly the L2 cache)
    microprocessor pipeline continuity
    data blocking: grid scan order
– introduction of new data structures
– replacement of very simple algorithms, e.g. sorting (quicksort instead of bubble sort)
– open-source libraries
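The "replacement of very simple algorithms" item can be illustrated with sorting. The following is a minimal Python sketch, not code from the project: a textbook bubble sort, whose cost grows quadratically with the number of items, next to Python's built-in `sorted()`. Note that `sorted()` uses Timsort rather than the quicksort named on the slide, but it is likewise O(n log n). Both return exactly the same list, in line with the objective of keeping results unchanged.

```python
import random

def bubble_sort(values):
    """Textbook O(n^2) bubble sort: repeatedly swap adjacent out-of-order items."""
    a = list(values)  # work on a copy, leave the input untouched
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # no swaps in a full pass: already sorted, stop early
            break
    return a

# Replacement: the built-in O(n log n) sort gives an identical result.
data = [random.random() for _ in range(500)]
print(bubble_sort(data) == sorted(data))
```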

4 FINHPC / Åbo Akademi: Limitations
We will not:
– introduce better physics, chemistry, etc.
– replace the chosen basic numerical technique
– replace individual algorithms unless they are clearly modularized (e.g. matrix inversion as a library routine)

5 Three case studies
– Lattice-Boltzmann fluid simulation: 3DQ19
– Protein covariance analysis: Covana
– Fusion reactor simulation: Elmfire

6 3DQ19: Lattice-Boltzmann fluid mechanics
Jyväskylä University / Jussi Timonen, Keijo Mattila; ÅA / Anders Gustafsson
Physical background:
– the phase-space distribution is simulated in time
– Boltzmann's equation: drift term and collision term
– physical quantities are moments of the distribution
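As a rough illustration of how physical quantities arise as moments of the distribution, here is a minimal single-node Python sketch. It assumes the standard D3Q19 velocity set and weights and the usual BGK collision toward a second-order equilibrium. It is not the 3DQ19 code's `relaxation_BGK()`, and the drift (streaming) step that moves populations to neighbouring nodes is left out.

```python
from itertools import product

# D3Q19 velocity set: rest vector, 6 face and 12 edge directions
# (vectors with at most two nonzero components); standard weights.
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3)
              if sum(abs(x) for x in c) <= 2]
WEIGHT_BY_NONZEROS = {0: 1 / 3, 1: 1 / 18, 2: 1 / 36}
WEIGHTS = [WEIGHT_BY_NONZEROS[sum(abs(x) for x in c)] for c in VELOCITIES]

def moments(f):
    """Zeroth and first moments: density rho and velocity u."""
    rho = sum(f)
    u = tuple(sum(fi * c[a] for fi, c in zip(f, VELOCITIES)) / rho
              for a in range(3))
    return rho, u

def equilibrium(rho, u):
    """Second-order equilibrium in lattice units (sound speed squared = 1/3)."""
    usq = sum(x * x for x in u)
    feq = []
    for w, c in zip(WEIGHTS, VELOCITIES):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq

def relax_bgk(f, tau):
    """BGK collision at one node: f_i <- f_i - (f_i - f_i^eq) / tau."""
    rho, u = moments(f)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, equilibrium(rho, u))]

# Usage: push one population off equilibrium, then relax it.
f0 = equilibrium(1.0, (0.05, 0.0, -0.02))
f0[1] += 0.01
f1 = relax_bgk(f0, tau=0.8)
rho0, u0 = moments(f0)
rho1, u1 = moments(f1)
print(rho0, rho1)  # equal up to rounding: collision conserves mass and momentum
```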

7 3DQ19: Program profiling
Flat profile:
  %   cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 33.96     43.65    43.65       50   873.00  1230.10  everything2to1()
 30.79     83.22    39.57       50   791.40  1148.50  everything1to2()
 27.79    118.93    35.71 49000000     0.00     0.00  relaxation_BGK()
  2.30    121.89     2.96                             shmem_msgs_available
  1.19    123.42     1.53      100    15.30    15.30  send_west()
  1.11    124.85     1.43      100    14.30    14.30  send_east()
  0.82    125.91     1.06                             recv_message
  0.45    126.49     0.58                             sock_msg_avail_on_fd
  0.37    126.97     0.48      100     4.80     4.80  per_bound_xslice()
  0.33    127.40     0.43        1   430.00   430.00  init_fluid()
  0.31    127.80     0.40        1   400.00   400.00  local_profile_y()
  0.23    128.10     0.30                             socket_msgs_available
  0.19    128.34     0.24        1   240.00   240.00  calc_mass()
  0.04    128.39     0.05                             net_recv
  0.03    128.43     0.04        1    40.00    40.00  allocation()
  0.02    128.46     0.03                             main
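A quick arithmetic check on the flat profile, written as a small Python sketch with the numbers copied from the table: the three compute routines together account for about 92.5 % of the profiled time, and `relaxation_BGK()` costs well under a microsecond per call, which is why its ms/call columns print as 0.00.

```python
# (name, % time, self seconds, calls) for the three heaviest routines,
# copied from the flat profile above.
top = [
    ("everything2to1()", 33.96, 43.65, 50),
    ("everything1to2()", 30.79, 39.57, 50),
    ("relaxation_BGK()", 27.79, 35.71, 49_000_000),
]

share_pct = sum(pct for _, pct, _, _ in top)       # combined share of profiled time
kernel_seconds = sum(sec for _, _, sec, _ in top)  # matches the cumulative column (118.93 s)

# Average self time per call of relaxation_BGK(), in microseconds.
name, _, seconds, calls = top[2]
us_per_call = seconds / calls * 1e6

print(f"top three: {share_pct:.2f} % of time, {kernel_seconds:.2f} s")
print(f"{name}: {us_per_call:.2f} us per call")
```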

8 3DQ19: Optimizations
Parallelization: already well done!
Code optimization:
– blocking: grid scan order
– anti-dependency: make blocks of code independent
– deep fluid: mark the grid points that have no solid neighbours
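The deep-fluid idea can be sketched as a precomputed mask. This is a hypothetical Python version, not the 3DQ19 implementation. It assumes a periodic 3D grid stored as nested lists of booleans and uses the 18 D3Q19 neighbour directions as the definition of "neighbour"; the actual data layout is not shown in the slides. Nodes marked deep can then skip solid-boundary handling in the inner update loop.

```python
from itertools import product

# Assumed stencil: the 18 nonzero D3Q19 directions (face and edge neighbours).
OFFSETS = [c for c in product((-1, 0, 1), repeat=3)
           if 0 < sum(abs(x) for x in c) <= 2]

def deep_fluid_mask(solid):
    """Return deep[x][y][z] = True for fluid nodes with no solid neighbour.

    `solid` is indexed solid[x][y][z]; the grid is treated as periodic
    in all directions (an assumption of this sketch).
    """
    nx, ny, nz = len(solid), len(solid[0]), len(solid[0][0])
    deep = [[[False] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        if solid[x][y][z]:
            continue  # solid nodes are never deep fluid
        deep[x][y][z] = not any(
            solid[(x + dx) % nx][(y + dy) % ny][(z + dz) % nz]
            for dx, dy, dz in OFFSETS)
    return deep

n = 6
solid = [[[False] * n for _ in range(n)] for _ in range(n)]
solid[2][2][2] = True  # a single solid node
deep = deep_fluid_mask(solid)
print(sum(v for plane in deep for row in plane for v in row))  # 197 of 216 nodes
```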

9 3DQ19: Blocking
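Blocking changes the grid scan order so that each small tile of the grid is finished before moving on to the next, which keeps recently used data in cache. A minimal Python sketch of such a blocked traversal over a 2D index space follows; the 3DQ19 grid is 3D, and the block sizes `bx` and `by` are illustrative parameters, not values from the project. In Python this only demonstrates the visiting order; the cache benefit appears in compiled code with the same loop nest.

```python
def blocked_scan(nx, ny, bx, by):
    """Yield (i, j) grid indices tile by tile instead of row by row.

    Each bx-by-by tile is completed before the next one starts; tiles at
    the edges are clipped when nx or ny is not a multiple of the block size.
    """
    for i0 in range(0, nx, bx):
        for j0 in range(0, ny, by):
            for i in range(i0, min(i0 + bx, nx)):
                for j in range(j0, min(j0 + by, ny)):
                    yield i, j

# Every point is still visited exactly once; only the order changes.
order = list(blocked_scan(4, 4, 2, 2))
print(order[:4])  # the first 2x2 tile
```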

10 3DQ19: Results on three parallel systems
                       Athlon 1800      IBM SC       AMD64
everything1to2() (s)          18.8       19.48       10.06
everything2to1() (s)         19.34       18.78       10.52
send_west() (s)                8.4        0.68        1.96
send_east() (s)               8.31        1.17        3.14
Total time (s)               55.15       40.28       25.76
Time gained (s)              27.48       14.13       14.76
Speed up (%)                    33          26          36
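Reading "Time gained" as the original run time minus the optimized "Total time", the "Speed up (%)" row matches the time gained divided by the original run time. The short Python sketch below reproduces those percentages from the table and, for comparison, the conventional speed-up ratio (original time divided by new time), which comes out at roughly 1.50, 1.35 and 1.57.

```python
def savings(total, gained):
    """Return (original run time, % of original time saved, speed-up ratio)."""
    original = total + gained  # assumed: run time before optimization
    saved_pct = 100.0 * gained / original
    ratio = original / total   # conventional speed-up factor
    return original, saved_pct, ratio

# (total time after optimization, time gained) in seconds, from the table.
results = {
    "Athlon 1800": (55.15, 27.48),
    "IBM SC": (40.28, 14.13),
    "AMD64": (25.76, 14.76),
}

for system, (total, gained) in results.items():
    original, saved_pct, ratio = savings(total, gained)
    print(f"{system}: original {original:.2f} s, saved {saved_pct:.0f} %, "
          f"speed-up {ratio:.2f}x")
```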

11 2nd case study: Covana, protein covariance analysis
Institute of Medical Technology, University of Tampere / Mauno Vihinen, Bairong Chen; ÅA / André Norrgård
Biological background:
– physico-chemical groups of amino acids
– protein function from structure:
    pair and triple correlations between amino acids
– web server for covariance analysis

12 Covana: Protein covariance analysis
Protein sequences: calculate correlations between columns of amino acids
Typical size: 50-150 sequences (rows), 300-1500 amino acids per sequence (columns)
Example (one aligned sequence):
>Q9XW32_CAEEL/9-307
IDVTKPTFLLTFYSIHGTFALVFNILGIFLIMK-NPKIVKMYKGFMINMQ-ILSLLADAQ
TTLLMQPVYILPIIGGYTNGLLWQVFR----LSSHIQMAMF---LLLLY---------LQ
VASIVCAIVTKYHVVSNIGKLSDRSI-LFWIF---VIVYHGCAFVITGFFSVS-CLARQ-
-EEENLIK------T-KFPNAISVFTLEN--VAIYDLQVN---KWMMITTILFAFMLTSS
IVISFY--FSVRLLKTLPSKRNTISARSFRGHQIAVTSLM-AQAT-VPFLVL---IIP--
IGTIVYLFVHVLP------NAQ-----EISNIMMAV--YSFHASLST---FVMIISTPQY
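The slides do not say which correlation statistic Covana computes, so the following is only a hypothetical Python sketch of the basic counting step behind a column-pair analysis: given aligned sequences of equal length, count which amino-acid pairs occur together in two columns, ignoring gap characters (`-`). The function names and the toy alignment are made up for this example.

```python
from collections import Counter

def column(alignment, j):
    """Residues in column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in alignment]

def pair_counts(alignment, j, k, gap="-"):
    """Count residue pairs (a, b) seen together in columns j and k.

    Rows with a gap in either column are skipped (an assumption of this sketch).
    """
    return Counter(
        (a, b) for a, b in zip(column(alignment, j), column(alignment, k))
        if a != gap and b != gap)

aln = ["IDVT", "IDLT", "VEVS", "I-VT"]  # toy alignment, not real data
print(pair_counts(aln, 0, 2))
```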

13 Covana: Code optimization
– efficient data structures: dynamic memory allocation
– efficient generic algorithms: sorting
– avoid recalculations
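Avoiding recalculation matters at this scale: with 1500 columns there are 1 124 250 column pairs and about 561 million column triples, so any work repeated inside the pair loop is repeated over a million times. Below is a hypothetical Python sketch of the pattern, not Covana's code: transpose the alignment into columns once, reuse them for every pair, and store the results in a dictionary that grows as needed.

```python
from collections import Counter
from itertools import combinations
from math import comb

def all_pair_counts(alignment, gap="-"):
    """Pair counts for every column pair, extracting each column only once."""
    # Transpose once: columns[j] is the string of residues in column j.
    columns = ["".join(col) for col in zip(*alignment)]
    counts = {}
    for j, k in combinations(range(len(columns)), 2):
        counts[(j, k)] = Counter(
            (a, b) for a, b in zip(columns[j], columns[k])
            if a != gap and b != gap)
    return counts

# Work grows quickly with alignment width (1500 columns, as on the slide).
n_pairs = comb(1500, 2)    # column pairs
n_triples = comb(1500, 3)  # column triples

aln = ["IDVT", "IDLT", "VEVS", "I-VT"]  # toy alignment, not real data
result = all_pair_counts(aln)
print(result[(0, 2)].most_common())
```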

14 Covana: Run time

15 Covana: Results
– Run time: original 227.8 s, final version 2.0 s; improvement: 112 times faster
– Memory usage: original 3250 MB, final version 37 MB; improvement: 88 times less
– Disk space usage: original 277 MB, final version 21 MB; improvement: 13 times less

16 3rd case study: ELMFIRE, tokamak fusion reactor simulation
Jukka Heikkinen, Salomon Janhunen, Timo Kiviniemi / Advanced Energy Systems / HUT; ÅA / Artur Signell
Physical background:
– particle simulation with gyrokinetically averaged Larmor orbits
– turbulence and plasma modes

17 Elmfire: Tokamak fusion reactor simulation
Goal 1: computer platform independence
– replacing proprietary library routines for random number generation with open-source routines
– replacing proprietary library routines for the distributed solution of sparse linear systems with open-source library routines
Goal 2: scalability
– Elmfire ran on at most 8 processors
– new data structures for sparse matrices were designed that make element updates efficient
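The slides do not describe Elmfire's new sparse-matrix structures. As a hedged illustration of the general problem, here is a Python sketch of one common approach: keep one dictionary per row while elements are being updated, so adding to an existing or new entry is cheap, and convert to compressed sparse row (CSR) arrays only when a solver needs them. Inserting a new nonzero directly into CSR arrays can require shifting all later entries. The class `SparseRows` and its methods are made up for this example.

```python
class SparseRows:
    """Hypothetical sparse matrix stored as one {column: value} dict per row."""

    def __init__(self, nrows):
        self.rows = [{} for _ in range(nrows)]

    def add(self, i, j, value):
        """Accumulate value into element (i, j): an average O(1) dict update."""
        row = self.rows[i]
        row[j] = row.get(j, 0.0) + value

    def to_csr(self):
        """Convert to CSR arrays (indptr, indices, data), columns sorted per row."""
        indptr, indices, data = [0], [], []
        for row in self.rows:
            for j in sorted(row):
                indices.append(j)
                data.append(row[j])
            indptr.append(len(indices))
        return indptr, indices, data

    def matvec(self, x):
        """Matrix-vector product y = A x."""
        return [sum(v * x[j] for j, v in row.items()) for row in self.rows]

m = SparseRows(3)
m.add(0, 0, 2.0)
m.add(0, 2, 1.0)
m.add(2, 1, -1.0)
m.add(0, 0, 0.5)  # update an existing element in place
print(m.to_csr())
print(m.matvec([1.0, 2.0, 3.0]))
```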

18 Elmfire


20 Conclusions
Software can be improved!
– by taking modern microprocessor architecture into account: cache utilization, pipelining
– by using well-established computer science methods

21 Conclusions
In 1 case out of 3, a clear impact on run time was made
In 2 cases out of 3, previously intractable results can now be obtained
Are these three cases representative of the code running on CSC machines?
– the next two cases are under study!

22 What have we learnt?
Computer scientists with minimal prior knowledge of, e.g., the physical sciences can contribute to HPC
Are supercomputers needed to the extent they are used today at CSC?
Interprocess communication is often a bottleneck
– parallel computing with 1000 processors may become routine in the future for certain types of problems
Who should do the coding?
– should code for production use (intensive cycles of use, maintainability) be outsourced?

23 Co-workers: Mats Aspnäs, Ph.D.; Anders Gustafsson, M.Sc.; Artur Signell, M.Sc.; André Norrgård
THANK YOU!

