Introduction to Research 2007
Ashok Srinivasan, Florida State University, www.cs.fsu.edu/~asriniva


1 Introduction to Research 2007
Ashok Srinivasan, Florida State University, www.cs.fsu.edu/~asriniva
Recent collaborators
- V. Aggarwal, J. Kolhe, L. Ji, M. Mascagni, H. Nymeyer, and Y. Yu (Florida State University)
- S. Kapoor (IBM Austin)
- S. Namilae (Oak Ridge National Lab)
- M. Krishna, A. Kumar, N. Jayam, G. Senthilkumar, P. K. Baruah, and R. Sharma (Sri Sathya Sai University, India)
- N. Chandra (University of Nebraska at Lincoln)
Research support
- Funding: DoD, FSU, NSF
- Computer time: IBM, NCSA, NERSC, ORNL

2 Outline
- Research Areas
  - Computational Nanotechnology
  - Computational Biology
  - High Performance Computing on Multicore Processors
- Potential Research Topics
- Graduate Courses

3 Research Areas
- High Performance Computing, Applications in Computational Sciences, Scalable Algorithms, Mathematical Software
  - Current topics: Computational Nanotechnology, Computational Biology, HPC on Multicore Processors
  - New topics: Dynamic Data Driven Applications
  - Old topics: Computational Finance, Parallel Random Number Generation, Monte Carlo Linear Algebra, Computational Fluid Dynamics, Image Compression

4 Importance of Parallel Computing
- Makes feasible products based on a more fundamental understanding of science
  - Examples: nanotechnology, medicine
- Increasing relevance to industry
  - In 1993, fewer than 30% of the Top 500 supercomputers were commercial
  - Now, over 50% are commercial
  - Finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, shoes! (Nike), potato chips! toys!

5 Architectural Trends
- Massive parallelism
  - 10K-processor systems will be commonplace
  - The large end already has over 100K processors
- Single-chip multiprocessing
  - All processors will be multicore
  - Heterogeneous multicore processors: the Cell used in the PS3, an 80-core processor from Intel
  - Processors with hundreds of cores are already commercially available
- Distributed environments, such as the Grid
- But it is hard to get good performance on these systems

6 Computational Nanotechnology
- Example application: carbon nanotube (CNT)
  - Can span 23,000 miles without failing due to its own weight
  - 100 times stronger than steel
  - Lighter than a feather
  - Conducts heat better than diamond
- Computations are used to understand materials at the atomic scale, so that better materials can be designed
  - Easier than experimentation at the nanometer scale

7 CNT Tensile Test
- Pull the CNT at constant speed
  - Determine material properties from the force-displacement response
- Computational difficulties
  - Time step size ~10^-15 s
  - The desired time range is much larger
  - A million time steps are required to reach 10^-9 s: ~500 hours of computing for ~40K atoms using GROMACS
- MD uses unrealistically large pulling speeds
  - 1 to 10 m/s instead of 10^-7 to 10^-5 m/s
  - Results at unrealistic speeds are unrealistic!
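The time-scale problem above can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's own numbers (1 fs steps, 10^6 steps for 10^-9 s, ~500 hours for that run); the function names are hypothetical, and the cost estimate assumes compute time grows linearly with the number of steps for the same system size.

```python
"""Time-scale arithmetic for the CNT tensile test, using the slide's numbers.

Assumption: compute cost grows linearly with the number of MD steps.
"""

DT = 1e-15             # MD time step in seconds (~1 fs)
STEPS_REF = 1_000_000  # steps needed to reach 1e-9 s
HOURS_REF = 500.0      # ~500 hours for ~40K atoms with GROMACS (from the slide)


def steps_needed(t_sim: float, dt: float = DT) -> int:
    """Number of MD steps needed to cover t_sim seconds."""
    return round(t_sim / dt)


def time_to_pull(displacement_m: float, speed_m_per_s: float) -> float:
    """Simulated time (s) to move the pulled end by displacement_m at constant speed."""
    return displacement_m / speed_m_per_s


def cost_hours(steps: int) -> float:
    """Estimated compute hours, scaled linearly from the 10^6-step reference run."""
    return HOURS_REF * steps / STEPS_REF


if __name__ == "__main__":
    print(steps_needed(1e-9))  # 1000000
    # Pulling the end by 1 nm: a typical MD speed vs. the fastest realistic speed.
    for v in (1.0, 1e-5):
        t = time_to_pull(1e-9, v)
        n = steps_needed(t)
        print(f"v = {v:g} m/s: {t:.0e} s, {n:.0e} steps, ~{cost_hours(n):.0e} h")
```

At 1 m/s a 1 nm pull fits in the 10^6-step, ~500-hour run; at 10^-5 m/s the same pull needs about 10^11 steps, which is why MD resorts to the unrealistically large speeds listed above.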

8 Difficulty with Parallelization
- Results on scalable code
  - Does not scale efficiently beyond ~10 ms per iteration
- If we want to simulate to 1 ms
  - Time step 1 fs ⇒ 10^12 iterations ⇒ 10^10 s ≈ 300 years
- If we scaled to 10 µs per iteration
  - 4 months of computing time
- Results from: NAMD, 327K-atom ATPase, PME, Blue Gene (IPDPS 2006); NAMD, 92K-atom ApoA1, PME, Blue Gene (IPDPS 2006); IBM Blue Matter, 43K-atom Rhodopsin, Blue Gene (Tech Report 2005); Desmond, 92K-atom ApoA1 (SC 2006)
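The estimate above follows from multiplying the iteration count by the time per iteration. A minimal sketch of that arithmetic (the helper names are hypothetical; a year is taken as 365.25 days):

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def iterations(t_sim: float, dt: float) -> int:
    """Number of time steps needed to simulate t_sim seconds with step dt."""
    return round(t_sim / dt)


def wall_clock_seconds(n_iter: int, sec_per_iter: float) -> float:
    """Total wall-clock time if every iteration costs sec_per_iter seconds."""
    return n_iter * sec_per_iter


if __name__ == "__main__":
    n = iterations(1e-3, 1e-15)          # 1 ms simulated at 1 fs per step
    slow = wall_clock_seconds(n, 10e-3)  # 10 ms per iteration
    fast = wall_clock_seconds(n, 10e-6)  # 10 µs per iteration
    print(n, slow / SECONDS_PER_YEAR, fast / SECONDS_PER_DAY)
```

This gives about 317 years at 10 ms per iteration and about 116 days (roughly 3.8 months) at 10 µs per iteration, matching the slide's "≈ 300 years" and "4 months".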

9 Data Driven Time Parallelization
- Each processor simulates a different time interval
- The initial state is obtained by prediction, using prior data (except for processor 0)
- Verify whether the prediction for the end state is close to that computed by MD
- Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results
- If the time interval is sufficiently large, then the communication overhead is small
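The scheme above can be illustrated with a toy serial emulation. This is a hedged sketch, not the authors' implementation: a simple decay equation dx/dt = -x stands in for MD, `predict` is a hypothetical cheap predictor with a deliberate relative bias (standing in for the database-driven model), and the "processors" run one after another. Processor k is accepted only if its predicted start state matches the MD end state of processor k-1 within a relative tolerance `tol`; the next round restarts from the last verified state.

```python
import math


def md_step(x: float, dt: float) -> float:
    """Toy stand-in for one MD step: exact update for dx/dt = -x."""
    return x * math.exp(-dt)


def simulate(x0: float, interval: float, dt: float) -> float:
    """'Expensive' simulation of one time interval, step by step."""
    x = x0
    for _ in range(round(interval / dt)):
        x = md_step(x, dt)
    return x


def predict(x0: float, interval: float, bias: float = 1e-4) -> float:
    """Hypothetical cheap predictor of the end state (slightly biased on purpose)."""
    return x0 * math.exp(-interval) * (1.0 + bias)


def time_parallel(x0, n_intervals, n_procs, interval, dt, tol):
    """Emulate data-driven time parallelization; return (final state, rounds)."""
    x, done, rounds = x0, 0, 0
    while done < n_intervals:
        p = min(n_procs, n_intervals - done)
        # Processor 0 starts from the verified state; the others from predictions.
        starts = [x]
        for _ in range(p - 1):
            starts.append(predict(starts[-1], interval))
        # Each processor simulates its own interval (serially in this sketch).
        ends = [simulate(s, interval, dt) for s in starts]
        # Verify: accept processors while each predicted start matches the
        # computed end state of the previous processor.
        accepted = 1
        while accepted < p and abs(starts[accepted] - ends[accepted - 1]) <= tol * abs(ends[accepted - 1]):
            accepted += 1
        x = ends[accepted - 1]
        done += accepted
        rounds += 1
    return x, rounds
```

With 10 intervals on 4 emulated processors, a tolerance looser than the predictor's error (`tol=1e-3`) finishes in 3 rounds, while a stricter one (`tol=1e-6`) rejects every prediction and falls back to 10 sequential rounds, which shows how predictor accuracy bounds the achievable speedup.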

10 Results
- Speedup (figure): red line, ideal speedup; blue, v = 0.1 m/s; green, a different predictor
- Experimental parameters: v = 1 m/s, using v = 10 m/s; CNT with 1000 atoms; Xeon/Myrinet cluster
- Validation (figure): compare the stress-strain response; blue, exact results; red, time-parallel results; green, direct prediction

11 Computational Biology
- Data driven time parallelization in the AFM (atomic force microscopy) simulation of proteins
  - An order-of-magnitude improvement in performance by combining conventional and data driven time parallelization, with the protein Titin

12 High Performance Computing on Multicore Processors: Cell Architecture
- A PowerPC core (PPE), with 8 co-processors (SPEs), each with a 256 KB local store
- Shared 512 MB to 2 GB main memory, which the SPEs access by DMA
- Peak speeds of 204.8 Gflops in single precision and 14.64 Gflops in double precision for the SPEs
- 204.8 GB/s EIB bandwidth; 25.6 GB/s for memory
- Two Cell processors can be combined to form a Cell blade with global shared memory
- Figures: DMA put times; memory-to-memory copy using the SPE local store vs. memcpy by the PPE
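The single-precision peak and the memory bandwidth above together say how much work each byte from main memory must support. A back-of-the-envelope sketch, assuming (these are not on the slide) a 3.2 GHz clock, 8 single-precision flops per SPE per cycle (4-wide SIMD fused multiply-add), and a 16 KB maximum size for a single DMA transfer; the helper names are hypothetical:

```python
# Assumptions (not from the slide): 3.2 GHz clock, 8 SP flops per SPE cycle,
# and a 16 KB limit per DMA transfer.
N_SPE = 8
CLOCK_GHZ = 3.2
SP_FLOPS_PER_CYCLE = 8
MEM_BW_GBS = 25.6          # main-memory bandwidth from the slide
MAX_DMA_BYTES = 16 * 1024  # assumed per-transfer DMA limit


def peak_gflops(n_spe=N_SPE, clock_ghz=CLOCK_GHZ, flops_per_cycle=SP_FLOPS_PER_CYCLE):
    """Aggregate single-precision peak of the SPEs in Gflops."""
    return n_spe * clock_ghz * flops_per_cycle


def flops_per_byte_to_stay_busy(gflops, mem_bw_gbs=MEM_BW_GBS):
    """Arithmetic intensity needed so main-memory bandwidth is not the bottleneck."""
    return gflops / mem_bw_gbs


def dma_chunks(nbytes, max_dma=MAX_DMA_BYTES):
    """Split a transfer of nbytes into DMA-sized pieces."""
    full, rest = divmod(nbytes, max_dma)
    return [max_dma] * full + ([rest] if rest else [])
```

`peak_gflops()` reproduces the slide's 204.8 Gflops, and dividing by 25.6 GB/s gives about 8 flops per byte, so code that streams data through the 256 KB local stores must reuse each loaded byte heavily to approach peak. `dma_chunks(40000)` splits a 40,000-byte copy into pieces of 16384, 16384, and 7232 bytes.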

13 Cell MPI Results
- Barrier algorithms
  - PE (pairwise exchange): consider the SPUs to be a logical hypercube; in each step, each SPU exchanges messages with its neighbor along one dimension
  - DIS (dissemination): in step i, SPU j sends to SPU j + 2^i and receives from SPU j - 2^i (modulo the number of SPUs)
- Comparison of MPI_Barrier latency (µs) on different hardware
  - P = 8: Cell (PE) 0.4, Xeon/Myrinet 10, NEC SX-8 13, SGI Altix BX2 3
  - P = 16: Cell (PE) 1.0, with 14 and 5 reported for two of the other platforms
- Figures: MPI_Barrier timing; broadcast bandwidth
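Both barrier schemes above finish once every SPU has, directly or indirectly, heard from every other SPU. A small sketch that counts the steps each scheme needs by tracking who has heard from whom; it models only the communication pattern, not the Cell DMA or signaling details, and the function names are hypothetical:

```python
from typing import Callable, List, Set


def pairwise_exchange_partner(p: int, step: int, j: int) -> int:
    """PE: neighbor along hypercube dimension `step` (p must be a power of two)."""
    return j ^ (1 << step)


def dissemination_target(p: int, step: int, j: int) -> int:
    """DIS: in step i, process j sends to (j + 2**i) mod p."""
    return (j + (1 << step)) % p


def barrier_steps(p: int, send_to: Callable[[int, int, int], int]) -> int:
    """Count steps until every process has (transitively) heard from all others."""
    if send_to is pairwise_exchange_partner and p & (p - 1):
        raise ValueError("pairwise exchange needs a power-of-two process count")
    heard: List[Set[int]] = [{j} for j in range(p)]
    steps = 0
    while any(len(h) < p for h in heard):
        nxt = [set(h) for h in heard]
        for j in range(p):
            # j's message carries everything j has heard so far.
            nxt[send_to(p, steps, j)] |= heard[j]
        heard = nxt
        steps += 1
    return steps
```

For 8 SPUs both schemes need 3 steps (log2 8). Pairwise exchange is restricted here to power-of-two counts, while dissemination also handles other counts, for example 3 steps for 6 processes.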

14 Potential Research Topics
- Computational Biology
  - Data Driven Time Parallelization
  - Markov State Modeling
  - Other topics
- Dynamic Data Driven Applications
  - Combining simulations and experiments in superplastic forming
- High Performance Computing on Multicore Processors
  - Algorithms and libraries on the Cell processor, for example sorting and linear algebra
  - Good software cache/code overlaying implementations
- Other possible new directions
  - Applications in history, linguistics, medicine, etc.

15 Graduate Courses
- Parallel Computing, Spring 2008
  - MPI and OpenMP programming on traditional parallel machines
  - Threaded programming on multicore processors
  - Parallel algorithms
- Advanced Algorithms, Fall 2008
  - Approximation algorithms for NP-hard problems
  - Randomized algorithms
  - Cache-aware algorithms

