Published by Dana Harper, modified over 9 years ago
1 First-Principles Molecular Dynamics for Petascale Computers
François Gygi, Dept of Applied Science, UC Davis (fgygi@ucdavis.edu, http://eslab.ucdavis.edu)
Zhaojun Bai, Dept of Computer Science, UC Davis
Giulia Galli, Dept of Chemistry, UC Davis
Kwan-Liu Ma, Dept of Computer Science, UC Davis
Supported by NSF-ITR-HECURA 0749217
2 The Qbox project
Qbox is a C++/MPI implementation of First-Principles Molecular Dynamics (FPMD)
Qbox includes a quantum-mechanical description of electronic structure within Density Functional Theory
Applications to materials science, chemistry, and nanoscience
Software development focuses on large-scale parallelism
3 Qbox code architecture
[Diagram of the software stack: Qbox, ScaLAPACK/PBLAS, BLACS, MPI, BLAS/ATLAS, Xerces-C++ (XML parser), FFTW lib, DGEMM lib]
http://eslab.ucdavis.edu/software/qbox
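The ScaLAPACK/PBLAS and BLACS layers in this stack distribute dense matrices over a 2D process grid in a block-cyclic layout. As an illustration only (not Qbox code), the index arithmetic can be sketched in Python as below; it assumes zero-based indices and that the first block sits on process row/column 0. It performs the same computation as ScaLAPACK's INDXG2P and INDXG2L tool routines, which use one-based Fortran indexing. The function names are hypothetical.

```python
def owner_and_local(i, nb, nprocs):
    """1D block-cyclic map: global index i -> (owning process, local index).

    Blocks of size nb are dealt out round-robin over nprocs processes,
    starting on process 0 (assumption: source process = 0).
    """
    block = i // nb                # global block containing i
    owner = block % nprocs         # round-robin assignment of blocks
    local_block = block // nprocs  # number of earlier blocks on that owner
    return owner, local_block * nb + i % nb


def owner_2d(i, j, mb, nb, nprow, npcol):
    """2D block-cyclic map: rows and columns are distributed independently
    over the rows and columns of an nprow x npcol process grid."""
    prow, li = owner_and_local(i, mb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


# Hypothetical example: 2x2 blocks on a 2x2 process grid.
# Global element (5, 3) lives on process (0, 1) at local position (3, 1).
print(owner_2d(5, 3, 2, 2, 2, 2))
```

Because rows and columns are dealt out cyclically, every process holds pieces of every region of the matrix, which keeps the work balanced during factorizations and multiplications.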
4 Qbox performance results
8 k-points: 207.3 TFlop/s (56% of peak)
4 k-points: 187.7 TFlop/s (51% of peak)
1 k-point: 108.8 TFlop/s (30% of peak)
2006 ACM/IEEE Gordon Bell Award for peak performance
Electronic structure of a 1000-atom molybdenum sample, 12,000 electrons
LLNL BlueGene/L
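The three "% of peak" figures can be cross-checked against each other. Each percentage is presumably rounded to the nearest whole percent, so each run pins the machine's peak rate to a small interval, and intersecting the intervals shows whether all three runs are consistent with a single peak. A small Python check using only the numbers above (the rounding assumption is ours):

```python
# Sustained rate (TFlop/s) and reported fraction of peak, from the slide.
RUNS = {
    "8 k-points": (207.3, 0.56),
    "4 k-points": (187.7, 0.51),
    "1 k-point": (108.8, 0.30),
}


def peak_interval(rate, frac, rounding=0.005):
    """Peak rates consistent with `rate` being `frac` of peak, assuming
    `frac` was rounded to the nearest whole percent."""
    return rate / (frac + rounding), rate / (frac - rounding)


def common_peak(runs):
    """Intersect the per-run intervals; lo < hi means all runs agree."""
    lo = max(peak_interval(r, f)[0] for r, f in runs.values())
    hi = min(peak_interval(r, f)[1] for r, f in runs.values())
    return lo, hi


lo, hi = common_peak(RUNS)
print(f"common peak between {lo:.1f} and {hi:.1f} TFlop/s")
# prints: common peak between 366.9 and 368.8 TFlop/s
```

The intervals overlap, so the three efficiencies are mutually consistent with a peak of roughly 367 to 369 TFlop/s.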
5 Current Qbox availability
TeraGrid platforms
–Mercury, NCSA
–Cobalt, NCSA
–Tungsten, NCSA
–BlueGene/L, SDSC
–IBM p655, SDSC
Other platforms
–ANL BG/L
–ANL BG/P
–NERSC Franklin, Cray XT4
–NCSA Abe
6 New scalable algorithms for electronic structure calculations
One-sided Jacobi simultaneous diagonalization algorithm used in electronic structure calculations
–64-node dual-dual-core AMD Opteron/InfiniPath cluster
–1 rack ANL BlueGene/L
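The slide names the algorithm but does not describe it. For orientation, here is a minimal Python sketch of the classic one-sided (Hestenes) Jacobi method applied to a single matrix: plane rotations are applied to pairs of columns until every pair is orthogonal, and the column norms are then the singular values. The simultaneous-diagonalization variant named above works on several matrices at once and is not reproduced here. Function names and tolerances are our own choices.

```python
import math


def one_sided_jacobi(a, tol=1e-12, max_sweeps=50):
    """Hestenes one-sided Jacobi on a matrix given as a list of rows.

    Rotates pairs of columns until all columns are mutually orthogonal.
    Returns (u, v, sigma) with A @ V == U (V orthogonal) and
    sigma[k] = norm of column k of U.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # working copy of A
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                # Smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0 (stable choice).
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to columns p, q of U and of V.
                for mat, rows in ((u, m), (v, n)):
                    for i in range(rows):
                        xp, xq = mat[i][p], mat[i][q]
                        mat[i][p] = c * xp - s * xq
                        mat[i][q] = s * xp + c * xq
        if not rotated:
            break  # a full sweep with no rotation: converged
    sigma = [math.sqrt(sum(u[i][k] ** 2 for i in range(m))) for k in range(n)]
    return u, v, sigma


u, v, sigma = one_sided_jacobi([[3.0, 1.0], [1.0, 3.0]])
```

For the symmetric positive definite example, the column norms converge to 2 and 4, which are also its eigenvalues. Each rotation touches only two columns, so disjoint column pairs can be rotated concurrently. This independence is what makes Jacobi-type methods attractive on parallel machines.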
7 Qbox scalability for nanoscience applications
Electronic structure of a 2260-atom silicon nanowire
Cray XT4, up to 8k CPUs
Superlinear scaling due to cache effects and size-dependent MPI protocols
86% parallel efficiency between 2k and 8k CPUs
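Going from 2k to 8k CPUs, the ideal speedup is 4. A parallel efficiency of 86% therefore corresponds to a measured speedup of about 0.86 × 4 = 3.44, and an efficiency above 1 corresponds to the superlinear regime. A minimal Python sketch of this relative (strong-scaling) efficiency follows. The timings in the example are hypothetical, since the slide reports only the efficiency.

```python
def speedup(t_ref, t):
    """Speedup of a run taking time t relative to a reference time t_ref."""
    return t_ref / t


def parallel_efficiency(t_ref, p_ref, t, p):
    """Strong-scaling efficiency of a run on p tasks relative to a
    reference run on p_ref tasks: measured speedup / ideal speedup."""
    return speedup(t_ref, t) / (p / p_ref)


# Hypothetical timings: 100 s per step on 2048 CPUs, and the 8192-CPU
# time that would correspond to 86% efficiency (about 29.1 s).
t_2k = 100.0
t_8k = t_2k / (0.86 * 4)
print(parallel_efficiency(t_2k, 2048, t_8k, 8192))
```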
8 Qbox parallel I/O strategy
Advanced functions in MPI-IO (MPI_File_write_shared, etc.) are not supported by all file systems
Qbox instead uses a strategy based on its own shared file pointer objects
Achieves write rates of 687–814 MB/s for file sizes of 50–250 GB

platform      # tasks   write speed
Cray XT4      2048      778 MB/s
Cray XT4      4096      715 MB/s
Cray XT4      8192      687 MB/s
BG/P (ANL)    2048      814 MB/s
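One common way to get shared-file-pointer behavior without MPI_File_write_shared is as follows. Each task first computes an explicit file offset from a prefix sum of the byte counts (as MPI_Exscan would provide). All tasks then write at those offsets with an explicit-offset call such as MPI_File_write_at_all. The Python sketch below simulates only the offset bookkeeping for all ranks in a single process. It illustrates the idea and is not Qbox's actual implementation; the function name is hypothetical.

```python
from itertools import accumulate


def collective_write_offsets(base_offset, nbytes_per_rank):
    """Emulate one collective write through a shared file pointer.

    Rank k writes at base_offset + (bytes written by ranks 0..k-1), i.e.
    an exclusive prefix sum of the byte counts. The shared pointer then
    advances by the total number of bytes written by all ranks.
    """
    exclusive = list(accumulate(nbytes_per_rank, initial=0))[:-1]
    offsets = [base_offset + e for e in exclusive]
    new_base = base_offset + sum(nbytes_per_rank)
    return offsets, new_base


# Four ranks writing 10, 0, 5 and 20 bytes starting at byte 100:
# offsets are [100, 110, 110, 115] and the pointer moves to 135.
offsets, pointer = collective_write_offsets(100, [10, 0, 5, 20])
```

Because every offset is known before the write, the file system only needs to support explicit-offset writes. Collective calls also let the MPI library aggregate many small writes into large contiguous requests.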
9 Analysis of MPI message traffic patterns in Qbox
Multiple traffic patterns are involved during a Qbox simulation
–physics kernels
–3D Fourier transforms
–ScaLAPACK linear algebra
The logical-to-physical mapping of tasks has a large impact on performance on large platforms (> 4k CPUs)
We are developing instrumentation and visualization tools to analyze message traffic patterns on various interconnect architectures
[Figure: Mapping of 65536 MPI tasks on the 32x32x64 torus of the LLNL BG/L]
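A small example shows why the logical-to-physical mapping matters on the 32x32x64 torus. Suppose consecutive MPI ranks fill the x dimension first, then y, then z (an ordering assumed here for illustration). Then a hypothetical 256x256 logical process grid numbered row-major places row neighbors 1 hop apart but column neighbors 8 hops apart. The function names are our own.

```python
DIMS = (32, 32, 64)  # torus from the slide: 32 x 32 x 64 = 65536 nodes


def rank_to_xyz(rank, dims=DIMS):
    """Assumed ordering: x varies fastest, then y, then z."""
    x = rank % dims[0]
    y = (rank // dims[0]) % dims[1]
    z = rank // (dims[0] * dims[1])
    return x, y, z


def torus_hops(a, b, dims=DIMS):
    """Minimal hop count between two nodes, using wraparound links."""
    return sum(min(abs(i - j), d - abs(i - j)) for i, j, d in zip(a, b, dims))


# Hypothetical 256x256 process grid, rank = row * 256 + col:
row_neighbor = torus_hops(rank_to_xyz(0), rank_to_xyz(1))    # 1 hop
col_neighbor = torus_hops(rank_to_xyz(0), rank_to_xyz(256))  # 8 hops
```

A different rank ordering, or a mapping that folds the logical grid onto the torus, changes these distances. The traffic-analysis tools described above are designed to expose exactly this kind of effect.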
10 Analysis of MPI message traffic patterns in Qbox
[Figure: Screenshot of the message traffic visualization tool showing MPI calls in a ScaLAPACK matrix multiplication (C. Muelder, K-L. Ma, UC Davis)]
11 Qbox current developments
Deployment on TeraGrid track-2 platforms
Applications to nanoscience simulations
–G. Galli, Chemistry, UC Davis
Specialized linear algebra algorithms
–Z. Bai, Computer Science, UC Davis
Visualization
–K-L. Ma, Computer Science, UC Davis
Application-specific data compression algorithms
Large dataset management (10¹⁰–10¹² bytes)
XML standards for electronic structure data (http://www.quantum-simulation.org)
Supported by NSF-ITR-HECURA 0749217
http://eslab.ucdavis.edu