Computational Science and Engineering Petascale Initiative at LBL
PI: Alice Koniges
An ASCR-Funded ARRA Project
Status: March 16, 2011

Project Scope: Hire eight or nine post-doctoral researchers for a ~24-month period who will work on key areas supporting DOE's energy mission, including energy-related research, ARRA-funded projects, and Energy Frontier Research Centers (EFRCs).

Science Goals & Objectives
– Increase the performance, science content, and algorithms of energy-related codes running on NERSC systems
– Provide feedback to NERSC users on application acceleration techniques, new programming models, and algorithmic enhancements
– Encourage and support the use of NERSC systems for energy applications

Work Authorization Performance Goal: Delivery of computational capability to at least one EFRC by the end of the second year.
– Milestone has been achieved with results from the chemistry EFRC; research continues in this and other areas

Budget: $3.125M
– This budget funds ~9 post-doc positions and their related expenses (workstations and conference travel) for approximately 24-month term assignments.

Status:
– 9 positions filled: 7 have started working at NERSC, number 8 is due to start April 4, and number 9 in May 2011

Petascale Initiative Post-doctoral Hires and Current Projects

[Figure: hiring timeline; light green boxes denote the projected start dates of those who have accepted LBL offers but have not yet started.]

Accelerate chemistry codes in order to screen millions of structures

These codes are used for EFRC modeling, and GPU platforms are introduced. [Figure: speedup comparison of the two GPU Monte Carlo algorithms over a single CPU core.]

(1) Parallel Lennard-Jones algorithm (a sketch follows this slide)
– CUDA threads within the same block work in parallel to process the same methane-MFI system
– Total number of independent systems: (# CUDA blocks per SM) x (# of SMs)
– Utilizes fast GPU memory
– Can simulate large system sizes

(2) Embarrassingly parallel algorithm
– No direct communication between threads (implicit communication through memory access)
– Total number of independent systems: (# CUDA threads per block) x (# CUDA blocks per SM) x (# of SMs)
– Cannot utilize fast GPU memory
– Limited to small system sizes (large DRAM usage)

Postdoctoral Researcher: J. Kim
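
To make the block-parallel idea concrete, here is a minimal CUDA sketch, not the actual EFRC code: every block owns one independent Monte Carlo system, and its threads cooperate through shared memory to sum the Lennard-Jones energy of that system's trial position against the (shared) framework atoms. All names, the thread count, and the use of single precision are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 128   // power of two, required by the reduction below

__global__ void lj_trial_energy(const float4 *framework, // framework atom positions
                                int n_atoms,
                                const float4 *trial,     // one trial position per system
                                float sigma2,            // sigma^2 of the LJ pair
                                float epsilon,
                                float *energy_out)       // one energy per system
{
    __shared__ float partial[THREADS_PER_BLOCK];
    const int sys = blockIdx.x;          // one block = one independent system
    const float4 t = trial[sys];
    float e = 0.0f;

    // Threads stride over the framework atoms and accumulate pair energies.
    for (int i = threadIdx.x; i < n_atoms; i += blockDim.x) {
        float4 a = framework[i];
        float dx = t.x - a.x, dy = t.y - a.y, dz = t.z - a.z;
        float s2 = sigma2 / (dx * dx + dy * dy + dz * dz);
        float s6 = s2 * s2 * s2;
        e += 4.0f * epsilon * (s6 * s6 - s6);   // 4*eps*((s/r)^12 - (s/r)^6)
    }
    partial[threadIdx.x] = e;
    __syncthreads();

    // Shared-memory tree reduction to one energy per block/system.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        energy_out[sys] = partial[0];
}
// Launched as: lj_trial_energy<<<n_systems, THREADS_PER_BLOCK>>>(...);
```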

Advanced Programming Model Studies using Co-Array Fortran (CAF) in the GTS code

Extend the existing hybrid MPI/OpenMP communication model for better performance, and investigate the applicability of new parallel programming models in the communication-intensive part of GTS, a plasma particle-in-cell (PIC) code. A sketch of the task-based overlap pattern studied in this work follows.

Postdoctoral Researcher: R. Preissl
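
The overlap idea can be illustrated with a hedged C-style sketch, not the GTS source: one OpenMP task drives the particle-shift MPI exchange while the remaining threads execute independent computation tasks. It assumes MPI was initialized with MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...), and all function and buffer names are hypothetical.

```cuda
#include <mpi.h>
#include <omp.h>

static void push_chunk(double *p, int begin, int end) {
    for (int i = begin; i < end; ++i)
        p[i] += 0.5 * p[i];           // stand-in for the real particle push
}

void shift_and_push(double *send_buf, double *recv_buf, int n_shift,
                    int left, int right, double *particles, int n_local)
{
    const int chunk = 4096;           // illustrative task granularity
    #pragma omp parallel
    #pragma omp single
    {
        // One task drives the halo exchange of shifted particles...
        #pragma omp task
        {
            MPI_Request reqs[2];
            MPI_Irecv(recv_buf, n_shift, MPI_DOUBLE, left,  0,
                      MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(send_buf, n_shift, MPI_DOUBLE, right, 0,
                      MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        }
        // ...while the other threads pick up independent push tasks.
        for (int b = 0; b < n_local; b += chunk) {
            int e = (b + chunk < n_local) ? b + chunk : n_local;
            #pragma omp task firstprivate(b, e)
            push_chunk(particles, b, e);
        }
        #pragma omp taskwait   // communication and pushes both complete here
    }
}
```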

Performance Optimization of Porous Media Calculations for Carbon Sequestration

The goal of this work is to enhance the computational and mathematical models used to investigate CO2 storage in geological formations. A sketch of the refinement criterion behind the adaptive meshing follows.

[Figures: isosurfaces of CO2 density in the brine, highlighting the structure of the 3D fingers; block-structured adaptive mesh refinement in a carbon sequestration calculation, where the finest grids are concentrated near the edges of the fingers while coarse grids are used in the fluid, saving computational time.]

Postdoctoral Researcher: K. Fagnan
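
A minimal illustration of the block-structured AMR idea the slide describes, not the project's actual solver: cells are flagged for refinement where the local density gradient is steep (near the finger edges), so fine grids cluster there while the smooth bulk stays coarse. The function name and threshold are hypothetical.

```cuda
#include <math.h>

void tag_cells_for_refinement(const double *rho,  // density on an nx*ny patch
                              int nx, int ny,
                              double dx, double dy,
                              double grad_threshold,
                              int *tags)           // 1 = refine, 0 = keep coarse
{
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            int c = j * nx + i;
            // Centered differences approximate the density gradient.
            double gx = (rho[c + 1]  - rho[c - 1])  / (2.0 * dx);
            double gy = (rho[c + nx] - rho[c - nx]) / (2.0 * dy);
            tags[c] = (sqrt(gx * gx + gy * gy) > grad_threshold) ? 1 : 0;
        }
    }
}
```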

Alternative communication patterns to speed up the FFT-based Poisson solver in the Impact-T code

Impact-T is used for Next Generation Light Source design. The "packed" algorithm gives a 40-60% speedup over the original version with 1024 cores on Franklin. Three variants were compared (a sketch of the nonblocking pattern follows):
– Original: FFT and transpose performed serially
– Nonblocking: FFT overlapped with the transpose
– Packed: transpose messages combined

[Figure: runtime breakdown into transpose (communication) and FFT (computation) for each variant.]

Postdoctoral Researcher: B. Austin
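
A minimal sketch of the "nonblocking" variant, using the MPI-3 nonblocking collective MPI_Ialltoall for brevity (the 2011 production code would more likely have used nonblocking point-to-point messages): the transpose of slab k is kept in flight while the FFT of slab k-1 executes. The slab layout, fft_slab placeholder, and the assumption that the communicator size divides slab_len are all illustrative.

```cuda
#include <mpi.h>

// Stand-in for a local 1-D FFT over one slab of data.
static void fft_slab(double *slab, int len) {
    for (int i = 0; i < len; ++i) slab[i] *= 2.0;   // placeholder transform
}

void pipelined_transpose_fft(double *slabs, double *recv,
                             int n_slabs, int slab_len, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);
    int cnt = slab_len / nprocs;      // assumes nprocs divides slab_len
    MPI_Request req = MPI_REQUEST_NULL;

    for (int k = 0; k < n_slabs; ++k) {
        // Launch the transpose of slab k without blocking.
        MPI_Request next;
        MPI_Ialltoall(&slabs[k * slab_len], cnt, MPI_DOUBLE,
                      &recv[k * slab_len],  cnt, MPI_DOUBLE, comm, &next);

        // Overlap: while slab k is in flight, transform the previous slab.
        if (k > 0) {
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            fft_slab(&recv[(k - 1) * slab_len], slab_len);
        }
        req = next;
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    fft_slab(&recv[(n_slabs - 1) * slab_len], slab_len);
}
```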

Biological Imaging with the Advanced Light Source

The goal of this work is to advance the analytical tools available to users of the ALS, including the development of real-time reconstruction software for the upcoming Nanosurveyor instrument at its beamline.

Status:
– Matlab implementation completed
– Serial GPU implementation completed
– Proof of concept for the parallel GPU implementation tested

Postdoctoral Researcher: F. Maia

Performance Characterization of Magnetic Fusion Applications with Implications for Fusion Co-design

Characterize application codes for performance on current (petascale) and future (exascale) platforms using NERSC and Cray tools. A sketch of the timing harness behind such measurements follows.
– Codes: GTS, GYRO/NEO, BOUT++, NIMROD, VORPAL, M3D-C1
– Performance metrics: scalability, communication, memory and network bandwidth

[Figure panels: GTS, weak scaling; GYRO, weak scaling; BOUT++, strong scaling. Scaling results from the NERSC Franklin and Hopper machines show scalability and the role of communication overhead at large concurrency for sample codes.]

Postdoctoral Researcher: P. Narayanan
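
The kind of measurement behind these scaling plots can be sketched with a minimal MPI timing harness; everything here is illustrative (do_work stands in for the application kernel) and this is not the NERSC or Cray tooling itself. Reporting the maximum time across ranks reflects that the slowest rank bounds scalability.

```cuda
#include <mpi.h>
#include <stdio.h>

// Placeholder workload standing in for the application kernel.
static void do_work(int rank) {
    volatile double s = 0.0;
    for (long i = 0; i < 100000000L; ++i) s += i * 1e-9;
    (void)rank;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);          // synchronize before timing
    double t0 = MPI_Wtime();
    do_work(rank);
    double elapsed = MPI_Wtime() - t0;

    double t_max;                          // slowest rank bounds the run
    MPI_Reduce(&elapsed, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks: %.3f s (max over ranks)\n", nprocs, t_max);

    MPI_Finalize();
    return 0;
}
```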

Improving Computational Algorithms and Performance in Warm Dense Matter Modeling

The goal of this work is to improve the accuracy of the NDCX-II modeling codes. In particular, we incorporate surface tension effects into the ALE-AMR (Arbitrary Lagrangian-Eulerian with Adaptive Mesh Refinement) model. NDCX-II is a warm dense matter experiment at LBL with ARRA funding. A generic sketch of one way to compute a surface-tension force follows.

[Figures: current modeling of the NDCX-II project without a proper surface tension effect, in which the material breaks up into pieces instead of small droplets; a standalone sample of droplet breakup with the new surface tension model in an expanding flow.]

Postdoctoral Researcher: W. Liu
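
Surface tension can be modeled in several ways; the following is a generic continuum-surface-force (CSF) style sketch, not the ALE-AMR implementation. The force density follows F = sigma * kappa * grad(c), with the curvature kappa estimated from an interface indicator field c; the discretization, sign convention, and all names are assumptions for illustration.

```cuda
#include <math.h>

// c: interface indicator (e.g., volume fraction) on an nx*ny grid, spacing h.
// fx, fy: output force density, F = sigma * kappa * grad(c) (CSF form).
void csf_surface_tension(const double *c, int nx, int ny, double h,
                         double sigma, double *fx, double *fy)
{
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            int id = j * nx + i;
            // First and second derivatives by centered differences.
            double cx  = (c[id + 1]  - c[id - 1])  / (2 * h);
            double cy  = (c[id + nx] - c[id - nx]) / (2 * h);
            double cxx = (c[id + 1]  - 2 * c[id] + c[id - 1])  / (h * h);
            double cyy = (c[id + nx] - 2 * c[id] + c[id - nx]) / (h * h);
            double cxy = (c[id + nx + 1] - c[id + nx - 1]
                        - c[id - nx + 1] + c[id - nx - 1]) / (4 * h * h);
            double g2  = cx * cx + cy * cy;
            // Curvature of the level sets of c; guard flat regions.
            double kappa = (g2 > 1e-12)
                ? (cxx * cy * cy - 2 * cx * cy * cxy + cyy * cx * cx)
                  / pow(g2, 1.5)
                : 0.0;
            fx[id] = sigma * kappa * cx;
            fy[id] = sigma * kappa * cy;
        }
    }
}
```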

Additional Contributions and Achievements

– Two post-docs (Kim and Preissl) were co-authors and significant contributors to the paper "Application Acceleration on Current and Future Cray Platforms," winner of BEST PAPER at CUG 2010, Edinburgh, May 2010
– Two first-author post-doc paper abstracts (Fagnan and Narayanan) were accepted for CUG 2011
– Other papers submitted and/or published include:
  – "GPU Monte Carlo Algorithms for Molecules within a Microporous Framework," J. Kim, J. Rodgers, and B. Smit, submitted to the Caches 2011 workshop
  – "Acceleration of RI-MP2 Gradient Routine Using Graphics Processing Units," J. Kim and M. Head-Gordon, submitted to ICCS
  – "Exploitation of Dynamic Communication Patterns through Static Analysis," R. Preissl, B. de Supinski, M. Schulz, D. Quinlan, D. Kranzlmüller, and T. Panas, ICPP
  – "Overlapping Communication with Computation using OpenMP tasks on the GTS magnetic fusion code," R. Preissl, A. Koniges, S. Ethier, W. Wang, and N. Wichmann, Journal of Scientific Programming
  – "Compressive auto-indexing in femtosecond nanocrystallography," F. R. N. C. Maia, C. Yang, and S. Marchesini, Ultramicroscopy, November
  – "Compressive phase contrast tomography," F. Maia, A. MacDowell, S. Marchesini, H. A. Padmore, D. Y. Parkinson, A. Schirotzek, C. Yang, and J. Pien, Proc. SPIE 7800, 2010
– More than 10 talks and conference poster presentations were given
– Two post-docs (Kim and Maia) served as teaching assistants for the NERSC-sponsored user course on GPUs in August
– One post-doc (Preissl) served as a teaching assistant for an SC10 tutorial