MESQUITE: Mesh Optimization Toolkit Brian Miller, LLNL

A) Project Overview
Science goal: algorithms for improving unstructured mesh quality, achieved through optimization techniques.
– Provide a library of high-quality mesh optimization tools (Mesquite) to simulation code projects.
Team: Pat Knupp (SNL), project PI
– Brian Miller, Lori Diachin (LLNL)
– Carl Ollivier-Gooch (UBC)
Long history of support through the DOE Office of Science.
– Several successful collaborations with both SciDAC and ASC code groups.
Goals in the CScADS context: apply threaded parallelism to the Mesquite optimization solvers.
– Evolve algorithms and software to take advantage of current and emerging hardware and software capabilities (multicore, many-core, etc.).

B) Science Lesson
Mesquite poses unstructured mesh quality improvement as an optimization problem.
– Element quality: driven by the ideal element as defined by the user.
– Mesh quality objective function: how local element qualities are summed into the global objective function; again, user defined.
– Optimization problem: min F(x) (sketched below).
The optimization problems are solved with included solvers ranging from simple steepest descent to the more sophisticated Feasible Newton and Active Set solvers; again, the user chooses the solver method.
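In schematic form (an illustrative template, not a fixed Mesquite formula; the element metric q_e and the way local qualities are combined are user choices):

    \min_{x} \; F(x) = \sum_{e \in \mathcal{M}} q_e(x)

where x collects the free vertex coordinates, \mathcal{M} is the set of mesh elements, and q_e(x) measures how far element e deviates from the user's ideal element.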

C/D) Methods and Programming Model
Pretty basic C++; no third-party libraries except for unit testing (cppunit).
MPI parallelism, mostly low-volume nearest-neighbor communication. No threaded parallelism currently; we intend to change this.
Fairly portable code, including recent runs on LLNL's Dawn BG/P machine.
Optimization solvers are included in the code; there is no interface to external optimization libraries.
Designed to the TSTTM mesh query interface, whose use we have demonstrated in several code integrations.
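As a rough picture of the solver kernels (a minimal serial sketch under assumed names; Mesquite's actual classes and the TSTTM interface differ), a steepest-descent pass moves each free vertex against the gradient of the objective:

    #include <cstddef>
    #include <functional>
    #include <vector>

    // Minimal sketch of a steepest-descent smoothing pass. All names here are
    // hypothetical and do not reflect Mesquite's real interfaces.
    struct Vec3 { double x = 0, y = 0, z = 0; };

    void steepestDescentPass(std::vector<Vec3>& coords,
                             const std::vector<bool>& isFree,
                             const std::function<Vec3(std::size_t)>& gradF,
                             double stepSize)
    {
        for (std::size_t v = 0; v < coords.size(); ++v) {
            if (!isFree[v]) continue;       // fixed/boundary vertices stay put
            const Vec3 g = gradF(v);        // dF/dx at vertex v
            coords[v].x -= stepSize * g.x;  // move against the gradient
            coords[v].y -= stepSize * g.y;
            coords[v].z -= stepSize * g.z;
        }
    }

In the MPI setting, a pass like this would be followed by a nearest-neighbor exchange of updated coordinates for vertices on partition boundaries.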

E) I/O and Viz
I/O:
– Not really applicable, since Mesquite is intended for use within an existing code framework.
– For standalone use and testing we typically read/write one file per MPI task.
Viz:
– VisIt or ParaView for viewing parallel mesh files.
– Optional Gnuplot output of convergence histories (see the sketch below).
Analysis:
– Internal mechanism for mesh quality calculations.
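The Gnuplot path needs nothing more than columnar text; a minimal sketch of that pattern (the file name and layout are assumptions, not Mesquite's exact output):

    #include <cstddef>
    #include <fstream>
    #include <vector>

    // Emit (iteration, objective) pairs in the two-column format Gnuplot
    // reads directly, e.g.:  plot "convergence.dat" using 1:2 with lines
    void writeConvergenceHistory(const std::vector<double>& objective)
    {
        std::ofstream out("convergence.dat");
        for (std::size_t i = 0; i < objective.size(); ++i)
            out << i << ' ' << objective[i] << '\n';
    }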

G/H) Tools and Performance
What tools do you use?
– TAU, Open|SpeedShop, and Intel tools for performance analysis and thread checks.
– TotalView and Valgrind for debugging.
– Some internal debugging output is available.
What do you believe is your current bottleneck to better performance?
– Serial performance is sub-optimal. A route is needed from the generic algorithms provided in Mesquite to tight, high-performance loops (illustrated below).
What do you believe is your current bottleneck to better scaling?
– Scaling hasn't been a problem (yet).
What features would you like to see in performance tools?
– Better derived hardware metrics and more sophisticated analysis.
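One common shape of that serial-performance problem (an illustrative contrast, not Mesquite code): a virtual metric call per element defeats inlining and vectorization, while batching contiguous element data exposes a tight inner loop.

    #include <cstddef>
    #include <vector>

    // Generic style: one virtual call per element blocks inlining/vectorization.
    struct QualityMetric {
        virtual ~QualityMetric() = default;
        virtual double evaluate(const double* elemCoords) const = 0;
    };

    // Batched style: the metric runs over contiguous element data, giving the
    // compiler a vectorizable inner loop. The sum-of-squares "metric" below is
    // a placeholder, not a real quality measure.
    void evaluateBatch(const std::vector<double>& elemData, std::size_t stride,
                       std::vector<double>& quality)
    {
        for (std::size_t e = 0; e < quality.size(); ++e) {
            const double* c = &elemData[e * stride];
            double q = 0.0;
            for (std::size_t k = 0; k < stride; ++k)
                q += c[k] * c[k];
            quality[e] = q;
        }
    }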

I) Status and Scalability
Goal in one year: a similar scaling graph, but with threads added.
Top pains:
– Must add threading to existing code; not enough resources to rewrite.
– Require a portable threading model.
– How to inherit the host simulation's threading model.

J) Roadmap
Where will your science take you over the next 2 years?
– Desire to support runs on significantly larger systems (Sequoia).
What do you hope to learn / discover?
– The extent of MPI scalability.
– The effect of adding threading on MPI scalability.
What improvements will you need to make?
– New threaded global solver algorithms.
– Gradual evolution to a threaded implementation.
What are your plans?
– OpenMP threading in limited regions of the code, initially in specific algorithms with good available parallelism (see the sketch below).
– Extending threads to other areas may require algorithmic changes.
– Explore other threading models: OpenCL, OpenACC, CUDA, etc.
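A minimal sketch of that limited-region OpenMP style (hypothetical names; a reduction over independent per-vertex work is the easy case, while threading actual vertex moves would need coloring or partitioning to avoid races on shared elements):

    #include <cstddef>

    // Placeholder per-vertex quality; a real version would sum the qualities
    // of the elements adjacent to vertex v.
    double evaluateLocalQuality(std::size_t v)
    {
        return static_cast<double>(v % 7);
    }

    // Limited-region OpenMP: thread one independent loop via a reduction.
    double totalQuality(std::size_t nVertices)
    {
        double total = 0.0;
        #pragma omp parallel for reduction(+:total) schedule(static)
        for (std::size_t v = 0; v < nVertices; ++v)
            total += evaluateLocalQuality(v);
        return total;
    }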