ExaFMM -- An open source fast multipole method library aimed at Exascale systems
Rio Yokota (KAUST), L. A. Barba (BU)

Features of ExaFMM

For application scientists:
- Easy to use -- simple interface, tutorials, support
- Many short examples

For algorithm developers:
- Flexible framework for further experimentation
- Many alternative modules, controllable parameters

For hackers:
- Detailed comments in source code
- Many unit tests, regression tests

Current features

1. Auto-tuning for heterogeneous architectures
2. Optimized kernels for both high and low accuracy
3. Periodic boundary conditions
4. Recursive multisection partitioning / load balancing
5. Hierarchical MPI communication with overlapping

1. Auto-tuning
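The body of this slide was a figure. As a generic illustration of the idea (not ExaFMM's actual mechanism), auto-tuning can be as simple as timing candidate kernel implementations on a small sample problem and selecting the fastest at runtime; every name in this sketch is hypothetical:

```cpp
// Illustrative only: runtime kernel selection by measurement.
// None of these names are ExaFMM's actual API.
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Kernel = std::function<void()>;

// Run each candidate once on a small sample problem and
// return the index of the fastest one.
std::size_t pickFastest(const std::vector<Kernel>& candidates) {
  std::size_t best = 0;
  double bestTime = 1e300;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    auto t0 = std::chrono::steady_clock::now();
    candidates[i]();                      // sample execution
    std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - t0;
    if (dt.count() < bestTime) { bestTime = dt.count(); best = i; }
  }
  return best;
}
```

In an FMM the choice typically arises per interaction type: for example, between a direct particle-particle evaluation and a translated expansion, or between CPU and GPU variants of the same kernel, using measured timings rather than hard-coded thresholds.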

2. Optimized kernels

Expansion schemes and their translation costs:
- Cartesian Taylor series: O(p^6)
- Spherical harmonics: O(p^4)
- Spherical harmonics + rotation: O(p^3)
- Spherical harmonics + plane wave: O(p^3)

Supported kernels:
- Laplace: G, ∇G, ∇G×, ∇∇G
- Helmholtz: G
- Stokes: G
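For context, these costs come from the truncation order p of the underlying series. A textbook form of the spherical-harmonic expansion for the Laplace kernel (not taken from the slides) is

\[
\frac{1}{|\mathbf{x}-\mathbf{y}|}
= \sum_{n=0}^{\infty} \sum_{m=-n}^{n}
\frac{4\pi}{2n+1}\,
\frac{|\mathbf{y}|^{n}}{|\mathbf{x}|^{n+1}}\,
\overline{Y_n^m}(\theta_y,\phi_y)\, Y_n^m(\theta_x,\phi_x),
\qquad |\mathbf{x}| > |\mathbf{y}|.
\]

Truncating at n < p keeps p^2 coefficients, so a naive multipole-to-local translation coupling every source coefficient to every target coefficient costs O(p^4); rotating the expansion so the translation is along the z-axis reduces this to O(p^3). A Cartesian Taylor expansion of order p carries O(p^3) coefficients, which gives the O(p^6) translation cost listed above.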

3. Periodic boundary conditions
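The slide itself was a figure. In FMM codes, periodicity is commonly handled by adding the far field of a growing hierarchy of periodic images at negligible extra cost. As a minimal sketch (not ExaFMM code), here is the brute-force reference sum over a truncated shell of images that such a feature can be validated against; the box size L and image range k are hypothetical parameters:

```cpp
// Illustrative only: brute-force periodic sum over a truncated
// shell of images, useful for checking a periodic FMM result
// (assumes a charge-neutral system; the plain Coulomb sum is
// conditionally convergent otherwise).
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double x, y, z, q; };

double periodicPotential(const std::vector<Body>& b, std::size_t i,
                         double L, int k) {
  double phi = 0;
  for (int ix = -k; ix <= k; ++ix)
    for (int iy = -k; iy <= k; ++iy)
      for (int iz = -k; iz <= k; ++iz)
        for (std::size_t j = 0; j < b.size(); ++j) {
          double dx = b[i].x - b[j].x - ix * L;
          double dy = b[i].y - b[j].y - iy * L;
          double dz = b[i].z - b[j].z - iz * L;
          double r2 = dx * dx + dy * dy + dz * dz;
          if (r2 > 0) phi += b[j].q / std::sqrt(r2);  // skip self at zero image
        }
  return phi;
}
```

In the hierarchical version, each additional level aggregates the 3x3x3 images of the previous one into a single multipole, so the effective number of images grows exponentially while costing only a few extra translations per level.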

4. Recursive multisection
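Again the slide was a figure. Recursive multisection generalizes recursive bisection by splitting the domain into more than two parts per level, so the process count need not be a power of two. A minimal sketch of the idea follows, with a toy splitting rule; this is not ExaFMM's actual partitioner:

```cpp
// Illustrative only: recursively split bodies along the widest
// axis into 2 or 3 sections, in proportion to the processes
// assigned to each section.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Body { double x[3]; int proc; };
using It = std::vector<Body>::iterator;

static int widestAxis(It lo, It hi) {
  double mn[3] = {1e300, 1e300, 1e300};
  double mx[3] = {-1e300, -1e300, -1e300};
  for (It it = lo; it != hi; ++it)
    for (int d = 0; d < 3; ++d) {
      mn[d] = std::min(mn[d], it->x[d]);
      mx[d] = std::max(mx[d], it->x[d]);
    }
  int best = 0;
  for (int d = 1; d < 3; ++d)
    if (mx[d] - mn[d] > mx[best] - mn[best]) best = d;
  return best;
}

// Assign processes [p0, p0 + nprocs) to bodies in [lo, hi).
void multisection(It lo, It hi, int p0, int nprocs) {
  if (nprocs == 1) {
    for (It it = lo; it != hi; ++it) it->proc = p0;
    return;
  }
  int axis = widestAxis(lo, hi);
  // Toy rule: use 3 sections when the process count allows, else 2.
  int nsplit = (nprocs % 3 == 0) ? 3 : 2;
  std::ptrdiff_t n = hi - lo, bodiesDone = 0;
  int procsDone = 0;
  for (int s = 0; s < nsplit; ++s) {
    int pgrp = (nprocs - procsDone) / (nsplit - s);
    std::ptrdiff_t cut = bodiesDone +
        (n - bodiesDone) * pgrp / (nprocs - procsDone);
    std::nth_element(lo + bodiesDone, lo + cut, hi,
        [axis](const Body& a, const Body& b) {
          return a.x[axis] < b.x[axis];
        });
    multisection(lo + bodiesDone, lo + cut, p0 + procsDone, pgrp);
    procsDone += pgrp;
    bodiesDone = cut;
  }
}
```

Because each level splits bodies in proportion to the processes assigned to each section, the same scheme doubles as a load balancer for irregular particle distributions.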

5. Hierarchical communication
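The slide was a figure. "Hierarchical" here presumably refers to exchanging coarse multipole data with distant ranks and finer data with nearby ranks, following the tree structure. The overlap part of the idea can be sketched with nonblocking MPI, where local work proceeds while messages are in flight; the buffers and helper functions below are placeholders, not ExaFMM's actual communication code:

```cpp
// Illustrative only: overlap communication with local computation
// using nonblocking MPI.
#include <mpi.h>
#include <vector>

void computeLocalInteractions() { /* purely local tree traversal */ }
void computeRemoteInteractions(const std::vector<double>&) {
  /* evaluate interactions that need the received data */
}

void exchangeAndCompute(const std::vector<double>& sendBuf,
                        std::vector<double>& recvBuf, int peer) {
  MPI_Request req[2];
  MPI_Irecv(recvBuf.data(), (int)recvBuf.size(), MPI_DOUBLE,
            peer, 0, MPI_COMM_WORLD, &req[0]);
  MPI_Isend(sendBuf.data(), (int)sendBuf.size(), MPI_DOUBLE,
            peer, 0, MPI_COMM_WORLD, &req[1]);
  computeLocalInteractions();          // runs while messages are in flight
  MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
  computeRemoteInteractions(recvBuf);  // now safe to use received data
}
```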

Applications and Performance Benchmarks

Bio-molecular application
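The slide content was a figure. For context, biomolecular electrostatics codes evaluate Coulomb sums of the form (in Gaussian units)

\[
\Phi(\mathbf{r}_i) = \sum_{j \neq i} \frac{q_j}{|\mathbf{r}_i - \mathbf{r}_j|},
\]

over all atom pairs; the FMM reduces this from O(N^2) to O(N), which is what makes large atom counts feasible.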

Turbulence application
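Also a figure in the original. Vortex-method turbulence simulations evaluate the Biot-Savart integral

\[
\mathbf{u}(\mathbf{x}) = \frac{1}{4\pi} \int
\frac{\boldsymbol{\omega}(\mathbf{x}') \times (\mathbf{x}-\mathbf{x}')}
{|\mathbf{x}-\mathbf{x}'|^{3}}\, d\mathbf{x}',
\]

discretized as an N-body sum over vortex elements, which is the kind of kernel the FMM accelerates.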

Scalability on Kraken

Large GPU systems

Strong scaling (N = 10^8)

Weak scaling

Comparison with the 2010 Gordon Bell prize winner

ExaFMM official release today
Details & download: