Structured ALE Solver. Overview Structured ALE mesh automatically generated Smaller input deck; Easier modifications to the mesh; Less I/O time. Shorter.

Slides:



Advertisements
Similar presentations
A Study on the Multi-Material ALE Scheme with FSI in LS-DYNA
Advertisements

ENERGY AND POWER CHARACTERIZATION OF PARALLEL PROGRAMS RUNNING ON THE INTEL XEON PHI JOAL WOOD, ZILIANG ZONG, QIJUN GU, RONG GE {JW1772, ZILIANG,
Thoughts on Shared Caches Jeff Odom University of Maryland.
Dynamic Load Balancing for VORPAL Viktor Przebinda Center for Integrated Plasma Studies.
Development of Parallel Simulator for Wireless WCDMA Network Hong Zhang Communication lab of HUT.
Visual Traffic Simulation Thomas Fotherby. Objective To visualise traffic flow. –Using 2D animated graphics –Using simple models of microscopic traffic.
© Janice Regan, CMPT 102, Sept CMPT 102 Introduction to Scientific Computer Programming The software development method algorithms.
Claude TADONKI Mines ParisTech – LAL / CNRS / INP 2 P 3 University of Oujda (Morocco) – October 7, 2011 High Performance Computing Challenges and Trends.
Algorithms and Problem Solving-1 Algorithms and Problem Solving.
Parallel Decomposition-based Contact Response Fehmi Cirak California Institute of Technology.
Algorithms and Problem Solving. Learn about problem solving skills Explore the algorithmic approach for problem solving Learn about algorithm development.
Steady Aeroelastic Computations to Predict the Flying Shape of Sails Sriram Antony Jameson Dept. of Aeronautics and Astronautics Stanford University First.
Translation Buffers (TLB’s)
Operating Systems CS208. What is Operating System? It is a program. It is the first piece of software to run after the system boots. It coordinates the.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
The Fundamentals: Algorithms, the Integers & Matrices.
C o n f i d e n t i a l Developed By Nitendra NextHome Subject Name: Data Structure Using C Title: Overview of Data Structure.
The Group Runtime Optimization for High-Performance Computing An Install-Time System for Automatic Generation of Optimized Parallel Sorting Algorithms.
Computer System Architectures Computer System Software
Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley.
Processing of a CAD/CAE Jobs in grid environment using Elmer Electronics Group, Physics Department, Faculty of Science, Ain Shams University, Mohamed Hussein.
PuReMD: Purdue Reactive Molecular Dynamics Package Hasan Metin Aktulga and Ananth Grama Purdue University TST Meeting,May 13-14, 2010.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
R 29 G 115 B 176 R 69 G 153 B 195 R 0 G 51 B 153 R 190 G 26 B 38 R 234 G 40 B 57 R 255 G 121 B 1 R 155 G 238 B 255 R 146 G 212 B 0 R 205 G 234 B 247 R.
Timothy Reeves: Presenter Marisa Orr, Sherrill Biggers Evaluation of the Holistic Method to Size a 3-D Wheel/Soil Model.
Microcontroller Presented by Hasnain Heickal (07), Sabbir Ahmed(08) and Zakia Afroze Abedin(19)
Stephen P. Carl - CS 2421 Recursion Reading : Chapter 4.
CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.
GU Junli SUN Yihe 1.  Introduction & Related work  Parallel encoder implementation  Test results and Analysis  Conclusions 2.
YOU LI SUPERVISOR: DR. CHU XIAOWEN CO-SUPERVISOR: PROF. LIU JIMING THURSDAY, MARCH 11, 2010 Speeding up k-Means by GPUs 1.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
GOMs and Action Analysis and more. 1.GOMS 2.Action Analysis.
Stochastic optimization of energy systems Cosmin Petra Argonne National Laboratory.
Chapter 10 Analysis and Design Discipline. 2 Purpose The purpose is to translate the requirements into a specification that describes how to implement.
NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.
GPU-Accelerated Beat Detection for Dancing Monkeys Philip Peng, Yanjie Feng UPenn CIS 565 Spring 2012 Final Project – Final Presentation img src:
Author: B. C. Bromley Presented by: Shuaiyuan Zhou Quasi-random Number Generators for Parallel Monte Carlo Algorithms.
Overcoming Scaling Challenges in Bio-molecular Simulations Abhinav Bhatelé Sameer Kumar Chao Mei James C. Phillips Gengbin Zheng Laxmikant V. Kalé.
Dr. Alexander Michalski, SL-Rasch GmbH Dr. Ries Bouwman, ESI Group
1 1 What does Performance Across the Software Stack mean?  High level view: Providing performance for physics simulations meaningful to applications 
Ale with Mixed Elements 10 – 14 September 2007 Ale with Mixed Elements Ale with Mixed Elements C. Aymard, J. Flament, J.P. Perlat.
Numerical Investigation of Hydrogen Release from Varying Diameter Exit
Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave,
A Comparison of Parallel Sorting Algorithms on Different Architectures Nancy M. Amato, Ravishankar Iyer, Sharad Sundaresan and Yan Wu Texas A&M University.
Parallelizing Spacetime Discontinuous Galerkin Methods Jonathan Booth University of Illinois at Urbana/Champaign In conjunction with: L. Kale, R. Haber,
Computer Architecture Lecture 26 Past and Future Ralph Grishman November 2015 NYU.
“NanoElectronics Modeling tool – NEMO5” Jean Michel D. Sellier Purdue University.
Image Processing A Study in Pixel Averaging Building a Resolution Pyramid With Parallel Computing Denise Runnels and Farnaz Zand.
Chapter 1 Basic Concepts of Operating Systems Introduction Software A program is a sequence of instructions that enables the computer to carry.
I MAGIS is a joint project of CNRS - INPG - INRIA - UJF iMAGIS-GRAVIR / IMAG Efficient Parallel Refinement for Hierarchical Radiosity on a DSM computer.
Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture Lecture 22 Distributed computer Interconnection.
Lecture #1: Introduction to Algorithms and Problem Solving Dr. Hmood Al-Dossari King Saud University Department of Computer Science 6 February 2012.
1 An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm Wang Chen Panos Kosmas Miriam Leeser Carey Rappaport Northeastern.
ANSYS, Inc. Proprietary © 2004 ANSYS, Inc. Chapter 5 Distributed Memory Parallel Computing v9.0.
CMPT 238 Data Structures More on Sorting: Merge Sort and Quicksort.
Mesh Refinement: Aiding Research in Synthetic Jet Actuation By: Brian Cowley.
Algorithms and Problem Solving
Parallel Density-based Hybrid Clustering
GENERAL VIEW OF KRATOS MULTIPHYSICS
Integrated tool set for electro-thermal evaluation of ICs
Chapter 4, Section 2 Graphs of Motion.
Translation Buffers (TLB’s)
Algorithms and Problem Solving
Translation Buffers (TLB’s)
Presentation made by Steponas Šipaila
Translation Buffers (TLBs)
Chapter 4, Section 2 Graphs of Motion.
Review What are the advantages/disadvantages of pages versus segments?
L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher
Presentation transcript:

Structured ALE Solver

Overview Structured ALE mesh automatically generated Smaller input deck; Easier modifications to the mesh; Less I/O time. Shorter calculation time Sorting, searching faster and more efficient; Also more accurate. Less memory A rewritten leaner, cleaner code using less memory to accommodate larger problems. SMP, MPP, MPP-Hybrid supported Redesigned algorithm enabled SMP parallelization and hence MPP Hybrid. Enhancement on MPP efficiency 2

Automated Mesh Generation 3 User specifies mesh spacing information along three directions One node for mesh origin and another three for local coordinate system; Later during the run used to prescribe mesh translational motion and rotation

4 S-ALE: Faster Runs Much faster FSI (Fluid structure interaction) searching for structured ALE mesh A faster advection scheme with SMP implemented. ALES-ALEReduction Total Time30941s20120s34.97% FSI15449s5726s63% Advection7595s6642s12.55% Time saving in a Bridgestone tire rolling in mud model ALE solid elements; Simulation time 2000ms. Single precision MPP 48 cpus without special decomposition

SMP & MPP Hybrid: Model Description Airbag inflated by ideal gas and then kept still.; 9261 ALE solids elements with 294 shells.

SMP Scalability Speedup versus number of cores

MPP Hybrid Performance Speedup versus number of MPP x SMP cores

MPP Performance: Model Description Mine buried under soil; Generic Hull Structure; 5.5 million ALE elements; Simulation time 80ms.

MPP Scalability Ideal slope = 1 Slope = 0.90 Speedup versus number of cores

Thank You