Scalable System for Large Unstructured Mesh Simulation Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate.

Scalable System for Large Unstructured Mesh Simulation Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate

29th Nov 2012 / 2 Overview Introduction Preparation and Simulation – More Efficient Partitioning – Parallel Element Splitting Post Processing – Results Cache – Merging Many Partitions – Memory usage – Off-screen mode Conclusions, Future lines and Acknowledgements

29th Nov 2012 / 3 Overview Introduction Preparation and Simulation – More Efficient Partitioning – Parallel Element Splitting Post Processing – Results Cache – Merging Many Partitions – Memory usage – Off-screen mode Conclusions, Future lines and Acknowledgements

29th Nov 2012 / 4 Introduction Education: Masters in Numerical Methods, training courses, seminars, etc. Publishing: magazines, books, etc. Research: PhDs, congresses, projects, etc. One of the International Centers of Excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation Based Engineering and Science. World Technology Evaluation Center (wtec.org), 2009].

29th Nov 2012 / 5 Introduction Simulation: structures

29th Nov 2012 / 6 Introduction CFD: Computational Fluid Dynamics

29th Nov 2012 / 7 Introduction Geomechanics Industrial forming processes Electromagnetism Acoustics Bio-medical engineering Coupled problems Earth sciences

29th Nov 2012 / 8 Introduction Simulation workflow: geometry description (provided by CAD or created with GiD), preparation of analysis data in GiD, computer analysis, and visualization of results in GiD.

29th Nov 2012 / 9 Introduction Analysis data generation: read in and correct CAD data, assignment of boundary conditions, assignment of material properties, definition of analysis parameters, generation of analysis data, etc.

29th Nov 2012 / 10 Introduction Visualization of Numerical Results – Deformed shapes, temperature distributions, pressures, etc. – Vector and contour plots, graphs – Line diagrams, result surfaces – Animated sequences – Particle line flow diagrams

29th Nov 2012 / 11

29th Nov 2012 / 12 Introduction Goal: run a CFD simulation with 100 Million elements using in-house tools. Hardware: cluster with – Master node: 2 x Intel Quad Core E5410, 32 GB RAM – 3 TB disk with a dedicated Gigabit link to the master node – 10 nodes: 2 x Intel Quad Core E5410 and 16 GB RAM – 2 nodes: 2 x AMD Opteron Quad Core 2356 and 32 GB – Total of 96 cores, 224 GB RAM available – Infiniband 4x DDR, 20 Gbps

29th Nov 2012 / 13 Introduction Airflow around a F1 car model

29th Nov 2012 / 14 Introduction Kratos: – Multi-physics, open source framework – Parallelized for shared and distributed memory machines GiD: – Geometry handling and data management – First coarse mesh – Merging and post-processing results

29th Nov 2012 / 15 Introduction Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 16 Overview Introduction Preparation and Simulation – More Efficient Partitioning – Parallel Element Splitting Post Processing – Results Cache – Merging Many Partitions – Memory usage – Off-screen mode Conclusions, Future lines and Acknowledgements

29th Nov 2012 / 17 Preparation and simulation Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 18 Meshing A single workstation has limited memory and time. Three steps: – Single node: GiD generates a coarse mesh with 13 Million tetrahedrons – Single node: Kratos + Metis divide and distribute the mesh – In parallel: Kratos refines the mesh locally
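
The divide-and-distribute step relies on METIS (through Kratos' Metis application). As a rough illustration of the dual-graph partitioning involved, the following minimal C++ sketch partitions a toy tetrahedral mesh with METIS 5; the two-tetrahedron connectivity, the choice of two partitions and the output format are assumptions made for the example, not the actual Kratos code.

    // Minimal sketch: partition a tetrahedral mesh with METIS 5's dual-graph
    // routine. Two tetrahedra sharing the face (1,2,3), five nodes in total.
    #include <metis.h>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<idx_t> eptr = {0, 4, 8};              // element e uses eind[eptr[e]..eptr[e+1])
        std::vector<idx_t> eind = {0, 1, 2, 3,  1, 2, 3, 4};
        idx_t ne = 2, nn = 5;
        idx_t ncommon = 3;   // tetrahedra are neighbours when they share a face (3 nodes)
        idx_t nparts  = 2;   // number of partitions, i.e. MPI ranks
        idx_t objval;
        std::vector<idx_t> epart(ne), npart(nn);

        if (METIS_PartMeshDual(&ne, &nn, eptr.data(), eind.data(),
                               nullptr, nullptr, &ncommon, &nparts,
                               nullptr, nullptr, &objval,
                               epart.data(), npart.data()) != METIS_OK)
            return 1;

        for (idx_t e = 0; e < ne; ++e)
            std::printf("element %d -> partition %d\n", (int)e, (int)epart[e]);
        return 0;
    }

In this refinement-based workflow only the 13 Million element coarse mesh has to be partitioned, which keeps the step cheap compared with partitioning the final 100 Million element mesh directly.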

29th Nov 2012 / 19 Preparation and simulation Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 20 Efficient partitioning: before Rank 0 reads the model, partitions it and sends the partitions to the other ranks (diagram: Rank 0, Rank 1, Rank 2, Rank 3).

29th Nov 2012 / 21 Efficient partitioning: before Rank 0 reads the model, partitions it and sends the partitions to the other ranks (diagram: Rank 0, Rank 1, Rank 2, Rank 3).

29th Nov 2012 / 22 Efficient partitioning: before Requires large memory in node 0. Uses cluster time for partitioning, which could be done outside the cluster. Each rerun needs repartitioning. Same working procedure for OpenMP and MPI runs.

29th Nov 2012 / 23 Efficient partitioning: now The partitions are divided and written on another machine. Each rank reads its own data separately.
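
A minimal sketch of the "now" scheme, assuming one partition file per rank produced offline: at start-up every MPI rank opens only its own file, so no single node has to hold the whole model. The file-name pattern used here is invented for the illustration.

    // Each rank reads its own partition; rank 0 no longer reads, splits and
    // ships the whole model.
    #include <mpi.h>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Hypothetical naming convention: one model-part file per rank.
        const std::string file_name = "model_rank_" + std::to_string(rank) + ".mdpa";
        std::ifstream part_file(file_name);
        if (!part_file) {
            std::cerr << "rank " << rank << ": cannot open " << file_name << "\n";
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        // ... parse the local sub-mesh and the communication plan here ...

        MPI_Finalize();
        return 0;
    }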

29th Nov 2012 / 24 Preparation and simulation Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 25 Local refinement: triangle (diagram of the triangle splitting cases: triangle i-j-k with new nodes l, m, n on the refined edges and the resulting child triangles, depending on how many edges are refined).

29th Nov 2012 / 26 Local refinement: triangle The splitting case is selected according to the node Ids. The decision is not based on element quality, but it is very good for parallelization (OpenMP and MPI).
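
One way to picture such an Id-based rule (an illustration under assumed conventions, not the actual GiD/Kratos code): when two edges of a triangle are refined, the remaining quadrilateral must be cut along one of its diagonals, and choosing the diagonal from the corner pair holding the smallest global node Id makes every thread and every rank take the same decision without any communication.

    // Deterministic, Id-based choice of the splitting diagonal.
    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <vector>

    using Tri  = std::array<long, 3>;
    using Quad = std::array<long, 4>;  // global node Ids in cyclic order q0-q1-q2-q3

    std::vector<Tri> SplitQuadById(const Quad& q) {
        if (std::min(q[0], q[2]) < std::min(q[1], q[3]))
            return { Tri{q[0], q[1], q[2]}, Tri{q[0], q[2], q[3]} };  // diagonal q0-q2
        return { Tri{q[1], q[2], q[3]}, Tri{q[1], q[3], q[0]} };      // diagonal q1-q3
    }

    int main() {
        // The same quadrilateral seen with two different local orderings (same
        // global Ids) produces the same two triangles on every process.
        for (const Quad& q : { Quad{12, 7, 30, 21}, Quad{30, 21, 12, 7} })
            for (const Tri& t : SplitQuadById(q))
                std::printf("triangle %ld %ld %ld\n", t[0], t[1], t[2]);
        return 0;
    }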

29th Nov 2012 / 27 Local refinement: tetrahedron Father Element Child Elements

29th Nov 2012 / 28 Local refinement: examples

29th Nov 2012 / 29 Local refinement: examples

29th Nov 2012 / 30 Local refinement: examples

29th Nov 2012 / 31 Local refinement: uniform A uniform refinement can be used to obtain a mesh with 8 times more elements. It does not improve the geometry representation.
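
The bookkeeping that makes such refinement parallel-friendly can be sketched as follows (an assumption about the mechanism, not the Kratos implementation): every refined edge receives exactly one midpoint node, shared by all elements that use the edge, by keying new node Ids on the sorted pair of the edge's end Ids.

    // One midpoint node per edge, shared between neighbouring elements.
    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <utility>

    int main() {
        std::map<std::pair<long, long>, long> midpoint_of_edge;
        long next_id = 1000;  // first free node Id (illustrative value)

        // Return the midpoint node of edge (a,b), creating it only once.
        auto midpoint = [&](long a, long b) -> long {
            std::pair<long, long> key = std::minmax(a, b);  // same key from either element
            auto it = midpoint_of_edge.find(key);
            if (it != midpoint_of_edge.end()) return it->second;
            midpoint_of_edge[key] = next_id;
            return next_id++;
        };

        // Two tetrahedra sharing the face (1,2,3): the three shared edges get
        // the same midpoint Ids no matter which element asks first.
        long tets[2][4] = { {0, 1, 2, 3}, {1, 2, 3, 4} };
        for (auto& tet : tets)
            for (int i = 0; i < 4; ++i)
                for (int j = i + 1; j < 4; ++j)
                    std::printf("edge (%ld,%ld) -> midpoint node %ld\n",
                                tet[i], tet[j], midpoint(tet[i], tet[j]));
        return 0;
    }

With all six edge midpoints in place, each tetrahedron is replaced by 8 children, which is where the factor of 8 in the element count comes from.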

29th Nov 2012 / 32 Introduction Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 33 Parallel calculation Calculated using 12 x 8 MPI processes Less than 1 day for 400 time steps About 180 GB memory usage Single volume mesh of 103 Million tetrahedrons split into 96 files (each with its mesh portion and its results)

29th Nov 2012 / 34 Overview Introduction Preparation and Simulation – More Efficient Partitioning – Parallel Element Splitting Post Processing – Results Cache – Merging Many Partitions – Memory usage – Off-screen mode Conclusions, Future lines and Acknowledgements

29th Nov 2012 / 35 Post processing Workflow diagram: geometry, conditions and materials → coarse mesh generation → partition → distribution with communication plan (part 1 … part n) → refinement → calculation (res. 1 … res. n) → merge → visualize.

29th Nov 2012 / 36 Post-process Challenges to face: – Single node – Big files: tens or hundreds of GB – Merging: Lots of files – Batch post-processing – Maintain generality

29th Nov 2012 / 37 Big Files: results cache Uses a user-definable memory pool to store results. Used to cache results stored in files. Diagram: the memory pool is shared by mesh information, created results (cuts, extrusions, Tcl), temporal results, and results read from files (single, multiple, merge).

29th Nov 2012 / 38 Big Files: results cache Diagram of the cache data structures: a results cache table of RC entries, each with a timestamp; every result carries RC info recording, per file, an offset and a type, plus the memory footprint; an open files table keeps the file handles and their types. The granularity of the cache is one result.
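
Read as plain data structures, the diagram could look roughly like the following C++ sketch; all names are invented for illustration, and the ".post.res" file names only echo GiD's usual result-file extension.

    // Hypothetical layout of the results-cache bookkeeping.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct ResultLocation {            // per-file record: "file, offset, type"
        std::string file;
        std::uint64_t offset = 0;
        int type = 0;                  // e.g. scalar, vector or matrix result
    };

    struct ResultCacheInfo {           // "RC info" attached to every result
        std::vector<ResultLocation> locations;  // one entry per partition file
        std::size_t memory_footprint = 0;       // bytes the loaded result occupies
    };

    struct ResultCacheEntry {          // one row of the results cache table
        std::uint64_t timestamp = 0;   // refreshed ("touched") on every use
        ResultCacheInfo info;
        bool loaded = false;
    };

    struct OpenFile {                  // one row of the open files table
        std::string file;
        std::FILE* handle = nullptr;
        int type = 0;                  // e.g. single, multiple or merged file set
    };

    int main() {
        ResultCacheEntry pressure;     // a result split over two partition files
        pressure.info.locations = { {"part_1.post.res", 4096, 1},
                                    {"part_2.post.res", 8192, 1} };
        pressure.info.memory_footprint = 32u << 20;   // 32 MB once loaded
        std::printf("result spans %zu files, needs %zu bytes\n",
                    pressure.info.locations.size(), pressure.info.memory_footprint);
        return 0;
    }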

29th Nov 2012 / 39 Big Files: results cache Verifies the result's file(s) and gets the result's position in the file and its memory footprint. Results of the latest analysis step are kept in memory. Other results are loaded on demand, the oldest results are unloaded if needed, and entries are touched on use.
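
A compact sketch of that policy, assuming an LRU ("touch on use") eviction over a fixed memory pool; the keys, footprints and loading stand-ins are illustrative only.

    // Load results on demand, touch entries on every use, unload the oldest
    // results when the memory pool would overflow.
    #include <cstddef>
    #include <cstdio>
    #include <list>
    #include <string>
    #include <unordered_map>
    #include <utility>

    class ResultsCache {
    public:
        explicit ResultsCache(std::size_t pool_bytes) : budget_(pool_bytes) {}

        void Use(const std::string& key, std::size_t footprint) {
            auto it = index_.find(key);
            if (it != index_.end()) {                    // hit: touch (move to front)
                lru_.splice(lru_.begin(), lru_, it->second);
                return;
            }
            while (!lru_.empty() && used_ + footprint > budget_) {   // evict oldest
                used_ -= lru_.back().second;
                std::printf("unloading %s\n", lru_.back().first.c_str());
                index_.erase(lru_.back().first);
                lru_.pop_back();
            }
            std::printf("loading %s from file\n", key.c_str());      // Load() stand-in
            lru_.emplace_front(key, footprint);
            index_[key] = lru_.begin();
            used_ += footprint;
        }

    private:
        using LruList = std::list<std::pair<std::string, std::size_t>>;
        std::size_t budget_ = 0, used_ = 0;
        LruList lru_;                                     // front = most recently used
        std::unordered_map<std::string, LruList::iterator> index_;
    };

    int main() {
        ResultsCache cache(2048ull << 20);                // e.g. a 2 GB results cache
        cache.Use("PRESSURE step 400", 800ull << 20);     // loaded
        cache.Use("VELOCITY step 400", 900ull << 20);     // loaded
        cache.Use("PRESSURE step 400", 800ull << 20);     // touched, stays resident
        cache.Use("VELOCITY step 399", 900ull << 20);     // least recently used entry is unloaded
        return 0;
    }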

29th Nov 2012 / 40 Big Files: results cache Chinese harbour: 104 GB results file, 7.6 Million tetrahedrons, time steps, 3.16 GB memory usage (2 GB results cache)

29th Nov 2012 / 41 Big Files: results cache Chinese harbour: 104 GB results file, 7.6 Million tetrahedrons, time steps, 3.16 GB memory usage (2 GB results cache)

29th Nov 2012 / 42 Merging many partitions Before: 2, 4, ... partitions. Now: 32, 64, 128, ... partitions of a single volume mesh. Postpone any calculation until the merged mesh is available (sketched below): – Skin extraction – Finding boundary edges – Smoothed normals – Neighbour information – Graphical objects creation
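
One way to realise "postpone any calculation" is to make each derived object lazy, so it is built only when first requested and therefore only once, for the merged mesh, instead of once per incoming partition. The sketch below is an illustration of that idea under invented names, not the GiD code.

    // Derived data (skin, boundary edges, normals, neighbours, graphics) built
    // lazily: merging partitions only invalidates, the first real use builds.
    #include <cstdio>
    #include <functional>
    #include <optional>
    #include <string>

    template <typename T>
    class Lazy {
    public:
        explicit Lazy(std::function<T()> build) : build_(std::move(build)) {}
        const T& Get() {                        // built the first time it is asked for
            if (!value_) value_ = build_();
            return *value_;
        }
        void Invalidate() { value_.reset(); }   // e.g. after merging one more partition

    private:
        std::function<T()> build_;
        std::optional<T> value_;
    };

    int main() {
        Lazy<std::string> skin([] {
            std::printf("extracting skin mesh...\n");      // expensive, runs once
            return std::string("skin mesh");
        });

        for (int part = 0; part < 96; ++part)
            skin.Invalidate();                              // merging does no heavy work

        std::printf("%s ready\n", skin.Get().c_str());      // first use triggers the build
        return 0;
    }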

29th Nov 2012 / 43 Merging many partitions Telescope example: 23,870,544 tetrahedrons
Before, 32 partitions: 24' 10"
After, 32 partitions: 4' 34"
After, 128 partitions: 10' 43"
Single file: 2' 16"

29th Nov 2012 / 44 Merging many partitions

29th Nov 2012 / 45 Merging many partitions Racing car example: 103,671,344 tetrahedrons
Before, 96 partitions: > 5 hours
After, 96 partitions: 51' 21"
Single file: 13' 25"

29th Nov 2012 / 46 Memory usage Around 12 GB of memory used, with a spike of 15 GB (MS Windows) or 17.5 GB (Linux), including: – Volume mesh (103 Mtetras) – Skin mesh (6 Mtriangs) – Several surface and cut meshes – Stream line search tree – 2 GB of results cache – Animations

29th Nov 2012 / 47 Pictures

29th Nov 2012 / 48 Pictures

29th Nov 2012 / 49 Pictures

29th Nov 2012 / 50 Batch post-processing: off-screen GiD with no interaction and no window. Command line: gid -offscreen [WxH] -b+g batch_file_to_run Useful to: – launch costly animations in the background or in a queue – use GiD as a template generator – use GiD behind a web server: Flash Video animation Animation window: a button was added to generate the batch file for off-screen GiD to be sent to a batch queue.

29th Nov 2012 / 51 Animation

29th Nov 2012 / 52 Overview Introduction Preparation and Simulation – More Efficient Partitioning – Parallel Element Splitting Post Processing – Results Cache – Merging Many Partitions – Memory usage – Off-screen mode Conclusions, Future lines and Acknowledgements

29th Nov 2012 / 53 Conclusions The implemented improvements helped us achieve the milestone: prepare, mesh, calculate and visualize a CFD simulation with 103 Million tetrahedrons. GiD: modest machines also benefit from these improvements.

29th Nov 2012 / 54 Future lines Faster tree creation for stream lines. – Now: ~ 90 s. creation time, 2-3 s. per stream line Mesh simplification, LOD – geometry and results criteria – Surface meshes, iso-surfaces, cuts: faster drawing – Volume meshes: faster cuts, stream lines – Near real-time Parallelize other algorithms in GiD: – Skin and boundary edges extraction – Parallel cuts and stream lines creation

29th Nov 2012 / 55 Challenges 10⁹ tetrahedrons, 6·10⁸ – 6·10⁹ triangles. Large workstation with Infiniband to the cluster and 80 GB or 800 GB RAM? Hard disk? Post-process as the backend of a web server in the cluster? Security issues? Post-process embedded in the solver? Output of both the original mesh and a simplified one?

29th Nov 2012 / 56 Acknowledgements Ministerio de Ciencia e Innovación, E-DAMS project European Commission, Real-time project

29th Nov 2012 / 57 Comments, questions ?

Thanks for your attention Scalable System for Large Unstructured Mesh Simulation