ANSYS, Inc. Proprietary © 2004 ANSYS, Inc. Chapter 5 Distributed Memory Parallel Computing v9.0.



ANSYS, Inc. Proprietary © 2004 ANSYS, Inc. October 1, 2004 Inventory # New Features

Why Distributed Memory Parallel Computing?
1. Do you want to process large (>2 MDOF) linear models faster?
2. Are you creating models larger than available hardware resources, especially 32-bit resources, can effectively solve?
3. Do you want to reduce the time to completion of long-running nonlinear analyses?
4. Do you want to increase overall throughput by processing all jobs as fast as possible?
5. Would you like to change your CAE process to maximize throughput by minimizing engineering time?

Distributed Memory Parallel Computing: Overview
- Distributed ANSYS (D-ANSYS): What is it? Benefits! Supported Features! Timing Results!
- Parallel Performance for ANSYS: Available Solvers, Licensing
- Distributed Memory Requirements
- New Boundaries: What's Possible with Parallel Computing?
- Questions

Distributed ANSYS (D-ANSYS): What is it?
All of the ANSYS solution routines execute in parallel in distributed memory:
- Uses MPI (Message Passing Interface) as the communication middleware
- Communicates over the interconnect (cable) between distributed-memory machines
- Treats shared memory the same way as distributed memory when run on a shared-memory system
All phases of the solution process are performed in parallel:
- Element Formulation (stiffness matrix generation)
- Matrix Solution (linear equation solving)
- Stress Recovery (results calculation)

Distributed ANSYS (D-ANSYS): What is it?
Because shared and distributed memory are treated in exactly the same way, you may run Distributed ANSYS:
- on a single machine with multiple processors, or
- on multiple machines with one or more processors each (sustained 20 MB/sec interconnect)
Hardware platforms supported at release:
- Unix: HP-UX
- Unix: SGI IRIX
- Linux 32-bit: Intel IA-32
- Linux 64-bit: Intel Itanium IA-64
- Linux 64-bit: AMD Opteron (beta)

Distributed ANSYS (D-ANSYS): Benefits!
The whole solution phase runs in parallel:
- All of the ANSYS /SOLUTION phase is now parallel, including stiffness matrix generation, linear equation solving, and results calculation
- More of the analysis is performed in parallel
- Less wall-clock time is required to perform an analysis
- It is scalable: 2X to 8X speedup on 2 to 16 processors

Distributed ANSYS (D-ANSYS): Supported features
Supported solvers:
- Distributed PCG solver (EQSLVE,DPCG)
- Distributed JCG solver (EQSLVE,DJCG)
- Distributed Sparse solver (EQSLVE,DSPARSE): factorization of the matrix and back/forward substitution are done in distributed parallel mode; the best performance we have observed is 6X to 8X speedup on 12 to 16 processors
- The existing shared-memory Sparse solver (EQSLVE,SPARSE) can also be used: the solver itself runs only on the master process (other parts run in distributed parallel), and it may be run in shared-memory parallel mode on the master machine (/CONFIG,NPROC,N)
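The solver choices above can be sketched as an APDL input fragment. The solver labels come from the slide (which spells the command EQSLVE; the ANSYS command reference spells it EQSLV), the surrounding commands are standard solution-phase APDL, and the thread count is purely illustrative:

```
/SOLU                 ! enter the solution processor
EQSLV,DPCG            ! distributed preconditioned conjugate gradient
!EQSLV,DJCG           ! or: distributed Jacobi conjugate gradient
!EQSLV,DSPARSE        ! or: distributed sparse direct solver
!EQSLV,SPARSE         ! or: shared-memory sparse (solver runs on master only)
!/CONFIG,NPROC,4      ! shared-memory threads for the SPARSE solver (illustrative)
SOLVE
FINISH
```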

Distributed ANSYS (D-ANSYS): Supported features
Analysis types supported:
- Structural (single field): any single-field structural problem, with any combination of ux, uy, uz, rotx, roty, rotz, and warp DOF; linear static, nonlinear static, and full transient analyses
- Thermal (single field): temperature degree of freedom only; steady-state thermal and full transient thermal

Distributed ANSYS (D-ANSYS): Supported features
Structural nonlinearities supported:
- Large strain, large deflection (NLGEOM,ON) for all structural elements
- Nonlinear material properties specified by the TB command
- Contact nonlinearities modeled by contact elements 169 through 178 and 52
- Gasket elements (192 through 195)
- Pre-tension elements (179)
- 18X elements with U/P formulations and contact elements 169 through 178 are ONLY supported by the shared-memory sparse solver
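As a hedged sketch, a nonlinear static run of the kind listed above might be set up as follows. NLGEOM and the distributed sparse solver selection come from the slides (material nonlinearity would be defined earlier via TB in /PREP7); the substep count is purely illustrative:

```
/SOLU
ANTYPE,STATIC          ! nonlinear static analysis
NLGEOM,ON              ! large strain / large deflection
EQSLV,DSPARSE          ! distributed sparse solver
NSUBST,10              ! illustrative number of load substeps
SOLVE
FINISH
```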

Distributed ANSYS (D-ANSYS): Timing Results
DPCG within Distributed ANSYS: 44 MDOF engine block stress analysis (timing chart not reproduced in the transcript)

Distributed ANSYS (D-ANSYS): Timing Results
DPCG within Distributed ANSYS: 7.1 MDOF stress analysis (timing chart not reproduced in the transcript)

Distributed ANSYS (D-ANSYS): Timing Results
DSPARSE within Distributed ANSYS: 3.5 MDOF stress analysis (timing chart not reproduced in the transcript)

Distributed ANSYS (D-ANSYS): Timing Results
DSPARSE within Distributed ANSYS: 0.8 MDOF stress analysis (timing chart not reproduced in the transcript)

Distributed ANSYS (D-ANSYS): Timing Results
DSPARSE within Distributed ANSYS: 1.7 MDOF nonlinear stress analysis, 9 Newton iterations (timing chart not reproduced in the transcript)

Parallel Performance for ANSYS: Available Solvers
Distributed ANSYS solvers:
- All three phases distributed: Distributed Preconditioned Conjugate Gradient (DPCG), Distributed Jacobi Conjugate Gradient (DJCG), Distributed Sparse (DSPARSE)
- Element formulation and stress recovery distributed: Sparse (SPARSE)

Parallel Performance for ANSYS: Available Solvers
ANSYS solvers (sequential ANSYS):
- Only the matrix solution distributed: Distributed Preconditioned Conjugate Gradient (DPCG), Distributed Jacobi Conjugate Gradient (DJCG), Distributed Domain Solver (DDS)
- Only the matrix solution distributed, on shared memory: Algebraic Multi-Grid solver (AMG)

Parallel Performance for ANSYS: Licensing
Licensing is per ANALYSIS, NOT per CPU:
- One ANSYS license per analysis (master/parent process: CAD, prep/post, meshing)
- One Parallel Performance for ANSYS (PPFA) license per analysis (computational solvers)
These licenses cover all machines and CPUs participating in the run (the slide's diagram shows four 4-CPU machines under a single ANSYS license plus a single PPFA license).

Parallel Performance for ANSYS: Distributed Memory Requirements
How much memory is required?
- Machine #1 must contain its workload and the entire preconditioner
- Machines 2 through N need to contain their workload only
- Rules of thumb for PCG/DPCG: 1 GB of memory per 1 million DOF, plus 100 MB per 1 million DOF for the preconditioner
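The rules of thumb above can be turned into a quick per-machine estimator. This is only a sketch built from the two figures on the slide (1 GB per MDOF overall, 100 MB per MDOF for the preconditioner); the even split of the workload across machines, and applying the MSAVE factor to the workload term, are assumptions, not an official ANSYS sizing formula.

```python
def pcg_memory_gb(mdof, n_machines, msave_factor=1.0):
    """Rough per-machine memory estimate (GB) for the PCG/DPCG solver.

    Rule of thumb from the slide: ~1 GB per million DOF overall, plus
    ~100 MB per million DOF for the preconditioner, which lives entirely
    on Machine #1. msave_factor (1.0 / 0.7 / 0.5 per the slides) scaling
    the element workload is an assumption about where MSAVE,ON savings
    apply. Returns (machine_1_gb, other_machines_gb).
    """
    workload = msave_factor * mdof / n_machines  # even split (assumption)
    preconditioner = 0.1 * mdof                  # 100 MB per MDOF
    return workload + preconditioner, workload

# A 10 MDOF model on 4 machines, MSAVE,OFF:
m1, rest = pcg_memory_gb(10.0, 4)
print(f"Machine #1: {m1:.2f} GB, Machines 2..N: {rest:.2f} GB each")
```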

Parallel Performance for ANSYS: Distributed Memory Requirements
For a given amount of memory, how big a problem can be solved?
- For Machine #1: (formula not reproduced in the transcript)
- For Machines 2 through N: (formula not reproduced in the transcript)
where MSAVE_Factor = 1.0 for MSAVE,OFF; 0.7 for SOLID95 and MSAVE,ON; 0.5 for SOLID92 and MSAVE,ON

Parallel Performance for ANSYS: Distributed Memory Requirements
For a given problem, how much memory is needed?
- For Machine #1: (formula not reproduced in the transcript)
- For Machines 2 through N: (formula not reproduced in the transcript)
where MSAVE_Factor = 1.0 for MSAVE,OFF; 0.7 for SOLID95 and MSAVE,ON; 0.5 for SOLID92 and MSAVE,ON

Parallel Performance for ANSYS: Distributed Memory Requirements
How big a problem can be solved? Example 1:
- 32-bit PCs with 1 GB RAM each, including Machine #1
- 8 machines in total
- MSAVE,OFF
- For Machine #1: (result not reproduced in the transcript)

Parallel Performance for ANSYS: Distributed Memory Requirements
How big a problem can be solved? Example 2:
- 32-bit PCs with 2.2 GB RAM available on each (with the /3GB switch), including Machine #1
- 4 machines in total
- MSAVE,ON
- Model consisting wholly of SOLID92 elements
- For Machine #1: (result not reproduced in the transcript)
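Since the slide's own formulas and results did not survive the transcript, the two examples can at least be sized with a hedged model built from the stated rules of thumb (1 GB/MDOF workload split evenly across machines, 0.1 GB/MDOF preconditioner held entirely on Machine #1, MSAVE_Factor applied to the workload). Machine #1 is the binding constraint. The numbers this prints are illustrative estimates, not the figures from the original slides.

```python
def max_mdof(ram_gb, n_machines, msave_factor=1.0):
    """Largest model (in MDOF) that fits under the hedged model, limited
    by Machine #1, which holds its share of the element workload plus the
    whole preconditioner. Assumes ~1 GB/MDOF workload (scaled by the
    MSAVE factor, an assumption) and ~0.1 GB/MDOF preconditioner."""
    gb_per_mdof_on_machine1 = msave_factor / n_machines + 0.1
    return ram_gb / gb_per_mdof_on_machine1

# Example 1 from the slide: 8 machines, 1 GB each, MSAVE,OFF
print(f"Example 1: ~{max_mdof(1.0, 8):.1f} MDOF")
# Example 2: 4 machines, 2.2 GB each, MSAVE,ON with SOLID92 (factor 0.5)
print(f"Example 2: ~{max_mdof(2.2, 4, 0.5):.1f} MDOF")
```

Under these assumptions Example 2 lands near 10 MDOF, which is at least consistent with the 10.5 MDOF 32-bit cluster run shown later in the deck.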

What's Possible with Parallel Computing? New Boundaries: 111 MDOF
Wing test case model:
- 37,072,698 nodes
- 8,744,744 SOLID95 elements
- 111,218,094 degrees of freedom!
- 14,481 DOF constraints
- 1 load case
System resources:
- Used 6 of the 8 CPUs on an SGI Altix system
- Used 50 GB of the 64 GB available
- Linux 64-bit operating system

What's Possible with Parallel Computing? New Boundaries: 111 MDOF
DPCG solves 111 million DOFs!
- 6-CPU run on an 8-CPU SGI Altix system
- 8.6 hours solver time
- 6 distributed MPI processes
- Using the MSAVE,ON memory-saving option
PCG solver output (numeric values did not survive the transcript): Number of Hosts/Processors: 6; the timing breakdown covered Element Matrix Assembly, Preconditioner Construction, Preconditioner Factoring, Preconditioned CG Iterations, Multiply With A, and Solve With Precond, plus total CP and elapsed solver times.

What's Possible with Parallel Computing? New Boundaries: 111 MDOF
The enablers:
- Size is conquered by addressing big memory
- Time is conquered by parallel computing
The measure of success:
- Demonstrated solving a variety of large problems in hours, NOT days
- Solved a 111 MDOF structural problem in 8.6 hours of solver time

What's Possible with Parallel Computing? New Boundaries: 111 MDOF
CAE Breakthrough: A Customer's Perspective
"ANSYS' ability to solve models this large opens the door to an entirely new simulation paradigm. … Now, it will be possible to simulate a detailed, complete model directly; potentially shortening design time from months to weeks. … This may greatly reduce additional design costs and can provide an even shorter time to market."
Jin Qian, Senior Analyst, Deere & Company Technical Center

What's Possible with Parallel Computing? New Boundaries: 32-bit, 10.5 MDOF
Inventor test case model:
- 3,500,550 nodes
- 2,405,636 elements
- 18,993 DOF constraints
- 10.5 million degrees of freedom!
System resources:
- 11 Xeon CPUs on a Linux Networx cluster
- Machine #1 used 2.4 GB of memory (/3GB switch)
- Total memory: 9.0 GB
- Linux 32-bit operating system

What's Possible with Parallel Computing? New Boundaries: 32-bit, 10.5 MDOF
Distributed ANSYS and DPCG solve 10.5 MDOFs!
- 11-CPU run on a Linux Networx cluster with 32-bit Linux
- 1.05 hours solver time
- 11 distributed MPI processes
- Using the MSAVE,ON memory-saving option
PCG solver output (numeric values did not survive the transcript): Number of Hosts/Processors: 11; the timing breakdown covered Element Matrix Assembly, Preconditioner Construction, Preconditioned CG Iterations, Multiply With A, and Solve With Precond, plus total CP and elapsed solver times.