Impact of the Cardiac Heart Flow Alpha Project
Kathy Yelick, EECS Department, U.C. Berkeley

Outline
Vision of a Digital Human
Applications of the IB Method
–The Heart Model
–The Cochlea Model
–Others
Overview of the Immersed Boundary Method
The Alpha Project
–Solvers
–Automatic tuning (FFT vs. MG)
–Heart model
Short-term future directions

Simulation: The Third Pillar of Science
Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build the system.
Limitations:
–Too difficult -- building large wind tunnels.
–Too expensive -- building a throw-away passenger jet.
–Too slow -- waiting for climate or galactic evolution.
–Too dangerous -- drug design.
Computational science paradigm:
3) Use high performance computer systems to simulate the phenomenon.

Economics of Large Scale Simulation
Automotive design:
–Crash and aerodynamics simulation (500+ CPUs).
–Savings: approx. $1 billion per company per year.
Semiconductor industry:
–Device simulation and logic validation (500+ CPUs).
–Savings: approx. $1 billion per company per year.
Airlines:
–Logistics optimization on parallel systems.
–Savings: approx. $100 million per airline per year.
Securities industry:
–Home mortgage investment and risk analysis.
–Savings: approx. $15 billion per year.
What about health care, which is 20% of GNP?
Source: David Bailey, LBNL

From Visible Human to Digital Human
(Image sources: John Sullivan et al., WPI; “Building 3D Models from images”)

Heart Simulation Calculation
Developed by Peskin and McQueen at NYU
–Done on a Cray C90: 1 heart-beat in 100 hours
–Used for evaluating artificial heart valves
–Scalable parallel version done here
Implemented in a high performance Java dialect
–Model also used for: inner ear, blood clotting, embryo growth, insect flight, paper making

Simulation of a Heart

Simulation and Medicine
Imagine a “digital body double”
–3D image-based medical record
–Includes diagnostic, pathologic, and other information
Used for:
–Diagnosis
–Less invasive surgery-by-robot
–Experimental treatments
Where are we today?

Digital Human Roadmap
(Roadmap figure: from 1 organ / 1 model, via scalable implementations, to 1 organ / multiple models; via 3D model construction and better algorithms, to multiple organs; and on to coupled organ-system models, enabled by 100x effective performance and improved programmability.)

Last Year

Project Summary
Provide an easy-to-use, high performance tool for simulation of fluid flow in biological systems
–Using the Immersed Boundary Method
Enable simulations on large-scale parallel machines
–Distributed memory machines, including SMP clusters
–Using Titanium, ADR, and KeLP with AMR
Specific demonstration problem: simulation of the heart model on Blue Horizon

Outline
Short term goals and plans
Technical status of project
–Immersed Boundary Method
–Software Tools
–Solvers
Next Steps

Short Term Goals for October 2001
IB Method written in Titanium (IBT)
IBT simulation on distributed memory
Heart model input and visualization support in IBT
Titanium running on Blue Horizon
IBT users on BH and other SPs
? Performance tuning of code to exceed T90 performance
? Replace solver with (adaptive) multigrid

IB Method Users
Peskin and McQueen at NYU
–Heart model, including valve design
At Washington
–Insect flight
Fauci et al. at Tulane
–Small animal swimming
Peter Kramer at RPI
–Brownian motion in the IBM
John Stockie at Simon Fraser
–Paper making
Others
–Parachutes, flags, flagellates, robot insects

Building a User Community
Many users of the IB Method
Lots of concern over the lack of a distributed memory implementation
Once IBT is more robust and efficient (May ’01), advertise to users
Identify 1 or 2 early adopters
Longer term: workshop or short course

Long Term Software Release Model
Titanium
–Working with UPC and possibly others on common runtime layer
–Compiler is relatively stable but needs ongoing support
IB Method
–Release Titanium source code
–Parameterized “black box” for IB Method with possible cross-language support
Visualization software is tied to SGI

Immersed Boundary Method
Developed at NYU by Peskin & McQueen to model biological systems where elastic fibers are immersed in an incompressible fluid
–Fibers (e.g., heart muscles) modeled by a list of fiber points
–Fluid space modeled by a regular lattice

Immersed Boundary Method Structure
4 steps in each timestep, coupling the fiber points with the fluid lattice:
–Fiber activation & force calculation (on the fiber points)
–Spread force (fiber points → fluid lattice)
–Navier-Stokes solver (on the fluid lattice)
–Interpolate velocity (fluid lattice → fiber points)
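The slides describe this four-step loop but do not show code for it. Below is a minimal, self-contained 2D sketch in Python/NumPy, written for illustration only; it is not the project's Titanium implementation. The fiber model is a simple closed loop of springs, the "fluid solve" is replaced by a forcing-plus-diffusion update rather than the full Navier-Stokes equations, and spreading/interpolation use plain bilinear weights instead of Peskin's delta function.

```python
# Minimal 2D immersed boundary timestep sketch (illustrative only; the
# actual project code is written in Titanium and solves 3D Navier-Stokes).
import numpy as np

N = 64          # fluid lattice is N x N, periodic, spacing h
h = 1.0 / N
dt = 1e-3
k_spring = 1.0  # fiber stiffness (made-up value)

# Fiber points: a closed loop of M points (a circle in the unit box)
M = 100
theta = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
X = 0.5 + 0.25 * np.column_stack((np.cos(theta), np.sin(theta)))

u = np.zeros((N, N, 2))  # fluid velocity on the lattice


def fiber_forces(X):
    """Step 1: fiber activation & force calculation (simple springs)."""
    return k_spring * (np.roll(X, -1, axis=0) - 2.0 * X + np.roll(X, 1, axis=0))


def spread_force(X, F):
    """Step 2: accumulate each fiber force onto the 4 surrounding lattice
    points with bilinear weights."""
    f = np.zeros((N, N, 2))
    for x, force in zip(X, F):
        i0, j0 = int(x[0] / h) % N, int(x[1] / h) % N
        ax, ay = x[0] / h - int(x[0] / h), x[1] / h - int(x[1] / h)
        for di, wx in ((0, 1 - ax), (1, ax)):
            for dj, wy in ((0, 1 - ay), (1, ay)):
                f[(i0 + di) % N, (j0 + dj) % N] += wx * wy * force
    return f


def fluid_step(u, f):
    """Step 3: stand-in for the Navier-Stokes solver (forcing + diffusion)."""
    nu = 0.01
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u) / h**2
    return u + dt * (f + nu * lap)


def interpolate_velocity(u, X):
    """Step 4: interpolate lattice velocity back to the fiber points."""
    U = np.zeros_like(X)
    for n, x in enumerate(X):
        i0, j0 = int(x[0] / h) % N, int(x[1] / h) % N
        ax, ay = x[0] / h - int(x[0] / h), x[1] / h - int(x[1] / h)
        for di, wx in ((0, 1 - ax), (1, ax)):
            for dj, wy in ((0, 1 - ay), (1, ay)):
                U[n] += wx * wy * u[(i0 + di) % N, (j0 + dj) % N]
    return U


for step in range(10):                 # a few timesteps
    F = fiber_forces(X)                # 1. fiber activation & force calculation
    f = spread_force(X, F)             # 2. spread force to the fluid lattice
    u = fluid_step(u, f)               # 3. (simplified) fluid solve
    U = interpolate_velocity(u, X)     # 4. interpolate velocity to fiber points
    X = (X + dt * U) % 1.0             # move the fibers with the local fluid
```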

Challenges to Parallelization
Irregular fiber lists need to interact with the regular fluid lattice
–Trade-off between load balancing of fibers and minimizing communication
–Efficient “scatter-gather” across processors
Need a scalable elliptic solver
–Plan to use multigrid
–Eventually add Adaptive Mesh Refinement
–New algorithms under development by Colella’s group
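To make the load-balancing trade-off concrete, here is a small, hypothetical Python sketch (not the project's code) that assigns each fiber point to the processor owning the fluid block beneath it under a simple 2D block decomposition. With fibers clustered in one region, most processors receive no fiber work; balancing the fibers separately would even out that work but turn force spreading and velocity interpolation into remote scatter-gather operations.

```python
# Hypothetical illustration of the fiber/fluid partitioning trade-off.
# Owner-computes on the fluid lattice: each fiber point is handled by the
# processor owning the fluid block underneath it, so an uneven fiber
# distribution (here, a small disc) produces load imbalance.
import numpy as np
from collections import Counter

N = 128            # fluid lattice is N x N
P = 4              # processors arranged as a P x P block grid
block = N // P

# Fiber points clustered in one region of the domain
M = 10000
theta = np.random.uniform(0.0, 2.0 * np.pi, M)
r = 0.1 * np.sqrt(np.random.uniform(0.0, 1.0, M))
X = np.column_stack((0.3 + r * np.cos(theta), 0.3 + r * np.sin(theta)))

# Map each fiber point to the owner of its fluid cell
cells = np.floor(X * N).astype(int) % N
owners = (cells[:, 0] // block) * P + (cells[:, 1] // block)

counts = Counter(owners.tolist())
for p in range(P * P):
    print(f"processor {p:2d}: {counts.get(p, 0):6d} fiber points")
# Most processors get zero fiber points. Balancing fibers separately would
# fix this, but then force spreading and velocity interpolation become
# remote scatter-gather operations across processor boundaries.
```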

Tools used for Implementation
Titanium supports
–Classes, linked data structures, overloading
–Distributed data structures (global address space)
–Useful for planned adaptive hierarchical structures
ADR provides
–Help with analysis and organization of output
–Especially for hierarchical data
KeLP provides
–An alternative programming model for solvers
ADR and KeLP are not critical for the first year

Titanium Status
Titanium runs on uniprocessors, SMPs, and distributed memory machines with a single programming model
It has run on Blue Horizon
–Issues related to communication balance
–Revamped backends are more organized, but the BH backend is not working right now
Need to replace personnel

Solver Status
The current solver is based on a 3D FFT
Multigrid might be more scalable
Multigrid with adaptive meshes might be more scalable still
The Balls and Colella algorithm could also be used
KeLP implementations of solvers are included
Note: McQueen is looking into solver issues for numerical reasons unrelated to scaling
Not critical for first-year goals
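The solver code itself is not shown in the slides. As generic background (not the project's spectral solver), the NumPy sketch below shows the principle behind an FFT-based solve of a discrete Poisson equation on a periodic lattice: transform the right-hand side, divide by the eigenvalues of the discrete Laplacian, and transform back.

```python
# Generic FFT-based solve of the discrete Poisson equation lap(phi) = rho
# on a periodic N^3 lattice (illustrative; not the project's solver).
import numpy as np

N = 32
h = 1.0 / N
x = np.arange(N) * h
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")

# A smooth periodic right-hand side with zero mean
rho = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y) * np.sin(2 * np.pi * Z)

# Eigenvalues of the standard 7-point Laplacian under the DFT
k = 2 * np.pi * np.fft.fftfreq(N, d=h)          # angular wavenumbers
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
eig = ((2 * np.cos(KX * h) - 2) + (2 * np.cos(KY * h) - 2) +
       (2 * np.cos(KZ * h) - 2)) / h**2

rho_hat = np.fft.fftn(rho)
phi_hat = np.zeros_like(rho_hat)
nonzero = eig != 0
phi_hat[nonzero] = rho_hat[nonzero] / eig[nonzero]   # zero mode left at 0
phi = np.real(np.fft.ifftn(phi_hat))

# Check: applying the 7-point Laplacian to phi should recover rho
lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
       np.roll(phi, 1, 1) + np.roll(phi, -1, 1) +
       np.roll(phi, 1, 2) + np.roll(phi, -1, 2) - 6 * phi) / h**2
print("max residual:", np.max(np.abs(lap - rho)))
```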

IB Titanium Status
Generic IB method rewritten in Titanium; running since October
Contractile torus
–Runs on the Berkeley NOW and SGI Origin
Needed for the heart:
–Input file format
–Performance tuning
  Uniprocessor (C code used temporarily in 2 kernels)
  Communication

Immersed Boundary on Titanium
(Chart: performance breakdown of the torus simulation.)

Immersed Boundary on Titanium

Next Steps
Improve performance of IBT
Generate heart input for IBT
Recover Titanium on BH
Identify early user(s) of IBT
Improve the NS solver
Add functionality (bending angles, anchorage points, sources & sinks) to the software package

Adaptive Computations for Fluids in Biological Systems
Yelick (UCB), Peskin (NYU), Colella (LBNL), Baden (UCSD), Saltz (Maryland)
(Project structure diagram)
Immersed Boundary Method Applications: Human Heart (NYU), Embryo Growth (UCB), Blood Clotting (Utah), Robot Insect Flight (NYU), Pulp Fibers (Waterloo)
Application Models: Heart (Titanium), Insect Wings, Flagellate Swimming, …
Extensible Simulation: Generic Immersed Boundary Method (Titanium)
Solvers: Spectral (Titanium), Multigrid (KeLP), AMR

General Questions
- How has your project addressed the goals of the PACI program (providing access to traditional HPC, providing early access to experimental systems, fostering interdisciplinary research, contributing to intellectual development, broadening the base)?
- What infrastructure products (e.g., software, algorithms, etc.) have you produced?
- Where have you deployed them (on NPACI systems, other systems)?
- What have you done to communicate the availability of this infrastructure?
- What training have you done?
- What kind/size of community is using your infrastructure?
- How have you integrated your work with EOT activities?
- What scientific accomplishments - or other measurable impacts not covered by answers to previous questions - have resulted from its use?
- What are the emerging trends/technologies that NPACI should build on/leverage?
- How can we increase the impact of NPACI development to date?
- How can we increase the community that uses the infrastructure you've developed?

Greg’s Slides

Scallop
A latency-tolerant elliptic solver library
Implemented in KeLP, with a simple interface
Still under development

Elliptic solvers
Finite-difference based solvers
–Good for regular, block-structured domains
Method of Local Corrections
–Local solutions corrected by a coarse solution
–Good accuracy, well-conditioned solutions
Limited communication
–Once to generate coarse grid values
–Once to correct local solutions

KeLP implementation
Advantages
–Abstractions available in C++
–Built-in domain calculus
–Communication management
–Numerical kernels written in Fortran
Simple interface
–Callable from other languages
–No KeLP required in user code

A Finite Difference Domain Decomposition Method Using Local Corrections for the Poisson Equation
Greg Balls
University of California, Berkeley

The Poisson Equation
We are interested in the solution φ of the Poisson equation Δφ = ρ.
A particular solution to this equation is given by the convolution of ρ with the free-space Green's function.
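As standard background (stated here for reference, not taken from the slides), the convolution form and the free-space Green's functions in two dimensions (the setting of this method) and three dimensions are:

```latex
% Free-space solution of  \Delta\varphi = \rho  by convolution with the
% Green's function G (standard potential theory).
\[
  \varphi(\mathbf{x}) \;=\; \int G(\mathbf{x}-\mathbf{y})\,\rho(\mathbf{y})\,d\mathbf{y},
  \qquad
  G(\mathbf{x}) \;=\;
  \begin{cases}
    \dfrac{1}{2\pi}\,\ln|\mathbf{x}| & \text{in 2D},\\[1ex]
    -\dfrac{1}{4\pi\,|\mathbf{x}|}   & \text{in 3D}.
  \end{cases}
\]
```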

Infinite Domain Boundary Conditions
We can write our infinite domain boundary condition as φ(x) → (R/2π) ln|x| as |x| → ∞, where R = ∫ρ dx.
These boundary conditions specify a unique solution.

The Discretized Problem
We would like an approximate solution φ^h, defined on a grid with spacing h, that agrees with φ to O(h^2).

Solving the Discretized Problem
We could calculate the convolution integral at each point, but this costs O(N^2) work for N grid points.
Multigrid provides a faster method.

A Standard Finite Difference Discretization
With a discretization Δ^h of the Laplacian (e.g., the standard 5-point stencil, or the more accurate 9-point operator L^9 used here), we solve the discretized equation Δ^h φ^h = ρ.
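As a quick, hypothetical check of what "solving the discretized equation" means (NumPy, not tied to the L^9 stencil used in the method), the snippet below applies the standard 5-point Laplacian to a known smooth function and confirms the expected O(h^2) truncation error:

```python
# Apply the standard 5-point discrete Laplacian to a known function and
# verify second-order accuracy (illustrative; the method itself uses a
# more accurate 9-point stencil).
import numpy as np

def laplacian_5pt(phi, h):
    """Standard 5-point stencil on the interior of a 2D grid."""
    lap = np.zeros_like(phi)
    lap[1:-1, 1:-1] = (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                       phi[1:-1, 2:] + phi[1:-1, :-2] -
                       4.0 * phi[1:-1, 1:-1]) / h**2
    return lap

for N in (32, 64, 128):
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    phi = np.sin(np.pi * X) * np.sin(np.pi * Y)          # exact potential
    rho = -2.0 * np.pi**2 * phi                          # exact Laplacian
    err = np.max(np.abs(laplacian_5pt(phi, h)[1:-1, 1:-1] - rho[1:-1, 1:-1]))
    print(f"N = {N:4d}   max truncation error = {err:.3e}")
# The error drops by roughly 4x for each doubling of N, i.e., O(h^2).
```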

A Finite Difference Approach for the Infinite Domain Problem
A discrete solution can be found in three steps:
1. Solve a multigrid problem with homogeneous Dirichlet boundary conditions.
2. Do a potential calculation to set accurate inhomogeneous Dirichlet boundary conditions.
3. Solve a second multigrid problem with these boundary conditions.

A Finite Difference Approach for the Infinite Domain Problem
The first multigrid solution:

A Finite Difference Approach for the Infinite Domain Problem
The potential calculation:

A Finite Difference Approach for the Infinite Domain Problem
The second multigrid solution:

Domain Decomposition
We would like to solve this problem in parallel, calculating φ^h such that Δ^h φ^h = ρ.
A basic domain decomposition strategy, repeated until converged:
–Break into pieces.
–Solve on each piece.
–Compute coupling.

Domain Decomposition Options
Point relaxation
–Too much communication and too much computation.
Multigrid
–Less computation, but still too much communication.
Finite element domain decomposition
–Less communication, but still iterative.

The Importance of Communication
Current parallel machines can do many floating point operations in the time that it takes to send a message.
This imbalance will get worse.

Fast Particle Methods
Methods such as the Fast Multipole Method (FMM) and the Method of Local Corrections (MLC) need no iteration.
They take advantage of the fact that the local and far-field solutions are only weakly coupled.

A Method of Local Corrections for Finite Difference Calculations
The basic strategy:
–Break into pieces.
–Solve on each piece.
–Compute coupling through a single coarse solution.
–Compute the corrected solution on each piece.

The Initial Solution
An infinite-domain solution φ_l is found on each piece l.
The effects of all other pieces are ignored.

A Coarse Grid Charge
A coarse grid charge is computed for each piece.

The Global Coarse Solution
All the individual coarse grid charges are combined on a global coarse grid.
A global coarse solution is found.

Setting Accurate Boundary Conditions
The interpolation stencil only interpolates far-field information.

Setting Accurate Boundary Conditions
The coarse stencil information is interpolated to a corresponding fine grid stencil to O(H^4).
Local information is added from nearby fine grids.

The Corrected Solution
Once the boundary conditions have been set for each piece, we solve one last time with multigrid on each piece.
The full solution is then the union of these corrected local solutions.

How Accuracy Is Maintained
Local error is only O(h^2).
Error in the global coarse solution is O(H^4).
The coarse solution is accurate to O(H^4) because of the error of the L^9 discretization.

Scaling for Accuracy and Performance
We can scale the coarse and fine grids together so that accuracy is maintained while the coarse grid solve represents much less work than the work done on the fine grids.

The Titanium Programming Language
Titanium is a new language designed for scientific computing on parallel architectures.
–SPMD parallelism.
–A dialect of Java, compiled to native code.
–Language support for scientific computing.

The Benefits of Titanium
An object-oriented language with built-in support for fast, multi-dimensional arrays.
Language support for
–Tuples (Points).
–Rectangular regions (RectDomains).
–Expressing array bounds as RectDomains and indexing arrays by Points.
A global address space.

Accuracy of the Method
(Table: grid size, N_p, N_r, max error, L^2 error, max-norm convergence rate, and L^2 convergence rate; errors are on the order of 10^-8.)

Error on a Large, High-Wavenumber Problem

Scalability of the Method
(Chart: results from the SDSC IBM SP-2.)

Scalability of the Method
(Chart: results from the NERSC Cray T3E.)

Future Work
Extension to three dimensions.
Implementation of different boundary conditions.
Use in other solvers, such as:
–Euler.
–Navier-Stokes.

Conclusions
The method is second-order accurate.
The method does not iterate between the local fine representations and the global coarse grid.
The need for communication is kept to a minimum.
The method is scalable.

Comparison to the Serial Method
Extra computational costs:
–The time spent on the coarse grid solution can be kept to less than 10% of the time spent on the local fine grids.
–The final multigrid solution adds 40% more fine grid work.
Communication costs:
–Experimentally, less than 1% of the total time.