Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems
Immersed Boundary Method Simulation in Titanium

Objectives
Provide an easy-to-use, high-performance tool for simulating fluid flow in biological systems.
Demonstrate the Titanium compiler and language.
Enable heart simulation on large-scale parallel machines.

Outline
Immersed Boundary Method
Titanium
Immersed Boundary Method in Titanium

Immersed Boundary Method
Developed at New York University by Peskin & McQueen to model biological systems in which elastic fibers are immersed in an incompressible fluid.
  – Examples: mammalian heart, blood platelets, sea urchin embryos
Fibers (e.g., heart muscles) are modeled by a list of fiber points.
The fluid space is modeled by a regular lattice.
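A minimal sketch of the two data structures the slide describes, written in Python with NumPy (the software described here is in Titanium). The class names `Fiber` and `FluidLattice`, the periodic cubic box, and the `(3, n, n, n)` velocity layout are assumptions for illustration only.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Fiber:
    """An elastic fiber, stored as an ordered list of fiber points."""
    points: np.ndarray  # shape (N, 3): one row per fiber point


@dataclass
class FluidLattice:
    """Fluid on a regular n x n x n lattice covering a periodic box of edge L."""
    n: int
    L: float
    u: np.ndarray = field(init=False)  # shape (3, n, n, n): velocity components

    def __post_init__(self):
        self.u = np.zeros((3, self.n, self.n, self.n))

    @property
    def h(self) -> float:
        """Lattice spacing."""
        return self.L / self.n

    def cell_of(self, x) -> tuple:
        """Index of the lattice cell containing point x, wrapped periodically."""
        return tuple(int(np.floor(c / self.h)) % self.n for c in x)
```

The irregular part (fiber points) is a flat list whose length depends on the tissue, while the regular part (the lattice) has a fixed shape; `cell_of` is the link between the two that later steps rely on.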

Immersed Boundary Method Structure
Four steps in each timestep, coupling the fiber points and the fluid lattice:
  1. Fiber activation & force calculation
  2. Spread force (fiber points → fluid lattice)
  3. Navier-Stokes solver
  4. Interpolate velocity (fluid lattice → fiber points)
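The four-step structure can be written as a single timestep driver. This is a hedged sketch: the function name `ib_timestep` and the choice to pass each step in as a callable are assumptions, made so the skeleton stays independent of any particular force law or fluid solver.

```python
import numpy as np


def ib_timestep(X, u, dt, compute_forces, spread, solve_fluid, interpolate):
    """Advance fiber points X (N x 3) and lattice velocity u by one timestep.

    Each argument after dt is a callable implementing one step of the method.
    """
    F = compute_forces(X)          # 1. fiber activation & force calculation
    f = spread(F, X)               # 2. spread force: fiber points -> fluid lattice
    u = solve_fluid(u, f, dt)      # 3. Navier-Stokes solve on the fluid lattice
    U = interpolate(u, X)          # 4. interpolate velocity: lattice -> fiber points
    X = X + dt * U                 #    ... and move the fiber points
    return X, u
```

Only steps 2 and 4 touch both data structures, which is why they carry most of the communication in a parallel version.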

IB Method: Steps 1 and 2
Fiber activation & force calculation
  – Application-specific
  – For the heart, uses an elastic spring law
Spread force
  – Spread forces from the fiber point list to the fluid lattice via a discrete approximation of the Dirac delta function
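A sketch of steps 1 and 2 for a single closed fiber. It assumes a linear spring between consecutive fiber points (stiffness `k` and rest length `L0` are hypothetical parameters) and Peskin's standard four-point regularized delta function on a periodic lattice, so the spread force density is f(x) = Σ_i F_i δ_h(x − X_i) with δ_h(x) = h⁻³ φ(x/h) φ(y/h) φ(z/h). The actual heart model's force law is application-specific and may differ.

```python
import numpy as np


def spring_forces(X, k, L0):
    """Elastic force on each point of a closed fiber (ring) of linear springs."""
    d = np.roll(X, -1, axis=0) - X                 # vector from point i to point i+1
    length = np.linalg.norm(d, axis=1, keepdims=True)
    T = k * (length - L0) * d / length             # tension force of segment i on point i
    return T - np.roll(T, 1, axis=0)               # segment i pulls i, segment i-1 pulls back


def phi(r):
    """Peskin's four-point kernel; sum_j phi(r - j) = 1 for every r."""
    r = np.abs(r)
    out = np.zeros_like(r)
    a = r < 1
    b = (r >= 1) & (r < 2)
    out[a] = (3 - 2 * r[a] + np.sqrt(1 + 4 * r[a] - 4 * r[a] ** 2)) / 8
    out[b] = (5 - 2 * r[b] - np.sqrt(-7 + 12 * r[b] - 4 * r[b] ** 2)) / 8
    return out


def spread(F, X, n, h):
    """Spread fiber forces F (N x 3) at points X onto a periodic n^3 lattice."""
    assert n >= 4, "the 4-point kernel needs at least 4 lattice points per axis"
    f = np.zeros((3, n, n, n))
    for Xi, Fi in zip(X, F):
        s = Xi / h
        base = np.floor(s).astype(int) - 1
        idx = [base[d] + np.arange(4) for d in range(3)]        # 4 nodes per axis
        w = [phi(s[d] - idx[d]) for d in range(3)]              # 1D weights
        w3 = np.einsum("i,j,k->ijk", w[0], w[1], w[2]) / h**3   # delta_h weights
        cells = np.ix_(*(i % n for i in idx))                   # periodic wrap
        for c in range(3):
            f[c][cells] += Fi[c] * w3
    return f
```

Because φ sums to one over the lattice, spreading conserves total force: summing f over the lattice and multiplying by h³ recovers Σ_i F_i.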

IB Method: Steps 3 and 4
Navier-Stokes solver
  – Calculates fluid velocities
  – Uses a 3D FFT
Interpolate velocity
  – Gather the velocities of the fiber points from the fluid lattice via the discrete Dirac delta function
  – Move the fiber points
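A sketch of steps 3 and 4. For brevity the fluid step solves only the linear, unsteady Stokes part of the Navier-Stokes equations (the nonlinear advection term is omitted) with spectral derivatives on a periodic box. In Fourier space the implicit viscous solve and the projection onto divergence-free fields become pointwise: û_new(k) = P(k)[û(k) + (dt/ρ) f̂(k)] / (1 + dt μ |k|²/ρ), with P(k) = I − k kᵀ/|k|². This simplification, the function names, and the parameters `rho` and `mu` are assumptions, not the IBGS solver. Interpolation uses Peskin's four-point kernel φ, the same kernel used for spreading, so it is the adjoint of that step.

```python
import numpy as np


def fluid_step(u, f, dt, rho, mu, L):
    """One implicit unsteady-Stokes step on a periodic box (advection omitted).

    u, f: arrays of shape (3, n, n, n) holding velocity and force density.
    """
    n = u.shape[1]
    k1 = 2 * np.pi * np.fft.fftfreq(n, d=L / n)             # angular wavenumbers
    k = np.stack(np.meshgrid(k1, k1, k1, indexing="ij"))    # shape (3, n, n, n)
    k2 = (k**2).sum(axis=0)
    rhs = np.fft.fftn(u + (dt / rho) * f, axes=(1, 2, 3))
    if n % 2 == 0:                  # drop Nyquist modes, whose derivative is ambiguous
        m = n // 2
        rhs[:, m, :, :] = 0
        rhs[:, :, m, :] = 0
        rhs[:, :, :, m] = 0
    kdotr = (k * rhs).sum(axis=0)
    rhs -= k * kdotr / np.where(k2 == 0, 1.0, k2)           # projection: div u = 0
    rhs /= 1 + dt * mu / rho * k2                           # implicit viscous term
    return np.real(np.fft.ifftn(rhs, axes=(1, 2, 3)))


def phi(r):
    """Peskin's four-point kernel."""
    r = np.abs(r)
    out = np.zeros_like(r)
    a = r < 1
    b = (r >= 1) & (r < 2)
    out[a] = (3 - 2 * r[a] + np.sqrt(1 + 4 * r[a] - 4 * r[a] ** 2)) / 8
    out[b] = (5 - 2 * r[b] - np.sqrt(-7 + 12 * r[b] - 4 * r[b] ** 2)) / 8
    return out


def interpolate_and_move(u, X, dt, h):
    """Gather lattice velocity at fiber points X (N x 3), then move the points."""
    n = u.shape[1]
    U = np.zeros_like(X)
    for i, Xi in enumerate(X):
        s = Xi / h
        base = np.floor(s).astype(int) - 1
        idx = [base[d] + np.arange(4) for d in range(3)]
        w = [phi(s[d] - idx[d]) for d in range(3)]
        w3 = np.einsum("i,j,k->ijk", w[0], w[1], w[2])      # h^3 * delta_h weights
        cells = np.ix_(*(j % n for j in idx))
        U[i] = [(u[c][cells] * w3).sum() for c in range(3)]
    return X + dt * U, U
```

Zeroing the Nyquist modes keeps the spectrum Hermitian-symmetric after projection, so the returned velocity is real and discretely divergence-free.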

Challenges to Parallelization
Irregular fiber lists need to interact with a regular fluid lattice.
  – Trade-off between load balancing of fibers and minimizing communication
Efficient "scatter-gather" across processors
Need a scalable elliptic solver
  – Plan to use multigrid
  – Eventually add Adaptive Mesh Refinement
  – New algorithms under development at LBNL
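The trade-off between load balance and communication can be made concrete with a small counting sketch. It assumes, hypothetically, that the fluid lattice is split into `P` equal slabs along x, one per processor. Assigning each fiber point to the owner of its slab keeps spreading and interpolation local but can leave processors with very different fiber counts; dealing the fiber list out in equal contiguous chunks balances the counts but makes some points remote from the lattice data they touch.

```python
import numpy as np


def compare_partitions(x, L, P):
    """Compare two ways of assigning fiber points to P processors.

    x: x-coordinates of the fiber points in [0, L); the lattice is assumed to be
    split into P equal slabs along x, slab p owned by processor p.
    Returns (imbalance, remote): imbalance = max load / mean load when each point
    goes to its slab's owner; remote = points off their slab's owner when the
    fiber list is split into equal contiguous chunks instead.
    """
    N = len(x)
    slab = np.minimum((np.asarray(x) / L * P).astype(int), P - 1)
    # Strategy 1: owner computes -> no remote points, but loads may differ.
    loads = np.bincount(slab, minlength=P)
    imbalance = loads.max() / (N / P)
    # Strategy 2: equal chunks -> balanced loads, but remote points need messages.
    chunk_owner = np.arange(N) * P // N
    remote = int(np.count_nonzero(chunk_owner != slab))
    return imbalance, remote
```

For six points of which five lie in the left half of the box, owner-computes gives an imbalance of 5/3 with two processors, while equal chunks leave two points on the wrong processor.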

Heart Application of the IB Method
Heart simulation is used to design artificial heart valves.

Outline
Immersed Boundary Method
Titanium
Immersed Boundary Method on Titanium

Titanium Motivation
Applications are increasingly complex
  – Want classes, overloading, linked data structures
  – C++ is hard to read, modify, and tune
Machines are increasingly complex
  – Want compiler help for optimizations
  – Want a clear performance model and programmer control
Java is a better C++
  + Safe: strongly typed, garbage collected
  – Performance is poor due to

Titanium for Scientific Computing
Java dialect for high performance
Added constructs for performance & expressiveness:
  – Immutable value classes
  – SPMD parallelism with a global address space
  – Multidimensional arrays
  – Templates
  – Region-based memory management
Compiled to C (no JVM) with lightweight messaging (Active Messages, LAPI, shmem)

SPMD Parallelism in Titanium
Explicitly parallel model:
  – Fixed number of threads at program startup
  – Usually one thread per processor
Global address space:
  – Processors can access remote data by reading and writing through "global" references (pointers)
  – Bulk communication happens when copying arrays
  – Compiler automatically converts global pointers to local ones (up to 2x speedup for "LQI")
  – Compiler detects synchronization bugs in barriers
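The SPMD pattern can be imitated, loosely, with Python threads: a fixed team of workers is started once, each knows its own id, all of them can read a shared list (standing in for the global address space), and a barrier separates the phases. This is only an analogy under those assumptions; Python threads share one address space and do not model the cost of remote access, and `spmd_sum` is a hypothetical name.

```python
import threading


def spmd_sum(data, nthreads):
    """Sum `data` with a fixed team of threads that synchronize at a barrier."""
    partial = [0] * nthreads              # shared: every thread can read every slot
    result = {}
    barrier = threading.Barrier(nthreads)

    def worker(me):
        # Phase 1: each thread sums its own contiguous block.
        lo = me * len(data) // nthreads
        hi = (me + 1) * len(data) // nthreads
        partial[me] = sum(data[lo:hi])
        barrier.wait()                    # no thread reads partial[] before all writes
        # Phase 2: thread 0 reads the other threads' results.
        if me == 0:
            result["total"] = sum(partial)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]
```

Removing the `barrier.wait()` call is exactly the kind of synchronization bug the slide mentions: thread 0 could then read slots that other threads have not yet written.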

Value Classes in Titanium
Java has two distinct kinds of values:
  – Primitive scalar types: boolean, double, int, etc.
  – Objects: user-defined and library types, accessed through an implicit level of indirection (a pointer to the object)
Titanium adds support for small objects:
  – Look like classes with the "immutable" keyword
  – Stored in place and passed by copying
  – Examples: a Complex type; the Points used to index Titanium arrays
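Python has no unboxed value types, but a frozen dataclass captures the two properties the slide relies on: instances cannot be modified after construction, and equality compares contents rather than identity. A minimal sketch with hypothetical `Complex` and `Point` classes; unlike Titanium immutable classes, these objects are still heap-allocated and passed by reference.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Complex:
    re: float
    im: float

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)


@dataclass(frozen=True)
class Point:
    """A tuple of ints used as an array index, in the spirit of Titanium Points."""
    coords: tuple

    def __add__(self, other):
        return Point(tuple(a + b for a, b in zip(self.coords, other.coords)))
```

For example, `Complex(1.0, 2.0) * Complex(3.0, 4.0)` evaluates to `Complex(re=-5.0, im=10.0)`, and assigning to `.re` raises `FrozenInstanceError`. Frozen dataclasses are also hashable, so a `Point` can serve as a dictionary key.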

Titanium Arrays
Java arrays are 1-dimensional
  – Arrays of arrays are inefficient
Titanium adds multidimensional arrays
  – Indexed by Points (tuples of ints)
  – Algebra over Domains (sets of points)
Helps with hierarchical algorithms, e.g., multigrid
  – One array may be a subarray of another, e.g., a is the interior of b, or a is all even elements of b
  – Foreach loops help the compiler optimize arrays
Within 2x of C for multigrid kernels
  – Bulk I/O provided on arrays (2x-40x speedup!)
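NumPy views behave much like the subarrays described above: slicing creates a new array object that shares storage with its parent, so writing through an "interior" or "even elements" view updates the parent. This is a sketch with NumPy standing in for Titanium arrays; a rectangular Domain is approximated by a hypothetical `rect_domain` generator of index tuples, used like a foreach loop.

```python
import itertools

import numpy as np


def rect_domain(lo, hi):
    """All integer points p with lo <= p < hi componentwise (a rectangular domain)."""
    return itertools.product(*(range(a, b) for a, b in zip(lo, hi)))


b = np.zeros((6, 6))
interior = b[1:-1, 1:-1]   # "a is the interior of b": a view, not a copy
evens = b[::2, ::2]        # "a is all even elements of b": also a view

interior[...] = 1.0        # writes go straight into b
for p in rect_domain((0, 0), evens.shape):   # foreach-style loop over a domain
    evens[p] += 10.0
```

After these two updates `b` holds 1 on the 16 interior points and an extra 10 on the 9 even points, four of which overlap the interior.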

Titanium with Other Languages
Native methods are sometimes useful:
  – Performance: allow comparisons with other compilers and give additional control
  – Libraries: have interfaced to other systems such as PETSc and ParMETIS
Requires understanding of the underlying Titanium implementation in C.
Lower entry cost than Java: the native method is simply #included into the generated code.

Titanium Implementation
Run-time system and compiler for:
  – Uniprocessors
  – SMPs running POSIX threads
  – Clusters with:
    Shared memory: SGI Origin cluster (ANL), Tera MTA
    Global address space: T3E (NERSC)
    Active Messages: NOW & Millennium (UCB)
    LAPI: IBM SP2, SP3 (SDSC)

Outline
Immersed Boundary Method
Titanium
Immersed Boundary Method on Titanium

Immersed Boundary Generic Software
Written by Cowen at NYU
Implements a subset of the IB method adequate for the heart
Runs on vector machines with shared memory

Immersed Boundary on Titanium
IBGS rewritten in Titanium
Running since October
Contractile torus
  – Runs on Berkeley NOW and SGI Origin
Needed for heart:
  – Input file format
  – Performance tuning
    Uniprocessor (C code used temporarily in 2 kernels)
    Communication

Visualization

Immersed Boundary in Titanium
Performance breakdown (torus simulation)

Immersed Boundary in Titanium

Future Work
Improve performance
  – Especially on SP machines (Blue Horizon)
Add functionality to the software package
  – Bending angles, anchorage points, sources & sinks
Add adaptivity to the NS solver (AMR)
  – Needed for scaling and for more accurate modeling of fluid features in the heart