Kathy Yelick, 1 Advanced Software for Biological Simulations Elastic structures in an incompressible fluid. Blood flow, clotting, inner ear, embryo growth,

Slides:



Advertisements
Similar presentations
Unified Parallel C at LBNL/UCB Implementing a Global Address Space Language on the Cray X1 Christian Bell and Wei Chen.
Advertisements

PGAS Language Update Kathy Yelick. PGAS Languages: Why use 2 Programming Models when 1 will do? Global address space: thread may directly read/write remote.
Parallelizing stencil computations Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267.
Reference: Message Passing Fundamentals.
A High-Performance Java Dialect Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham,
Languages and Compilers for High Performance Computing Kathy Yelick EECS Department U.C. Berkeley.
1 Synthesis of Distributed ArraysAmir Kamil Synthesis of Distributed Arrays in Titanium Amir Kamil U.C. Berkeley May 9, 2006.
Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.
The Poisson operator, aka the Laplacian, is a second order elliptic differential operator and defined in an n-dimensional Cartesian space by: The Poisson.
Unified Parallel C at LBNL/UCB UPC at LBNL/U.C. Berkeley Overview Kathy Yelick U.C. Berkeley, EECS LBNL, Future Technologies Group.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Kathy Yelick U.C. Berkeley.
Evaluation and Optimization of a Titanium Adaptive Mesh Refinement Amir Kamil Ben Schwarz Jimmy Su.
Simulating the Cochlea With Titanium Generic Immersed Boundary Software (TiGIBS) Contractile Torus: (NYU) Oval Window of Cochlea: (CACR) Mammalian Heart:
I/O Optimization for ENZO Cosmology Simulation Using MPI-IO Jianwei Li12/06/2001.
CS267 L12 Sources of Parallelism(3).1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 12: Sources of Parallelism and Locality (Part 3)
Applications for K42 Initial Brainstorming Paul Hargrove and Kathy Yelick with input from Lenny Oliker, Parry Husbands and Mike Welcome.
Java for High Performance Computing Jordi Garcia Almiñana 14 de Octubre de 1998 de la era post-internet.
Impact of the Cardiac Heart Flow Alpha Project Kathy Yelick EECS Department U.C. Berkeley.
Programming Systems for a Digital Human Kathy Yelick EECS Department U.C. Berkeley.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium Siu Man Yau, Katherine.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Kathy Yelick U.C. Berkeley.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Kathy Yelick U.C. Berkeley.
UPC and Titanium Open-source compilers and tools for scalable global address space computing Kathy Yelick University of California, Berkeley and Lawrence.
Use of a High Level Language in High Performance Biomechanics Simulations Katherine Yelick, Armando Solar-Lezama, Jimmy Su, Dan Bonachea, Amir Kamil U.C.
UPC at CRD/LBNL Kathy Yelick Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Christian Bell.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium.
Yelick 1 ILP98, Titanium Titanium: A High Performance Java- Based Language Katherine Yelick Alex Aiken, Phillip Colella, David Gay, Susan Graham, Paul.
1 Titanium Review: Ti Parallel Benchmarks Kaushik Datta Titanium NAS Parallel Benchmarks Kathy Yelick U.C. Berkeley September.
Lecture 29 Fall 2006 Lecture 29: Parallel Programming Overview.
Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley.
LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Jonathan Carroll-Nellenback University of Rochester.
Center for Programming Models for Scalable Parallel Computing: Project Meeting Report Libraries, Languages, and Execution Models for Terascale Applications.
Unified Parallel C at LBNL/UCB UPC AMR Status Report Michael Welcome LBL - FTG.
1 Charm Kathy Yelick Compilation Techniques for Partitioned Global Address Space Languages Katherine Yelick U.C. Berkeley and Lawrence Berkeley National.
UPC Applications Parry Husbands. Roadmap Benchmark small applications and kernels —SPMV (for iterative linear/eigen solvers) —Multigrid Develop sense.
Presented by High Productivity Language and Systems: Next Generation Petascale Programming Wael R. Elwasif, David E. Bernholdt, and Robert J. Harrison.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
Automatic Differentiation: Introduction Automatic differentiation (AD) is a technology for transforming a subprogram that computes some function into a.
Center for Component Technology for Terascale Simulation Software CCA is about: Enhancing Programmer Productivity without sacrificing performance. Supporting.
GPU Architecture and Programming
Supercomputing ‘99 Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms Leonid Oliker NERSC Lawrence Berkeley National Laboratory.
Session 9 Component and Deployment. OOAD with UML / Session 9 / 2 of 17 Review State Diagrams represent the software entities in terms of their states.
Presented by An Overview of the Common Component Architecture (CCA) The CCA Forum and the Center for Technology for Advanced Scientific Component Software.
1 Parallel Programming Aaron Bloomfield CS 415 Fall 2005.
1 1  Capabilities: Building blocks for block-structured AMR codes for solving time-dependent PDE’s Functionality for [1…6]D, mixed-dimension building.
© 2009 IBM Corporation Parallel Programming with X10/APGAS IBM UPC and X10 teams  Through languages –Asynchronous Co-Array Fortran –extension of CAF with.
1 Qualifying ExamWei Chen Unified Parallel C (UPC) and the Berkeley UPC Compiler Wei Chen the Berkeley UPC Group 3/11/07.
I/O for Structured-Grid AMR Phil Colella Lawrence Berkeley National Laboratory Coordinating PI, APDEC CET.
CCA Common Component Architecture CCA Forum Tutorial Working Group CCA Status and Plans.
FOUNDATION IN INFORMATION TECHNOLOGY (CS-T-101) TOPIC : INFORMATION SYSTEM – SOFTWARE.
Gtb 1 Titanium Titanium: Language and Compiler Support for Scientific Computing Gregory T. Balls University of California - Berkeley Alex Aiken, Dan Bonachea,
Cardiac Blood Flow Alpha Project Impact Kathy Yelick EECS Department U.C. Berkeley.
FY 12 IPR Parallel Framework Capabilities PT123 computational kernel handles various ODE solvers. P2P communication model. Particle-mesh correlation provides.
TR&D 2: NUMERICAL TOOLS FOR MODELING IN CELL BIOLOGY Software development: Jim Schaff Fei Gao Frank Morgan Math & Physics: Boris Slepchenko Diana Resasco.
- 1 - Workshop on Pattern Analysis Data Flow Pattern Analysis of Scientific Applications Michael Frumkin Parallel Systems & Applications Intel Corporation.
A Multi-platform Co-Array Fortran Compiler for High-Performance Computing Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey {dotsenko, ccristi,
C OMPUTATIONAL R ESEARCH D IVISION 1 Defining Software Requirements for Scientific Computing Phillip Colella Applied Numerical Algorithms Group Lawrence.
1 HPJAVA I.K.UJJWAL 07M11A1217 Dept. of Information Technology B.S.I.T.
Kathy Yelick, Computer Science Division, EECS, University of California, Berkeley Titanium Titanium: A High Performance Language Based on Java Kathy Yelick.
Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.
1 Titanium Review: Immersed Boundary Armando Solar-Lezama Biological Simulations Using the Immersed Boundary Method in Titanium Ed Givelberg, Armando Solar-Lezama,
1 PGAS LanguagesKathy Yelick Partitioned Global Address Space Languages Kathy Yelick Lawrence Berkeley National Laboratory and UC Berkeley Joint work.
Unified Parallel C at LBNL/UCB UPC at LBNL/U.C. Berkeley Overview Kathy Yelick LBNL and U.C. Berkeley.
Compilers: History and Context COMP Outline Compilers and languages Compilers and architectures – parallelism – memory hierarchies Other uses.
Programming Models for SimMillennium
UPC and Titanium Kathy Yelick University of California, Berkeley and
Immersed Boundary Method Simulation in Titanium Objectives
Presentation transcript:

Kathy Yelick, 1 Advanced Software for Biological Simulations Elastic structures in an incompressible fluid. Blood flow, clotting, inner ear, embryo growth, … Complicated parallelization Particle/Mesh method, but “Particles” connected into materials (1D or 2D structures) Communication patterns irregular between particles (structures) and mesh (fluid) Joint work with Ed Givelberg, Armando Solar-Lezama, Charlie Peskin, Dave McQueen 2D Dirac Delta Function Code Size in Lines FortranTitanium Note: Fortran code is not parallel

2 Kathy Yelick Software Architecture Application Models Generic Immersed Boundary Method (Titanium) Heart (Titanium) Cochlea (Titanium+C) Flagellate Swimming … Spectral (Titanium + FFTW) AMR Extensible Simulation Solvers Multigrid (Titanium) Can add new models by extending material points Can add new Navier-Stokes solvers IB software and Cochlea by E. Givelberg; Heart by A. Solar based on Peskin/McQueen existing components Source:

Kathy Yelick, 3 Titanium: Java for Scientific Computing Java plus high performance parallelism No JVM; compiled to assembly (via C for portability) Features added to Java: PGAS model with SPMD execution and one-side communication Multi-dimensional rectangular arrays Locality qualification: the local keyword Lightweight (value) classes for Complex, etc. C++ like templates and operator overloading Zone-based memory management Titanium runs on shared & distributed memory

Kathy Yelick, 4 Coding Challenges: Block-Structured AMR Adaptive Mesh Refinement (AMR) is challenging Irregular data accesses and control from boundaries Mixed global/local view is useful AMR Titanium work by Tong Wen and Philip Colella Titanium AMR benchmark available

Kathy Yelick, 5 Languages Support Helps Productivity C++/Fortran/MPI AMR Chombo package from LBNL Bulk-synchronous comm: Pack boundary data between procs All optimizations done by programmer Titanium AMR Entirely in Titanium Finer-grained communication No explicit pack/unpack code Automated in runtime system General approach Language allow programmer optimizations Compiler/runtime does some automatically Work by Tong Wen and Philip Colella; Communication optimizations joint with Jimmy Su

Kathy Yelick, 6 Performance of Titanium AMR Serial: Titanium is within a few % of C++/F; sometimes faster! Parallel: Titanium scaling is comparable with generic optimizations - optimizations (SMP-aware) that are not in MPI code - additional optimizations (namely overlap) not yet implemented Comparable parallel performance Joint work with Tong Wen, Jimmy Su, Phil Colella

Kathy Yelick, 7 Particle/Mesh Method: Heart Simulation Elastic structures in an incompressible fluid. Blood flow, clotting, inner ear, embryo growth, … Complicated parallelization Particle/Mesh method, but “Particles” connected into materials (1D or 2D structures) Communication patterns irregular between particles (structures) and mesh (fluid) Joint work with Ed Givelberg, Armando Solar-Lezama, Charlie Peskin, Dave McQueen 2D Dirac Delta Function Code Size in Lines FortranTitanium Note: Fortran code is not parallel

Kathy Yelick, 8