CSAR Overview
Laxmikant (Sanjay) Kale, 11 September 2001
©2001 Board of Trustees of the University of Illinois


CS Faculty and Staff Investigators
T. Baker, M. Bhandarkar, M. Brandyberry, M. Campbell, E. de Sturler, H. Edelsbrunner, R. Fiedler, M. Heath, J. Jiao, L. Kale, O. Lawlor, J. Liesen, J. Norris, D. Padua, D. Reed, P. Saylor, K. Seamons, A. Sheffer, S. Teng, M. Winslett, plus numerous students

Computer Science Research Overview
- Parallel programming environment
  - Software integration framework
  - Parallel component frameworks
  - Clusters
  - Parallel I/O and data migration
  - Performance tools and techniques
  - Computational steering
  - Visualization
- Computational mathematics and geometry
  - Interface propagation and interpolation
  - Linear solvers and preconditioners
  - Eigensolvers
  - Mesh generation and adaptation

Software Integration Framework
- Flexible framework for coupling stand-alone application codes (local & grid)
- Encapsulation via objects and threads
- Runtime environment to support dynamic behavior (e.g., refinement, load balancing)
- Intelligent interface for mediating communication between component modules
- Reusable abstractions
- People: (SWIFT team +) de Sturler, Heath, Kale, Geubelle, Parsons, ... Bhandarkar, Campbell, Jiao, Haselbacher, ...

APIs for Coupling Codes
- Experimented with three orthogonal ideas
  - MPI based
    - Replaces subroutine call by communication with MPI
    - “Decouples” coupled code for greater flexibility in assigning modules to processors
  - Charm++ based
    - Encapsulates modules using objects and threads
    - Replaces MPI with “adaptive” MPI transparently to the user
    - Provides automatic load balancing by migrating threads
  - Autopilot based
    - Uses sensors and actuators to coordinate coupled modules
    - Provides steering and performance visualization
- Current solution: incorporates ideas from the above
  - AMPI with cross communicators, integration with Roccom
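To illustrate the MPI-based idea of replacing a subroutine call between modules with an exchange of messages, here is a minimal sketch in which a hypothetical fluid module and solid module run as separate threads and trade interface data through queues. Python's `queue.Queue` stands in for MPI send/receive, and the module names and "physics" are placeholders, not the Rocflo/Rocsolid code.

```python
import queue
import threading

def fluid_module(to_solid, from_solid, steps, log):
    """Hypothetical fluid solver: sends interface pressure, receives wall displacement."""
    displacement = 0.0
    for step in range(steps):
        pressure = 1.0 + 0.1 * step - displacement  # placeholder physics
        to_solid.put(pressure)                       # replaces a direct call into the solid code
        displacement = from_solid.get()              # blocking receive, like MPI_Recv
        log.append(("fluid", step, pressure))

def solid_module(from_fluid, to_fluid, steps, log):
    """Hypothetical solid solver: receives pressure, returns displacement."""
    for step in range(steps):
        pressure = from_fluid.get()
        displacement = 0.5 * pressure                # placeholder constitutive law
        to_fluid.put(displacement)
        log.append(("solid", step, displacement))

def run_coupled(steps=3):
    """Run both modules concurrently; each could be placed on its own processors."""
    fluid_to_solid, solid_to_fluid = queue.Queue(), queue.Queue()
    log = []
    threads = [
        threading.Thread(target=fluid_module, args=(fluid_to_solid, solid_to_fluid, steps, log)),
        threading.Thread(target=solid_module, args=(fluid_to_solid, solid_to_fluid, steps, log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because neither module calls the other directly, they only agree on what messages flow between them, which is what gives the freedom to map modules to processors independently.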

AMPI
- Adaptive load balancing for MPI programs
- Uses Charm++’s load balancing framework
- Uses multiple MPI threads per processor
- Light-weight threads
[Figure: several sets of Rocflo, Rocface, and Rocsolid modules]
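The key AMPI idea is running many MPI "ranks" as light-weight threads on one processor, so that a rank waiting for a message yields to another rank. As a rough sketch only, Python generators can play the role of those threads under a tiny cooperative scheduler. The ring exchange and all names are hypothetical, and unlike AMPI threads, Python generators cannot migrate between processors.

```python
from collections import deque

def rank_program(rank, nranks):
    """Hypothetical 'MPI rank': send a value to the right neighbour, then wait for one."""
    yield ("send", (rank + 1) % nranks, rank * 10)
    msg = yield ("recv",)          # the rank suspends here until a message arrives
    return msg

def run_virtual_ranks(nranks):
    """Run nranks virtual ranks on ONE processor with a cooperative scheduler."""
    programs = {r: rank_program(r, nranks) for r in range(nranks)}
    mailboxes = {r: deque() for r in range(nranks)}
    ready = deque((r, None) for r in range(nranks))
    blocked = set()
    results = {}
    while ready:
        r, value = ready.popleft()
        try:
            request = programs[r].send(value)
        except StopIteration as done:
            results[r] = done.value
            continue
        if request[0] == "send":
            _, dest, payload = request
            mailboxes[dest].append(payload)
            if dest in blocked:    # wake the receiver; it resumes with the message
                blocked.discard(dest)
                ready.append((dest, mailboxes[dest].popleft()))
            ready.append((r, None))
        else:  # "recv"
            if mailboxes[r]:
                ready.append((r, mailboxes[r].popleft()))
            else:
                blocked.add(r)     # other ranks keep the processor busy meanwhile
    return results
```

With four ranks, each one ends up holding the value sent by its left neighbour, even though only one "processor" (the scheduler loop) ever runs.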

AMPI and Roc*
[Figure: Rocflo, Rocface, and Rocsolid modules]

Load Balancing with AMPI/Charm++
The Turing cluster has processors with different speeds.

Phase           16P3     16P2     8P3,8P2 w/o LB   8P3,8P2 w. LB
Fluid update     75.24    97.50    96.73            86.89
Solid update     41.86    52.50    52.20            46.83
Pre-Cor Iter    117.16   150.08   149.01           133.76
Time Step       235.19   301.56   299.85           267.75
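The table shows why heterogeneity matters: without load balancing, the mixed 8P3+8P2 run is as slow as the all-P2 run. A minimal sketch of a speed-aware greedy heuristic is below. It assumes measured per-object loads and relative processor speeds are known (the numbers in the usage are hypothetical), and it is not claimed to be the Charm++ strategy used for these runs.

```python
def greedy_heterogeneous(loads, speeds):
    """Map objects with measured loads onto processors with relative speeds.

    Heaviest objects first; each goes to the processor on which it would finish
    earliest. Returns (assignment, makespan).
    """
    finish = [0.0] * len(speeds)
    assignment = {}
    for obj in sorted(range(len(loads)), key=lambda i: -loads[i]):
        p = min(range(len(speeds)), key=lambda q: finish[q] + loads[obj] / speeds[q])
        finish[p] += loads[obj] / speeds[p]
        assignment[obj] = p
    return assignment, max(finish)
```

For four equal objects of load 4 on a fast processor (speed 2) and a slow one (speed 1), the fast processor receives three objects and the makespan is 6.0, whereas a speed-oblivious two-and-two split would take 8.0.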

Performance of GEN1 Using Charm++

AMPI: Recent progress
- Compiler support for automatic conversion
  - Global variables
  - Packing-unpacking functions
- Automatic checkpointing
  - No user intervention needed, except pack-unpack routines for rare, complex data structures
  - Triggered by user calls or periodically
  - Restart on a different number of processors
- Cross communicators
  - Allow multiple components to communicate across communicator boundaries
  - Two independent MPI “worlds” can communicate
  - Implemented for Rocflo/Rocsolid separation
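Checkpointing and migration both depend on packing a thread's state into a buffer and unpacking it elsewhere. Charm++ expresses this with C++ `pup(PUP::er &p)` routines, where one function describes the state for sizing, packing, and unpacking. The Python sketch below only mimics that single-routine idea; `Packer`, `Unpacker`, and `Chunk` are hypothetical names.

```python
class Packer:
    """Collects values in order (like writing a checkpoint)."""
    unpacking = False
    def __init__(self):
        self.buf = []
    def __call__(self, value):
        self.buf.append(value)
        return value

class Unpacker:
    """Hands back values in the order they were packed (like a restart)."""
    unpacking = True
    def __init__(self, buf):
        self.buf = list(buf)
        self.pos = 0
    def __call__(self, _ignored):
        value = self.buf[self.pos]
        self.pos += 1
        return value

class Chunk:
    """Hypothetical per-thread state of one virtual MPI rank."""
    def __init__(self, step=0, temps=None):
        self.step = step
        self.temps = temps or []
    def pup(self, p):
        # One routine describes the state for both directions.
        self.step = p(self.step)
        n = p(len(self.temps))
        if p.unpacking:
            self.temps = [p(None) for _ in range(n)]
        else:
            for t in self.temps:
                p(t)

def checkpoint(obj):
    packer = Packer()
    obj.pup(packer)
    return packer.buf

def restart(buf):
    obj = Chunk()
    obj.pup(Unpacker(buf))
    return obj
```

Since the same `pup` drives both directions, the packed and unpacked layouts cannot drift apart, which is what makes checkpointing with little user intervention practical.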

AMPI and Roc*
[Figure: Rocflo, Rocface, and Rocsolid modules]

AMPI and Roc*: Communication
[Figure: communication among Rocflo, Rocface, and Rocsolid modules]

Roccom: Component Objects Manager
- Mechanisms for inter-component data exchange and function invocation
- Roccom API: programming interface for application modules
- Roccom developers interface: C++ interface for service modules
- Roccom implementations: Roccom is easily supported by multiple runtime systems: MPI, Charm++ (AMPI), Autopilot
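As a loose illustration of what "inter-component data exchange and function invocation" means, the sketch below shows a registry through which components publish data and functions under qualified names and call each other without linking directly. This is a hypothetical sketch; `ComponentManager` and its methods are invented for illustration and are not the Roccom API.

```python
class ComponentManager:
    """Hypothetical registry in the spirit of Roccom: components publish data
    and functions under qualified names; others use them without direct linking."""
    def __init__(self):
        self._data = {}
        self._funcs = {}

    def register_data(self, component, name, array):
        # Stored by reference, so the owner's in-place updates stay visible.
        self._data[f"{component}.{name}"] = array

    def register_function(self, component, name, func):
        # Re-registering the same name swaps the implementation (plug-and-play).
        self._funcs[f"{component}.{name}"] = func

    def get_data(self, qualified_name):
        return self._data[qualified_name]

    def call(self, qualified_name, *args):
        return self._funcs[qualified_name](*args)
```

A fluid component might register `"fluid.pressure"` and a solid component `"solid.total_load"`; the orchestration code then connects them by name only, which is the kind of decoupling the slide describes.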

Roccom Goals
- Mechanism for data exchange and function invocation between Roc* components
- Object-oriented philosophy enforcing encapsulation and enabling polymorphism
- Minimal changes required to existing physical modules
- Minimal dependencies in component development
- Maximal flexibility for integration

Architectures with/without Roccom
- Promotes modularity
- Eases integration of modules (e.g., Rocpanda)
- Enables plug-and-play of physics modules
[Figure: GEN1 and GEN2 architectures (Fluid, Solid, Combustion, Interface, Orchestration, and HDF IO modules), one with and one without Roccom]

Autopilot and Roccom
- Autopilot performance monitoring system
  - Requires some (little) source code changes
  - Mechanisms for user/client-based or automatic steering
  - Dynamic starting, stopping, and swapping of application components at runtime
- Provides mechanisms for runtime performance tuning and visualization
  - Built on top of the existing Pablo performance suite
  - Mechanisms for automatic performance-based steering at runtime
  - Remote performance visualization on workstations or I-desk using Virtue
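The sensor/actuator pattern behind performance-based steering can be sketched in a few lines: a sensor samples a metric from the running application, and a policy fires an actuator that changes a parameter. The classes, the threshold rule, and the "output interval" parameter are all hypothetical and are not Autopilot's interface.

```python
class Sensor:
    """Hypothetical sensor: samples a metric from the running application."""
    def __init__(self, read):
        self.read = read

class Actuator:
    """Hypothetical actuator: changes a parameter of the running application."""
    def __init__(self, apply):
        self.apply = apply

def steer(sensor, actuator, threshold):
    """Rule-based policy: trigger the actuator when the sensed value exceeds a threshold."""
    value = sensor.read()
    if value > threshold:
        actuator.apply(value)
        return True
    return False

# Example: if a time step gets slow, write output half as often.
app = {"step_time": 1.5, "output_interval": 10}
slow_steps = Sensor(lambda: app["step_time"])
less_output = Actuator(lambda _: app.update(output_interval=app["output_interval"] * 2))
```

Calling `steer(slow_steps, less_output, 1.0)` once doubles `output_interval` to 20; a real system would evaluate such rules periodically and could use richer policies than a single threshold.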

Component Frameworks
- Motivation
  - Reduce tedium of parallel programming for commonly used paradigms
  - Encapsulate required parallel data structures and algorithms
  - Provide an easy-to-use interface: sequential programming style preserved, no alienating invasive constructs
  - Use adaptive load balancing framework (and objects)
- Current and planned component frameworks
  - FEM
  - Multiblock
  - AMR

FEM framework
- Present a clean, “almost serial” interface
  - Hide parallel implementation in the runtime system
  - Leave physics and time integration to the user
  - Users write code similar to sequential code, or easily modify sequential code
- Input: connectivity file (mesh), boundary data, and initial data
- Framework
  - Partitions data, and starts a driver for each chunk in a separate thread
  - Automates communication, once the user registers fields to be communicated
  - Automatic dynamic load balancing
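The communication the framework automates is mainly the combination of partial values at nodes shared between chunks. The serial sketch below partitions an element list into chunks, lets each chunk accumulate nodal contributions locally, and then sums the partial values at shared nodes. Contiguous slicing stands in for a real mesh partitioner, "add 1.0 per element" stands in for element physics, and all function names are hypothetical.

```python
from collections import defaultdict

def partition(elements, nchunks):
    """Naive contiguous partition (a real framework would use a mesh partitioner)."""
    size = -(-len(elements) // nchunks)   # ceiling division
    return [elements[i:i + size] for i in range(0, len(elements), size)]

def local_assembly(chunk):
    """Each element adds 1.0 to each of its nodes (stand-in for element physics)."""
    acc = defaultdict(float)
    for elem in chunk:
        for node in elem:
            acc[node] += 1.0
    return acc

def update_shared_nodes(local_fields):
    """Combine partial nodal sums so every chunk sees the global value at shared nodes."""
    total = defaultdict(float)
    for field in local_fields:
        for node, value in field.items():
            total[node] += value
    # Each chunk gets the summed value for every node it holds a copy of.
    return [{node: total[node] for node in field} for field in local_fields]
```

After `update_shared_nodes`, every chunk holds the same nodal values a serial assembly over the whole mesh would produce, which is why the user's per-chunk driver can look like sequential code.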

FEM Experience
- Previous
  - 3-D volumetric/cohesive crack propagation code (Geubelle, Breitenfeld, et al.)
  - 3-D dendritic growth fluid solidification code (Dantzig, Jeong)
- Recent
  - Adaptive insertion of cohesive elements (Mario Zaczek, Philippe Geubelle); performance data
  - Multi-grain contact, in progress (Spandan Maiti): uses the FEM framework and collision detection; NSF-funded project; initial parallelization done in 4 days

Performance data: ASCI Red
[Figure: mesh with 3.1 million elements]

Parallel Collision Detection
- Detect collisions (intersections) between objects scattered across processors
- Approach based on Charm++ arrays
  - Overlay a regular, sparse grid of voxels (array elements)
  - Send objects to all voxels they touch
  - Collide voxels independently and collect results
- Results: 2 µs per polygon; speedups to 1000s
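The voxel approach can be sketched serially: bin each object into every voxel its bounding box touches, using a sparse dictionary so that empty voxels cost nothing, then test pairs within each voxel independently. As simplifying assumptions, objects here are axis-aligned boxes rather than polygons, the voxel size `h` is a free parameter, and duplicate pairs found in several voxels are removed with a set (a real implementation may use a different rule).

```python
from itertools import combinations

def voxels_touched(box, h):
    """All voxel indices (i, j, k) overlapped by an axis-aligned box ((lo), (hi))."""
    (x0, y0, z0), (x1, y1, z1) = box
    return [(i, j, k)
            for i in range(int(x0 // h), int(x1 // h) + 1)
            for j in range(int(y0 // h), int(y1 // h) + 1)
            for k in range(int(z0 // h), int(z1 // h) + 1)]

def overlap(a, b):
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(3))

def detect_collisions(boxes, h):
    grid = {}                       # sparse: only occupied voxels are stored
    for obj, box in enumerate(boxes):
        for v in voxels_touched(box, h):
            grid.setdefault(v, []).append(obj)
    hits = set()
    for members in grid.values():   # each voxel could be handled by a different processor
        for a, b in combinations(members, 2):
            if overlap(boxes[a], boxes[b]):
                hits.add((a, b))
    return hits
```

Only objects sharing a voxel are ever compared, so the work per voxel stays small and voxels can be distributed as independent array elements.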

Related Projects
- Multiphase load balancing
  - Automatically identify phases, if necessary
  - Use instrumentation of each phase to remap objects from each phase independently
- Automatic out-of-core execution
  - Take advantage of data-driven execution
  - Perfectly predictive object prefetching
  - No programmer intervention needed
- Cluster management
  - Stretchable jobs: shrink-and-expand
  - Assigned processors can be changed at runtime
  - Job scheduler to maximize throughput, using stretchable jobs as well as fixed-size ones
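Prefetching can be "perfectly predictive" because in a data-driven runtime the queue of pending messages says exactly which objects will run next. The sketch below simulates that: before each message runs, it keeps the next `capacity` distinct objects in memory and evicts everything else. It is a hypothetical, sequential model in the spirit of lookahead-based replacement; a real runtime would overlap the loads with computation.

```python
def run_out_of_core(schedule, capacity, disk):
    """Execute messages (one target object id each) in order, keeping at most
    `capacity` objects in memory. Returns the number of loads from disk."""
    resident = {}
    loads = 0
    for i, obj in enumerate(schedule):
        upcoming = []                      # next `capacity` distinct objects
        for o in schedule[i:]:
            if o not in upcoming:
                upcoming.append(o)
                if len(upcoming) == capacity:
                    break
        for o in list(resident):           # evict objects not needed soon
            if o not in upcoming:
                disk[o] = resident.pop(o)
        for o in upcoming:                 # prefetch what the queue says comes next
            if o not in resident:
                resident[o] = disk.pop(o)
                loads += 1
        resident[obj] += 1                 # "execute" the message on its object
    disk.update(resident)                  # write everything back at the end
    return loads
```

Every message finds its object already in memory, and the programmer never issues a load or store.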

Parallel I/O and Data Migration
- Parallel output of snapshots for GEN1
  - Combine arrays for different blocks into a single virtual array
  - Output multiple arrays at once using an array group
  - Manage metadata for outputting HDF files for Rocketeer
- Automatic tuning of parallel I/O performance
- Data migration concurrent with application
- Automatic choice of data migration strategy
- Rocpanda 3.0 released
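Combining per-block arrays into one virtual array comes down to computing where each block starts. An exclusive prefix sum of block sizes gives every block an offset, so each block can be written independently into one shared layout, and the offsets double as the metadata needed to find blocks again. The sketch assumes 1-D blocks already listed in global block order; the function names are hypothetical.

```python
from itertools import accumulate

def block_offsets(sizes):
    """Exclusive prefix sum: where each block starts in the combined virtual array."""
    return list(accumulate(sizes, initial=0))[:-1]

def write_virtual_array(blocks):
    """Place every block at its offset; in a parallel setting each writer could
    do this for its own blocks without coordinating with the others."""
    offsets = block_offsets([len(b) for b in blocks])
    combined = [None] * sum(len(b) for b in blocks)
    for off, data in zip(offsets, blocks):
        combined[off:off + len(data)] = data
    return combined, offsets
```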

Parallel I/O and Data Migration
- Parallel output of snapshots for GENx using Rocpanda
- Support output of metadata and data to HDF files in Rocketeer’s format
- Hide cost of I/O with a new general buffering scheme called greedy buffering
- Migrate output automatically to a remote workstation
- Automatic tuning of parallel I/O performance
  - Automatic selection of data migration strategy, buffer sizes and placements, communication strategy, and data layouts on disk
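The general idea of hiding I/O cost behind computation can be sketched with a bounded in-memory buffer drained by a background writer thread. This is a generic background-buffering sketch under stated assumptions (a list stands in for disk, and the computation is a placeholder). The details of Rocpanda's greedy buffering are not given here, so this should not be read as that scheme.

```python
import queue
import threading

def run_with_buffered_output(nsteps, buffer_slots, disk):
    """The compute loop hands snapshots to a bounded buffer; a background thread
    drains it to 'disk' (a list here), overlapping output with computation."""
    buf = queue.Queue(maxsize=buffer_slots)

    def writer():
        while True:
            item = buf.get()
            if item is None:        # sentinel: no more snapshots
                break
            disk.append(item)       # stand-in for a slow HDF write

    t = threading.Thread(target=writer)
    t.start()
    state = 0
    for step in range(nsteps):
        state += step               # placeholder computation
        buf.put((step, state))      # blocks only when the buffer is full
    buf.put(None)
    t.join()
```

The compute loop stalls only when the buffer fills, so output cost is hidden as long as writing keeps up on average.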

Mesh Generation and Adaptation
- Library for mixed 3D cohesive element meshes
  - A program for introducing cohesive elements based on material types (Alla Sheffer and Philippe Geubelle)
- Mesh quality measures and Laplace smoothing in the ALE code (Alla Sheffer and Mark Brandyberry)
- Continuing: space-time meshing in 2DxTIME (Alla Sheffer, Alper Ungor)
- Surface parameterization (Alla Sheffer, Eric de Sturler, Joerg Liesen & students), in collaboration with Sandia (Cubit)
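Laplace smoothing moves each free node toward the average position of its neighbours. A minimal Jacobi-style sketch is below. It assumes a node-to-neighbour map and a set of fixed (boundary) nodes, and it is not the ALE code's implementation. Plain Laplace smoothing can invert elements in non-convex regions, which is one reason it is paired with mesh quality measures.

```python
def laplace_smooth(coords, neighbors, fixed, iterations=1):
    """Move each free node to the average of its neighbours (Jacobi-style sweeps).

    coords: node -> coordinate tuple; neighbors: node -> list of nodes;
    fixed: nodes that must not move (e.g., boundary nodes).
    """
    pts = {n: tuple(c) for n, c in coords.items()}
    for _ in range(iterations):
        new = dict(pts)
        for n, nbrs in neighbors.items():
            if n in fixed or not nbrs:
                continue
            dim = len(pts[n])
            new[n] = tuple(sum(pts[m][d] for m in nbrs) / len(nbrs) for d in range(dim))
        pts = new
    return pts
```

For a square with fixed corners and a displaced centre node, one sweep moves the centre back to (1.0, 1.0).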

Interface Propagation and Data Transfer (Jim Jiao, Mike Heath)
- Interface propagation
  - New approach combining the best features of marker particle and level set methods
  - Concept of null set of interface for detection of expendable data and topological change
- Interface data transfer
  - Efficient and robust algorithms for mesh association between disparate meshes
  - New algorithm for overlaying two meshes to create a reference mesh from their common refinement
  - Accurate and conservative interpolation using the overlaid reference mesh and least squares approximation
  - Parallel implementation in the GEN1 integrated code
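The common refinement of two meshes is easiest to see in 1-D, where overlaying two meshes of the same interval just merges their node lists; each subcell of the result lies inside exactly one cell of each input mesh. This is a drastic simplification of Rocface, which overlays 3-D surface meshes, and the function names are hypothetical.

```python
def common_refinement(nodes_a, nodes_b, tol=1e-12):
    """Overlay two 1-D meshes of the same interval: merge and deduplicate nodes."""
    merged = sorted(nodes_a + nodes_b)
    out = [merged[0]]
    for x in merged[1:]:
        if x - out[-1] > tol:
            out.append(x)
    return out

def host_cells(refined, nodes):
    """For each subcell of the refinement, the index of the cell of `nodes` containing it."""
    hosts = []
    j = 0
    for left, right in zip(refined, refined[1:]):
        mid = 0.5 * (left + right)
        while j + 2 < len(nodes) and nodes[j + 1] < mid:
            j += 1
        hosts.append(j)
    return hosts
```

The host indices are the "mesh association": with them, any integral over a cell of one mesh can be split exactly over subcells that also belong to cells of the other mesh.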

Rocface: Disparate Meshes
- Robust and efficient algorithm for overlaying two surface meshes

Rocface: Interface Component
- Robust and efficient algorithm for overlaying two surface meshes
[Figure: two surface meshes combined (+) into their overlay (=)]

Least Squares Data Transfer
- Minimizes error and enforces conservation
- Handles node- and element-centered data
- Made possible by the overlay
- Achieved superb experimental results
[Figure: cumulative effect over 500 steps of a coupled simulation, comparing our method with load transfer (Farhat)]
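For the simplest case, element-centered piecewise-constant data in 1-D, the least squares (L2) transfer onto a target mesh has a closed form: each target value is the overlap-weighted average of the source values, where the overlaps are exactly the subcells of the common refinement. That average conserves the integral. The sketch covers only this special case under those assumptions; node-centered data in the actual method requires solving a least squares system.

```python
def conservative_transfer(src_nodes, src_vals, tgt_nodes):
    """L2 projection of piecewise-constant source data onto target cells (1-D)."""
    tgt_vals = []
    for t0, t1 in zip(tgt_nodes, tgt_nodes[1:]):
        acc = 0.0
        for (s0, s1), u in zip(zip(src_nodes, src_nodes[1:]), src_vals):
            overlap = min(t1, s1) - max(t0, s0)   # length of a common-refinement subcell
            if overlap > 0:
                acc += overlap * u
        tgt_vals.append(acc / (t1 - t0))
    return tgt_vals

def integral(nodes, vals):
    """Integral of piecewise-constant data over the mesh."""
    return sum((b - a) * v for a, b, v in zip(nodes, nodes[1:], vals))
```

Transferring values [2.0, 4.0] on cells [0, 0.5, 1] to cells [0, 0.3, 0.7, 1] gives approximately [2.0, 3.0, 4.0], and both fields integrate to 3.0. The double loop is quadratic for clarity; precomputing the overlay avoids that search.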

Iterative Solvers
- Exact and finite precision analysis of Krylov subspace methods
  - Short-term recurrences
  - Choice of basis in minimal residual (MR) methods
- New preconditioners for indefinite systems
  - Application in surface parameterization
- Application of Krylov subspace methods in large-scale problems
  - GMRES with optimal truncation
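As background for the Krylov subspace work, here is a compact sketch of plain, full GMRES: Arnoldi with modified Gram-Schmidt builds the Krylov basis, and Givens rotations solve the small least squares problem for the minimal-residual iterate. This is textbook GMRES with no preconditioning and no restarting; it does not implement the "optimal truncation" variant named on the slide.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres(A, b, max_iter=None, tol=1e-10):
    """Full GMRES from x0 = 0 (dense lists, no preconditioner)."""
    n = len(b)
    max_iter = max_iter or n
    beta = math.sqrt(dot(b, b))
    if beta == 0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]      # orthonormal Krylov basis
    H = []                             # H[j] = column j of the rotated Hessenberg matrix
    cs, sn = [], []
    g = [beta]                         # rotated right-hand side of the small LS problem
    for j in range(max_iter):
        w = matvec(A, V[j])
        h = []
        for i in range(j + 1):         # modified Gram-Schmidt
            hij = dot(w, V[i])
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h.append(hij)
        hnext = math.sqrt(dot(w, w))
        h.append(hnext)
        for i in range(j):             # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], h[j + 1]) # new rotation zeroes h[j + 1]
        c, s = h[j] / r, h[j + 1] / r
        cs.append(c)
        sn.append(s)
        h[j], h[j + 1] = r, 0.0
        g.append(-s * g[j])
        g[j] = c * g[j]
        H.append(h)
        if abs(g[j + 1]) < tol * beta or hnext == 0:
            break                      # |g[j + 1]| is the current residual norm
        V.append([wk / hnext for wk in w])
    k = len(H)
    y = [0.0] * k                      # back substitution with the triangular factor
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(H[l][i] * y[l] for l in range(i + 1, k))) / H[i][i]
    return [sum(V[l][p] * y[l] for l in range(k)) for p in range(n)]
```

Storing the whole basis `V` is what makes full GMRES expensive on large problems, which motivates restarted and truncated variants.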

Prof. Laxmikant Kale
Department of Computer Science
University of Illinois at Urbana-Champaign
2262 Digital Computer Laboratory
1304 West Springfield Avenue
Urbana, IL 61801 USA
kale@cs.uiuc.edu
http://www.cs.uiuc.edu/contacts/faculty/kale.html
Telephone: 217-244-0094
Fax: 217-333-3501

