Published by Melvin French. Modified over 9 years ago.
1
Distributed Components for Integrating Large-Scale High Performance Computing Applications – A Project Summary
Nanbor Wang, nanbor@txcorp.com
Tech-X Corporation, Boulder, CO
CCA Forum Meeting, Bethesda, MD, July 24, 2008
Funded by DOE OASCR SBIR Grant #DE-FG02-04ER84099
2
Distributed Components, Nanbor Wang
Agenda
Motivation for Distributed and Parallel High-Performance Computing (DPHPC)
Exploring diverse distributed technologies working over Babel RMI
  – Distributed Proxy Components using CORBA objects and Web Services
  – TaoIIOP using CORBA IIOP
Performance analysis (DistComp with CORBA, TAOIIOP over Babel RMI)
Applications development
TAOIIOP distribution
Summary and acknowledgements
3
Distributed and Parallel Component-Based Software Engineering Addresses Modern Needs of Scientific Computing
Motivating scenarios for Distributed and Parallel HPC (DPHPC):
  – Integrate separately developed and established codes (FSP, climate modeling, space weather modeling), each component needing its own architecture
  – Provide ways to better utilize hardware with high CPU counts and to combine the computing resources of multiple clusters/computing centers
  – Enable parallel data streaming between a computing task and a post-processing task (no feedback to the solver)
  – Integrate multiple parallel codes using heterogeneous architectures
Existing component standards and frameworks were designed with enterprise applications in mind
  – No support for features that are important for HPC scientific applications: interoperability with scientific programming languages (FORTRAN) and parallel computing infrastructure (MPI)
CCA addresses the needs of HPC scientific applications: combustion modeling, global climate modeling, fusion and plasma simulations
Tasks
  – Explore various distributed technologies and approaches for DPHPC
  – Enhance tool support for DPHPC: F2003 struct support (covered later in Stefan’s talk)
4
Typical Parallel CCA Frameworks
Support both SPMD and MPMD scenarios
Stay out of the way of component parallelism
  – Components handle parallel communication
[Figure: SCMD and MCMD layouts over processes P0, P1, P2, P3; labels: Group A, Group B]
5
An Illustration of a DPHPC Application
[Figure: processes P0, P1, P2, P3 connected through middleware]
Still support conventional CCA component-managed parallelism
Provide additional framework-mediated distributed inter-component communication capability
Cooperative Processing – LLNL
PACO++ – INRIA
Alternative MCMD
7
Distributed Proxy CCA Components Hide Distributed Nature from Native Components
Connect distributed parallel components by composing remote-capable proxy components into applications
Hide the distributed aspect from the localized parallel CCA framework
Provide low-cost mechanisms for connecting incompatible CCA infrastructures (e.g., Ccaffeine, Dune, Ccain, and SCIRun) or middleware services
Implemented using Web Services (gSOAP) and CORBA (reported earlier)
Non-standard
One-off solution
Little tool support
8
Babel RMI Allows Multiple Implementations
[Figure: Babel RMI client and Babel RMI server connected through the Babel RMI interface over Simple Protocol]
Babel generates mappings for remote invocations and has its own transfer protocol, “Simple Protocol”, implemented in C
Thanks to Babel’s open architecture and language interoperability, users can take advantage of various distributed technologies through third-party RMI libraries
We have developed a CORBA protocol library for Babel RMI using TAO (version 1.5.1 or later)
  – The first third-party Babel RMI library
  – TAO is a C++-based CORBA middleware framework
  – This protocol is essentially a bridge between Babel and TAO
[Figure: Babel RMI client and Babel RMI server connected through the Babel RMI interface over TAOIIOP and TAO]
9
Using CORBA in Babel RMI Allows CORBA and Babel Objects to Interoperate
Goals
  – Allow interoperability between existing CORBA and Babel objects
  – Retain the performance of the CORBA IIOP protocol
Possible approaches for serialization
  – Encapsulate the Babel Simple Protocol wire format in a block of binary data and transport it using CORBA (as an octet sequence)
  – Encapsulate Babel communications in CORBA Any objects (not pursued because of the inefficiency of Any)
  – Map Babel communications to the CORBA format directly (the adopted approach). CORBA uses Common Data Representation (CDR) on the wire.
10
Conversions between CORBA & Babel types
    module taoiiop {
      module rmi {
        exception ServerException { string info; };

        struct fcomplex { float real; float imaginary; };
        struct dcomplex { double real; double imaginary; };

        /* SIDL arrays are mapped to CORBA structs which keep all the
           metadata information; the array values are stored as a CORBA
           sequence following the metadata */
        typedef sequence ArrayDims;
        struct Array_Metadata {
          short ordering;
          short dims;
          ArrayDims stride;
          ArrayDims lower;
          ArrayDims upper;
        };
      };
    };
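To make the role of the array metadata more concrete, the following Python sketch mirrors the fields of Array_Metadata and shows how strides and bounds locate an element in the flat value sequence. It is an illustration only: the ordering codes, the zero-based lower bounds, the inclusive upper bounds, and the helper names `describe` and `flat_index` are assumptions, not taken from the TAOIIOP sources.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical ordering codes; the actual values used by TAOIIOP are not
# given in the slides.
ROW_MAJOR = 0
COLUMN_MAJOR = 1


@dataclass
class ArrayMetadata:
    """Python mirror of the Array_Metadata struct shown above."""
    ordering: int
    dims: int
    stride: List[int]
    lower: List[int]
    upper: List[int]


def describe(shape: Sequence[int], ordering: int = ROW_MAJOR) -> ArrayMetadata:
    """Build metadata for a dense array (assumes zero-based lower bounds
    and inclusive upper bounds)."""
    n = len(shape)
    stride = [1] * n
    if ordering == ROW_MAJOR:
        # Last index varies fastest.
        for i in range(n - 2, -1, -1):
            stride[i] = stride[i + 1] * shape[i + 1]
    else:
        # First index varies fastest.
        for i in range(1, n):
            stride[i] = stride[i - 1] * shape[i - 1]
    return ArrayMetadata(ordering, n, stride, [0] * n, [s - 1 for s in shape])


def flat_index(meta: ArrayMetadata, index: Sequence[int]) -> int:
    """Offset of a multi-dimensional index within the flat value sequence."""
    return sum((i - lo) * st
               for i, lo, st in zip(index, meta.lower, meta.stride))
```

With this layout the receiver needs only the metadata and the flat sequence to rebuild the array, which is why the metadata travels ahead of the values.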
11
TAOIIOP Operation Invocations
CORBA uses Common Data Representation (CDR), a binary serialization format, for transferring messages
Data is packed directly into CDR
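As a rough illustration of what packing directly into CDR involves, the Python sketch below writes a few primitive values with CDR-style natural alignment (4-byte longs, 8-byte doubles, length-prefixed NUL-terminated strings). The class `CdrWriter` is hypothetical and heavily simplified: it omits GIOP message headers, encapsulations, and most types, and it is not TAO's API.

```python
import struct


class CdrWriter:
    """Simplified CDR-style output stream (illustrative, not TAO's class)."""

    def __init__(self, little_endian: bool = True) -> None:
        self._buf = bytearray()
        self._order = "<" if little_endian else ">"

    def _align(self, boundary: int) -> None:
        # CDR aligns each primitive on a multiple of its own size,
        # measured from the start of the stream.
        padding = (-len(self._buf)) % boundary
        self._buf.extend(b"\x00" * padding)

    def write_long(self, value: int) -> None:
        self._align(4)
        self._buf += struct.pack(self._order + "i", value)

    def write_double(self, value: float) -> None:
        self._align(8)
        self._buf += struct.pack(self._order + "d", value)

    def write_string(self, text: str) -> None:
        # Strings carry an unsigned 32-bit length that counts the
        # terminating NUL, followed by the bytes and the NUL itself.
        data = text.encode("ascii") + b"\x00"
        self._align(4)
        self._buf += struct.pack(self._order + "I", len(data))
        self._buf += data

    def getvalue(self) -> bytes:
        return bytes(self._buf)
```

Writing a long followed by a double yields 16 bytes rather than 12, because the double is padded out to an 8-byte boundary; this alignment is what lets a receiver read values in place.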
12
Server-side Request Handling
A default TAO servant handles all Babel invocations
Requests are dispatched to target Babel objects based on the instance/object ID
Need to extend TAO’s PortableServer class to expose the Input (for reading input parameters) and Output (for sending the results) CDRs
  – SIDL Call and Response objects get a reference to the Input and Output CDRs respectively
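The dispatch pattern described above, one servant receiving every request and routing it by instance ID, can be sketched in a few lines of Python. The names `DefaultServant`, `register`, `invoke`, and `Counter` are hypothetical and exist only for this illustration; the real implementation works on TAO's C++ servant classes and CDR streams.

```python
from typing import Any, Dict, Tuple


class DefaultServant:
    """Single entry point that routes each request to a registered object."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}

    def register(self, instance_id: str, obj: Any) -> None:
        """Make an object reachable under the given instance ID."""
        self._objects[instance_id] = obj

    def invoke(self, instance_id: str, method: str, args: Tuple[Any, ...]) -> Any:
        """Look up the target by ID and call the named method on it."""
        try:
            target = self._objects[instance_id]
        except KeyError:
            raise LookupError(f"unknown instance id: {instance_id}") from None
        return getattr(target, method)(*args)


class Counter:
    """Stand-in for a Babel object living on the server side."""

    def __init__(self) -> None:
        self.total = 0

    def add(self, amount: int) -> int:
        self.total += amount
        return self.total
```

Because only the servant is known to the ORB, objects can be added or removed at run time by updating the table, without registering a separate CORBA servant for each one.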
13
TAOIIOP V2.0 Optimizations and Enhancements
The initial implementation provided a proof of concept, but it
  – created temporary CORBA objects in the bridge
  – required many extra memory allocations, copies, and conversions
Now we
  – map SIDL into CDR and back
  – aggregate allocations
  – minimize copying
Version 1: Babel object -> CORBA object -> CDR -> CORBA object -> Babel object
Version 2: Babel object -> CDR -> Babel object
Numerous bug fixes and preparation for distribution
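The difference between the two versions can be illustrated, very loosely, with the Python sketch below. The first function builds one temporary object per element before joining them, in the spirit of Version 1; the second sizes a single buffer up front and packs into it in place, in the spirit of Version 2. Both function names are hypothetical, and the sketch says nothing about the actual TAOIIOP code beyond the general idea of aggregating allocations.

```python
import struct
from typing import Sequence


def pack_via_intermediates(values: Sequence[float]) -> bytes:
    """Version-1 style: one temporary bytes object per element, then a
    final copy when the pieces are joined."""
    pieces = [struct.pack("<d", v) for v in values]
    return b"".join(pieces)


def pack_direct(values: Sequence[float]) -> bytearray:
    """Version-2 style: one allocation sized for the whole payload,
    filled in place with a single packing call."""
    buf = bytearray(8 * len(values))
    struct.pack_into(f"<{len(values)}d", buf, 0, *values)
    return buf
```

Both functions produce the same bytes; only the number of temporary objects and copies differs.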
15
Performance Comparison 1
16
Performance Comparison 2
17
Performance Comparison 3
18
Performance Comparison 4
19
Benchmarking Oneway and Asynchronous Calls
[Figure: timelines of synchronous invocations and asynchronous/oneway invocations between a simulation cluster and a remote cluster; stages: compute-bound task, dump data, signal, data analysis]
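The control flow that this benchmark compares can be sketched in Python as follows. In the synchronous variant the solver waits for each analysis call to return; in the asynchronous variant it hands the data off and moves straight to the next time step. The functions `compute` and `analyze` are hypothetical placeholders, and a local thread pool stands in for the remote cluster, so the sketch shows only the call pattern and not real transport or timing behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compute(step: int) -> List[int]:
    """Placeholder for one compute-bound time step of the simulation."""
    return [step * i for i in range(4)]


def analyze(data: List[int]) -> int:
    """Placeholder for the data analysis done on the remote side."""
    return sum(data)


def run_synchronous(steps: int) -> List[int]:
    results = []
    for step in range(steps):
        data = compute(step)
        # The solver blocks here until the analysis has finished.
        results.append(analyze(data))
    return results


def run_asynchronous(steps: int) -> List[int]:
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = []
        for step in range(steps):
            data = compute(step)
            # submit() returns immediately, so the next time step can
            # start while the analysis of this one is still running.
            futures.append(pool.submit(analyze, data))
        return [f.result() for f in futures]
```

A oneway call would go one step further and discard the future altogether, since the caller never needs the result.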
20
Performance Comparison – Async and Oneway
Figure shows the average time for each time step
Very lightweight data analysis; the emphasis is on transport
The 0-payload case makes no remote invocation
Babel team is working on a new RMI implementation
21
Performance Analysis
TaoIIOP V1.0 consistently takes a performance hit
  – Performs extra conversions for arrays and complex number types between CORBA and Babel
  – Makes multiple fine-grained memory allocations
  – Does not take advantage of TAO’s key optimization mechanisms
Distributed proxy components also suffer somewhat because of data marshalling
TaoIIOP 2.0 has a performance gain of 10% for double and 30% for complex numbers compared to TaoIIOP 1.0
  – Optimizations: made the CORBA-Babel mapping types native in TAO by implementing optimized, zero-copy marshaling and demarshaling support
23
DPHPC Application – Data Analysis
We developed a prototype to perform online data analysis as a proof of concept
Using Vorpal, a particle-in-cell plasma simulation code
Run in the same cluster as two groups of processors
Significant speedup was observed
24
DPHPC Applications – Remote Monitoring/Steering of Simulations
We have extended Vorpal to support the use of Babel RMI to send out simulator progress data
Configurable from Vorpal’s initialization file:
    kind = historyKind
    kind = babelSender
    babelRmiURL = eclipse.txcorp.com:8081
Support specification of a URL group: a list of URLs running parallel tasks
We are able to connect a running simulation to one or multiple workstations
  – For online data processing/analysis
  – For monitoring simulation
Physicists are most interested in
  – Monitoring
  – Steering
26
TAOIIOP v2.0 Distribution
https://ice.txcorp.com/trac/taoiiop/
TAOIIOP will be available through the distribution web page
  – The page is live but needs some work
  – Can browse the Subversion source code
Distribution includes
  – TAOIIOP library
  – Examples (for both SimpleProtocol and TAOIIOP)
      Basic usage: basic types, arrays, and serializable
      Synchronous, asynchronous, and oneway invocations
      MPI and managing multiple connections
      Interoperability: TAOIIOP client/server working with CORBA client/server
  – Benchmarking codes
27
Summary
Implemented the distributed proxy components and the TaoIIOP Babel RMI protocol for connecting distributed CCA applications into an integrated system
Conducted performance benchmarking on the preliminary prototype implementation (version 1.0) to identify the key optimizations needed
Implemented the optimizations to minimize the overhead (version 2.0)
Interoperability with CORBA can be achieved with little or no performance penalty
28
Future Directions
Increase ease of use
  – For example, elegant support for multiple parallel components
Implement more scenarios of mixing distributed and high performance components involving several clusters and real applications
  – Extend existing work with VORPAL
  – Welcome other applications too
Synergy with MCMD
Support for petascale HPC applications
  – Remote monitoring/steering of large-scale simulations on supercomputers (e.g., Franklin)
  – Can take advantage of CORBA-Babel RMI interoperability for now and switch to TAOIIOP later
29
Acknowledgement
Many people helped make this project possible
  – Office of Advanced Scientific Computing Research
  – Tech-X: Johan Carlsson, Stefan Muszala, Roopa Pundaleeka, Stephen Tramer, Paul Hamill, Jaiganesh Balasubramanian (Vanderbilt University), Fang (Cherry) Liu (Indiana University)
  – Babel Team from Lawrence Livermore: Gary Kumfert, Tom Epperly, James Leek
  – Greater CCA community: David Bernholdt, Rob Armstrong, and too many others to be named specifically
30
Questions?