Tammy Dahlgren, Tom Epperly, Scott Kohn, & Gary Kumfert
Slide 2: Goals
- Describe our vision to the CCA
- Solicit contributions (code) for:
  - RMI (SOAP | SOAP w/ MIME types)
  - Parallel network algorithms (general arrays)
- Encourage collaboration
Slide 3: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 4: CASC
- Quorum - web voting
- Alexandria - component repository
- Babel - language interoperability
  - maturing to platform interoperability
  - … implies some RMI mechanism
    - SOAP | SOAP w/ MIME types
    - open to suggestions & contributed source code
Slide 5: Babel & the MxN Problem
- Unique opportunities:
  - SIDL communication directives
  - Babel generates code anyway
  - Users already link against the Babel runtime library
  - Can hook directly into the Intermediate Object Representation (IOR)
Slide 6: Impls and Stubs and Skels
[Diagram: call-path layers - Application → Stubs → IORs → Skels → Impls]
- Application: uses components in the user's language of choice
- Client-side stubs: translate from the application language to C
- Intermediate Object Representation (IOR): always in C
- Server-side skeletons: translate the IOR (in C) to the component implementation language
- Implementation: component developer's choice of language (can be wrappers to legacy code)
A function-pointer sketch of this layering follows.
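The layering above maps naturally onto C-style function-pointer dispatch. The following is a minimal sketch of that idea, not Babel's actual generated code: the IOR is a plain C struct carrying a table of method pointers, the stub forwards through that table, and the skeleton casts back into the implementation language. All identifiers here (vector_ior, vector_epv, vector_stub_dot, and so on) are hypothetical.

    // Illustrative sketch only -- NOT Babel-generated code.
    #include <cstdio>

    extern "C" {
      struct vector_ior;                  // opaque object handle, always C
      struct vector_epv {                 // method table: one slot per method
        double (*f_dot)(vector_ior* self, vector_ior* y);
      };
      struct vector_ior {
        vector_epv* epv;
        void*       impl_data;            // pointer to the real implementation
      };
    }

    // --- Impl: developer's language of choice (here C++) ---
    struct VectorImpl {
      double data[2];
      double dot(const VectorImpl& y) {
        return data[0]*y.data[0] + data[1]*y.data[1];
      }
    };

    // --- Skel: translates the C IOR call into the implementation ---
    extern "C" double vector_skel_dot(vector_ior* self, vector_ior* y) {
      return static_cast<VectorImpl*>(self->impl_data)
               ->dot(*static_cast<VectorImpl*>(y->impl_data));
    }

    // --- Stub: what the application links against ---
    double vector_stub_dot(vector_ior* x, vector_ior* y) {
      return x->epv->f_dot(x, y);         // dispatch through the IOR
    }

    int main() {
      vector_epv epv = { vector_skel_dot };
      VectorImpl xi = {{1.0, 2.0}}, yi = {{3.0, 4.0}};
      vector_ior x = { &epv, &xi }, y = { &epv, &yi };
      std::printf("dot = %g\n", vector_stub_dot(&x, &y));  // prints dot = 11
      return 0;
    }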
Slide 7: Out-of-Process Components
[Diagram: Application and Stubs with their IORs in one process, connected over IPC to the IORs, Skels, and Impls in another process]
Slide 8: Internet Remote Components
[Diagram: Application → Stubs → IORs → Marshaler → line protocol → Unmarshaler → IORs → Skels → Impls]
Slide 9: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 10: Initial Assumptions
- Working point-to-point RMI
- Object persistence
Slide 11: Example #1: 1-D Vectors
[Diagram: vector x distributed over p0 (0.0, 1.1), p1 (2.2, 3.3), p2 (4.4, 5.5); vector y distributed over p3 (.9, .8, .7) and p4 (.6, .5, .4)]

    double d = x.dot( y );
Slide 12: Example #1: 1-D Vectors
[Diagram: the same vectors, x on p0-p2 and y on p3-p4, emphasizing the mismatched distributions]

    double d = x.dot( y );
Slide 13: Rule #1: Owner Computes

    double vector::dot( vector& y ) {
      // initialize
      double* yData = new double[localSize];
      y.requestData( localSize, local2global, yData );

      // sum all x[i] * y[i]
      double localSum = 0.0;
      for ( int i = 0; i < localSize; ++i ) {
        localSum += localData[i] * yData[i];
      }

      // cleanup
      delete[] yData;
      return localMPIComm.globalSum( localSum );
    }
Slide 14: Design Concerns
- vector y is not guaranteed to have its data mapped appropriately for the dot product
- vector y is expected to handle MxN data redistribution internally:

      y.requestData( localSize, local2global, yData );

  - Should each component implement MxN redistribution?
Slide 15: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 16: Vector Dot Product: Take #2

    double vector::dot( vector& y ) {
      // initialize
      MUX mux( *this, y );
      double* yData = mux.requestData( localSize, local2global );

      // sum all x[i] * y[i]
      double localSum = 0.0;
      for ( int i = 0; i < localSize; ++i ) {
        localSum += localData[i] * yData[i];
      }

      // cleanup
      mux.releaseData( yData );
      return localMPIComm.globalSum( localSum );
    }
Slide 17: Generalized Vector Ops

    Result vector::parallelOp( vector& y ) {
      // initialize
      MUX mux( *this, y );
      vector newY = mux.requestData( localSize, local2global );

      // problem reduced to a local operation
      Result localResult = this->localOp( newY );

      // cleanup
      mux.releaseData( newY );
      return localMPIComm.reduce( localResult );
    }
Slide 18: Rule #2: MUX Distributes Data
- Users invoke parallel operations without concern for data distribution
- Developers implement the local operation assuming data is already distributed
- Babel generates code that reduces a parallel operation to a local operation
- The MUX handles all communication (see the toy sketch below)
- How general is a MUX?
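To make Rule #2 concrete, here is a toy, serial stand-in for a MUX, invented for illustration and assuming nothing beyond the slides: each "process" is just a slice of one address space, and requestData() plays the role of the communication a real MUX would perform. The names (DistVector, MUX::requestData) echo the slides, but the bodies are hypothetical.

    // Toy sketch of the MUX idea -- hypothetical and serial (no MPI).
    #include <cstdio>
    #include <map>
    #include <vector>

    struct DistVector {                       // one "process's" piece of a vector
      std::vector<int>    local2global;       // global index of each local entry
      std::vector<double> localData;
    };

    struct MUX {
      // Map global index -> value over all of y's pieces (stands in for the
      // communication a real MUX would perform).
      std::map<int, double> globalY;
      explicit MUX(const std::vector<DistVector>& yPieces) {
        for (const DistVector& p : yPieces)
          for (size_t i = 0; i < p.local2global.size(); ++i)
            globalY[p.local2global[i]] = p.localData[i];
      }
      // Deliver y's data in the layout the caller (x's piece) already has.
      std::vector<double> requestData(const std::vector<int>& wanted) {
        std::vector<double> out;
        for (int g : wanted) out.push_back(globalY.at(g));
        return out;
      }
    };

    int main() {
      // x lives on 2 "processes", y on 3 -- a 2x3 (MxN) mismatch.
      std::vector<DistVector> x = { {{0,1,2}, {1,1,1}}, {{3,4,5}, {1,1,1}} };
      std::vector<DistVector> y = { {{0,1}, {9,8}}, {{2,3}, {7,6}}, {{4,5}, {5,4}} };

      double dot = 0.0;
      MUX mux(y);
      for (const DistVector& piece : x) {     // "owner computes" on each x piece
        std::vector<double> yData = mux.requestData(piece.local2global);
        for (size_t i = 0; i < yData.size(); ++i)
          dot += piece.localData[i] * yData[i];
      }
      std::printf("dot = %g\n", dot);         // 9+8+7+6+5+4 = 39
      return 0;
    }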
Slide 19: Example #2: Undirected Graph
[Diagram: an undirected graph partitioned across processes]
Slide 20: Key Observations
- Every parallel component is a container and is divisible into subsets
- There is a minimal (atomic) addressable unit in each parallel component
- This minimal unit is addressable in global indices
Slide 21: Atomicity
- Vector (Example #1):
  - atom - scalar
  - addressable - integer offset
- Undirected graph (Example #2):
  - atom - vertex with ghost nodes
  - addressable - integer vertex id
- Undirected graph (alternate):
  - atom - edge
  - addressable - ordered pair of integers
(These pairings are sketched below.)
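A hypothetical sketch of the atom/address pairings above, to show how little each container must expose; none of these types come from Babel or SIDL.

    // Illustrative atom/address pairs; all names are invented.
    #include <utility>
    #include <vector>

    struct VectorAtom {                 // atom: a scalar
      double value;
    };
    using VectorAddress = int;          // addressable: integer offset

    struct GraphVertexAtom {            // atom: a vertex plus its ghost nodes
      int              vertexId;
      std::vector<int> ghostNeighborIds;
    };
    using VertexAddress = int;          // addressable: integer vertex id

    struct GraphEdgeAtom {              // alternate atom: an edge
      int u, v;
    };
    using EdgeAddress = std::pair<int,int>;  // addressable: ordered int pair

    int main() {
      VectorAtom a{3.14};    VectorAddress ai = 7;   // x[7] == 3.14
      GraphEdgeAtom e{2, 5}; EdgeAddress ei{2, 5};   // edge (2,5)
      (void)a; (void)ai; (void)e; (void)ei;
      return 0;
    }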
Slide 22: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 23: MxNRedistributable Interface

    interface Serializable {
      store( in Stream s );
      load( in Stream s );
    };

    interface MxNRedistributable extends Serializable {
      int getGlobalSize();
      local int getLocalSize();
      local array<int> getLocal2Global();
      split( in array<int> maskVector,
             out array<MxNRedistributable> pieces );
      merge( in array<MxNRedistributable> pieces );
    };
Slide 24: Rule #3: All Parallel Components Implement "MxNRedistributable"
- Provides a standard interface for the MUX to manipulate the component
- Minimal coding requirements for the developer
- Key to the abstraction:
  - split()
  - merge()
- Manipulates "atoms" by global address (a sketch follows below)
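As a concrete (and entirely hypothetical) example, here is what split() and merge() might look like for the 1-D vector of Example #1, rendered in C++ rather than SIDL; VectorPiece and its members are invented names, not Babel-generated bindings.

    // split() partitions the local atoms (scalars) by a per-atom destination
    // mask; merge() absorbs incoming pieces.
    #include <cstdio>
    #include <vector>

    struct VectorPiece {                       // one redistributable piece
      std::vector<int>    local2global;        // global address of each atom
      std::vector<double> data;                // the atoms themselves

      int localSize() const { return (int)data.size(); }

      // Partition this piece: maskVector[i] names the destination
      // (0..nDest-1) of local atom i.
      std::vector<VectorPiece> split(const std::vector<int>& maskVector,
                                     int nDest) const {
        std::vector<VectorPiece> pieces(nDest);
        for (int i = 0; i < localSize(); ++i) {
          pieces[maskVector[i]].local2global.push_back(local2global[i]);
          pieces[maskVector[i]].data.push_back(data[i]);
        }
        return pieces;
      }

      // Absorb pieces received from other owners.
      void merge(const std::vector<VectorPiece>& pieces) {
        for (const VectorPiece& p : pieces) {
          local2global.insert(local2global.end(),
                              p.local2global.begin(), p.local2global.end());
          data.insert(data.end(), p.data.begin(), p.data.end());
        }
      }
    };

    int main() {
      // One owner holds globals {0,1,2,3}; send {0,1} to dest 0, {2,3} to dest 1.
      VectorPiece src = { {0,1,2,3}, {0.0,1.1,2.2,3.3} };
      std::vector<VectorPiece> out = src.split({0,0,1,1}, 2);

      VectorPiece dest1;                       // a receiver reassembles its part
      dest1.merge({ out[1] });
      std::printf("dest1 got %d atoms, first global id %d\n",
                  dest1.localSize(), dest1.local2global[0]);  // 2 atoms, id 2
      return 0;
    }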
Slide 25: Now for the Hard Part
(Slides illustrating how it all fits together for an undirected graph)
Slide 26:
[Diagram: two processes, pid=0 and pid=1]

    %> mpirun -np 2 graphtest
Slide 27:

    BabelOrb * orb = BabelOrb.connect( "…" );

[Diagram: processes pid=0 and pid=1 each obtain an orb handle (steps 1-4)]
Slide 28:

    Graph * graph = orb->create( "graph", 3 );

[Diagram: the orb creates the graph component pieces with an attached MUX]
Slide 29:

    graph->load( "file://..." );

[Diagram: fancy MUX routing - the load request fans out from the orb through the MUXes to each graph piece (steps 1-2)]
Slide 30:

    graph->doExpensiveWork();

[Diagram: work proceeds on each graph piece through its MUX]
Slide 31:

    PostProcessor * pp = new PostProcessor;

[Diagram: a PostProcessor pp is created alongside the orb, graph pieces, and MUXes]
Slide 32:

    pp->render( graph );

[Diagram: orb, graph pieces, MUXes, and pp]
- MUX queries the graph for its global size (12)
- Graph determines the required data layout (blocked)
- MUX is invoked to guarantee that layout before the render implementation is called
Slide 33: MUX solves a general parallel network flow problem (client & server)
[Diagram: global indices 0-11 grouped as {0,1,2,3}, {4,5}, {6,7}, {8,9,10,11}; the MUX decides which pieces flow where to satisfy the required layout (a scheduling sketch follows)]
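One way to picture this scheduling step, assuming simple block distributions: intersect the global-index ranges each server-side piece owns with the ranges each client-side piece requires, and let every non-empty overlap become one pipe/message. This is an illustrative reconstruction, not the deck's stated algorithm; all names are invented.

    // Compute an MxN communication schedule by range intersection.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Range { int lo, hi; };              // global indices [lo, hi)

    int main() {
      std::vector<Range> owns  = { {0,4}, {4,6}, {6,8}, {8,12} };  // M=4 server pieces
      std::vector<Range> needs = { {0,6}, {6,12} };                // N=2 client pieces

      // One message per non-empty (owner, requirer) overlap.
      for (size_t s = 0; s < owns.size(); ++s)
        for (size_t c = 0; c < needs.size(); ++c) {
          int lo = std::max(owns[s].lo, needs[c].lo);
          int hi = std::min(owns[s].hi, needs[c].hi);
          if (lo < hi)
            std::printf("server %zu -> client %zu : globals [%d, %d)\n",
                        s, c, lo, hi);
        }
      return 0;
    }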
Slide 34: MUX opens communication pipes
[Diagram: pipes connect each server-side piece to the client-side pieces that require its globals]
Slide 35: MUX splits graphs with multiple destinations (server side)
[Diagram: split() partitions each server-side piece whose atoms have more than one destination]
Slide 36: MUX sends pieces through communication pipes (persistence)
[Diagram: serialized pieces travel through the open pipes]
Slide 37: MUX receives graphs through pipes & assembles them (client side)
[Diagram: merge() assembles the received pieces into the required client-side layout]
Slide 38:

    pp->render_impl( graph );

(The user's implementation runs. A consolidated sketch of the whole walkthrough follows.)
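Pulling slides 26-38 together, here is a consolidated sketch of the client-side code, with every class mocked locally so the control flow is visible. BabelOrb, Graph, and PostProcessor follow the walkthrough's names, but their bodies are stand-ins, not the real components; the connect URL stays elided as in the slides.

    // Mocked end-to-end walkthrough -- the real classes are remote components.
    #include <cstdio>
    #include <string>

    struct Graph {
      void load(const std::string& url)  { std::printf("load %s\n", url.c_str()); }
      void doExpensiveWork()             { std::printf("working...\n"); }
      int  getGlobalSize() const         { return 12; }
    };

    struct BabelOrb {
      static BabelOrb* connect(const std::string& server) {
        std::printf("connect %s\n", server.c_str());
        static BabelOrb orb;             // toy stand-in for a remote connection
        return &orb;
      }
      Graph* create(const std::string& type, int /*nPieces*/) {
        std::printf("create %s\n", type.c_str());
        static Graph g;
        return &g;
      }
    };

    struct PostProcessor {
      void render(Graph* g) {
        // Per slide 32: query global size, pick a blocked layout, let the MUX
        // redistribute (split / send / merge), then run the user's render_impl.
        std::printf("render %d atoms (after MUX redistribution)\n",
                    g->getGlobalSize());
      }
    };

    int main() {
      BabelOrb* orb = BabelOrb::connect("…");      // slide 27 (URL elided)
      Graph* graph  = orb->create("graph", 3);     // slide 28
      graph->load("file://...");                   // slide 29
      graph->doExpensiveWork();                    // slide 30
      PostProcessor* pp = new PostProcessor;       // slide 31
      pp->render(graph);                           // slides 32-38
      delete pp;
      return 0;
    }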
Slide 39: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 40: Summary
- All distributed components are containers and are subdividable
- The smallest globally addressable unit is an atom
- The MxNRedistributable interface reduces the general component MxN problem to a 1-D array of ints
- The MxN problem is a special case of the general problem: N handles to M instances
- Babel is uniquely positioned to contribute a solution to this problem
Slide 41: Outline
- Background on Babel
- General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface
- Parallel Handles to a Parallel Distributed Component
- Tentative Research Strategy
Slide 42: Tentative Research Strategy

Fast Track:
- Java only, no Babel (serialization & RMI built in)
- Build MUX
- Experiment
- Write paper

Sure Track:
- Finish the 0.5.x line
- Add serialization
- Add RMI
- Add in technology from the Fast Track
Slide 43: Open Questions
- Non-general, optimized solutions
- Client-side caching issues
- Fault tolerance
- Subcomponent migration
- Inter- vs. intra-component communication
- MxN, MxP, or MxPxQxN?
Slides 44-45: The MxPxQxN Problem
[Diagrams: redistribution staged across a long-haul network]
Slide 46: The End
Slide 47:
Work performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48. UCRL-VG