Flexible Control of Data Transfer between Parallel Programs
Joe Shang-chieh Wu, Alan Sussman
Department of Computer Science, University of Maryland, USA
[Diagram of coupled space-weather models: corona and solar wind, global magnetospheric MHD, thermosphere-ionosphere model, Rice convection model, particle and hybrid models]
What is the problem?
- Coupling existing (parallel) programs
  - for physical simulations, more accurate answers can be obtained
  - for visualization, flexible transmission of data between simulation and visualization codes
- Exchange data across shared or overlapped regions in multiple parallel programs
- Couple multi-scale (space & time) programs
- Focus on multiple time scale problems (when to exchange data)
Roadmap
- Motivation
- Approximate matching
- Matching properties
- Performance results
- Conclusions and future work
Is it important?
- Petroleum reservoir simulations: multi-scale, multi-resolution codes
- Special issue (May/Jun 2004) of IEEE Computing in Science & Engineering: "It's then possible to couple several existing calculations together through an interface and obtain accurate answers."
- Earth System Modeling Framework (ESMF): several US federal agencies and universities
Solving multiple space scales
1. Appropriate tools
2. Coordinate transformation
3. Domain knowledge
Matching is OUTSIDE the components
- Separate matching (coupling) information from the participating components
  - Maintainability: components can be developed and upgraded individually
  - Flexibility: participants/components can be changed easily
  - Functionality: supports variable-sized time interval numerical algorithms and visualizations
- Matching information is specified separately by the application integrator
- Matches are made at runtime via simulation time stamps
Separate codes from matching

Exporter Ap0:
    define region Sr12
    define region Sr4
    define region Sr5
    ...
    Do t = 1, N, Step0
       ...  // computation jobs
       export(Sr12, t)
       export(Sr4, t)
       export(Sr5, t)
    EndDo

Importer Ap1:
    define region Sr0
    ...
    Do t = 1, M, Step1
       import(Sr0, t)
       ...  // computation jobs
    EndDo

[Diagram: exported regions Ap0.Sr12, Ap0.Sr4, Ap0.Sr5 connected to imported regions Ap1.Sr0, Ap2.Sr0, Ap4.Sr0]

Configuration file:
    #
    Ap0  cluster0  /bin/Ap
    Ap1  cluster1  /bin/Ap
    Ap2  cluster2  /bin/Ap
    Ap4  cluster4  /bin/Ap4  4
    #
    Ap0.Sr12  Ap1.Sr0  REGL  0.05
    Ap0.Sr12  Ap2.Sr0  REGU  0.1
    Ap0.Sr4   Ap4.Sr0  REG   1.0
    #
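For concreteness, the following is a minimal, self-contained Python sketch of how a configuration file in the format above could be parsed into program placements and region connections. The file grammar (two sections delimited by '#' lines) is inferred from the example only, and the names Program, Connection, and parse_config are illustrative assumptions, not the actual library interface.

    # Sketch only: parse the '#'-delimited configuration format shown above
    # (assumed grammar inferred from the example, not the library's real parser).
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Program:             # one line of the first section
        name: str              # e.g. "Ap0"
        cluster: str           # e.g. "cluster0"
        executable: str        # e.g. "/bin/Ap"

    @dataclass
    class Connection:          # one line of the second section
        exported: str          # e.g. "Ap0.Sr12"
        imported: str          # e.g. "Ap1.Sr0"
        policy: str            # e.g. "REGL"
        precision: float       # e.g. 0.05

    def parse_config(text: str) -> Tuple[List[Program], List[Connection]]:
        sections, current = [], []
        for line in text.splitlines():
            if line.strip() == "#":        # '#' lines separate the sections
                if current:
                    sections.append(current)
                    current = []
            elif line.strip():
                current.append(line.split())
        if current:
            sections.append(current)
        programs = [Program(*f[:3]) for f in sections[0]]
        connections = [Connection(f[0], f[1], f[2], float(f[3]))
                       for f in sections[1]]
        return programs, connections

Each Connection binds one exported region to one imported region under a named matching policy and precision; this is exactly the coupling information that is kept outside the participating components.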
Matching implementation
- The library is implemented with POSIX threads
- Each process in each program uses library threads to exchange control information in the background, while the application computes in the foreground
- One process in each parallel program runs an extra representative thread to exchange control information between parallel programs
  - Minimizes communication between parallel programs
  - Keeps collective correctness within each parallel program
  - Improves overall performance
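The thread structure is not shown on the slide; as a rough structural sketch (assumed design, written in Python rather than the C/POSIX threads of the actual library), one designated process could run a background representative loop like the one below while the application keeps computing in the foreground.

    # Rough sketch only: a background "representative" thread drains control
    # requests (e.g., timestamp match requests) while the foreground loop
    # keeps computing; all names and the queue hand-off are assumptions.
    import threading
    import queue

    control_requests = queue.Queue()   # match requests from compute code
    resolved = {}                      # request timestamp -> matched timestamp
    stop = threading.Event()

    def representative():
        # In the real system this thread would negotiate with the other
        # program's representative; here it just records a trivial match.
        while not stop.is_set():
            try:
                t = control_requests.get(timeout=0.1)
            except queue.Empty:
                continue
            resolved[t] = t

    threading.Thread(target=representative, daemon=True).start()

    for t in range(1, 6):              # foreground computation loop
        control_requests.put(t)        # hand the request to the background thread
        # ... computation for step t ...

    stop.set()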
Approximate Matching
- Exporter Ap0 produces a sequence of data object A at simulation times 1.1, 1.2, 1.5, ...
- Importer Ap1 requests the same data object A at time 1.3
- Is there a match for this request? If yes, which export is chosen, and why?
Supported matching policies

  Policy   Matched timestamp f(x) for a request at time x
  LUB      minimum f(x) with f(x) ≥ x
  GLB      maximum f(x) with f(x) ≤ x
  REG      f(x) minimizing |f(x)-x|, with |f(x)-x| ≤ p
  REGU     f(x) minimizing f(x)-x, with 0 ≤ f(x)-x ≤ p
  REGL     f(x) minimizing x-f(x), with 0 ≤ x-f(x) ≤ p
  FASTR    any f(x) with |f(x)-x| ≤ p
  FASTU    any f(x) with 0 ≤ f(x)-x ≤ p
  FASTL    any f(x) with 0 ≤ x-f(x) ≤ p

Here x is the requested import timestamp, f(x) is the exported timestamp selected by the match, and p is the per-connection precision (e.g., the 0.05 in the configuration file above).
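To make the table concrete, here is a self-contained Python sketch of the eight policies as one lookup function, applied to the example from the Approximate Matching slide (exports at 1.1, 1.2, 1.5; request at 1.3). The function and its signature are illustrative assumptions, not the library's API.

    # Sketch only: pick an exported timestamp f(x) for a request at time x
    # under each policy; p is the precision bound from the configuration file.
    def match(policy, x, exported, p=None):
        if policy == "LUB":    # least upper bound
            cand = [e for e in exported if e >= x]
            return min(cand) if cand else None
        if policy == "GLB":    # greatest lower bound
            cand = [e for e in exported if e <= x]
            return max(cand) if cand else None
        if policy == "REG":    # closest in either direction, within p
            cand = [e for e in exported if abs(e - x) <= p]
            return min(cand, key=lambda e: abs(e - x)) if cand else None
        if policy == "REGU":   # closest from above, within p
            cand = [e for e in exported if 0 <= e - x <= p]
            return min(cand) if cand else None
        if policy == "REGL":   # closest from below, within p
            cand = [e for e in exported if 0 <= x - e <= p]
            return max(cand) if cand else None
        if policy == "FASTR":  # any acceptable export (first found), within p
            return next((e for e in exported if abs(e - x) <= p), None)
        if policy == "FASTU":  # any acceptable export from above
            return next((e for e in exported if 0 <= e - x <= p), None)
        if policy == "FASTL":  # any acceptable export from below
            return next((e for e in exported if 0 <= x - e <= p), None)
        raise ValueError(policy)

    exported = [1.1, 1.2, 1.5]                 # exports from the previous slide
    print(match("GLB",  1.3, exported))        # 1.2
    print(match("LUB",  1.3, exported))        # 1.5
    print(match("REG",  1.3, exported, 0.15))  # 1.2 (only export within 0.15)
    print(match("REGU", 1.3, exported, 0.1))   # None (no export in [1.3, 1.4])

So for the request at 1.3, GLB matches the export at 1.2, LUB matches 1.5, and REGU with p = 0.1 finds no acceptable export, so the import cannot be satisfied under that policy.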
Acceptable ≠ Matchable

[Figure: timeline with exported timestamps te' and te'']
Region-type matches

[Figure: timeline illustrating region-type matches around exported timestamp te']
Experimental setup
- Question: how much overhead is introduced by runtime matching?
- 6 PIII-600 processors, connected by channel-bonded Fast Ethernet
- Solve the 2-D diffusion equation u_t = u_xx + u_yy + f(t,x,y) by the finite element method
- u(t,x,y): 512x512 array, on 4 processors (Ap1)
- f(t,x,y): 32x512 array, on 2 processors (Ap2)
- All data in Ap2 is sent (exported) to Ap1 using a matching criterion
- Ap1 receives (imports) data under 3 different scenarios
- Matches are made for each scenario (results averaged over multiple runs)
Experiment result 1: Ap1 execution time (average)

           P10     P11     P12     P13
  Case A   341ms   336ms   610ms   614ms
  Case B   620ms   618ms
  Case C   624ms   612ms   340ms   339ms
Experiment result 2: Ap1 overhead in the slowest process

Ap1 pseudocode, and its expansion showing what import does internally:
    Do t = 1, N
       import(data, t)
       compute u
    EndDo

    Do t = 1, N
       Request a match for (data, t)
       Receive data
       compute u
    EndDo

           Matching time   Data transfer time   Computation time   Matching overhead
  Case A   944us           6.1ms                605ms              13%
  Case B   708us           2.9ms                613ms              20%
  Case C   535us           6.8ms                614ms              7%
Experiment result 3: comparison of matching time

           Slowest process   Fastest process
  Case A   944us (P13)       4394us (P11)
  Case B   708us (P10)       3468us (others)
  Case C   535us (P10)       3703us (P13)

- Fastest process (P11): high cost, remote match
- Slowest process (P13): low cost, local match
- A high-cost match can be hidden
Conclusions & future work
- Conclusions
  - A low-overhead approach for flexible data exchange between e-Science components with different time scales
- Ongoing & future work
  - Performance experiments in a Grid environment
  - Caching strategies to deal efficiently with slow importers
  - Real applications: space weather is the first one
End of Talk
Main components
Local and Remote requests
Space Science Application