Self-managing Grid Services for Efficient Data Management Anastasios Gounaris (University of Cyprus, University of Manchester) CoreGRID Summer School Budapest, September 2007
2 Web/Grid Services play an increasingly important role in large-scale data management applications E.g., SkyQuery (figure taken from [2]) Or more generic WSMSs that compile, optimize and evaluate workflows composed of multiple calls to remote WSs in order to analyze data [1] [1] U. Srivastava, K. Munagala, J. Widom, and R. Motwani.Query optimization over web services. In VLDB, pages 355–366, [2] Tanu Malik et al. SkyQuery: A Web Service Approach to Federate Databases. In CIDR 2003
3 At the middleware level: OGSA-DAI in brief Different types of data resources - including relational, XML and files - can be exposed via web services onto Grids. A number of popular data resource products are supported. Through OGSA-DAI, data resources can become grid- enabled in a clean, non-intrusive, straightforward way. Perform documents can describe/include data creation, query, modification and delivery tasks For more information, visit the OGSA-DAI website:
4 Services For Accessing Multiple Grid Data Sources and Computational Resources …Id like to find similar proteins to those proteins that have interacted with any other protein in more than X cases... ProteinID1ProteinID2 …. ProteinIDSequence …. ProteinIDSequence …. Subquery2 Subquery1 BLAST
5 OGSA-DQP in brief Provides services that enable nodes to participate in the evaluation of a query Additionally, it provides a coordinator service. Presents to the user a unified view of the available (computational, analysis and OGSA-DAI) resources in a location-transparent way. OGSA-DQP ethos: Light-weight, loosely-coupled integration of data and computational resources. Wrapper-mediator plus: –Allocation of query sub-plans to arbitrary grid nodes. –Partitioned and pipelined parallelism. Service-based in that: –Queries run over services. –Query processor is itself implemented as a collection of services. For more information, visit the OGSA-DQP website:
6 Services For Semantic Integration …Id like to find authors of papers from the database community… AuthorField …. Database ProfessorSchool …. Database ResearcherTopic …. Database - Extend OGSA-DQP with semantic integration capabilities Subquery3 Subquery2 Subquery1 [3] Gounaris, Comito, Sakellariou and Talia. A Service-Oriented System to Support Data Integration on Data Grids. CCGrid 2007 Semantic Map TopicField School Author ResearcherProfessor
7 BUT, there are some problems - continuing the example of slide 4 -IF, machines have the same capacity, connection speed and load, execution time will be t -BUT, IF the load of one machine doubles, then the execution time will be 2* t, whereas the optimum is 4/3 * t -ALSO, IF a third machine becomes available, we must be able to employ it -THUS, we need ADAPTIVE, SELF-MANAGING SOLUTIONS BLAST1 BLAST2 0.5
8 -Concave-like performance graphs are very common -BUT, finding the optimal point is challenging, since the volatility of the environment and the noise insert local optima and non-linearities -In addition, we may have functions of the form F(x,y,z,...) instead of F(x) Service tuning is non-trivial F(x) x
9 Outline of (the rest of) this talk -Component-based adaptivity solutions -Monitoring -Architecture -An example from OGSA-DQP -Control theoretical approaches
10 Monitoring with external component… … such as R-GMA: + RDBMS technology allows for expressive queries -Centralized -Cannot cope well with rapidly evolving properties -Cannot cover application specific metric (e.g., per tuple join cost, selectivity of an operator) R-GMA
11 Self-monitoring -Can monitor both at machine level (e.g., current load, memory available, etc.) and at application level (e.g., per tuple cost of subquery, selectivities, etc.) -No unnecessary information dissemination -… which results in significantly low overheads [4] Anastasios Gounaris, Norman W. Paton, Alvaro A. A. Fernandes, Rizos Sakellariou: "Self-monitoring Query Execution for Adaptive Query Processing". Data Knowl. Eng. 51(3): (2004)
12 Going one step further: Self-managing According to [5], an adaptive, self-managing element is as follows: [5] Jeffrey O. Kephart, David M. Chess: The Vision of Autonomic Computing. IEEE Computer 36(1): (2003). For more information visit: Main advantages: components can be re-used easier combinations of different adaptivity flavors clean engineering easier configuration
13 A Corresponding Adaptivity Framework for Grid Query Services (extension to OGSA-DQP) Query Evaluator query Monitoring Response Assessment Resourc e pool Resourc e pool Execution info Resource info Monitoring events Issues with current execution Prompt for adaptation Monitoring Component Assessment Component Response Component Adaptivity phasesAdaptivity components WS boundaries [6] Gounaris, Paton, Sakellariou, Fernandes, Smith, Watson: "Modular Adaptive Query Processing for Service-Based Grids ". Proc. of the 3rd IEEE International Conference on Autonomic Computing, 2006 [7] A Gounaris Resource Aware Query Processing on the Grid, PhD thesis, 2005
14 Additional info How can these inter-operate? –Through publish subscribe What is the exact role of the adaptivity components –Depends on the adaptivity scenario that is to be supported
15 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site ASite B Coordinator Evaluator Detector Diagnoser Responder 1 subscribe The Detector, Diagnoser and Responder are specific Instantiations of the Monitoring, Assessment and Response Components, respectively
16 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site ASite B Coordinator Evaluator Detector Diagnoser Responder 2 Evaluators pass on monitoring info to detectors
17 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site ASite B Coordinator Evaluator Detector Diagnoser Responder 3 Detectors notify the diagnoser of the actual behaviour
18 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site ASite B Coordinator Evaluator Detector Diagnoser Responder 4 Diagnoser notifies responder
19 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site A Site B Coordinator Evaluator Detector Diagnoser Responder 5 checks progress
20 A dynamic balancing example table_scan (protein) exchange op_call (Blast) project Site C Site A, B Evaluator Detector Site C Coordinator Diagnoser Responder 6 enforces redistribution
21 Categories of Monitoring M1: Notifications on the cost of processing a tuple in a subquery (tuple processing cost, selectivity, time waiting for input). M2: Notifications about communication costs (cost of sending a buffer, recipient of buffer, size of buffer).
22 Monitoring: Parameters M1 notifications: –Events produced by evaluator every 10 tuples; –Averages are calculated for last 25 events; –Threshold for sending notification is a 20% change in cost per tuple. M2 notifications are for every buffer.
23 Assessment Based o the monitoring info, calculate the throughput for each subquery. The assessment component seeks to identify where there is a workload imbalance. –Existing Balance Vector W = (w 1, …, w n ) –Proposed Balance Vector W = (w 1, …, w n ) Notify response component if: –(| w i - w i |) / w i ) > threshold (20%) for some i.
24 Response If evaluation not close to completion and If progress since last adaptation > x% then apply new distribution vector W response can be retrospective: resends tuples to consumer based on W. or prospective: no tuples are resent, but future tuples are sent to consumers based on W.
25 Such a framework is general E.g., - by extending the RC, it can adapt in a different way (e.g., by replacing a slow machine), while re- using the same MC and AC. -By extending the AC, it can solve a different problem (e.g., checking also for vertical imbalances) -By deploying a hierarchy of components, problems can be solved locally and at different levels in order to attain scalability [8] Anastasios Gounaris, Jim Smith, Norman W. Paton, Rizos Sakellariou, Alvaro A.A. Fernandes, Paul Watson: "Adapting to Changing Resource Performance in Grid Query Processing". Proc. of the 1st International Workshop on Data Management in Grids, DMG'05 (held in conjunction with VLDB 2005).
26 Outline of (the rest of) this talk -Component-based adaptivity solutions -Monitoring -Architecture -An example from OGSA-DQP -Control theoretical approaches -Runtime optimization (hill climbing) -Extremum control -System identification
27 A real problem We want to retrieve a large dataset from a WS It is widely accepted that due to performance limitations of XML parsers, SOAP, etc., we must split the data into several chunks The performance is shown in the figure
28 Main observations The optimal block size depends on –The actual connection –The data itself (avg. tuple size) Main challenges: –No model –Significant noise Local minima –Requirement for fast convergence
29 Runtime optimization Newton-based approach. Let x be the block size and y=f(x) be the transfer time. One chunk transfer corresponds to one adaptivity step Performance: LOW! (Such a technique is known to be sensitive to noise, and works better for quadratic models)
30 On extremum control Objective: find the (optimal) value of the control variable(s) that yields the maximum (minimum) value of a process output (or performance measure) in the presence of noise. y = f(x) Extremum controller ykyk xkxk
31 Switching extremum control Let y be the performance metric and x k the block size at the kth step. Then x k = x k-1 – gain* sign(Δx*Δy) Rationale: –detect the side of the optimum point where the current block size resides on. Intuition behind: –the next block size must be greater than the previous one, if, in the last step, an increase has lead to performance improvement, or a decrease has lead to performance degradation. –Otherwise, the block size must become smaller.
32 Extensions Linear models are not suitable, thus it is better the gain to be adaptive gain = b* Δx*Δy/y The impact of noise must be mitigated. To this end, the values of x,y are the average over a window To enable continuous search of the block size space, add a dither signal
33 Evaluation We used 5 queries over data from the TPC-H benchmark
34 Results Starting point: 1000 tuples, b = 15 The performance degradation drops by an order of magnitude or less
35 Intra-query behavior The technique is characterized by stability and quick convergence However, in some cases, overshooting is avoided only by imposing hard upper and lower limits. Also, the oscillation in individual runs are rather significant
36 System identification Choose a model, e.g., Or, … and apply (Recursive) Least Squares. Then, optimal point can be found analytically Which model(s) to choose in each case, remains an open issue!
37 Summary Remarks -Component-based adaptivity solutions -Enable a wide range of adaptive solutions -Decentralized adaptations -Services are self-monitoring -In line with the IBM autonomic toolbox -Control theoretical approaches -seem very promising for self-optimization when a performance metric must be optimized on the fly -In the presence of noise … -… and absence of a model
38 Thank you! Questions?