Cost Framework for a Heterogeneous Distributed Semi-structured Environment
  Tianxiao Liu (1)(2), Tuyet-Tram Dang-Ngoc (1), Dominique Laurent (1)
  (1) ETIS Laboratory, University of Cergy-Pontoise, Cergy-Pontoise, France
  (2) Xcalia S.A., Paris, France
  DBMAN 2007, June 18th, 2007
Outline
  Motivation
  Cost models for heterogeneous data sources
  Contributions
    Generic language for cost communication
    Dynamic cost estimation framework
  Conclusion
Motivation
  Cost-based query optimization
    Various execution plans for the same query
    Different costs for each plan (execution time, price, communication, etc.)
  Cost model used to estimate the cost of candidate plans
    Cost formulas: source oriented or operation oriented
    Statistics of data sources
  Problems in a mediation context
    Data source autonomy: cost models not available
    Integration of various cost models at the mediator level
    Cost communication between components of the system
Cost models for heterogeneous data sources
  [Figure: overview of existing cost models, with citations]
  Headings: Cost models based on operation implementation; Generic cost models; Specific methods; Known sources; Heterogeneous autonomous sources
  Source types: Relational data sources; Object-oriented data sources; Semi-structured data sources
  Entries, in figure order: Operation [GP89] [ML86] [SA82]; Sampling [ZL98]; Calibration [DKS92]; Adaptive [Zhu95]; Operation [CD92] [BMG93] [DOA+94]; Calibration [GST96]; Access Path [GGT96]; Flora [Flo96] [Gru96]; Hybrid cost model [NGT98]; Cost model by history [ACP96]; Wrapper [HKWY97] [ROH99]; Operation [AAN01] [MW99]; XQuery Self-Learning [ZHJGML05]
  Other labels: Adapted; Refined; Extended; Applied
Background
  XLive mediation system and its XQuery evaluation process
  [Figure: XLive architecture]
  Components: Mediator; Wrapper; Mediator Information Repository; Wrapper Information Repository
  Data sources: Relational data source; XML data source; Web services; …
  Query representations: XQuery; Canonized XQuery; Tree Graph View (TGV); Annotated TGV; XAlgebra; Result (XML)
  Processing steps: Canonization; Modeling; Annotation; Transformation; Evaluation
  Other labels: Cost-based Optimization; Equivalent rules; Search Strategy; Cost information; Mediator operators; Wrapper operators; Query; Response
Background
  Tree Graph View (TGV)
  [Figure: an example of an XQuery and its TGV representation]
Generic cost model in a mediation context
  Design a generic cost model…
    Source type: relational, semi-structured, web service…
    Specific methods: calibration, history…
    APIs implemented by the system
    Principle: as accurate as possible
  …using cost formulas
    Equation systems
    Statistics also expressed as equations
    Constant values
  Existing generic cost model (Disco)
    Object-oriented environment
    Predefined variables in the language
Our proposal: Generic Language for Cost Communication (GLCC)
  A language based on XML
    Cost formulas and equation systems in the form of MathML
  A generic language
    No predefined variables
    Express different costs for various optimization objectives (time, price…)
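To make the idea of an XML-carried cost formula concrete, here is a minimal sketch of how a formula written in MathML content markup could be evaluated once it reaches the mediator. The slides do not show GLCC's schema, so everything specific here is an assumption for illustration: the formula itself, the variable names `startup`, `card` and `per_tuple`, the helper names, and the omission of the MathML namespace. Only the element names (`apply`, `plus`, `times`, `ci`, `cn`) come from MathML.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical cost formula: cost = startup + card * per_tuple.
# The variables are not predefined anywhere; the caller supplies them.
FORMULA = """
<apply>
  <plus/>
  <ci>startup</ci>
  <apply>
    <times/>
    <ci>card</ci>
    <ci>per_tuple</ci>
  </apply>
</apply>
"""

# Only the two operators needed by the example are supported.
OPS = {
    "plus": lambda args: sum(args),
    "times": lambda args: reduce(operator.mul, args, 1.0),
}


def evaluate(node, env):
    """Recursively evaluate a MathML content element against variable bindings."""
    if node.tag == "cn":  # numeric constant
        return float(node.text)
    if node.tag == "ci":  # identifier, looked up in the bindings
        return float(env[node.text.strip()])
    if node.tag == "apply":  # first child is the operator, the rest are operands
        op, *operands = list(node)
        return OPS[op.tag]([evaluate(child, env) for child in operands])
    raise ValueError(f"unsupported element: {node.tag}")


def formula_cost(xml_text, env):
    """Parse a formula and evaluate it with the given statistics."""
    return evaluate(ET.fromstring(xml_text), env)
```

Because the variables are bound at evaluation time, the same mechanism could carry a time formula or a price formula; only the bindings and the formula text change.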
Dynamic cost estimation framework
  Cooperation and communication between different components of XLive
  Use execution results (response time) to improve the accuracy of cost models
  Cost communication performed in GLCC
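The slides do not say how response times are fed back into the cost models. One simple possibility, shown purely as an illustration, is to keep a per-item cost coefficient and blend it with the rate observed on each execution (exponential smoothing). The class name, the linear cost formula and the smoothing factor `alpha` are all assumptions, not part of XLive.

```python
class FeedbackCostModel:
    """Hypothetical linear cost model refined from observed response times."""

    def __init__(self, per_item_cost, alpha=0.5):
        self.per_item_cost = per_item_cost  # current estimate of time per item
        self.alpha = alpha                  # weight given to the newest observation

    def estimate(self, cardinality):
        """Predicted response time for a result of the given size."""
        return self.per_item_cost * cardinality

    def record(self, cardinality, observed_time):
        """Blend the observed per-item time into the stored coefficient."""
        observed_rate = observed_time / cardinality
        self.per_item_cost = (
            (1 - self.alpha) * self.per_item_cost + self.alpha * observed_rate
        )
```

A larger `alpha` reacts faster to changes in a source's behaviour but is more sensitive to a single slow execution.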
Overall cost estimation on the mediator
  TGV cost annotation
    Annotate one operation, or a group of operations, in a TGV with cost information
  [Figure: annotated TGV]
Overall cost estimation on the mediator
  Cost Annotation Tree (CAT)
    Breadth-first traversal of the CAT to associate an execution cost with each node
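The traversal can be sketched as follows. This is one possible reading, not the XLive implementation: the slides do not define how a node's cost combines with its children's, so the sketch assumes each node has a local cost and that its execution cost is that local cost plus the costs of its children. The `CatNode` type and the `annotate` function are hypothetical names. Nodes are collected breadth-first, then costed in reverse order so that every child is costed before its parent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatNode:
    """Hypothetical CAT node: an operation with its own cost and its inputs."""
    name: str
    local_cost: float
    children: list = field(default_factory=list)
    total_cost: float = 0.0


def annotate(root):
    """Visit the tree breadth-first, then assign a cumulative cost to each node.

    Returns the node names in breadth-first order.
    """
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children)

    # In reverse breadth-first order, children always come before their parent.
    for node in reversed(order):
        node.total_cost = node.local_cost + sum(c.total_cost for c in node.children)
    return [node.name for node in order]
```

With a root whose only child is a join over two scans, the scans are costed first, then the join, then the root, so the root ends up holding the estimate for the whole plan.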
Conclusion and future work
  Contributions
    First cost-based query optimization framework for an XML-based mediation system
    Generic language
    Suitable for various search strategies
  Future work
    Cost model validation: accuracy and performance
    Calibrating the cost of native XML data sources
    Search strategy
Thanks for your attention! Questions?