CS632: Query Optimization Over Web Services. Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, Rajeev Motwani. Presented by Ajay Kumar Sarda.
Motivation: Web services are emerging as a popular standard for sharing data and functionality; databases sit behind web services; DBMS-like capabilities are needed when the data sources are web services; queries spanning multiple web services need query optimization.
Motivating Example: A credit card company wants to send out mail for its new credit card offer. I: potential recipient names. WS1: name (n) → credit rating (cr). WS2: name (n) → credit card number (ccn). WS3: card number (ccn) → payment history (ph). One possible execution is WS1, WS2, WS3. Is it optimal?
Challenges: different response times of web services; precedence constraints; tradeoff between a linear pipeline and parallelism; overhead of parsing SOAP/XML headers.
Related Work: "Query optimization in the presence of limited access patterns". Relations carry binding patterns such as $R(A^b, B^f)$. Query plans in the search space are annotated, and invalid and non-viable plans are pruned. The search starts with an initial set $S$ containing only atomic plans; $S$ is iteratively updated by adding new plans obtained by combining plans from $S$ using selection and join operations.
Outline of the Talk: WSMS Preliminaries; Query Optimization with and without precedence constraints; Data Chunking; Experimental Evaluation; Conclusion; Future Work.
WSMS Architecture
Query Model: A web service is denoted $WS_i(X_i^b, Y_i^f)$, where $X_i$ are the bound attributes and $Y_i$ are the free attributes.
Query Model (Contd.)
Query Plans
Execution Model: A thread $T_i$ is created for each web service $WS_i$. $T_i$ takes its input from a join thread $J_i$. $J_i$ joins the outputs of the parents of $WS_i$. $J_{out}$ joins the outputs of all leaf web services.
Execution Model (Contd.)
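The thread-per-web-service idea can be sketched in Python. This is a minimal sketch, not the WSMS code: it assumes a linear pipeline, so each join thread has a single parent and reduces to a queue. The names `run_stage`, `run_pipeline`, `DONE`, and the two stand-in services are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the tuple stream


def run_stage(ws, inbox, outbox):
    """Body of one thread T_i: read tuples, call the service, forward results."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate end-of-stream downstream
            return
        for out in ws(item):  # a service may return zero or more tuples
            outbox.put(out)


def run_pipeline(services, inputs):
    """Run the services as a linear pipeline, one thread per service."""
    queues = [queue.Queue() for _ in range(len(services) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(ws, queues[i], queues[i + 1]))
        for i, ws in enumerate(services)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for web services: a filter and a lookup.
def keep_even(t):
    return [t] if t % 2 == 0 else []


def times_ten(t):
    return [t * 10]


pipeline_output = run_pipeline([keep_even, times_ten], range(6))
```

Each stage processes tuples as soon as they arrive, so the stages overlap in time and the slowest stage sets the throughput.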
Statistics: Per-tuple response time ($c_i$): $c_i = 1/r_i$, where $r_i$ is the maximum rate at which results of invocations can be obtained from $WS_i$. It depends on web service provisioning, network conditions, and the load on the web service. Selectivity ($s_i$): the average number of returned tuples, per input tuple, that remain unfiltered after applying predicates. A web service with $s_i \le 1$ is selective; one with $s_i > 1$ is proliferative.
Bottleneck Cost Metric: For a query plan $H$, $P_i(H)$ is the set of predecessors of $WS_i$ in $H$, and $R[S] = \prod_{WS_j \in S} s_j$ is the combined selectivity of all the web services in $S$. For every tuple in $I$ input to plan $H$, the average number of tuples that $WS_i$ needs to process is $R[P_i(H)]$. The average processing time required by $WS_i$ per original input tuple in $I$ is $R[P_i(H)] \cdot c_i$. The cost of the query plan is $cost(H) = \max_{1 \le i \le n} R[P_i(H)] \cdot c_i$.
Bottleneck Cost Metric (Contd.): Per input tuple, Plan 1 costs $\max(2,\ 10 \cdot 0.1,\ 5 \cdot 0.5) = 2.5$ and Plan 2 costs $\max(2,\ 10,\ 5 \cdot 5) = 25$. Plan 2 is 10 times slower than Plan 1.
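The metric can be checked with a short Python sketch. The plan figures did not survive on the slide, so the plan shapes below are an assumption chosen to match the slide's numbers: costs 2, 10, 5 and selectivities 0.1 and 5 for WS1 and WS2, with Plan 1 a linear pipeline and Plan 2 running WS1 and WS2 in parallel with WS3 placed after WS2. The function name `bottleneck_cost` is hypothetical.

```python
from math import prod


def bottleneck_cost(cost, sel, preds):
    """Return max over i of R[P_i(H)] * c_i.

    cost[w] is c_w, sel[w] is s_w, and preds[w] is the set of ALL
    predecessors of w in the plan (not only its direct parents).
    """
    return max(prod(sel[p] for p in preds[w]) * cost[w] for w in cost)


# Assumed values, reverse-engineered from the numbers on the slide.
cost = {"WS1": 2, "WS2": 10, "WS3": 5}
sel = {"WS1": 0.1, "WS2": 5, "WS3": 1}

# Assumed Plan 1: linear pipeline WS1 -> WS2 -> WS3.
plan1 = {"WS1": set(), "WS2": {"WS1"}, "WS3": {"WS1", "WS2"}}
# Assumed Plan 2: WS1 and WS2 in parallel, WS3 after WS2.
plan2 = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}

cost1 = bottleneck_cost(cost, sel, plan1)  # max(2, 10*0.1, 5*0.5)
cost2 = bottleneck_cost(cost, sel, plan2)  # max(2, 10, 5*5)
```

Under these assumptions the bottleneck of Plan 1 is WS3 at 2.5 per input tuple, while in Plan 2 the proliferative WS2 feeds WS3 five tuples per input tuple.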
Q.O. without Precedence Constraints. Lemma: "There exists an optimal plan that is a linear ordering of the selective web services, i.e., has no parallel dispatch of data."
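Q.O. without Precedence Constraints. Lemma: "Let $WS_1, \ldots, WS_n$ be a plan with a linear ordering of the selective web services. If $c_i > c_{i+1}$, then $WS_i$ and $WS_{i+1}$ can be swapped without increasing the cost of the plan." [Figure: with $c_i > c_{i+1}$ and $F_i$ tuples arriving, before the swap $WS_i$ $(s_i, c_i)$ costs $F_i c_i$ and $WS_{i+1}$ costs $F_i s_i c_{i+1}$; after the swap $WS_{i+1}$ $(s_{i+1}, c_{i+1})$ costs $F_i c_{i+1}$ and $WS_i$ costs $F_i s_{i+1} c_i$.]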
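The swap argument can be written out in a few lines. This is a sketch of the reasoning rather than the full proof; $F_i$ is taken to be the number of tuples reaching position $i$, as in the figure labels.

```latex
\begin{align*}
\text{before the swap:}\quad & \max\bigl(F_i c_i,\; F_i s_i c_{i+1}\bigr) \\
\text{after the swap:}\quad  & \max\bigl(F_i c_{i+1},\; F_i s_{i+1} c_i\bigr) \\
\text{since } c_{i+1} < c_i \text{ and } s_{i+1} \le 1:\quad
  & F_i c_{i+1} < F_i c_i, \qquad F_i s_{i+1} c_i \le F_i c_i \\
\Rightarrow\quad
  & \max\bigl(F_i c_{i+1},\; F_i s_{i+1} c_i\bigr) \;\le\; F_i c_i
    \;\le\; \max\bigl(F_i c_i,\; F_i s_i c_{i+1}\bigr)
\end{align*}
```

Web services after position $i+1$ receive $F_i s_i s_{i+1}$ tuples in both orders, so their terms in the bottleneck cost do not change.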
Q.O. without Precedence Constraints (Contd.). Theorem: "For selective web services with no precedence constraints, the optimal plan is a linear ordering of the web services by increasing response time, ignoring selectivities."
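The theorem is easy to check by brute force on a small instance. The sketch below compares the cost-sorted order against every permutation. The selectivities are those of the experiment later in the talk; the costs and the function names are made up for illustration.

```python
from itertools import permutations


def linear_cost(order, cost, sel):
    """Bottleneck cost of a linear pipeline visiting services in `order`."""
    worst, rate = 0.0, 1.0  # rate = combined selectivity of services so far
    for w in order:
        worst = max(worst, rate * cost[w])
        rate *= sel[w]
    return worst


def order_by_response_time(cost):
    """Order the services by increasing per-tuple response time."""
    return sorted(cost, key=cost.get)


# Illustrative costs (assumed); all services are selective (s <= 1).
cost = {"WS1": 2.0, "WS2": 0.5, "WS3": 1.0, "WS4": 1.5}
sel = {"WS1": 0.4, "WS2": 0.3, "WS3": 0.2, "WS4": 0.1}

sorted_order = order_by_response_time(cost)
sorted_cost = linear_cost(sorted_order, cost, sel)
best_cost = min(linear_cost(p, cost, sel) for p in permutations(cost))
```

The first service in any linear plan sees every input tuple, so the plan can never cost less than the smallest $c_i$; putting the fastest service first attains that bound here.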
Q.O. with Precedence Constraints: The plan DAG $H$ is constructed incrementally by greedily adding one web service at a time. The web service chosen is the one that can be added to $H$ with minimum cost and all of whose prerequisite web services have already been added to $H$. $M_i$ is the set of all web services that are prerequisites for $WS_i$.
Adding a Web Service to the Plan: Given a partial plan $\bar{H}$, add $WS_x$. Compute the best cut $C_x$ such that placing edges from the web services in $C_x$ to $WS_x$ minimizes the cost. $PC_x$ is the set of all the web services in $C_x$ together with all their predecessors in $\bar{H}$. The cost incurred by adding $WS_x$ is $cost(WS_x) = R[PC_x] \cdot c_x$.
Adding a Web Service (Contd.): A variable $z_i$ is associated with every $WS_i$, set to 1 if $WS_i$ belongs to $PC_x$ and 0 otherwise. The optimal set $PC_x$ is obtained by solving a linear program.
Greedy Algorithm
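The greedy construction can be sketched in Python. One substitution to note: instead of the linear program, this sketch finds the best predecessor set $PC_x$ by exhaustive search over subsets of the partial plan, which is only practical for a handful of web services. Function names and the tie-breaking rule (alphabetical) are assumptions. The example reuses the precedence constraints and selectivities of the experiment later in the talk, with costs picked for illustration.

```python
from itertools import combinations
from math import prod


def best_predecessor_set(preds, sel, must):
    """Find the set PC_x with minimum combined selectivity.

    PC_x must contain the prerequisites `must` and be closed under
    predecessors in the partial plan. Exhaustive search stands in for the LP.
    """
    placed = sorted(preds)
    best_set, best_r = None, None
    for k in range(len(placed) + 1):
        for combo in combinations(placed, k):
            s = set(combo)
            if not must <= s:
                continue
            if any(not preds[w] <= s for w in s):
                continue  # not closed under predecessors
            r = prod(sel[w] for w in s)
            if best_r is None or r < best_r:
                best_set, best_r = frozenset(s), r
    return best_set, best_r


def greedy_plan(cost, sel, prereq):
    """Add one web service at a time, always the cheapest eligible one."""
    preds, order, remaining = {}, [], set(cost)
    while remaining:
        best = None
        for x in sorted(remaining):
            must = prereq.get(x, set())
            if not must <= preds.keys():
                continue  # a prerequisite is not in the plan yet
            pc, r = best_predecessor_set(preds, sel, must)
            if best is None or r * cost[x] < best[0]:
                best = (r * cost[x], x, pc)
        if best is None:
            raise ValueError("precedence constraints are cyclic")
        _, x, pc = best
        preds[x] = pc  # all predecessors of x in the final plan
        order.append(x)
        remaining.remove(x)
    return order, preds


# WS1 < WS3 and WS2 < WS4; selectivities 2, 1, 0.1, 0.1; assumed costs.
cost = {"WS1": 1.0, "WS2": 1.0, "WS3": 1.0, "WS4": 0.4}
sel = {"WS1": 2, "WS2": 1, "WS3": 0.1, "WS4": 0.1}
prereq = {"WS3": {"WS1"}, "WS4": {"WS2"}}
order, preds = greedy_plan(cost, sel, prereq)
```

In this instance WS3 is forced after the proliferative WS1, and the search places it after the selective WS4 as well, which cuts the tuples it must process from 2 to 0.2 per input tuple.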
Data Chunking: Each web service call incurs overhead for parsing SOAP/XML headers and for the network. Tuples are therefore passed to a web service in chunks. The response time of $WS_i$ depends on the input chunk size: $c_i(k)$ is the response time of $WS_i$ on a chunk of size $k$. A limit $k_i^{max}$ exists on the maximum chunk size.
Data Chunking (Contd.): The query optimizer must decide on the optimal chunk size for each web service. "The optimal chunk size to be used by $WS_i$ is $k_i^*$ such that $c_i(k_i^*)/k_i^*$ is minimized." Profiling is combined with query processing to try out various chunk sizes. Intermediate tuples between any two web services in the pipelined plan are buffered.
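Choosing the chunk size from profiled measurements is a one-line minimization. The sketch below assumes the profile is a mapping from tried chunk sizes to measured response times; the function name and the sample numbers are made up for illustration.

```python
def best_chunk_size(profile, k_max):
    """Pick the profiled chunk size k <= k_max minimizing c(k) / k.

    profile maps a chunk size k to the measured response time c(k).
    """
    candidates = [k for k in profile if 1 <= k <= k_max]
    if not candidates:
        raise ValueError("no profiled chunk size within the limit")
    return min(candidates, key=lambda k: profile[k] / k)


# Hypothetical measurements: per-tuple time is 1.0, 0.4 and 0.6.
profile = {1: 1.0, 10: 4.0, 50: 30.0}
chosen = best_chunk_size(profile, k_max=100)
```

With these numbers a chunk of 10 amortizes the per-call overhead best; a larger chunk is slower per tuple, so bigger is not always better.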
Experimental Evaluation: Total running time is the metric. Plans produced by the optimizer are compared against Parallel (dispatch data to the web services in parallel) and SelOrder (choose the web service with lower selectivity first). Running time is compared with and without chunking. The WSMS cost is compared against the slowest web service.
Experimental Setup: The WSMS prototype is a multithreaded system in Java. Apache Axis tools are used for communicating with web services. Java Reflection. Different costs are obtained by varying delays. Different selectivities are obtained by rejecting a tuple with probability $1 - s_i$.
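The selectivity knob can be sketched in a few lines of Python. This is an illustration only, since the prototype is in Java; the function name is hypothetical, and the sketch covers selective services ($s \le 1$) only.

```python
import random


def simulate_selectivity(tuples, s, rng):
    """Keep each tuple with probability s, i.e. reject it with 1 - s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("this sketch only models selectivities in [0, 1]")
    return [t for t in tuples if rng.random() < s]


rng = random.Random(0)  # fixed seed so a run is repeatable
kept = simulate_selectivity(range(100_000), 0.2, rng)
observed = len(kept) / 100_000  # close to 0.2 for a large input
```

Over a large input the observed fraction of surviving tuples approaches the configured selectivity.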
No Precedence Constraints: WS1, WS2, WS3, WS4. Selectivities set to 0.4, 0.3, 0.2, 0.1. Range of cost c varied from [0.2, 2] to [2, 2]. Parallel: WS4. SelOrder: WS4.
Precedence Constraints: WS1, WS2, WS3, WS4 with WS1 < WS3 and WS2 < WS4. Selectivities: 2, 1, 0.1, 0.1. Uniform cost for WS1, WS2, WS3, with the cost of WS4 varied from 0.4 to 2.
Data Chunking: WS1, WS2, WS3, WS4; no precedence constraints; uniform cost; selectivity set to 0.5; web services arranged in a linear pipeline (Optimizer); equal chunk size.
WSMS Cost vs. Bottleneck Cost: no precedence constraints; uniform web service costs; selectivity set to 0.5; web services arranged in a linear pipeline.
Future Work: different input tuples following different plans; adaptive plans that change with response times; web services with monetary costs; multiple web services for the same data; profiling techniques that track response times and selectivities; caching techniques at the WSMS.
Conclusion: Web Service Management System (WSMS); bottleneck cost as the cost of a pipelined plan; optimal pipelined plan respecting precedence constraints; optimal chunk size.
References
U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services.
Query optimization in the presence of limited access patterns. In Proc. of ACM SIGMOD Conf. on Management of Data.
Thank You!