Download presentation
Presentation is loading. Please wait.
1
Query Optimization over Web Services Utkarsh Srivastava Jennifer Widom Jennifer Widom Kamesh Munagala Rajeev Motwani
2
2 Performance Numbers Relative Contribution to Research 0 20 40 60 80 100 012345 Time in Program (years) Percent Contribution Student Advisor This Work
3
3 Future Directions (sample) Web services with monetary cost Web services with unstable response times (QoS guarantees?) Multiple web services for same data Caching web-service query results More expressive queries, also workflows Web service profiling and statistics-tracking
4
4 New Query Optimization Problem First Steps in Big Problem Our contribution
5
5 Web Services Standardized way of sharing data and functionality Data, Functionality Description and discovery WSDL,UDDI Users/ Clients SOAP Communication Web Services
6
6 Reuters Example Web Services WS 1 Stock symbol NASDAQ Company info WS 2 Stock symbol Stock activity
7
7 Querying Across Web Services WS 1 Stock symbol NASDAQ Company info WS 2 Stock symbol Stock activity Get info about all companies with high-activity stock User/ Client Query Results Reuters Easy Transparent Efficient Etc.
8
8 Same Basic Goal as Traditional DBMS Data Database Management System Query Results User/ Client Declarative Interface Easy Transparent Efficient Etc.
9
9 Web Service Management System Web Service Management System Query User/ Client Results WS 1 NASDAQ WS 2 Reuters Easy Transparent Efficient Etc.
10
10 WSMS Architecture Client WS 1 WS 2 WS n Query + input data Results Declarative Interface WS Invocations Metadata Component Web service registration Schema mapper Query Processing Component Plan execution Response- time profiler Statistics tracker Profiling and Statistics Component WSMS Plan selection
11
11 Running Example Credit card company wants to send offers to people with: a) credit rating > 600, and b) payment history = “good” on prior credit card Company has at its disposal: L : List of potential recipients (identified by SSN) WS 1 : SSN credit rating WS 2 : SSN cc number(s) WS 3 : cc number payment history
12
12 Plan 1 Client WS 1 WS 2 WS 3 WSMS L(SSN) SSN cr SSN ccn ccn ph Filter on cr, keep SSN SSN SSN,cr SSN,ccn SSN,ccn,ph Filter on ph, keep SSN Note: Pipelined processing SSNcr 1500 2700 SSNccn 2123 2456 ccnph 123bad 456good SSN 1 2 2 Query Plan
13
13 Simple Representation of Plan 1 L Results WS 1 WS 3 WS 2 SSN crSSN ccn ccn ph
14
14 Plan 2 Client WS 1 WS 2 WS 3 WSMS L(SSN) SSN cr SSN ccn ccn ph Filter on cr, keep SSN SSN SSN,cr SSN,ccn SSN,ccn,ph Filter on ph, keep SSN SSNcr 1500 2700 SSNccn 2123 2456 ccnph 123bad 456good SSN 1 2 2 Join SSN
15
15 Simple Representation of Plan 2 LResults WS 1 WS 2 WS 3 SSN cr SSN ccnccn ph
16
16 Quiz LResults WS 1 WS 2 WS 3 L Results WS 1 WS 3 WS 2 Which plan is better? Plan 2 Plan 1 Cost metric: steady-state throughput Assume join is “free” Plan 1 is never worse
17
17 Query Optimization Primer Possible query plans: P 1, …, P n Data/access statistics: S Execution cost metric: cost(P i, S) GOAL: Find least-cost plan
18
18 Query Optimization Primer Possible query plans: P 1, …, P n Data/access statistics: S Execution cost metric: cost(P i, S) GOAL: Find least-cost plan
19
19 Queries and Plans “Select-Project-Join” queries over input data L and set of web services WS 1, …, WS n Precedence constraints Output of WS i may be needed as input for WS j Ex: WS 2 : SSN ccn and WS 3 : ccn ph Precedence DAG defines space of query plans
20
20 Query Optimization Primer Possible query plans: P 1, …, P n Data/access statistics: S Execution cost metric: cost(P i, S) GOAL: Find least-cost plan
21
21 Statistics 1)Web service response times 2)Web service selectivities Our contribution New Query Optimization Problem
22
22 Statistics: Response Times r i : per-tuple response time of WS i from client WS 1 SSN cr Client SSN cr Assume independent response times within query plans r1r1 r i ≈ 1/throughput, can be reduced by batching, parallel calls batching (see paper) Our contribution New Query Optimization Problem
23
23 Statistics: Selectivities s i : selectivity of WS i Average # output tuples per input tuple to WS i including post-filtering in query plan WS 1 : SSN cr, filter cr > 600 If 90% of SSNs have cr > 600 then s 1 = 0.9 WS 2 : SSN ccn If on average each SSN has 2 credit cards then s 2 = 2.0 Our contribution Assume independent selectivities within query plans New Query Optimization Problem
24
24 Query Optimization Primer Possible query plans: P 1, …, P n Data/access statistics: S Execution cost metric: cost(P i, S) GOAL: Find least-cost plan
25
25 Bottleneck Cost Metric Our contribution New Query Optimization Problem
26
26 Bottleneck Cost Metric Conference Lunch Buffet Dish 1Dish 2Dish 3Dish 4 Average per-tuple processing time = response time of slowest (bottleneck) stage in pipeline Note: selectivities=1 in this example
27
27 Cost Equation for Plan P R i (P): Predecessors of WS i in plan P Fraction of input tuples seen by WS i = WS i response time per input tuple = (assumes WSMS processing is not the bottleneck) Bottleneck cost metric: Π j ∈ R i (P) s j (Π j ∈ R i (P) s j ) r i cost(P) = max 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
28
28 Contrast with Sum Cost Metric Dish 1Dish 2Dish 3Dish 4 Stream filter ordering Expensive predicate placement “Polite” Lunch Buffet cost(P) = ∑ 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
29
29 Problem Statement Input: Web services WS 1, …, WS n Response times r 1, …, r n Selectivities s 1, …, s n Precedence constraints among web services Output: Web services arranged into a plan P P respects all precedence constraints cost(P) is minimized
30
30 No Precedence Constraints All selectivities ≤ 1 Theorem: Optimal to order linearly by r i (selectivities irrelevant) General case (optimal): … join at WSMS “selective” web services ordered by response-time “proliferative” web services Results
31
31 With Precedence Constraints cost(P) = max 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
32
32 With Precedence Constraints Sum cost metric Hard to even obtain a factor O(n ) of optimal 0 20 40 60 80 100 012345 Time in Program (years) Percent Contribution Student Advisor cost(P) = ∑ 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
33
33 With Precedence Constraints Bottleneck (max) cost metric Surprisingly, optimal solution in polynomial time O(n 5 ) algorithm in paper – Add one WS at a time to the plan – WS chosen by solving a linear program 0 20 40 60 80 100 012345 Time in Program (years) Percent Contribution Student Advisor cost(P) = max 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
34
34 Example Revisited LResults WS 1 WS 2 WS 3 L Results WS 1 WS 3 WS 2 Plan 2 Plan 1 SSN cr SSN ccn ccn ph WS 2 WS 1 WS 3 WS 1 WS 2 WS 3 Selective Proliferative WS 2 WS 3 Precedence constraint max 1 ≤i≤n ( ( Π j ∈ R i (P) s j ) r i )
35
35 Implementation Built prototype WSMS query processor Optimizer and execution engine Assumes schema issues resolved, statistics provided Written in Java and uses Apache Axis (open-source SOAP implementation) Experiments (see paper) validate analytical results
36
36 Isn’t Problem the Same as … ? Web Service composition Targeted for workflow-oriented applications No provably optimal strategies Parallel/distributed query optimization Freedom to place query operators Much larger space of execution plans Data integration, mediators For general sources of data Optimization of total resource consumption
37
37 Future Directions (sample) Web services with monetary cost Web services with unstable response times (QoS guarantees?) Multiple web services for same data Caching web-service query results More expressive queries, also workflows Web service profiling and statistics-tracking
38
38 Conclusion Our contribution New Query Optimization Problem
39
39 Conclusion Our contribution New Query Optimization Problem
40
40 Questions? 0 20 40 60 80 100 012345 Time in Program (years) Percent Contribution Student Advisor
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.