Query Optimization over Web Services. Utkarsh Srivastava, Jennifer Widom, Kamesh Munagala, Rajeev Motwani.

Presentation transcript:

2 Performance Numbers [Figure: “Relative Contribution to Research”. Axes: Time in Program (years) and Percent Contribution. Curves: Student, Advisor. Marked point: This Work.]

3 Future Directions (sample): web services with monetary cost; web services with unstable response times (QoS guarantees?); multiple web services for the same data; caching web-service query results; more expressive queries, also workflows; web service profiling and statistics tracking.

4 New Query Optimization Problem: First Steps in Big Problem [Figure label: Our contribution]

5 Web Services: a standardized way of sharing data and functionality. [Figure: web services expose data and functionality; description and discovery via WSDL and UDDI; users/clients communicate with web services over SOAP.]

6 Example Web Services [Figure: WS1 (NASDAQ): stock symbol → company info. WS2 (Reuters): stock symbol → stock activity.]

7 Querying Across Web Services. Query from the user/client: “Get info about all companies with high-activity stock.” [Figure: the query spans WS1 (NASDAQ): stock symbol → company info and WS2 (Reuters): stock symbol → stock activity; results return to the user/client.] Goals: easy, transparent, efficient, etc.

8 Same Basic Goal as Traditional DBMS [Figure: the user/client sends a query through a declarative interface to a Database Management System over the data and receives results.] Goals: easy, transparent, efficient, etc.

9 Web Service Management System [Figure: the user/client sends a query to the Web Service Management System, which sits in front of WS1 (NASDAQ) and WS2 (Reuters) and returns results.] Goals: easy, transparent, efficient, etc.

10 WSMS Architecture [Figure: the client sends a query plus input data through the declarative interface and receives results; the WSMS issues WS invocations to WS1, WS2, …, WSn. Metadata Component: web service registration, schema mapper. Query Processing Component: plan selection, plan execution. Profiling and Statistics Component: response-time profiler, statistics tracker.]

11 Running Example. A credit card company wants to send offers to people with: a) credit rating > 600, and b) payment history = “good” on a prior credit card. The company has at its disposal: L: list of potential recipients (identified by SSN); WS1: SSN → credit rating; WS2: SSN → cc number(s); WS3: cc number → payment history.

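12 Plan 1 [Figure: query plan run by the WSMS between the client and WS1, WS2, WS3. L(SSN) → WS1 (SSN → cr) yields (SSN, cr) → filter on cr, keep SSN → WS2 (SSN → ccn) yields (SSN, ccn) → WS3 (ccn → ph) yields (SSN, ccn, ph) → filter on ph, keep SSN → results (SSN). Sample ccn/ph rows shown: 123 bad, 456 good.] Note: pipelined processing.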

13 Simple Representation of Plan 1 [Figure: L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results]

14 Plan 2 [Figure: query plan run by the WSMS between the client and WS1, WS2, WS3. L(SSN) is sent in parallel to WS1 (SSN → cr; filter on cr, keep SSN) and to WS2 (SSN → ccn) followed by WS3 (ccn → ph; filter on ph, keep SSN); the WSMS joins the two branches on SSN to produce the results.]

15 Simple Representation of Plan 2 [Figure: L → WS1 (SSN → cr) in parallel with L → WS2 (SSN → ccn) → WS3 (ccn → ph); both branches feed Results]

16 Quiz: Which plan is better? [Figure: Plan 1: L → WS1 → WS2 → WS3 → Results. Plan 2: L → WS1 in parallel with L → WS2 → WS3, then Results.] Cost metric: steady-state throughput. Assume join is “free”. Plan 1 is never worse.

17 Query Optimization Primer. Possible query plans: $P_1, \dots, P_n$. Data/access statistics: $S$. Execution cost metric: $\mathrm{cost}(P_i, S)$. Goal: find the least-cost plan.

19 Queries and Plans. “Select-Project-Join” queries over input data L and a set of web services $WS_1, \dots, WS_n$. Precedence constraints: the output of $WS_i$ may be needed as input for $WS_j$. Ex: WS2: SSN → ccn and WS3: ccn → ph. The precedence DAG defines the space of query plans.

21 Statistics: 1) web service response times; 2) web service selectivities.

22 Statistics: Response Times. $r_i$: per-tuple response time of $WS_i$ from the client. [Figure: the client sends SSN to WS1 (SSN → cr) and receives cr; the round trip takes $r_1$.] $r_i \approx 1/\text{throughput}$; it can be reduced by batching or parallel calls (see paper). Assume independent response times within query plans.

23 Statistics: Selectivities. $s_i$: selectivity of $WS_i$, the average number of output tuples per input tuple to $WS_i$, including post-filtering in the query plan. WS1: SSN → cr, filter cr > 600: if 90% of SSNs have cr > 600, then $s_1 = 0.9$. WS2: SSN → ccn: if on average each SSN has 2 credit cards, then $s_2 = 2.0$. Assume independent selectivities within query plans.

25 Bottleneck Cost Metric [Figure labels: Our contribution; New Query Optimization Problem]

26 Bottleneck Cost Metric. Conference lunch buffet [Figure: Dish 1, Dish 2, Dish 3, Dish 4 in a line]. Average per-tuple processing time = response time of the slowest (bottleneck) stage in the pipeline. Note: selectivities = 1 in this example.

27 Cost Equation for Plan P. $R_i(P)$: predecessors of $WS_i$ in plan $P$. Fraction of input tuples seen by $WS_i$ = $\prod_{j \in R_i(P)} s_j$. $WS_i$ response time per input tuple = $\left(\prod_{j \in R_i(P)} s_j\right) r_i$. Bottleneck cost metric (assumes WSMS processing is not the bottleneck): $\mathrm{cost}(P) = \max_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$
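
The bottleneck cost equation can be evaluated directly once each web service's predecessor set is known. The Python sketch below does this for the two plans of the running example. The values s1 = 0.9 and s2 = 2.0 come from the selectivity slide; s3 and all response times are hypothetical numbers chosen only for illustration, and the function name `bottleneck_cost` is not taken from the paper.

```python
from math import prod


def bottleneck_cost(predecessors, selectivity, response_time):
    """cost(P) = max over i of (product of s_j for j in R_i(P)) * r_i."""
    return max(
        prod(selectivity[j] for j in predecessors[ws]) * response_time[ws]
        for ws in predecessors
    )


# s1 and s2 are from the slides; s3 and every response time are assumed.
selectivity = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}
response_time = {"WS1": 2.0, "WS2": 3.0, "WS3": 1.0}

# Plan 1: chain L -> WS1 -> WS2 -> WS3 (R_i(P) lists all earlier services).
plan1 = {"WS1": [], "WS2": ["WS1"], "WS3": ["WS1", "WS2"]}

# Plan 2: WS1 in parallel with WS2 -> WS3, joined at the WSMS.
plan2 = {"WS1": [], "WS2": [], "WS3": ["WS2"]}

cost_plan1 = bottleneck_cost(plan1, selectivity, response_time)
cost_plan2 = bottleneck_cost(plan2, selectivity, response_time)
print(cost_plan1, cost_plan2)
```

With these assumed numbers the bottleneck is WS2 in both plans: about 2.7 time units per input tuple in Plan 1, where WS2 only sees the 90% of tuples that pass the filter after WS1, and 3.0 in Plan 2, where WS2 sees every tuple.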

28 Contrast with Sum Cost Metric. “Polite” lunch buffet [Figure: Dish 1, Dish 2, Dish 3, Dish 4]. Related problems: stream filter ordering; expensive predicate placement. $\mathrm{cost}(P) = \sum_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$
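
The two metrics differ only in how the per-stage costs are combined. The following Python sketch computes the per-stage costs of a linear plan and then takes both the maximum (bottleneck) and the sum. The buffet numbers are hypothetical; selectivities are all 1, as in the lunch buffet example, and the helper name `stage_costs` is an assumption for illustration.

```python
def stage_costs(order, selectivity, response_time):
    """Per-input-tuple cost of each stage in a linear (chain) plan."""
    costs = []
    fraction = 1.0  # fraction of input tuples reaching the current stage
    for ws in order:
        costs.append(fraction * response_time[ws])
        fraction *= selectivity[ws]
    return costs


# Hypothetical buffet: four dishes, selectivity 1 (everyone visits every dish).
order = ["dish1", "dish2", "dish3", "dish4"]
selectivity = {dish: 1.0 for dish in order}
response_time = {"dish1": 1.0, "dish2": 3.0, "dish3": 2.0, "dish4": 1.0}

costs = stage_costs(order, selectivity, response_time)
bottleneck = max(costs)  # pipelined: the slowest dish limits throughput
total = sum(costs)       # sum metric: total work spent per tuple
print(bottleneck, total)
```

Under the bottleneck metric only the slowest stage (3.0 here) matters, while the sum metric charges for every stage (7.0 here), so reordering or parallelizing stages can change one metric without changing the other.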

29 Problem Statement. Input: web services $WS_1, \dots, WS_n$; response times $r_1, \dots, r_n$; selectivities $s_1, \dots, s_n$; precedence constraints among web services. Output: web services arranged into a plan $P$ such that $P$ respects all precedence constraints and $\mathrm{cost}(P)$ is minimized.

30 No Precedence Constraints. All selectivities ≤ 1: Theorem: it is optimal to order the web services linearly by increasing $r_i$ (selectivities are irrelevant). General case (optimal) [Figure labels: “selective” web services ordered by response time; “proliferative” web services; join at WSMS; Results].
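
A minimal Python sketch of the plan shape on this slide follows. It assumes one reading of the figure: selective services (selectivity ≤ 1) form a chain sorted by increasing response time, and every proliferative service (selectivity > 1) receives the chain's output in parallel, with results joined at the WSMS. The service names, numbers, and function names are all hypothetical.

```python
def plan_without_precedence(selectivity, response_time):
    """Split services into a selective chain and a parallel proliferative group."""
    chain = sorted(
        (ws for ws in selectivity if selectivity[ws] <= 1),
        key=lambda ws: response_time[ws],
    )
    parallel = [ws for ws in selectivity if selectivity[ws] > 1]
    return chain, parallel


def plan_cost(chain, parallel, selectivity, response_time):
    """Bottleneck cost of the chain followed by the parallel group."""
    fraction, worst = 1.0, 0.0
    for ws in chain:
        worst = max(worst, fraction * response_time[ws])
        fraction *= selectivity[ws]
    # Assumption: each proliferative service sees only the chain's output.
    for ws in parallel:
        worst = max(worst, fraction * response_time[ws])
    return worst


# Hypothetical services: A and B are selective, C and D are proliferative.
selectivity = {"A": 0.5, "B": 0.8, "C": 2.0, "D": 1.5}
response_time = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 5.0}

chain, parallel = plan_without_precedence(selectivity, response_time)
cost = plan_cost(chain, parallel, selectivity, response_time)
print(chain, parallel, cost)
```

Note that the sort key uses response times only, which mirrors the theorem's statement that selectivities are irrelevant to the ordering of selective services.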

31 With Precedence Constraints. $\mathrm{cost}(P) = \max_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$

32 With Precedence Constraints: sum cost metric. Hard to even obtain a factor O(n  ) of optimal. [Figure: Percent Contribution vs. Time in Program (years); curves: Student, Advisor.] $\mathrm{cost}(P) = \sum_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$

33 With Precedence Constraints: bottleneck (max) cost metric. Surprisingly, an optimal solution can be found in polynomial time. $O(n^5)$ algorithm in the paper: add one WS at a time to the plan; the WS is chosen by solving a linear program. [Figure: Percent Contribution vs. Time in Program (years); curves: Student, Advisor.] $\mathrm{cost}(P) = \max_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$
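
The slide only names the paper's algorithm, so the Python sketch below is not that algorithm. It is an exhaustive baseline that tries every linear (chain) plan respecting the precedence constraints and keeps the one with the lowest bottleneck cost. It runs in exponential time and ignores plans with parallel branches, so it serves only to make the objective concrete on small inputs. The values s3 and the response times are hypothetical, and all function names are assumptions.

```python
from itertools import permutations


def chain_bottleneck(order, selectivity, response_time):
    """Bottleneck cost of a linear plan: max over stages of fraction * r_i."""
    fraction, worst = 1.0, 0.0
    for ws in order:
        worst = max(worst, fraction * response_time[ws])
        fraction *= selectivity[ws]
    return worst


def respects(order, precedence):
    """True if every (before, after) pair appears in that order."""
    position = {ws: i for i, ws in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedence)


def best_chain(selectivity, response_time, precedence):
    """Brute-force search over precedence-respecting linear plans."""
    candidates = [
        order for order in permutations(selectivity)
        if respects(order, precedence)
    ]
    return min(
        candidates,
        key=lambda order: chain_bottleneck(order, selectivity, response_time),
    )


# s1 and s2 are from the slides; s3 and every response time are assumed.
selectivity = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}
response_time = {"WS1": 2.0, "WS2": 3.0, "WS3": 1.0}
precedence = [("WS2", "WS3")]  # WS3 needs the ccn produced by WS2

best = best_chain(selectivity, response_time, precedence)
best_cost = chain_bottleneck(best, selectivity, response_time)
print(best, best_cost)
```

For the running example only three linear orders satisfy the single constraint, and with these assumed numbers the search picks WS1, WS2, WS3.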

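34 Example Revisited [Figure: Plan 1: L → WS1 → WS2 → WS3 → Results. Plan 2: L → WS1 in parallel with L → WS2 → WS3, then Results. Services: SSN → cr, SSN → ccn, ccn → ph. Orderings shown: WS2, WS1, WS3 and WS1, WS2, WS3. Labels: Selective; Proliferative; precedence constraint between WS2 and WS3.] $\max_{1 \le i \le n} \left(\prod_{j \in R_i(P)} s_j\right) r_i$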

35 Implementation. Built a prototype WSMS query processor: optimizer and execution engine. Assumes schema issues are resolved and statistics are provided. Written in Java; uses Apache Axis (open-source SOAP implementation). Experiments (see paper) validate the analytical results.

36 Isn’t Problem the Same as … ? Web service composition: targeted for workflow-oriented applications; no provably optimal strategies. Parallel/distributed query optimization: freedom to place query operators; much larger space of execution plans. Data integration, mediators: for general sources of data; optimization of total resource consumption.

38 Conclusion [Figure labels: Our contribution; New Query Optimization Problem]

40 Questions? [Figure: Percent Contribution vs. Time in Program (years); curves: Student, Advisor.]