CS632: Query Optimization over Web Services
Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, Rajeev Motwani
Presented by Ajay Kumar Sarda


Motivation
- Web services are emerging as a popular standard for sharing data and functionality.
- Many web services are backed by databases.
- Goal: DBMS-like capabilities when the data sources are web services.
- Queries spanning multiple web services therefore need query optimization.

Motivating Example
A credit card company wants to mail out its new credit card offer.
- I: potential recipient names
- WS1: name (n) → credit rating (cr)
- WS2: name (n) → credit card number (ccn)
- WS3: card number (ccn) → payment history (ph)
One possible execution order is WS1, WS2, WS3. Is it optimal?

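Challenges
- Web services have different response times.
- Precedence constraints between web services (e.g., WS3 needs the ccn produced by WS2).
- Tradeoff between a linear pipeline and parallel dispatch.
- Overhead of parsing SOAP/XML headers on every call.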

Related Work
Query optimization in the presence of limited access patterns:
- Binding patterns such as R(A^b, B^f): A is a bound attribute (it must be supplied to access R) and B is a free attribute.
- The search space consists of annotated query plans; invalid and non-viable plans are pruned.
- The algorithm starts with an initial set S of plans containing only atomic plans.
- S is iteratively updated by adding new plans obtained by combining plans from S using selection and join operations.

Outline of the Talk
- WSMS Preliminaries
- Query Optimization with and without Precedence Constraints
- Data Chunking
- Experimental Evaluation
- Conclusion
- Future Work

WSMS Architecture

Query Model
A web service WS_i is denoted WS_i(X_i^b, Y_i^f), where
- X_i: bound (input) attributes
- Y_i: free (output) attributes
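The bound/free distinction is what creates precedence constraints: a web service can only be invoked once all of its bound attributes are available, either from the input I or from the free attributes of services invoked earlier. The Python sketch below checks this for a linear order, using the three services of the motivating example; `WebService` and `is_feasible_order` are hypothetical names chosen for illustration, not part of the WSMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebService:
    """WS(X^b, Y^f): bound (input) attributes X and free (output) attributes Y."""
    name: str
    bound: frozenset
    free: frozenset


def is_feasible_order(order, input_attrs):
    """Return True if every service's bound attributes are available when it is called.

    Available attributes start as those of the input table I and grow with
    the free attributes returned by each service invoked so far.
    """
    available = set(input_attrs)
    for ws in order:
        if not ws.bound <= available:
            return False
        available |= ws.free
    return True


# Motivating example: I supplies names (n).
WS1 = WebService("WS1", frozenset({"n"}), frozenset({"cr"}))
WS2 = WebService("WS2", frozenset({"n"}), frozenset({"ccn"}))
WS3 = WebService("WS3", frozenset({"ccn"}), frozenset({"ph"}))

print(is_feasible_order([WS1, WS2, WS3], {"n"}))  # True
print(is_feasible_order([WS3, WS1, WS2], {"n"}))  # False: ccn is not yet bound
```

In this reading, WS2 must precede WS3, while WS1 is unconstrained relative to both.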

Query Model (Contd.)

Query Plans

Execution Model
- A thread T_i is created for each web service WS_i.
- T_i takes its input from a join thread J_i.
- J_i joins the outputs of the parents of WS_i.
- J_out joins the outputs of all leaf web services.
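A minimal Python sketch of the thread-per-service idea, assuming a linear pipeline only: when every service has a single parent, the join threads J_i and J_out have nothing to join, so they are replaced here by plain FIFO queues. `service_thread`, `run_linear_pipeline`, and the toy services are hypothetical; `call` stands in for the remote web service invocation (the actual prototype was written in Java).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the tuple stream


def service_thread(call, inbox, outbox):
    """Thread T_i: invoke one (simulated) web service on each incoming tuple.

    `call` returns a list of result tuples, so it may filter (selective)
    or expand (proliferative) its input.
    """
    while True:
        tup = inbox.get()
        if tup is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        for out in call(tup):
            outbox.put(out)


def run_linear_pipeline(calls, input_tuples):
    """Run services in a linear pipeline: one thread per service, FIFO queues between them."""
    queues = [queue.Queue() for _ in range(len(calls) + 1)]
    threads = [
        threading.Thread(target=service_thread, args=(call, queues[i], queues[i + 1]))
        for i, call in enumerate(calls)
    ]
    for t in threads:
        t.start()
    for tup in input_tuples:
        queues[0].put(tup)
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


# Toy services: a selective filter followed by a lookup that adds an attribute.
def keep_even(x):
    return [x] if x % 2 == 0 else []


def add_square(x):
    return [(x, x * x)]


print(run_linear_pipeline([keep_even, add_square], range(6)))  # [(0, 0), (2, 4), (4, 16)]
```

Because each stage consumes its queue in order, all services work concurrently on different tuples, which is exactly why the slowest stage, not the sum of stages, governs throughput.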

Execution Model (Contd.)

Statistics
- Per-tuple response time c_i = 1/r_i, where r_i is the maximum rate at which results of invocations can be obtained from WS_i. It depends on how the web service is provisioned, on network conditions, and on the load on the web service.
- Selectivity S_i: the average number of returned tuples per input tuple that remain after applying predicates. WS_i is selective if S_i ≤ 1 and proliferative if S_i > 1.

Bottleneck Cost Metric
- H: a query plan; P_i(H): the set of predecessors of WS_i in H.
- R[S]: the combined selectivity of all the web services in S (the product of their selectivities, assuming independence).
- For every tuple in I input to plan H, the average number of tuples that WS_i needs to process is R[P_i(H)].
- The average processing time required by WS_i per original input tuple in I is therefore R[P_i(H)] · c_i.
- Cost of the query plan H: cost(H) = max_i (R[P_i(H)] · c_i).

Bottleneck Cost Metric (Contd.)
Costs per input tuple of I:
- Plan 1: max(2, 10 × 0.1, 5 × 0.5) = 2.5
- Plan 2: max(2, 10, 5 × 5) = 25
- Plan 2 is 10 times slower than Plan 1.
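The bottleneck cost can be computed directly from a plan DAG. This Python sketch represents H as a mapping from each web service to its direct parents, derives P_i(H) by graph traversal, and takes R[S] as the product of selectivities (independence assumed). The service names and the values of c_i and S_i are assumptions, picked only so that a linear plan and a plan with parallel dispatch reproduce the 2.5 and 25 above; they are not claimed to be the plans in the original figure.

```python
from math import prod


def all_predecessors(parents, ws):
    """P_i(H): every web service with a path to `ws` in the plan DAG H."""
    seen, stack = set(), list(parents[ws])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def bottleneck_cost(parents, cost, sel):
    """cost(H) = max_i R[P_i(H)] * c_i, with R[S] the product of selectivities in S."""
    return max(
        prod(sel[p] for p in all_predecessors(parents, ws)) * cost[ws]
        for ws in parents
    )


cost = {"A": 2.0, "B": 10.0, "C": 5.0}  # assumed per-tuple response times c_i
sel = {"A": 0.1, "B": 5.0, "C": 0.5}    # assumed selectivities S_i (B is proliferative)

plan1 = {"A": [], "B": ["A"], "C": ["B"]}  # linear: A -> B -> C
plan2 = {"A": [], "B": [], "C": ["B"]}     # A and B dispatched in parallel, then C

print(bottleneck_cost(plan1, cost, sel))  # 2.5
print(bottleneck_cost(plan2, cost, sel))  # 25.0
```

Placing the selective service A in front of the expensive service B shields B from 90% of the input, while parallel dispatch lets B's proliferative output flood C.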

Query Optimization without Precedence Constraints
Lemma: "There exists an optimal plan that is a linear ordering of the selective web services, i.e., has no parallel dispatch of data."

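Query Optimization without Precedence Constraints
Lemma: "Let WS_1, ..., WS_n be a plan with a linear ordering of the selective web services. If c_i > c_{i+1}, then WS_i and WS_{i+1} can be swapped without increasing the cost of the plan."
Figure: WS_i (S_i, c_i) followed by WS_{i+1} (S_{i+1}, c_{i+1}), with c_i > c_{i+1}. With F_i the combined selectivity of the services before WS_i, their per-input-tuple costs are F_i c_i and F_i S_i c_{i+1} before the swap, and F_i c_{i+1} and F_i S_{i+1} c_i after it.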
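The swap argument can be written out as a short derivation. Let F_i = R[{WS_1, ..., WS_{i-1}}] be the combined selectivity of the services before WS_i. Services before position i are untouched by the swap, and services after position i+1 still see F_i S_i S_{i+1} tuples per input tuple, so only the two swapped services can change the cost:

```latex
\begin{aligned}
\text{before:}\quad & \max\bigl(F_i c_i,\; F_i S_i c_{i+1}\bigr) \;\ge\; F_i c_i \\
\text{after:}\quad  & \max\bigl(F_i c_{i+1},\; F_i S_{i+1} c_i\bigr) \;\le\; F_i c_i
  \qquad \text{since } c_{i+1} < c_i \text{ and } S_{i+1} \le 1
\end{aligned}
```

The cost after the swap is therefore at most F_i c_i, which never exceeds the cost before it. Note that the second inequality relies on WS_{i+1} being selective.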

Query Optimization without Precedence Constraints (Contd.)
Theorem: "For selective web services with no precedence constraints, the optimal plan is a linear ordering of the web services by increasing response time, ignoring selectivities."
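A quick way to sanity-check the theorem is to compare the sorted order with an exhaustive search over all linear orders. In this Python sketch the selectivities are the 0.4, 0.3, 0.2, 0.1 used in the experiments, while the response times are hypothetical; `linear_plan_cost` and `optimal_order` are illustrative names.

```python
from itertools import permutations


def linear_plan_cost(order, cost, sel):
    """Bottleneck cost of a linear plan: max_i (product of earlier selectivities) * c_i."""
    worst, reaching = 0.0, 1.0
    for ws in order:
        worst = max(worst, reaching * cost[ws])
        reaching *= sel[ws]  # fraction of input tuples that survive to the next service
    return worst


def optimal_order(cost):
    """Selective services, no precedence constraints: sort by increasing c_i."""
    return sorted(cost, key=cost.get)


sel = {"WS1": 0.4, "WS2": 0.3, "WS3": 0.2, "WS4": 0.1}   # from the experiments
cost = {"WS1": 1.5, "WS2": 0.4, "WS3": 2.0, "WS4": 0.9}  # hypothetical c_i

best = optimal_order(cost)
brute = min(linear_plan_cost(p, cost, sel) for p in permutations(cost))
print(best, linear_plan_cost(best, cost, sel), brute)  # ['WS2', 'WS4', 'WS1', 'WS3'] 0.4 0.4
```

The exhaustive search grows factorially and is only useful as a check on small inputs; the theorem is what makes a simple sort sufficient in practice.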

Query Optimization with Precedence Constraints
- The plan DAG H is constructed incrementally by greedily adding one web service at a time.
- At each step, the web service chosen is the one that can be added to H with minimum cost among those all of whose prerequisite web services have already been added to H.
- M_i: the set of all web services that are prerequisites for WS_i.

Adding a Web Service to the Plan
Given a partial plan H̄, add WS_x:
- Compute the best cut C_x such that, on placing edges from the web services in C_x to WS_x, the cost is minimized.
- PC_x: the set of all the web services in C_x together with all their predecessors in H̄.
- Cost incurred by adding WS_x: Cost(WS_x) = R[PC_x] · c_x.

Adding a Web Service (Contd.)
- Associate a variable z_i with every WS_i, set to 1 if WS_i belongs to PC_x.
- The optimal set PC_x is obtained by solving a linear programming (LP) problem over these variables.

Greedy Algorithm
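The greedy procedure can be sketched in Python as follows, with one substitution: instead of solving the LP for PC_x, `best_cost_to_add` enumerates every set of already-placed services that contains WS_x's prerequisites M_x and is closed under predecessors. That enumeration is exponential and only suitable for small examples. All function names are hypothetical; the precedence constraints and selectivities in the usage follow the experimental setup (WS1 < WS3, WS2 < WS4; selectivities 2, 1, 0.1, 0.1), while the costs are assumed values.

```python
from itertools import combinations
from math import prod


def best_cost_to_add(x, preds, prereqs, cost, sel):
    """Return (cost, PC_x) for appending WS x to the partial plan H-bar.

    Brute-force stand-in for the LP: try every set of placed services that
    contains x's prerequisites M_x and is closed under predecessors.
    """
    placed = list(preds)
    best = None
    for r in range(len(placed) + 1):
        for subset in combinations(placed, r):
            pc = set(subset)
            if not prereqs[x] <= pc:
                continue  # must respect x's precedence constraints
            if any(not preds[w] <= pc for w in pc):
                continue  # PC_x must contain all predecessors of its members
            c = prod(sel[w] for w in pc) * cost[x]
            if best is None or c < best[0]:
                best = (c, pc)
    return best


def greedy_plan(cost, sel, prereqs):
    """Greedily add the addable web service that is cheapest to add (prereqs assumed acyclic)."""
    preds = {}  # placed WS -> its full predecessor set in H-bar (PC_x when added)
    order, plan_cost = [], 0.0
    while len(preds) < len(cost):
        candidates = [x for x in cost if x not in preds and prereqs[x] <= set(preds)]
        options = {x: best_cost_to_add(x, preds, prereqs, cost, sel) for x in candidates}
        x = min(options, key=lambda w: options[w][0])
        c, pc = options[x]
        preds[x] = pc
        order.append(x)
        plan_cost = max(plan_cost, c)  # appending x leaves earlier services' costs unchanged
    return order, preds, plan_cost


cost = {"WS1": 1.0, "WS2": 1.0, "WS3": 1.0, "WS4": 2.0}  # assumed costs
sel = {"WS1": 2.0, "WS2": 1.0, "WS3": 0.1, "WS4": 0.1}
prereqs = {"WS1": set(), "WS2": set(), "WS3": {"WS1"}, "WS4": {"WS2"}}

order, preds, total = greedy_plan(cost, sel, prereqs)
print(order, total)  # ['WS1', 'WS2', 'WS3', 'WS4'] 2.0
```

In this run WS4 ends up with all three other services as predecessors, because placing the highly selective WS3 in front of it cuts WS4's cost from 2.0 to about 0.4.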

Data Chunking
- Parsing SOAP/XML headers and network overhead add a fixed cost to every web service call.
- Remedy: pass tuples to a web service in chunks.
- The response time of WS_i then depends on the input chunk size: c_i(k) is the response time of WS_i on a chunk of size k.
- A limit k_i^max exists on the maximum chunk size.

Data Chunking (Contd.)
- The query optimizer must decide on the optimal chunk size for each web service.
- "The optimal chunk size to be used by WS_i is k_i* such that c_i(k_i*)/k_i* is minimized."
- Profiling is combined with query processing to try out various chunk sizes.
- Intermediate tuples between any two web services in the pipelined plan are buffered.
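Choosing k_i* can be sketched as picking, among profiled chunk sizes no larger than k_i^max, the one with the lowest per-tuple time c_i(k)/k. The profile values below are invented for illustration, and `optimal_chunk_size` is a hypothetical helper, not part of the WSMS prototype.

```python
def optimal_chunk_size(measured, k_max):
    """Pick k* minimizing c_i(k)/k among profiled chunk sizes 1 <= k <= k_max.

    `measured` maps chunk size k -> observed response time c_i(k) in seconds.
    """
    feasible = {k: t for k, t in measured.items() if 1 <= k <= k_max}
    return min(feasible, key=lambda k: feasible[k] / k)


# Hypothetical profile: per-call overhead is amortized until the service slows down.
profile = {1: 0.20, 10: 0.35, 50: 0.90, 100: 2.40, 200: 6.00}

print(optimal_chunk_size(profile, k_max=150))  # 50
print(optimal_chunk_size(profile, k_max=40))   # 10
```

Minimizing c_i(k)/k rather than c_i(k) is the key point: larger chunks take longer per call but can still be cheaper per tuple.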

Experimental Evaluation
- Metric: total running time.
- Compare the plans produced by the optimizer against two baselines:
  - Parallel: dispatch data to the web services in parallel.
  - SelOrder: order the web services by selectivity, invoking those with lower selectivity first.
- Compare running times with and without chunking.
- Compare the WSMS cost against the cost of the slowest web service.

Experimental Setup
- The WSMS prototype is a multithreaded system written in Java.
- Apache Axis tools are used for communicating with web services.
- Java Reflection.
- Different web service costs are simulated by varying delays.
- Different selectivities are simulated by rejecting each tuple with probability 1 - S_i.
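The simulated web services can be approximated with a small Python factory: each call sleeps for a configurable delay and keeps the tuple with probability S_i, i.e. rejects it with probability 1 - S_i. The prototype itself was written in Java; `make_simulated_service` and its parameters are hypothetical, and this sketch covers only selective services (S_i ≤ 1), since rejection alone cannot produce S_i > 1.

```python
import random
import time


def make_simulated_service(delay_s, selectivity, seed=None):
    """Return a fake web service call with a fixed delay and rejection probability 1 - S."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("this sketch only simulates selective services (0 <= S <= 1)")
    rng = random.Random(seed)  # private generator so runs are repeatable

    def call(tup):
        time.sleep(delay_s)  # emulate the per-tuple response time
        return [tup] if rng.random() < selectivity else []

    return call


ws = make_simulated_service(delay_s=0.0, selectivity=0.3, seed=42)
kept = sum(len(ws(t)) for t in range(10_000))
print(kept / 10_000)  # close to 0.3
```

Seeding each service's generator makes experiments repeatable, which helps when comparing different plans over the same simulated services.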

No Precedence Constraints
- Web services WS1, WS2, WS3, WS4 with selectivities 0.4, 0.3, 0.2, 0.1.
- The range of costs c was varied from [0.2, 2] to [2, 2].
- Parallel: WS4; SelOrder: WS4.

Precedence Constraints
- Web services WS1, WS2, WS3, WS4 with precedence constraints WS1 < WS3 and WS2 < WS4.
- Selectivities: 2, 1, 0.1, 0.1.
- WS1, WS2, and WS3 have uniform cost; the cost of WS4 is varied from 0.4 to 2.

Data Chunking
- Web services WS1, WS2, WS3, WS4; no precedence constraints; uniform cost; selectivity set to 0.5.
- Web services are arranged in a linear pipeline (as chosen by the optimizer).
- The same chunk size is used for all web services.

WSMS Cost vs. Bottleneck Cost
- No precedence constraints; uniform web service costs; selectivity set to 0.5.
- Web services arranged in a linear pipeline.

Future Work
- Different input tuples following different plans
- Adaptive plans that change as response times change
- Web services with monetary costs
- Multiple web services providing the same data
- Profiling techniques that track response times and selectivities
- Caching techniques at the WSMS

Conclusion
- Web Service Management System (WSMS)
- Bottleneck cost as the cost of a pipelined plan
- Optimal pipelined plan respecting precedence constraints
- Optimal chunk size

References
- U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services.
- Query optimization in the presence of limited access patterns. In Proc. of ACM SIGMOD Conf. on Management of Data.

Thank You!