Loading a Cache with Query Results
Laura Haas, IBM Almaden; Donald Kossmann, Univ. Passau; Ioana Ursu, IBM Almaden


2 Background & Motivation
Applications invoke queries and methods
Queries select relevant objects
Methods work with relevant objects
Example: find hotels and reserve rooms
Other examples: CAX, SAP R/3, Web
foreach h in (select oid from hotels h where city = Edinburgh) h.requestRoom(3, Sep-6, Sep-12);

3 Background and Motivation
Traditional client-server systems:
– methods are executed by clients with caching
– queries are executed by clients and servers
– query processing is independent of caching
Problems:
– data must be fetched twice
– objects are faulted in individually
Terrible performance in many environments

4 Traditional System
[Figure: server, cache, and query processor]
foreach h in (select oid from...) h.reserveRoom();

5 Goal & Solution
Load the cache as a by-product of queries
– copy relevant objects while executing the query
Cache operators do the copying
Extend the query optimizer
– which collections should be cached?
– when to copy?
Assumption: caching at the granularity of objects
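As a rough illustration of what a Cache operator might look like, here is a minimal Python sketch. It assumes an iterator-style query engine, objects represented as (oid, payload) pairs, and a client cache modeled as a dictionary. None of these details come from the slides, and the name cache_operator is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical representation: an object is an (oid, payload) pair and the
# client cache is a plain dict keyed by oid. Neither comes from the slides.
Obj = Tuple[int, dict]


def cache_operator(child: Iterable[Obj], cache: Dict[int, dict]) -> Iterator[Obj]:
    """Pass every input object through unchanged, copying it into the cache."""
    for oid, payload in child:
        if oid not in cache:        # probe the cache's hash table
            cache[oid] = payload    # copy only objects that are not yet cached
        yield oid, payload


hotels = [
    (1, {"city": "Edinburgh"}),
    (2, {"city": "Passau"}),
    (3, {"city": "Edinburgh"}),
]
client_cache: Dict[int, dict] = {}

# select oid from hotels where city = Edinburgh, with the Cache operator
# placed above the selection so that only relevant objects are copied.
selected = (h for h in hotels if h[1]["city"] == "Edinburgh")
result_oids = [oid for oid, _ in cache_operator(selected, client_cache)]
```

After the query runs, the methods invoked on the returned oids would find their objects in client_cache instead of faulting them in one by one.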

6 [Figure: plan at the server with a Join over Hotels and Cities and a Cache operator]
foreach h in (select oid from...) h.reserveRooms();

7 Tradeoffs
What to cache?
– Cost of Cache operator must be smaller than savings obtained by this kind of pre-caching
When to cache?
– late so that only relevant objects are cached
– early so that other operators are not affected
N.B. Cache operators affect the cost of other (lower) operators in the plan

8 Early vs. Late Cache Operators: Copying Irrelevant Objects
[Figure: plan at the server with a Join over Hotels and Cities and a Cache operator]

9 Early vs. Late Cache Operators: Late Projections
[Figure: two plans over Hotels and Cities. Early Cache - Cheap Join; Late Cache - Expensive Join]

10 Alternative Approaches
Determine candidate collections for caching, i.e., what to cache:
– carry out data flow analysis
– analyze select clause of the query; cache if oid is returned
Determine when to cache candidate objects:
– heuristics
– cost-based approach

11 Caching at the Top Heuristics
Policy
– cache all candidate collections
– cache no irrelevant objects (i.e., late caching)
Algorithm
– generate query plan for select * query
– place Cache operator at the top of plan
– push down Cache operator through non-reductive operations
N.B.: Simulates "external" approach

12 Cache Operator Push Down
Cache operator may be pushed down through non-reductive operations
[Figure: three plans, operators listed top to bottom. Initial Plan: Cache(h,c), Sort, Join, Hotels, Cities. 1. Push Down: Sort, Cache(h,c), Join, Hotels, Cities. 2. Push Down: Cache(h), Join, Hotels, Cache(c), Cities.]
Push-down reduces the cost of non-reductive operations without causing irrelevant objects to be copied
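The push-down step can be sketched in Python as follows. This is a deliberately simplified sketch, not the algorithm from the paper: it models only chains of unary operators, treats the join as an opaque reductive leaf, and relies on a hypothetical reductive flag on each operator. The names Op, push_down_cache, and render are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """A plan node with at most one child (simplifying assumption)."""
    name: str
    child: Optional["Op"] = None
    reductive: bool = True  # hypothetical flag: does the operator filter objects?


def push_down_cache(below: Op, cache_name: str) -> Op:
    """Place a Cache operator beneath every leading non-reductive operator."""
    if below.child is not None and not below.reductive:
        # Non-reductive operator (e.g. Sort): the Cache operator may move below it.
        return Op(below.name, push_down_cache(below.child, cache_name), below.reductive)
    # Reductive operator or leaf: stop, so that no irrelevant objects are copied.
    return Op(cache_name, below, reductive=False)


def render(plan: Optional[Op]) -> str:
    """List operator names from the top of the plan to the bottom."""
    names = []
    while plan is not None:
        names.append(plan.name)
        plan = plan.child
    return " -> ".join(names)


join = Op("Join(Hotels, Cities)")            # treated as a reductive leaf here
initial = Op("Sort", join, reductive=False)  # Sort drops no objects
pushed = push_down_cache(initial, "Cache(h,c)")
plan_text = render(pushed)
```

Here plan_text lists Sort, then Cache(h,c), then the join, which corresponds to the first push-down step: the sort no longer has to carry the full objects that the Cache operator copies.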

13 Caching at the Bottom Heuristics
Policy
– cache all candidate collections
– increase cost of other operations as little as possible (i.e., early caching)
Algorithm
– extend optimizer to produce plan with Cache operators as low as possible (details in paper)
– pull up Cache operators through pipeline
Pull-up reduces the number of irrelevant objects that are cached without increasing the cost of pipelined operators

14 Cost-based Cache Operator Placement
Try to find the best possible plan
– use Cache operators only if they are beneficial
– find best place for Cache operators in plan
– join order and site selection depend on caching
Extend the query optimizer
– enumerate all possible caching plans
– estimate cost and benefit of Cache operators
– extended pruning condition for dynamic programming

15 Enumerating all Caching Plans
[Figure: plans over Hotels and Cities with the Join at the server and with the Join at the client, using Cache(h,c), Cache(h), Cache(c), or no Cache operator]
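To give a feel for how quickly the number of caching plans grows, the Python sketch below enumerates only the choice of which candidate collections receive a Cache operator. It is a simplification under stated assumptions: it ignores where each Cache operator sits in the plan, the join order, and whether the join runs at the client or the server, so a real optimizer would consider more alternatives than this. The function name caching_subsets is hypothetical.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def caching_subsets(candidates: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of candidate collections that could be cached."""
    for size in range(len(candidates) + 1):
        yield from combinations(candidates, size)


# h = Hotels, c = Cities
choices = list(caching_subsets(["h", "c"]))

# Even with placement and site selection ignored, the number of choices
# doubles with every additional candidate collection.
choices_for_four = len(list(caching_subsets(["a", "b", "c", "d"])))
```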

16 Costing of Cache Operators
Overhead of Cache operators
– cost to probe hash table for every object
– cost to copy objects which are not yet cached
Benefit of Cache operators
– savings: relevant objects are not refetched
– savings depend on costs to fault in object and current state of the cache
Cost = Overhead - Benefit
– only Cache operators with Cost < 0 are useful
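The costing rule can be made concrete with a small Python sketch. The linear formulas, the parameter names, and the numbers are assumptions chosen for illustration (abstract integer cost units). The slides give only the structure Cost = Overhead - Benefit and the components of each term.

```python
def cache_operator_cost(
    n_objects: int,            # objects flowing through the Cache operator
    n_uncached: int,           # of those, objects not yet in the cache
    n_relevant_uncached: int,  # uncached objects the application will later use
    probe_cost: int,           # assumed cost of one hash-table probe
    copy_cost: int,            # assumed cost of copying one object
    fault_in_cost: int,        # assumed cost of faulting in one object later
) -> int:
    """Cost = Overhead - Benefit, under an assumed linear cost model."""
    overhead = n_objects * probe_cost + n_uncached * copy_cost
    benefit = n_relevant_uncached * fault_in_cost
    return overhead - benefit


# Many relevant objects: overhead 41000, benefit 100000.
useful = cache_operator_cost(1000, 800, 200, probe_cost=1, copy_cost=50, fault_in_cost=500)

# Few relevant objects: overhead 41000, benefit 5000.
not_useful = cache_operator_cost(1000, 800, 10, probe_cost=1, copy_cost=50, fault_in_cost=500)
```

In the first case the cost is negative, so the Cache operator pays off. In the second it mostly copies irrelevant objects and the cost is positive.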

17 Summary of Approaches
Heuristics
– simple to implement
– not much additional optimization overhead
– poor plans in certain situations
Cost-based
– very good plans
– huge search space, slows down query optimizer

18 Performance Experiments
Test Environment
– Garlic heterogeneous database system
– UDB, Lotus Notes, WWW servers
Benchmark
– relational BUCKY benchmark database
– simple queries to multi-way cross-source joins
– simple accessor methods

19 Application Run Time (secs): single-table query + accessor method

20 Application Run Time (secs): three-way joins + accessor method

21 Query Optimization Times (secs): vary number of candidate collections

22 Conclusions
Loading the cache with query results can result in huge wins
– for search & work applications
– if client-server interaction is expensive
Use cost-based approach for simple queries
– four or fewer candidate collections
Use heuristics for complex queries
Caching at the Bottom heuristics is always at least as good as the traditional, do-nothing approach

23 Future Work
Explore full range of possible approaches
– e.g. cost-based Cache operator pull-up and push-down
Consider tradeoff of optimization time and application run time (meta optimization)
– invest in optimization time only if high gains in application run time can be expected
– consider state of the cache, dynamic optimization