WATCHMAN: A Data Warehouse Intelligent Cache Manager
Peter Scheuermann, Junho Shim, Radek Vingralek
Presentation by: Akash Jain


Motivation
- Data Warehouse:
  - Infrequent updates.
  - Serves data analysis and Decision Support System (DSS) queries.
  - Query performance is important.
- DSS queries follow a hierarchical "drill-down analysis" pattern.
- Caching occurs at multiple levels.
- Queries at higher levels are likely to occur frequently in a multiuser environment.

Overview
- Least Normalized Cost Replacement (LNC-R):
  - Goal: minimize response time (not maximize hit ratio).
  - Uses a profit metric that considers the average rate of reference to the retrieved set, its size, and the cost of the associated query.
- Least Normalized Cost Admission (LNC-A):
  - Goal: decide whether a retrieved set should be admitted to the cache.
  - Uses a profit metric similar to LNC-R's.
  - No reference frequency information is available for newly retrieved sets.
- LNC-RA: integration of LNC-R with LNC-A.
- Sends hints to the buffer manager to improve its hit ratio.

Cache Replacement Algorithm: LNC-R
- Parameters per retrieved set RS_i corresponding to query Q_i:
  - λ_i: average rate of reference to query Q_i.
  - s_i: size of the set retrieved by query Q_i.
  - c_i: cost of execution of query Q_i.
- Maximize the Cost Savings Ratio (CSR), defined as:
  - CSR = (Σ_i c_i · h_i) / (Σ_i c_i · r_i)
  - h_i: number of references to query Q_i that were satisfied from the cache.
  - r_i: total number of references to query Q_i.
- Performance metric:
  - profit(RS_i) = λ_i · c_i / s_i
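The per-set bookkeeping above can be sketched in Python. This is an illustrative sketch, not code from the paper: the class and method names are assumptions, and the sliding window of reference times stands in for the paper's last-K inter-arrival history.

```python
from collections import deque

class RetrievedSet:
    """Bookkeeping for one retrieved set RS_i (illustrative names)."""

    def __init__(self, size, cost, k=3):
        self.size = size               # s_i: size of the retrieved set
        self.cost = cost               # c_i: cost of re-executing query Q_i
        self.times = deque(maxlen=k)   # timestamps of the last K references

    def touch(self, now):
        """Record a reference to this set at time `now`."""
        self.times.append(now)

    def rate(self, now):
        """lambda_i = K / (t - t_K); if fewer than K references exist,
        use however many are available."""
        if not self.times:
            return 0.0
        k = len(self.times)
        span = now - self.times[0]
        return k / span if span > 0 else float("inf")

    def profit(self, now):
        """profit(RS_i) = lambda_i * c_i / s_i."""
        return self.rate(now) * self.cost / self.size
```

With three references at times 0, 1, 2 and the clock at 4, the rate is 3 / (4 − 0) = 0.75, so a set with cost 8 and size 2 has profit 0.75 · 8 / 2 = 3.0.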

Cache Replacement Algorithm: LNC-R (Contd.)
- Algorithm:
  - Assume RS_i with size s_i is to be admitted, and s_i exceeds the available free space.
  - Sort the retrieved sets in the cache in ascending order of profit.
  - Heuristic: size does matter!
- Calculation of λ_i:
  - Based on a moving average of the last K inter-arrival times of requests to RS_i.
  - λ_i = K / (t − t_K), where t is the current time and t_K is the time of the K-th reference.
- If K references are not available, use the maximum number available.
- Less frequently referenced retrieved sets are given higher priority for replacement.
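A minimal sketch of the eviction step described above: cached sets are considered in ascending profit order until enough space is freed. The `cache` layout (id mapped to a `(size, profit)` pair) and function name are assumptions for illustration.

```python
def select_victims(cache, free_space, needed):
    """Pick eviction victims, lowest profit first, until `needed` bytes
    would be free. Returns the victim ids, or None if even evicting
    everything cannot make room."""
    victims = []
    # Ascending profit: the least valuable cached sets go first.
    for qid, (size, profit) in sorted(cache.items(), key=lambda kv: kv[1][1]):
        if free_space >= needed:
            break
        victims.append(qid)
        free_space += size
    return victims if free_space >= needed else None
```

For example, with sets of profit 0.5, 1.0, and 2.0 cached, a request needing more space than is free evicts the profit-0.5 set first and stops as soon as the incoming set fits.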

Cache Admission Algorithm: LNC-A
- Aim: prevent caching of retrieved sets that may degrade response time.
- Cache RS_i only if profit(RS_i) > profit(C), where C is the set of replacement candidates and profit(C) = Σ_{RS_i ∈ C} λ_i · c_i / s_i.
- WATCHMAN retains the reference information and uses the moving average to calculate λ_i.
- If no previous information is present for RS_i, use the estimated profit, defined as e-profit(RS_i) = c_i / s_i.
- Replace profit by e-profit in the previous expressions.
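The admission test above reduces to a single comparison. A hedged sketch, with illustrative names, comparing the incoming set's profit against the combined profit of the replacement candidates C:

```python
def should_admit(candidate_profit, replacement_profits):
    """Admit RS_i only if profit(RS_i) > profit(C), where profit(C) is
    the sum of the profits of the sets that would be evicted."""
    return candidate_profit > sum(replacement_profits)

def e_profit(cost, size):
    """Estimated profit c_i / s_i, used when RS_i has no reference
    history yet."""
    return cost / size
```

So a set with e-profit 4.0 is admitted when the replacement candidates' profits sum to 2.5, but rejected when they sum to 5.0.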

LNC* and LNC-RA
- Given:
  - {RS_1, RS_2, …, RS_n}: the retrieved sets of all queries.
  - r_1, r_2, …, r_n: the retrieved-set reference string.
  - Prob(r_i = RS_k) = p_k for all i > 1.
- Obtain:
  - I* = { i : RS_i is in the cache }, where I* ⊆ N = {1, 2, …, n}, such that Σ_{i ∈ N − I*} p_i · c_i is minimized subject to the constraint Σ_{i ∈ I*} s_i ≤ S, where S is the cache size.
  - This problem is NP-COMPLETE!
- Constrained model: Σ_{i ∈ I*} s_i = S.

LNC* and LNC-RA (Contd.)
- LNC* algorithm:
  - Sort {RS_1, RS_2, …, RS_n} in descending order of p_i · c_i / s_i.
  - Assign I* from the start of the list until the cache is full.
- LNC-RA approximates LNC*:
  - p_i = λ_i / λ, where λ = Σ_{i ∈ N} λ_i.
  - Using a sample of the last K references, as K → ∞, LNC-RA converges to I* for sufficiently long reference strings.
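The greedy LNC* selection can be sketched directly from the description above. Input format (tuples of id, p_i, c_i, s_i) and the function name are assumptions for illustration; note the greedy fill is exact only under the constrained model where selected sizes fill the cache exactly.

```python
def lnc_star(sets, cache_size):
    """Greedy LNC*: take sets in descending p_i * c_i / s_i order
    while they fit. `sets` is a list of (id, p, c, s) tuples."""
    chosen, used = [], 0
    ranked = sorted(sets, key=lambda x: x[1] * x[2] / x[3], reverse=True)
    for sid, p, c, s in ranked:
        if used + s <= cache_size:   # skip sets that no longer fit
            chosen.append(sid)
            used += s
    return chosen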

Retained Reference Information Problem
- A form of starvation:
  - A newly retrieved set in the cache is the first candidate for eviction.
  - Reason: we are not storing its reference information.
- Proposal (from another paper):
  - Retain reference information only for a certain period after the last reference.
  - Problems:
    - Does not consider size and cost.
    - Does not take the cache size into account.
- WATCHMAN's proposal:
  - Evict the reference information of a set whenever the profit associated with the set is smaller than the least profit among all cached sets.
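WATCHMAN's pruning rule can be sketched as a single filter. This is an illustrative sketch with assumed names: `history` holds profits of sets whose data was evicted but whose reference information is retained, and entries below the least profit among cached sets are dropped.

```python
def prune_history(history, cached_profits):
    """Keep retained reference info only for sets whose profit is at
    least the smallest profit among all currently cached sets."""
    if not cached_profits:
        return dict(history)   # nothing cached: no floor to apply
    floor = min(cached_profits)
    return {qid: p for qid, p in history.items() if p >= floor}
```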

Conclusion
- Similar ideas can be applied to Web caching.
- Not such a novel idea: the algorithms are known in theory.
- Hints can improve the performance of the buffer manager.
- LNC-RA improves the CSR by a factor of 3 compared with LRU.
- LNC-A improves the CSR by 19% on average.