Master’s Thesis Semantic Query Caching in Mobile Environments By: Jekkin Shah Advisor: Dr. Konstantinos Kalpakis.

Presentation transcript:

Master’s Thesis 2 Semantic Query Caching in Mobile Environments
- Introduction
- Motivation
- Contribution
- Concept of Semantic Caching
- Issues involved in semantic caching
- System Architecture
- Prototype and Experiments
- Conclusion and further work

Master’s Thesis 3 Introduction
Separate lines of progress in:
- Geographic Information Systems (GIS)
- Global Positioning System (GPS)
- Wireless technology
- Handheld devices
These have converged into mobile Geographic Information Systems (mobile GIS).
Mobile GIS applications are growing rapidly in all walks of life.
The emphasis is on spatial data: its storage, retrieval and manipulation.

Master’s Thesis 4 Mobile GIS
[Figure: GIS, GPS, wireless and handheld technologies converging into mobile GIS]

Master’s Thesis 5 Growing List of Applications
- Car navigation systems
- Emergency services
- Real-time stock quotes
- Field services
- Real-time tracking and routing of shipments
- Environmental surveys
and the list is growing rapidly …

Master’s Thesis 7 Motivation
Hungry! Let’s find a nearby restaurant.
Query Q1: FIND restaurants WHERE location = “nearby”
- McDonalds, 2 miles
- Dominos Express, 2.4 miles
- Taco Bell, 3 miles
- Subway, 4 miles
- McDonalds, 5 miles
- …….
Found 37 matches

Master’s Thesis 8 Example 1 (cont.)
Wait … we also need some gas! Let’s see if we can find a gas station near a McDonalds.
Query Q2: FIND “McDonalds” WHERE gas station = “nearby”
- McDonalds, 5 miles
- McDonalds, 12.4 miles
Found 2 matches

Master’s Thesis 9 Shouldn’t we speed up the process?
- Query Q1 is in the local cache.
- Query Q1 subsumes query Q2.
- Why do we need to execute query Q2 from scratch?
- We need a technique to determine that Q2 is covered by Q1 and to extract its answer from Q1’s cached result.
- Unfortunately, traditional techniques like page caching do not provide much help in this case.
[Figure: regions Q1 and Q2]

Master’s Thesis 10 A new approach: Semantic Caching
- Along with the query results, store the queries themselves in the cache.
- Use these queries (query descriptors) to determine if and how a new query can be answered from the cache:
  - Check whether the required data is present in the cache.
  - Extract the data from the cache.
- Add, remove and merge data by performing the corresponding operation on the query descriptors.
- Manage the cache by managing the query descriptors.
- Think of query descriptors as intelligent pointer references that implicitly contain some information about the data they refer to.
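
The idea on this slide can be illustrated with a minimal sketch. It assumes that a query descriptor is a dictionary mapping an attribute to a closed range, which is a simplification chosen for this example and not the thesis's representation; the names `SemanticCache`, `subsumes` and `matches` are hypothetical. The sample rows reuse the restaurant results from the motivation slide.

```python
def subsumes(cached, new):
    """True if every row satisfying `new` also satisfies `cached`."""
    for attr, (c_lo, c_hi) in cached.items():
        if attr not in new:
            return False  # `new` is unconstrained on attr, `cached` is not
        n_lo, n_hi = new[attr]
        if n_lo < c_lo or n_hi > c_hi:
            return False
    return True


def matches(descriptor, row):
    """True if the row lies inside every range of the descriptor."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in descriptor.items())


class SemanticCache:
    """Stores (descriptor, rows) pairs and answers subsumed queries locally."""

    def __init__(self):
        self._entries = []

    def store(self, descriptor, rows):
        self._entries.append((descriptor, list(rows)))

    def lookup(self, descriptor):
        """Return matching rows, or None if no cached descriptor covers the query."""
        for cached, rows in self._entries:
            if subsumes(cached, descriptor):
                return [row for row in rows if matches(descriptor, row)]
        return None


if __name__ == "__main__":
    cache = SemanticCache()
    cache.store({"miles": (0, 5)}, [
        {"name": "McDonalds", "miles": 2},
        {"name": "Dominos Express", "miles": 2.4},
        {"name": "Taco Bell", "miles": 3},
        {"name": "Subway", "miles": 4},
    ])
    print(cache.lookup({"miles": (0, 2.5)}))  # answered from the cache
    print(cache.lookup({"miles": (0, 10)}))   # not covered, so None
```

The descriptor is consulted first and the cached rows are only touched once containment is established, which is the sense in which descriptors act as "intelligent" references.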

Master’s Thesis 11 Problems with traditional caching
- Pointer references do not contain any implicit information.
- Q1 → p1, p2, p3, p4, p5, p6
- Q2 → p7, p8, p9, p10, p11, p12
- Q3 → all the pages
- Space constraints will make it difficult to store all the pages in the cache.
[Figure: cache pages p1 to p12]

Master’s Thesis 13 Contribution
- An architecture for semantic caching in mobile environments
- A system prototype as a “proof-of-concept” with the following building blocks:
  - A query parser and validator
  - A solver for determining query satisfiability
  - An executor for processing partial and remainder queries
  - A cache manager for efficiently managing the cache
- A cache replacement algorithm
- Techniques for query processing

Master’s Thesis 15 Issues in semantic caching
The idea of semantic caching is straightforward (store query descriptors along with their results), but the issues involved are much harder: a simple concept with difficult implementation issues.
1. We need to decide whether the answer is present in the cache.
2. If it is present, do we have sufficient information to extract it?

Master’s Thesis 16 Answering Queries from Cache
Q1: Select * from db where A > 50
Q2: Select * from db where B < 550
Q3: Select * from db where (A > 200 and B < 300)
Is the result of Q3 present in (Q1 + Q2)?

Master’s Thesis 17 Solving the implication problem
Let T = { Q1, Q2 } be a set of query descriptors already in cache.
We need to show that Q → T.
We show that ¬(Q → T) is FALSE:
¬(Q → T)
≡ ¬(¬Q ∨ T)
≡ Q ∧ (¬T)
≡ Q ∧ ¬(T1 ∨ T2 ∨ T3 ∨ T4)
≡ Q ∧ (¬T1) ∧ (¬T2) ∧ (¬T3) ∧ (¬T4)
This is the primary technique used in our thesis. The algorithm is adopted from [LY85].
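
The refutation argument above can be sketched in code. This is an illustrative toy, not the [LY85] algorithm: it assumes each descriptor is a conjunction of atoms of the form (attribute, operator, constant) with operators `<`, `<=`, `>`, `>=` over real-valued attributes, and the function names are hypothetical. Because each ¬Ti is a disjunction of negated atoms, the check expands into one conjunctive system per choice of one atom from every Ti.

```python
from itertools import product

NEGATE = {"<": ">=", ">": "<=", "<=": ">", ">=": "<"}


def negate(atom):
    attr, op, value = atom
    return (attr, NEGATE[op], value)


def satisfiable(atoms):
    """Check a conjunction of range atoms by intersecting bounds per attribute."""
    lower, upper = {}, {}  # attr -> (value, is_strict)
    for attr, op, value in atoms:
        if op in (">", ">="):
            cand = (value, op == ">")
            if attr not in lower or cand > lower[attr]:
                lower[attr] = cand  # larger value, or same value but strict
        else:
            cand = (value, op == "<")
            cur = upper.get(attr)
            if cur is None or (cand[0], not cand[1]) < (cur[0], not cur[1]):
                upper[attr] = cand  # smaller value, or same value but strict
    for attr in lower.keys() & upper.keys():
        (lo, lo_strict), (hi, hi_strict) = lower[attr], upper[attr]
        if lo > hi or (lo == hi and (lo_strict or hi_strict)):
            return False
    return True


def implies(query, cached):
    """True if Q -> (T1 or ... or Tn), i.e. Q and not T1 ... and not Tn is unsatisfiable."""
    for choice in product(*cached):
        if satisfiable(list(query) + [negate(atom) for atom in choice]):
            return False  # found a point in Q that lies outside every Ti
    return True


if __name__ == "__main__":
    q1 = [("A", ">", 50)]
    q2 = [("B", "<", 550)]
    q3 = [("A", ">", 200), ("B", "<", 300)]
    print(implies(q3, [q1, q2]))  # Q3 is covered by the cached descriptors
```

The loop over `product(*cached)` visits one system per combination of atoms, so the number of systems grows multiplicatively with the number of cached descriptors.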

Master’s Thesis 18 Solving the implication problem (Cont.)
Problem: exponential growth in the number of equations to be solved.
Solution:
- Clustering based on signatures
- A signature is created from the predicate attributes present in the query
- The number of clusters created is restricted
- The signature is used to index the query descriptors
[Figure: clusters labelled “Attr A, B” and “Attr X, D”]
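
A rough sketch of signature-based indexing follows. It assumes a signature is simply the set of attribute names appearing in a descriptor's predicates, and it adds one further assumption that is not stated on the slide: only descriptors whose signature is a subset of the new query's signature are offered to the solver. `DescriptorIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


def signature(atoms):
    """Set of attributes used by a descriptor's (attr, op, value) atoms."""
    return frozenset(attr for attr, _op, _value in atoms)


class DescriptorIndex:
    """Groups descriptor names into clusters keyed by signature."""

    def __init__(self):
        self._clusters = defaultdict(list)

    def add(self, name, atoms):
        self._clusters[signature(atoms)].append(name)

    def candidates(self, atoms):
        """Names of descriptors that constrain no attribute outside the query's."""
        sig = signature(atoms)
        return sorted(
            name
            for cluster_sig, names in self._clusters.items()
            if cluster_sig <= sig
            for name in names
        )


if __name__ == "__main__":
    index = DescriptorIndex()
    index.add("Q1", [("A", ">", 50)])
    index.add("Q2", [("B", "<", 550)])
    index.add("Q4", [("X", ">", 1), ("D", "<", 9)])
    print(index.candidates([("A", ">", 200), ("B", "<", 300)]))
```

Restricting the solver to a few clusters keeps the number of descriptors entering the implication check small, which is one way to contain the exponential growth mentioned above.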

Master’s Thesis 19 Data Extraction problem
Data1: Select * from db where A > 50
Data2: Select * from db where B < 550
Data3: Select * from db where (A > 200 and B < 300 and C = 100)
Can we extract Data3?
We fetch attribute C from the remote source and take a Cartesian product with the data already present in the cache.

Master’s Thesis 20 Answering Partial Queries
What happens if Q → T is FALSE?
- There may be a non-empty intersection between Q and T.
- Answer (Q ∧ T) locally (partial match).
- Send (Q ∧ ¬T) to the server (remainder query).
[Figure: query Q overlapping cached regions T1 and T2]
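
For rectangular queries, the split into a locally answerable part and a remainder can be sketched as box arithmetic. The sketch assumes a single cached region T, half-open ranges [lo, hi) per attribute, and that an attribute missing from a box is unconstrained; the function names and the sample coordinates are made up for illustration.

```python
import math

UNBOUNDED = (-math.inf, math.inf)


def intersect(q, t):
    """Q and T as one box, or None if the two boxes do not overlap."""
    box = {}
    for attr in q.keys() | t.keys():
        q_lo, q_hi = q.get(attr, UNBOUNDED)
        t_lo, t_hi = t.get(attr, UNBOUNDED)
        lo, hi = max(q_lo, t_lo), min(q_hi, t_hi)
        if lo >= hi:
            return None
        box[attr] = (lo, hi)
    return box


def remainder(q, t):
    """Q and not T as a list of disjoint boxes to request from the server."""
    if intersect(q, t) is None:
        return [dict(q)]
    pieces = []
    rest = dict(q)  # shrinks towards the overlap, one attribute at a time
    for attr, (t_lo, t_hi) in t.items():
        lo, hi = rest.get(attr, UNBOUNDED)
        if lo < t_lo:
            pieces.append({**rest, attr: (lo, t_lo)})  # slab below T
            lo = t_lo
        if hi > t_hi:
            pieces.append({**rest, attr: (t_hi, hi)})  # slab above T
            hi = t_hi
        rest[attr] = (lo, hi)
    return pieces


if __name__ == "__main__":
    q = {"LocX": (0, 10), "LocY": (0, 10)}
    t = {"LocX": (5, 20), "LocY": (-5, 5)}
    print("probe:", intersect(q, t))
    print("remainder:", remainder(q, t))
```

With several cached regions, the same subtraction would be applied to each remaining piece in turn.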

Master’s Thesis 22 Semantic Caching Architecture
[Figure: architecture diagram with the components query parser and validator, solver (query implication), executor, cache manager, local cache and remote db; queries flow in and results flow out]

Master’s Thesis 23 Cache Structure
- The local cache is implemented as relational database structures.
- Query descriptors are stored in one table, indexed by their signatures.
- The corresponding query results (data) are stored in another table.
- An auxiliary table associates each query descriptor with its corresponding data.
- The cache manager interacts with the query descriptor table.
- Data is manipulated by manipulating the query descriptors.

Master’s Thesis 24 Cache: Operations and Management
Cache manager, replacement module:
- Replacement: determines what needs to be cached and what can be purged.
Cache manager, management module:
- Addition: the granularity of addition is a semantic region.
- Deletion: removal of a region, though not necessarily leading to the removal of data.
- Merge: to simplify query processing, two or more regions can be merged.
- Decomposition: a very large region can be decomposed for efficiency reasons.
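
The three-table cache layout and the deletion rule can be sketched with SQLite. The table and column names below are assumptions made for this example; the slides do not give the actual schema. The sketch shows why deleting a region need not delete data: a result row is dropped only when no remaining descriptor refers to it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE descriptors (
    descriptor_id INTEGER PRIMARY KEY,
    signature     TEXT NOT NULL,
    predicate     TEXT NOT NULL
);
CREATE INDEX idx_descriptors_signature ON descriptors(signature);
CREATE TABLE results (
    row_id  INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
);
CREATE TABLE descriptor_rows (
    descriptor_id INTEGER NOT NULL REFERENCES descriptors(descriptor_id),
    row_id        INTEGER NOT NULL REFERENCES results(row_id),
    PRIMARY KEY (descriptor_id, row_id)
);
"""


def open_cache():
    """Create an empty in-memory cache with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def drop_descriptor(conn, descriptor_id):
    """Remove a semantic region; keep rows still referenced by other regions."""
    conn.execute("DELETE FROM descriptor_rows WHERE descriptor_id = ?", (descriptor_id,))
    conn.execute("DELETE FROM descriptors WHERE descriptor_id = ?", (descriptor_id,))
    conn.execute(
        "DELETE FROM results WHERE row_id NOT IN (SELECT row_id FROM descriptor_rows)"
    )


if __name__ == "__main__":
    conn = open_cache()
    conn.execute("INSERT INTO descriptors VALUES (1, 'A', 'A > 50')")
    conn.execute("INSERT INTO descriptors VALUES (2, 'B', 'B < 550')")
    conn.executemany("INSERT INTO results VALUES (?, ?)", [(10, "r10"), (11, "r11")])
    conn.executemany("INSERT INTO descriptor_rows VALUES (?, ?)", [(1, 10), (1, 11), (2, 11)])
    drop_descriptor(conn, 1)
    print(conn.execute("SELECT row_id FROM results").fetchall())
```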

Master’s Thesis 25 Cache Replacement: Theory and Assumptions
What is the performance metric?
Conventional caching schemes optimize one or more of the following parameters to improve performance:
- Hit ratio
- Response time
- Data transmission time
Due to the dynamics of our application domain, none of these parameters truly reflects the performance of our applications.

Master’s Thesis 26 Theory and Assumptions (Cont.)
Cache hit rate: how do we define a hit?
- One: at least one data record is obtained from the local cache.
- All: all data records are obtained from the local cache.
- Mid: 50% of the data records are obtained from the local cache.
Response time:
- Partially answered queries make it difficult to define the response time accurately.
Data transmission time:
- Depends heavily on actual network parameters such as latency and bandwidth.
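
The three hit definitions give different hit rates for the same workload, as the short sketch below illustrates. It assumes that "Mid" means at least 50% of a query's records come from the cache and that a query with no result records counts as a miss; both are readings chosen for this example. The workload numbers are made up.

```python
def hit_flags(records_from_cache, records_total):
    """Classify one query under the One, Mid and All hit definitions."""
    if records_total == 0:
        return {"one": False, "mid": False, "all": False}
    fraction = records_from_cache / records_total
    return {
        "one": records_from_cache >= 1,
        "mid": fraction >= 0.5,
        "all": records_from_cache == records_total,
    }


def hit_rates(workload):
    """Hit rate per definition for a list of (from_cache, total) pairs."""
    counts = {"one": 0, "mid": 0, "all": 0}
    for from_cache, total in workload:
        for name, is_hit in hit_flags(from_cache, total).items():
            counts[name] += is_hit
    n = len(workload)
    return {name: (count / n if n else 0.0) for name, count in counts.items()}


if __name__ == "__main__":
    # Four queries, each returning 10 records, with 0, 1, 5 and 10 cached.
    print(hit_rates([(0, 10), (1, 10), (5, 10), (10, 10)]))
```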

Master’s Thesis 27 Theory and Assumptions (Cont.)
Mobile environments: bandwidth is at a premium.
Our goal: “To minimize the cost of servicing the requests that cannot be answered from the local cache”
- Cost is measured in terms of time.
- The performance metric is the byte hit rate (BHR): the ratio of the actual amount of data served from the local cache to the amount of data transferred from the remote source.
Assumptions:
- Negligible query execution time
- Uniform latency and bandwidth across the network

Master’s Thesis 28 Replacement Algorithm
A Guiding Action Selection (GAS) function assigns a value to each semantic region:
GAS value = a + (s * f * b)
- s = size of the data transferred from the remote source
- f = frequency of access of the query
- a, b are domain-specific parameters
- a = freshness count of each query
- b = 1/S_d, where S_d is the distance between the current location of the moving object and the location of the query
The value of each semantic region is calculated using the GAS function.
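
The GAS formula translates directly into a small function. The sketch below assumes S_d is a Euclidean distance between two 2D points and clamps it to a small positive floor so that a query located exactly at the moving object does not divide by zero; the clamp, the function name and the sample numbers are additions for illustration only.

```python
import math


def gas_value(freshness, size, frequency, position, query_location, min_distance=1e-9):
    """GAS value = a + (s * f * b), with b = 1 / S_d."""
    distance = math.dist(position, query_location)  # S_d, assumed Euclidean
    b = 1.0 / max(distance, min_distance)           # clamp is an assumption
    return freshness + size * frequency * b


if __name__ == "__main__":
    near = gas_value(2, 100, 3, position=(0, 0), query_location=(3, 4))
    far = gas_value(2, 100, 3, position=(0, 0), query_location=(30, 40))
    print(near, far)  # the nearer region receives the higher value
```

Regions whose query location is far from the moving object receive a lower value and are therefore likelier candidates for replacement.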

Master’s Thesis 29 Replacement Algorithm (Cont.)
For each query in the cache we have:
- a GAS value (V_i)
- a weight (W_i)
We also have a limit on the total size of the cache (W) and on the total number of queries (K) that can be admitted.
Problem definition:
- “Given a set of rectangles, each with a weight and a value, choose at most K rectangles that give the maximum total value, provided the total weight does not exceed W”
The problem can be formulated as the 0-1 knapsack problem with an additional cardinality constraint.
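
One standard way to solve a 0-1 knapsack with a cardinality constraint exactly is dynamic programming over (number of items, weight). The sketch below is such a textbook formulation under the assumption of non-negative integer weights; the slides do not say which solution method the thesis uses, and the function name and sample items are hypothetical.

```python
def select_regions(items, capacity, max_count):
    """Pick at most max_count (value, weight) items with total weight <= capacity.

    Returns (best_total_value, indices_of_chosen_items).
    """
    # best[k][w] holds the best (value, chosen) using at most k items and weight <= w.
    best = [[(0.0, ()) for _ in range(capacity + 1)] for _ in range(max_count + 1)]
    for index, (value, weight) in enumerate(items):
        # Iterate k and w downwards so each item is used at most once.
        for k in range(max_count, 0, -1):
            for w in range(capacity, weight - 1, -1):
                prev_value, prev_chosen = best[k - 1][w - weight]
                if prev_value + value > best[k][w][0]:
                    best[k][w] = (prev_value + value, prev_chosen + (index,))
    return best[max_count][capacity]


if __name__ == "__main__":
    regions = [(60, 10), (100, 20), (120, 30), (30, 5)]  # (GAS value, weight)
    print(select_regions(regions, capacity=50, max_count=2))
```

The table has (K + 1) × (W + 1) cells and every item touches each cell once, so the running time is proportional to n × K × W.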

Master’s Thesis 31 Experiments (Setup)
Requirements:
- Workload (datasets and queries)
- Modeling the behavior of the moving object
- Query execution guidelines
Real datasets:
- Hard to obtain
- Complex to process because of the complex structure of spatial objects
Synthetic dataset generator:
- Datasets are easily generated
- Various parameters can be controlled

Master’s Thesis 32 Workload
Query load selection
Tables:
- Restaurants: LocX, LocY, Name, ID, tables, City, Zip
- Gas Stations: LocX, LocY, Name, ID, Low, Mid, High
Query specifications:
- Rectangular queries (select and project only)
- Number of queries issued per trip :
- Type of queries: location aware, location dependent and non-location related
- Frequency of issuance: selected randomly, ranging from 5 ms to 100 ms
- Overlap rate: 10-25%

Master’s Thesis 33 Experiments (Moving Object)
Behavior of the moving object:
- Generated with Generating Spatio-Temporal Dataset (GSTD) [PT00]
- Moves in a 2D space
- Static points and regions, called infrastructure, emulate real-life objects such as buildings, rivers and roads
- Trajectories are generated using specific guidelines:
  - Initial statistical distribution of infrastructure objects
  - Source and destination location
  - Speed of the moving object
  - Direction of motion
  - Duration of the journey

Master’s Thesis 34 Query Execution Guidelines
Controllable parameters:
- Type of queries: location dependent, location aware, non-location related
- Frequency of query issuance
- Selectivity of chosen queries
- Query overlap rate
Parameters are chosen in a variety of combinations:
- Random
- Gaussian distribution
- Skewed distribution

Master’s Thesis 35 Results
Cache size vs. hit rate (NEW vs. m-LRU)
- The NEW replacement scheme performs roughly on par with the modified LRU (m-LRU) replacement scheme.
- BHR increases up to 70% as the cache size is progressively increased.

Master’s Thesis 36 Results
Hit rates vs. number of queries (NEW scheme)
- Increasing the number of queries in the system does not substantially increase the hit rates.
- The byte hit rate is nearly equal to the Mid hit rate.

Master’s Thesis 38 Conclusion
Semantic caching:
- No assumption made on spatial locality of reference
- Query descriptors act as “intelligent” references
- Can support content-based reasoning
- Ability to take advantage of schema knowledge
Page / tuple caching schemes do not scale well in our GIS domain. Reasons:
- “Unintelligent” pointer references
- Questionable assumption of spatial locality of reference
- Inability to take advantage of semantic overlaps

Master’s Thesis 39 Advantages of Semantic Caching
- Leverages the semantic locality found in typical mobile GIS applications
- Adapts dynamically to the patterns of user queries rather than caching static clusters of tuples
- Minimizes the cost of cache lookup thanks to the compact representation of query descriptors
- Can provide partial and/or approximate answers to queries quickly

Master’s Thesis 40 Conclusion (Cont.)
Shortcomings of semantic caching:
- Complicated cache management schemes
- Too restrictive: the solver can process only simple types of queries
- Captures the semantics of the query and not of the result objects, and hence fails to use cached objects when the semantics of the query do not match

Master’s Thesis 41 Conclusion (Cont.)
Future work … lots of things:
- Make the solver more general so that it handles different types of queries
- Make the caching scheme flexible enough to capture the semantics of the query descriptors as well as of the result objects
- Simpler cache management
- Ability to share the cache with peers