Efficient Data Compression in Location Based Services Yuni Xia, Yicheng Tu, Mikhail Atallah, Sunil Prabhakar.

LBS: Large Amount of Data
• In location-based services (LBS), a large amount of location data is generated constantly as objects move continuously.
• The data usually must be stored for a fairly long time in order to answer window or history queries. This poses challenges to the efficiency of data storage and retrieval.

Data Redundancy
Observation 1: The movements of objects usually contain periodic patterns. Periodicity is ubiquitous in moving-object data. Examples:
• City buses repeat the same routes every hour or so.
• Most people repeat the same or similar movement patterns every weekday and follow other patterns on weekends.

Data Redundancy
Observation 2: A majority of objects stay in a quasi-static state for a long time. They are not exactly static but move within a short range; we call this the quasi-static state. For example, many people move within an office building during the day and stay at home for a long period at night.

Data Redundancy
Observation 2 (continued):
• When the movement of an object is smaller than the precision requirement of the location-based service, we can regard the object as static for that period and avoid storing multiple locations that are almost the same.
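
The quasi-static rule can be illustrated with a minimal sketch. It assumes each object's samples are (t, x, y) tuples in meters and that "static" means "within the precision threshold of the last stored location". The function names and sample data are hypothetical and are not taken from the paper.

```python
import math

def compress_quasi_static(points, threshold):
    """Keep a sample only if it is farther than `threshold` from the last kept sample.

    `points` is a time-ordered list of (t, x, y) tuples for one object.
    """
    if not points:
        return []
    kept = [points[0]]  # always store the first location
    for t, x, y in points[1:]:
        _, kx, ky = kept[-1]
        # Movement within the precision threshold is treated as "static".
        if math.hypot(x - kx, y - ky) > threshold:
            kept.append((t, x, y))
    return kept

def compression_ratio(points, kept):
    """Original sample count divided by stored sample count."""
    return len(points) / len(kept)

# Hypothetical samples: small moves near (0, 0), then a jump to around (30, 0).
samples = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 1.0),
           (3, 30.0, 0.0), (4, 31.0, 1.0), (5, 29.0, 0.5)]
kept = compress_quasi_static(samples, threshold=10.0)
print(kept, compression_ratio(samples, kept))
```

Comparing against the last stored location, rather than the previous raw sample, keeps slow drift from accumulating unnoticed. This is one possible design choice, not necessarily the one used by the authors.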

Data Redundancy
Observation 3: There is a large number of common or shared segments among moving-object trajectories. For example, numerous vehicles take the same freeway, and a large number of people, especially in big cities, take the same subway, bus, train, or ferry route every day.

Periodicity
• Treat the moving trajectories as time series.
• Discover periodic patterns.
The problem of mining partial periodic patterns can be defined as follows: given a discrete data sequence S, a minimum support min_sup, and a period window W, find:
1. the set of frequent periods T such that 1 <= T <= W;
2. all frequent T-period patterns w.r.t. min_sup for each T found in 1.
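
To make the problem statement concrete, here is a simplified sketch. It assumes the trajectory has already been discretized into symbols (for example, grid-cell IDs) and mines only single-position patterns, so a period T counts as frequent when at least one (offset, symbol) pair appears in a fraction of at least min_sup of the length-T segments. The function name and the data are hypothetical, and this is not the mining algorithm used in the paper.

```python
from collections import Counter

def frequent_periodic_patterns(seq, max_period, min_sup):
    """Return {T: {(offset, symbol): support}} for periods T in 1..max_period.

    Support of (offset, symbol) under period T is the fraction of complete
    length-T segments of `seq` whose element at `offset` equals `symbol`.
    Only single-position patterns are considered in this sketch.
    """
    result = {}
    for period in range(1, max_period + 1):
        n_segments = len(seq) // period
        if n_segments < 2:
            continue  # a pattern must repeat at least once to be periodic
        counts = Counter()
        for s in range(n_segments):
            for offset in range(period):
                counts[(offset, seq[s * period + offset])] += 1
        frequent = {key: c / n_segments
                    for key, c in counts.items()
                    if c / n_segments >= min_sup}
        if frequent:
            result[period] = frequent
    return result

# Hypothetical cell sequence: the route A, B, C repeats every 4 steps,
# while the fourth stop varies.
cells = list("ABCDABCEABCD")
patterns = frequent_periodic_patterns(cells, max_period=4, min_sup=0.9)
print(patterns)
```

Here the period window W corresponds to `max_period`. Longer patterns such as "A B C *" could be assembled from the frequent single positions, but that step is omitted.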

Trajectory Compression

Experiment
• Tool: City Simulator 2.0 [9], developed at IBM.
• The City Simulator simulates the motion of people moving in a city.
• In our experiments, the number of people is set to 100,000 and each experiment runs for 3030 seconds, during which each object updates its location 100 times.

Compression vs. Precision

Compression vs. Precision
Obviously, the larger the precision threshold, the more location changes fall within that range and thus the higher the compression ratio. When the precision threshold is 10 meters, the compression ratio can be as high as 35, which means the compressed data is only around 3% of the original size: 97% of the movements are within 10 meters and can be ignored. Even when the precision threshold is tightened to 2 meters, the compression ratio is still more than 3, which means the compressed data is less than 1/3 of the original data size.
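
As a quick check on these figures, assume the compression ratio r is defined as original size divided by compressed size. The slide does not state this definition, but it is consistent with the numbers quoted.

```latex
r = \frac{\text{original size}}{\text{compressed size}}
\quad\Longrightarrow\quad
\frac{\text{compressed size}}{\text{original size}} = \frac{1}{r},
\qquad
\frac{1}{35} \approx 0.029 \approx 3\%,
\qquad
r > 3 \;\Rightarrow\; \frac{1}{r} < \frac{1}{3}.
```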

Communication vs. precision

Communication vs. Precision
We learn from this experiment that the larger the precision threshold, the more location changes fall within the precision range and need not be reported, and therefore the smaller the communication cost. When the precision threshold is 2 meters, the communication cost is 32% of the original, which means about 2/3 of the location updates are within the precision range and do not need reporting. As the precision threshold gets larger, the communication cost keeps decreasing. When the precision threshold reaches 10 meters, the communication cost is reduced to only 2.7% of the original.

Shared Segments Among Trajectories
• While periodicity and quasi-static features help us exploit the redundancy within each time series, discovering shared segments enables further compression by exploiting the redundancy among the time series.
• Discovering shared segments among the time series is similar to finding frequent sets or sequences.
• The frequent sets or sequences represent the hot areas or roads that recur many times in the moving-object database and should be represented more efficiently.
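
One simple way to picture shared-segment discovery is to count fixed-length contiguous runs of cells and keep those that appear in enough distinct trajectories. The sketch below assumes trajectories are lists of discrete road or cell identifiers. The function name, the fixed segment length, and the trip data are illustrative assumptions, and a real frequent-sequence miner would handle variable lengths.

```python
from collections import defaultdict

def shared_segments(trajectories, length, min_sup):
    """Return {segment: count} for contiguous segments of `length` cells
    that occur in at least `min_sup` distinct trajectories."""
    owners = defaultdict(set)  # segment -> ids of trajectories containing it
    for tid, traj in enumerate(trajectories):
        for i in range(len(traj) - length + 1):
            owners[tuple(traj[i:i + length])].add(tid)
    return {seg: len(ids) for seg, ids in owners.items() if len(ids) >= min_sup}

# Hypothetical trips that all use the same stretch of freeway.
trips = [
    ["home1", "ramp3", "fwy10", "fwy11", "fwy12", "office"],
    ["home2", "ramp3", "fwy10", "fwy11", "fwy12", "mall"],
    ["home3", "fwy10", "fwy11", "fwy12", "school"],
]
hot = shared_segments(trips, length=3, min_sup=3)
print(hot)
```

Counting distinct trajectories, rather than raw occurrences, keeps one object that loops over the same road from making a segment look shared.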

Frequent Sequences Storage
• After finding the frequent sequences, we can store each of them only once.
• If they recur in the future, only the links to the sequences (instead of the whole sequences) need to be stored.
• Furthermore, the frequent sequences can be encoded using an entropy coding such as Huffman coding or arithmetic coding for additional compression.
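
The store-once-and-link idea can be sketched as a dictionary substitution. The code assumes a dictionary mapping integer IDs to non-empty frequent sequences is already available. The ("ref", id) link format and the greedy longest-first matching are illustrative choices, not the paper's storage layout, and the entropy-coding step is not shown.

```python
def encode(trajectory, dictionary):
    """Replace occurrences of dictionary sequences with ("ref", id) links.

    `dictionary` maps an integer id to a tuple of cells. At each position the
    longest matching entry is used (greedy, not necessarily optimal).
    """
    entries = sorted(dictionary.items(), key=lambda kv: -len(kv[1]))
    out, i = [], 0
    while i < len(trajectory):
        for seq_id, seq in entries:
            # Skip empty entries so the scan always advances.
            if seq and tuple(trajectory[i:i + len(seq)]) == seq:
                out.append(("ref", seq_id))
                i += len(seq)
                break
        else:
            out.append(trajectory[i])  # no entry matched: store the raw cell
            i += 1
    return out

def decode(encoded, dictionary):
    """Expand ("ref", id) links back into the original cells."""
    out = []
    for item in encoded:
        if isinstance(item, tuple) and item[0] == "ref":
            out.extend(dictionary[item[1]])
        else:
            out.append(item)
    return out

# Hypothetical dictionary with one frequent freeway segment.
freeway = {0: ("fwy10", "fwy11", "fwy12")}
trip = ["home1", "ramp3", "fwy10", "fwy11", "fwy12", "office"]
packed = encode(trip, freeway)
print(packed)
```

Decoding the packed form reproduces the original trip, so the substitution itself is lossless. Any loss comes only from the precision threshold applied earlier.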

Conclusion
• We propose new approaches for efficiently compressing and storing moving-object data to support location-based services.
• The redundancy in moving-object data is huge due to the ubiquitous periodicity of movements, the quasi-static behavior of many objects, and the large number of common or shared segments among moving-object trajectories.

Conclusion
• In this paper, the trajectory of each moving object is treated as a time series.
• We apply time series data mining techniques to find the periodic patterns within each time series and the frequent patterns among them. The mining results help compress the data and reduce the redundancy significantly.