A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*
by Gökhan Yavaş
Feb 22, 2005
*: To appear in Data and Knowledge Engineering, Elsevier

Outline
- Introduction
- Background Work
- Mobility Prediction Based On Mobility Rules
- Experimental Results
- Conclusion
- Future Work

Introduction
- Personal Communication Systems (PCS) are becoming more popular.
- Dynamic relocation of users gives rise to the problem of Mobility Management: methods for storing and updating the location information of users.
- Mobility Prediction: the prediction of a user's next inter-cell movement.

Motivation
- Predicted movement can be used to allocate resources effectively, instead of blindly allocating excessive resources.
- Broadcast program generation benefits [1]: data items can be broadcast to the predicted cell.
- Location prediction is crucial in processing location-dependent queries [2], since the answer depends on the location of the user.
- Queries that depend on future positions can be answered through effective location prediction.
[1] Y. Saygin and O. Ulusoy. Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments. IEEE Transactions on Knowledge and Data Engineering, 14(6): 1387-1399, 2002.
[2] G. Gok and O. Ulusoy. Transmission of Continuous Query Results in Mobile Computing Systems. Information Sciences, 125(1-4): 37-63, 2000.

Network Model
- The PCS network is partitioned into smaller areas called cells.
- Each cell has a Base Station (BS), used for broadcasting and receiving information.
- Home Location Register (HLR): a database that keeps the inter-cell movement history of a user.
- Visitor Location Register (VLR): a database at each BS that keeps the profiles of the users located in its cell.

Problem Definition
- The movement history of a mobile user can be obtained from the user's HLR.
- Movement trajectories have the form T = <(id1, t1), ..., (idk, tk)>.
- Trajectories are partitioned into subsequences called user actual paths (UAPs).
- UAPs have the form U = <c1, c2, ..., cn>.
- We mine UAPs to find user mobility patterns (UMPs).

Related Work
- The roots of our method go back to the Apriori algorithm [3] for association rule mining.
- Sequential pattern mining problem [4]: the ordering of the items in an itemset must be taken into consideration.
- Sequential pattern mining is not appropriate for our domain, because it does not take the network topology into account.
[3] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the Very Large Databases Conference (VLDB'94), pages 487-499, 1994.
[4] R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proceedings of the IEEE Conference on Data Engineering (ICDE'95), pages 3-14, 1995.

Mobility Prediction Based On Mobility Rules
- Mining UMPs from Graph Traversals: movement data is mined to discover regularities (UMPs) in inter-cell movements.
- Generation of Mobility Rules: mobility rules are extracted from the UMPs.
- Mobility Prediction: the next inter-cell movement is predicted based on the mobility rules.

Mining UMPs from Graph Traversals
- An example coverage region and the corresponding graph G. [Figure not preserved in the transcript.]
- Vertices of G: the cells in the coverage region.
- Edges of G: if two cells, A and B, are neighbors in the coverage region, then there are two edges in G, A → B and B → A.

Mining UMPs from Graph Traversals
- Subsequence definition: assume we have two UAPs, A = <a1, a2, ..., an> and B = <b1, b2, ..., bm>. B is a subsequence of A iff all cells of B also occur in A in the same order as in B.
- Example: if A = <c3, c4, c0, c1, c6, c5>, then B = <c4, c5> is a length-2 subsequence of A. In other words, B is contained by A.
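The containment test defined above can be sketched in a few lines of Python. The function name `is_contained` and the use of strings for cell ids are illustrative choices, not taken from the slides.

```python
from typing import Sequence

def is_contained(pattern: Sequence[str], uap: Sequence[str]) -> bool:
    """Return True if `pattern` is an order-preserving subsequence of `uap`."""
    it = iter(uap)
    # `cell in it` consumes the iterator up to the first match,
    # so later cells can only match further to the right.
    return all(cell in it for cell in pattern)

A = ["c3", "c4", "c0", "c1", "c6", "c5"]
print(is_contained(["c4", "c5"], A))  # True: the slide's example
print(is_contained(["c5", "c4"], A))  # False: order is not preserved
```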

Mining UMPs from Graph Traversals
- Every candidate has a count value that keeps the support given to this candidate by the UAPs.
- This is the point where our work extends the algorithm in [5, 6].
- The method in [5, 6] increments the count value of a candidate by 1 if the candidate is contained by a UAP.
- Unfair: it treats a highly corrupted candidate pattern in the same way as a slightly corrupted (or even uncorrupted) one.
[5] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. A Data Mining Algorithm for Generalized Web Prefetching. IEEE Transactions on Knowledge and Data Engineering, 15(5): 1155-1169, 2003.
[6] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. Effective Prediction of Web User Accesses: A Data Mining Approach. In Proceedings of the WebKDD Workshop (WebKDD'01), 2001.

Mining UMPs from Graph Traversals
- The degree of corruption should be considered in the mobile motion prediction context.
- The support assigned to a candidate pattern B by a UAP A is denoted suppInc.
[Formula for suppInc not preserved in the transcript.]

Mining UMPs from Graph Traversals
- The totDist value is defined by means of the notion of string alignment.
- Definition 2.1: If x and y are each a single character or a space, then a scoring function over the pair (x, y) gives the score of aligning x and y.
[Scoring function definition not preserved in the transcript.]

Mining UMPs from Graph Traversals
- Definition 2.3: Let A be a UAP and B be a pattern. A containment alignment X' maps A and B into strings A' and B' where:
  - |A'| = |B'|,
  - B is contained by A, and
  - removal of all spaces from A' and B' leaves A and B.
- Total score of the alignment X': [formula not preserved in the transcript].

Mining UMPs from Graph Traversals
- For any two patterns, there may be more than one alignment.
- Example: consider A = <c3, c4, c0, c1, c6, c5, c8, c5> and B = <c4, c5>. The cell c5 of B can be aligned with either occurrence of c5 in A.

Mining UMPs from Graph Traversals
- Definition 2.4: An optimal containment alignment of UAP A and pattern B is one that has the minimum possible score for these two patterns.
- Total score of an alignment: the sum of its penalties.
- An optimal alignment has the minimum number of mismatches, and therefore the minimum alignment score.
- totDist(A, B) = score of the optimal alignment of UAP A and pattern B.

Mining UMPs from Graph Traversals
- Example: given UAP A = <c3, c4, c0, c1, c6, c5, c8> and pattern B = <c4, c5, c8>, the optimal containment alignment has score totDist(A, B) = 3. [Alignment figure not preserved in the transcript.]
- Support assigned to the candidate pattern B by the UAP A: [value not preserved in the transcript].
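A minimal sketch of how totDist and suppInc might be computed follows. Two things here are assumptions, since the slide formulas did not survive in the transcript: (1) only the UAP cells skipped between the first and last matched cell are penalised, which is the reading that reproduces totDist = 3 in the example above, and (2) suppInc = 1 / (1 + totDist) when the pattern is contained and 0 otherwise. The function names are illustrative.

```python
from typing import Optional, Sequence

def tot_dist(uap: Sequence[str], pattern: Sequence[str]) -> Optional[int]:
    """Fewest UAP cells skipped between the first and last matched cell of
    `pattern`; None if `pattern` is not contained in `uap`.
    Assumption: cells before the first and after the last match are free."""
    if not pattern:
        return 0
    best: Optional[int] = None
    for start, cell in enumerate(uap):
        if cell != pattern[0]:
            continue
        matched, end = 1, start
        # Greedy earliest matching yields the shortest window for this start.
        for k in range(start + 1, len(uap)):
            if matched == len(pattern):
                break
            if uap[k] == pattern[matched]:
                matched, end = matched + 1, k
        if matched == len(pattern):
            gaps = (end - start + 1) - len(pattern)
            best = gaps if best is None else min(best, gaps)
    return best

def supp_inc(uap: Sequence[str], pattern: Sequence[str]) -> float:
    """Assumed formula: 1 / (1 + totDist) if contained, else 0."""
    d = tot_dist(uap, pattern)
    return 0.0 if d is None else 1.0 / (1.0 + d)

A = ["c3", "c4", "c0", "c1", "c6", "c5", "c8"]
B = ["c4", "c5", "c8"]
print(tot_dist(A, B))  # 3, as in the slide's example
print(supp_inc(A, B))  # 0.25 under the assumed formula
```

Trying every possible start cell and matching greedily from it is enough to find the shortest window, so no dynamic-programming table is needed for this particular scoring.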

Mining UMPs from Graph Traversals
- The quality of the patterns will improve, since this method counts support more accurately by taking the degree of corruption into account.
- This will give rise to more accurate mobility rules.
- As a result, prediction accuracy improves over the accuracy obtained with rules generated by the former way of support counting.
- Applying a different method for totDist will affect the quality of the rules.

Mining UMPs from Graph Traversals
- Candidate generation, for a pattern C = <c1, c2, ..., ck>:
  - N+(ck): the set of all nodes in G that have an incoming edge from the cell ck.
  - A cell from N+(ck) is attached to the end of C to generate C'.
  - C' is added to the set of candidates.
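The candidate generation step can be sketched as follows. The helper names `build_graph` and `extend_candidates` and the three-cell topology are hypothetical and serve only as an illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pattern = Tuple[str, ...]

def build_graph(neighbor_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed graph G: each pair of neighboring cells yields two edges."""
    graph: Dict[str, Set[str]] = {}
    for a, b in neighbor_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def extend_candidates(patterns: List[Pattern],
                      graph: Dict[str, Set[str]]) -> List[Pattern]:
    """Attach every cell in N+(ck) to the end of each length-k pattern."""
    candidates: List[Pattern] = []
    for pattern in patterns:
        last = pattern[-1]
        for nxt in sorted(graph.get(last, ())):  # sorted for stable output
            candidates.append(pattern + (nxt,))
    return candidates

# Hypothetical line topology: c1 - c2 - c3.
G = build_graph([("c1", "c2"), ("c2", "c3")])
print(extend_candidates([("c1", "c2")], G))
# [('c1', 'c2', 'c1'), ('c1', 'c2', 'c3')]
```

Because candidates are only ever extended along edges of G, every generated pattern is a valid walk over the cell graph.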

Mining UMPs from Graph Traversals
- Can Apriori pruning be used? No, due to the nature of our new support counting method.
- Support no longer decreases monotonically as the size of the pattern increases.
- A length-(k-1) subpattern S of a length-k pattern P need not be large even if P is large.
- Example: for the UAP <1, 6, 0, 3, 2>, consider P1 = <1, 0, 2> and its subpattern P2 = <1, 2>. The UAP assigns one support value to P1 and another to P2. [Values not preserved in the transcript.]
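A worked instance of this counterexample, under two assumptions that are not confirmed by the transcript: the support increment has the form suppInc = 1 / (1 + totDist), and totDist counts the UAP cells skipped between the first and last matched cell.

```latex
\begin{align*}
\mathrm{totDist}(\langle 1,6,0,3,2\rangle,\ P_1=\langle 1,0,2\rangle) &= 2
  &&\Rightarrow\ \mathrm{suppInc}(P_1) = \tfrac{1}{1+2} \approx 0.33,\\
\mathrm{totDist}(\langle 1,6,0,3,2\rangle,\ P_2=\langle 1,2\rangle) &= 3
  &&\Rightarrow\ \mathrm{suppInc}(P_2) = \tfrac{1}{1+3} = 0.25.
\end{align*}
```

Under these assumptions the longer pattern P1 receives more support from the UAP than its subpattern P2, so P1 being large does not imply that P2 is large.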

Mining UMPs from Graph Traversals
- Example with suppmin = 1.33: the UMP mining algorithm takes the database of UAPs as input and outputs the set of all large patterns (UMPs). [Example tables not preserved in the transcript.]

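Generation of Mobility Rules
- Rules are extracted from the UMPs.
- A rule has the form R: <c1, c2, ..., ci-1> → <ci, ci+1, ..., ck>, where the left-hand side is the head and the right-hand side is the tail.
- A confidence value is calculated for each rule. [Formula not preserved in the transcript.]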

Generation of Mobility Rules
- The rules that have confidence higher than confmin are selected.
- All possible mobility rules for the UMPs given in the previous example: [table not preserved in the transcript].
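A minimal sketch of rule generation and confidence filtering follows. The confidence formula used here, 100 × support(whole pattern) / support(head), is the usual association-rule definition and is an assumption, since the slide's formula is missing from the transcript. The supports in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Rule = Tuple[Pattern, Pattern, float]  # (head, tail, confidence in percent)

def generate_rules(umps: Dict[Pattern, float], conf_min: float) -> List[Rule]:
    """Split each UMP into head -> tail and keep rules above conf_min."""
    rules: List[Rule] = []
    for pattern, support in umps.items():
        for i in range(1, len(pattern)):
            head, tail = pattern[:i], pattern[i:]
            head_support = umps.get(head)
            if not head_support:
                continue  # head support unknown: confidence cannot be computed
            # Assumed confidence definition (not taken from the slides).
            confidence = 100.0 * support / head_support
            if confidence > conf_min:
                rules.append((head, tail, confidence))
    return rules

# Hypothetical supports, chosen only to exercise the function.
umps = {("c1",): 4.0, ("c1", "c2"): 3.0, ("c1", "c2", "c3"): 1.5}
for rule in generate_rules(umps, conf_min=40.0):
    print(rule)
# (('c1',), ('c2',), 75.0)
# (('c1', 'c2'), ('c3',), 50.0)
```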

Mobility Prediction
- The user has followed a path P = <c1, c2, ..., ci-1> up to now.
- Find the rules whose head is contained in P and whose head ends with ci-1.
- For each such rule, store the first cell of the tail together with the rule's (confidence + support) value as a tuple.
- Sort these tuples by their (confidence + support) values in descending order.
- Select the first m tuples.

Mobility Prediction
- Example: assume that the current trajectory of the user is P = <2, 3, 0, 4>.
- Matching rules: <4> → <0>, <4> → <5>, <3, 4> → <0>, <3, 4> → <5>.
- Sorted tuple array: TupleArray = [(5, 85.83), (0, 76.5)].
- If m = 1, then Predicted Cells Set = {5}.
- If m = 2, then Predicted Cells Set = {5, 0}.
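The prediction procedure can be sketched as below. The transcript does not say which rule carries which score, so the per-rule values are hypothetical; only the two resulting tuples (5, 85.83) and (0, 76.5) come from the example. Keeping the best score per predicted cell is also an assumption, made because the example's tuple array has one entry per cell.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]
Rule = Tuple[Pattern, Pattern, float]  # (head, tail, confidence + support)

def is_contained(pattern: Sequence[int], path: Sequence[int]) -> bool:
    """Order-preserving subsequence test."""
    it = iter(path)
    return all(cell in it for cell in pattern)

def predict(path: Sequence[int], rules: List[Rule], m: int) -> List[int]:
    """Return up to m predicted next cells, best score first."""
    best: Dict[int, float] = {}
    for head, tail, score in rules:
        # A rule matches if its head ends at the current cell
        # and is contained in the path followed so far.
        if head[-1] == path[-1] and is_contained(head, path):
            cell = tail[0]
            best[cell] = max(score, best.get(cell, float("-inf")))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [cell for cell, _ in ranked[:m]]

rules: List[Rule] = [
    ((4,), (0,), 70.0),      # hypothetical score
    ((4,), (5,), 80.0),      # hypothetical score
    ((3, 4), (0,), 76.5),
    ((3, 4), (5,), 85.83),
    ((1, 4), (6,), 90.0),    # hypothetical rule whose head is not in the path
]
P = [2, 3, 0, 4]
print(predict(P, rules, m=1))  # [5]
print(predict(P, rules, m=2))  # [5, 0]
```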

Simulation Design
- Mobile users travel on a 15 by 15 hexagonal-shaped network.
- To generate UAPs, UMPs are generated first; each UMP is taken as a random walk over the network.
- Two types of UAPs:
  - Outliers: a random walk over the network.
  - Non-outliers: those that follow a UMP.
- o (outlier percentage): the ratio of the number of outliers to the number of non-outliers.

Simulation Design
- Corruption mechanism: random cells are inserted between the consecutive cells of a UMP.
- c (corruption ratio): the ratio of the number of such random cells to the number of cells in the corresponding UMP.
- Three possible outcomes of a prediction: correct prediction, incorrect prediction, no prediction.
- Two performance measures: precision and recall.
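A short sketch of how the two measures could be computed from the three outcome counts. The definitions used here are an assumption, since the slide's formulas are not in the transcript: precision is taken as correct predictions over predictions actually made, and recall as correct predictions over all prediction requests. This matches the later remark that recall suffers when no prediction is made for some requests.

```python
from typing import Tuple

def precision_recall(correct: int, incorrect: int,
                     no_prediction: int) -> Tuple[float, float]:
    """Assumed definitions:
    precision = correct / (correct + incorrect)
    recall    = correct / (correct + incorrect + no_prediction)"""
    made = correct + incorrect
    requests = made + no_prediction
    precision = correct / made if made else 0.0
    recall = correct / requests if requests else 0.0
    return precision, recall

# Hypothetical outcome counts for 100 prediction requests.
print(precision_recall(correct=60, incorrect=20, no_prediction=20))
# (0.75, 0.6)
```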

Algorithms Used for Comparison
- Mobility Prediction Based on Transition Matrix (TM):
  - A cell-to-cell transition matrix is formed.
  - The m most probable cells are selected from the transition matrix.
- Ignorant Prediction:
  - m neighboring cells of the current cell are selected at random.
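Both baselines are simple enough to sketch. The version below assumes a first-order transition count table built from the UAPs; the function names, the sample UAPs, and the neighbor map are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Set

def build_transition_counts(uaps: Iterable[Sequence[int]]) -> Dict[int, Counter]:
    """Count cell-to-cell transitions over all consecutive pairs in the UAPs."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for uap in uaps:
        for cur, nxt in zip(uap, uap[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_tm(counts: Dict[int, Counter], current: int, m: int) -> List[int]:
    """TM baseline: the m most frequently observed next cells."""
    return [cell for cell, _ in counts.get(current, Counter()).most_common(m)]

def predict_ignorant(neighbors: Dict[int, Set[int]], current: int, m: int,
                     rng: random.Random) -> List[int]:
    """Ignorant baseline: m neighbors of the current cell, chosen at random."""
    cells = sorted(neighbors.get(current, ()))
    return rng.sample(cells, min(m, len(cells)))

# Hypothetical UAPs and neighbor map.
uaps = [[1, 2, 3], [1, 2, 4], [5, 2, 3]]
counts = build_transition_counts(uaps)
print(predict_tm(counts, current=2, m=1))  # [3]
print(predict_tm(counts, current=2, m=2))  # [3, 4]

neighbors = {2: {1, 3, 4, 5}}
print(predict_ignorant(neighbors, current=2, m=2, rng=random.Random(0)))
```

Passing the random generator explicitly keeps the ignorant baseline reproducible across simulation runs.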

Impact of m on Precision and Recall
- Precision decreases for both our algorithm and TM: the probability of making some incorrect predictions increases as m increases.
- Recall increases for all algorithms, with a more significant increase for TM and Ignorant Prediction.

Impact of m on Precision and Recall
- Setting m as small as possible is convenient for our method.
- For TM, the rate of increase in recall is greatest when m goes from 1 to 2.
- m ≥ 3 would waste excessive network resources.
- Thus choose m = 2.

Impact of Suppmin
- Recall and precision are reduced as suppmin increases.
- An increase in suppmin leads to a decrease in the number of mined mobility rules, so the number of correct predictions is reduced.
- Choose suppmin = 0.1.

Impact of Confmin
- Increasing precision: rule quality rises with confmin, so the number of predictions decreases at a higher rate than the number of correct predictions.
- Decreasing recall: the number of mined rules is reduced, leading to a decrease in the number of correct predictions.
- Choose confmin = 80.

Impact of Corruption Factor
- Precision and recall decrease for our method and TM.
- For all c, our method has better precision than TM but worse recall.
- For our method, as c increases:
  - The number of mined mobility rules decreases.
  - In some cases no prediction is made, because no rule matches due to the corrupted UAPs.

Impact of Outlier Percentage
- Neither performance measure is affected significantly for any method.
- Rules extracted from outlier UAPs are not commonly used, so they do not reduce recall and precision significantly.

Conclusion
- A data mining algorithm for the prediction of user movements in a mobile computing system.
- The algorithm is based on:
  - mining the mobility patterns of users,
  - then forming mobility rules from these patterns,
  - finally predicting a mobile user's next movements by using the mobility rules.
- Good performance when compared to the Ignorant method.

Conclusion
- Performance when compared to TM:
  - Better precision: more accurate predictions. Most of the predictions made at each request are correct.
  - Worse recall: our method may not make a prediction in response to some prediction requests, because there may be no matching rule for the user's current trajectory when the request is made.

Future Work
- For calculating the totDist value, our method decreases the support given to a pattern by a UAP as the number of corrupted cells in the pattern increases. Other methods may be employed for calculating totDist.
- The time domain of the mobility patterns and mobility rules is not considered.
  - In real life, mobility patterns might be related to time, with some rules valid only for a specific time interval.
  - Extend our algorithm to include the time domain of mobility rules.
- A candidate pruning criterion suitable for our support counting method may be employed.

Questions & Comments?