Mining Long Sharable Patterns in Trajectories of Moving Objects
Győző Gidófalvi and Torben Bach Pedersen

Presentation transcript:

Mining Long Sharable Patterns in Trajectories of Moving Objects
Győző Gidófalvi and Torben Bach Pedersen

Arrrrgggg, all this spatio-temporal data from moving objects!! What to do?! What to do…?! People are predictable animals… How can I extract the regularities / patterns in the trajectories?

But why do I even bother? What is in it for me? If I knew the patterns..., I could:
1. aid the management, storage, and retrieval of trajectories
2. improve tracking of moving objects
3. provide customized Location-Based Services (LBS) and Location-Based Advertising (LBA)
If the patterns were long and shared by multiple users / objects, maybe I could get them to carpool, and reduce traffic!

To find patterns I need to:
1. identify trips: a trip ends when the total displacement in the last k GPS readings is less than δ
2. capture the periodicity of patterns: map the date-time domain to the time-of-day domain
3. eliminate the problem of noisy GPS readings: substitute readings with spatio-temporal regions
[Figures: Identification of trips; Capturing the periodicity of patterns; Eliminating the problem of noisy GPS readings]
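The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the reading format `(timestamp, x, y)`, the grid cell size, and the 15-minute time slot are all assumptions, and "total displacement" is read here as the summed step length over the last k readings.

```python
import math
from datetime import datetime

def split_trips(readings, k, delta):
    """Split one object's readings (timestamp, x, y) into trips.

    Assumed rule: a trip ends when the summed step length over the
    last k readings is below delta. Requires k >= 2.
    """
    trips, current = [], []
    for reading in readings:
        current.append(reading)
        if len(current) < k:
            continue
        window = current[-k:]
        moved = sum(math.dist(a[1:], b[1:]) for a, b in zip(window, window[1:]))
        if moved < delta:
            # Keep the arrival point, drop the rest of the stationary tail.
            trip = current[:len(current) - k + 1]
            if len(trip) > 1:
                trips.append(trip)
            current = []
    if len(current) > 1:
        trips.append(current)
    return trips

def st_region(timestamp, x, y, cell=100.0, slot_minutes=15):
    """Map a reading to a spatio-temporal region id.

    The date is dropped (time-of-day domain) and the position is
    snapped to a grid cell, which absorbs small GPS noise.
    """
    slot = (timestamp.hour * 60 + timestamp.minute) // slot_minutes
    return (int(x // cell), int(y // cell), slot)

# Hypothetical readings: drive, stand still, drive again.
xs = [0, 100, 200, 200, 200, 200, 300, 400]
readings = [(datetime(2006, 3, 1, 8, i), float(x), 0.0) for i, x in enumerate(xs)]
trips = split_trips(readings, k=3, delta=1.0)
```

With these toy readings the stop in the middle splits the sequence into two trips, and two readings taken on different days at roughly the same time and place map to the same region identifier.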

If I place, for each trip, the unique spatio-temporal region identifiers inside a basket, I can analyze the set of baskets using a Frequent Itemset Mining (FIM) algorithm [AGR94]. Frequent itemsets will represent frequent trajectories. Look at my sample trajectory database! For clarity the time-of-day domain is projected down to the 2D image. Lines represent trips performed by a single object. The number of times the trip was performed is represented by the width of the line (also written in parentheses in the legend).
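As a rough illustration of the basket idea, the sketch below groups region identifiers by trip and counts how many baskets contain a given itemset. The helper names and the toy data are hypothetical, and the brute-force support count stands in for a real FIM algorithm.

```python
from collections import defaultdict

def to_baskets(region_hits):
    """Group (trip_id, region_id) pairs into one set of regions per trip."""
    baskets = defaultdict(set)
    for trip_id, region in region_hits:
        baskets[trip_id].add(region)  # a set keeps each region once per trip
    return dict(baskets)

def support(baskets, itemset):
    """Number of trips whose basket contains every region in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets.values() if items <= basket)

# Hypothetical region hits; trip 1 crosses region 'A' twice.
hits = [(1, "A"), (1, "B"), (1, "A"), (2, "A"), (2, "B"), (3, "A")]
baskets = to_baskets(hits)
```

Here `support(baskets, {"A", "B"})` counts the trips that pass through both regions, which is the quantity a FIM algorithm compares against MinSupp.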

Frequent itemset mining, but for carpooling...
- a frequent itemset is only interesting if it is part of trips of multiple (say n) cars
- a frequent itemset is only interesting if it is long
- subsets of frequent itemsets are only interesting if they are part of more trips than the superset
- interesting frequent itemsets may be part of relatively few trips, and in that case traditional FIM algorithms run slowly
How do I modify a traditional FIM algorithm to do all this efficiently? Wait! All the data is already in my database in a relational format T = (tid, oid, item), where the columns contain trip, object, and spatio-temporal region identifiers. So…, how do I write this algorithm in my favourite language, SQL?
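To make the first two criteria concrete, here is a small sketch that checks a candidate itemset against a table T(tid, oid, item) in an in-memory SQLite database. The helper names, the toy rows, and the thresholds are assumptions for illustration; the third criterion (a subset must occur in more trips than its superset) is not checked here.

```python
import sqlite3

# Hypothetical rows (tid, oid, item); region ids shortened to letters.
ROWS = [
    (1, 1, "A"), (1, 1, "B"), (1, 1, "C"),
    (2, 2, "A"), (2, 2, "B"), (2, 2, "C"),
    (3, 1, "A"), (3, 1, "D"),
    (4, 2, "A"), (4, 2, "D"),
    (5, 3, "A"), (5, 3, "E"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (tid INTEGER, oid INTEGER, item TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", ROWS)

def pattern_stats(conn, itemset):
    """Return (number of trips, number of distinct objects) containing all items."""
    items = sorted(set(itemset))
    marks = ",".join("?" * len(items))
    sql = f"""
        SELECT COUNT(*), COUNT(DISTINCT oid) FROM (
            SELECT tid, oid FROM T
            WHERE item IN ({marks})
            GROUP BY tid, oid
            HAVING COUNT(DISTINCT item) = ?
        )"""
    return conn.execute(sql, (*items, len(items))).fetchone()

def is_interesting(conn, itemset, min_supp, n, min_length):
    """Long enough, in enough trips, and shared by enough objects."""
    trips, objects = pattern_stats(conn, itemset)
    return len(set(itemset)) >= min_length and trips >= min_supp and objects >= n
```

On the toy rows, the itemset {A, B, C} occurs in two trips made by two different objects, so it passes with `min_supp=2, n=2, min_length=3`, while {A, E} occurs in a single trip and fails.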

STEP 1: Filter infrequent items!
DEF: An item (ST-region) is frequent if the number of transactions (trips) that contain that item >= MinSupp (for example 4), and the number of unique objects associated with those transactions >= n (for example 2).

SQL statements to filter infrequent items:

    INSERT INTO F (item, count)
    SELECT item, COUNT(*)
    FROM T
    GROUP BY item
    HAVING COUNT(DISTINCT oid) >= n AND COUNT(*) >= MinSupp;

    CREATE VIEW TFV AS
    SELECT T.oid, T.tid, T.item
    FROM T, F
    WHERE T.item = F.item;

[Figure: Infrequent Items]
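The Step 1 statements can be tried out on a toy table with Python's built-in `sqlite3` module. The rows and the thresholds (MinSupp = 2, n = 2 instead of the slide's 4 and 2) are made up for illustration; the SQL itself follows the slide, with the thresholds passed as query parameters.

```python
import sqlite3

MIN_SUPP, N = 2, 2  # toy thresholds (assumed); the slide's examples are 4 and 2

# Hypothetical rows (tid, oid, item).
ROWS = [
    (1, 1, "A"), (1, 1, "B"), (1, 1, "C"),
    (2, 2, "A"), (2, 2, "B"), (2, 2, "C"),
    (3, 1, "A"), (3, 1, "D"),
    (4, 2, "A"), (4, 2, "D"),
    (5, 3, "A"), (5, 3, "E"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (tid INTEGER, oid INTEGER, item TEXT);
    CREATE TABLE F (item TEXT, count INTEGER);
""")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", ROWS)

# Frequent items: enough trips AND enough distinct objects.
conn.execute("""
    INSERT INTO F (item, count)
    SELECT item, COUNT(*)
    FROM T
    GROUP BY item
    HAVING COUNT(DISTINCT oid) >= ? AND COUNT(*) >= ?
""", (N, MIN_SUPP))

# View of T restricted to frequent items.
conn.execute("""
    CREATE VIEW TFV AS
    SELECT T.oid, T.tid, T.item
    FROM T, F
    WHERE T.item = F.item
""")

frequent = dict(conn.execute("SELECT item, count FROM F"))
tfv_rows = conn.execute("SELECT COUNT(*) FROM TFV").fetchone()[0]
```

In this toy data item E occurs in only one trip, so it is left out of F and its row disappears from the view TFV.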

STEP 2: Filter short transactions!
DEF: A transaction is short if the number of items in it < MinLength (for example 4).

SQL statement to filter short transactions (it keeps only the transactions with at least MinLength items):

    INSERT INTO TF (tid, oid, item)
    SELECT tid, oid, item
    FROM TFV
    WHERE tid IN (SELECT tid
                  FROM TFV
                  GROUP BY tid
                  HAVING COUNT(item) >= MinLength);

[Figure: Short Transactions]
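A quick way to check the Step 2 statement is to run it on a toy TFV. In this sketch TFV is created as a plain table rather than a view, purely to keep the example self-contained, and MinLength = 2 and the rows are assumptions.

```python
import sqlite3

MIN_LENGTH = 2  # toy threshold (assumed); the slide's example is 4

# Hypothetical contents of TFV (tid, oid, item) after Step 1.
TFV_ROWS = [
    (1, 1, "A"), (1, 1, "B"), (1, 1, "C"),
    (2, 2, "A"), (2, 2, "B"), (2, 2, "C"),
    (3, 1, "A"), (3, 1, "D"),
    (4, 2, "A"), (4, 2, "D"),
    (5, 3, "A"),  # only one frequent item left: a short transaction
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TFV (tid INTEGER, oid INTEGER, item TEXT);
    CREATE TABLE TF  (tid INTEGER, oid INTEGER, item TEXT);
""")
conn.executemany("INSERT INTO TFV VALUES (?, ?, ?)", TFV_ROWS)

# Keep only transactions with at least MIN_LENGTH items.
conn.execute("""
    INSERT INTO TF (tid, oid, item)
    SELECT tid, oid, item
    FROM TFV
    WHERE tid IN (SELECT tid FROM TFV GROUP BY tid HAVING COUNT(item) >= ?)
""", (MIN_LENGTH,))

kept_trips = [row[0] for row in conn.execute("SELECT DISTINCT tid FROM TF ORDER BY tid")]
tf_rows = conn.execute("SELECT COUNT(*) FROM TF").fetchone()[0]
```

Trip 5 has a single remaining item and is dropped, so TF ends up with the ten rows of trips 1 to 4. Note that the subquery groups by tid alone, which assumes trip identifiers are unique across objects.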

STEP 3 & 4: Project DB and find the single most frequent itemset!
DEF: An item-projected DB, T_i, contains all the items from the transactions containing item i. There is a single most frequent itemset in any T_i, and its items all have maximal support in T_i.

SQL statement to construct item-projected DB:

    INSERT INTO T_i (oid, tid, item)
    SELECT t1.oid, t1.tid, t1.item
    FROM TF t1, TF t2
    WHERE t1.oid = t2.oid AND t1.tid = t2.tid AND t2.item = i;

SQL statements to find the single most frequent itemset:

    INSERT INTO FT_i (item, count)
    SELECT item, COUNT(*)
    FROM T_i
    GROUP BY item
    HAVING COUNT(DISTINCT oid) >= n;

    SELECT item
    FROM FT_i
    WHERE count = (SELECT MAX(count) FROM FT_i);

[Figure labels: support in predecessor DB; support in item-projected DB; projecting item; single most frequent itemset in item-projected DB]
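Steps 3 and 4 can be exercised the same way. The sketch below projects a toy TF on item B and reads off the items with maximal support; the rows, the choice of projecting item, and n = 2 are assumptions, and the table names T_i and FT_i are kept literal instead of being generated per item.

```python
import sqlite3

N = 2       # minimum number of distinct objects (toy value)
ITEM = "B"  # the projecting item i (chosen arbitrarily for the example)

# Hypothetical contents of TF (tid, oid, item) after Step 2.
TF_ROWS = [
    (1, 1, "A"), (1, 1, "B"), (1, 1, "C"),
    (2, 2, "A"), (2, 2, "B"), (2, 2, "C"),
    (3, 1, "A"), (3, 1, "D"),
    (4, 2, "A"), (4, 2, "D"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TF   (tid INTEGER, oid INTEGER, item TEXT);
    CREATE TABLE T_i  (oid INTEGER, tid INTEGER, item TEXT);
    CREATE TABLE FT_i (item TEXT, count INTEGER);
""")
conn.executemany("INSERT INTO TF VALUES (?, ?, ?)", TF_ROWS)

# Step 3: all items of the transactions that contain ITEM.
conn.execute("""
    INSERT INTO T_i (oid, tid, item)
    SELECT t1.oid, t1.tid, t1.item
    FROM TF t1, TF t2
    WHERE t1.oid = t2.oid AND t1.tid = t2.tid AND t2.item = ?
""", (ITEM,))

# Step 4: item supports inside the projected DB.
conn.execute("""
    INSERT INTO FT_i (item, count)
    SELECT item, COUNT(*)
    FROM T_i
    GROUP BY item
    HAVING COUNT(DISTINCT oid) >= ?
""", (N,))

pattern = sorted(row[0] for row in conn.execute(
    "SELECT item FROM FT_i WHERE count = (SELECT MAX(count) FROM FT_i)"))
projected_rows = conn.execute("SELECT COUNT(*) FROM T_i").fetchone()[0]
```

Only trips 1 and 2 contain B, so the projected DB holds their six rows, and A, B, and C all reach the maximal support of 2 there, giving the itemset {A, B, C}.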

STEP 5: Delete unnecessary items from predecessor DB!
An item that has the same support in the projected DB as in the predecessor DB can be deleted from the latter.

SQL statements to delete unnecessary items:

    INSERT INTO FT_i (item, count)
    SELECT item, COUNT(*) AS count
    FROM T_i
    GROUP BY item
    HAVING COUNT(DISTINCT oid) >= n;

    DELETE FROM TF
    WHERE TF.item IN (SELECT F.item
                      FROM F, FT_i
                      WHERE F.item = FT_i.item AND F.count = FT_i.count);

[Figure labels: support in predecessor DB; support in item-projected DB; Unnecessary items]
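The deletion in Step 5 can be illustrated on toy tables. In this sketch F and FT_i are filled in by hand with values that Steps 1 and 4 might produce when projecting on item B; the rows and counts are assumptions, and the DELETE uses an unqualified column name, which is equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TF   (tid INTEGER, oid INTEGER, item TEXT);
    CREATE TABLE F    (item TEXT, count INTEGER);
    CREATE TABLE FT_i (item TEXT, count INTEGER);
""")

# Hypothetical predecessor DB, (tid, oid, item).
conn.executemany("INSERT INTO TF VALUES (?, ?, ?)", [
    (1, 1, "A"), (1, 1, "B"), (1, 1, "C"),
    (2, 2, "A"), (2, 2, "B"), (2, 2, "C"),
    (3, 1, "A"), (3, 1, "D"),
    (4, 2, "A"), (4, 2, "D"),
])
# Supports in the predecessor DB (F) and in the B-projected DB (FT_i).
conn.executemany("INSERT INTO F VALUES (?, ?)",
                 [("A", 5), ("B", 2), ("C", 2), ("D", 2)])
conn.executemany("INSERT INTO FT_i VALUES (?, ?)",
                 [("A", 2), ("B", 2), ("C", 2)])

# Items whose support did not shrink under projection occur only
# together with the projecting item, so they can be removed.
conn.execute("""
    DELETE FROM TF
    WHERE item IN (SELECT F.item
                   FROM F, FT_i
                   WHERE F.item = FT_i.item AND F.count = FT_i.count)
""")

remaining_items = [row[0] for row in conn.execute(
    "SELECT DISTINCT item FROM TF ORDER BY item")]
remaining_rows = conn.execute("SELECT COUNT(*) FROM TF").fetchone()[0]
```

B and C have the same support in both tables and are deleted, while A stays because it also occurs in trips that do not contain B.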

Project, find, delete, project, find, delete,... until I find: 3 patterns in the sample trajectory DB, AND, in a real-world dataset with 1.8 million records, 28 patterns (each at least 200 items long) that were supported by at least 4 trajectories, which belonged to at least 2 users.

My experiments show that:
1. the SQL implementation of my algorithm is effective
2. my algorithm behaves as expected when I vary its parameters
[Figures: Performance results for varying MinSupp; Performance results for varying MinLength]

Alright! That’s it! I am goin’ HOME! I learned that such an algorithm has many uses in moving object DB management, Location-Based Services, and Location-Based Advertising. AMM, I MMMM EHMM MMM, SQL AHMM, LBS, MMMMMM CARPOOLING…? We all learned something today,… I learned that Cartman is not such a d____s after all, and can create something simple, effective, and useful.