Knowledge Discovery from Transportation Network Data: Paper Review. Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B.

Presentation transcript:

1 Knowledge Discovery from Transportation Network Data: Paper Review
Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005.

2 Outline
● Background.
● Experiments.
-- Structurally Similar Routes
-- Temporally Repeated Routes
● Experiment results.
● Conventional techniques.
● New challenges.

3 A Natural Application Area for Data Mining
● Transportation and logistics are an important sector of the economy.
-- Transportation consumes 60% of oil worldwide.
● Data mining has led to significant gains in other areas.
● Computer use is widespread in transportation and logistics.
-- Inventory management, parcel tracking, and even on-truck location sensors.

4 Existing Applications
Data Mining
● Mining the transactional characteristics of freight and events.
-- e.g., classification on safety/accident records might find that trucks are prone to accidents at 7:00 AM on east-west roads.
-- Makes NO use of the geometry of the network.
Network Structure
● Optimization
-- Finds a solution (minimizes cost).

5 Transportation Networks
● Graph problems
● Graph mining, e.g., finding frequent subgraphs
Algorithms
* WARMR
* AGM
* SUBDUE
* FSG

6 Dataset
● Six months of origin-destination (OD) data from a large third-party logistics company: 98,292 transactions.
● Represented as a directed graph by mapping locations to vertices.
● Each transaction is then represented as an edge from its origin to its destination.
● The edges are labeled with the other attributes of the transaction: pickup date, delivery date, distance, hours, weight, and mode. Attribute values are discretized with a binning strategy.
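The mapping from transactions to a labeled directed graph can be sketched as follows. This is a minimal illustration, not the authors' code: the field names (`origin`, `destination`, `distance`) and the bin boundaries in `DISTANCE_BINS` are assumptions, since the slides do not give the actual schema or bins.

```python
from collections import defaultdict

# Hypothetical distance bins (lower bound, upper bound, label); the slides
# do not state the real bin boundaries.
DISTANCE_BINS = [(0, 100, "short"), (100, 500, "medium"), (500, float("inf"), "long")]

def bin_label(value, bins):
    """Return the label of the first half-open bin [lo, hi) containing value."""
    for lo, hi, label in bins:
        if lo <= value < hi:
            return label
    raise ValueError(f"value {value} falls outside all bins")

def build_od_graph(transactions):
    """Map each transaction to a labeled directed edge origin -> destination.

    Returns an adjacency mapping: origin -> list of (destination, edge_label).
    Parallel edges are kept, because the same OD pair can occur many times.
    """
    graph = defaultdict(list)
    for t in transactions:
        label = bin_label(t["distance"], DISTANCE_BINS)
        graph[t["origin"]].append((t["destination"], label))
    return dict(graph)

# Toy transactions with made-up locations and distances.
transactions = [
    {"origin": "A", "destination": "B", "distance": 80},
    {"origin": "A", "destination": "C", "distance": 420},
    {"origin": "B", "destination": "C", "distance": 900},
]
graph = build_od_graph(transactions)
```

Binning matters here because graph miners match edge labels exactly; two shipments of 418 and 422 miles would otherwise never count as the same pattern.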

7

8 Mining Interests
● Structurally Similar Routes
-- Identify structurally similar patterns that occur in many locations.
Methods
* SUBDUE
* FSG
● Temporally Repeated Routes
-- Find patterns of routes repeated in time, rather than space.
Method
* FSG

9 Structurally Similar Routes
● We assign all vertices the same label, so that a pattern can match anywhere in the network regardless of the specific locations involved.
● Three variants for edge labels: weight, distance, and time.
-- OD_TD : TOTAL-DISTANCE
-- OD_GW : GROSS-WEIGHT
-- OD_TH : MOVE-TRANSIT-HOURS

10 Experiments with SUBDUE (MDL principle)
SUBDUE: A substructure discovery system
Results:
● Took about 3.25 hours on a graph of 100 vertices and 561 edges to find the best 3 patterns with a beam size of 4.
● Would need 6 months on the complete graph.
● Results were trivial.
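SUBDUE prefers substructures that compress the graph well. As a rough illustration only, the sketch below uses a size-based stand-in for description length (vertices plus edges) rather than a true MDL bit count. Only the graph size (100 vertices, 561 edges) comes from the slide; the pattern size and instance count are hypothetical, and the sketch assumes instances do not overlap.

```python
def size(num_vertices, num_edges):
    """Size-based stand-in for description length: vertices plus edges."""
    return num_vertices + num_edges

def compression_value(g_vertices, g_edges, s_vertices, s_edges, instances):
    """Return size(G) / (size(S) + size(G|S)); higher means better compression.

    G|S is G with every instance of substructure S collapsed to one vertex.
    Assumes non-overlapping instances (a simplification).
    """
    # Each instance collapses s_vertices vertices into one and drops its internal edges.
    compressed_vertices = g_vertices - instances * (s_vertices - 1)
    compressed_edges = g_edges - instances * s_edges
    original = size(g_vertices, g_edges)
    compressed = size(s_vertices, s_edges) + size(compressed_vertices, compressed_edges)
    return original / compressed

# Graph size from the slide; a hypothetical 3-vertex, 2-edge pattern with 20 instances.
value = compression_value(100, 561, s_vertices=3, s_edges=2, instances=20)
```

A value only slightly above 1 means the pattern barely compresses the graph, which is one way small, "trivial" patterns show up in this kind of scoring.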

11 ● Significant traffic from node 2 to node 4 via node 3, but not much return traffic (deadheading)
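A pattern like this one can also be checked directly by comparing shipment counts in each direction. The sketch below is an assumption-laden illustration: the node numbers follow the slide, but the shipment counts are invented.

```python
from collections import Counter

def directional_imbalance(shipments):
    """For each pair of locations, return forward count minus return count.

    shipments is a list of (origin, destination) tuples. Each unordered pair
    is reported once, keyed by the direction encountered first.
    """
    counts = Counter(shipments)
    imbalance = {}
    for (u, v), n in counts.items():
        if (v, u) in imbalance:
            continue  # pair already reported in the opposite direction
        imbalance[(u, v)] = n - counts.get((v, u), 0)
    return imbalance

# Invented counts: heavy traffic 2 -> 3 -> 4, little traffic coming back.
shipments = [(2, 3)] * 5 + [(3, 4)] * 5 + [(4, 3)] + [(3, 2)]
result = directional_imbalance(shipments)
```

Large positive values flag lanes where trucks are likely to return empty.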

12 Experiments with FSG
● FSG mines patterns across a set of graph transactions.
● To apply it here, the single graph is divided into multiple distinct subgraphs, and each subgraph is treated as a separate transaction.
-- Breadth-first partitioning
-- Depth-first partitioning
-- Both may result in patterns being broken across partitions.
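One way to picture breadth-first partitioning is the sketch below. It is not the procedure from the paper, which the slides do not spell out; it simply grows edge-disjoint partitions breadth-first from a seed vertex until a size limit is reached. The function name, the `max_edges` parameter, and the toy graph are all assumptions, and the edge list is assumed to contain no duplicates.

```python
from collections import deque

def breadth_first_partition(edges, max_edges):
    """Split a directed edge list into edge-disjoint partitions.

    Each partition holds at most max_edges edges and is grown breadth-first
    from the source vertex of the first edge not yet assigned.
    """
    out_edges = {}
    for u, v in edges:
        out_edges.setdefault(u, []).append((u, v))

    used = set()
    partitions = []
    for seed_edge in edges:
        if seed_edge in used:
            continue
        part = []
        queue = deque([seed_edge[0]])
        visited = {seed_edge[0]}
        while queue and len(part) < max_edges:
            u = queue.popleft()
            for edge in out_edges.get(u, []):
                if edge in used:
                    continue
                if len(part) == max_edges:
                    break
                used.add(edge)
                part.append(edge)
                if edge[1] not in visited:
                    visited.add(edge[1])
                    queue.append(edge[1])
        partitions.append(part)
    return partitions

# Toy graph: the route A -> C -> D ends up split across two partitions.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
parts = breadth_first_partition(edges, max_edges=3)
```

In the toy run, edge A -> C lands in the first partition and C -> D in the second, so a pattern spanning both can no longer be counted inside a single transaction.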

13 Results
● Partition sizes: 400, 800, 1200 and
● Depth-first partitioning: 200 frequent patterns were found with a minimum support of 120.
● Breadth-first partitioning: 667 frequent patterns were found with a minimum support of 240.
● Had runtime and memory problems with lower supports on the breadth-first partitions.
● FSG is not an appropriate tool for mining recurring patterns in a large single graph.
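Minimum support here means the number of graph transactions that contain a pattern. The sketch below shows only the simplest case, single labeled edges, which is the first level of a level-wise miner such as FSG; it does not grow larger subgraphs. The edge labels and transactions are invented for illustration.

```python
from collections import Counter

def frequent_edge_labels(graph_transactions, min_support):
    """Return edge labels that occur in at least min_support transactions.

    graph_transactions is a list of edge lists, each edge being
    (origin, destination, label). A label is counted once per transaction,
    no matter how often it occurs inside that transaction.
    """
    support = Counter()
    for edges in graph_transactions:
        support.update({label for _, _, label in edges})
    return {label: n for label, n in support.items() if n >= min_support}

# Three toy transactions with hypothetical binned-distance labels.
graph_transactions = [
    [("A", "B", "short"), ("B", "C", "long"), ("A", "C", "short")],
    [("D", "E", "short"), ("E", "F", "medium")],
    [("G", "H", "long"), ("H", "I", "short")],
]
frequent = frequent_edge_labels(graph_transactions, min_support=2)
```

Lowering `min_support` admits more patterns at every level, which is consistent with the runtime and memory problems reported at lower supports.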

14

15 Temporally Repeated Routes
● Uses FSG.
● Exploits the temporal nature of the transportation graph.
● Partitions the graph into a set of graph transactions based on date.
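Date-based partitioning can be sketched as grouping edges by day, so that each day becomes one graph transaction. The slide says only "based on date"; keying on the pickup date, and the field names used below, are assumptions.

```python
from collections import defaultdict
from datetime import date

def partition_by_date(transactions):
    """Group OD edges by pickup date; each date becomes one graph transaction."""
    by_day = defaultdict(list)
    for t in transactions:
        by_day[t["pickup_date"]].append((t["origin"], t["destination"]))
    return dict(sorted(by_day.items()))

# Toy data: the route A -> B repeats on two different days.
transactions = [
    {"origin": "A", "destination": "B", "pickup_date": date(2004, 1, 5)},
    {"origin": "B", "destination": "C", "pickup_date": date(2004, 1, 5)},
    {"origin": "A", "destination": "B", "pickup_date": date(2004, 1, 6)},
]
daily_graphs = partition_by_date(transactions)
```

A route that recurs across many of these daily graphs is then a temporally repeated pattern in the sense of the slide.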

16 Results
● Unable to run FSG on the entire data set due to insufficient memory / swap space.
● Most were small patterns. (The following is the biggest one)

17 Patterns Discovered by Using Conventional Mining Algorithms
● Mapped the dataset into a standard “transactional” representation.
● Used traditional data mining approaches.
● Used Weka for association rule mining, instance (tuple) classification, and cluster analysis on the transportation data.

18 Evaluations of Conventional Algorithms
● Traditional data mining techniques produced interesting and meaningful results that summarize the data.
● Further experimentation is required to explore the potential and limitations of these techniques on temporal transportation network data.
● These techniques lose some of the insight available from the structural characteristics of the data.

19 Challenges for Data Mining Research
● Handling the temporal aspects of graphs (dynamic graphs).
● Incorporating the notion of events into a graph.
● Expanding graph mining techniques beyond data similar to molecular structures.
● Determining what makes a graph pattern interesting.