Chao Zhang1, Yu Zheng2, Xiuli Ma3, Jiawei Han1

Slides:



Advertisements
Similar presentations
Arnd Christian König Venkatesh Ganti Rares Vernica Microsoft Research Entity Categorization Over Large Document Collections.
Advertisements

ADAPTIVE FASTEST PATH COMPUTATION ON A ROAD NETWORK: A TRAFFIC MINING APPROACH Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, John Paul Sondag.
Swarm: Mining Relaxed Temporal Moving Object Clusters
 Data mining has emerged as a critical tool for knowledge discovery in large data sets. It has been extensively used to analyze business, financial,
VLDB 2011 Pohang University of Science and Technology (POSTECH) Republic of Korea Jongwuk Lee, Seung-won Hwang VLDB 2011.
gSpan: Graph-based substructure pattern mining
HIERARCHY REFERENCING TIME SYNCHRONIZATION PROTOCOL Prepared by : Sunny Kr. Lohani, Roll – 16 Sem – 7, Dept. of Comp. Sc. & Engg.
Mining Frequent Patterns in Data Streams at Multiple Time Granularities CS525 Paper Presentation Presented by: Pei Zhang, Jiahua Liu, Pengfei Geng and.
Social Media Mining Chapter 5 1 Chapter 5, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010.
Image Indexing and Retrieval using Moment Invariants Imran Ahmad School of Computer Science University of Windsor – Canada.
Frequent Subgraph Pattern Mining on Uncertain Graph Data
Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.
Next Century Challenges: Scalable Coordination in Sensor Networks Deborah Estrin, Ramesh Govindan, John Heidemann, Satish Kumar (Some images and slides.
Using Contiguous Bi-Clustering for data driven temporal analysis of fMRI based functional connectivity.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
University of Minnesota
Towards Scalable Critical Alert Mining Bo Zong 1 with Yinghui Wu 1, Jie Song 2, Ambuj K. Singh 1, Hasan Cam 3, Jiawei Han 4, and Xifeng Yan 1 1 UCSB, 2.
RACE: Time Series Compression with Rate Adaptivity and Error Bound for Sensor Networks Huamin Chen, Jian Li, and Prasant Mohapatra Presenter: Jian Li.
Stream Clustering CSE 902. Big Data Stream analysis Stream: Continuous flow of data Challenges ◦Volume: Not possible to store all the data ◦One-time.
Rui Yan, Yan Zhang Peking University
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
Last Words COSC Big Data (frameworks and environments to analyze big datasets) has become a hot topic; it is a mixture of data analysis, data mining,
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2007 (TPDS 2007)
Assembler Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data Sheng QIAN SIGKDD 2015.
An Example Use Case Scenario
Tracking with Unreliable Node Sequences Ziguo Zhong, Ting Zhu, Dan Wang and Tian He Computer Science and Engineering, University of Minnesota Infocom 2009.
Time Series Data Analysis - I Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What are Time Series? How to.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
On Node Classification in Dynamic Content-based Networks.
DATA AGGREGATION Siddhartha Sarkar Roll no: CSE-4 th Year-7 th semester Sensor Networks (CS 704D) Assignment.
Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC.
Data Mining Find information from data data ? information.
Mining Document Collections to Facilitate Accurate Approximate Entity Matching Presented By Harshda Vabale.
Mining Top-K Large Structural Patterns in a Massive Network Feida Zhu 1, Qiang Qu 2, David Lo 1, Xifeng Yan 3, Jiawei Han 4, and Philip S. Yu 5 1 Singapore.
1 Finding Spread Blockers in Dynamic Networks (SNAKDD08)Habiba, Yintao Yu, Tanya Y., Berger-Wolf, Jared Saia Speaker: Hsu, Yu-wen Advisor: Dr. Koh, Jia-Ling.
Clustering High-Dimensional Data. Clustering high-dimensional data – Many applications: text documents, DNA micro-array data – Major challenges: Many.
CLUSTERING HIGH-DIMENSIONAL DATA Elsayed Hemayed Data Mining Course.
Arizona State University1 Fast Mining of a Network of Coevolving Time Series Wei FanHanghang TongPing JiYongjie Cai.
黃福銘 (Angus F.M. Huang) ANTS Lab, IIS, Academia Sinica Exploring Spatial-Temporal Trajectory Model for Location.
Construction of Optimal Data Aggregation Trees for Wireless Sensor Networks Deying Li, Jiannong Cao, Ming Liu, and Yuan Zheng Computer Communications and.
University at BuffaloThe State University of New York Pattern-based Clustering How to cluster the five objects? qHard to define a global similarity measure.
DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.
Gspan: Graph-based Substructure Pattern Mining
Implementation of Classifier Tool in Twister Magesh khanna Vadivelu Shivaraman Janakiraman.
ST-MVL: Filling Missing Values in Geo-sensory Time Series Data
SNS COLLEGE OF TECHNOLOGY
Data Mining Find information from data data ? information.
Data Mining Soongsil University
T-Share: A Large-Scale Dynamic Taxi Ridesharing Service
Computing and Compressive Sensing in Wireless Sensor Networks
Urban Sensing Based on Human Mobility
G10 Anuj Karpatne Vijay Borra
Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational.
Author: Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng
Wireless Sensor Network Architectures
Fast Approximate Query Answering over Sensor Data with Deterministic Error Guarantees Chunbin Lin Joint with Etienne Boursier, Jacque Brito, Yannis Katsis,
Mining Spatio-Temporal Reachable Regions over Massive Trajectory Data
Jiawei Han Department of Computer Science
Latent Space Model for Road Networks to Predict Time-Varying Traffic
Transactional data Algorithm Applications
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
A Fast Algorithm for Subspace Clustering by Pattern Similarity
Graph Classification SEG 5010 Week 3.
Geometrically Inspired Itemset Mining*
Data Mining: Concepts and Techniques
Multi-Information Based GCPs Selection Method
2019/9/14 The Deep Learning Vision for Heterogeneous Network Traffic Control Proposal, Challenges, and Future Perspective Author: Nei Kato, Zubair Md.
Efficient Aggregation over Objects with Extent
Presentation transcript:

Chao Zhang1, Yu Zheng2, Xiuli Ma3, Jiawei Han1 Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data Chao Zhang1, Yu Zheng2, Xiuli Ma3, Jiawei Han1 1UIUC, 2Microsoft, 3PKU czhang82@illinois.edu

Big Geo-Sensory Data is Ubiquitous Wireless sensor network (WSN): multiple sensors are deployed at different locations to monitor the target condition cooperatively. The geo-sensory data is becoming big A modern WSN can contain hundreds of sensors, with each sensor collecting millions of records.

Spatial Co-evolving Pattern: An Example Goal: mining a set of spatially correlated sensors that exhibit frequent co-evolution. Frequent co-evolution for [s1, s2, s3] s1 decreases in AQI, s2 and s3 increase in AQI Caused by traffic flow in off-work hours

Spatial Co-evolving Pattern Mining A spatial co-evolving pattern (SCP) contains a set of spatially connected sensors the frequent co-evolution in their readings We aim to find all SCPs from the input geo-sensory data.

Why is it a Challenging Problem? The truly interesting evolutions are often flooded by numerous trivial fluctuations. Existing motif discovery methods can only find trivial motifs from such data. AQI Change Distribution

Why is it a Challenging Problem? The combinatorial nature of SCP leads to an extremely large search space. An arbitrary number of sensors. The occurring time intervals of an SCP is uncertain. Spatial combination Temporal combination

Assembler: A Two-stage Approach Stage I: find frequent evolutions for individual sensors. Stage II: assemble individual evolutions into SCPs based on the spatial constraint.

Stage I: Mining Frequent Evolutions for Individual Sensors To find interesting evolutions, we must filter trivial fluctuations and identify evolving intervals. In geo-sensory data, the changes occur with different rates and durations.

Extract Evolutions using Wavelet Transform We capture multi-scale changes using wavelet transform. In the wavelet space, the coefficients of different bases measure the strengths of changes. We preserve large coefficients and discard small coefficients.

Detecting Frequent Evolutions After extracting evolving intervals in the time series, we detect frequent evolutions via clustering. A segment-and-group approach: Partition each interval into line segments. Use mean shift to cluster the line segments based on slope (change rate). Segment Mean shift clustering

Stage II: SCP Generation From individual sensors to groups of sensors Pattern assembling via timestamp intersection.

Stage II: SCP Generation Anti-monotonicity: if one set of sensors have SCP, any of its subsets must also have SCP. A baseline method based on Apriori: starting with patterns on individual sensors, obtain SCPs in a bottom-up manner. The baseline method is not efficient enough It generates numerous candidates and keep them in memory. It performs pair-wise comparison to examine whether two candidates can be joined.

Stage II: SCP Generation Can we leverage the spatial constraint to generate SCPs more efficiently? The connectivity graph Each set of spatially connected sensors corresponds to a connected component in the connectivity graph.

Parent Relation We define the parent relation between two connected components. Example: Suppose V = 1 -> 2 -> 3 -> 4 -> 5 -> 6, then the parent relation generates: {245} -> {25} -> {5} ->

The SCP Search Tree Starting from any connected component, by performing the roll-up operation, we can reach the same node . A tree structure: each node is a connected component along with the SCPs occurring on it.

Reverse Search of SCPs Starting from the root node, we perform depth-first construction of the SCP search tree SCPs are obtained on-the-fly Unqualified branches are pruned with anti-monotonicity.

Experimental Evaluation Data sets Air: the AQI data collected by 180 sensors in northern China during 1.5 years. Bike: the bike rental data of 332 docks in New York during one year.

Example SCPs On the Air data set:

Example SCPs On the Bike data set:

Running Time Comparison Efficiency Scalability

Summary We study the problem of mining spatial co-evolving patterns from massive geo-sensory data. We propose the two-stage method Assembler: Stage 1: it obtains frequent evolutions for individual sensors. Stage 2: it assembles single patterns into SCPs. The experiment results show that Assembler is effective and efficient.