Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data. Sheng QIAN, 2015-08-01, SIGKDD 2015.


Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data. Sheng QIAN, SIGKDD 2015

Content
1. Introduction
2. Problem Description
3. The Assembler Method
   Stage I: Detecting Individual Evolutions
   Stage II: SCP Generation
   Time and space complexity
4. Experiment

Introduction Spatial Co-evolving Patterns (SCP), e.g. AQI sensors in Beijing

Introduction Challenge
1. Interesting evolutions are often flooded by trivial fluctuations
2. The pattern search space is extremely large

Problem Description Our Interest

Problem Description Symbol
S = {s_1, s_2, ..., s_m}: Sensors
l_i: Location of s_i
T = {t_1, t_2, ..., t_n}: Time domain

Problem Description Definitions

Definitions

Definitions

Method: I. Detecting Individual Evolutions Haar Wavelet Transformation

Method: I. Detecting Individual Evolutions Haar Wavelet Transformation c ij
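The slide names the Haar wavelet transformation but the transcript does not preserve its details. As a rough sketch only, the block below shows a standard unnormalized Haar decomposition of one sensor's readings. The function name `haar_transform` and the reading of `coeffs[i][j]` as the slide's c ij (level i, position j) are assumptions, not taken from the talk.

```python
def haar_transform(values):
    """Unnormalized Haar decomposition of a series whose length is a power of two.

    Returns (overall_average, levels); levels[i][j] is the detail coefficient
    at level i (coarsest level first) and position j.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(values)
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail: half the difference of each pair; approximation: their mean.
        details = [(a - b) / 2 for a, b in pairs]
        approx = [(a + b) / 2 for a, b in pairs]
        levels.insert(0, details)  # coarser levels go to the front
    return approx[0], levels


# Hypothetical readings from a single sensor.
average, coeffs = haar_transform([8, 6, 2, 4, 5, 5, 9, 1])
print(average)  # 5.0
print(coeffs)   # [[0.0], [2.0, 0.0], [1.0, -1.0, 0.0, 4.0]]
```

Large coefficients mark windows where the reading changes sharply, which is presumably why the transform is useful for telling real evolutions apart from trivial fluctuations.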

Method: I. Detecting Individual Evolutions Evolving interval extraction

Method: I. Detecting Individual Evolutions Mining Frequent Evolutions Segment-and-group approach
1. Segment: bottom-up
2. Mean Shift: divide segments into groups such that the segments in the same group have similar slopes
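The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the talk's implementation: the merge cost (largest deviation from the straight line between segment endpoints), the `max_error` stopping threshold, and the flat-kernel mean shift are all choices made here for brevity. Only the bottom-up direction, the grouping by slope, and the bandwidth parameter come from the slides.

```python
def interpolation_error(series, start, end):
    """Largest vertical gap between the series and the line from start to end."""
    slope = (series[end] - series[start]) / (end - start)
    return max(abs(series[start] + slope * (i - start) - series[i])
               for i in range(start, end + 1))


def bottom_up_segments(series, max_error):
    """Start from the finest segments and greedily merge the cheapest neighbours."""
    bounds = list(range(len(series)))  # breakpoints; segment k is bounds[k]..bounds[k+1]
    while len(bounds) > 2:
        # Cost of dropping each interior breakpoint, i.e. merging its two segments.
        costs = [interpolation_error(series, bounds[k - 1], bounds[k + 1])
                 for k in range(1, len(bounds) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        del bounds[best + 1]
    return list(zip(bounds[:-1], bounds[1:]))


def mean_shift_1d(points, bandwidth, iterations=50):
    """Flat-kernel mean shift: move each mode to the mean of the points near it."""
    modes = list(points)
    for _ in range(iterations):
        shifted = []
        for m in modes:
            window = [q for q in points if abs(q - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        modes = shifted
    return modes


def group_by_slope(slopes, bandwidth):
    """Give the same label to segments whose slopes converge to the same mode."""
    labels, centers = [], []
    for mode in mean_shift_1d(slopes, bandwidth):
        for idx, center in enumerate(centers):
            if abs(mode - center) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


# Hypothetical readings: a rise, a plateau, and a second rise.
series = [0, 1, 2, 3, 3, 3, 4, 5, 6]
segments = bottom_up_segments(series, max_error=0.1)
slopes = [(series[b] - series[a]) / (b - a) for a, b in segments]
labels = group_by_slope(slopes, bandwidth=0.25)
print(segments)  # [(0, 3), (3, 5), (5, 8)]
print(labels)    # [0, 1, 0]
```

The two rising segments end up in the same group, so they would count as two occurrences of the same evolution.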

Method: II. SCP Generation The Anti-monotonicity Property

Method: II. SCP Generation Find SCP by intersecting matching timestamps
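A minimal sketch of the intersection idea, assuming each sensor's frequent evolution is summarized as a set of exact matching timestamps. The talk may match occurrences with some tolerance; the sensor names, the timestamps, and the helper `scp_support` are hypothetical. It also shows why anti-monotonicity holds: adding a sensor can only shrink the intersection.

```python
def scp_support(matching_timestamps, sensors):
    """Timestamps at which the evolutions of all listed sensors occur together."""
    return set.intersection(*(matching_timestamps[s] for s in sensors))


# Hypothetical matching timestamps for one evolution per sensor.
matching_timestamps = {
    "s1": {3, 8, 15, 21},
    "s2": {3, 8, 21, 30},
    "s3": {8, 21, 40},
}
theta = 3  # minimum support

pair = scp_support(matching_timestamps, ["s1", "s2"])
triple = scp_support(matching_timestamps, ["s1", "s2", "s3"])
print(sorted(pair))    # [3, 8, 21]
print(sorted(triple))  # [8, 21]

# Anti-monotonicity: a superset pattern never has more support than its subset.
assert triple <= pair
print(len(pair) >= theta, len(triple) >= theta)  # True False
```

Because support never grows as sensors are added, a candidate that falls below θ can be pruned together with all of its extensions.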

Method: II. SCP Generation SCP Search Tree

Method: II. SCP Generation Neighbor and Parent

Method: II. SCP Generation SCP Search Tree

Method: II. SCP Generation Algorithm

Method: Discussion Time Complexity
Segment approach: O(n_e · l_e · l_s) ≈ O(m); l_s is small, n_e · l_e < m
Mean Shift: O(n_l · k) ≈ O(m); k: the avg. number of shifting operations
Second Stage: O(n_G (n|E_G| + n_p^2 n_s))
n_G: the number of connected components in G that have SCPs
|E_G|: the number of edges in G
n_p: the maximum number of SCPs on a connected component
n_s: the maximum support of an SCP

Method: Discussion Space Complexity
Segment & Mean Shift: nearly linear
Second Stage: O(n · n_p · n_s)

Method: Discussion Parameters Setting
The minimum support θ: how many occurrences can be considered frequent enough
The distance threshold h: what distance makes two sensors reachable
The change threshold δ: how much change in the reading reflects a significant and unusual behavior
The mean shift bandwidth ω
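To illustrate the role of the distance threshold h, here is a rough sketch that links two sensors when they lie within h of each other and then lists the connected components of the resulting graph. It assumes planar coordinates and Euclidean distance; real deployments would more likely use latitude and longitude with a geodesic distance. The function names and sample locations are hypothetical, and the slides do not spell out how G is built.

```python
import math
from collections import deque


def reachability_graph(locations, h):
    """Adjacency sets linking every pair of sensors at distance <= h."""
    ids = list(locations)
    graph = {s: set() for s in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(locations[a], locations[b]) <= h:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_components(graph):
    """Breadth-first search from every unvisited sensor."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical sensor locations on a plane.
locations = {"s1": (0, 0), "s2": (1, 0), "s3": (2, 0), "s4": (10, 10)}
graph = reachability_graph(locations, h=1.5)
components = connected_components(graph)
print(components)  # s1, s2, s3 form one component; s4 is on its own
```

A larger h adds edges and merges components, which enlarges the space of candidate patterns that the second stage has to search.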

Experiment Dataset
1. Air is an air quality data set. 180 air quality sensors are deployed in 16 cities in northern China (Beijing, Tianjin, and 14 cities in Hebei Province). Each sensor has measured the hourly AQI during the period –
2. Bike is the Citi Bike rental data set for the 332 rental docks in New York. It records the number of available bikes at each dock every 30 minutes during –
3. Syn-Sensor is a collection of 4 synthetic data sets used to evaluate the scalability of Assembler w.r.t. the number of sensors n

Experiment Illumination

Illumination

Efficiency Study Varying θ and h

Experiment Efficiency Study Varying δ and ω

Experiments Scalability

Thank you