DB group seminar, 2006/06/29, The University of Hong Kong, Dept. of Computer Science. Neighborhood based detection of anomalies in high dimensional spatio-temporal Sensor Datasets.

Presentation transcript:

DB group seminar, 2006/06/29, The University of Hong Kong, Dept. of Computer Science. Neighborhood based detection of anomalies in high dimensional spatio-temporal Sensor Datasets (SAC'04). Nabil R. Adam, Vandana Pursnani Janeja, Vijayalakshmi Atluri. Presented by Leonidas Mak.

Agenda: spatial data mining; problem and proposed solution; approach overview; implementation details; discussion of results.

Spatial data mining: Deals with knowledge discovery from spatial data sets, which contain spatial attributes (point, location, etc.) and non-spatial attributes (population, speed, etc.). Two properties of spatial objects set spatial data mining apart from other kinds of data mining: spatial dependency and spatial heterogeneity.

Spatial data mining: When considering a spatial object, take into account its spatial and non-spatial attributes, its implicit and explicit spatial relationships, its region of influence, and the underlying spatial process. These influence the behavior of the object and of its neighboring objects.

Spatial data mining: Consider the spatial process near the objects when performing spatial analysis, in order to identify outliers and trends in the region of influence. Use the spatial features in the vicinity of the objects and the underlying spatial process to identify similarly behaving objects.

Problem & proposed solution: Spatial outlier detection looks for objects that behave very differently from their neighborhood. A graph based neighborhood [3][11] does not capture the semantic relationship between the objects and the area of influence. The same holds for some clustering techniques, e.g. those based on Delaunay triangulation [5] or Voronoi diagrams.

Problem & proposed solution: Refine the concept of "a neighborhood of an object" so that it characterizes similarly behaving objects through both spatial relationships and semantic relationships. Goal: identification of spatio-temporal outliers in high dimensions.

Proposed approach to solution: Take into account both spatial and semantic relationships. The features of objects can differ despite their close proximity. Each object has an immediate neighborhood, the Micro Neighborhood ($M_i$). $M_i$ can be extended or merged with others to form a Macro Neighborhood (MaN).

Some definitions. Outlier (in terms of distance) [7]: an object $o$ in a dataset $T$ is a $DB(p, D)$ outlier if at least a fraction $p$ of the objects in $T$ lie at a distance greater than $D$ from $o$. Voronoi diagram (of a set of objects $O = \{o_1, \dots, o_n\}$) [10]: the subdivision of the plane into $n$ polygons, with a point $q$ in the polygon corresponding to object $o_i$ iff $d(q, o_i) \le d(q, o_j)$ for every $o_j \in O$ with $j \ne i$.
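The DB(p, D) definition can be sketched in a few lines of Python. This is a naive quadratic illustration under stated assumptions, not the algorithm of [7]: the function names, the Euclidean distance and the sample points are all chosen here for illustration.

```python
from math import dist


def is_db_outlier(o, dataset, p, D):
    """Return True if `o` is a DB(p, D) outlier in `dataset`.

    `o` qualifies when at least a fraction `p` of the objects in
    `dataset` lie farther than `D` from it (Euclidean distance assumed).
    """
    far = sum(1 for x in dataset if dist(o, x) > D)
    return far >= p * len(dataset)


def db_outliers(dataset, p, D):
    """Naive O(n^2) scan that collects every DB(p, D) outlier."""
    return [o for o in dataset if is_db_outlier(o, dataset, p, D)]


# Hypothetical sample: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = db_outliers(points, p=0.75, D=1.0)
print(outliers)
```

With these sample values only the isolated point is farther than D from at least 75% of the dataset.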

Some definitions. Jaccard Coefficient (JC) [10]: measures the similarity of asymmetric binary variables. It quantifies the similarity match (1-1 match) and so indicates the similarity of features. Contingency table for objects i and j:

Object i \ Object j | 1 | 0
1 | a | b
0 | c | d

$JC(i, j) = \frac{a}{a + b + c}$
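As a small worked example, the Jaccard coefficient of two 0/1 feature vectors is the number of features present in both (a) divided by the number of features present in at least one (a + b + c); features absent from both (d) are ignored. The function name, the feature vectors and the handling of two all-zero vectors are assumptions made for this sketch.

```python
def jaccard_coefficient(u, v):
    """Jaccard coefficient of two equal-length 0/1 feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)  # present in both
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)  # only in u
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)  # only in v
    if a + b + c == 0:
        # Assumption: two vectors with no features present get similarity 0.
        return 0.0
    return a / (a + b + c)


# Hypothetical feature vectors of two micro neighborhoods.
m1 = [1, 1, 0, 0, 1]
m2 = [1, 0, 0, 1, 1]
jc = jaccard_coefficient(m1, m2)
print(jc)  # a = 2, b = 1, c = 1, so JC = 2 / 4 = 0.5
```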

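Some definitions. Silhouette Coefficient (SC) [10]: identifies the quality of a clustering result in terms of its structure and its overlap with other clusters [6]. $0.7 < SC \le 1.0$: strong structure; $0.5 < SC \le 0.7$: medium structure; $SC < 0.25$: no structure. Here it is used to indicate the similarity of two spatial micro neighborhoods. Silhouette of data point $i$: $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$, where $a(i)$ is the average distance from $i$ to the other points of its own cluster and $b(i)$ is the smallest average distance from $i$ to the points of any other cluster. Silhouette Coefficient of cluster $X$: $SC_X = \frac{1}{|X|} \sum_{i \in X} s(i)$.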
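The following sketch computes the silhouette coefficient of one cluster using the usual definition from [6]: s(i) = (b(i) - a(i)) / max(a(i), b(i)), with a(i) the mean distance to the other points of the same cluster and b(i) the smallest mean distance to another cluster, averaged over the cluster. The function names, the Euclidean distance, the conventions for degenerate cases and the sample readings are assumptions of this example.

```python
from math import dist


def mean_distance(point, others):
    """Average Euclidean distance from `point` to every point in `others`."""
    return sum(dist(point, o) for o in others) / len(others)


def silhouette(idx, cluster, other_clusters):
    """Silhouette s(i) of the point at position `idx` in `cluster`."""
    point = cluster[idx]
    rest = cluster[:idx] + cluster[idx + 1:]
    if not rest:
        return 0.0  # assumed convention for a single-point cluster
    a = mean_distance(point, rest)
    b = min(mean_distance(point, other) for other in other_clusters)
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def silhouette_coefficient(cluster, other_clusters):
    """Average silhouette over all points of `cluster`."""
    total = sum(silhouette(i, cluster, other_clusters) for i in range(len(cluster)))
    return total / len(cluster)


# Hypothetical readings: two well separated sets, then two interleaved sets.
sc_separated = silhouette_coefficient([(0.0, 0.0), (0.0, 1.0)],
                                      [[(10.0, 0.0), (10.0, 1.0)]])
sc_overlap = silhouette_coefficient([(0.0, 0.0), (2.0, 0.0)],
                                    [[(1.0, 0.0), (3.0, 0.0)]])
print(sc_separated, sc_overlap)
```

The separated sets score close to 1 (strong structure), while the interleaved sets score below 0.25 (no structure), which is the situation in which two micro neighborhoods would be treated as similar.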

Overview of the approach. 1. Generation of Micro Neighborhood: generate Voronoi polygons. Input: set of objects with spatial locations. Output: Voronoi diagram. 2. Identification of Spatial Relationships. Input: Voronoi diagram, edge list. Output: adjacency matrix indicating whether one $M_i$ is a neighbor of any other $M_i$.

Overview of the approach. 3. Identification of Semantic Relationships: calculate JC and SC. Input of the JC part: a set of micro neighborhoods, each characterized by a feature vector representing the spatial processes. Input of the SC part: a set of micro neighborhoods, each characterized by a set of points (readings over a period of time).

Overview of the approach. 4. Generation of Macro Neighborhood. Input: neighborhood (adjacency) matrix, JC, SC. Output: Macro Neighborhood. 5. Detecting outliers: based on the distance values of various points, using distance based outlier detection [7].

Overview of the approach (flow diagram). Stages: Generation of Micro Neighborhood → Identification of Spatial Relationships → Identification of Semantic Relationships → Generation of Macro Neighborhood → Outliers Detection. Data passed between stages: object set; Voronoi diagram and edge list; feature vector and set of points; neighborhood matrix, JC and SC; Macro Neighborhood.

Generation of Micro Neighborhood: The definition of neighborhood is based on the concept of Voronoi diagrams. Generate the Voronoi polygon around each spatial object. A feature q that lies in a Voronoi polygon is associated with the corresponding object. The region of influence is defined as the Voronoi polygon.
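A location lies in the Voronoi polygon of an object exactly when that object is its nearest object, so features can be associated with micro neighborhoods without building the polygons explicitly. The sketch below uses that shortcut; it is an illustration rather than the authors' implementation, and it assumes that each feature is reduced to a single representative point. The sensor and feature names and coordinates are hypothetical.

```python
from math import dist


def assign_to_micro_neighborhoods(objects, features):
    """Map each object name to the features that fall in its Voronoi cell.

    `objects` and `features` map names to (x, y) locations. A feature
    belongs to the cell of its nearest object (Euclidean distance).
    """
    neighborhoods = {name: [] for name in objects}
    for feature_name, location in features.items():
        nearest = min(objects, key=lambda name: dist(objects[name], location))
        neighborhoods[nearest].append(feature_name)
    return neighborhoods


# Hypothetical sensors (objects) and spatial features.
sensors = {"s1": (0.0, 0.0), "s2": (10.0, 0.0), "s3": (5.0, 8.0)}
spatial_features = {
    "river_bend": (1.0, 1.0),
    "bridge": (9.0, 1.0),
    "outfall": (5.0, 6.0),
}
micro = assign_to_micro_neighborhoods(sensors, spatial_features)
print(micro)
```

Line or polygon features such as a river would need an intersection test against the actual Voronoi polygons, which this point-based sketch does not attempt.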

Generation of Micro Neighborhood: The Micro Neighborhood ($M_i$) is defined as the region of influence, i.e. the dominance of one object over the others. Spatial features have their own spatial process. Figure labels: Sensor (object), River (spatial feature), Micro Neighborhood.

Identification of Spatial Relationships: Spatial relationships are binary relations between pairs of objects. Object: point, line, polygon, etc. Relationship: topological, distance, etc. The topological relationship of adjacency is determined by the shared edge of two Voronoi polygons. The edge list is generated by Triangle, a 2D mesh generator [12], from the Delaunay triangulation.

Identification of Spatial Relationships: Edge list format: Edge#. Two micro neighborhoods are adjacent if there is an edge between their two spatial objects. The adjacency information is stored in the neighborhood adjacency matrix.
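Turning an edge list into the neighborhood adjacency matrix is a direct translation. The sketch below assumes that each edge record is a triple (edge number, endpoint, endpoint) with objects numbered from 1; the transcript does not preserve the exact columns, so this layout and the sample edges are assumptions.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric n x n 0/1 matrix from (edge_no, a, b) records.

    Objects are assumed to be numbered 1..n; entry [i][j] is 1 when the
    micro neighborhoods of objects i+1 and j+1 share an edge.
    """
    matrix = [[0] * n for _ in range(n)]
    for _edge_no, a, b in edges:
        matrix[a - 1][b - 1] = 1
        matrix[b - 1][a - 1] = 1
    return matrix


# Hypothetical edge list for four objects.
edges = [(1, 1, 2), (2, 2, 3), (3, 1, 3), (4, 3, 4)]
adj = adjacency_matrix(4, edges)
for row in adj:
    print(row)
```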

Identification of Semantic Relationships: A Micro Neighborhood can be characterized by the presence or absence of spatial features and of other spatial processes, which results in a feature vector of 0's and 1's [14]. The object itself may also have an associated set of readings (points in the neighborhood). The approach makes use of both the features and the data points in the neighborhood.

Identification of Semantic Relationships: JC is used for the binary valued attributes in the feature vector. SC is used for non-binary valued attributes, such as sensor readings, to measure the overlap of the micro neighborhoods based on the readings over a period of time. Two micro neighborhoods are considered semantically similar when their JC is higher and their SC is lower: a low SC means that the two sets of readings do not form well separated clusters, i.e. they overlap.

Generation of Macro Neighborhood: Each $M_i$ can be considered an implicit sub-cluster or grouping. A Macro Neighborhood can be defined in terms of the spatial relationships between the $M_i$, the semantic relationships, and the spatial and non-spatial attributes. A Macro Neighborhood is defined as a graph with outer edges $E'$ from the $M_i$; a link $l = (m_i, m_{i+1})$ holds iff $m_i$ and $m_{i+1}$ are both spatial and semantic neighbors.

Generation of Macro Neighborhood: Spatial neighbor $(m_i, m_{i+1})$ refers to the spatial relation between polygons. Semantic neighbor refers to the semantic relation based on JC and SC, evaluated against thresholds. $M_i$ and $M_j$ that are both spatial and semantic neighbors are merged to form a MaN.
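One way to picture the merge is as connected components over links that are both spatial and semantic. The sketch below is an interpretation, not the authors' algorithm: it assumes pairwise JC and SC matrices, a semantic test of the form JC >= jc_min and SC <= sc_max, and transitive merging through a small union-find. All names, thresholds and values are hypothetical.

```python
def macro_neighborhoods(adj, jc, sc, jc_min, sc_max):
    """Group micro neighborhoods (by index) into macro neighborhoods.

    Two micro neighborhoods are linked when they are adjacent in `adj`
    and semantically similar (assumed test: JC >= jc_min, SC <= sc_max).
    Linked neighborhoods are merged transitively.
    """
    n = len(adj)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and jc[i][j] >= jc_min and sc[i][j] <= sc_max:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical chain of four micro neighborhoods: 0-1, 1-2, 2-3.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
jc = [[1.0, 0.8, 0.0, 0.0],
      [0.8, 1.0, 0.3, 0.0],
      [0.0, 0.3, 1.0, 0.6],
      [0.0, 0.0, 0.6, 1.0]]
sc = [[0.2] * 4 for _ in range(4)]

strict = macro_neighborhoods(adj, jc, sc, jc_min=0.5, sc_max=0.25)
relaxed = macro_neighborhoods(adj, jc, sc, jc_min=0.2, sc_max=0.25)
print(strict, relaxed)
```

Lowering the JC threshold lets more links qualify, so previously separate groups merge into a larger macro neighborhood.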

Outlier detection: Graph based spatial outlier detection [11]. It is important to identify the neighborhood as well as the outliers, since a given point can be an outlier with respect to several clusters. A spatio-temporal outlier is defined as follows: a point $x_i$ is a spatio-temporal outlier iff it differs sufficiently from the other points in the Macro Neighborhood.

Outlier detection: First identify the Macro Neighborhood. Then apply the distance based outlier detection technique [7], using proximity in terms of a distance threshold as one of the determining factors. Finally, investigate whether the object itself is an outlier (spatial outlier): it is one if more than a certain number of its points are outliers.
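The two levels described here (outlying readings, then outlying objects) can be sketched as follows. This is an illustrative reading of the slide under stated assumptions: the readings of all objects in one macro neighborhood are pooled, each reading is tested with the DB(p, D) rule against that pool, and an object is flagged when more than `min_count` of its readings are outliers. The parameter names and the traffic-like sample readings are hypothetical.

```python
from math import dist


def outlier_objects(readings_by_object, p, D, min_count):
    """Flag objects of one macro neighborhood as spatial outliers.

    A reading is an outlier when at least a fraction `p` of the pooled
    readings lie farther than `D` from it. An object is flagged when
    more than `min_count` of its own readings are outliers.
    """
    pooled = [r for readings in readings_by_object.values() for r in readings]
    flagged = {}
    for obj, readings in readings_by_object.items():
        count = 0
        for r in readings:
            far = sum(1 for x in pooled if dist(r, x) > D)
            if far >= p * len(pooled):
                count += 1
        flagged[obj] = count > min_count
    return flagged


# Hypothetical (volume, occupancy) readings for three stations.
readings = {
    "A": [(100.0, 10.0), (102.0, 11.0), (98.0, 9.0)],
    "B": [(101.0, 10.0), (99.0, 10.0), (103.0, 12.0)],
    "C": [(300.0, 60.0), (310.0, 62.0), (101.0, 11.0)],
}
flags = outlier_objects(readings, p=0.7, D=20.0, min_count=1)
print(flags)
```

Station C has two readings far from most of the pool, which exceeds `min_count`, so only C is flagged.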

Dataset: Two data sets are used: highway traffic monitoring [11] and water monitoring [14]. Highway traffic monitoring: traffic readings from 60 stations in time slots of 5 minutes. Non-spatial attributes: volume, occupancy. Spatial attributes: latitude, longitude. Feature matrix: traffic flow direction, clustering.

Dataset: Water monitoring: 7 stations monitoring the water quality of rivers. The feature matrix consists of 21 features, used to show the characteristics of each $M_i$. Spatial attributes: latitude, longitude. Temporal attributes: date, time of sampling. Data points consist of more than 100 attributes.

Results (Spatial): Spatial relationships are identified by applying the program TRIANGLE [12], which generates edges for nodes that are judged adjacent to each other. Adjacency is expressed as a matrix. High connectivity → the neighborhoods collapse into one big neighborhood.

Results (Spatial + JC): Incremental building of the Macro Neighborhood. JC = 0.5: MaN consists of polygons 2, 4, 6, 7. JC = 0.2: MaN consists of polygons 2, 3, 4, 6, 7. Incremental merging follows from a less restrictive JC threshold.

Results (Spatial + JC): Refinement in outliers detected: the number of outliers detected varies as the JC threshold changes. Figure: WaterMonitoring Data, Num Outliers vs. JC (axes: JC threshold, number of outliers).

Results (Spatial + JC): Systematic elimination of outliers. Consistency in outlier detection: if a neighborhood has no outliers at a low JC threshold, it consistently has none at higher threshold values. O1: outliers detected at a high JC threshold. O2: outliers detected at a low JC threshold. Outliers (part of): JC = 0.8: 2, 4; JC = 0.5: 2, 3, 4, 8.

Results (Spatial + SC): A similar conclusion holds when SC is added: as SC decreases, the neighborhood becomes more refined. Figure: WaterMonitoring Data, Num Outliers vs. SC.

Results (Spatial + JC + SC): Low JC & high SC → big neighborhood, more outliers. High JC & low SC → refined neighborhood, reduced outliers.

References:
[1] F. Aurenhammer. Voronoi Diagrams: A Survey of a Fundamental Geometric Data Structure. ACM Computing Surveys, Vol. 23(3), 1991.
[2] M. Ester, A. Frommelt, H.-P. Kriegel, and J. Sander. Algorithms for characterization and trend detection in spatial databases. In Proceedings of 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD).
[3] M. Ester, H.-P. Kriegel, and J. Sander. Spatial Data Mining: A Database Approach. In Proceedings of the International Symposium on Large Spatial Databases, Berlin, Germany, July 1997.
[4] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proceedings of 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD).
[5] I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35-39, 1997.

[6] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons.
[7] E. M. Knorr and R. T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proceedings of 24th Int. Conf. Very Large Data Bases (VLDB), 1998.
[8] H. J. Miller and J. Han. Geographic Data Mining & Knowledge Discovery. Taylor & Francis, 1st edition.
[9] Minnesota Highway traffic dataset.
[10] A. Okabe, B. Boots, K. Sugihara, S. Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley, 2000.

[11] S. Shekhar, C. Lu, and P. Zhang. Detecting Graph-Based Spatial Outlier: Algorithms and Applications (A Summary of Results). Computer Science & Engineering Department, UMN, Technical Report.
[12] J. R. Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. First Workshop on Applied Computational Geometry (Philadelphia, Pennsylvania), ACM, May 1996.
[13] D. Unwin. Introductory Spatial Analysis. Routledge Kegan & Paul, January 1982.
[14] USGS, National Stream Water Quality Network (NASQAN), Published Data.
[15] Water Monitoring, the Meadowlands Environmental Research Institute, and the New Jersey Meadowlands Commission.