Spatial Data Mining: Spatial outlier detection

Presentation transcript:

Spatial Data Mining: Spatial outlier detection

Spatial outlier: a data point that is extreme relative to its neighbors.

Given:
- A spatial graph $G = \{V, E\}$
- A neighbor relationship ($k$ neighbors)
- An attribute function $f: V \to \mathbb{R}$
- An aggregation function $f_{aggr}: \mathbb{R}^k \to \mathbb{R}$
- A confidence level threshold $\theta$

Find: $O = \{v_i \mid v_i \in V,\ v_i \text{ is a spatial outlier}\}$

Objective:
- Correctness: the attribute value of $v_i$ is extreme compared with its neighbors
- Computational efficiency

Constraints:
- Attribute values are normally distributed
- Computation cost is dominated by I/O operations

Spatial Outlier Detection Test

1. Choice of spatial statistic: $S(x) = f(x) - E_{y \in N(x)}(f(y))$, where $N(x)$ is the set of neighbors of $x$.
   Theorem: $S(x)$ is normally distributed if $f(x)$ is normally distributed.
2. Test for outlier detection: $\left| (S(x) - \mu_s) / \sigma_s \right| > \theta$, where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of $S(x)$.

Hypothesis: I/O cost is determined by clustering efficiency.

[Figure: a spatial outlier and its neighbors, with panels for $f(x)$ and $S(x)$]
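The two-step test can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function name `spatial_outliers`, the dictionary-based graph representation, the use of the population standard deviation for the spread of S(x), and the toy ring graph are all assumptions made for the example.

```python
from statistics import mean, pstdev


def spatial_outliers(f, neighbors, theta=2.0):
    """Return the nodes x with |(S(x) - mu_s) / sigma_s| > theta.

    f         -- dict mapping each node to its attribute value f(x)
    neighbors -- dict mapping each node to a list of its neighbors N(x)
    theta     -- threshold on the standardized statistic (assumed default)
    """
    # Step 1: S(x) = f(x) minus the average attribute value over N(x).
    s = {x: f[x] - mean(f[y] for y in neighbors[x]) for x in f}

    # Step 2: standardize S(x) using its mean and standard deviation
    # over all nodes (population standard deviation is an assumption).
    mu_s = mean(s.values())
    sigma_s = pstdev(s.values())
    if sigma_s == 0:
        return []  # every node matches its neighborhood equally; no outliers
    return [x for x in s if abs((s[x] - mu_s) / sigma_s) > theta]


# Toy data: eight nodes on a ring, each with two neighbors (k = 2).
values = {0: 10.0, 1: 11.0, 2: 10.5, 3: 40.0,
          4: 10.0, 5: 11.0, 6: 10.5, 7: 10.0}
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}

print(spatial_outliers(values, ring, theta=2.0))  # [3]
```

In this toy data, node 3 is flagged because its value is far from the average of its two neighbors. Nodes 2 and 4 also get large negative S(x) values because node 3 inflates their neighborhood averages, so a lower threshold would flag them as well.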

Results

1. CCAM achieves higher clustering efficiency (CE)
2. CCAM has lower I/O cost
3. Higher CE leads to lower I/O cost
4. Page size improves CE for all methods

[Figure: I/O cost and CE value for Z-order, CCAM, and Cell-Tree]