Spatial Data Mining: Spatial outlier detection

Presentation transcript:

Spatial Data Mining: Spatial outlier detection

Spatial outlier: a data point that is extreme relative to its neighbors.

Given:
  A spatial graph $G = \{V, E\}$
  A neighbor relationship ($K$ neighbors)
  An attribute function $f: V \to \mathbb{R}$
  An aggregation function $f_{aggr}: \mathbb{R}^k \to \mathbb{R}$
  A confidence level threshold $\theta$

Find:
  $O = \{v_i \mid v_i \in V,\ v_i \text{ is a spatial outlier}\}$

Objective:
  Correctness: the attribute value of $v_i$ is extreme compared with its neighbors
  Computational efficiency

Constraints:
  Attribute value is normally distributed
  Computation cost dominated by I/O op.
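The inputs listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the neighbor relationship is taken to be the K nearest points by Euclidean distance, the aggregation function is taken to be the mean, and the names `k_nearest_neighbors`, `aggregate_mean`, `points` and `f` are hypothetical, not from the slides.

```python
import math

def k_nearest_neighbors(points, k):
    """For each point index, return the indices of its k nearest other points.

    Assumption: the neighbor relationship is Euclidean k-nearest-neighbors;
    the slides only say "K neighbors".
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Stable sort by distance, so ties keep index order.
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors

def aggregate_mean(values):
    """Assumed aggregation function f_aggr: the arithmetic mean."""
    return sum(values) / len(values)

# Hypothetical toy data: four locations on a line and one attribute value each.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
f = [5.0, 6.0, 7.0, 20.0]

neighbors = k_nearest_neighbors(points, k=2)
neighborhood_avg = [aggregate_mean([f[j] for j in nbrs]) for nbrs in neighbors]

for i, (nbrs, avg) in enumerate(zip(neighbors, neighborhood_avg)):
    print(f"node {i}: f={f[i]}, neighbors={nbrs}, neighborhood mean={avg}")
```

The brute-force neighbor search is quadratic in the number of points; it is meant to make the definitions concrete, not to address the I/O cost constraint named on the slide.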

Spatial Outlier Detection Test

1. Choice of spatial statistic:
   $S(x) = f(x) - E_{y \in N(x)}(f(y))$
   Theorem: $S(x)$ is normally distributed if $f(x)$ is normally distributed.

2. Test for outlier detection:
   $\left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$

Hypothesis: I/O cost is determined by clustering efficiency.

[Figure: plots of f(x) and S(x), showing a spatial outlier and its neighbors]
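The two-step test on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the neighborhood expectation is the plain mean over N(x), that the mean and standard deviation of S are estimated from all nodes using the population standard deviation, and that the threshold is 2. The function name `spatial_outliers` and the toy data are hypothetical.

```python
import statistics

def spatial_outliers(f, neighbors, theta):
    """Flag nodes x whose standardized S(x) exceeds theta in absolute value.

    f         : list of attribute values, one per node
    neighbors : list of neighbor-index lists, one per node (N(x))
    theta     : threshold on |(S(x) - mu_s) / sigma_s|
    """
    # Step 1: S(x) = f(x) - mean of f over the neighbors of x.
    s = [f[x] - statistics.fmean([f[y] for y in neighbors[x]])
         for x in range(len(f))]

    # Step 2: standardize S(x) and compare against theta.
    mu_s = statistics.fmean(s)
    sigma_s = statistics.pstdev(s)  # assumption: population standard deviation
    if sigma_s == 0:
        return [], s  # all S(x) identical, so nothing can be flagged
    outliers = [x for x in range(len(f))
                if abs((s[x] - mu_s) / sigma_s) > theta]
    return outliers, s

# Hypothetical toy data: a chain of 7 nodes; each node's neighbors are the
# adjacent nodes in the chain. Node 3 carries an unusually high value.
f = [10.0, 11.0, 10.0, 50.0, 11.0, 10.0, 11.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]

outliers, s = spatial_outliers(f, neighbors, theta=2.0)
print("S(x):", s)
print("outliers:", outliers)
```

In this toy example, nodes 2 and 4 also get large negative S(x) values because their neighborhood means are pulled up by node 3, but only node 3 crosses the assumed threshold of 2.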

Results

1. CCAM achieves higher clustering efficiency (CE)
2. CCAM has lower I/O cost
3. Higher CE leads to lower I/O cost
4. Page size improves CE for all methods

[Figure: I/O cost and CE value for Z-order, CCAM and Cell-Tree]