Download presentation
Presentation is loading. Please wait.
Published byBenjamin McCarthy Modified over 9 years ago
1
Anomaly Detection for Scientific Data Mark Schwabacher NASA ARC, Code TI (formerly IC, TC) ROSES Code S & T Workshop February 17, 2005
2
What is Anomaly Detection? Seek to find parts of the data (“anomalies”) that are different from the rest of the data “Supervised” approaches use examples of anomalies; “unsupervised” approaches do not.
3
How can Anomaly Detection be Applied to Scientific Data? Examples: –Data from Earth-observing satellites –Data from telescopes Direct scientists’ attention to anomalies – could lead to scientific discoveries Detect errors, so they can be corrected
4
Example Earth Science Application: Vegetation Data Joint work with Ranga Myneni of Boston University Used Leaf Area Index (LAI) & Fraction Absorbed of Photosynthetically Available Radiation (FPAR) from Moderate Resolution Imaging Spectroradiometer (MODIS) instrument aboard the Terra and Aqua satellites
5
Results Used MODIS data from one time point at 4 km resolution (7.7 million pixels within Earth’s land area) Used 4 variables: LAI, FPAR, QA, and latitude Used an unsupervised, distance-based anomaly detection algorithm The #1 outlier was in northern Russia and the #2 outlier was in southern New Zealand Both points had unusually high LAI and FPAR values for their latitudes Investigation revealed a bug in the software that produced the LAI and FPAR products Error was corrected, and new versions of the data were made available to the scientific community.
6
Algorithm Used: Orca (Distance-Based Outliers) The main idea is to find points in low density regions of the feature space x d V is the total volume within radius d N is the total number of examples k is the number of examples in sphere Joint work with Stephen Bay of ISLE
7
Orca Algorithm Based on nested loops –For each example, find it’s nearest neighbors with a sequential scan Modified with a pruning rule –While performing the sequential scan, Keep track of closest neighbors found so far prune examples once the neighbors found so far indicate that the example cannot be a top outlier Worst case O(N 2 ) distance computations In practice, runs in nearly linear time Can handle millions of data points
8
Conclusions Anomaly detection algorithms can find previously- unknown anomalies in large scientific data sets Could lead to scientific discoveries or correction of errors Different algorithms find qualitatively different anomalies, so it is worth running multiple algorithms I presented one algorithm (Orca) that runs in nearly linear time so it can be applied to very large data sets
9
Pruning Outliers based on distance to the 3rd nearest neighbor (k=3) x d sequential scan d is distance to 3 rd nearest neighbor for the weakest top outlier
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.