Spatio-Temporal Outlier Detection in Precipitation Data


Spatio-Temporal Outlier Detection in Precipitation Data
Elizabeth Wu, Wei Liu, Sanjay Chawla
The University of Sydney, Australia
SensorKDD 2008, Sunday, 24th August, 2008

Outline
What is a spatio-temporal outlier?
Motivation
Previous Work
Contributions
Our Approach
Future Work

Aims
Find moving spatial outlier paths in South American precipitation data.
Show how the paths can be compared to weather phenomena, such as the El Niño Southern Oscillation (ENSO).

What is a Spatio-Temporal Outlier?
“A spatio-temporal object whose thematic attribute values are significantly different from those of other spatially and temporally referenced objects in its spatial and/or temporal neighborhoods.” – Cheng and Li (2006)
[Figure: a 5 x 5 spatial grid shown at five successive time periods, t=1 to t=5.]

What is a spatio-temporal object?
“A time-evolving spatial object whose evolution or ‘history’ is represented by a set of instances (o_id, s_i, t_i) where the spacestamp s_i is the location of object o_id at timestamp t_i.” - Theodoridis et al. (1999)
Simply put:
  A point becomes a line.
  A 2D region becomes a 3D region.
[Figure: two plots with axes x co-ordinate, y co-ordinate and time.]

Data
South American precipitation data (NOAA)
  10 years (1995-2004)
  2.5° x 2.5° grid cells
  31 latitude x 23 longitude divisions
  713 grid cells in total
  2,609,580 possible data values
  Missing data, both spatially and temporally
El Niño Southern Oscillation data (NOAA)
  Southern Oscillation Index (SOI)
  Measures the difference in sea-level air pressure between Tahiti and Darwin
  The lower the score, the more intense an El Niño event
Figure: Stations used to produce gridded precipitation fields

Motivation
Why would we be interested in moving outlier regions in precipitation data?
Knowing the location, time and duration of past extreme precipitation events helps us to understand and prepare for future events.
We can analyse how different phenomena interact, e.g. ENSO and precipitation.

Previous Work
Spatial Scan Statistics
  Used to find spatial outliers
  Cluster detection using the spatial scan statistic in spatio-temporal point data (Iyengar, 2004)
Exact-Grid and Approx-Grid (Agarwal et al., 2006)
  Use the Kulldorff Spatial Scan Statistic
  Find the highest discrepancy region (by location and size) in a spatial grid dataset
Spatio-temporal outlier detection (Birant and Kut, 2006)
Limited to finding outliers over a single time period.
[Figure: plot with axes x co-ordinate, y co-ordinate and time.]

Contributions
Extended Exact-Grid and Approx-Grid to find the top-k outliers in a single time period.
Developed the Outstretch and RecurseNodes algorithms to find outliers that repeatedly appear over several time periods.
Applied them to South American precipitation data.
Analysed the behaviour of the outliers against the El Niño Southern Oscillation (ENSO).

Our Approach
Find the top-k outliers in a spatial grid for each time period.
  Extend the Exact-Grid and Approx-Grid algorithms.
Use Outstretch to find spatial outliers which extend over several time periods.
Use RecurseNodes to extract the sequences from the Outstretch tree.

Finding the top-k outliers
Find every possible region size and shape in the grid.
Get each region’s discrepancy value to determine which is a more significant outlier.
Our extension keeps track of the top-k regions rather than just the top-1.
[Figure: a grid region bounded by four lines labelled left, right, top and bottom.]

Kulldorff Scan Statistic
Uses two values:
Measurement: the number of incidences of an event. E.g. in how many cells is precipitation extreme?
  $M$ for the whole dataset
  $m(p)$ for the cell $p$
  $m_R = \sum_{p \in R} m(p) / M$
Baseline: the total population at risk. I.e. for how many cells have we recorded values?
  $B$ for the whole dataset
  $b(p)$ for the cell $p$
  $b_R = \sum_{p \in R} b(p) / B$
We find the discrepancy for a local region $R$ by substitution into:
  $d(m_R, b_R) = m_R \log\frac{m_R}{b_R} + (1 - m_R) \log\frac{1 - m_R}{1 - b_R}$ when $m_R > b_R$
  $d(m_R, b_R) = 0$ otherwise

Kulldorff Scan Statistic: Example
$M = 6$ = total number of cells with “1” in the entire grid
$\sum_{p \in R} m(p) = 4$ = total number of cells with “1” in $R$
$m_R = \sum_{p \in R} m(p) / M = 0.67$
$B = 16$ = total number of cells in the entire grid
$\sum_{p \in R} b(p) = 4$ = sum of the b’s in the region = total number of cells in $R$
$b_R = \sum_{p \in R} b(p) / B = 0.25$
Result: $d(m_R, b_R) = 0.3836$
[Figure: a 4 x 4 grid of 0/1 cells with the region R highlighted.]
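The example above can be checked numerically. The sketch below is a minimal illustration, not the authors' code: the function name `discrepancy`, the 4 x 4 grid layout and the position of R are assumptions chosen only so that M = 6, B = 16 and R holds four cells that are all "1". The natural logarithm is assumed, since it reproduces the slide's result.

```python
import math

def discrepancy(m_r: float, b_r: float) -> float:
    """Kulldorff discrepancy d(m_R, b_R) as defined on the slide (natural log)."""
    if m_r <= b_r:
        return 0.0
    first = m_r * math.log(m_r / b_r)
    # When m_R == 1 the second term is 0 * log(0); it is taken to be 0 here.
    second = 0.0 if m_r == 1.0 else (1 - m_r) * math.log((1 - m_r) / (1 - b_r))
    return first + second

# Hypothetical 4 x 4 grid: six cells are "1", four of them inside a 2 x 2 region R.
grid = [
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
region_cells = [(1, 1), (1, 2), (2, 1), (2, 2)]  # (row, column) pairs in R

M = sum(map(sum, grid))           # 6 cells with "1" in the whole grid
B = len(grid) * len(grid[0])      # 16 cells in the whole grid
m_r = sum(grid[r][c] for r, c in region_cells) / M   # 4 / 6
b_r = len(region_cells) / B                          # 4 / 16
d = discrepancy(m_r, b_r)
print(round(d, 4))  # 0.3836
```

The printed value agrees with the result quoted on the slide.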

Finding the top-k outliers: Exact-Grid
[Animation: four sweep lines labelled left, right, top and bottom step through the grid.]
For each position of the left and right lines, the top and bottom lines keep moving until all regions between the left and right lines have been examined.
The left and right lines then move and the same happens again: the top and bottom lines define all possible areas between the left and right lines.
Continue until all regions have been examined.
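The four-line sweep can be sketched as four nested loops that visit every axis-aligned rectangle and keep the k highest discrepancies in a min-heap. This is an illustrative sketch under assumptions: the names `exact_grid_top_k`, `grid_m` and `grid_b` are hypothetical, 2D prefix sums are used as a convenience for constant-time region sums, and overlap between the returned regions is not handled.

```python
import heapq
import math

def discrepancy(m_r, b_r):
    """Kulldorff discrepancy; returns 0 when undefined or when m_R <= b_R."""
    if b_r <= 0 or m_r <= b_r:
        return 0.0
    second = 0.0 if m_r >= 1.0 else (1 - m_r) * math.log((1 - m_r) / (1 - b_r))
    return m_r * math.log(m_r / b_r) + second

def prefix_sums(g):
    """2D prefix sums so that any rectangle sum costs O(1)."""
    rows, cols = len(g), len(g[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = g[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p

def rect_sum(p, top, left, bottom, right):
    """Sum over the rectangle with inclusive cell bounds."""
    return p[bottom + 1][right + 1] - p[top][right + 1] - p[bottom + 1][left] + p[top][left]

def exact_grid_top_k(m, b, k):
    """Return the k highest-discrepancy rectangles as (d, (top, left, bottom, right))."""
    rows, cols = len(m), len(m[0])
    M, B = sum(map(sum, m)), sum(map(sum, b))
    if M == 0 or B == 0:
        return []
    pm, pb = prefix_sums(m), prefix_sums(b)
    heap = []  # min-heap holding the best k candidates seen so far
    for left in range(cols):
        for right in range(left, cols):
            for top in range(rows):
                for bottom in range(top, rows):
                    m_r = rect_sum(pm, top, left, bottom, right) / M
                    b_r = rect_sum(pb, top, left, bottom, right) / B
                    item = (discrepancy(m_r, b_r), (top, left, bottom, right))
                    if len(heap) < k:
                        heapq.heappush(heap, item)
                    elif item > heap[0]:
                        heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

# Hypothetical 4 x 4 input: grid_m flags extreme cells, grid_b marks recorded cells.
grid_m = [[0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0]]
grid_b = [[1] * 4 for _ in range(4)]
top3 = exact_grid_top_k(grid_m, grid_b, 3)
print(top3[0])
```

On this small grid the highest-ranked region is the 2 x 2 block of ones in the centre. Without an overlap rule, the remaining entries tend to be regions that overlap the best one.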

Finding the top-k outliers: Approx-Grid
Reduces the time complexity of the algorithm by using only two sweep lines (top and bottom) and finding the interval that maximises the discrepancy function (see Agarwal et al., 2006).
m(i, j) stores the sum of the m(p)’s for each column.
For each move of a sweep line, run the Linear1D algorithm to find the interval that maximises the discrepancy function.
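The two-line sweep can be illustrated as follows. This is only a sketch under stated assumptions, not the Linear1D routine from Agarwal et al. (2006): a Kadane-style maximum-interval scan stands in for Linear1D, and it maximises a single linear score `alpha * m_R + beta * b_R` rather than the discrepancy itself. The names `best_interval`, `sweep_two_lines`, `alpha` and `beta` are hypothetical.

```python
def best_interval(weights):
    """Kadane scan: return (best_sum, start, end) over non-empty intervals."""
    best = (weights[0], 0, 0)
    cur_sum, cur_start = weights[0], 0
    for j in range(1, len(weights)):
        if cur_sum < 0:
            # A negative running sum can only hurt; restart the interval here.
            cur_sum, cur_start = weights[j], j
        else:
            cur_sum += weights[j]
        if cur_sum > best[0]:
            best = (cur_sum, cur_start, j)
    return best

def sweep_two_lines(m, b, alpha, beta):
    """Maximise alpha*m_R + beta*b_R over all rectangles with two sweep lines."""
    rows, cols = len(m), len(m[0])
    M, B = sum(map(sum, m)), sum(map(sum, b))
    best = None
    for top in range(rows):
        col_m = [0] * cols  # per-column sums of m(p) between the two lines
        col_b = [0] * cols  # per-column sums of b(p) between the two lines
        for bottom in range(top, rows):
            for j in range(cols):
                col_m[j] += m[bottom][j]
                col_b[j] += b[bottom][j]
            weights = [alpha * col_m[j] / M + beta * col_b[j] / B for j in range(cols)]
            score, left, right = best_interval(weights)
            if best is None or score > best[0]:
                best = (score, (top, left, bottom, right))
    return best

# Hypothetical 4 x 4 input, scored with m_R - b_R (alpha = 1, beta = -1).
grid_m = [[0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0]]
grid_b = [[1] * 4 for _ in range(4)]
score, region = sweep_two_lines(grid_m, grid_b, 1.0, -1.0)
print(region)
```

Each pair of top and bottom positions costs one linear pass over the columns, which is where the saving over enumerating all four lines comes from.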

Finding the top-k outliers: Considerations Overlapping Regions

Finding the top-k outliers: Considerations Overlapping Regions – Overlap types

Finding the top-k outliers: Considerations
Chain effect
One option: union solution
[Figure: overlapping regions with d=0.45, d=0.54 and d=0.51.]

Finding the top-k outliers: Considerations
Chosen option: allow a percentage of overlap.
If the overlap between two regions is less than allowable_overlap %, keep both regions in the top-k list.
[Figure: two overlapping regions with d=0.45 and d=0.51.]
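One way to apply the allowable-overlap rule is sketched below. The slides do not say how the overlap percentage is measured, so this sketch assumes it is the shared area divided by the area of the smaller region, with `allowable_overlap` given as a fraction. The greedy highest-discrepancy-first order and the names `overlap_fraction` and `filter_top_k` are also assumptions.

```python
def area(region):
    """Number of cells in a (top, left, bottom, right) region, bounds inclusive."""
    top, left, bottom, right = region
    return (bottom - top + 1) * (right - left + 1)

def overlap_fraction(a, b):
    """Shared cells divided by the size of the smaller region (assumed definition)."""
    rows = min(a[2], b[2]) - max(a[0], b[0]) + 1
    cols = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if rows <= 0 or cols <= 0:
        return 0.0
    return rows * cols / min(area(a), area(b))

def filter_top_k(candidates, k, allowable_overlap):
    """Greedily keep up to k regions whose overlap with every kept region is small.

    candidates is a list of (discrepancy, region) pairs.
    """
    kept = []
    for d, region in sorted(candidates, reverse=True):
        if all(overlap_fraction(region, other) < allowable_overlap for _, other in kept):
            kept.append((d, region))
            if len(kept) == k:
                break
    return kept

# Hypothetical candidates using the discrepancy values shown on the slides.
candidates = [
    (0.45, (0, 0, 1, 2)),  # fully covers the d=0.54 region
    (0.54, (0, 0, 1, 1)),
    (0.51, (1, 1, 2, 2)),  # shares one cell with the d=0.54 region
]
kept = filter_top_k(candidates, k=3, allowable_overlap=0.5)
print(kept)
```

With a 50% threshold, the d=0.51 region is kept alongside d=0.54 because they share only a quarter of the smaller region, while the d=0.45 region is dropped.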

Outstretch
Outstretch finds the paths of the outliers over time.
[Figure: a 5 x 5 spatial grid shown at five successive time periods, t=1 to t=5.]

Outstretch
Use Outstretch to find spatial outliers which extend over several time periods.
Check the same region (slightly stretched to cover more area) in the next time period, to see if another outlier lies in the region.
If one does, it is considered to be part of the spatio-temporal outlier, which is now extended over an additional time period.
Store the result in a tree data structure.
[Figure: this region (dark green) has been stretched by r=2 grid cells. In the next time period, we will check if any outliers fall in that area.]

Outstretch
Store outliers found over subsequent time periods in a tree data structure.
Node   Num Children   Children
1,1    1              {2,2}
1,2    3              {2,2}, {2,3}, {2,4}
1,3                   {2,1}
1,4                   {2,4}
2,1                   {3,2}
2,2                   {3,1}, {3,4}
2,3    2              {3,3}
2,4                   -
3,1
3,2
3,3
3,4
[Figure: the tree drawn with nodes 1,1 to 1,4 in the first level, 2,1 to 2,4 in the second and 3,1 to 3,4 in the third.]

Outstretch
Stretch the top-k outliers from t=1 by r (their spatial neighbourhood).
[Figure: the four outliers of t=1 on the grid, with tree nodes 1,1 to 1,4.]

Outstretch
From the top-k in t=2, find those which fall inside the stretched region from the previous period, t=1.
[Figure: the outliers of t=1 and t=2 on the grid, with tree nodes 1,1 to 1,4 and 2,1 to 2,4.]

Outstretch
Stretch the new outliers from t=2 and find the outliers from t=3 that fall in the newly stretched regions.
[Figure: the outliers of t=1, t=2 and t=3 on the grid, with tree nodes 1,1 to 1,4, 2,1 to 2,4 and 3,1 to 3,4.]
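The stretch-and-match step described on the Outstretch slides can be sketched as below. This is a simplified illustration under assumptions: an outlier "falls inside" a stretched region only if it is fully contained in it, every top-k outlier of a period is stretched, and the tree is stored as a dictionary from a (period, rank) node to its list of children. The names `stretch`, `contains` and `outstretch` are hypothetical.

```python
def stretch(region, r):
    """Grow a (top, left, bottom, right) region by r cells on every side."""
    top, left, bottom, right = region
    return (top - r, left - r, bottom + r, right + r)

def contains(outer, inner):
    """True if inner lies entirely within outer (assumed meaning of 'falls inside')."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def outstretch(top_k_per_period, r):
    """Link each outlier to the next-period outliers inside its stretched region.

    top_k_per_period[t - 1] lists the top-k regions of period t.
    Returns {(period, rank): [(period + 1, rank), ...]} with 1-based indices.
    """
    tree = {}
    for t, regions in enumerate(top_k_per_period, start=1):
        for i in range(1, len(regions) + 1):
            tree[(t, i)] = []
    for t in range(1, len(top_k_per_period)):
        for i, region in enumerate(top_k_per_period[t - 1], start=1):
            stretched = stretch(region, r)
            for j, nxt in enumerate(top_k_per_period[t], start=1):
                if contains(stretched, nxt):
                    tree[(t, i)].append((t + 1, j))
    return tree

# Hypothetical outlier regions for three time periods.
periods = [
    [(2, 2, 3, 3)],
    [(1, 1, 2, 2), (8, 8, 9, 9)],
    [(0, 0, 1, 1)],
]
tree = outstretch(periods, r=1)
print(tree)
```

Here the period-1 outlier links to the first period-2 outlier, which in turn links to the period-3 outlier. The distant region (8, 8, 9, 9) stays unlinked.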

RecurseNodes
Now that we’ve stored all the sequences in the tree, how do we get them out?
Use RecurseNodes to extract the sequences from the Outstretch tree.
Node   Num Children   Children
1,1    1              {2,2}
1,2    3              {2,2}, {2,3}, {2,4}
1,3                   {2,1}
1,4                   {2,4}
2,1                   {3,2}
2,2                   {3,1}, {3,4}
2,3    2              {3,3}
2,4                   -
3,1
3,2
3,3
3,4

RecurseNodes
Adds each full sequence to the sequence_list. Does not add subsequences.
Looks at every item in the list.
If the item has children, append it to the current sequence. Its children, and then any grandchildren, are examined in turn.
If it has no children, the sequence is complete, so we move on to the next child.
Keeps track of which children have already been seen.

RecurseNodes
Start at {1,1}. We notice it has a child, {2,2}.
Check {2,2}. We notice {2,2} has two children, {3,1} and {3,4}.
Check {3,1} first. {3,1} has no children. Stop and store the sequence: [ {1,1}, {2,2}, {3,1} ]
Now check {3,4}. {3,4} has no children. Stop and store the sequence: [ {1,1}, {2,2}, {3,4} ]
And so on…
Node   Num Children   Children
1,1    1              {2,2}
1,2    3              {2,2}, {2,3}, {2,4}
1,3                   {2,1}
1,4                   {2,4}
2,1                   {3,2}
2,2                   {3,1}, {3,4}
2,3    2              {3,3}
2,4                   -
3,1
3,2
3,3
3,4
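The walk-through above amounts to a depth-first traversal that records every root-to-leaf path. The sketch below is an assumption-laden illustration rather than the authors' RecurseNodes: it uses the Children column of the slide's table as a dictionary, starts only from nodes that are nobody's child so that no subsequences are emitted, and the function names are hypothetical.

```python
# Children lists copied from the Children column of the slide's table.
tree = {
    (1, 1): [(2, 2)],
    (1, 2): [(2, 2), (2, 3), (2, 4)],
    (1, 3): [(2, 1)],
    (1, 4): [(2, 4)],
    (2, 1): [(3, 2)],
    (2, 2): [(3, 1), (3, 4)],
    (2, 3): [(3, 3)],
    (2, 4): [],
    (3, 1): [], (3, 2): [], (3, 3): [], (3, 4): [],
}

def recurse_nodes(tree, node, prefix, sequence_list):
    """Depth-first walk; store the path whenever a node has no children."""
    path = prefix + [node]
    children = tree.get(node, [])
    if not children:
        sequence_list.append(path)  # the sequence is complete
        return
    for child in children:
        recurse_nodes(tree, child, path, sequence_list)

def extract_sequences(tree):
    """Collect all full sequences, starting only from nodes without a parent."""
    has_parent = {child for kids in tree.values() for child in kids}
    roots = sorted(node for node in tree if node not in has_parent)
    sequence_list = []
    for root in roots:
        recurse_nodes(tree, root, [], sequence_list)
    return sequence_list

sequences = extract_sequences(tree)
for seq in sequences:
    print(seq)
```

The first two sequences printed are [(1, 1), (2, 2), (3, 1)] and [(1, 1), (2, 2), (3, 4)], matching the walk-through. Starting only at parentless nodes is one simple way to avoid emitting subsequences such as [(2, 2), (3, 1)].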

Results: Exact vs. Approx-Grid Top-k
Length and number of outliers found:
  Exact-Grid Top-k finds longer sequences than Approx-Grid Top-k.
  Approx-Grid Top-k is faster than Exact-Grid Top-k.
Outlier discovery, time taken:
  Algorithm           Complexity    Time
  Exact-Grid Top-k    $O(n^4 k)$    229 s
  Approx-Grid Top-k   $O(n^3 k)$    35 s

Results: Mean discrepancy of Exact-Grid Top-k sequences and the mean SOI
Some of the discrepancies at the centre time period are higher during the more intense El Niño event.
This shows that precipitation extremes are more extreme during an El Niño event.

Results: Mean discrepancy of Approx-Grid Top-k sequences and the mean SOI
We also find more extreme extremes in the Approx-Grid Top-k sequences.

Future Work
Evaluate against:
  Other metrics (besides SOI), such as Sea Surface Temperature (SST)
  Point data
  Other data, e.g. other precipitation data

Conclusion
Our contributions:
  Top-k extension to the Exact-Grid and Approx-Grid algorithms
  Outlier sequence discovery over time
  Evaluation using precipitation data
  Comparison of the results to the Southern Oscillation Index (SOI)
Results showed:
  More extreme extreme values during El Niño periods
  We were able to find these with both the Exact-Grid and Approx-Grid algorithms

Questions Please ask