Anomaly Detection in Problematic GPS Time Series Data and Modeling
Dafna Avraham, Yehuda Bock
Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA

Introduction

Anomalous event detection in Global Positioning System (GPS) time series is an important problem in geodetic research. The Scripps Orbit and Permanent Array Center (SOPAC) generates continuous and daily time series in three dimensions for over 1400 global GPS stations. These series are analyzed with a computerized modeling program that is limited to fitting slopes (velocities), offsets, periodic (annual and semiannual) terms, and postseismic decays. Currently, anomalous events are not adequately recognized or accounted for. We have developed anomaly detection algorithms that detect signals, outliers, trends in the data, and modeling problems. The algorithms use modified versions of noise analysis, correlation statistics, and thresholding. They run on the complete set of global GPS time series and uncover a majority of the previously undetected anomalies. We spatially cluster the anomalies by type to reveal the geophysical factors that contribute to their occurrence.

We are developing a new interactive environment that will let users analyze temporal and spatial subsets of GPS time series on the fly and detect anomalous events with these new methods. We are incorporating it into the GPS Explorer data portal, a joint project of SOPAC and JPL that provides user-friendly GPS data products and on-line modeling applications.

Anomalies in GPS Time Series

Modeling Problems: These are seen when the model does not represent the data well.
Often, this happens because the model lacks one or more important terms, or because gaps and jumps in the data mislead the model.

Signals: Many signals, such as postseismic decays, anthropogenic effects, and volcanic signals, are recognized as data that deviate from the model in particular patterns.

Outliers: Outliers arise from many different sources. Extreme outliers can distort the model, so it is very important to detect them and remove them from the data.

Trend: Because of geophysical forces, GPS time series inherently contain a linear velocity (trend), so the series are detrended before further analysis. Nevertheless, some series still contain a significant trend, especially when two or more trends are estimated. A trend remaining in a detrended series signals the need for further modeling of the data, so trend detection is critical.

Spatial Clustering of Anomalies

Figure (panels: Signals and Modeling Problems; Trend; Outliers): Spatial diagrams showing, in orange, the anomalous GPS time series that our algorithms detected in the Western United States. It is important to consider the spatial component of problematic sites because spatial clusters (the dense orange areas) often indicate underlying geophysical signals that may have gone unnoticed or were not accounted for in the model. These diagrams were created with GPS Explorer, an on-line data and modeling application developed by SOPAC and JPL.

Anomaly Detection Algorithms for GPS Time Series

Figure (three panels): Example daily GPS time series, shown as three plots per site for the north, east, and up components (top to bottom). Left (Signal and Modeling Problem Detection Algorithm): signals and/or modeling problems. Middle (Outlier Detection Algorithm): outliers at the beginning of a series. Right (Trend Detection Algorithm): trends in detrended data.

Signal and Modeling Problem Detection Algorithm

Problem: A model that does not consistently fit the data constitutes a modeling problem.
Similarly, data that deviate from the model in a particular pattern represent a signal.
Method: Search each GPS site for eight-month windows during which the residual series does not change sign and therefore does not resemble white noise. Such a window indicates that important model terms are missing.

Outlier Detection Algorithm

Problem: Outliers skew the data and, in turn, can bias the model, so extreme outliers must be removed.
Method: Set a threshold for each residual series equal to 5 times its interquartile range (IQR). The IQR is a robust estimator of the spread of the series because it is more resistant to outliers than the standard deviation. Residuals that cross this threshold are flagged as outliers.

Trend Detection Algorithm

Problem: A detrended series that still exhibits a significant trend indicates that the data contain information the model does not account for.
Method: The correlation coefficient $\rho$ measures the strength of the linear association between time ($X$) and distance ($Y$) in the GPS data. Since $-1 \le \rho \le 1$, where 0 indicates no linear association and values close to 1 or -1 indicate a strong linear association, we treat $|\rho| > 0.7$ (that is, $\rho > 0.7$ or $\rho < -0.7$) as signifying a trend.

Geophysical Anomalies and Conclusions

Figure (map labels: Santa Ana basin, San Gabriel basin, Los Angeles basin).
Anthropogenic Effects: The algorithms detect anomalous sites (in orange) in the Los Angeles, Santa Ana, and San Gabriel basins, which are regions where anthropogenic effects occur.
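The sign-run test under Signal and Modeling Problem Detection Algorithm (an eight-month window in which the residuals never change sign) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. The function name `same_sign_windows` is hypothetical, and the sketch assumes that `t` holds sorted epochs in decimal years (so eight months is 8/12 yr) and that the arrays contain only observed epochs for one component, with no NaN placeholders. Because a run's length is measured in elapsed time, a missing day does not shorten it.

```python
import numpy as np


def same_sign_windows(t, resid, min_span=8.0 / 12.0):
    """Return (start, end) epochs of every run of consecutive residuals that
    keeps a single sign for at least `min_span` (same units as `t`).

    Assumes `t` is sorted and in decimal years, so the default of 8/12
    corresponds to an eight-month window. Arrays must not contain NaNs.
    """
    t = np.asarray(t, dtype=float)
    signs = np.sign(np.asarray(resid, dtype=float))
    windows = []
    start = 0
    for i in range(1, signs.size + 1):
        # A run ends at the end of the series or where the sign flips.
        if i == signs.size or signs[i] != signs[start]:
            # Runs of exactly zero residuals (sign 0) are not reported.
            if signs[start] != 0 and t[i - 1] - t[start] >= min_span:
                windows.append((float(t[start]), float(t[i - 1])))
            start = i
    return windows


# Hypothetical example: one year of daily residuals that stay positive,
# followed by a year that alternates sign like idealized white noise.
t = np.arange(0.0, 2.0, 1.0 / 365.25)
noise_like = np.where(np.arange(t.size) % 2 == 0, 1.0, -1.0)
resid = np.where(t < 1.0, 2.0, noise_like)
print(same_sign_windows(t, resid))
```

Here the positive first year is reported as one window, while the alternating second year produces none. A run of one sign is a simple, noise-free proxy for "does not resemble white noise". A real series would need its own choice of how to treat exact zeros and data gaps.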
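The Outlier Detection Algorithm rule (flag residuals beyond 5 times the IQR) is short enough to show directly. This is a minimal NumPy sketch, not the authors' code. The function name `iqr_outliers` is hypothetical. The sketch assumes the residuals are already roughly zero-centered by the model fit, so the threshold is applied to their absolute value. Gaps stored as NaN are ignored when computing the quartiles and are never flagged.

```python
import numpy as np


def iqr_outliers(resid, k=5.0):
    """Boolean mask of residuals whose magnitude exceeds k times the IQR.

    Assumes the residuals are roughly zero-centered (model already removed).
    NaN entries (data gaps) are ignored in the quartiles and never flagged.
    """
    r = np.asarray(resid, dtype=float)
    q1, q3 = np.nanpercentile(r, [25, 75])
    threshold = k * (q3 - q1)
    with np.errstate(invalid="ignore"):  # NaN comparisons simply yield False
        return np.abs(r) > threshold


# Hypothetical example: Gaussian residuals (mm) with one injected spike.
rng = np.random.default_rng(0)
resid = rng.normal(0.0, 1.0, 1000)
resid[10] = 50.0
mask = iqr_outliers(resid)
print(np.flatnonzero(mask))
```

For Gaussian noise the IQR is about 1.35 standard deviations, so a 5 × IQR cutoff sits near 6.7 standard deviations. Only extreme outliers are flagged, which fits the goal of removing the outliers that can distort the model.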
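The Trend Detection Algorithm compares the correlation coefficient $\rho$ between time and the detrended position with the ±0.7 cutoff. The sketch below is a hedged illustration, not the authors' code. It uses a hypothetical function `residual_trend`, computes the Pearson coefficient with `np.corrcoef` after dropping NaN gaps, and assumes time in decimal years. One caveat applies to any such test. If a series is detrended by fitting a single least-squares line (with intercept) over the same span, the residuals are uncorrelated with time by construction, so $\rho$ is zero. The test is therefore informative when the detrending model differs from one line over the tested span, for example when two or more trends are estimated.

```python
import numpy as np


def residual_trend(t, resid, cutoff=0.7):
    """Return (flag, rho): the Pearson correlation between time and a
    detrended series, and whether |rho| exceeds the cutoff."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(resid, dtype=float)
    ok = np.isfinite(t) & np.isfinite(y)  # drop gaps stored as NaN
    t, y = t[ok], y[ok]
    # Correlation is undefined for fewer than 2 points or a constant series.
    if t.size < 2 or np.ptp(t) == 0 or np.ptp(y) == 0:
        return False, 0.0
    rho = float(np.corrcoef(t, y)[0, 1])
    return abs(rho) > cutoff, rho


# Hypothetical example: a leftover drift versus sign-alternating noise.
t = np.linspace(0.0, 5.0, 500)  # decimal years
drift = 2.0 * t  # mm, unmodeled linear motion
noise_like = np.where(np.arange(t.size) % 2 == 0, 1.0, -1.0)
print(residual_trend(t, drift))
print(residual_trend(t, noise_like))
```

The leftover drift gives $\rho$ essentially equal to 1 and is flagged. The alternating series gives $\rho$ near zero and is not.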
Figure (map labels: Long Valley Caldera, Mount St. Helens, Yellowstone, Parkfield Earthquake, San Simeon Earthquake, Hector Mine & Landers Earthquakes).
Volcanic Signals: Volcanoes affect ground motion in patterns that the anomaly detection algorithms consistently recognize. This is visible on the map as concentrations of detected sites in volcanic regions (Mount St. Helens, Long Valley Caldera, and Yellowstone).
Postseismic Deformation: The algorithms effectively detect postseismic deformation, the characteristic anomalous signature of medium to large earthquakes (1992 Mw 7.3 Landers, 1999 Mw 7.1 Hector Mine, 2003 Mw 6.5 San Simeon, and 2004 Mw 6.0 Parkfield). The epicenter of each earthquake is circled in red on the map.

The algorithms we developed successfully detect many GPS time series that exhibit geophysical anomalies, which often take the form of anthropogenic effects (such as groundwater removal and oil extraction), volcanic signals, or postseismic deformation.

Acknowledgments. Dafna Avraham is a 2009 SCEC intern under the ACCESS-U project. Support is also provided by the NASA MEaSUREs project "Solid Earth Science ESDR System" with JPL. Help in this research was provided by Brendan Crowell, Peng Fang, Paul Jamason, and Mindy Squibb at SOPAC.