Gap-filling and Fault-detection for the life under your feet dataset.

Slides:



Advertisements
Similar presentations
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Advertisements

Life Under Your Feet Johns Hopkins University Computer Science Earth and Planetary Sciences Physics and Astronomy
Use of Kalman filters in time and frequency analysis John Davis 1st May 2011.
Dimension reduction (1)
Streaming Pattern Discovery in Multiple Time-Series Spiros Papadimitriou Jimeng Sun Christos Faloutsos Carnegie Mellon University VLDB 2005, Trondheim,
Jayant Gupchup Graduate student, Johns Hopkins University Representative Slides.
1 Multivariate Statistics ESM 206, 5/17/05. 2 WHAT IS MULTIVARIATE STATISTICS? A collection of techniques to help us understand patterns in and make predictions.
An introduction to Principal Component Analysis (PCA)
Principal Component Analysis
Principal Components. Karl Pearson Principal Components (PC) Objective: Given a data matrix of dimensions nxp (p variables and n elements) try to represent.
Distributed Regression: an Efficient Framework for Modeling Sensor Network Data Carlos Guestrin Peter Bodik Romain Thibaux Mark Paskin Samuel Madden.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
Laurent Itti: CS599 – Computational Architectures in Biological Vision, USC Lecture 7: Coding and Representation 1 Computational Architectures in.
Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana.
Principal Component Analysis Principles and Application.
Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.
Tomo-gravity Yin ZhangMatthew Roughan Nick DuffieldAlbert Greenberg “A Northern NJ Research Lab” ACM.
Empirical Modeling Dongsup Kim Department of Biosystems, KAIST Fall, 2004.
Gwangju Institute of Science and Technology Intelligent Design and Graphics Laboratory Multi-scale tensor voting for feature extraction from unstructured.
Extensions of PCA and Related Tools
Multivariate Approaches to Analyze fMRI Data Yuanxin Hu.
Feature extraction 1.Introduction 2.T-test 3.Signal Noise Ratio (SNR) 4.Linear Correlation Coefficient (LCC) 5.Principle component analysis (PCA) 6.Linear.
CSE554AlignmentSlide 1 CSE 554 Lecture 5: Alignment Fall 2011.
Life Under Your Feet. Sensor Network Design Philosophies Use low cost components No access to line power –Deployed in remote locations Radio is the.
Towards a Model-Based Data Collection Framework for Environmental Monitoring Networks Research Proposal Jayant Gupchup Department of Computer Science,
Review of Statistics and Linear Algebra Mean: Variance:
Recent advances for the inversion of the particulate backscattering coefficient at different wavelengths H. Loisel, C. Jamet, and D. Dessailly.
Outline What Neural Networks are and why they are desirable Historical background Applications Strengths neural networks and advantages Status N.N and.
Center for Satellite Applications and Research (STAR) Review 09 – 11 March 2010 MAP (Maximum A Posteriori) x is reduced state vector [SST(x), TCWV(w)]
IORAS activities for DRAKKAR in 2006 General topic: Development of long-term flux data set for interdecadal simulations with DRAKKAR models Task: Using.
Basics of Neural Networks Neural Network Topologies.
Digital Media Lab 1 Data Mining Applied To Fault Detection Shinho Jeong Jaewon Shim Hyunsoo Lee {cinooco, poohut,
Combining CMORPH with Gauge Analysis over
Descriptive Statistics vs. Factor Analysis Descriptive statistics will inform on the prevalence of a phenomenon, among a given population, captured by.
CSE 185 Introduction to Computer Vision Face Recognition.
Lei Li Computer Science Department Carnegie Mellon University Pre Proposal Time Series Learning completed work 11/27/2015.
AIRS Radiance and Geophysical Products: Methodology and Validation Mitch Goldberg, Larry McMillin NOAA/NESDIS Walter Wolf, Lihang Zhou, Yanni Qu and M.
Streaming Problems in Astrophysics
Analyzing wireless sensor network data under suppression and failure in transmission Alan E. Gelfand Institute of Statistics and Decision Sciences Duke.
CpSc 881: Machine Learning
PCA vs ICA vs LDA. How to represent images? Why representation methods are needed?? –Curse of dimensionality – width x height x channels –Noise reduction.
Streaming Pattern Discovery in Multiple Time-Series Jimeng Sun Spiros Papadimitrou Christos Faloutsos PARALLEL DATA LABORATORY Carnegie Mellon University.
Locally Optimized Precipitation Detection over Land Grant Petty Atmospheric and Oceanic Sciences University of Wisconsin - Madison.
Classification and Prediction: Ensemble Methods Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Model Based Event Detection in Sensor Networks Jayant Gupchup, Andreas Terzis, Randal Burns, Alex Szalay.
Feature Selection and Extraction Michael J. Watts
Locations. Soil Temperature Dataset Observations Data is – Correlated in time and space – Evolving over time (seasons) – Gappy (Due to failures) – Faulty.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 10: PRINCIPAL COMPONENTS ANALYSIS Objectives:
Multivariate statistical methods. Multivariate methods multivariate dataset – group of n objects, m variables (as a rule n>m, if possible). confirmation.
Martina Uray Heinz Mayer Joanneum Research Graz Institute of Digital Image Processing Horst Bischof Graz University of Technology Institute for Computer.
How Dirty is your Data : The Duality between detecting Events and Faults J. Gupchup A. Terzis R. Burns A. Szalay Department of Computer Science Johns Hopkins.
1 C.A.L. Bailer-Jones. Machine Learning. Data exploration and dimensionality reduction Machine learning, pattern recognition and statistical data modelling.
Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group.
Principal Component Analysis
Outlier Processing via L1-Principal Subspaces
Machine Learning Dimensionality Reduction
Application of Independent Component Analysis (ICA) to Beam Diagnosis
PCA vs ICA vs LDA.
Blind Signal Separation using Principal Components Analysis
Object Modeling with Layers
Outline S. C. Zhu, X. Liu, and Y. Wu, “Exploring Texture Ensembles by Efficient Markov Chain Monte Carlo”, IEEE Transactions On Pattern Analysis And Machine.
Principal Component Analysis
PCA based Noise Filter for High Spectral Resolution IR Observations
Descriptive Statistics vs. Factor Analysis
What is Regression Analysis?
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Introduction to Sensor Interpretation
Feature Selection Methods
Principal Component Analysis
Introduction to Sensor Interpretation
Presentation transcript:

Gap-filling and Fault-detection for the life under your feet dataset

Locations

Soil Temperature Dataset

Observations Data is – Correlated in time and space – Evolving over time (seasons) – Gappy (Due to failures) – Faulty (Noise, Jumps in values)

Examples of Faults

Motivation Environmental Scientists want gap-filled and fault-free data 8% of the data is missing Of the available data, 10% is faulty Scientific libraries such as covariance, SVD etc do not handle gaps and faults well

Basic Observations Hardware failures causes entire “days” of data to be missing – Temporal correlation for integrity is less effective If we look in the spatial domain, – For a given time, at least > 50% of the sensors are active – Use the spatial basis instead. – Initialize spatial basis using the period marked “good period” in previous slide

Basic Methods

The Johns Hopkins University Principal Component Analysis (PCA) PCA : -Finds axes of maximum variance -Reduces original dimensionality (In e.g. from 2 variables => 1 variable) First Principal Component Variable #1 Variable #2 X : Points original space O : Projection on PC1

Gap Filling (Connolly & Szalay) + Original signal representation as a linear combination of orthogonal vectors Optimization function - W λ = 0 if data is absent - Find coefficients, a i ’s, that minimize function. + Connolly et al., A Robust Classification of Galaxy Spectra: Dealing with Noisy and Incomplete Data,

Initialization Good Period

Naïve Gap-Filling

Observations & Incremental Robust PCA * Faulty Data – Undesirable – Pollutes the PCA model – Impacts the quality of gap-filling Basis is time-dependent – Correlations are changing over time Simultaneous fault-detection & gap-filling – Initialize basis using reliable data – Using basis at time t, Detect and remove faulty readings Weight new vectors depending on residuals Use weights to update bases Gap fill using the current basis Update thresholds for declaring faults – Iterate over data to improve estimates + Budavári, T., Wild, V., Szalay, A. S., Dobos, L., Yip, C.-W Monthly Notices of the Royal Astronomical Society, 394, 1496,

Iterative and Robust gap-filling

Example 1

Example 2

Error Distribution in Location

Error Distribution in Time

Impact of number of missing values Even when > 50% of the values are missing Reconstruction is fairly good

Error Distribution IDW (Location)

Error Distribution IDW (time)

Discussion Reception of this work by sensor network community Compare with other approaches – K-nn interpolation – Other model based methods such as BBQ Reconstruction depends – Number of missing values – Estimation of basis (seasonal effects) – Use the tradeoff between reconstruction accuracy and power savings

original

Iterative and Robust gap-filling

Smooth / Original

Original and Ratio

Soil Moisture and Ratio

Moisture vs Ratio (zoom in)

Statistics

Locations and Durations

Olin

Principal Components

original

Slope and Intercept

PC1 and PC2

Forest Grass Winter

Forest Grass Summer

Mean scatter good locations

Mean scatter 1 bad location

2 st component good scatter

2 nd component bad scatter

Extra

Soil Moisture Error (Location)

Soil Moisture in Time

Original Vs Fault Removed

Example of running algorithm

Daily Basis

Collected measurements

Temporal Gap Filling Data is correlated in the temporal domain Data shows – Diurnal patterns – Trends Trends are modeled using slope and intercept Correlations captured using functional PCA

Accuracy of gap-filling High Errors for some locations are actually due to faults Not necessarily due to the gap-filling method