How Dirty is your Data : The Duality between detecting Events and Faults J. Gupchup A. Terzis R. Burns A. Szalay Department of Computer Science Johns Hopkins.

Presentation transcript:

How Dirty Is Your Data: The Duality Between Detecting Events and Faults
J. Gupchup, A. Terzis, R. Burns, A. Szalay
Department of Computer Science, Johns Hopkins University

Outline
- Background
- Problem Statement
- Experiments
- Results
- Discussion

Application
- Monitoring nesting conditions of Maryland box turtles
- Science question: Do nesting conditions determine sex?
- It is important to correlate observations with environmental events (rain, snow, etc.)

Duality of Faults & Events
- Data gathered from sensor networks contain faults
- Delivering faulty data consumes resources and pollutes statistics
- Need for fault detection techniques
- Fault detection methods detect readings that deviate from "normal" or "expected" values
- Environmental events:
  - Are scientifically interesting
  - Deviate from the norm

Research Question(s)
- Are "events" misclassified as "faults"?
- What metrics could be used to quantify the misclassification?
- How does the misclassification vary with:
  - Type of fault
  - Type of fault detection method
  - Type of modality (moisture, temperature)
- Is it possible to design a fault detection mechanism that minimizes the misclassification?

Know Thy Faults
- Short faults
  - Sudden change in measurement
- Noise faults
  - Larger variations in amplitude than expected
  - Little or no variation in amplitude (unresponsive)

Fault Detection Methods
- SHORT rule
  - If $X_i - X_{i-1} > \delta_{\mathrm{SHORT}}$, mark the current measurement as a fault (point method)
  - $\delta_{\mathrm{SHORT}}$ is established from domain knowledge
- NOISE rule
  - Take $W$ successive samples
  - If $\sigma_W \le \sigma_{\mathrm{train}} - \sigma_{\mathrm{allow}}$ or $\sigma_W \ge \sigma_{\mathrm{train}} + \sigma_{\mathrm{allow}}$, mark all $W$ readings as faulty (block method)
  - $\sigma_{\mathrm{train}}$ and $\sigma_{\mathrm{allow}}$ are established from training data
- Linear Least-Squares Estimation (LLSE)
  - Estimate the expected value of a sensor's reading from other sensors' readings using LLSE
  - If $X_{\mathrm{model}} - X_{\mathrm{actual}} > \delta_{\mathrm{LLSE}}$ for $k$ of the node's neighbors, mark the reading as faulty (point method)

A. Sharma, L. Golubchik, and R. Govindan, "On the prevalence of sensor faults in real world deployments", IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2007.
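The SHORT and NOISE rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the use of the absolute difference in the SHORT rule, the population standard deviation, and the non-overlapping windows in the NOISE rule are all assumptions made here.

```python
from statistics import pstdev


def short_rule(readings, delta_short):
    """Point method: flag reading i when it jumps away from reading i-1."""
    flags = [False] * len(readings)
    for i in range(1, len(readings)):
        # Assumption: use the absolute difference so that sudden drops
        # are caught as well as sudden rises.
        if abs(readings[i] - readings[i - 1]) > delta_short:
            flags[i] = True
    return flags


def noise_rule(readings, window, sigma_train, sigma_allow):
    """Block method: flag a whole window when its spread is out of range."""
    flags = [False] * len(readings)
    # Assumption: windows do not overlap; a trailing partial window
    # is left unflagged.
    for start in range(0, len(readings) - window + 1, window):
        sigma_w = pstdev(readings[start:start + window])
        too_quiet = sigma_w <= sigma_train - sigma_allow
        too_noisy = sigma_w >= sigma_train + sigma_allow
        if too_quiet or too_noisy:
            for j in range(start, start + window):
                flags[j] = True
    return flags
```

With the absolute difference, the first normal reading after a spike is flagged too, because the return to normal is itself a large jump. A legitimate abrupt change, such as soil moisture rising at the onset of rain, trips the same test, which is exactly the event/fault duality the talk examines.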

Evaluation Metrics
- Misclassification error ($\mu$) for point faults:
  $\mu = \dfrac{\text{event readings tagged as faults}}{\text{total event measurements}}$
  Total misclassification: $\mu = \sum_i D_i / \sum_i E_i$
- Misclassification error ($\mu$) for block faults:
  [Figure: timeline showing each event period $E_i$ and the misclassified portion $D_i$ within it]
- Fault detection evaluation metric: false negative ratio = fraction of faults that fail to be detected
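As a rough illustration of the two metrics above, the sketch below computes them from per-sample boolean flags. The function names and the flag-list representation are assumptions introduced here. When $D_i$ and $E_i$ are counted in samples, pooling all event samples as this code does gives the same value as $\sum_i D_i / \sum_i E_i$.

```python
def misclassification_error(fault_flags, event_flags):
    """Fraction of event readings that the detector tagged as faults."""
    event_total = sum(1 for e in event_flags if e)
    if event_total == 0:
        raise ValueError("no event readings to evaluate")
    tagged = sum(1 for f, e in zip(fault_flags, event_flags) if f and e)
    return tagged / event_total


def false_negative_ratio(fault_flags, injected_flags):
    """Fraction of injected (ground-truth) faults the detector missed."""
    injected_total = sum(1 for g in injected_flags if g)
    if injected_total == 0:
        raise ValueError("no injected faults to evaluate")
    missed = sum(1 for f, g in zip(fault_flags, injected_flags) if g and not f)
    return missed / injected_total
```

For example, if a detector flags three of the four samples recorded during a rain event, the misclassification error for that event is 0.75.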

Jug Bay Deployment Map
[Map: turtle nests and weather station. Courtesy: Google Maps]

Dataset
Sensor data:
- Box temperature and soil moisture
- 3 motes from Jug Bay (previous slide)
- 5 months of data (sampled every 10 min)
- Training data set (1 month), test data set (4 months)
Event ground truth (weather data):
- Precipitation data collected from a weather station ~700 m away (sampled every 15 min)
- 21 major events (i.e., rainfall) occurred
- Total rainfall: 158 hours

Faults Ground Truth
- Start with a clean data set
- Inject faults to establish ground truth
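The slide does not say how faults were injected, so the following is only one plausible way to do it for SHORT faults: add a fixed offset at randomly chosen positions of a clean series and record those positions as ground truth. The function name, the fixed `magnitude`, and the seeded random choice are all assumptions of this sketch.

```python
import random


def inject_short_faults(clean, n_faults, magnitude, seed=0):
    """Return (faulty_series, truth_flags) with spikes added to a clean series."""
    rng = random.Random(seed)  # seeded so the injection is repeatable
    faulty = list(clean)       # copy, so the clean series is left untouched
    truth = [False] * len(clean)
    # Skip index 0 so every injected spike has a predecessor to compare with.
    for p in rng.sample(range(1, len(clean)), n_faults):
        faulty[p] = clean[p] + magnitude
        truth[p] = True
    return faulty, truth
```

The returned `truth` list serves as the ground truth against which a detector's false negatives can be counted.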

Methodology
For each fault detection method and each modality:
- Use the 1st month's data to train
- Obtain model parameters
- Evaluate the method on fault-injected test data

Soil Moisture: SHORT Rule
Reducing the number of misclassification errors increases false negatives.

Misclassification: LLSE Method

Modality          Misclassification error   False negatives
Box temperature   0.3 %                     77.19 %
Soil moisture     46.3 %                    50.03 %

Higher misclassification can occur due to spatial and temporal heterogeneity of the soil.
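For readers unfamiliar with the LLSE method evaluated here, a minimal sketch follows. It fits one linear least-squares model per neighbor on training data, then flags a reading when at least `k` neighbor-based estimates disagree with it by more than `delta_llse`. The function names, the one-model-per-neighbor structure, the absolute difference, and the "at least `k`" reading of the rule are assumptions of this sketch, not details confirmed by the slides.

```python
from statistics import mean


def fit_llse(target_train, neighbor_train):
    """Fit target ~ slope * neighbor + intercept by linear least squares."""
    mu_t, mu_n = mean(target_train), mean(neighbor_train)
    cov = sum((t - mu_t) * (n - mu_n)
              for t, n in zip(target_train, neighbor_train))
    var = sum((n - mu_n) ** 2 for n in neighbor_train)
    if var == 0:
        raise ValueError("neighbor readings are constant; cannot fit")
    slope = cov / var
    return slope, mu_t - slope * mu_n


def llse_rule(target, neighbors, models, delta_llse, k):
    """Point method: flag reading i when >= k neighbor estimates disagree."""
    flags = []
    for i, actual in enumerate(target):
        votes = 0
        for series, (slope, intercept) in zip(neighbors, models):
            estimate = slope * series[i] + intercept
            # Assumption: compare the absolute estimation error.
            if abs(estimate - actual) > delta_llse:
                votes += 1
        flags.append(votes >= k)
    return flags
```

A model of this kind assumes the target and its neighbors respond alike. If soil at neighboring locations wets at different rates during rain, the estimates diverge from the actual readings, which is consistent with the heterogeneity explanation given on the slide.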

Lessons Learned
- There exists a tension between detecting events and faults
- Fault detection algorithms need to take this into consideration
  - Events can be misclassified as faults
- Need for novel fault detection methods that are robust in the presence of events

Need for Pattern Recognition techniques

Acknowledgements
- Abhishek Sharma, Dept. of Computer Science, University of Southern California
- Chris Swarth, Jug Bay Wetlands Sanctuary
- Life Under Your Feet team
- Marcus Chang, University of Copenhagen
(Courtesy: Andreas Terzis)

Questions !!!!