Minimizing Data Uncertainty through System Design
Laura Balzano, Nabil Hajj Chehade, Sheela Nair, Nithya Ramanathan, Abhishek Sharma, Deborah Estrin, Leana Golubchik, Ramesh Govindan, Mark Hansen, Eddie Kohler, Greg Pottie, Mani Srivastava
Integrity Group, Center for Embedded Networked Sensing
UCLA – UCR – Caltech – USC – UC Merced

Introduction: There Are Many Sources of Uncertainty in Interpreting Data

In many applications, wireless sensing systems are used for inference and prediction about environmental phenomena. Data uncertainty can be reduced through careful system design! Three sources of uncertainty are addressed on this poster.

Environment modeling uncertainty. Statistical models are widely used to represent environmental phenomena: a model characterizes how unknown quantities (the phenomena) are related to known quantities (the measurements). Choosing a model involves a great deal of uncertainty. Often a single model M is used; if M does not characterize the phenomenon correctly, the resulting inferences and predictions will not be accurate. It is better to start with multiple plausible models and select among them by collecting measurements at informative locations (see Reducing Uncertainty in Model Selection).

Sensor calibration uncertainty. An accurate calibration function is required to translate raw sensor readings into physical quantities, and the calibration parameters of most sensors drift non-deterministically over time (see Reducing Uncertainty in Sensor Calibration).

Hardware uncertainty. Wireless sensing systems use low-cost, unreliable hardware, so faults are common (see Reducing Uncertainty in Hardware Functionality). Examples of sensor faults observed in real deployments:

Deployment       | Data quality indicators
Bangladesh       | 45% faulty data
GDI              | Sensors reported 3-60% faulty data
Ecuador Volcano  | 82% false negative rate / 13% false positive rate
Macroscope       | 8 of 33 temperature sensors faulty

Reducing Uncertainty in Hardware Functionality (Fault Detection / Diagnosis)

Problem Description: Online Fault Detection and Diagnosis
By detecting faults when they occur, instead of after the fact, users can take actions in the field to validate questionable data and fix hardware faults.

Confidence
Assumptions: faults can be common; an initial fault-free training period is not always available; and environmental phenomena are hard to predict, so tight bounds on expected behavior are not possible.

Outlier detection: using a continually updated distribution, in place of statically defined thresholds, makes Confidence resilient to human configuration error and adaptable to dynamic environments.
- Readings are mapped into a multi-dimensional space defined by carefully chosen features: gradient, distance from LDR, distance from NLDR, and standard deviation. Points far from the origin are considered faulty.
- The distances of good points from the origin are assumed to be normally distributed. Points more than 2 standard deviations from the mean distance are treated as outliers and rejected; all other points are used to continually update the distribution parameters.
- Outliers are clustered using an online K-means algorithm, and each cluster is associated with a previously successful remediating action (e.g., take a physical sample, replace the sensor), as sketched below.
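The outlier-detection step above can be illustrated with a short sketch. This is not the actual Confidence code: the feature extraction, the initial centroids, and the mapping from cluster ids to remediating actions are placeholders assumed here for concreteness.

```python
import numpy as np

class ConfidenceOutlierDetector:
    """Sketch of distance-based outlier detection with a continually
    updated distribution (illustrative, not the Confidence implementation)."""

    def __init__(self, n_clusters=3, n_features=4):
        self.mean_dist = 0.0        # running mean distance of "good" points
        self.var_dist = 1.0         # running variance of those distances
        self.count = 0
        # Centroids for online K-means over outliers; seeded in practice.
        self.centroids = np.zeros((n_clusters, n_features))
        self.cluster_counts = np.zeros(n_clusters)

    def update(self, features):
        """features: vector such as [gradient, dist_from_LDR,
        dist_from_NLDR, std_dev] for one reading."""
        d = np.linalg.norm(features)              # distance from the origin
        std = np.sqrt(self.var_dist)
        if self.count > 10 and abs(d - self.mean_dist) > 2 * std:
            # Outlier: reject it and assign it to a fault cluster; the caller
            # maps the cluster id to a previously successful remediating
            # action (e.g., take physical sample, replace sensor).
            return self._assign_cluster(features)
        # Good point: fold it into the running distance distribution.
        self.count += 1
        delta = d - self.mean_dist
        self.mean_dist += delta / self.count
        self.var_dist += (delta * (d - self.mean_dist) - self.var_dist) / self.count
        return None

    def _assign_cluster(self, features):
        k = int(np.argmin(np.linalg.norm(self.centroids - features, axis=1)))
        self.cluster_counts[k] += 1
        # Online K-means update: move the winning centroid toward the point.
        self.centroids[k] += (features - self.centroids[k]) / self.cluster_counts[k]
        return k
```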
Evaluation: Confidence was evaluated in real-world deployments and detects faults with low false positive and false negative rates. It is difficult to validate what is truly a fault without ground truth; in the San Joaquin deployment we validated the data by analyzing soil samples taken at each sensor.
- Bangladesh: Confidence detects 85% of the faulty data in a real-world data trace captured in Bangladesh, even though over one third of the data are faulty.
- San Joaquin River: we ran Confidence in a deployment of 20 sensors in the San Joaquin basin. Confidence accurately detected all 4 faults that occurred and correctly diagnosed 3 of the 4, with no false positives or negatives.

Data-driven techniques for identifying faulty sensor readings
1) Rule/heuristic-based methods (a rule sketch follows this section):
   - SHORT rule: compute the rate of change between two successive samples; if it exceeds a threshold, this is an instance of a SHORT fault.
   - NOISE rule: compute the standard deviation of the samples within a time window W; if it exceeds a threshold, the samples are corrupted by a NOISE fault.
2) Linear least-squares estimation (LLSE) based method: exploits the correlation in the data measured at different sensors, estimating each sensor's reading from correlated sensors and flagging large deviations.
3) Learning data models: hidden Markov models (HMMs), specified by the number of states, the transition probabilities, and the conditional observation probabilities Pr[O | S]; evaluated by detecting injected CONSTANT faults.

Results: we analyzed data sets from real-world deployments to characterize the prevalence of data faults using these 3 methods.
- NAMOS deployment: CONSTANT + NOISE faults; up to 30% of samples affected by data faults.
- Intel Lab, Berkeley deployment: CONSTANT + NOISE faults; up to 20% of samples affected by data faults.
- Great Duck Island deployment: SHORT + NOISE faults; 10-15% of samples affected by data faults.
- SensorScope deployment: SHORT faults; very few samples affected by data faults.

Signatures for modeling normal and faulty behavior
Sensor and fault behaviors are summarized using a signature: a multivariate probability density of features (Cahill, Lambert, Pinheiro, and Sun; 2000). Features are chosen to exploit the differences between faulty and normal behavior; the current features summarize temporal and spatial information:
- Temporal: actual reading, change between successive readings, voltage
- Spatial: difference from neighboring sensors

Fault detection algorithm (adapted from Detecting Fraud in the Real World; Cahill, Lambert, Pinheiro, and Sun; 2000): for each new reading, calculate the feature vector X_t and score it with the log likelihood ratio of the fault signature F against the sensor signature S_t; higher scores are more suspicious. If the score exceeds a threshold, the fault signature is updated; otherwise the sensor signature is updated. Using per-sensor signatures allows sensor-specific fault detection (a scoring sketch follows this section).

Signature updates require online density estimation: the density estimate is updated sequentially with each new reading, and since historical data cannot be stored, the density must be represented compactly. No single parametric family is flexible enough to represent all feature distributions, so we are developing a new method based on log-splines.

Limitations: it is difficult to initialize a sensor signature without a learning period that is guaranteed to be fault-free; a stricter threshold during the learning period can decrease the chance of incorporating faults into the sensor signature. The method also depends on accurately representing fault models, which is difficult without labeled training data.

Evaluation: tested on one week of Cold-Air Drainage data (4/06 to 4/12). The traces for Sensor 1 and Sensor 2 show a stuck-at fault, unusually noisy readings, and a low-voltage period; because Sensor 2 was malfunctioning at the start of the deployment, its noisy readings were learned as "normal" sensor behavior.
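The SHORT and NOISE rules from the rule-based methods above reduce to simple threshold checks. A minimal sketch; the thresholds and window length are illustrative parameters, not values from the poster.

```python
import numpy as np

def short_faults(samples, rate_threshold):
    """SHORT rule: flag sample i when the change from the previous
    sample exceeds rate_threshold."""
    rates = np.abs(np.diff(samples))
    return np.where(rates > rate_threshold)[0] + 1   # indices of SHORT faults

def noise_faults(samples, window, std_threshold):
    """NOISE rule: flag a window of samples whose standard deviation
    exceeds std_threshold."""
    flagged = []
    for start in range(0, len(samples) - window + 1, window):
        if np.std(samples[start:start + window]) > std_threshold:
            flagged.append((start, start + window))  # (start, end) of noisy window
    return flagged
```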
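The signature-based scoring loop can be sketched as follows. This is an illustrative simplification: real signatures are flexible multivariate densities maintained with online density estimation (e.g., log-splines), whereas here each signature is a diagonal Gaussian purely for concreteness. The feature vector x_t stands for the temporal and spatial features listed above.

```python
import numpy as np

class GaussianSignature:
    """Toy stand-in for a signature: a diagonal-Gaussian density over
    features, updated sequentially with each new reading."""

    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.n = 0

    def log_density(self, x):
        return float(np.sum(-0.5 * np.log(2 * np.pi * self.var)
                            - 0.5 * (x - self.mean) ** 2 / self.var))

    def update(self, x):
        # Online update of the running mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.var += (delta * (x - self.mean) - self.var) / self.n
        self.var = np.maximum(self.var, 1e-6)   # keep the density well defined

def score_reading(x_t, sensor_sig, fault_sig, threshold):
    """Score = log likelihood ratio of fault vs. sensor signature.
    Higher scores are more suspicious; the winning signature is updated."""
    score = fault_sig.log_density(x_t) - sensor_sig.log_density(x_t)
    if score > threshold:
        fault_sig.update(x_t)      # reading looks faulty
        return score, True
    sensor_sig.update(x_t)         # reading looks normal
    return score, False
```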
Reducing Uncertainty in Sensor Calibration

Problem Description: Blind Calibration
Manual calibration is not a scalable practice! Instead, we calibrate the sensor responses blindly, using only routine measurements collected from the sensor network.

Consider a network with n sensors. Let x denote the vector of true signal values at the n sensors and y the vector of measured values. Assume the measurements are a linear (gain and offset) function of the true signals, y_i = a_i x_i + b_i for each sensor i, and assume the true signals x lie in a known r-dimensional subspace of R^n. Let P be the orthogonal projection matrix onto the orthogonal complement of that subspace, so that Px = 0. Then, under certain conditions on P, with no noise and exact knowledge of the subspace, the gain factors can be recovered perfectly and the offset factors can be recovered partially (a numerical sketch of this setup appears below).

Robust to noise: with 2% noise in the measured signal, the error is < 0.01% for the gains and < 2.4% for the offsets.
Robust to mismodeling: with 10% of the true signal outside the assumed subspace, the error is < 1% for the gains and < 4% for the offsets.

Evaluation:
- In a deployment with all sensors in a styrofoam box, and thus a 1-dimensional signal subspace, the algorithm recovers the gains and offsets almost exactly.
- In a deployment with sensors spread across a valley at the James Reserve, using a 4-dimensional signal subspace constructed from the calibrated data, the gain calibration was quite accurate. The offset calibration, as expected, captured some of the non-zero-mean signal; it was also sensitive to the model.

Reducing Uncertainty in Model Selection

Problem Description: Optimal Sensor Placement
Where should we collect measurements to optimally choose a model that represents the field? Assumptions: two plausible models and Gaussian noise. Idea: find the locations where the "difference" between the two models is largest, i.e., sequentially choose the next measurement location where the squared difference between the two fitted model responses is maximal.

Algorithm: T-Designs. A sequential algorithm is used to iteratively collect measurements that maximize the discrimination between the two models [1] (sketched below).

Evaluation on real data: the likelihoods are 0.1754 for M1 and 3.4368 for M2, so M2 fits the data better.

Generalization: with more than two models, apply the same algorithm at each iteration to the two models that currently fit the data best (the worst case).

[1] A. C. Atkinson and V. V. Fedorov. Optimal design: experiments for discriminating between several models. Biometrika 62, 289-303, 1975.
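The subspace-projection idea from the blind calibration section can be turned into a small numerical sketch. This is an illustrative setup rather than the published algorithm: the measurement model y_i = g_i x_i + b_i, the centering step, and the normalization (mean inverse gain fixed to 1) are assumptions made here for concreteness.

```python
import numpy as np

def blind_calibrate(Y, U):
    """Sketch of subspace-projection blind calibration.

    Y : (T, n) measured snapshots from n sensors.
    U : (n, r) orthonormal basis of the assumed true-signal subspace.

    Assumed model: y_i(t) = g_i * x_i(t) + b_i with x(t) in span(U).
    Returns inverse gains c = 1/g (up to an overall scale) and the
    recoverable part of the scaled offsets d = b/g: only the component
    orthogonal to the signal subspace is identifiable, which is the
    partial offset recovery described in the text.
    """
    P = np.eye(Y.shape[1]) - U @ U.T        # projector onto subspace complement
    Yc = Y - Y.mean(axis=0)                 # centering removes the offsets

    # P diag(y_t - y_bar) c = 0 for every snapshot t: c spans the null space.
    B = np.vstack([P @ np.diag(row) for row in Yc])
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    c = Vt[-1]
    c = c / np.mean(c)                      # fix the unknown overall gain scale

    # Offsets: P (diag(c) y_bar - d) = 0, so only P d is identifiable.
    d_perp = P @ np.diag(Y.mean(axis=0)) @ c
    return c, d_perp
```

Given c and d_perp, approximately calibrated readings can be formed as diag(c) y - d, up to the unidentified subspace component of the offsets.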
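A sketch of the sequential T-design loop described under Algorithm: T-Designs. The fitting interface (fit1, fit2 returning prediction functions), the initial design, and the measurement function are placeholders assumed here, not part of the poster.

```python
import numpy as np

def sequential_t_design(candidate_x, measure, fit1, fit2, n_init=4, n_iter=10):
    """Iteratively collect measurements that best discriminate two models.

    candidate_x : array of candidate measurement locations.
    measure(x)  : takes a (noisy) measurement of the field at location x.
    fit1, fit2  : functions (xs, ys) -> prediction function for model 1 / 2.
    """
    # Start with a few measurements at evenly spaced candidate locations.
    idx = np.linspace(0, len(candidate_x) - 1, n_init).astype(int)
    xs = list(candidate_x[idx])
    ys = [measure(x) for x in xs]

    for _ in range(n_iter):
        pred1 = fit1(np.array(xs), np.array(ys))
        pred2 = fit2(np.array(xs), np.array(ys))
        # Next location: where the two fitted models disagree the most.
        gap = (pred1(candidate_x) - pred2(candidate_x)) ** 2
        x_next = candidate_x[int(np.argmax(gap))]
        xs.append(x_next)
        ys.append(measure(x_next))

    return np.array(xs), np.array(ys)
```

After the loop, the likelihood of the collected data under each model is computed and the better-fitting model is kept, as in the real-data evaluation above (where M2 fits better).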