Dealing with Noisy Data

All about noisy time series analysis, in which features exist and long-term behavior is present; finite boundary conditions are always an issue.

Dealing with Noisy Data: Smoothing Techniques

We want to find the overall evolution of some phenomenon. In general, most environmental data contains a considerable amount of random noise. This noise has two origins: the phenomenon itself has a lot of intrinsic variability (this is especially true of climate data), and the measurements are often difficult and imprecise.

This example shows a time series where the intrinsic variability completely swamps any potential trend; to identify possible trends it is necessary to smooth the data. In any data set, significant fluctuations as a function of time may dominate the underlying waveform (the trend with time). You therefore want to damp out the fluctuations to reveal that underlying waveform.

Smoothing vs. Averaging: Averaging ends up with fewer actual data points, while smoothing preserves the number of data points and lowers the amplitude of the noise by coupling neighboring data points; the smoothing length is a trade-off. Boxcar smoothing (sliding average) works best when there is some underlying waveform in the data (long-period systematics vs. short-period fluctuations). (Note that Excel's "moving average" function is, in fact, equivalent to boxcar smoothing.) Box averaging: replace N data points by the average of those data points. So suppose you have 100 data points and you average them in groups of 3: you will end up with about 33 data points.
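The difference is easy to see in code. Here is a minimal sketch (Python/NumPy; the data, window sizes, and the 99-point length chosen so the groups divide evenly are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative noisy series: an underlying wave plus random noise.
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6 * np.pi, 99)) + rng.normal(0, 0.5, 99)

# Box averaging: replace each group of N points by its mean -> fewer points.
N = 3
y_block = y.reshape(-1, N).mean(axis=1)             # 99 points -> 33 points

# Boxcar smoothing (sliding average): same number of points, lower noise
# amplitude; mode="same" handles the ends crudely via implicit zero-padding.
window = 5
y_boxcar = np.convolve(y, np.ones(window) / window, mode="same")

print(len(y), len(y_block), len(y_boxcar))           # 99 33 99
```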

Exponential Smoothing: Here the smoothed statistic s_t is the weighted average of the previous data value x_{t-1} and the previous smoothed statistic s_{t-1}: s_t = α x_{t-1} + (1 - α) s_{t-1}, where α, the smoothing factor, is a value between 0 and 1. This method is often applied to data with lots of short-term fluctuations; financial or stock-market data is often treated this way, but climate time series also lend themselves to this smoothing approach. The functional form of the smoothing is governed by this single smoothing parameter. Values close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values closer to zero have a greater smoothing effect and are less responsive to recent changes. There is no formally correct procedure for choosing the smoothing factor; it's a scientific judgment call.
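As a concrete illustration, here is a minimal sketch of that recursion in Python (the seed value s_0 = x_0 and the sample data are assumptions of mine, not from the slides):

```python
import numpy as np

def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s_t = alpha*x_{t-1} + (1 - alpha)*s_{t-1}."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]                      # a common choice of seed for the recursion
    for t in range(1, len(x)):
        s[t] = alpha * x[t - 1] + (1 - alpha) * s[t - 1]
    return s

data = [3.0, 4.2, 3.8, 5.1, 4.9, 6.0]
print(exponential_smoothing(data, alpha=0.3))   # alpha nearer 1 -> less smoothing
```

Choosing α is the judgment call mentioned above; in practice one often tries a few values and inspects how responsive the smoothed series is.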

Gaussian Kernel Smoothing: This is very well suited to time series and not used nearly enough. This form of smoothing is a weighted form: the "kernel" defines the shape of the function used to take the weighted average of the neighboring points. A Gaussian kernel is a kernel with the shape of a Gaussian (normal distribution) curve.

Example: Kernel Smoothing. The basic process of smoothing is very simple: we proceed through the data point by point. For each data point we generate a new value that is some function of the original value at that point and the surrounding data points, as weighted by the kernel. In this case we have chosen our kernel to have a standard deviation of 2.5 units on the x-axis.

This is the discrete representation of our kernel. Note that the total area under the kernel must equal 1. We center our kernel at x = 14. The table below summarizes the x and y values of the discrete kernel as well as the actual y-data in the raw case. Note: we used a Gaussian kernel with σ = 2.5. The example here uses just the first 5 points (roughly ±1σ), which represent 86.5% of the total area and so contain most of the weight in the smoothing (but you really do want to use all of the kernel to get the best results).
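A short Python sketch of the same ideas (σ = 2.5 and unit spacing between kernel samples are assumptions; the five central weights are the ones quoted in the slide):

```python
import numpy as np

# Build a discrete Gaussian kernel and normalize it so its total area is 1.
sigma = 2.5
offsets = np.arange(-10, 11)                    # a generous half-width
kernel = np.exp(-offsets**2 / (2 * sigma**2))
kernel /= kernel.sum()                          # total weight = 1
print(kernel.sum())                             # 1.0

# The five central weights quoted above carry ~86.5% of the total area.
weights5 = [0.117, 0.198, 0.235, 0.198, 0.117]
print(sum(weights5))                            # 0.865
```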

Operational Procedure: We want to produce a new value for x = 14 based on this weighted smoothing function. That means we convolve our kernel with the data. That sounds onerous, but it's really quite simple and works like this. The new y-data point at x = 14 is: 0.117*1.065 + 0.198*0.389 + 0.235*0.349 + 0.198*(-0.656) + 0.117*(-0.195) ≈ 0.131 (compared to the actual data value of 0.349). We then apply the same procedure to x = 15, and so on, ending up with a convolved time series.
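The same arithmetic in Python, followed by applying the kernel to a whole series with a discrete convolution (the stand-in series and the x = 12..16 ordering of the tabulated values are assumptions):

```python
import numpy as np

# The weighted sum at x = 14, using the weights and y-values listed above.
weights = np.array([0.117, 0.198, 0.235, 0.198, 0.117])
y_local = np.array([1.065, 0.389, 0.349, -0.656, -0.195])   # y at x = 12..16
print(np.dot(weights, y_local))                             # ~0.131 (raw value: 0.349)

# Doing this for every point at once is just a convolution of the data with
# the (normalized) kernel.
y = np.random.default_rng(1).normal(size=50)                # stand-in data
kernel = weights / weights.sum()                            # normalize to area 1
y_smooth = np.convolve(y, kernel, mode="same")
print(len(y_smooth))                                        # same length as the input
```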

Convolved Time Series: The convolved series shows the clustering of events and the overall form of the data much better. This particular method is also very good for image and pattern recognition and can be important for analyzing climate data.