Physics 114: Lecture 12 Mean of Means


Physics 114: Lecture 12 Mean of Means Averaging of averages? John Federici NJIT Physics Department

Physics Cartoons

The Goal of Measurement
When we make measurements of a quantity, we are mainly after two things: (1) the value of the quantity (the mean), and (2) a sense of how well we know this value (for which we need the spread of the distribution, or standard deviation). Remember that the goal of measurement is to obtain these two bits of information. The mean is of no use without the standard deviation as well. We have seen that repeated measurement of a quantity can be used to improve the estimate of the mean. Let's take a closer look at what is going on. HOW do we COMBINE data from REPEATED measurements?
Say we create a random set of 100 measurements of a quantity whose parent distribution has a mean of 5 and standard deviation of 1:
x = randn(1,100)+5;
Create a histogram of that set of measurements:
[y z] = hist(x,0:0.5:10);
Here, y is the histogram (frequency of points in each bin), and z is the bin centers. Now plot it: plot(z,y,'.'). If you prefer bars, use stairs(z,y).
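Collected into a single runnable script (a minimal sketch of the steps just described; hist is the older MATLAB histogram function used throughout this lecture):

% Generate 100 measurements from a parent with mean 5, standard deviation 1
x = randn(1,100) + 5;
% Bin them into 21 bins centered at 0, 0.5, ..., 10
[y, z] = hist(x, 0:0.5:10);   % y = counts per bin, z = bin centers
% Plot the histogram as points, or as steps if you prefer bars
plot(z, y, '.')
stairs(z, y)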

Histogram of "data"
>> x = randn(1,100)+5;
>> [y z] = hist(x,0:0.5:10);
>> plot(z,y,'.')
>> xlabel('Data Value')
>> ylabel('# of occurrences')
>> stairs(z,y)

Comparing Measurements
If you make repeated sets of 100 measurements, you will obtain different samples from the parent distribution, whose averages are approximations to the mean of the parent distribution. Let's make 100 sets of 100 measurements:
y = zeros(100,21);   % 21 is the number of bins in the histogram
for i = 1:100
    x = randn(1,100)+5;
    [y(i,:), z] = hist(x,0:0.5:10);
end
Plotting some of the histograms:
for i = 1:16
    subplot(4,4,i)
    stairs(z-0.25, y(i,:))
    axis([2,8,0,25])
end
Basic structure of "for" loops:
for index = values
    statements
end
For loops can be nested. To programmatically exit the loop, use a break statement. To skip the rest of the instructions in the loop and begin the next iteration, use a continue statement (a short toy example follows below).
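A toy illustration of break and continue (not from the slide; the loop bounds are arbitrary):

for i = 1:10
    if mod(i,2) == 0
        continue          % skip even i and begin the next iteration
    end
    if i > 7
        break             % exit the loop entirely
    end
    fprintf('%d\n', i)    % prints 1, 3, 5, 7
end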

Comparing Measurements
You can see that the sample means vary from one set of 100 measurements to another.

Comparing Measurements
Now the mean can be determined from the original values (xi):
mean(x)
or from the histograms themselves (approximately, a sum over all xi weighted by the bin counts):
mean(y(100,:).*z)/mean(y(100,:))
Make sure you understand why these two should be the same (nearly), and why they might be slightly different.
Since we have saved the histograms, let's print out the means of these 16 sample distributions:
for i = 1:16
    a = mean(y(i,:).*z)/mean(y(i,:));
    fprintf('%f\n', a)
end
Here is one realization of those means:
5.015  4.915  5.000  4.940
4.995  4.980  4.975  4.960
4.990  4.920  4.965  5.040
5.010  4.870  5.135  4.890
We might surmise that the mean of these means might be a better estimate of the mean of the parent distribution, and we would be right!
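To see why the two estimates nearly agree, note that mean(y.*z)/mean(y) equals sum(y.*z)/sum(y): each measurement is effectively replaced by the center of its bin. A quick check (after the loop above, x still holds the 100th set of measurements):

m_raw  = mean(x)                            % mean of the raw values
m_hist = sum(y(100,:).*z) / sum(y(100,:))   % histogram estimate; differs only by the binning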

Distribution of Means
Let's now calculate the 100 means:
a = zeros(1,100);
for i = 1:100
    a(1,i) = mean(y(i,:).*z)/mean(y(i,:));
end
And plot them:
subplot(1,1,1)
plot(a,'.')
This is the distribution of means. The mean of this distribution is 4.998, clearly very close to the mean of the parent distribution (5).

Mean of Means
We can think about this distribution in two different, but equivalent ways. If we simply sum all of the histograms (COMBINE the data into ONE BIG DATA SET), we obtain a much better estimate of the parent population:
mom = sum(y);
stairs(z-0.25, mom)
mean(mom.*z)/mean(mom)
gives: 4.998

Mean of Means
Alternatively, we can think of these different estimates of the mean of the original population as being drawn from a NEW parent population, one representing the distribution of means. This NEW parent population has a different (smaller) standard deviation than the original parent population: std(a) is 0.0976, close to the 1/sqrt(100) = 0.1 we will shortly derive for the error in the mean of 100 measurements.
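A quick check against that prediction (using the array a of 100 sample means computed above):

std(a)          % about 0.0976 in the realization above
1/sqrt(100)     % predicted error in the mean: sigma/sqrt(N) = 0.1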

We have seen this before! This should remind you of HW #4: if one were to BLOCK the THz beam in the transmission measurement, what waveform would you measure?

Homework #4
Is the noise in the THz transmission measurement statistical? [Figure: measured waveform, 1 average]

Does Averaging Help?
[Figures: waveforms and histograms for 1 average vs. 10 averages]
"Averaging" of 10 traces together into 1 trace is the first "MEAN". Then when you calculate the histogram of that AVERAGED trace (essentially, the MEAN OF THE MEAN), the distribution function narrows.

Calculation of Error in the Mean
Recall that in previous lectures we introduced the general formula for propagation of errors of a quantity x determined from a combination of two measurements u and v:
\[ \sigma_x^2 \simeq \sigma_u^2 \left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2 \left(\frac{\partial x}{\partial v}\right)^2 + 2\sigma_{uv}^2 \frac{\partial x}{\partial u}\frac{\partial x}{\partial v} \]
Generalizing further for our case of N measurements of the mean \mu', and ignoring correlations between measurements (i.e., setting the cross-terms to zero), we have
\[ \sigma_\mu^2 = \sum_{i=1}^{N} \left[ \sigma_i^2 \left(\frac{\partial \mu}{\partial x_i}\right)^2 \right] \]
We can make the assumption that all of the \sigma_i are equal to \sigma (this is just saying that the samples are all of equal size and drawn from the same parent population). Also, since \mu = \frac{1}{N}\sum x_i, each derivative is \partial\mu/\partial x_i = 1/N, so
\[ \sigma_\mu^2 = \sigma^2 \sum_{i=1}^{N} \frac{1}{N^2} = \frac{\sigma^2}{N}, \qquad \sigma_\mu = \frac{\sigma}{\sqrt{N}} \]
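A minimal simulation check of this result (not from the slide; the number of trials is arbitrary):

N = 100;  trials = 10000;
m = mean(randn(trials, N) + 5, 2);   % 10000 sample means, each from N measurements
std(m)                               % close to sigma/sqrt(N) = 1/sqrt(100) = 0.1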

Calculation of Error in the Mean
This is no surprise, and it says what we already knew, that the error in the mean gets smaller according to the square root of the number of means averaged.
[Figure: result from HW#4, log-log plot of noise vs. number of averages, with slope = -1/2]

Calculation of Error in the Mean
Again, this is the case when all of the errors in the means used are equal. What would we do if, say, some of the means were determined by averaging different numbers of observations instead of 100 each time? In this case, we can do what is called weighting the data. If we know the different values of \sigma_i, then the weighted average of the data points is
\[ \mu = \frac{\sum (x_i/\sigma_i^2)}{\sum (1/\sigma_i^2)} \]
Even if we take the SAME number of observations each 'experiment', we may STILL want to weight the data: if measurements were taken on DIFFERENT days, the noise might be different because the experimental conditions might be different.
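In MATLAB the weighted average is one line; a sketch, where xm and sig are hypothetical arrays of the means and their (known) uncertainties:

w  = 1 ./ sig.^2;             % weights w_i = 1/sigma_i^2
mu = sum(w .* xm) / sum(w)    % weighted mean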

Error in the Weighted Mean
In this case, if we want to combine a number of such weighted means to get the error in the weighted mean, we still have to calculate the propagation of errors:
\[ \sigma_\mu^2 = \sum \left[ \sigma_i^2 \left(\frac{\partial \mu}{\partial x_i}\right)^2 \right] \]
but now the \sigma_i are not all the same, and also we use the weighted mean to get the gradient of the mean:
\[ \frac{\partial \mu}{\partial x_i} = \frac{1/\sigma_i^2}{\sum (1/\sigma_j^2)} \]
Inserting that into the above equation, we have
\[ \sigma_\mu^2 = \sum \left[ \sigma_i^2 \, \frac{(1/\sigma_i^2)^2}{\left(\sum 1/\sigma_j^2\right)^2} \right] = \frac{\sum (1/\sigma_i^2)}{\left(\sum 1/\sigma_j^2\right)^2} = \frac{1}{\sum (1/\sigma_i^2)} \]
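Continuing the hypothetical sketch above, the uncertainty of the weighted mean follows directly from this result:

sigma_mu = 1 / sqrt(sum(1 ./ sig.^2))   % sigma_mu^2 = 1 / sum(1/sigma_i^2)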

Relative Uncertainties
In some cases, we do not necessarily know the uncertainties of each measurement, but we do know the relative values of \sigma_i. That is, we may know that some of our estimates of the mean used 100 measurements, and some used only 25. In that case, we can guess that the latter measurements have errors twice as large (since the error in a mean is inversely proportional to the square root of the number of measurements). So, say
\[ \sigma_i = k w_i \]
where the w_i are known relative values and k is an unknown constant. Then
\[ \mu = \frac{\sum (x_i/w_i^2)}{\sum (1/w_i^2)} \]
In other words, because of the nature of the ratio, the proportionality constant cancels and we need only the relative weights. To get the overall variance in this case, we must appeal to an average variance of the data:
\[ \sigma^2 \simeq \frac{N}{N-1} \, \frac{\sum \left[(x_i - \mu)^2 / w_i^2\right]}{\sum (1/w_i^2)} \]
So the standard deviation of the mean is found using this, as
\[ \sigma_\mu = \frac{\sigma}{\sqrt{N}} \]
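Because only ratios of the weights enter, the unknown constant k indeed cancels; a quick numerical sketch (all values are made up):

xm = [5.02 4.97 5.10];    % hypothetical estimates of the mean
w  = [1 1 2];             % relative uncertainties: the last is twice as uncertain
mu1 = sum(xm ./ w.^2) / sum(1 ./ w.^2)
k = 7;                                           % any unknown scale factor
mu2 = sum(xm ./ (k*w).^2) / sum(1 ./ (k*w).^2)   % identical to mu1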

REMINDER of Limitations of Averaging
Some of the major concepts of HW#4:
Averaging will definitely help, but the trade-off is the TIME needed to acquire the data.
Even if we had enough time (and money to PAY the people making the measurements), some of the experimental conditions MAY CHANGE during the experiment. Temperature changes cause materials to expand or contract, changing, for example, the optical alignment of mirrors and lenses.
Systematic errors CANNOT be removed even with averaging.

Previous Example of Systematic Error
[Figure: data taken February 12, 2010]

This Systematic Error is Intermittent
[Figures: traces with 10 averages and with 100 averages]
The best solution is to FIX the instrumentation to remove systematic errors.

'Non-statistical Fluctuations'
Essentially, the point is that MOST experiments do not follow a Gaussian distribution beyond 3 or 4 standard deviations. OUTLIER data points which cannot be easily explained are an issue. In addition, even if you can 'average away' a Gaussian distribution of noise, other sources of noise that are not Gaussian-like in behavior can then become dominant.
[Figure: log-log plot of noise vs. number of averages; the slope = -1/2 trend breaks down where 'non-statistical fluctuations' take over]

'Severe' Outlier Data Points
What is a 'Black Swan'? A black swan is an event or occurrence that deviates beyond what is normally expected of a situation and is extremely difficult to predict; the term was popularized by Nassim Nicholas Taleb, a finance professor, writer, and former Wall Street trader. Black swan events are typically random and unexpected (i.e., RARE events). [http://www.investopedia.com]
For insurance, how do you determine the risk and price of insurance for a HIGHLY unlikely event (a tidal wave on the coast of NJ) whose effects would be catastrophic if it DID happen?

Elimination of Data Points
The textbook describes Chauvenet's criterion, which essentially says that if an outlier data point deviates from the mean by more than statistics would lead you to expect (specifically, if the expected number of points deviating that far is less than one half), you can discard it as a 'valid' data point when calculating the mean and standard deviation. HOWEVER, as the Bevington textbook cautions, data points should not be discarded without careful justification.
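A minimal sketch of Chauvenet's criterion in MATLAB (x is a hypothetical data vector; erfc gives the two-sided Gaussian tail probability):

N  = numel(x);
mu = mean(x);  s = std(x);
p = erfc(abs(x - mu) ./ (s*sqrt(2)));   % probability of a deviation at least this large
keep = (N*p >= 0.5);                    % reject points expected fewer than 0.5 times in N samples
x_clean = x(keep);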

In-Class Exercise
Later in the course, we will be processing multiple THz data files to create images. The first step in this process is to use FOR loops to generate the correct file names:
x 0y 0.txt
x 1y 0.txt
x 2y 0.txt
…
Use nested FOR loops to create the FILE NAMES starting with 'x 0y 0.txt' and ending with 'x 5y 7.txt'. Print out the names of the files as you create them.
HINT: Format the integers and the letters as 'string' variables.
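One possible solution sketch (it assumes, from the listing above, that x varies fastest; swap the loops if your files are ordered differently):

for iy = 0:7
    for ix = 0:5
        fname = sprintf('x %dy %d.txt', ix, iy);   % format the integers into the string
        fprintf('%s\n', fname)
    end
end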