Environmental Data Analysis with MatLab Lecture 23: Hypothesis Testing continued; F-Tests.



SYLLABUS
Lecture 01: Using MatLab
Lecture 02: Looking At Data
Lecture 03: Probability and Measurement Error
Lecture 04: Multivariate Distributions
Lecture 05: Linear Models
Lecture 06: The Principle of Least Squares
Lecture 07: Prior Information
Lecture 08: Solving Generalized Least Squares Problems
Lecture 09: Fourier Series
Lecture 10: Complex Fourier Series
Lecture 11: Lessons Learned from the Fourier Transform
Lecture 12: Power Spectral Density
Lecture 13: Filter Theory
Lecture 14: Applications of Filters
Lecture 15: Factor Analysis
Lecture 16: Orthogonal Functions
Lecture 17: Covariance and Autocorrelation
Lecture 18: Cross-correlation
Lecture 19: Smoothing, Correlation and Spectra
Lecture 20: Coherence; Tapering and Spectral Analysis
Lecture 21: Interpolation
Lecture 22: Hypothesis Testing
Lecture 23: Hypothesis Testing continued; F-Tests
Lecture 24: Confidence Limits of Spectra; Bootstraps

Purpose of the lecture: continue Hypothesis Testing and apply it to testing the significance of alternative models.

Review of Last Lecture

Steps in Hypothesis Testing

Step 1. State a Null Hypothesis: some variation of the result is due to random variation.

Step 2. Focus on a statistic that is unlikely to be large when the Null Hypothesis is true

Step 3. Determine the value of the statistic for your problem.

Step 4. Calculate the probability that the observed value or greater would occur if the Null Hypothesis were true.

Step 5. Reject the Null Hypothesis only if such large values occur less than 5% of the time

An example: a test of a particle-size measuring device.

Manufacturer's specs: the machine is perfectly calibrated; particle diameters scatter about the true value; the measurement error is σd² = 1 nm².

Your test of the machine: purchase a batch of 25 test particles, each exactly 100 nm in diameter; measure and tabulate their diameters; repeat with another batch a few weeks later.

Results of Test 1

Results of Test 2

Question 1: Is the Calibration Correct? Null Hypothesis: The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).

In our case the key question is: are these unusually large values for Z? For both tests, values of |Z| greater than Z_est turn out to be very common, so the Null Hypothesis cannot be rejected; there is no reason to think the machine is biased.
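The lecture carries out this computation in MatLab; the sketch below shows the equivalent Z-test in Python, with scipy standing in for MatLab's statistics functions. The simulated batch of diameters is illustrative, not the lecture's data.

```python
# Z-test of the calibration: is the deviation of the sample mean from the
# true diameter (100 nm) explicable by random variation, given the
# manufacturer's variance sigma_d^2 = 1 nm^2?
import numpy as np
from scipy import stats

true_mean, true_var, N = 100.0, 1.0, 25

# illustrative batch of 25 measured diameters (not the lecture's data)
rng = np.random.default_rng(1)
d = true_mean + np.sqrt(true_var) * rng.standard_normal(N)

# the mean of N Normal variables has variance true_var / N
Z_est = (np.mean(d) - true_mean) / np.sqrt(true_var / N)

# two-sided: probability of |Z| >= |Z_est| under the Null Hypothesis
P = 2 * (1 - stats.norm.cdf(abs(Z_est)))

# Step 5: reject the Null Hypothesis only if P < 0.05
reject = P < 0.05
```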

Question 2: Is the variance in spec? Null Hypothesis: The observed deviation of the variance from its true value of 1 nm² is due to random variation (as contrasted to the machine being noisier than the specs).

The key question is: are these unusually large values for χ², given the results of the two tests?

In MatLab the corresponding probability is easily computed. Values of χ² greater than χ²_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the machine is noisy.
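A sketch of the same χ² test in Python (scipy in place of the lecture's MatLab; the data are illustrative, and the statistic is formed assuming the true mean of 100 nm is known):

```python
# chi-squared test of whether the observed scatter is within spec
# (sigma_d^2 = 1 nm^2). Illustrative data, not the lecture's.
import numpy as np
from scipy import stats

true_mean, true_var, N = 100.0, 1.0, 25
rng = np.random.default_rng(2)
d = true_mean + rng.standard_normal(N)

# with the true mean known, the sum of N squared standardized deviations
# is chi-squared distributed with N degrees of freedom
chi2_est = np.sum((d - true_mean) ** 2) / true_var

# probability of a value this large or larger under the Null Hypothesis
P = 1 - stats.chi2.cdf(chi2_est, N)
reject = P < 0.05
```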

End of Review now continuing this scenario …

Question 1, revisited: Is the Calibration Correct? Null Hypothesis: The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).

Suppose the manufacturer had not specified a variance; then you would have to estimate it from the data. For one of the two tests the estimate is 0.894 nm².

But then you couldn't form Z, since Z requires the true variance.

Last lecture, we examined a quantity t, defined as the ratio of a Normally-distributed variable and something that has the form of an estimated variance.

so we will test t instead of Z

In our case the key question is: are these unusually large values for t? For both tests, values of |t| greater than t_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the machine is biased.
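The t-test version of the calculation can be sketched in Python as follows (scipy in place of MatLab; illustrative data). The only change from the Z-test is that the estimated variance replaces the true one, and the t-distribution replaces the Normal:

```python
# t-test of the calibration when the variance is not given and must be
# estimated from the data. Illustrative data, not the lecture's.
import numpy as np
from scipy import stats

true_mean, N = 100.0, 25
rng = np.random.default_rng(3)
d = true_mean + rng.standard_normal(N)

est_var = np.var(d, ddof=1)   # unbiased estimate of the variance

# t replaces Z: same numerator, but the estimated variance in the denominator
t_est = (np.mean(d) - true_mean) / np.sqrt(est_var / N)

# two-sided probability of |t| >= |t_est|, with N - 1 degrees of freedom
P = 2 * (1 - stats.t.cdf(abs(t_est), N - 1))
reject = P < 0.05
```

scipy packages the same computation as `stats.ttest_1samp(d, 100.0)`, which returns the identical statistic and two-sided p-value.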

Question 3: Has the calibration changed between the two tests? Null Hypothesis: The difference between the means is due to random variation (as contrasted to a change in the calibration).

Since the data are Normal, their means (linear functions of the data) are Normal, and the difference between the means (also a linear function) is Normal.

If c = a − b, then σc² = σa² + σb².

So use a Z test; in our case, Z_est = 0.368.

Using MatLab, one finds that values of |Z| greater than Z_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the bias of the machine has changed.
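A Python sketch of this difference-of-means Z-test (scipy in place of MatLab; the two simulated batches are illustrative, so the resulting Z_est will not equal the lecture's 0.368):

```python
# Z-test for a change in calibration between the two tests: the difference
# of two Normal means is Normal with variance sigma_a^2 + sigma_b^2.
import numpy as np
from scipy import stats

true_var, N = 1.0, 25
rng = np.random.default_rng(4)
d1 = 100.0 + rng.standard_normal(N)   # test 1
d2 = 100.0 + rng.standard_normal(N)   # test 2, a few weeks later

# each mean has variance true_var / N; the variances of a difference add
var_diff = true_var / N + true_var / N
Z_est = (np.mean(d1) - np.mean(d2)) / np.sqrt(var_diff)

P = 2 * (1 - stats.norm.cdf(abs(Z_est)))
reject = P < 0.05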

Question 4: Has the variance changed between the two tests? Null Hypothesis: The difference between the variances is due to random variation (as contrasted to a change in the machine's precision). For one of the two tests the estimated variance is 0.974 nm².

last lecture, we examined the distribution of a quantity F, the ratio of variances

So use an F test; in our case, F_est = 1.110.

[Figure: the distribution p(F), with the values 1/F_est and F_est marked on the F axis.] Whether the top or bottom χ² is the bigger is irrelevant, since our Null Hypothesis only concerns their being different. Hence we need to evaluate the probability that F is greater than F_est or less than 1/F_est.

Using MatLab, one finds that values of F greater than F_est or less than 1/F_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the noisiness of the machine has changed.
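The two-sided F-test can be sketched in Python as follows (scipy in place of MatLab; the simulated batches are illustrative, so F_est will not equal the lecture's 1.110):

```python
# two-sided F-test for a change in variance: since it does not matter which
# chi-squared is on top, evaluate P(F > F_est or F < 1/F_est).
import numpy as np
from scipy import stats

N = 25
rng = np.random.default_rng(5)
d1 = 100.0 + rng.standard_normal(N)
d2 = 100.0 + rng.standard_normal(N)

v1 = np.var(d1, ddof=1)
v2 = np.var(d2, ddof=1)
F_est = max(v1, v2) / min(v1, v2)   # put the larger variance on top

# probability of falling outside [1/F_est, F_est] under F(N-1, N-1)
P = 1 - (stats.f.cdf(F_est, N - 1, N - 1) - stats.f.cdf(1 / F_est, N - 1, N - 1))
reject = P < 0.05
```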

Another use of the F-test

We often develop two alternative models to describe a phenomenon and want to know: which is better?

However, any difference in total error between two models may just be due to random variation.

Null Hypothesis: the difference in total error between the two models is due to random variation.

[Figure: data d(i) versus time t (hours), with a linear fit and a cubic fit.] Example: Linear Fit vs. Cubic Fit?

[Figure: data d(i) versus time t (hours); panel A shows the linear fit, panel B the cubic fit.] Example: Linear Fit vs. Cubic Fit? The cubic fit has 14% smaller error, E.

The cubic fits 14% better, but the cubic has 4 coefficients and the line only 2, so the error of the cubic will tend to be smaller anyway; furthermore, the difference could just be due to random variation.

Use an F-test. Degrees of freedom of the linear fit: ν_L = 50 data − 2 coefficients = 48. Degrees of freedom of the cubic fit: ν_C = 50 data − 4 coefficients = 46. F = (E_L/ν_L) / (E_C/ν_C) = 1.14.

In our case, values of F greater than F_est or less than 1/F_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think one model is 'really' better than the other.
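The whole model-comparison F-test can be sketched end-to-end in Python (scipy and numpy in place of MatLab). The synthetic 50-point data set below stands in for the lecture's example, so the computed F will not equal the lecture's 1.14:

```python
# F-test comparing a linear and a cubic fit: F = (E_L/nu_L) / (E_C/nu_C),
# with nu = (number of data) - (number of coefficients).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
N = 50
t = np.linspace(0.0, 10.0, N)                  # time t, hours
d = 2.0 + 0.5 * t + rng.standard_normal(N)     # truly linear, plus noise

def total_error(deg):
    coeffs = np.polyfit(t, d, deg)             # least-squares polynomial fit
    return np.sum((d - np.polyval(coeffs, t)) ** 2)

E_L, nu_L = total_error(1), N - 2              # linear fit: 2 coefficients
E_C, nu_C = total_error(3), N - 4              # cubic fit: 4 coefficients

F_est = (E_L / nu_L) / (E_C / nu_C)

# two-sided, as before: probability outside [1/F_est, F_est]
lo, hi = min(F_est, 1 / F_est), max(F_est, 1 / F_est)
P = 1 - (stats.f.cdf(hi, nu_L, nu_C) - stats.f.cdf(lo, nu_L, nu_C))
reject = P < 0.05
```

Because the underlying curve here really is linear, the cubic's smaller error is pure overfitting, and the test should usually fail to reject the Null Hypothesis.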