USC3002 Picturing the World Through Mathematics
Wayne Lawton, Department of Mathematics
Theme for Semester I, 2008/09: The Logic of Evolution, Mathematical Models of Adaptation from Darwin to Dawkins
PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypotheses
4. Test Statistics for Gaussian Hypotheses
   Sample Mean for Parameter Estimation
   z-Test and t-Test Statistics
   Rejection/Critical Region for z-Test Statistic
   Hypothesis Test for Mean Height
5. General Hypotheses Tests
   Type I and Type II Errors
   Null and Alternative Hypotheses
6. Assign Tutorial Problems
POPULATIONS AND SAMPLES
Population - a specified collection of quantities, e.g. heights of males in a country, glucose levels of a collection of blood samples, or batch yields of an industrial compound for a chemical plant over a specified time with and without the use of a catalyst.
Sample Population - a population from which samples are taken to be used for statistical inference.
Sample - the subset of the sample population consisting of the samples that are taken.
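SAMPLE POPULATION PARAMETERS
Sample: $x_1, x_2, \ldots, x_n$
Sample Size: $n$
Sample Parameters:
   Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
   Sample Variance: $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
   Sample Standard Deviation: $\sigma_x = \sqrt{\sigma_x^2}$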
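SAMPLE POPULATION PARAMETERS
Theorem 1. The variance $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ of a population $x_1, \ldots, x_n$ is related to its mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and average squared value $\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2$ by
$$\sigma_x^2 = \overline{x^2} - \bar{x}^2 .$$
Proof. Since
$$(x_i - \bar{x})^2 = x_i^2 - 2\,\bar{x}\,x_i + \bar{x}^2 ,$$
the result follows by averaging over $i$.
Question: How can the proof be completed? Why?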
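A numerical check of Theorem 1 can make the identity concrete. The sketch below assumes the variance is the average squared deviation from the mean (the 1/n convention); the function names and the list of heights are hypothetical and not taken from the lecture.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean (1/n convention, assumed here)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def variance_via_theorem_1(values):
    """Average of the squares minus the square of the average."""
    return mean([v * v for v in values]) - mean(values) ** 2


# Hypothetical heights in metres, for illustration only.
heights = [1.62, 1.75, 1.80, 1.68, 1.71]
direct = population_variance(heights)
shortcut = variance_via_theorem_1(heights)
print(direct, shortcut)  # the two values agree up to rounding error
```

The two computations differ only by floating-point rounding, which is what the theorem predicts.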
STATISTICAL HYPOTHESES
Statistical hypotheses are assertions about a population that describe some statistical properties of the population. Typically, statistical hypotheses assert that a population consists of independent samples of a random variable that has a certain type of distribution, and some of the parameters that describe this distribution may be specified.
For Gaussian distributions there are four possibilities:
   Neither the mean nor the variance is specified.
   Only the variance is specified.
   Only the mean is specified.
   Both the mean and the variance are specified.
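TEST STATISTICS for Hypothesis with Gaussian Distributions
For a hypothesis with Gaussian distributions, $\mu$ unknown and $\sigma$ known, the normalized sample mean error $(\bar{x} - \mu)/\sigma$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean, is Gaussian with mean 0 and variance 1/n.
Proof (Outline). We let $\langle Y \rangle$ denote the mean of a random variable $Y$. Then clearly $\langle \bar{x} - \mu \rangle = 0$. Independence and Theorem 1 give
$$\langle (\bar{x} - \mu)^2 \rangle = \frac{1}{n^2}\sum_{i=1}^{n} \langle (x_i - \mu)^2 \rangle = \frac{\sigma^2}{n},$$
and dividing by $\sigma^2$ gives the variance 1/n.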
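PARAMETER ESTIMATION for Hypothesis with Gaussian Distributions
The sample mean $\bar{x}$ for a hypothesis with Gaussian distributions, $\mu$ unknown and $\sigma$ known, can be used to estimate the mean $\mu$, since the estimate error $\bar{x} - \mu$ is unbiased, $\langle \bar{x} - \mu \rangle = 0$, and converges in the statistical sense that
$$\langle (\bar{x} - \mu)^2 \rangle = \frac{\sigma^2}{n} \to 0 \quad \text{as} \quad n \to \infty,$$
where $\langle \cdot \rangle$ denotes the mean of a random variable.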
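The claim that the estimate error is unbiased and shrinks with the sample size can be illustrated by simulation. This is a minimal sketch, not part of the lecture: the mean 170 and standard deviation 7 are hypothetical, and the helper name `sample_mean_errors` is introduced only for this example.

```python
import random
import statistics


def sample_mean_errors(mu, sigma, n, trials, seed=0):
    """Draw `trials` Gaussian samples of size n and return each sample mean minus mu."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(statistics.fmean(sample) - mu)
    return errors


# Hypothetical population parameters, for illustration only.
mu, sigma = 170.0, 7.0
for n in (5, 20, 80):
    errs = sample_mean_errors(mu, sigma, n, trials=4000)
    # Columns: n, average error (near 0), observed error variance, predicted sigma^2 / n
    print(n, statistics.fmean(errs), statistics.pvariance(errs), sigma ** 2 / n)
```

The average error stays near zero for every n, while the observed variance of the error tracks the predicted value sigma^2 / n.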
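MORE TEST STATISTICS for Hypothesis with Gaussian Distributions
The One Sample z-Test statistic
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
for $\mu$ and $\sigma$ known is a Gaussian random variable with mean 0 and variance 1.
The One Sample t-Test statistic
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
for $\mu$ known and $\sigma$ unknown is a t-distributed random variable with n-1 degrees of freedom.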
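The two statistics can be computed in a few lines. The sketch below assumes the standard forms: the z statistic divides the sample mean error by sigma / sqrt(n), and the t statistic replaces sigma by the sample standard deviation with an n-1 denominator. The function names and the sample values are hypothetical.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """One sample z statistic: hypothesized mean mu0, known standard deviation sigma."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """One sample t statistic: hypothesized mean mu0, sigma estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # uses the n-1 denominator
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))


# Hypothetical sample, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(z_statistic(sample, mu0=2.0, sigma=1.0))
print(t_statistic(sample, mu0=2.0))
```

The z value is compared with the Gaussian distribution, the t value with a t table for n-1 degrees of freedom.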
z-TEST STATISTIC ALPHAS
CRITICAL REGION FOR alpha=0.05
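The boundary of a critical region for the z-test statistic can be computed from the inverse Gaussian distribution function. The slide title does not say whether the region is one-sided or two-sided, so this sketch shows both; the function name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha, two_sided=False):
    """Return c such that P(z > c) = alpha (one-sided) or P(|z| > c) = alpha (two-sided)."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)


print(round(z_critical(0.05), 3))                  # one-sided boundary, about 1.645
print(round(z_critical(0.05, two_sided=True), 3))  # two-sided boundary, about 1.960
```

With a one-sided region the null hypothesis is rejected when z exceeds the first value; with a two-sided region, when |z| exceeds the second.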
HEIGHT HISTOGRAMS
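HYPOTHESIS TEST FOR MEAN HEIGHT
You suspect that the height of males in a country has increased due to diet or a Martian conspiracy. You aim to support your Alternative Hypothesis (the mean height has increased) by testing the Null Hypothesis (the mean height is unchanged). You compute a sample mean using 20 samples, then compute the z-test statistic. If the Null Hypothesis is true, the probability that the statistic is at least as large as the computed value is the upper tail probability of a Gaussian with mean 0 and variance 1.
Question: Should the Null Hypothesis be rejected?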
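The procedure on this slide can be sketched as follows. Only the sample size of 20 comes from the slide; the null mean of 170, the standard deviation of 7, the sample mean of 173, and the significance level 0.05 are hypothetical values chosen for illustration, and the test is assumed to be one-sided.

```python
import math
from statistics import NormalDist


def z_test_upper_tail(sample_mean, mu0, sigma, n):
    """Return the z statistic and the probability of a value at least that large under the null."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# Hypothetical numbers; only n = 20 is taken from the slide.
z, p = z_test_upper_tail(sample_mean=173.0, mu0=170.0, sigma=7.0, n=20)
alpha = 0.05  # assumed significance level
print(f"z = {z:.3f}, upper tail probability = {p:.4f}")
print("reject the null hypothesis" if p < alpha else "do not reject the null hypothesis")
```

Rejecting when the tail probability is below alpha is equivalent to rejecting when z lies in the one-sided critical region for that alpha.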
GENERAL HYPOTHESES TESTS
General hypotheses tests involve null and alternative hypotheses and more complicated test statistics, such as the One Sample t-Test statistic, whose distribution is determined even though the distribution of the Gaussian random samples used to compute it is not.
Type I Error: rejecting the null hypothesis if it is true. Its probability is also called the significance level.
Type II Error: failing to reject the null hypothesis if it is false. One minus its probability is called the power of the test; computing it requires an Alternative Hypothesis that determines the distribution of the test statistic.
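For a one-sided z-test the two error probabilities can be computed directly once an alternative mean is fixed. This is a sketch under stated assumptions: the test rejects when z exceeds the one-sided critical value, both hypotheses share the same known sigma, and all numbers below are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha):
    """Probability of rejecting the null (mean mu0) when the true mean is mu1."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha)                 # one-sided critical value
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))      # mean of z under the alternative
    return 1.0 - std.cdf(z_crit - shift)


# Hypothetical parameters, for illustration only.
power = z_test_power(mu0=170.0, mu1=173.0, sigma=7.0, n=20, alpha=0.05)
type_2_error = 1.0 - power
print(f"power = {power:.3f}, Type II error probability = {type_2_error:.3f}")
```

When mu1 equals mu0 the function returns alpha, which is the Type I error probability.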
Homework 5. Due Monday
1. Compute the power of a hypothesis test whose null hypothesis is that in vufoil #13. The alternative hypothesis asserts that heights are normally distributed with a specified mean, where the remaining parameters are the same as for the null hypothesis, 20 samples are used, and the significance level is specified. Suggestion: if the alternative hypothesis is true, what is the distribution of the test statistic? What is the probability that it falls in the critical region?
2. Use a t-statistic table to describe how to test the null hypothesis that heights are normal with a specified mean and unknown variance, based on 20 samples.
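EXTRA TOPIC: CONFIDENCE INTERVALS
Given a sample mean $\bar{x}$ for large $n$ we can assume, by the central limit theorem, that it is Gaussian with mean $\mu$ = mean of the original population and variance $\sigma^2/n$, where $\sigma^2$ = variance of the original population. Furthermore, $\sigma^2$ is approximately the sample variance, and if the population is {0,1}-valued then $\sigma^2 = \mu(1-\mu)$.
We say that $|\bar{x} - \mu| \le \delta$ with confidence
$$c = \int_{-\delta}^{\delta} p(x)\,dx$$
where $p(x)$ is the probability density of a Gaussian with mean 0 and standard deviation $\sigma/\sqrt{n}$.
Theorem (Bayes Theorem setting): the mean is treated as a random variable unif. on [-L, L].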
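A confidence interval of this kind can be computed from the Gaussian approximation. The sketch below assumes a symmetric interval around the sample mean and, for a {0,1}-valued population, estimates the standard deviation from the observed proportion; the function names and the numbers (proportion 0.4, 1000 samples, confidence 0.95) are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, confidence):
    """Symmetric interval around the sample mean under the Gaussian approximation."""
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def proportion_sigma(p):
    """Standard deviation of a {0,1}-valued population with proportion p of ones."""
    return math.sqrt(p * (1.0 - p))


# Hypothetical survey: 40% ones among 1000 samples.
low, high = confidence_interval(0.4, proportion_sigma(0.4), n=1000, confidence=0.95)
print(f"{low:.4f} to {high:.4f}")
```

Quadrupling the sample size halves the width of the interval, since the standard deviation of the sample mean scales as 1/sqrt(n).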
EXTRA TOPIC: TWO SAMPLE TESTS
A null hypothesis may assert that two populations have the same means; a special case for {0,1}-valued populations asserts equality of population proportions. Under these assumptions, and if the variances of both populations are known, hypothesis testing uses the Two-Sample z-Test Statistic
$$z = \frac{\bar{x} - \tilde{\bar{x}}}{\sqrt{\sigma^2/n + \tilde{\sigma}^2/\tilde{n}}}$$
where $\bar{x}$, $\sigma^2$, $n$ are the sample mean, variance, and sample size for one population, tildes for the other. For unknown variances and other cases consult:
EXTRA TOPIC: CHI-SQUARED TESTS
Chi-squared tests are used to determine goodness-of-fit for various distributions. They employ test statistics of the form
$$\chi^2 = \sum_{i=1}^{d} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ = observations & $E_i$ = null hyp. expected value. The observations are independent and the statistic is approximately chi-squared distrib. with d-1 degrees of freedom.
Example [1, p.216] A geneticist claims that four species of fruit flies should appear in the ratio 1:3:3:9. Suppose that the sample of 4000 flies contained 226, 764, 733, and 2277 flies of each species, respectively. For alpha = .1, is there sufficient evidence to reject the geneticist's claim?
Answer: The expected values are 250, 750, 750, 2250, hence
$$\chi^2 = \frac{(226-250)^2}{250} + \frac{(764-750)^2}{750} + \frac{(733-750)^2}{750} + \frac{(2277-2250)^2}{2250} \approx 3.27 < 6.25,$$
so NO, since 6.25 is the critical value for 3 deg. of freedom & alpha = .1.
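The fruit fly example can be reproduced with a short script. The observed counts and the ratio 1:3:3:9 come from the slide; the critical value 6.251 is the standard table entry for 3 degrees of freedom at alpha = 0.1, and the closed-form tail probability used here holds only for 3 degrees of freedom. Function names are hypothetical.

```python
import math


def expected_counts(ratios, total):
    """Split `total` according to the claimed ratios."""
    return [total * r / sum(ratios) for r in ratios]


def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_tail_3dof(x):
    """P(chi-squared with 3 degrees of freedom > x), closed form valid for 3 d.o.f. only."""
    return 1.0 - math.erf(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


observed = [226, 764, 733, 2277]
expected = expected_counts([1, 3, 3, 9], sum(observed))
stat = chi_squared_statistic(observed, expected)
critical = 6.251  # table value for 3 degrees of freedom, alpha = 0.1
print(expected, round(stat, 3), round(chi2_tail_3dof(stat), 3))
print("reject the claim" if stat > critical else "insufficient evidence to reject the claim")
```

The statistic falls well below the critical value, in agreement with the answer NO on the slide.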
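EXTRA TOPIC: POISSON APPROXIMATION
The Binomial Distribution
$$B(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$$
is the probability that k events happen in n trials if each trial independently produces an event with probability $p$. It has mean $np$ and variance $np(1-p)$.
If $n \to \infty$ with $np = \lambda$ fixed, then
$$B(k; n, p) \to \frac{\lambda^k}{k!} e^{-\lambda}.$$
The right side is the Poisson Distribution.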
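The quality of the approximation can be checked numerically. This sketch assumes the usual setting of many trials with a small event probability, with the Poisson parameter taken as lambda = n * p; the values n = 1000 and p = 0.003 are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials with event probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical parameters: many trials, small probability.
n, p = 1000, 0.003
for k in range(6):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, n * p))
```

The two columns agree closely for every k, and the agreement improves as n grows with n * p held fixed.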
REFERENCES
1. Martin Sternstein, Statistics, Barron's College Review Series, New York. Survey textbook covers probability distributions, hypotheses tests, populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses, New York. Detailed development of the Neyman-Pearson theory of hypotheses testing.
3. J. Neyman and E. S. Pearson, Joint Statistical Papers, Cambridge University Press. Source materials.
4. Jan von Plato, Creating Modern Probability, Cambridge University Press. Charts the history and development of modern probability theory.