Distribution functions


1 Distribution functions
There is some core terminology we need to know in order to use distribution functions. Probability function (pf) / probability density function (pdf): the terms are used interchangeably, although the traditional convention is to use pf for discrete distributions and pdf for continuous ones. In the discrete case the pf gives the probability that in a random experiment you will observe outcome x. The pdf of a continuous distribution does not have the same interpretation. Statistical Data Analysis - Lecture08 21/03/03

2 Cumulative distribution functions
The cumulative distribution function (cdf) gives us Pr(X ≤ x), "the probability that our random variable X is less than or equal to x". For a discrete random variable this is the sum of the probabilities of all outcomes less than or equal to x. For a continuous random variable we replace the sum by an integral.
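For a discrete distribution the cdf really is just this sum. A minimal sketch in Python (the lecture uses R; this translation and the binomial example are my own), summing the binomial pf over all outcomes up to x:

```python
import math

def binom_pf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    """Pr(X <= x): sum the pf over all outcomes up to and including x."""
    return sum(binom_pf(k, n, p) for k in range(0, x + 1))

print(binom_cdf(5, 10, 0.5))  # Pr(X <= 5) for X ~ Binomial(10, 0.5)
```

Summing over all n + 1 outcomes gives 1, as a cdf must.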

3 Inverse cumulative distribution functions
Given a probability p, the inverse cdf gives us the value x such that Pr(X ≤ x) = p. These three functions are very important: they help us do many different things in statistics. Some calculations can be done by hand or with tables, but in general we need to use the computer to answer our questions. To use tables or the computer, many distributions require extra parameters.
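For the Normal case, Python's standard library exposes the cdf and inverse cdf directly (a sketch of my own; the lecture itself uses R, where the corresponding calls are pnorm and qnorm):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Inverse cdf: given p, find the x with Pr(X <= x) = p
x = std_normal.inv_cdf(0.975)
print(round(x, 4))  # the familiar two-sided 5% critical value, about 1.96

# Round trip: applying the cdf to the inverse cdf recovers p
print(round(std_normal.cdf(x), 6))
```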

4 So distribution functions we should know about
Discrete: Binomial, Hypergeometric, Poisson, Uniform
Continuous: Chi-square, Exponential, F, Normal, Student's t, Uniform

5 Working with continuous pdfs
To use the Normal distribution: nothing extra is needed, as long as the statistic has been standardised.
To use Student's t or chi-square: need the degrees of freedom.
To use F: need the numerator and denominator degrees of freedom.

6 Some calculations in R
Let X ~ N(5, 3). Find:
Pr(X > 8); Pr(X < -8); Pr(2 < X < 7)
If Pr(X > x) = , what is x?
What value does the Normal pdf take at the point x = 5?
Let T ~ t with 4 df. Find:
Pr(T > 0); Pr(|T| > 2)
If Pr(T > t) = , what is t?
Let X² ~ chi-square with 5 df. If X² = 1.944, what is the P-value?
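The Normal parts of this exercise can also be done with Python's standard library; a sketch of my own, reading N(5, 3) as mean 5 and standard deviation 3 (the convention R's pnorm/dnorm use for their second and third arguments):

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=3)  # assuming N(5, 3) means mean 5, sd 3

pr_gt_8 = 1 - X.cdf(8)           # Pr(X > 8)
pr_lt_m8 = X.cdf(-8)             # Pr(X < -8), far in the lower tail
pr_2_to_7 = X.cdf(7) - X.cdf(2)  # Pr(2 < X < 7)
pdf_at_5 = X.pdf(5)              # pdf at x = 5, the peak of the density

print(round(pr_gt_8, 4))   # about 0.1587, since 8 is one sd above the mean
```

In R the first line would be 1 - pnorm(8, 5, 3). The t and chi-square questions need the corresponding pt and pchisq functions (or scipy.stats in Python).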

7 Some key facts about the distribution of test statistics
If we know the population mean, μ, and standard deviation, σ, then any statistic of the form Z = (x̄ − μ)/(σ/√n) will have a N(0, 1) distribution. If we don't know σ and substitute the sample standard deviation s, then the resulting test statistic T = (x̄ − μ)/(s/√n) is distributed Student's t.
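The standardisation is simple arithmetic; a quick Python sketch with summary numbers of my own invention:

```python
import math

# Hypothetical summary data: sample mean 5.2 from n = 36 observations,
# with known population mean 5 and standard deviation 1.5
xbar, mu, sigma, n = 5.2, 5.0, 1.5, 36

# Standardised statistic: compare against N(0, 1)
z = (xbar - mu) / (sigma / math.sqrt(n))
print(round(z, 4))  # 0.8
```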

8 Some key facts about the distribution of test statistics
If we have a standard normal random variable (rv) and we square it, the resulting rv is chi-square with 1 df. If we sum n squared standard normal rvs, the result is a chi-square rv with n degrees of freedom. If we have two independent chi-square rvs, X1 and X2, with df equal to n1 and n2 respectively, the ratio F = (X1/n1)/(X2/n2) has an F distribution with n1 numerator degrees of freedom and n2 denominator degrees of freedom.
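These facts can be checked empirically. A small simulation sketch (sample size and seed are arbitrary choices of mine), using the fact that a chi-square rv with n df has mean n:

```python
import random

random.seed(1)
N = 200_000

# Square one standard normal: chi-square with 1 df, so the sample mean
# of many such squares should be close to 1
chisq1 = [random.gauss(0, 1) ** 2 for _ in range(N)]
print(round(sum(chisq1) / N, 2))

# Sum of 3 squared standard normals: chi-square with 3 df, mean close to 3
chisq3 = [sum(random.gauss(0, 1) ** 2 for _ in range(3)) for _ in range(N)]
print(round(sum(chisq3) / N, 2))
```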

9 Degrees of freedom
Some of what is in the previous slides looks contradictory to what we've seen already or perhaps know. For this course, let the df be dictated by the standard formulae (which I will tell you) for each problem. If you want to know why the formulae are what they are, take !

10 Some common hypothesis tests
One-sample t test:
1. Tests H0: μ = μ0, where μ0 is some hypothesised value
2. Alternatives: one-sided H1: μ > μ0 or H1: μ < μ0; two-sided H1: μ ≠ μ0
3. Test statistic: t = (x̄ − μ0)/(s/√n)
4. P-value from Student's t distribution with df = n − 1
5. P-value < 0.05 ⇒ significance at the 5% level
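Steps 3 and 4 are easy to compute; a Python sketch with a made-up sample (the lecture's own calculations are in R):

```python
import math
from statistics import mean, stdev

# Hypothetical sample; testing H0: mu = 5
x = [5.1, 4.9, 5.3, 5.0, 5.2]
mu0 = 5.0

n = len(x)
# Test statistic: (estimate - hypothesised value) / standard error
t = (mean(x) - mu0) / (stdev(x) / math.sqrt(n))
print(round(t, 4))  # compare against Student's t with n - 1 = 4 df
```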

11 Paired t-test
Really just a one-sample t test in disguise. We have two measurements for each of n subjects and examine the differences, e.g. weight before and after a diet. Null hypothesis: H0: μ_diff = 0 (hypothesis of no difference). The alternative hypothesis is usually two-tailed: H1: μ_diff ≠ 0. Test statistic: t = d̄/(s_d/√n), where d̄ and s_d are the mean and standard deviation of the n differences, on n − 1 df.
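The "in disguise" point is visible in code: form the differences, then apply the one-sample statistic to them. A Python sketch with hypothetical before/after data of my own:

```python
import math
from statistics import mean, stdev

# Hypothetical before/after weights for n = 3 subjects
before = [80.0, 75.0, 90.0]
after = [78.0, 74.0, 86.0]

d = [b - a for b, a in zip(before, after)]  # differences: [2.0, 1.0, 4.0]
n = len(d)

# Exactly the one-sample t statistic, applied to the differences (mu0 = 0)
t = mean(d) / (stdev(d) / math.sqrt(n))
print(round(t, 4))  # compare against t with n - 1 = 2 df
```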

12 Two-sample t-test
Two independent samples of sizes nx and ny.
Null hypothesis: H0: μx − μy = Δ0 (usually Δ0 = 0, the hypothesis of no difference).
Alternative hypothesis: H1: μx − μy ≠ Δ0 (usually two-tailed, but can be one-tailed as well).
Test statistic (this assumes unequal variances): t = (x̄ − ȳ − Δ0)/√(s²x/nx + s²y/ny)
5. P-value on nx + ny − 2 df (but see Welch's modification later: with unequal variances the df are adjusted)

13 A common theme?
We've seen three different hypothesis tests, and all three have the same form for their test statistic, namely: "the estimate, minus the hypothesised value, divided by the standard error of the estimate, has a Student's t distribution".

14 Chi-square / goodness-of-fit (GOF) tests
Usually for one-way or two-way (or n-way) tables.
Null hypothesis: the observed counts follow the hypothesised distribution (for a two-way table, that the rows and columns are independent).
Test statistic: X² = Σ (O − E)²/E, summed over all cells.
P-value on k − 1 df for a one-way table and (nr − 1)(nc − 1) df for a two-way table.
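The GOF statistic is a one-liner once the observed and expected counts are laid out; a sketch with a hypothetical one-way table of my own (60 die rolls under H0 of a fair die):

```python
# Hypothetical one-way table: 60 rolls of a die, H0: all faces equally likely
observed = [12, 8, 10, 9, 11, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face

# X^2 = sum over cells of (observed - expected)^2 / expected
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(x2, 4))  # compare against chi-square with k - 1 = 5 df
```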

15 F tests
Most useful to us in ANOVA. Hypotheses vary, but a general one might be H0: σ²1 = σ²2 against H1: σ²1 ≠ σ²2. If H0 is true, the ratio of sample variances F = s²1/s²2 has an F distribution. Note this will look slightly different from what we see in ANOVA. There are reasons.

16 Two-sample t-test
Two independent samples of sizes nx and ny.
Null hypothesis: H0: μx − μy = Δ0 (usually Δ0 = 0, the hypothesis of no difference).
Alternative hypothesis: H1: μx − μy ≠ Δ0 (usually two-tailed, but can be one-tailed as well).
Test statistic (this assumes unequal variances): t = (x̄ − ȳ − Δ0)/√(s²x/nx + s²y/ny)

17 Welch’s modification to the two-sample t-test continued
When we assume the variances are not equal, the degrees of freedom value is not nx + ny − 2. In fact the df are given by the Welch–Satterthwaite formula:
df = (s²x/nx + s²y/ny)² / [ (s²x/nx)²/(nx − 1) + (s²y/ny)²/(ny − 1) ]
You do not want to calculate this by hand!
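Indeed, this is one for the computer. A Python sketch of the formula, with summary numbers of my own chosen so the adjustment is clearly visible:

```python
# Welch-Satterthwaite df for hypothetical summary statistics:
# sample variances 1 and 16, sample sizes 10 and 5
sx2, nx = 1.0, 10
sy2, ny = 16.0, 5

a, b = sx2 / nx, sy2 / ny  # the two per-sample variance terms
df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
print(round(df, 2))  # about 4.25, far below nx + ny - 2 = 13
```

With very unequal variances and sample sizes, the Welch df can be dramatically smaller than the pooled nx + ny − 2.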

18 Pooled two sample t-test
When we assume the variances are equal, then the pooled test is performed instead. This uses a pooled estimate of the standard error, namely sp √(1/nx + 1/ny), where s²p = ((nx − 1)s²x + (ny − 1)s²y)/(nx + ny − 2), with the P-value on nx + ny − 2 df.
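A sketch of the pooled estimate, using the same hypothetical summary numbers as above so the two approaches can be compared:

```python
import math

# Hypothetical summaries: sample variances 1 and 16, sizes 10 and 5
sx2, nx = 1.0, 10
sy2, ny = 16.0, 5

# Pooled variance: a df-weighted average of the two sample variances
sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)

# Pooled standard error of the difference in means
se = math.sqrt(sp2) * math.sqrt(1 / nx + 1 / ny)
print(round(sp2, 4), round(se, 4))
```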

19 Differences between pooled test and Welch’s test
Most introductory textbooks recommend the routine use of Welch's modification. Why? Because the assumption of equal variances is often hard to justify. Are there some costs associated with this? Yes, but we need to learn a couple of terms first.

