1
Parameter, Statistic and Random Samples
A parameter is a number that describes the population. It is a fixed number, but in practice we do not know its value. A statistic is a function of the sample data, i.e., it is a quantity whose value can be calculated from the sample data. It is a random variable with a distribution function. Statistics are used to make inferences about unknown population parameters. The random variables X1, X2, …, Xn are said to form a (simple) random sample of size n if the Xi’s are independent random variables and each Xi has the same probability distribution. We say that the Xi’s are iid (independent and identically distributed). STA286 week 8
2
Example – Sample Mean and Variance
Suppose X1, X2, …, Xn is a random sample of size n from a population with mean μ and variance σ². The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The sample variance is defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. The sample standard deviation, S, is the square root of the sample variance.
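The sample mean, variance and standard deviation can be computed directly from their usual definitions (the sum of the observations divided by n, and the sum of squared deviations from the sample mean divided by n − 1). The following is a minimal Python sketch; the function names and the data values are illustrative assumptions, not part of the course material.

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with divisor n - 1 (needs at least two observations)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data), sample_variance(data), sample_std(data))
```

Python's standard library offers `statistics.mean`, `statistics.variance` and `statistics.stdev`, which use the same n − 1 divisor.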
3
Quantiles
A quantile of a sample, xp, is a value such that a fraction p of the data values is less than or equal to it and a fraction (1 − p) is greater than it. The best-known quantile is the median, which is the 0.5 quantile (the 50th percentile). Quantiles are often described as percentiles and represent estimates of characteristics of the theoretical distribution. If a data set contains n observations, then the pth percentile is the value in the ordered data set. We can describe the spread or variability of a distribution by giving several percentiles.
4
Quartiles
The 25th percentile is called the first quartile (Q1). The 75th percentile is called the third quartile (Q3). Note that the median is the second quartile, Q2. The distance between the first and third quartiles is called the interquartile range (IQR), i.e. IQR = Q3 − Q1. The IQR is another measure of spread, one that is less sensitive to the influence of extreme values.
5
The five-number summary
The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation. These five numbers give a reasonably complete description of both the center and the spread of the distribution.
MINITAB commands: Stat > Basic Statistics > Display Descriptive Statistics
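Outside MINITAB, the five-number summary and the IQR can be sketched in a few lines of Python. This is only an illustration: the data are hypothetical, the function name is made up, and the quartile rule used here (`method="inclusive"`, which interpolates between order statistics) is an assumption. Statistical packages use several different quartile conventions, so their Q1 and Q3 may differ slightly.

```python
import statistics

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(xs)
    # Quartiles by linear interpolation between order statistics.
    # Other software may use a different rule for Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical observations, for illustration only.
values = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(values)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print("five-number summary:", summary)
print("IQR:", iqr)
```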
6
Example
The highway mileages of 20 cars, arranged in increasing order, are:
Give the five-number summary.
Answer
We have min = 13, Q1 = 18, median = 23, Q3 = 27, max = 32. The MINITAB output using the above commands is as follows:
Variable  N  Minimum  Q1  Median  Q3  Maximum
mileage
7
Box-plot
A box-plot is a graph of the five-number summary.
Example: Make a box-plot for the data in the above example.
MINITAB commands: Graph > Boxplot
8
Quantile Plots
A quantile plot is a plot of the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value… A very useful quantile plot is the normal quantile-quantile (NQQ) plot. It is often used by analysts to assess whether a data set came from a normal distribution. A normal quantile-quantile plot is a plot of the empirical (data) quantiles against the corresponding quantiles of the normal distribution…
9
Interpreting Normal Quantile Plots
If the data come from a normal distribution, whatever its mean and variance, the points on the NQQ plot fall approximately on a straight line. So if the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are plausibly normal. Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
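The coordinates of a normal quantile plot can be computed by pairing the ordered data with standard normal quantiles. The sketch below is one possible construction under stated assumptions: the plotting positions (i − 0.5)/n are a common convention but not the only one, and the function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(xs):
    """Pair each ordered observation with a standard normal quantile.

    Returns a list of (theoretical_quantile, data_value) tuples.
    """
    data = sorted(xs)
    n = len(data)
    std_normal = NormalDist(0.0, 1.0)
    # Plotting position (i - 0.5) / n is an assumed convention;
    # software packages differ in this choice.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))

# Hypothetical sample, for illustration only.
sample = [498.2, 512.7, 476.9, 503.4, 521.0, 489.5, 507.8]
for q, x in normal_qq_points(sample):
    print(f"{q:7.3f}  {x:7.1f}")
```

Plotting the second coordinate against the first gives the normal quantile plot; points lying close to a straight line are consistent with normal data.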
10
Histogram, the nscores plot and the normal quantile plot for data generated from a normal distribution (N(500, 20)).
11
Histogram, the nscores plot and the normal quantile plot for data generated from a right-skewed distribution.
13
Histogram, the nscores plot and the normal quantile plot for data generated from a left-skewed distribution.
15
Histogram, the nscores plot and the normal quantile plot for data generated from a uniform distribution on (0, 5).
17
Sampling Distribution of a Statistic
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. The distribution of a statistic is in general NOT the same as the distribution of the population that generated the sample. The form of the theoretical sampling distribution of a statistic depends on the distribution of the observable random variables in the sample.
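A small simulation can make the idea concrete. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1), which is not from the slides, and approximates the sampling distribution of the sample mean by drawing many samples of the same size. The function name, sample size, number of replications and seed are all illustrative choices.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1); return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(286)  # fixed seed so the run is reproducible
means = simulate_sample_means(n=25, reps=4000, rng=rng)

# The population is right skewed with mean 1 and sd 1, but the sample means
# cluster around 1 with a much smaller spread (about 1 / sqrt(25) = 0.2).
print("mean of sample means:", statistics.fmean(means))
print("sd of sample means:  ", statistics.stdev(means))
```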
18
Sampling from Normal population
Often we assume the random sample X1, X2, …, Xn is from a normal population with unknown mean μ and variance σ². Suppose we are interested in estimating μ and testing whether it is equal to a certain value. For this we need to know the probability distribution of the estimator of μ.
19
Sampling Distribution of Sample Mean
Suppose X1, X2, …, Xn are i.i.d. normal random variables with unknown mean μ and variance σ². Then $\bar{X} \sim N(\mu, \sigma^2/n)$, i.e. $\bar{X}$ is normal with mean μ and variance σ²/n. Proof: …
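One standard way to see that the sample mean of i.i.d. normal variables is again normal, with mean μ and variance σ²/n, uses moment generating functions. The following is a sketch of that argument, not necessarily the proof given in lecture; it relies on the known mgf of a normal variable and on independence.

```latex
% mgf of a single N(mu, sigma^2) observation
M_{X_i}(t) = \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)

% mgf of the sample mean, using independence of X_1, ..., X_n
M_{\bar{X}}(t)
  = E\!\left[e^{t\bar{X}}\right]
  = \prod_{i=1}^{n} M_{X_i}\!\left(\tfrac{t}{n}\right)
  = \left[\exp\!\left(\frac{\mu t}{n} + \frac{\sigma^2 t^2}{2n^2}\right)\right]^{n}
  = \exp\!\left(\mu t + \tfrac{1}{2}\,\frac{\sigma^2}{n}\,t^2\right)
```

The last expression is the mgf of a normal distribution with mean μ and variance σ²/n, and the mgf determines the distribution.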
20
The Central Limit Theorem
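Let X1, X2, … be a sequence of i.i.d. random variables with mean E(Xi) = μ < ∞ and Var(Xi) = σ² < ∞. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ converges in distribution to Z ~ N(0,1). Also, $\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$ converges in distribution to Z ~ N(0,1). Example…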
21
Example
Suppose that the weights of airline passengers are known to have a distribution with a mean of 75 kg and a standard deviation of 10 kg. A certain plane has a passenger weight capacity of 7700 kg. What is the probability that the total weight of a flight of 100 passengers will exceed the capacity?
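One way to work this example is with the Central Limit Theorem: the total weight of 100 independent passengers is approximately normal with mean 100 × 75 = 7500 kg and standard deviation 10 × √100 = 100 kg. The Python sketch below carries out that approximation; the function names are illustrative, and independence of the passengers' weights is an assumption.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_total_exceeds(mu, sigma, n, capacity):
    """CLT approximation to P(X1 + ... + Xn > capacity) for i.i.d. Xi."""
    total_mean = n * mu                # mean of the sum
    total_sd = sigma * math.sqrt(n)    # sd of the sum
    z = (capacity - total_mean) / total_sd
    return z, 1.0 - normal_cdf(z)

z, p = prob_total_exceeds(mu=75.0, sigma=10.0, n=100, capacity=7700.0)
print(f"z = {z:.2f}, P(total > capacity) is approximately {p:.4f}")
```

Here z = (7700 − 7500)/100 = 2, so the probability is roughly 1 − Φ(2), a little over 2%.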
22
Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling distribution of the sample mean decreases.
(ii) As the sample size increases, the standard deviation of the sampling distribution of the sample mean decreases.
(iii) The mean of a random sample of size 4 from a negatively skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes in a sufficiently large sample is approximately normal with mean p and standard deviation …, where p is the population proportion and n is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from a N(500, 18) distribution, then $\bar{X}$ has a normal distribution with mean 500 and variance 36.
23
Question
State whether the following statements are true or false.
A large sample from a skewed population will have an approximately normal-shaped histogram.
The mean of a population will be normally distributed if the population is quite large.
The average blood cholesterol level recorded in an SRS of students from a large population will be approximately normally distributed.
The proportion of people with incomes over $ , in an SRS of 10 people, selected from all Canadian income tax filers will be approximately normal.
24
Exercise
A parking lot is patrolled twice a day (morning and afternoon). In the morning, the chance that any particular spot has an illegally parked car is … . If the spot contained a car that was ticketed in the morning, the probability the spot is also ticketed in the afternoon is 0.1. If the spot was not ticketed in the morning, there is a … chance the spot is ticketed in the afternoon.
a) Suppose tickets cost $10. What is the expected value of the tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets are written in a day?
25
Law of Large Numbers - Example
Toss a coin n times. Suppose the Xi’s are Bernoulli random variables with p = ½, so that E(Xi) = ½. The proportion of heads is $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Intuitively, $\bar{X}_n$ approaches ½ as n → ∞.
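The coin-tossing intuition is easy to check by simulation. The sketch below simulates fair coin tosses with Python's pseudo-random generator and prints the proportion of heads for increasing n; the function name, the seed and the particular values of n are arbitrary illustrative choices.

```python
import random

def proportion_of_heads(n, rng):
    """Simulate n fair coin tosses and return the proportion of heads."""
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

rng = random.Random(286)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```

The printed proportions should settle closer and closer to ½ as n grows, although any single run can still wander for small n.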
26
Law of Large Numbers
We are interested in a sequence of random variables X1, X2, X3, … that are independent and identically distributed (i.i.d.). Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Suppose E(Xi) = μ and V(Xi) = σ². Then $E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \frac{\sigma^2}{n}$. Intuitively, as n → ∞, $V(\bar{X}_n) \to 0$, so $\bar{X}_n \to \mu$.
27
Formally, the Weak Law of Large Numbers (WLLN) states the following:
Suppose X1, X2, X3, … are i.i.d. with E(Xi) = μ < ∞ and V(Xi) = σ² < ∞, and let $\bar{X}_n$ be the sample mean of X1, …, Xn. Then for any positive number a, $P(|\bar{X}_n - \mu| > a) \to 0$ as n → ∞. This is called convergence in probability.
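Under the finite-variance assumption, the WLLN follows in one line from Chebyshev's inequality. The sketch below is a standard argument and may differ from the one used in lecture; $\bar{X}_n$ denotes the mean of the first n variables.

```latex
% Mean and variance of the sample mean for i.i.d. X_i
E(\bar{X}_n) = \mu, \qquad V(\bar{X}_n) = \frac{\sigma^2}{n}

% Chebyshev's inequality applied to \bar{X}_n, for any a > 0
P\!\left(|\bar{X}_n - \mu| > a\right)
  \le \frac{V(\bar{X}_n)}{a^2}
  = \frac{\sigma^2}{n a^2}
  \longrightarrow 0 \quad \text{as } n \to \infty
```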
28
Recall - The Chi Square distribution
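If Z ~ N(0,1) then X = Z² has a Chi-Square distribution with parameter 1, i.e., $X \sim \chi^2(1)$. This can be proved using the change-of-variable theorem for univariate random variables. The moment generating function of X is $M_X(t) = (1-2t)^{-1/2}$ for t < ½. If $X_i \sim \chi^2(k_i)$, i = 1, …, n, all independent, then $\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$. Proof…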
29
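Claim
Suppose X1, X2, …, Xn are i.i.d. normal random variables with mean μ and variance σ². Then $Z_i = \frac{X_i - \mu}{\sigma}$ are independent standard normal variables, where i = 1, 2, …, n, and $\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$. Proof: …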
30
Sampling Distribution of S2
Suppose X1, X2, …, Xn are i.i.d. normal random variables with mean μ and variance σ². Then $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. Further, it can be shown that $\bar{X}$ and S² are independent.
31
t distribution
Suppose Z ~ N(0,1) independent of X ~ χ²(n). Then $T = \frac{Z}{\sqrt{X/n}} \sim t(n)$, the t distribution with n degrees of freedom. Proof: using the one-dimensional change-of-variables theorem. The density function of the t-distribution is given by…
32
Claim
Suppose X1, X2, …, Xn are i.i.d. normal random variables with mean μ and variance σ². Then $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$. Proof: …
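The t result for the studentized mean can be assembled from three standard facts about a normal sample: the standardized sample mean is N(0,1), the scaled sample variance (n − 1)S²/σ² is χ²(n − 1), and the two are independent. The sketch below shows how they combine through the definition of the t distribution; the symbols Z, W and T are labels introduced here for illustration.

```latex
% Ingredients (normal sample, \bar{X} and S^2 independent)
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad
W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)

% Ratio that defines a t variable with n-1 degrees of freedom
T = \frac{Z}{\sqrt{W/(n-1)}}
  = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{S/\sigma}
  = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)
```

The unknown σ cancels, which is what makes this statistic usable for inference about μ when σ is not known.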
33
F distribution
Suppose X ~ χ²(n) independent of Y ~ χ²(m). Then $F = \frac{X/n}{Y/m} \sim F(n, m)$, the F distribution with n and m degrees of freedom. The density function of the F distribution is given by…
34
Properties of the F distribution
The F-distribution is a right-skewed distribution, i.e. … Table A.6 in the appendix can be used to find percentiles of the F-distribution. Example…