Published by Georgiana Bond. Modified over 9 years ago.

Slide 1: Statistical Inference
- We have used probability to model the uncertainty observed in real-life situations.
- We can also use the tools of probability to make inferences about these situations.
- In addition, we can assess the reliability of these inferences.

Population and Sample
Part of our uncertainty often arises because we cannot access all of the information in which we are interested.
- The population is the set of all elements of interest in a particular study.
- A sample is a selection of some of the members of the population.

Slide 2: Statistical Inference
- We will try to model the variation in some quantity measured on the members of a population by using an appropriate probability distribution.
- Remember, we are often interested in summaries of such variation, using measures such as means and standard deviations. When these are applied to populations we call them parameters; when they are applied to samples we call them statistics.
- A statistic is a numerical characteristic of a sample.
- A parameter is a numerical characteristic of a population.

Slide 3: Example 1
- Suppose that we want to know the proportion p of students banking at Bath University Barclays who are in favour of extended opening hours.
- We don't have the resources to contact everyone in the population, so we select a small sample and ask each member whether they are in favour or not.
- If this sample is representative, it seems reasonable to use the proportion of the sample in favour as an indication of the value of p.

Questions
- What makes a sample representative?
- Is this a good procedure? How reliable is the guess?
N.B. The population is finite.
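
Example 1 can be sketched in a few lines of Python. This is only an illustration: the population size (2000), the true proportion (0.6) and the sample size (200) are all made-up assumptions, and simple random sampling without replacement is used as one common way of aiming for a representative sample.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: 2000 customers, 1 = in favour, 0 = not.
# The true proportion p = 0.6 is an assumption made up for this sketch.
population = [1] * 1200 + [0] * 800

# Simple random sample without replacement: every customer is equally likely to be chosen
sample = random.sample(population, 200)

# The sample proportion is our guess at p
p_hat = sum(sample) / len(sample)
print(f"sample proportion in favour: {p_hat:.3f}")
```

Changing the seed gives a different sample and, usually, a slightly different guess.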

Slide 4: Example 2
- A piece of new software is claimed to handle electronic transactions more efficiently than existing software.
- A broker tries out the new software for a week and is anxious to see whether the average time to complete deals improves on the value for the current software, which is 2.75 hours.
- Population: all the future deals that the software might handle.
- Sample: deals processed during the trial week.
- Parameter: average time to process all future deals.
- Statistic: average time to process deals during the trial week.
- Probability model is …?

Slide 5: Choosing and Using a Probability Model
- Typically we use a probability distribution to model the variation in the characteristic of interest, both within the population and within the sample drawn from it.
- Example: electronic transaction times. After examining the sample data via histograms etc., we might decide that a Normal distribution adequately describes the pattern of variation in deal completion times.
- Specifically, if X is a typical completion time, the assumed (or adopted) model is X ~ N(μ, σ²).

Slide 6: Statistical Inference
- But this "model" is incomplete!
- We do not know the value of μ (or the value of σ).
- Can we infer something about these population parameters from, say, the equivalent sample statistics?

Typical tasks of Statistical Inference
- POINT ESTIMATION: guess a single numerical value for a parameter.
- INTERVAL ESTIMATION: guess a set of 'likely' values for a parameter.
- HYPOTHESIS TESTING: use data to decide whether or not some assertion about the unknown value of a parameter is true.

Slide 7: Point Estimation
- Consider the electronic transaction times example.
- MODEL: typical transaction processing time, X, with X ~ N(μ, σ²).
- Over the trial week we will obtain a sample of, say, n observations, X₁, …, Xₙ.
- Let us assume that these observations are independent.
- At the end of the week we will have the actual values, x₁, …, xₙ.
- Interest is in the average transaction time in the population of future deals; this corresponds to the parameter μ in our model.
- We use the data to guess the value of the parameter.

Slide 8: Point Estimation
- Definition: a statistic used to estimate a parameter value is called a (point) estimator.
- Think of the estimator as a 'recipe' that tells you what to do with your observations in order to obtain the actual numerical guess, which is known as the estimate.
- In ideal circumstances, i.e. when the assumptions about the model are correct, the best estimator for the mean of a normal distribution, μ, is the sample mean, X̄ = (X₁ + … + Xₙ)/n.
- If the actual data values are x₁, …, xₙ, the corresponding estimate is x̄ = (x₁ + … + xₙ)/n.
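
The distinction between estimator and estimate can be illustrated with a minimal Python sketch. The function plays the role of the estimator (the recipe), and the number it returns for one particular data set is the estimate. The five completion times are hypothetical values invented for this illustration; they do not come from the slides.

```python
from statistics import fmean

def sample_mean(observations):
    """Estimator: the recipe, applicable to any sample of observations."""
    return fmean(observations)

# Hypothetical completion times in hours (illustrative values only)
times = [2.4, 3.1, 2.7, 2.2, 2.9]

# Estimate: the single number obtained by applying the recipe to this data
estimate = sample_mean(times)
print(f"estimate of mu: {estimate:.2f} hours")
```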

Slide 9: Sampling Variation and Sampling Distributions
- An estimator is composed of random variables, and so it too is a random variable.
- When we carry out the process of estimation as described, we do not really notice this random behaviour, since we obtain just a single value of the estimate.
- If we repeated the whole estimation process, drawing a different sample, we would most likely obtain a different value of the estimate. Different possible samples lead to different possible values of the estimate. This is sampling variation.
- The nature of the variation in possible values of the estimate is described by the probability distribution of the corresponding estimator.
- This probability distribution is commonly known as the sampling distribution of the estimator.

Slide 10: Sampling Distributions: Example
- If X₁, …, Xₙ ~ N(μ, σ²), independently, then X̄ ~ N(μ, σ²/n).
- N.B. The distribution of X̄:
  - is still normal,
  - has the same mean as the individual X's,
  - has a variance reduced by a factor of n.
- So X̄ is likely to be closer to μ than the individual X's.
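
The result X̄ ~ N(μ, σ²/n) can be checked informally by simulation. The sketch below repeats the whole sampling process many times and compares the mean and variance of the resulting sample means with μ and σ²/n. The values μ = 2.75, σ = 0.9 and n = 25 are assumptions chosen only for illustration.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

mu, sigma, n = 2.75, 0.9, 25   # assumed model values, for illustration only
reps = 5000                    # number of repeated samples

# Each repetition draws a fresh sample of size n and records its sample mean
means = [fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mean_of_means = fmean(means)
var_of_means = pvariance(means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma**2 / n:.4f})")
```

The simulated values should be close to, but not exactly equal to, the theoretical ones, since the simulation is itself subject to sampling variation.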

Slide 11: Indicating Precision of Estimation
- Having obtained the data values, we can quote the value of the estimate.
- But we should also indicate the likely extent of the remaining uncertainty, i.e. the reliability or variability of the estimate.
- One way to do this is to quote the standard deviation of the estimator, which is known as its standard error.
- Example: if X₁, …, Xₙ ~ N(μ, σ²), then using the sample mean to estimate μ, SE(X̄) = σ/√n.
- PROBLEM: in most cases we won't know the value of σ!

Slide 12: An Estimator for σ²
- The usual estimator is the sample variance, s² = Σ(Xᵢ − X̄)²/(n − 1).
- The corresponding estimator for σ is the sample standard deviation, s = √s².
- We can now quote the estimated standard error of the sample mean, ESE(X̄) = s/√n.
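
As a rough sketch, the estimated standard error can be computed from raw data as follows. The data values are hypothetical, and the sample variance is taken with the n − 1 divisor, which is what `statistics.variance` computes.

```python
import math
from statistics import variance

# Hypothetical completion times in hours (illustrative values only)
times = [2.4, 3.1, 2.7, 2.2, 2.9]
n = len(times)

s2 = variance(times)     # sample variance, divisor n - 1
s = math.sqrt(s2)        # sample standard deviation
ese = s / math.sqrt(n)   # estimated standard error of the sample mean

print(f"s^2 = {s2:.3f}, s = {s:.3f}, ESE = {ese:.3f}")
```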

Slide 13: Numerical Example
Electronic transaction times.
- Suppose n = 100, x̄ = 2.62 and s² = 0.81.
- ESE(X̄) = s/√n = √(0.81/100) = 0.9/10 = 0.09.
- We would report that the estimate of the mean transaction time is 2.62 hours, with an estimated standard error of 0.09 hours.
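
The arithmetic in this example can be reproduced directly from the summary figures n = 100 and s² = 0.81. The helper name `estimated_standard_error` is chosen here for illustration only.

```python
import math

def estimated_standard_error(sample_variance, n):
    """ESE of the sample mean: s / sqrt(n), written as sqrt(s^2 / n)."""
    return math.sqrt(sample_variance / n)

ese = estimated_standard_error(0.81, 100)
print(f"ESE = {ese:.2f} hours")
```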

Slide 14: Sampling Distribution of X̄
Process of Statistical Inference
1. Population with mean μ = ?
2. A sample of n elements is selected from the population.
3. The sample data provide a value x̄ for the sample mean.
4. The value of x̄ is used to make inferences about the value of μ.

Slide 15: Another Example of Point Estimation
- The ideas of estimator and estimate, sampling distribution and standard error are quite general.
- We describe another very common situation which illustrates this.

Slide 16: Opinion Polls
- Suppose that we are interested in estimating the proportion of the voting population which is in favour of adopting the Euro.
- We take a sample of n independent individuals from the voting population and ask each member of this sample whether they are pro- or anti-euro.
- Suppose r are in favour.
- The parameter of interest is the proportion of the population that is pro-euro. Call this Π.

Slide 17: Opinion Polls
- How do we model this situation?
- Let Xᵢ = 1 if the i'th person is pro-euro and Xᵢ = 0 if the i'th person is anti-euro.
- P(Xᵢ = 1) = Π and P(Xᵢ = 0) = 1 − Π.
- So R = X₁ + … + Xₙ ~ Binomial(n, Π).
- The estimator for Π is the proportion of the sample that is pro-euro, i.e. S = R/n.
- Applying this to the data, we obtain the estimate of Π, r/n.

Slide 18: Sampling Distribution of S = R/n
- P(S = r/n) = C(n, r) Π^r (1 − Π)^(n − r) for r = 0, 1, …, n, where C(n, r) = n!/(r!(n − r)!).
- E(S) = E(R)/n = nΠ/n = Π
- Var(S) = Var(R/n) = Var(R)/n² = nΠ(1 − Π)/n² = Π(1 − Π)/n
- SE(S) = √(Π(1 − Π)/n)
- ESE(S) = √((1/n)(r/n)(1 − r/n))
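
The mean and variance of S can be verified numerically by summing over the binomial probabilities. This is only a sketch: n = 20 and Π = 0.6 are assumed values, and the function name `pmf_S` is hypothetical.

```python
import math

def pmf_S(r, n, p):
    """P(S = r/n), i.e. the Binomial(n, p) probability of r successes."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.6  # assumed values, for illustration only

probs = [pmf_S(r, n, p) for r in range(n + 1)]

# Mean and variance of S computed directly from its distribution
mean_S = sum((r / n) * pr for r, pr in enumerate(probs))
var_S = sum((r / n - mean_S) ** 2 * pr for r, pr in enumerate(probs))

print(f"E(S)   = {mean_S:.4f} (formula: {p})")
print(f"Var(S) = {var_S:.4f} (formula: {p * (1 - p) / n:.4f})")
```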

Slide 19: Opinion Polls
- Numerical example.
- Suppose n = 400 and r = 240.
- Then s = r/n = 240/400 = 0.6.
- ESE(S) = √(0.6 × 0.4/400) = 0.0245.
- So the estimate of the proportion in favour of adopting the Euro is 0.6, with an estimated standard error of 0.0245.
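
The poll calculation above can be reproduced with a short Python sketch using the slide's figures, n = 400 and r = 240. The variable names are chosen here for illustration.

```python
import math

n, r = 400, 240

estimate = r / n                                   # sample proportion r/n
ese = math.sqrt(estimate * (1 - estimate) / n)     # estimated standard error of S

print(f"estimate = {estimate:.1f}, ESE = {ese:.4f}")
```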