Introduction to Inference
Did you ever cook a big pot of soup? When I was growing up, my mother used to make a rather large pot of soup or chili (eight kids). How did she know the soup was going to be good? She would taste a sample to infer what the whole pot would taste like. In statistics we might like to know about a population. Just as my mom only took a sample to infer about the whole pot, we may only have a sample of data, and from the sample we will make an inference about the population. Recall from a previous section that if the distribution of the variable in the population is, say, normal, then we can make probability statements about certain events happening (like the probability that a brand of tire will go 42,000 miles or more before it fails). But to do that we also need to know the population mean and the population standard deviation. With only a sample, we will use techniques to infer what the population values might be.
Thought experiment: When you and I think about a coin we usually say the coin has a 50% chance of coming up heads when we flip it, right? But if you flip a coin 10 times, will it come up heads 5 times? Maybe, but probably not. When we say the coin has a 50% chance of coming up heads, what we really mean is that if we flip it a really large number of times (say a million times), the relative frequency of heads will be about 0.5. Summary here: if we flip the coin a lot, it will come up heads about 50% of the time. But we hardly ever do this. For a while we will think about some ideas that we hardly ever carry out in practice, but these ideas underlie all the actual inference we conduct.
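The coin idea can be tried out with a small simulation. This is only a sketch: the function name relative_frequency_of_heads, the fixed seed, and the particular numbers of flips are choices made here for illustration, and a pseudo-random number generator stands in for a real coin.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate num_flips fair coin flips and return the share that were heads.

    The seed is an arbitrary choice made here so the run is repeatable.
    """
    rng = random.Random(seed)
    # Each flip counts as heads when the random draw falls below 0.5.
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# Few flips can land far from 0.5; many flips tend to settle near it.
for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```

With only 10 flips the relative frequency can be well away from 0.5, while with a million flips it should sit very close to 0.5.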
Another thought experiment: In the family I was born into there are a total of ten people: mom and dad and eight kids. This family could be considered a population (we are usually interested in bigger populations, but we use this one as an example). Here is a data set with the age and dominant hand of each person in the family. (This is a made-up family; the ages have changed over time.)

Person    Age   Dominant hand
Mom       76    right
Dad       77    right
Greg      53    right
Kevin     52    right
Bill      50    right
Bob       49    right
Steve     47    right
Chuck     46    right
Patty     45    left
George    43    right
The population mean age in the family is 53.8 and the population proportion of right-handed folks is 0.9. We could also find the population standard deviation of the ages, but I didn't here. Let's focus on samples of size three. Say that in a random sample we pick Greg, Bill, and Patty. The sample mean age here would be (53 + 50 + 45)/3 ≈ 49.3 and the sample proportion of right-handed people would be 2/3 ≈ 0.67. You will notice that each of these sample point estimates is wrong, in the sense that it does not match the population parameter it estimates. But let's take a different sample of three, say Mom, Kevin, and Steve. We would have a sample mean age of (76 + 52 + 47)/3 ≈ 58.3 and a sample proportion of right-handers of 1.0. Again, neither estimate matches the population parameter.
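The calculations in this example can be reproduced with a few lines of Python. This is a minimal sketch; the names FAMILY, mean_age, proportion_right, and pick are made up here for illustration, and the data are the made-up family from the table.

```python
# The made-up family: (person, age, dominant hand).
FAMILY = [
    ("Mom", 76, "right"), ("Dad", 77, "right"), ("Greg", 53, "right"),
    ("Kevin", 52, "right"), ("Bill", 50, "right"), ("Bob", 49, "right"),
    ("Steve", 47, "right"), ("Chuck", 46, "right"), ("Patty", 45, "left"),
    ("George", 43, "right"),
]

def mean_age(rows):
    """Average age over the given rows."""
    return sum(age for _, age, _ in rows) / len(rows)

def proportion_right(rows):
    """Share of the given rows whose dominant hand is right."""
    return sum(hand == "right" for _, _, hand in rows) / len(rows)

def pick(names):
    """Return the rows for the named family members."""
    return [row for row in FAMILY if row[0] in names]

# Population parameters use all ten people.
print("population:", mean_age(FAMILY), proportion_right(FAMILY))

# Sample statistics use only the three people in each sample.
for names in ({"Greg", "Bill", "Patty"}, {"Mom", "Kevin", "Steve"}):
    sample = pick(names)
    print(sorted(names), round(mean_age(sample), 2),
          round(proportion_right(sample), 2))
```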
Earlier I talked about flipping a coin a lot. By analogy (and the analogy is not perfect), if we take a lot of samples of size 3 we would get many different sample means and sample proportions. If we looked at the distribution of the sample means (and, separately, of the sample proportions) we would begin to see a pattern. For the sample means, the pattern is a distribution called the sampling distribution of sample means, and the sampling distribution comes from a repeated sampling context. In practice we only take one sample, but a great deal is known about the theoretical distributions of these statistics (and we will look at them). This is similar to knowing a coin will come up heads 50% of the time.
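Because the family is so small, the repeated sampling idea can be made concrete by listing every possible sample of size 3 instead of drawing samples at random. That is a choice made here for illustration, not something we could do with a realistic population. The sketch below assumes sampling without replacement and uses made-up names (FAMILY, sample_means, sample_props).

```python
from collections import Counter
from itertools import combinations

# The made-up family: (person, age, dominant hand).
FAMILY = [
    ("Mom", 76, "right"), ("Dad", 77, "right"), ("Greg", 53, "right"),
    ("Kevin", 52, "right"), ("Bill", 50, "right"), ("Bob", 49, "right"),
    ("Steve", 47, "right"), ("Chuck", 46, "right"), ("Patty", 45, "left"),
    ("George", 43, "right"),
]

sample_means = []
sample_props = []
# Every distinct group of three people, order ignored, nobody repeated.
for sample in combinations(FAMILY, 3):
    sample_means.append(sum(age for _, age, _ in sample) / 3)
    sample_props.append(sum(hand == "right" for _, _, hand in sample) / 3)

print("number of possible samples:", len(sample_means))
print("smallest and largest sample mean:",
      round(min(sample_means), 2), round(max(sample_means), 2))
print("average of all the sample means:",
      round(sum(sample_means) / len(sample_means), 2))

# How often each sample proportion of right-handers occurs.
print(Counter(round(p, 2) for p in sample_props))
```

Individual sample means range widely, yet their average over all possible samples comes out at the population mean age. Patterns like this are what a sampling distribution describes.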
Samples have statistics we might use to learn about population parameters. In a repeated sampling context, the statistics have patterns that we call sampling distributions. Next we turn to thinking about the population mean of a variable when we take a sample and calculate the sample mean.