Today’s Agenda

Review Homework #1 [not posted]
Probability
Application to Normal Curve
Inferential Statistics
Sampling

Probability Basics
What is the probability of picking a red marble out of a bowl with 2 red and 8 green?
There are 10 possible outcomes.
There are 2 outcomes that are red.
p(red) = 2 divided by 10
p(red) = .20

Frequencies and Probability
The probability of picking a color relates to the frequency of each color in the bowl.
8 green marbles, 2 red marbles, 10 total
p(Green) = 8/10 = .8
p(Red) = 2/10 = .2
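The link between frequencies and probabilities can be sketched in a few lines of Python. This is only an illustration using the marble counts from the slide; the variable names are made up for the example.

```python
# Marble counts from the slide: 8 green, 2 red, 10 total.
counts = {"green": 8, "red": 2}
total = sum(counts.values())

# The probability of each color is its relative frequency in the bowl.
probabilities = {color: n / total for color, n in counts.items()}

for color, n in counts.items():
    print(f"p({color}) = {n}/{total} = {probabilities[color]:.2f}")
```

Because every marble is either green or red, the two probabilities add up to 1.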

Frequencies & Probability
What is the probability of randomly selecting an individual who is extremely liberal from this sample?
p(extremely liberal) = (number of extremely liberal respondents) / 1,319 = .024 (or 2.4%)

PROBABILITY & THE NORMAL DISTRIBUTION
We can use the normal curve to estimate the probability of randomly selecting a case between 2 scores.
Probability distribution: theoretical distribution of all events in a population of events, with the relative frequency of each event

PROBABILITY & THE NORMAL DISTRIBUTION
The probability of a particular outcome is the proportion of times that outcome would occur in a long run of repeated observations.
68% of cases fall within +/- 1 standard deviation of the mean in the normal curve.
The probability, over the long run, of obtaining an outcome within one standard deviation of the mean is .68 (68%).
EXAMPLE: Say you’re hanging out in the Kirby cafeteria killing time. You’re sitting by one of the entrances and you decide to play “guess the height” of all the guys coming through the door. The rule is: when you hear someone walking toward the entrance, you guess his height, and then you ask him how tall he is when he appears. If your guess is within 4 inches of his height, you win the game.
Let’s say that the mean height for males is 70 inches, with a standard deviation of 4 inches (5'10", with about 68% of all adult males falling between 5'6" and 6'2").
Consider this probabilistically (look at the LONG RUN). We know from the normal distribution that for every 100 males that come through the door, about 68 should fall in this range. So, if you guess the mean (70 inches) every time, your chance of being right on any given observation will be .68, or 68%.
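The “guess the height” game can be checked numerically. The sketch below assumes heights are exactly normally distributed with the mean and standard deviation given in the example; `normal_cdf` is a helper written for this illustration, built on the standard library’s `math.erf`.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean_height = 70.0  # inches, from the example
sd_height = 4.0     # inches, from the example

# Guessing the mean wins whenever the true height is within 4 inches of 70.
low = mean_height - sd_height
high = mean_height + sd_height
p_win = normal_cdf(high, mean_height, sd_height) - normal_cdf(low, mean_height, sd_height)

print(f"P({low:.0f} <= height <= {high:.0f}) = {p_win:.4f}")
```

The result should come out at roughly .68, the familiar share of cases within one standard deviation of the mean.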

Probability & the Normal Distribution
Suppose the mean score on a test is 80, with a standard deviation of 7. If we randomly sample one score from the population, what is the probability that it will be as high or higher than 89?
Z for 89 = (89 - 80)/7 = 9/7 ≈ 1.29
Area in tail for Z of 1.29 = .0985
P(X ≥ 89) = .0985, or 9.85%
ALL WE ARE DOING IS THINKING ABOUT “AREA UNDER CURVE” A BIT DIFFERENTLY (SAME MATH).
The formula for proportions & probabilities is essentially the same. Just remember that one is for describing, and the other is for ESTIMATING or PREDICTING. Again, we know that the area under the normal curve adds up to 1.
QUESTIONS? IF THIS DOESN’T TOTALLY MAKE SENSE YET, THAT’S OKAY, B/C WE’LL BE TALKING ABOUT IT MORE IN THE NEXT FEW CLASSES.
Probability is important for inferential statistics. Generally we test how likely it is that a particular outcome (a score, a mean, a difference between means) could result from randomly sampling from an underlying probability distribution.
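The test-score calculation can be reproduced as a rough check. This sketch assumes scores are normally distributed; `upper_tail` is a helper defined here for illustration, using the standard library’s `math.erfc`. It reports the tail area both for Z rounded to two decimals (as when reading a Z table) and for the unrounded Z.

```python
import math


def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mean_score, sd_score, cutoff = 80.0, 7.0, 89.0

z_exact = (cutoff - mean_score) / sd_score  # 9/7
z_table = round(z_exact, 2)                 # rounded as for a Z-table lookup

print(f"Z (rounded) = {z_table:.2f}, tail area = {upper_tail(z_table):.4f}")
print(f"Z (exact)   = {z_exact:.4f}, tail area = {upper_tail(z_exact):.4f}")
```

The rounded Z gives the table value of about .0985; the unrounded Z gives a slightly larger area, a reminder that table lookups carry a small rounding error.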

Probability & the Normal Distribution
Bottom line: the normal distribution can also be thought of as a probability distribution.
Probabilities always range from 0 to 1.
0 = never happens
1 = always happens
In between = happens some percent of the time (this is where our interest lies)
NOW WE’RE GOING TO WATCH A VIDEO. IT STARS ROB REINER (MEATHEAD FROM ALL IN THE FAMILY). ACTUALLY, IT’S JUST A GUY WHO LOOKS LIKE HIM….

Inferential Statistics (intro)
Inferential statistics are used to generalize from a sample to a population.
We seek knowledge about a whole class of similar individuals, objects or events (called a POPULATION).
We observe some of these (called a SAMPLE).
We extend (generalize) our findings to the entire class.

WHY SAMPLE?
It’s often not possible to collect info. on all individuals you wish to study.
Even if possible, it might not be feasible (e.g., because of time, $, size of group).

WHY USE PROBABILITY SAMPLING?
Representative sample: one that, in the aggregate, closely approximates the population from which it is drawn

PROBABILITY SAMPLING
Samples selected in accord with probability theory, typically involving some random selection mechanism.
If everyone in the population has an equal chance of being selected, it is likely that those who are selected will be representative of the whole group.
EPSEM – Equal Probability of SElection Method
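As a small illustration of EPSEM, the sketch below draws a simple random sample with Python’s standard `random` module. The population of 1,000 ID numbers and the sample size of 100 are hypothetical, chosen only for the example.

```python
import random

# Hypothetical population: ID numbers 1..1000 standing in for people.
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Simple random sample without replacement: every member of the
# population has the same chance of ending up in the sample.
sample = rng.sample(population, k=100)

# Under EPSEM that chance is n / N for each member.
selection_probability = len(sample) / len(population)
print(f"Each member's chance of selection: {selection_probability:.2f}")
```

Simple random sampling is only one way to achieve equal selection probabilities; it is used here because it is the easiest to show.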

PARAMETER & STATISTIC
Population: the total membership of a defined class of people, objects, or events
Parameter: the summary description of a given variable in a population
Statistic: the summary description of a variable in a sample (used to estimate a population parameter)

INFERENTIAL STATISTICS
Samples only provide estimates of the population.
Sample statistics will usually be at least slightly off from the true values of their population’s parameters.
Sampling error: the difference between a sample statistic and a population parameter

EXAMPLE OF HOW SAMPLE STATISTICS VARY FROM A POPULATION PARAMETER
[Figure: a population of children’s ages in years (N = 50, μ = 4.5), with five samples drawn from it. Individual ages (x) range from 0 to 9; the sample means are X̄ = 4.0, 5.5, 4.3, 5.3, and 4.7.]
BOARD: EXAMPLE OF DIFFERENT AMOUNTS OF SAMPLING ERROR ASSOCIATED WITH EACH OF THESE STATISTICS.
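The idea that each sample mean misses the population mean by a different amount can be sketched in Python. The population below is hypothetical: ages 0 through 9 with five children at each age, which happens to give N = 50 and μ = 4.5 like the slide, but it is not the slide’s actual data. The sample size of 6 is also an assumption.

```python
import random
from statistics import mean

# Hypothetical population: ages 0-9, five children at each age.
# This gives N = 50 and mu = 4.5, matching the slide's summary values only.
population = [age for age in range(10) for _ in range(5)]
mu = mean(population)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_size = 6         # assumed; the slide does not state a sample size

sample_means = []
for i in range(1, 6):
    sample = rng.sample(population, k=sample_size)  # simple random sample
    x_bar = mean(sample)
    sample_means.append(x_bar)
    # Sampling error: sample statistic minus population parameter.
    print(f"sample {i}: mean = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```

Each run of the loop draws a fresh sample, so the sampling error changes from sample to sample even though the population never does.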

By Contrast: Nonprobability Sampling
Nonprobability sampling may be more appropriate and practical than probability sampling:
When it is not feasible to include many cases in the sample (e.g., because of cost)
In the early stages of investigating a problem (i.e., when conducting an exploratory study)
It is the only viable means of case selection:
If the population itself contains few cases
If an adequate sampling frame doesn’t exist
Normal sampling distribution: CENTRAL LIMIT THEOREM: Regardless of the shape of a raw score distribution (sample or population) of an interval-ratio variable, the sampling distribution will be approximately normal, as long as sample size is ≥ 100.
REVIEW: SOME OF THIS STUFF I’VE NEVER TALKED ABOUT, BUT PERHAPS YOU TALKED ABOUT IT IN 2155.
Let’s start off with the subject of studies in which nonprobability sampling is used.
Sometimes NONprobability sampling may be more appropriate and practical than probability sampling:
• when it is not feasible to include many cases in the sample (e.g., because of cost). Take the example of cities as the unit of analysis, or police departments (POPN).
• in the early stages of investigating a problem (i.e., when conducting an exploratory study). YOU’RE NOT SO WORRIED ABOUT GENERALIZATION AT THIS POINT...
Sometimes, nonprobability sampling is the only viable means of case selection:
• if the population itself contains few cases. Maybe there’s an identifiable population, but there’s such a small number in the population that you can’t do a probability sample. Say you wanted to do a study of domestic terrorism, with individual terrorists as your unit of analysis.
• if an adequate sampling frame cannot be obtained or constructed. SAMPLING FRAME: the set of all cases from which the sample is actually selected. There aren’t registries for a lot of populations that social scientists are interested in studying, for example the homeless.

Nonprobability Sampling: 2 Types
CONVENIENCE SAMPLING: the researcher simply selects a requisite number of cases that are conveniently available.
SNOWBALL SAMPLING: the researcher asks interviewed subjects to suggest additional people for interviewing.

Probability vs. Nonprobability Sampling: Research Situations
For the following research situations, decide whether a probability or nonprobability sample would be more appropriate:
1. You plan to conduct research delving into the motivations of serial killers.
2. You want to estimate the level of support among adult Duluthians for an increase in city taxes to fund more snow plows.
3. You want to learn the prevalence of alcoholism among the homeless in Duluth.
Answers: 1. NONPROBABILITY 2. PROBABILITY 3. NONPROBABILITY

(Back to Probability Sampling…) The “Catch-22” of Inferential Stats:
When we collect a sample, we know nothing about the population’s distribution of scores.
We can calculate the mean (X̄) & standard deviation (s) of our sample, but μ and σ are unknown.
The shape of the population distribution (normal?) is also unknown.
Exceptions: IQ, height

PROBABILITY SAMPLING
2 Advantages of probability sampling:
Probability samples are typically more representative than other types of samples.
They allow us to apply probability theory. This permits us to estimate the accuracy or representativeness of the sample.

SAMPLING DISTRIBUTION
From repeated random sampling, a mathematical description of all possible sampling event outcomes (and the probability of each one)
Permits us to make the link between sample and population, and to answer the question: “What is the probability that a sample statistic is due to chance?”
Based on probability theory

EXAMPLE OF HOW SAMPLE STATISTICS VARY FROM A POPULATION PARAMETER
[Figure: a population of children’s ages in years (N = 50, μ = 4.5), with five samples drawn from it. Individual ages (x) range from 0 to 9; the sample means are X̄ = 4.0, 5.5, 4.3, 5.3, and 4.7.]

What would happen… (Probability Theory)
If we kept repeating the samples from the previous slide millions of times:
What would be our most common sample mean? The population mean
What would the distribution shape be? Normal
This is the idea of a sampling distribution (here, the sampling distribution of means).
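Repeating the sampling many times is easy to simulate. The sketch below is an illustration under stated assumptions, not the slide’s data: it uses a hypothetical population of ages 0 through 9 with five children at each age (N = 50, μ = 4.5), an assumed sample size of 6, and 20,000 repetitions instead of millions.

```python
import random
from statistics import mean, pstdev

# Hypothetical population with N = 50 and mu = 4.5 (not the slide's data).
population = [age for age in range(10) for _ in range(5)]
mu = mean(population)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample_size = 6         # assumed sample size
n_samples = 20_000      # stand-in for "millions of times"

# Build the (simulated) sampling distribution of the mean.
sample_means = [
    sum(rng.sample(population, k=sample_size)) / sample_size
    for _ in range(n_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

# For a roughly normal shape, about 68% of sample means should lie
# within one standard deviation of the center.
share_within_1sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / n_samples

print(f"population mean        = {mu:.2f}")
print(f"mean of sample means   = {mean_of_means:.2f}")
print(f"share within +/- 1 SD  = {share_within_1sd:.2f}")
```

The mean of the sample means should land very close to μ, and the share within one standard deviation should be in the neighborhood of 68%, which is what an approximately normal sampling distribution would predict.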

Relationship between Sample, Sampling Distribution & Population
POPULATION: Empirical (exists in reality) but unknown
SAMPLING DISTRIBUTION (distribution of sample outcomes): Nonempirical (theoretical or hypothetical). Laws of probability allow us to describe its characteristics (shape, central tendency, dispersion).
SAMPLE: Empirical & known (e.g., distribution shape, mean, standard deviation)
Use of probability sampling techniques allows us to apply the SAMPLING distribution (theoretical) to move between the SAMPLE distribution (empirical & known) and the POPULATION distribution (empirical but unknown). You’re only drawing one sample, but the sampling distribution describes the sample means of every possible sample you could draw.
BOARD: SAMPLING DISTRIBUTION: THE MEAN VALUE OF THE MEAN OF REPEATED RANDOM SAMPLES….

THE TERMINOLOGY OF INFERENTIAL STATS
Population: the universe of students at the local college
Sample: 200 students (a subset of the student body)
Parameter: 25% of students (p = .25) reported being Catholic; unknown, but inferred from sample statistic
Statistic: empirical & known; proportion of sample that is Catholic is 50/200, so p = .25
Random Sampling (a.k.a. “Probability”): ensures EPSEM & allows for use of sampling distribution to estimate pop. parameter (infer from sample to pop.)
Representative: EPSEM gives best chance that the sample statistic will accurately estimate the pop. parameter
PASS OUT HANDOUT. Here are the answers.
