Introduction to Statistics

Introduction to Statistics: From the Data at Hand to the World at Large, Part I
Siana Halim

Topics
- Population and sample
- Sampling distribution models
- Confidence intervals for proportions
References
- De Veaux, Velleman, and Bock, Stats: Data and Models, Pearson Addison Wesley, International Edition, 2005.
- John A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press, 1995.

Sampling and Population
We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. So we settle for examining a smaller group of individuals, a sample, selected from the population.
We should select individuals for the sample at random. Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about.
The fraction of the population that you’ve sampled doesn’t matter. It’s the sample size itself that’s important.

Sampling and Population
Does a census make sense?
- It can be difficult to complete a census.
- Populations rarely stand still.
- Taking a census can be more complex than sampling.

Population and Parameters
Models use mathematics to represent reality. Parameters are the key numbers in those models. A parameter used in a model for a population is called a population parameter. Any summary found from the data is a statistic.
Name: statistic, parameter
Mean: $\bar{y}$, $\mu$ (mu)
Standard deviation: $s$, $\sigma$ (sigma)
Correlation: $r$, $\rho$ (rho)
Regression coefficient: $b$, $\beta$ (beta)
Proportion: $\hat{p}$, $p$

Simple Random Samples
We need to be sure that the statistics we compute from the sample reflect the corresponding parameters accurately, that is, that the sample is representative. How would we select a representative sample?
A Simple Random Sample (SRS): every possible sample of the size we plan to draw has an equal chance of being selected, so each combination of people has an equal chance of being selected as well.
The sampling frame is a list of individuals from which the sample is drawn.
Samples drawn at random generally differ from one another. Each draw of random numbers selects different people for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability.
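
As a minimal sketch of drawing an SRS, the snippet below samples without replacement from a sampling frame. The frame of ID numbers 1 to 1000 and the helper name `simple_random_sample` are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n individuals from the sampling frame, without replacement.

    random.Random.sample gives every subset of size n the same chance
    of being chosen, which is the defining property of an SRS.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: ID numbers 1..1000 (illustrative only).
frame = list(range(1, 1001))

# Two draws with different seeds stand in for two separate surveys.
sample_a = simple_random_sample(frame, 10, seed=1)
sample_b = simple_random_sample(frame, 10, seed=2)
print(sample_a)
print(sample_b)
```

Two separate draws will almost always select different individuals, which is the sampling variability described above.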

Stratified Sampling
All statistical sampling designs have in common the idea that chance, rather than human choice, is used to select the sample.
Designs that are used to sample from large populations, especially populations residing across large areas, are often more complicated than simple random samples.
Sometimes the population is first sliced into homogeneous groups, called strata, before the sample is selected. Then simple random sampling is used within each stratum before the results are combined. This common sampling design is called stratified random sampling.
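
A minimal sketch of stratified random sampling follows, assuming proportional allocation (the same sampling fraction in every stratum). The strata names, their sizes, and the function `stratified_sample` are hypothetical choices for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take an SRS of the given fraction inside each stratum, then combine."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        combined.extend((name, m) for m in rng.sample(members, k))
    return combined

# Hypothetical population split into two homogeneous groups.
strata = {"urban": list(range(600)), "rural": list(range(400))}
picked = stratified_sample(strata, 0.05, seed=0)

counts = {name: sum(1 for s, _ in picked if s == name) for name in strata}
print(counts)
```

Because each stratum is sampled separately, every stratum is guaranteed to appear in the combined sample in proportion to its size.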

Cluster and Multistage Sampling
Splitting the population into similar parts or clusters can make sampling more practical. Then we could simply select one or a few clusters at random and perform a census within each of them. This design is called cluster sampling.
Sampling schemes that combine several methods are called multistage samples.
Sometimes we draw a sample by selecting individuals systematically. This is called systematic sampling.

Sampling Distribution Models
Why do sample proportions vary at all? How can surveys conducted at essentially the same time by the same organization asking the same questions get different results?
The answer is at the heart of statistics. It’s because each survey is based on a different sample. The proportions vary from sample to sample because the samples are composed of different people.
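
The variation can be seen in a small simulation. This is a sketch under assumed values: the true proportion 0.45 and the sample size 1000 are hypothetical, chosen only to show that repeated samples from the same population give different sample proportions.

```python
import random

def sample_proportion(p, n, rng):
    """Proportion of 'successes' in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(42)
p_true, n = 0.45, 1000  # hypothetical population proportion and sample size

# Five "surveys" of the same population, each with a different random sample.
props = [sample_proportion(p_true, n, rng) for _ in range(5)]
print(props)
```

Each value is close to the true proportion, but no two surveys need agree exactly.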

Modeling the Distribution of Sample Proportions
Most models are useful only when specific assumptions are true. In the case of the model for the distribution of sample proportions, there are two assumptions:
- The sampled values must be independent of each other.
- The sample size, n, must be large enough.
The corresponding conditions to check before using the Normal to model the distribution of sample proportions are:
- 10% condition: if the sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population.
- Success/failure condition: the sample size has to be big enough that both np and nq are at least 10, where q = 1 - p.
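
The two conditions can be checked mechanically. The helper below is a sketch; its name and return format are assumptions, and it uses the "at least 10" threshold for the success/failure condition. The first call uses n = 90 and p = 0.13 from the left-handed seating example later in these slides.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the success/failure and 10% conditions for a sample proportion.

    Returns a dict of booleans; the 10% entry is None when the population
    size is unknown and the condition cannot be checked numerically.
    """
    q = 1 - p
    success_failure = n * p >= 10 and n * q >= 10
    ten_percent = None if population_size is None else n <= 0.10 * population_size
    return {"success_failure": success_failure, "ten_percent": ten_percent}

# n = 90, p = 0.13: np = 11.7 and nq = 78.3, so the condition holds.
print(check_proportion_conditions(90, 0.13))

# A smaller sample with a rare outcome fails: np = 5.
print(check_proportion_conditions(50, 0.10, population_size=10_000))
```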

The Sampling Distribution Model of a Proportion
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with mean $\mu(\hat{p}) = p$ and standard deviation $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
The sample proportion is $\hat{p} = \frac{y}{n}$, where $y$ is the number of successes and $n$ is the sample size.

The Central Limit Theorem (CLT)
As the sample size, n, increases, the mean of n independent values has a sampling distribution that tends toward a Normal model with mean equal to the population mean, $\mu$, and standard deviation $\frac{\sigma}{\sqrt{n}}$.
The CLT requires remarkably few assumptions, so there are few conditions to check:
- Random sampling condition.
- Independence assumption.
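
A short simulation can illustrate the CLT. This sketch assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), which is an illustrative choice and not from the slides. For each sample size it compares the standard deviation of the simulated sample means with $\sigma/\sqrt{n}$.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)
results = {}
for n in (1, 4, 25, 100):
    means = sample_means(n, 2000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # Columns: n, mean of sample means, SD of sample means, sigma / sqrt(n)
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(1 / n ** 0.5, 3))
```

The mean of the sample means stays near 1 while their spread shrinks roughly like $1/\sqrt{n}$; a histogram of `means` for the larger n would also look increasingly Normal.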

Sampling Distribution Model for a Mean
If the assumptions of independence and random sampling are met, and the sample size is large enough, the sampling distribution of the sample mean is modeled by a Normal model with a mean equal to the population mean, $\mu$, and a standard deviation equal to $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.
The parameters $\mu$ and $\sigma$ in the population are estimated by:
- Sample mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$

Working with Sampling Distribution Models
Example 1. About 13% of the population is left-handed. A 200-seat school auditorium has been built with 15 “leftie seats,” seats that have the built-in desk on the left rather than the right arm of the chair. In a class of 90 students, what’s the probability that there will not be enough seats for the left-handed students?
Step-by-step:
1. State what we want to know.
2. Check the conditions.
3. State the parameters and the sampling distribution model.
4. Make a picture. Sketch the model and shade the area we’re interested in.
5. Find the z-score of the cutoff proportion.
6. Find the resulting probability from a table of Normal probabilities.
7. Discuss the probability in the context of the question.
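
The steps for Example 1 can be sketched numerically. This assumes the Normal model for the sample proportion with standard deviation $\sqrt{pq/n}$ and uses the standard library's `NormalDist` in place of a printed Normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n, seats = 0.13, 90, 15   # values from Example 1
q = 1 - p

# Check the success/failure condition: np = 11.7 and nq = 78.3.
conditions_ok = n * p >= 10 and n * q >= 10

# Sampling distribution model: Normal with mean p and SD sqrt(pq/n).
sd = sqrt(p * q / n)

# There are not enough seats when more than 15 of the 90 students are left-handed.
cutoff = seats / n
z = (cutoff - p) / sd

# Upper-tail probability beyond the cutoff proportion.
prob_not_enough = 1 - NormalDist().cdf(z)
print(conditions_ok, round(sd, 4), round(z, 2), round(prob_not_enough, 3))
```

The z-score is a little above 1, so the chance of running out of leftie seats is roughly 15%: not rare enough to ignore.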

Working with Sampling Distribution Models
Example 2. Suppose that mean adult weight is 175 pounds with a standard deviation of 25 pounds. An elevator in our building has a weight limit of 10 persons or 2000 pounds. What’s the probability that the 10 people who get on the elevator overload its weight limit?
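
A sketch of the calculation for Example 2 follows. It assumes the 10 riders are an independent random sample and that a Normal model is reasonable for the mean of 10 weights, which is a judgment call for so small a sample. A total above 2000 pounds is the same event as a mean weight above 200 pounds.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175, 25       # population mean and SD of adult weight (pounds)
n, limit = 10, 2000       # riders and weight limit (pounds)

mean_cutoff = limit / n            # overload means the mean weight exceeds 200
sd_mean = sigma / sqrt(n)          # SD of the sample mean, sigma / sqrt(n)
z = (mean_cutoff - mu) / sd_mean

prob_overload = 1 - NormalDist().cdf(z)
print(round(sd_mean, 2), round(z, 2), prob_overload)
```

The cutoff is more than 3 standard deviations above the mean, so under these assumptions an overload is very unlikely (less than 1 chance in 1000).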

Standard Error
When we estimate the standard deviation of a sampling distribution using statistics found from the data, the estimate is called a standard error.

Confidence Interval for a Proportion
We are 95% confident that the true proportion of the population is in our interval.

Confidence Interval (Example)
Sea fans, one spectacular kind of coral, in the Caribbean Sea have been under attack by the disease aspergillosis. In June of 2000, the sea fan disease team from Dr. Drew Harvell’s lab randomly sampled some sea fans at the Las Redes Reef in Akumal, Mexico, at a depth of 40 feet. They found that 54 of the 104 sea fans they sampled were infected with the disease. What might this say about the prevalence of this disease among sea fans in general?

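Confidence Interval (Example)
What can we say about the population proportion, p? Is the infected proportion of all sea fans 51.9%? Not necessarily: 51.9% is the sample proportion, $\hat{p} = 54/104$.
We do know, though, that the sampling distribution model of $\hat{p}$ is centered at p, and we know that the standard deviation of the sampling distribution is $\sqrt{\frac{pq}{n}}$.
But we don’t know p. Instead we’ll use $\hat{p}$ and find the standard error, $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.519)(0.481)}{104}} = 0.049$.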

Now we know that the sampling model for $\hat{p}$ should be a Normal model centered at p, with a standard deviation estimated by SE = 0.049.
Because it’s Normal, it says that about 68% of all samples of 104 sea fans will have $\hat{p}$’s within 1 SE, 0.049, of p. And about 95% of all these samples will be within $p \pm 2$ SEs.
BUT where is our sample proportion in this picture? We do know that for 95% of random samples, $\hat{p}$ will be no more than 2 SEs away from p.
So let’s look at this from $\hat{p}$’s point of view. If I’m $\hat{p}$, there’s a 95% chance that p is no more than 2 SEs away from me. If I reach out 2 SEs, or 2 × 0.049, away from me on both sides, I’m 95% sure that p will be within my grasp. Now, I’ve got him! Probably.

So what can we really say about p?
“Far better an approximate answer to the right question, … than an exact answer to the wrong question.” (John W. Tukey)
- “51.9% of all sea fans on the Las Redes Reef are infected.” → NO WAY!
- “It is probably true that 51.9% of all sea fans on the Las Redes Reef are infected.” → NO
- “We don’t know exactly what proportion of sea fans on the Las Redes Reef are infected, but we know that it’s within the interval 51.9% ± 2 × 4.9%. That is, it’s between 42.1% and 61.7%.” → GETTING CLOSER!
- “We don’t know exactly what proportion of sea fans on the Las Redes Reef are infected, but the interval from 42.1% to 61.7% probably contains the true proportion.” → TRUE, but a bit wishy-washy.
- “We are 95% confident that between 42.1% and 61.7% of Las Redes Reef sea fans are infected.” → YES!
Statements like these are called confidence intervals. They’re the best we can do. The interval is called a one-proportion z-interval.
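
The sea fan interval can be reproduced in a few lines. This is a sketch using the counts from the example (54 infected out of 104) and the "reach out 2 SEs" rule; the variable names are illustrative.

```python
from math import sqrt

infected, n = 54, 104          # counts from the sea fan sample

p_hat = infected / n           # sample proportion
q_hat = 1 - p_hat
se = sqrt(p_hat * q_hat / n)   # standard error of the sample proportion

# Reach out 2 standard errors on either side of p_hat.
low, high = p_hat - 2 * se, p_hat + 2 * se
print(round(p_hat, 3), round(se, 3), round(low, 3), round(high, 3))
```

Rounded to one decimal place in percent, this gives the interval quoted above, from about 42.1% to about 61.7%.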

Margin of Error
A confidence interval (CI) has the form $\hat{p} \pm 2\,SE(\hat{p})$. The extent of the interval on either side of $\hat{p}$ is called the margin of error (ME).
In general, CIs look like this: estimate ± ME.
The more confident we want to be, the larger the margin of error must be.

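Critical Value
[Figure: two standard Normal curves, one with central area 0.95 between -1.96 and 1.96, one with central area 0.90 between -1.645 and 1.645.]
The values z* = 1.96 (for 95% confidence) and z* = 1.645 (for 90% confidence) are called critical values.
The CIs for the population proportion and the population mean can be formulated as follows: $\hat{p} \pm z^* \, SE(\hat{p})$ and $\bar{y} \pm z^* \, SE(\bar{y})$.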
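
Critical values can be computed rather than read from a table. A minimal sketch, assuming the standard Normal model and using the standard library's `NormalDist.inv_cdf`; the helper name `z_star` is illustrative.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* leaving (1 - confidence) / 2 in each tail of the standard Normal."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```

Higher confidence levels give larger critical values, which is why the margin of error grows with the confidence we ask for.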

Assumptions and Conditions
Independence Assumption → check three conditions:
- Plausible independence condition. This condition depends on your knowledge of the situation.
- Randomization condition. Were the data sampled at random or generated from a properly randomized experiment?
- 10% condition. The sample should be no larger than 10% of the population.
Sample Size Assumption → check the success/failure condition. We must expect at least 10 “successes” and at least 10 “failures.”

One-proportion z-interval
When the conditions are met, we are ready to find the confidence interval for the population proportion, p. The confidence interval is $\hat{p} \pm z^* \times SE(\hat{p})$, where the standard error of the proportion is estimated by $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$.

Example
In May 2002, the Gallup Poll asked 537 randomly sampled adults the question “Generally speaking, do you believe the death penalty is applied fairly or unfairly in this country today?” Of these, 53% answered “Fairly” and 7% said they didn’t know. What can we conclude from this survey?
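
One way to answer is a 95% one-proportion z-interval for the proportion answering “Fairly”. The sketch below assumes the conditions are met (random sample, fewer than 10% of all adults, and well over 10 expected successes and failures); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(p_hat, n, confidence=0.95):
    """Confidence interval p_hat ± z* × SE(p_hat) for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value z*
    se = sqrt(p_hat * (1 - p_hat) / n)                   # standard error
    me = z * se                                          # margin of error
    return p_hat - me, p_hat + me

# Gallup example: 53% of 537 adults answered "Fairly".
low, high = one_proportion_z_interval(0.53, 537)
print(round(low, 3), round(high, 3))
```

Under these assumptions we are 95% confident that between about 48.8% and 57.2% of adults thought the death penalty was applied fairly. Because the interval includes values below 50%, the survey does not settle whether a majority held that view.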

Student’s t Distribution
The t-distribution has a shape similar to the Normal distribution, but it has longer tails.
Note: t → Z as n increases. The standard Normal is the t-distribution with df = ∞.
[Figure: density curves for the standard Normal, t (df = 13), and t (df = 5).]

t-Distribution Table
Upper-tail area
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.817   1.886   2.920
3     0.765   1.638   2.353
Example: let n = 3, so df = n - 1 = 2. With $\alpha = .10$, $\alpha/2 = .05$, and the table gives t = 2.920. This is the value of t, not the value of the probability.
Using the t-distribution, the CI for the mean can be formulated as $\bar{y} \pm t^*_{n-1} \, \frac{s}{\sqrt{n}}$.
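
The table lookup for df = 2 can be checked directly, because the t-distribution with 2 degrees of freedom has a closed-form CDF, $F(t) = \frac{1}{2} + \frac{t}{2\sqrt{2 + t^2}}$. The sketch below applies only to df = 2; other degrees of freedom need a table or a statistics library. The function names are illustrative.

```python
from math import sqrt

def t2_upper_tail(t):
    """P(T > t) for Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 - t / (2 * sqrt(2 + t * t))

def t2_critical(upper_tail_area):
    """t* with P(T > t*) = upper_tail_area, for df = 2 only."""
    p = 1 - upper_tail_area                    # cumulative probability at t*
    return (2 * p - 1) / sqrt(2 * p * (1 - p))

# Table entry: df = 2, upper-tail area .05 -> t = 2.920
print(round(t2_upper_tail(2.920), 4))
print(round(t2_critical(0.05), 3), round(t2_critical(0.10), 3))
```

The first line confirms that 2.920 cuts off an upper-tail area of about .05, and the second recovers the table entries in the .05 and .10 columns for df = 2.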