Copyright (c) Bani K. Mallick. STAT 651 Lecture #15.

Topics in Lecture #15: some basic probability; the binomial distribution; inference about a single population proportion.

Book Sections Covered in Lecture #15. Chapters [numbers missing], Chapter 10.2.

Lecture 14 Review: Nonparametric Methods. Replace each observation by its rank in the pooled data. Do the usual ANOVA F-test on the ranks (Kruskal-Wallis).

Lecture 14 Review: Nonparametric Methods. Once you have decided that the populations are different in their means, there is no version of an LSD. You simply have to do each comparison in turn. This is a bit of a pain in SPSS, because you physically must do each 2-population comparison, defining the groups as you go.

Categorical Data. Not all experiments are based on numerical outcomes. We will deal with categorical outcomes, i.e., outcomes where the result for each individual is a category. The simplest categorical variable is binary: success or failure, male or female.

Categorical Data. For example, consider flipping a fair coin, and let X = 0 mean “tails” and X = 1 mean “heads”.

Categorical Data. The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$. Note that because it is a Greek symbol, it represents something to do with a population. For coin flipping, if you flipped all the fair coins in the world (the population), the fraction of the times they turn up heads equals $\pi$.

Categorical Data. The fraction of the population who are “successes” will be denoted by the Greek symbol $\pi$. The fraction of the sample of size n who are “successes” is going to be denoted by $\hat{\pi}$. We want to relate $\hat{\pi}$ to $\pi$. Let X = number of successes in the sample. The fraction $\hat{\pi}$ = (# successes)/n = X / n.

Categorical Data. Suppose you flip a coin 10 times, and get 6 heads. The proportion of heads = 0.60. The percentage of heads = 60%.

Categorical Data. The number of successes X in n experiments, each with probability of success $\pi$, is called a binomial random variable. There is a formula for this: $\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$, where 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.

Categorical Data. $\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$, where 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc. The idea is to relate the sample fraction $\hat{\pi}$ to the population fraction $\pi$ using this formula. Key point: if we knew $\pi$, then we could entirely characterize the fraction of experiments that have k successes.

Categorical Data. The probability that the coin lands on heads will be denoted by the Greek symbol $\pi$. Suppose you flip a coin 2 times, and count the number of heads. So here, X = number of heads that arise when you flip a coin 2 times. X takes on the values 0, 1 and 2; $\hat{\pi}$ takes on the values 0/2, 1/2, 2/2.

Categorical Data: What the binomial formula does. The experiment results in 4 equally likely outcomes: each occurs 1/4 of the time.

                     Tails on toss #1    Heads on toss #1
  Tails on toss #2         1/4                 1/4
  Heads on toss #2         1/4                 1/4

Categorical Data. Heads = “success”. Each of the four combinations of toss #1 and toss #2 (tails/tails, heads/tails, tails/heads, heads/heads) has probability 1/4. The binomial formula can be used to give these results without thinking.

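Categorical Data. 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc. For exactly one head in two flips of a fair coin: n = 2, k = 1, k! = 1, n! = 2, (n-k)! = 1, so $\Pr(X = 1) = \frac{2}{1 \times 1}\left(\tfrac{1}{2}\right)^{1}\left(\tfrac{1}{2}\right)^{1} = \tfrac{1}{2}$. The binomial formula gives the answer 1/2, which we know to be correct.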
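The coin-flip check above can be reproduced with a few lines of Python. This is only a sketch of the binomial formula as it is usually written; the function name `binomial_pmf` and the argument name `p` (standing for the population fraction of successes) are illustrative choices, not something from the lecture.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) for X = number of successes in n independent trials."""
    # comb(n, k) equals n! / (k! * (n - k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two flips of a fair coin: probabilities of 0, 1 and 2 heads.
coin_probs = [binomial_pmf(k, 2, 0.5) for k in range(3)]
print(coin_probs)
```

The middle entry is the chance of exactly one head in two flips, and the three entries sum to 1.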

Categorical Data. Roll a fair die. Every outcome is equally likely, so what are the probabilities? Each of the six faces has probability 1/6. What is the chance of rolling a 1 or a 2? 2/6 = 1/3.

Categorical Data. Now roll two fair dice. [Table: first die by second die.] Every combination is equally likely, so what are the probabilities? Each of the 36 combinations has probability 1/36.

Categorical Data. Roll two fair dice. Define a success as rolling a 1 or a 2. What is the chance of two successes? 4/36 = 1/9. What is the chance of two failures? 16/36 = 4/9.

Categorical Data. So, a success occurs when you roll a 1 or a 2. Pr(success on a single die) = 2/6 = 1/3 = $\pi$. Pr(2 successes) = 1/3 x 1/3 = 1/9. Use the binomial formula for Pr(X = k) when n = 2 and k = 2: k! = 2, n! = 2, (n-k)! = 1, so $\Pr(X = 2) = \frac{2}{2 \times 1}\left(\tfrac{1}{3}\right)^{2}\left(\tfrac{2}{3}\right)^{0} = \tfrac{1}{9}$.
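The two-dice table can also be checked by brute force. The sketch below lists all 36 equally likely pairs and counts how many contain a given number of successes (a 1 or a 2); the helper name `chance` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
success = {1, 2}  # a "success" is rolling a 1 or a 2
rolls = list(product(faces, repeat=2))  # all 36 equally likely pairs


def chance(num_successes: int) -> Fraction:
    """Fraction of the 36 pairs with exactly num_successes successes."""
    hits = sum(
        1 for roll in rolls
        if sum(face in success for face in roll) == num_successes
    )
    return Fraction(hits, len(rolls))


two_successes = chance(2)
two_failures = chance(0)
print(two_successes, two_failures)
```

Counting this way only works while the table stays small, which is the motivation for using a formula with more dice.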

Categorical Data. In other words, the binomial formula works in these simple cases, where we can draw nice tables. Now think of rolling 4 dice, and ask the chance that 3 of the 4 times you get a 1 or a 2. Too big a table: need a formula.
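As an illustration of the four-dice question, the sketch below applies the usual binomial formula with n = 4, k = 3 and success probability 1/3, then confirms the answer by listing all 6^4 = 1296 outcomes. Exact fractions are used to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

n, k = 4, 3
p = Fraction(1, 3)  # chance of a 1 or a 2 on a single die

# Binomial formula: n! / (k! (n-k)!) * p^k * (1-p)^(n-k)
by_formula = comb(n, k) * p**k * (1 - p) ** (n - k)

# Brute-force check over every possible roll of four dice.
outcomes = list(product(range(1, 7), repeat=n))
hits = sum(1 for roll in outcomes if sum(face <= 2 for face in roll) == k)
by_counting = Fraction(hits, len(outcomes))

print(by_formula, float(by_formula))
```

The formula and the enumeration agree, but the enumeration already needs 1296 rows, which is why the table approach does not scale.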

Categorical Data. Does it matter what you call a “success” and what you call a “failure”? No, as long as you keep track. For example, in a class experiment many years ago, men were asked whether they preferred to wear boxers or briefs. This is binary, because there are only 2 outcomes. “Success” = ?????

Categorical Data. Binary experiments have sampling variability, just like sample means, etc. Experiment: “success” = being under 5’10” in height. Compare the first 6 men with SSN < 5 and the first 6 men with SSN > 5. Note how the number of “successes” was not the same! (I might have to do this a few times.)

Categorical Data. The sample fraction $\hat{\pi}$ is a random variable. This means that if I do the experiment over and over, I will get different values. These different values have a standard deviation.

Categorical Data. The sample fraction $\hat{\pi}$ has a standard error. Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$. Note how if you have a bigger sample, the standard error decreases. The standard error is biggest when $\pi$ = 0.50.
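Both remarks about the standard error can be seen numerically. The sketch below assumes the usual formula sqrt(p(1 - p)/n) for the standard error of a sample fraction, with `p` standing for the population fraction; the function name is illustrative.

```python
from math import sqrt


def standard_error(p: float, n: int) -> float:
    """Standard error of the sample fraction for population fraction p."""
    return sqrt(p * (1 - p) / n)


# A bigger sample gives a smaller standard error (p held at 0.5).
for n in (25, 100, 400):
    print(n, round(standard_error(0.5, n), 4))

# For a fixed n, scan p over 0.01, 0.02, ..., 0.99 and find the largest value.
grid = [i / 100 for i in range(1, 100)]
worst_p = max(grid, key=lambda p: standard_error(p, 100))
print(worst_p)
```

Quadrupling the sample size halves the standard error, since n sits under a square root.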

Categorical Data. The sample fraction $\hat{\pi}$ has a standard error. Its standard error is $\sigma_{\hat{\pi}} = \sqrt{\pi(1-\pi)/n}$. The estimated standard error based on the sample is $\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$.

Categorical Data. It is possible to make confidence intervals for the population fraction if the number of successes > 5 and the number of failures > 5. If this is not satisfied, consult a statistician. Under these conditions, the Central Limit Theorem says that the sample fraction is approximately normally distributed (in repeated experiments).

Categorical Data. A $(1-\alpha)100\%$ CI for the population fraction is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$, where $z_{\alpha/2}$ is found by looking up $1-\alpha/2$ in Table 1.

Categorical Data. Often, you will only know the sample proportion/percentage and the sample size. There are two ways of computing the confidence interval for the population proportion: by hand, or by SPSS (this is a pain if you do not have the data entered already). Because you may need to do this by hand, I will make you do this.

Categorical Data. $(1-\alpha)100\%$ CI for the population fraction: for a 95% CI, $z_{\alpha/2} = 1.96$. With $n = 25$ and $\hat{\pi} = 0.30$, the interval is $0.30 \pm 1.96\sqrt{0.30 \times 0.70/25} = 0.30 \pm 0.18$.

Categorical Data. $(1-\alpha)100\%$ CI for the population fraction. Interpretation? The proportion of successes in the population is from 0.12 to 0.48 (12% to 48%) with 95% confidence.
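The hand calculation for n = 25 and a sample fraction of 0.30 can be sketched in Python. This assumes the standard large-sample interval (sample fraction plus or minus z times the estimated standard error) and applies the lecture's rule that successes and failures must both exceed 5; the function name `proportion_ci` and its arguments are illustrative.

```python
from math import sqrt


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample CI for a population fraction (z = 1.96 gives 95%)."""
    successes = p_hat * n
    failures = (1 - p_hat) * n
    # Rule of thumb from the lecture: both counts must exceed 5.
    if successes <= 5 or failures <= 5:
        raise ValueError("need more than 5 successes and more than 5 failures")
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(0.30, 25)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimals this reproduces the interval from 0.12 to 0.48.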

Categorical Data. You can use SPSS as long as the number of successes and the number of failures both exceed 5. To get the confidence intervals, you first have to define a numeric version of your variable that classifies whether an observation is a success or failure. You then compute the 1-sample confidence interval from “Descriptives”, “Explore”. Demo.

Categorical Data. If you set up your data in SPSS as 0's and 1's, the “mean” will be the proportion/fraction/percentage of 1’s. Example: data with n = 10 observations, 4 of them 1's. Mean = 4/10 = 0.40, so $\hat{\pi}$ = 0.40.

Boxers versus briefs for males. In this output, boxers = 1 and briefs = 0.

Boxers versus briefs for males: what % prefer boxers? In the sample, 46.81%. In the population??? [SPSS Descriptives output for “Boxers or Briefs Preference”: mean, 95% confidence interval for the mean, and other summary statistics; values not preserved.] In this output, boxers = 1 and briefs = 0. The proportion of 1’s is the mean.

Boxers versus briefs for males: what % prefer boxers? Between 39.61% and 54.01%. [SPSS Descriptives output for male respondents, numeric variable coded 0 = briefs, 1 = boxers; values not preserved.]

Boxers versus briefs. In the sample, 46.81% of the men preferred boxers to briefs; 53.19% preferred briefs. Between 39.61% and 54.01% of men prefer boxers to briefs (95% CI). Is there enough evidence to conclude that men generally prefer briefs? No, since 50% is in the CI! This means that it is possible (95% CI) that 50% prefer boxers and 50% prefer briefs, i.e., $\pi$ = 0.50.

Sample Size Calculations. The standard error of the sample fraction is $\sqrt{\pi(1-\pi)/n}$. If you want a $(1-\alpha)100\%$ CI to be $\hat{\pi} \pm E$, you should set $E = z_{\alpha/2}\sqrt{\pi(1-\pi)/n}$.

Sample Size Calculations. This means that $n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$.

Sample Size Calculations. The small problem is that you do not know $\pi$. You have two choices: make a guess for $\pi$, or set $\pi$ = 0.50 and calculate (most conservative, since it results in the largest sample size). Most polling operations make the latter choice, since it is most conservative.

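Sample Size Calculations: Examples. Set $E = 0.04$, 95% CI. If you guess that $\pi = 0.30$: $n = (1.96)^{2}(0.30)(0.70)/(0.04)^{2} = 504.21$. If you have no good guess, use $\pi = 0.50$: $n = (1.96)^{2}(0.50)(0.50)/(0.04)^{2} = 600.25$.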
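The two examples can be worked through in Python. The sketch assumes the usual sample-size formula n = z^2 p(1 - p)/E^2, with `p` standing for the guessed population fraction and `E` for the desired half-width of the interval. Rounding up to the next whole person is a choice made here, not something stated in the lecture, and the function name is illustrative.

```python
from math import ceil


def sample_size(E: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n giving a CI half-width of at most E (z = 1.96 for 95%)."""
    # p = 0.5 is the conservative default: it maximizes p * (1 - p).
    return ceil(z**2 * p * (1 - p) / E**2)


n_with_guess = sample_size(0.04, p=0.30)  # guess that the fraction is 0.30
n_conservative = sample_size(0.04)        # no good guess: use 0.50
print(n_with_guess, n_conservative)
```

The conservative choice costs roughly a hundred extra observations in this example, which is the price of not having a guess.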