6.1 - One Sample: Mean μ, Variance σ², Proportion π. 6.2 - Two Samples: Means, Variances, Proportions (μ₁ vs. μ₂).


1 CHAPTER 6: Statistical Inference & Hypothesis Testing
6.1 - One Sample: Mean μ, Variance σ², Proportion π
6.2 - Two Samples: Means, Variances, Proportions (μ₁ vs. μ₂, σ₁² vs. σ₂², π₁ vs. π₂)
6.3 - Multiple Samples: Means, Variances, Proportions (μ₁, …, μₖ; σ₁², …, σₖ²; π₁, …, πₖ)

3 TWO POPULATIONS, one random binary variable I: “Do you like Brussels sprouts?”
Binary Response: P(Success) = π, where π = P(Yes to Brussels sprouts).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Brussels sprouts between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Brussels sprouts between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like olives?” (I = 1 or I = 0)

4 TWO POPULATIONS, one random binary variable I: “Do you like Brussels sprouts?”
Binary Response: P(Success) = π, where π = P(Yes to Brussels sprouts).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Brussels sprouts between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Brussels sprouts between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like anchovies?” (J = 1 or J = 0)

5 TWO POPULATIONS, one random binary variable I: “Do you like Brussels sprouts?”
Binary Response: P(Success) = π, where π = P(Yes to Brussels sprouts).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Brussels sprouts between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Brussels sprouts between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like olives?” (I = 1 or I = 0) and “Do you like anchovies?” (J = 1 or J = 0)
π₁ = P(Yes to olives), π₂ = P(Yes to anchovies)
Null Hypothesis H₀: π₁ = π₂. “No association exists between liking olives and anchovies.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “An association exists between liking olives and anchovies.”
This is a “Test of Independence”.

6 TWO POPULATIONS, one random binary variable I: “Do you like Brussels sprouts?”
Binary Response: P(Success) = π, where π = P(Yes to Brussels sprouts). “Test of Homogeneity”.
Sample, size n₁; Sample, size n₂.
ONE POPULATION, two random binary variables I and J: “Do you like olives?” (I = 1 or I = 0) and “Do you like anchovies?” (J = 1 or J = 0)
π₁ = P(Yes to olives), π₂ = P(Yes to anchovies). “Test of Independence”.
Sample, size n₁; Sample, size n₂.
(Assume “large” sample sizes.)

7 Recall…
Sample 1, size n₁, X₁ = # Successes. Sample 2, size n₂, X₂ = # Successes.
If nπ ≥ 15 and n(1 – π) ≥ 15, then via the Normal Approximation to the Binomial, the sampling distribution of π̂ = X/n is approximately normal with mean π and standard error √(π(1 – π)/n).
Problem: the s.e. depends on π!
Solution: use an estimate of π in its place.

8 Sampling distribution of π̂₁ – π̂₂
Sample 1, size n₁, X₁ = # Successes: if n₁π₁ ≥ 15 and n₁(1 – π₁) ≥ 15, then the Normal Approximation to the Binomial applies to π̂₁.
Sample 2, size n₂, X₂ = # Successes: if n₂π₂ ≥ 15 and n₂(1 – π₂) ≥ 15, then the Normal Approximation to the Binomial applies to π̂₂.
Recall from section 4.1 (Discrete Models):
Mean(X – Y) = Mean(X) – Mean(Y)
and if X and Y are independent…
Var(X – Y) = Var(X) + Var(Y)
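Applying the two recalled rules to the sample proportions gives the mean and standard error of their difference. The following is a sketch, assuming the two samples are independent and writing \hat\pi_i = X_i / n_i (notation assumed here, not taken from the slide):

```latex
\begin{align*}
E(\hat\pi_1 - \hat\pi_2) &= E(\hat\pi_1) - E(\hat\pi_2) = \pi_1 - \pi_2, \\
\mathrm{Var}(\hat\pi_1 - \hat\pi_2) &= \mathrm{Var}(\hat\pi_1) + \mathrm{Var}(\hat\pi_2)
  = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}, \\
\text{s.e.}(\hat\pi_1 - \hat\pi_2) &= \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}.
\end{align*}
```

Note that the variances add even though the proportions are subtracted, which is why the standard error contains a sum.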

9 Sampling distribution of π̂₁ – π̂₂
Sample 1, size n₁, X₁ = # Successes. Sample 2, size n₂, X₂ = # Successes.
Similar problem as in “one proportion” inference: the s.e. depends on π₁ and π₂!
• For a confidence interval, replace π₁ and π₂ by π̂₁ and π̂₂, respectively, in the standard error.
• For the critical region and p-value, replace π₁ and π₂ by… what? Under the Null Hypothesis H₀: π₁ = π₂, so replace their common value by a “pooled” estimate, π̂ = (X₁ + X₂)/(n₁ + n₂), in the standard error estimate.
“Null Distribution”: centered at π₁ – π₂ = 0 under H₀.
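The two standard error estimates can be written out explicitly. This is a sketch using the standard textbook forms; the subscript "pooled" and the symbol \widehat{\text{s.e.}} are notation assumed here rather than recovered from the slide:

```latex
\begin{align*}
\text{Confidence interval:}\quad
\widehat{\text{s.e.}} &= \sqrt{\frac{\hat\pi_1(1-\hat\pi_1)}{n_1} + \frac{\hat\pi_2(1-\hat\pi_2)}{n_2}} \\[4pt]
\text{Test of } H_0:\ \pi_1 = \pi_2:\quad
\hat\pi_{\text{pooled}} &= \frac{X_1 + X_2}{n_1 + n_2}, \qquad
\widehat{\text{s.e.}}_0 = \sqrt{\hat\pi_{\text{pooled}}\,(1-\hat\pi_{\text{pooled}})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\[4pt]
Z &= \frac{(\hat\pi_1 - \hat\pi_2) - 0}{\widehat{\text{s.e.}}_0} \;\approx\; N(0, 1) \text{ under } H_0
\end{align*}
```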

10 Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?” Example: Two Proportions (of “Success”)

11 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?” Test of Homogeneity or Independence?
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” among Males) = P(“Yes” among Females), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.
Analysis via Z-test: Point estimates π̂₁ = 42/60 = 0.70 and π̂₂ = 16/40 = 0.40, so π̂₁ – π̂₂ = 0.30. NOTE: This is > 0. The Z-score is 2.978, with p = .0029 < .05. Therefore, REJECT H₀.
Conclusion: A significant association exists at the .05 level between “liking Bruce Willis movies” and gender, with males favoring them by 30 percentage points over females (70% vs. 40%), on average.
Test of Homogeneity (between two populations)
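The Z-test for this example can be sketched in R as follows. This is a minimal sketch, not necessarily the calculation shown on the slide: it assumes the pooled standard error under H₀ for the test statistic and the usual unpooled standard error for a 95% confidence interval, and all variable names are illustrative.

```r
# Data from the example: "Yes" counts and sample sizes for males and females
x <- c(males = 42, females = 16)
n <- c(males = 60, females = 40)

p_hat    <- x / n             # point estimates: 0.70 and 0.40
p_pooled <- sum(x) / sum(n)   # pooled estimate under H0: 0.58
diff_hat <- unname(p_hat["males"] - p_hat["females"])

# Test statistic: standard error under H0 uses the pooled estimate
se_null <- sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))
z       <- diff_hat / se_null
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

# Confidence interval: standard error uses the separate estimates
se_ci <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci_95 <- diff_hat + c(-1, 1) * qnorm(0.975) * se_ci

round(c(z = z, p_value = p_value), 4)
round(ci_95, 3)
```

The built-in call `prop.test(x, n, correct = FALSE)` should report the same p-value, expressed through a chi-squared statistic rather than a Z-score.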

12 TWO POPULATIONS (Males, Females), one random binary variable I: “Do you like Bruce Willis movies?”
Binary Response: P(Success) = π, where π = P(Yes to Bruce Willis movies).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Bruce Willis between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Bruce Willis between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like olives?” (I = 1 or I = 0) and “Do you like anchovies?” (J = 1 or J = 0)
π₁ = P(Yes to olives), π₂ = P(Yes to anchovies)
Null Hypothesis H₀: π₁ = π₂. “No association exists between liking olives and anchovies.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “An association exists between liking olives and anchovies.”
This is a “Test of Independence”.

13 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?” Test of Homogeneity or Independence?
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” among Males) = P(“Yes” among Females), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.
Analysis via Z-test: Point estimates π̂₁ = 42/60 = 0.70 and π̂₂ = 16/40 = 0.40, so π̂₁ – π̂₂ = 0.30. NOTE: This is > 0. The Z-score is 2.978, with p = .0029 < .05. Therefore, REJECT H₀.
Conclusion: A significant association exists at the .05 level; “liking Bruce Willis movies” and gender are dependent, with males favoring them by 30 percentage points over females (70% vs. 40%), on average.

14 TWO POPULATIONS (Males, Females), one random binary variable I: “Do you like Bruce Willis movies?”
Binary Response: P(Success) = π, where π = P(Yes to Bruce Willis movies).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Bruce Willis between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Bruce Willis between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like Bruce Willis?” (I = 1 or I = 0, previously “Do you like olives?”) and “Gender: Male?” (J = 1 or J = 0, previously “Do you like anchovies?”)
π₁ = P(Yes to Bruce), π₂ = P(Yes to Male)
Null Hypothesis H₀: π₁ = π₂. “No association exists between liking Bruce and Male.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “An association exists between liking Bruce and Male.”
This is a “Test of Independence”.

15 TWO POPULATIONS (Males, Females), one random binary variable I: “Do you like Bruce Willis movies?”
Binary Response: P(Success) = π, where π = P(Yes to Bruce Willis movies).
Null Hypothesis H₀: π₁ = π₂. “No difference in liking Bruce Willis between the two populations.”
Alternative Hypothesis H_A: π₁ ≠ π₂. “There is a difference in liking Bruce Willis between the two populations.”
This is a “Test of Homogeneity”.
ONE POPULATION, two random binary variables I and J: “Do you like Bruce Willis?” (I = 1 or I = 0, previously “Do you like olives?”) and “Gender: Male?” (J = 1 or J = 0, previously “Do you like anchovies?”)
π₁ = P(Yes to Bruce), π₂ = P(Yes to Male)
Null Hypothesis H₀: π₁ = π₂. “Liking Bruce” and “Gender” are statistically independent.
Alternative Hypothesis H_A: π₁ ≠ π₂. “Liking Bruce” and “Gender” are statistically dependent.
This is a “Test of Independence”.

16 Example: Two Proportions (of “Success”) ~ ALTERNATE METHOD ~
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” among Males) = P(“Yes” among Females), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.
Binary variables: I = 1 or I = 0; J = 1 or J = 0.

17 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” among Males) = P(“Yes” among Females), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.

Observed
         Males   Females   Total
Yes      42      16        58
No       18      24        42
Total    60      40        100

Expected (under H₀)
         Males      Females    Total
Yes      E₁₁ = ?    E₁₂ = ?    58
No       E₂₁ = ?    E₂₂ = ?    42
Total    60         40         100

18 Recall Probability Tables from Chapter 3….
Under the null hypothesis, the binary variable I is statistically independent of the binary variable J, i.e., P(I ∩ J) = P(I) P(J).

         J = 1        J = 2        Total
I = 1    π₁₁          π₁₂          π₁₁ + π₁₂
I = 2    π₂₁          π₂₂          π₂₁ + π₂₂
Total    π₁₁ + π₂₁    π₁₂ + π₂₂    1

19 Recall Probability Tables from Chapter 3….
Under the null hypothesis, the binary variable I is statistically independent of the binary variable J, e.g., P(“I = 1” ∩ “J = 1”) = P(“I = 1”) P(“J = 1”).

Probability Table (population)
         J = 1        J = 2        Total
I = 1    π₁₁          π₁₂          π₁₁ + π₁₂
I = 2    π₂₁          π₂₂          π₂₁ + π₂₂
Total    π₁₁ + π₂₁    π₁₂ + π₂₂    1

Contingency Table (expected counts)
         J = 1    J = 2    Total
I = 1    E₁₁      E₁₂      R₁
I = 2    E₂₁      E₂₂      R₂
Total    C₁       C₂       n

Probability Table (estimated from the contingency table)
         J = 1     J = 2     Total
I = 1    E₁₁/n     E₁₂/n     R₁/n
I = 2    E₂₁/n     E₂₂/n     R₂/n
Total    C₁/n      C₂/n      1

Therefore…, etc.
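The step from the independence condition to a formula for the expected counts can be spelled out. This is a sketch of the usual argument, using the row totals R_i, column totals C_j, and grand total n from the contingency table; the general subscripts i, j are an assumption that extends the slide's 2 × 2 notation:

```latex
\begin{align*}
\text{Independence:}\quad \pi_{11} &= (\pi_{11} + \pi_{12})\,(\pi_{11} + \pi_{21}) \\
\text{Estimated from the table:}\quad \frac{E_{11}}{n} &= \frac{R_1}{n}\cdot\frac{C_1}{n}
  \;\Longrightarrow\; E_{11} = \frac{R_1\,C_1}{n} \\
\text{In general:}\quad E_{ij} &= \frac{R_i\,C_j}{n}
\end{align*}
```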

20 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” among Males) = P(“Yes” among Females), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.
Check: Is the null hypothesis true?

Observed
         Males   Females   Total
Yes      42      16        58
No       18      24        42
Total    60      40        100

Expected (under H₀)
         Males          Females        Total
Yes      E₁₁ = 34.8     E₁₂ = 23.2     58
No       E₂₁ = 25.2     E₂₂ = 16.8     42
Total    60             40             100

“Chi-squared” Test Statistic: χ² = Σ (Observed – Expected)² / Expected, summed over all cells, where “degrees of freedom” df = (# rows – 1)(# cols – 1), = 1 for a 2 × 2 table.
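The expected counts and the test statistic for this table can be computed by hand in R. The following is a minimal sketch, assuming the standard Pearson form of the statistic with no continuity correction; the object names are illustrative.

```r
# Observed 2 x 2 table from the example
observed <- rbind(Yes = c(Males = 42, Females = 16),
                  No  = c(Males = 18, Females = 24))

# Expected count for each cell under H0: (row total * column total) / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum over cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

expected
round(chi_sq, 3)
df
```

Using `outer()` on the margins keeps the calculation general, so the same lines work for a table with any number of rows and columns.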

21 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?”

Observed
         Males   Females   Total
Yes      42      16        58
No       18      24        42
Total    60      40        100

Expected (under H₀)
         Males   Females   Total
Yes      34.8    23.2      58
No       25.2    16.8      42
Total    60      40        100

“Chi-squared” Test Statistic = 8.867 on 1 df
p = ?????

22 Because 8.867 is much greater than the α = .05 critical value of 3.841, it follows that p << .05. More precisely, 7.879 < 8.867 < 9.141; hence .0025 < p < .005. The actual p-value = .0029, the same as that found using the Z-test!

R code:
Yes <- c(42, 16)   # "Yes" counts: males, females
No <- c(18, 24)    # "No" counts: males, females
Bruce <- rbind(Yes, No)
chisq.test(Bruce, correct = FALSE)   # no continuity correction

R output:
Pearson's Chi-squared test
data: Bruce
X-squared = 8.867, df = 1, p-value = 0.002904
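The critical values and the p-value quoted on this slide can be checked directly against the chi-squared distribution. This is a small sketch using base R's `qchisq` and `pchisq`; the variable names are illustrative.

```r
chi_sq <- 8.867   # test statistic reported on the slide, on 1 df

# Upper-tail critical values for alpha = .05, .005, and .0025
crit <- qchisq(1 - c(0.05, 0.005, 0.0025), df = 1)

# p-value: upper-tail area beyond the observed statistic
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

round(crit, 3)
round(p_value, 4)
```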

23 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?”

Observed
         Males   Females   Total
Yes      42      16        58
No       18      24        42
Total    60      40        100

Expected (under H₀)
         Males   Females   Total
Yes      34.8    23.2      58
No       25.2    16.8      42
Total    60      40        100

“Chi-squared” Test Statistic = 8.867 on 1 df
p = .0029
The α = .05 critical value is 3.841.
Recall…

24 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?” Let the discrete random variable X = “# Successes” (i.e., “Yes” responses) in each gender of the samples, and use these data to test…
Null Hypothesis H₀: P(“Yes” in Male population) = P(“Yes” in Female population), i.e., H₀: π₁ = π₂, or π₁ – π₂ = 0, where π = P(Success) in each gender population. “No association exists.”
Data: Sample 1) n₁ = 60 males, X₁ = 42. Sample 2) n₂ = 40 females, X₂ = 16.
Analysis via Z-test: Point estimates π̂₁ = 42/60 = 0.70 and π̂₂ = 16/40 = 0.40, so π̂₁ – π̂₂ = 0.30. NOTE: This is > 0. The Z-score is 2.978, with p = .0029 < .05. Therefore, REJECT H₀.
Conclusion: A significant association exists at the .05 level; “liking Bruce Willis movies” and gender are dependent, with males favoring them by 30 percentage points over females (70% vs. 40%), on average.

25 Example: Two Proportions (of “Success”)
Study Question: “Is there an association between liking Bruce Willis movies and gender, or not?”
Design: Randomly select two large samples of males and females, and record their binary responses (Yes = 1, No = 0) to the question “Do you like Bruce Willis movies?”

Observed
         Males   Females   Total
Yes      42      16        58
No       18      24        42
Total    60      40        100

Expected (under H₀)
         Males   Females   Total
Yes      34.8    23.2      58
No       25.2    16.8      42
Total    60      40        100

“Chi-squared” Test Statistic = 8.867 on 1 df
p = .0029
The α = .05 critical value is 3.841.
NOTE: (Z-score)² = (2.9778)² = 8.867 on 1 df. Connection between Z-test and Chi-squared test!
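The connection between the two tests can be confirmed numerically. This sketch assumes the Z-score is computed with the pooled standard error and the chi-squared statistic without a continuity correction; with those choices the squared Z-score and the chi-squared statistic coincide for a 2 × 2 table.

```r
x <- c(42, 16)   # "Yes" counts: males, females
n <- c(60, 40)   # sample sizes: males, females

# Z-score for H0: pi1 = pi2, using the pooled standard error
p_pooled <- sum(x) / sum(n)
z <- (x[1] / n[1] - x[2] / n[2]) / sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))

# Pearson chi-squared statistic for the same data as a 2 x 2 table
tab    <- rbind(Yes = x, No = n - x)
chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)

c(z = z, z_squared = z^2, chi_sq = chi_sq)
```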

38 2 × 2 Chi-squared Test is only valid if:
• Null Hypothesis H₀: π₁ – π₂ = 0. For a one-sided test or a nonzero null value, use the Z-test!
• Expected Values ≥ 5, in order to avoid “spurious significance” due to a possibly inflated Chi-squared value.
Paired version of 2 × 2 Chi-squared Test = McNemar Test
Categorical data – contingency table with any number of rows and columns (“Test of Independence” or “Test of Homogeneity”):
• Formal Null Hypothesis difficult to write mathematically in terms of π₁, π₂, …
• Informal H₀: “No association exists between rows and columns.”
• 80% of Expected Values ≥ 5
See notes for other details, comments, including “Goodness-of-Fit” Test.
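The expected-value conditions can be checked before trusting a chi-squared result. The following sketch reuses the Bruce Willis table from the earlier slides as example data and reads the expected counts from the `expected` component that `chisq.test` returns; the object names are illustrative.

```r
tab <- rbind(Yes = c(Males = 42, Females = 16),
             No  = c(Males = 18, Females = 24))

# Expected counts under H0, as computed by chisq.test
expected <- chisq.test(tab, correct = FALSE)$expected

# 2 x 2 rule: every expected count should be at least 5
ok_2x2 <- all(expected >= 5)

# Larger tables: at least 80% of expected counts should be at least 5
ok_rxc <- mean(expected >= 5) >= 0.80

c(ok_2x2 = ok_2x2, ok_rxc = ok_rxc)
```

For paired binary data, base R also provides `mcnemar.test`, which corresponds to the McNemar Test named above.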

