1 Always be mindful of the kindness and not the faults of others.
Categorical Data (Sections 10.1 to 10.5): estimation for proportions, tests for proportions, chi-square tests
3 Example: Researchers developing new treatments for cancer patients often evaluate the effectiveness of new therapies by reporting the proportion of patients who survive for a specified period of time after completion of the treatment. In a study of a new treatment given to 870 patients with lung cancer, 330 survived at least 5 years.
4 Example: Estimate $\pi$, the proportion of all patients with lung cancer who would survive at least 5 years after being administered this treatment. What would you estimate this proportion to be?
5 Distribution of Sample Proportion: $Y$ = the number of successes in the $n$ trials (independent and identical trials). What is the distribution of $Y$? Sample proportion: $\hat{\pi} = Y/n$.
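6 Distribution of Sample Proportion: When $n\pi \ge 5$ and $n(1-\pi) \ge 5$, the distribution of $Y$ can be approximated by a normal distribution, so that (approximately) $\hat{\pi} \sim N\left(\pi,\ \pi(1-\pi)/n\right)$.
$100(1-\alpha)\%$ Confidence Interval for $\pi$: $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$
Optional: (exact) C.I. for $\pi$ for small samples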
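As a rough illustration, the large-sample interval (sample proportion plus or minus a normal quantile times the estimated standard error) can be applied to the lung cancer example, where 330 of 870 patients survived. The function name `proportion_ci` and its arguments are hypothetical, chosen only for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(y, n, conf_level=0.95):
    """Large-sample confidence interval for a population proportion.

    y: number of successes, n: number of trials.
    Returns (point estimate, lower limit, upper limit).
    """
    pi_hat = y / n
    # Upper alpha/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sqrt(pi_hat * (1 - pi_hat) / n)
    return pi_hat, pi_hat - half_width, pi_hat + half_width


# Lung cancer example from the slides: 330 survivors among 870 patients
pi_hat, lower, upper = proportion_ci(330, 870)
print(f"point estimate: {pi_hat:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the point estimate is about 0.379 and the 95% interval is roughly 0.347 to 0.412.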
7 Sample Size: $n = \dfrac{z_{\alpha/2}^{2}\,\pi(1-\pi)}{E^{2}}$, where $E$ is the largest tolerable error at the $100(1-\alpha)\%$ confidence level.
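A minimal sketch of the sample size calculation, assuming the usual formula $n = z_{\alpha/2}^{2}\,\pi(1-\pi)/E^{2}$. Because $\pi$ is unknown before sampling, the sketch takes a planning value `pi_guess` (a hypothetical parameter name); the default of 0.5 is the conservative choice because it maximizes $\pi(1-\pi)$. The error values used below are illustrative, not from the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(error, conf_level=0.95, pi_guess=0.5):
    """Smallest n so the CI half-width is at most `error`.

    pi_guess is a planning value for the unknown proportion;
    0.5 gives the largest (most conservative) sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Round up, since n must be a whole number of trials
    return ceil(z ** 2 * pi_guess * (1 - pi_guess) / error ** 2)


n_conservative = sample_size_for_proportion(0.03)
n_with_guess = sample_size_for_proportion(0.05, pi_guess=0.4)
print(n_conservative, n_with_guess)
```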
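8 Test for $\pi$ (Large Sample): When $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$, the test statistic for $H_0: \pi = \pi_0$ is: $z = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$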
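The one-sample z test can be sketched as follows. The data are the lung cancer counts from the earlier example, but the null value $\pi_0 = 0.35$ is a hypothetical choice made only for illustration; the slides do not specify one. The function name and the `alternative` argument are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(y, n, pi0, alternative="two-sided"):
    """Large-sample z test of H0: pi = pi0. Returns (z, p-value)."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not appropriate")
    pi_hat = y / n
    # The standard error uses the null value pi0, not the estimate
    z = (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# pi0 = 0.35 is a hypothetical null value, not taken from the slides
z, p_value = one_proportion_z_test(330, 870, pi0=0.35, alternative="greater")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```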
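9 Inference about 2 Proportions
Notation:

| | Population 1 | Population 2 |
|---|---|---|
| Proportion | $\pi_1$ | $\pi_2$ |
| Sample size | $n_1$ | $n_2$ |
| # of successes | $y_1$ | $y_2$ |
| Sample proportion | $\hat{\pi}_1 = y_1/n_1$ | $\hat{\pi}_2 = y_2/n_2$ |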
10 Estimation for $\pi_1 - \pi_2$: Point estimate: $\hat{\pi}_1 - \hat{\pi}_2 = \dfrac{y_1}{n_1} - \dfrac{y_2}{n_2}$
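11 Estimation for $\pi_1 - \pi_2$: $100(1-\alpha)\%$ Confidence Interval for two large samples: $(\hat{\pi}_1 - \hat{\pi}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \dfrac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$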
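A short sketch of the two-sample interval, using the unpooled standard error $\sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2}$. The counts below are hypothetical and are not data from any example in these slides; the function name is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(y1, n1, y2, n2, conf_level=0.95):
    """Large-sample CI for pi1 - pi2 from two independent samples."""
    p1, p2 = y1 / n1, y2 / n2
    diff = p1 - p2
    # Unpooled standard error of the difference in sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 120 of 400 in sample 1, 150 of 375 in sample 2
diff, lower, upper = two_proportion_ci(120, 400, 150, 375)
print(f"difference: {diff:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

With these made-up counts the whole interval lies below zero, which would suggest that the second population has the larger proportion.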
12 Example 10.6: A company markets a new product in Grand Rapids and Wichita. In Grand Rapids, the company's advertising is based entirely on TV commercials. In Wichita, it is based on a balanced mix of TV, radio, newspaper, and magazine ads. Two months after the ad campaign begins, the company conducts surveys to determine consumer awareness of the product.
13 Example 10.6: Data Set. For each city (Grand Rapids and Wichita), the survey records the number of consumers interviewed and the number aware of the product.
Q: Calculate a 95% C.I. for the regional difference in the proportion of all consumers who are aware of the product.
14 Example 10.6 (cont.): Conduct a test at $\alpha = 0.05$ to determine whether the proportion of Wichita consumers aware of the product exceeds the proportion of Grand Rapids consumers aware of it by more than 10 percentage points (0.10).
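15 Test for $\pi_1 - \pi_2$ (Large Samples): When $n_1\pi_1 \ge 5$ and $n_1(1-\pi_1) \ge 5$; $n_2\pi_2 \ge 5$ and $n_2(1-\pi_2) \ge 5$, the test statistic for $H_0: \pi_1 - \pi_2 = d$ is $z = \dfrac{(\hat{\pi}_1 - \hat{\pi}_2) - d}{\sqrt{\dfrac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \dfrac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}}$
Optional: Fisher Exact Test (p. 511)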
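The test of $H_0: \pi_1 - \pi_2 = d$ can be sketched as below, assuming the unpooled standard error and checking the large-sample conditions with the sample proportions in place of the unknown $\pi_1$ and $\pi_2$. The counts are hypothetical (they are not the Example 10.6 survey data), and the function and argument names are assumptions made for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(y1, n1, y2, n2, d=0.0, alternative="two-sided"):
    """Large-sample z test of H0: pi1 - pi2 = d. Returns (z, p-value)."""
    p1, p2 = y1 / n1, y2 / n2
    # Check the large-sample conditions using the sample proportions
    if min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) < 5:
        raise ValueError("normal approximation not appropriate")
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - d) / se
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical counts; tests whether pi1 exceeds pi2 by more than 0.10
z, p_value = two_proportion_z_test(260, 500, 200, 500, d=0.10,
                                   alternative="greater")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Here the observed difference (0.12) exceeds 0.10, but the one-sided p-value is well above 0.05, so these made-up data would not support a difference greater than 10 percentage points.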
16 Minitab
Z procedure for one proportion: Stat >> Basic Statistics >> 1 proportion
Z procedure for two proportions: Stat >> Basic Statistics >> 2 proportion
Sample size calculation: Stat >> Power & Sample size >> 1 proportion or 2 proportion
Stat >> Power & Sample size >> sample size for estimation
17 Chi-Square Goodness of Fit Test: More than two possible outcomes per trial: the multinomial experiment
1. The experiment consists of $n$ identical trials.
2. Each trial results in one of $k$ outcomes with probabilities $\pi_1, \ldots, \pi_k$.
$Y = (Y_1, \ldots, Y_k)$; $Y_i$ = the number of trials resulting in outcome $i$.
18 Chi-square Goodness of Fit Test: Goal: We are interested in testing a hypothesized distribution of $Y$ (i.e., a set of values for the $\pi_i$'s). Hypotheses: $H_0$: $\pi_i = \pi_{i0}$ for all $i$ vs. $H_a$: $H_0$ is false
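19 Chi-square Goodness of Fit Test: Test Statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(n_i - E_i)^2}{E_i}$, where $n_i$ = the observed value of $Y_i$ and $E_i$ = the expected value of $Y_i$ under $H_0$ = $n\pi_{i0}$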
20 Chi-square Goodness of Fit Test: Rejection Region: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$, where df = $k-1$. Note: This test can be trusted only when 80% or more of the $E_i$'s are at least 5.
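A minimal sketch of the goodness-of-fit computation: expected counts $E_i = n\pi_{i0}$, the statistic $\sum (n_i - E_i)^2/E_i$, df = $k-1$, and the 80% rule for expected counts. The observed counts and hypothesized probabilities are hypothetical, and the small table of upper 5% chi-square critical values is copied from standard tables so that the sketch needs no external library.

```python
# Upper 5% critical values of the chi-square distribution (standard tables)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_gof(observed, hypothesized_probs):
    """Chi-square goodness-of-fit statistic for multinomial counts.

    Returns (statistic, df, expected counts, rule_ok), where rule_ok
    is True when at least 80% of the expected counts are 5 or more.
    """
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    share_large = sum(e >= 5 for e in expected) / len(expected)
    return statistic, df, expected, share_large >= 0.8


# Hypothetical data: 100 trials, three outcome categories
statistic, df, expected, rule_ok = chi_square_gof([55, 25, 20],
                                                  [0.5, 0.3, 0.2])
reject = statistic > CHI2_CRIT_05[df]
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject}")
```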
21 Example

| Category | Hypothesized % | Observed counts |
|---|---|---|
| Marked decrease | 50 | 120 |
| Moderate decrease | 25 | 60 |
| Slight decrease | 10 | |
| Stationary or slight increase | 15 | 10 |
22 Example 10.11. Minitab: Stat >> Tables >> Chi-Square Goodness-of-Fit Test (One Variable)
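23 Contingency Table (Example 10.12): cell counts $n_{ij}$ cross-classified by age category and severity of skin disease (levels 1, 2, 3, 4), with row totals $n_{i*}$, column totals $n_{*j}$, and grand total $n$.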
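24 Contingency Table: 2 categorical variables, row and column, indexed by $i$ and $j$, respectively. If they are independent, then $\pi_{ij} = \pi_{i*}\,\pi_{*j}$, and the expected count in cell $(i, j)$ is estimated by $\hat{E}_{ij} = \dfrac{n_{i*}\,n_{*j}}{n}$.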
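25 Test for Independence of 2 Variables: Hypotheses: $H_0$: the row and column variables are independent vs. $H_a$: they are dependent. Test Statistic: $\chi^2 = \sum_{i,j} \dfrac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$, where $\hat{E}_{ij} = n_{i*}\,n_{*j}/n$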
26 Test for Independence of 2 Variables: Rejection Region: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$, where df = $(r-1)(c-1)$. Note: This test can be trusted only when 80% or more of the $\hat{E}_{ij}$'s are at least 5.
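The independence test can be sketched in the same way: estimated expected counts are (row total × column total) / $n$, and df = $(r-1)(c-1)$. The 2 × 3 table below is hypothetical (it is not the Example 10.12 data), and the critical value 5.991 is the upper 5% point of the chi-square distribution with 2 df, taken from standard tables.

```python
def chi_square_independence(table):
    """Pearson chi-square test statistic for an r x c table of counts.

    Returns (statistic, df, expected counts, rule_ok), where rule_ok
    is True when at least 80% of the expected counts are 5 or more.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cells = [e for row in expected for e in row]
    rule_ok = sum(e >= 5 for e in cells) / len(cells) >= 0.8
    return statistic, df, expected, rule_ok


# Hypothetical 2 x 3 table of counts
statistic, df, expected, rule_ok = chi_square_independence(
    [[20, 30, 50],
     [30, 30, 40]]
)
reject = statistic > 5.991  # upper 5% chi-square critical value, df = 2
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject}")
```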
Example 10.12. Minitab: Stat >> Tables >> Cross Tabulation and Chi-square
Minitab output (Tabulated Statistics; Rows: C1, Columns: Worksheet columns): each cell reports Count, % of Total, and Expected count; the Pearson Chi-Square and Likelihood Ratio Chi-Square statistics are each reported with DF = 8 and a P-Value.