HIMS 650 Homework set 5 Putting it all together Chapters 8 and 11
Problem 8.1 1 Building a Table
Start with simple probability. Simple probability, also called marginal probability, uses the blue-colored cells: the row and column totals in the margins of the table.

Candy Color   Blue   Yellow   Total
Green                         20
Pink                          80
Total         60     40       100
Simple probability is written P(A), meaning the probability that event “A” will occur. Example: P(candy with pink in it) = 80/100, or 0.80, or 80%. You find this probability by dividing the number of candies with pink in them by the total number of candies.
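The division described above can be sketched in a few lines of Python. This is only an illustration using the margin totals from the candy table; the names `row_totals`, `col_totals`, and `marginal_probability` are made up for this example and are not part of the homework.

```python
# Margin totals taken from the candy table (the "blue" cells).
row_totals = {"Green": 20, "Pink": 80}
col_totals = {"Blue": 60, "Yellow": 40}
grand_total = 100


def marginal_probability(count, total):
    """Return P(A): the count for event A divided by the total count."""
    return count / total


p_pink = marginal_probability(row_totals["Pink"], grand_total)
p_blue = marginal_probability(col_totals["Blue"], grand_total)

print(f"P(pink) = {p_pink:.2f}")
print(f"P(blue) = {p_blue:.2f}")
```

In Excel the same result comes from dividing the margin cell by the grand total cell.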
Joint probability involves two events and uses the yellow cells in the middle of the table. When two events are independent, you find their joint probability by multiplying the two simple probabilities together: P(Green and Blue) = (20/100) * (60/100) = 0.12. Multiplying that probability by the total of 100 candies gives the expected number of candies in the cell. This is the same as multiplying the two margin totals and dividing by the total: (20*60)/100 = 12 (always divide by the total when looking for probabilities).

Candy Color   Blue   Yellow   Total
Green         12              20
Pink                          80
Total         60     40       100
Follow the same procedure for the rest of the cells:
The expected number of yellow and green candies is (40*20)/100 = 8.
The expected number of blue and pink candies is (60*80)/100 = 48.
The expected number of yellow and pink candies is (40*80)/100 = 32.
Voila! It's that easy!

Candy Color   Blue   Yellow   Total
Green         12     8        20
Pink          48     32       80
Total         60     40       100
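As a rough check on the table, the same "row total times column total, divided by the grand total" rule can be applied to every cell at once. The sketch below assumes the margin totals from the candy table; the variable names are illustrative only.

```python
# Margin totals from the candy table.
row_totals = {"Green": 20, "Pink": 80}
col_totals = {"Blue": 60, "Yellow": 40}
grand_total = 100

# Expected count for each cell, assuming the two colors are independent:
# (row total * column total) / grand total.
expected = {
    (row, col): row_totals[row] * col_totals[col] / grand_total
    for row in row_totals
    for col in col_totals
}

for (row, col), value in expected.items():
    print(f"{row:<6}{col:<8}{value:5.1f}")
```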
Problem 8.1 2 Chi Square
When you calculated the joint probabilities of the candy mix in problem 8.1 1, you calculated the expected values of the candy mix, based on the simple probabilities in the margins. An actual sample will rarely match the expected values exactly, and that is the case here. When you count the number of different colored candies in the mix, you find the results in this table.

Candy Color   Blue   Yellow   Total
Green         10     10       20
Pink          50     30       80
Total         60     40       100
I am sure that the question everyone is asking at this point is: can the two colors be considered statistically independent of each other? To answer that question, we need to run the Chi Square test using both of the tables. Here is the Chi Square formula (which is not that difficult to compute using Excel):

χ² = Σ (O - E)² / E
The variable E is the expected results, which are shown in the first table that we created. The variable O is the observed results, which are shown in the second table. Using only the yellow areas, create another table containing the difference between the expected and observed results in each cell. The order of subtraction does not matter, because the differences are squared in the next step.

Candy Color   Blue        Yellow
Green         12-10=2     8-10=-2
Pink          48-50=-2    32-30=2
Now square the results:

Candy Color   Blue        Yellow
Green         2*2=4       -2*-2=4
Pink          -2*-2=4     2*2=4
Now divide by the expected outcome in each case (answers are rounded to two decimal places in this example, so when you use Excel you might get a slightly different result due to rounding):

Candy Color   Blue          Yellow
Green         4/12= 0.33    4/8= 0.50
Pink          4/48= 0.08    4/32= 0.13

Finally, add all of the answers together, and Voila! You have your chi square result, which is 0.33 + 0.50 + 0.08 + 0.13 = 1.04.
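The subtract, square, divide, and add steps can be sketched together in Python. This assumes the expected and observed counts from the two candy tables; the dictionary layout is just one convenient choice for the illustration.

```python
# Expected counts (first table) and observed counts (second table).
expected = {
    ("Green", "Blue"): 12, ("Green", "Yellow"): 8,
    ("Pink", "Blue"): 48, ("Pink", "Yellow"): 32,
}
observed = {
    ("Green", "Blue"): 10, ("Green", "Yellow"): 10,
    ("Pink", "Blue"): 50, ("Pink", "Yellow"): 30,
}

# Chi square: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (observed[cell] - expected[cell]) ** 2 / expected[cell]
    for cell in expected
)

print(f"chi square = {chi_square:.2f}")
```

Because the code does not round each cell before adding, the unrounded total can differ slightly from a hand calculation in the third decimal place.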
Problem 8.1 3 Chi Square
I am sure that your next question is: how can I interpret the number 1.04, which is the Chi square result from our candy mix? Well, I am glad that you asked that question, so here is the answer. Use =CHIDIST(). =CHIDIST() takes 2 arguments: the chi square value that we calculated, 1.04, and the degrees of freedom. The only question is how to find the degrees of freedom. The formula for degrees of freedom is (the number of rows of joint probability numbers minus 1) times (the number of columns of joint probability numbers minus 1), or: (r-1)*(c-1). For a 2 by 2 table, like the one we just constructed, the degrees of freedom is (2-1)*(2-1) = 1.
So, for our problem, =CHIDIST(1.04, 1) = 0.31 (rounded to two decimals). If the colors really are independent, there is about a .31 or 31% chance of getting a chi square value of 1.04 or larger. As with most of the other distributions, only reject the null hypothesis if the probability returned by =CHIDIST() is less than .05. Since .31 is greater than .05, do not reject the null hypothesis; assume that the colors are statistically independent.
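For readers who want to check the =CHIDIST() step outside Excel, here is a minimal Python sketch. It relies on a shortcut that holds only for 1 degree of freedom (the right-tail probability equals erfc(sqrt(x/2))), so it is not a general replacement for =CHIDIST(). The function names are invented for this illustration.

```python
import math


def degrees_of_freedom(rows, cols):
    """(r - 1) * (c - 1) for a table of joint counts."""
    return (rows - 1) * (cols - 1)


def chi_square_p_value_df1(chi_square):
    """Right-tail probability for a chi square value with exactly 1 df.

    Valid ONLY for 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))


df = degrees_of_freedom(2, 2)
p_value = chi_square_p_value_df1(1.04)
reject_null = p_value < 0.05

print(f"degrees of freedom = {df}")
print(f"p-value = {p_value:.2f}")
print("reject independence" if reject_null else "do not reject independence")
```

For tables larger than 2 by 2, use =CHIDIST() in Excel or a statistics library instead of this shortcut.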
Problem 8.1 4 Chi Square
Calculate conditional and marginal probabilities for the cross-tabulation and give them an interpretation. Since rows and columns are independent, no interpretation is necessary.

Observed counts:

Candy Color   Blue   Yellow   Total
Green         10     10       20
Pink          50     30       80
Total         60     40       100

Each cell divided by its column total:

Candy Color   Blue              Yellow            Total
Green         10/60 = 17%       10/40 = 25%       20/100 = 20%
Pink          50/60 = 83%       30/40 = 75%       80/100 = 80%
Sum           83%+17% = 100%    25%+75% = 100%    20%+80% = 100%
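The column percentages can be sketched in Python as follows. The code assumes the observed candy counts from the table and divides each cell by its column total, which gives the probability of a row color given the column color. Variable names are illustrative.

```python
# Observed counts from the candy table: observed[row][column].
observed = {
    "Green": {"Blue": 10, "Yellow": 10},
    "Pink": {"Blue": 50, "Yellow": 30},
}
columns = ("Blue", "Yellow")

# Column totals (the bottom margin of the table).
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}

# Conditional probability of each row color given the column color.
conditional = {
    row: {col: observed[row][col] / col_totals[col] for col in columns}
    for row in observed
}

for row in observed:
    cells = "  ".join(f"P({row}|{col}) = {conditional[row][col]:.0%}" for col in columns)
    print(cells)
```

Each column of conditional probabilities adds up to 100%, which is a quick way to catch arithmetic slips.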
Problems 8.2 3 and 7 Chi Square
(Both 8.2 3 and 8.2 7 follow the same steps. The steps for 8.2 3 are given here.) You know how to do the pivot table, so you simply need to download Chpt 8-1.xls and create a pivot table for sex and length of stay by “long” or “short.” Use the pivot table that you created to create an “observed” table like problem 8.1 2. Next, use the simple or marginal probabilities to create an “expected” table like you did in problem 8.1 1. Ignore the middle of the observed table; only use the numbers on the margins for this task. Next, use these two tables to find the chi square and use =CHIDIST() to test the hypothesis of independence (like problem 8.1 3). Finally, create conditional probabilities and provide an interpretation of the results like you did in problem 8.1 4. You have already done this kind of problem; you simply need to follow the steps that you have learned.
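The whole sequence (cross-tabulate, build the expected table from the margins, compute chi square, look up the probability) can be sketched end to end in Python. The records below are hypothetical values invented for this illustration; they are NOT the data in Chpt 8-1.xls, and the p-value shortcut again assumes a 2 by 2 table (1 degree of freedom).

```python
import math
from collections import Counter

# Hypothetical (sex, length of stay) records; NOT taken from Chpt 8-1.xls.
records = (
    [("F", "long")] * 30 + [("F", "short")] * 20
    + [("M", "long")] * 20 + [("M", "short")] * 30
)

# Step 1: the "pivot table" of observed counts and its margins.
observed = Counter(records)
row_totals = Counter(sex for sex, _ in records)
col_totals = Counter(stay for _, stay in records)
n = len(records)

# Steps 2 and 3: expected counts from the margins, then chi square.
chi_square = 0.0
for sex in row_totals:
    for stay in col_totals:
        expected = row_totals[sex] * col_totals[stay] / n
        chi_square += (observed[(sex, stay)] - expected) ** 2 / expected

# Step 4: right-tail probability, valid only for 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi square = {chi_square:.2f}, p-value = {p_value:.3f}")
```

With the real homework data the counts will differ, but the order of the steps stays the same.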
Problem 11.1 1
You learned how to use the chart feature in Excel to generate bar charts, pie charts, etc. Now you will use the chart feature to generate an xy graph. Here is a good website to learn how to use the xy graph: http://www.excel-easy.com/examples/scatter-chart.html
From the website:
1. Select your range.
2. On the Insert tab, in the Charts group, choose Scatter, and select Scatter with straight lines.
Note: If you want to add the axis titles, make sure you select the titles on the first row of the data along with the data. You might want to consider adding a trend line to help you visualize how well the data “fits” the line.
Problem 11.1 1 and 11.1 2
Adding a trend line: http://www.excel-easy.com/examples/trendline.html
From the website: “Right click the data series, and then click Add Trendline.”
Problem 11.1 2: Just answer the 3 questions. “Which, if any, of the preceding charts appear to be the best to fit a linear model and why? Which appear to be the worst?”
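A linear trend line is a least-squares straight line through the points. The sketch below shows that calculation in plain Python, as a way to see what the trend line represents; the data points and the function name `fit_line` are hypothetical and are not taken from the textbook problems.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Points that sit close to this line suggest a linear model fits well; points that curve away from it or scatter widely suggest it does not.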
Problem 11.1 5 The equation of a line is y = b(1) * x + b(0) where b(1) is the slope and b(0) is the intercept. Use the slope, intercept and x values to find what y equals and then plot the answers on a graph.
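As a small illustration of plugging x values into the line equation, the sketch below computes the (x, y) pairs you would then plot. The slope, intercept, and x values are hypothetical placeholders; substitute the numbers given in the problem.

```python
def predict(b1, b0, x):
    """Return y = b(1) * x + b(0), where b1 is the slope and b0 the intercept."""
    return b1 * x + b0


# Hypothetical slope, intercept, and x values for illustration only.
b1, b0 = 2.0, 1.0
xs = [0, 1, 2, 3, 4]

points = [(x, predict(b1, b0, x)) for x in xs]
for x, y in points:
    print(f"x = {x}, y = {y}")
```

The resulting pairs are what you would enter in two Excel columns before inserting the scatter chart.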
Problem 11.1 7
The slope-intercept line formula is y = b(1)x + b(0). When you get the answers generated in parts a – c, use the Scatter plot feature of Excel to graph your answers.
Problem 11.3 1 part a The answer to this problem is found on page 378, labeled Figure 11.10. That’s it! You are done. Congratulations.