Probabilities from Contingency Tables
The table below contains data obtained in a study of the relationship between olive oil consumption and cancer of the colon and rectum.
[Table: rows give Cancer Status (Colon Cancer, Rectal Cancer, No Cancer, Totals); columns give Olive Oil Consumption (Low, Medium, High, Totals).]
We would first like to calculate the probability that a randomly chosen subject has colon cancer. Out of 6107 subjects, 1225 have colon cancer, so P(Colon Cancer) = 1225/6107 = .201
Now calculate the probability that a randomly selected subject consumed a medium amount of olive oil. Out of 6107 subjects, 2015 consumed a medium amount, so P(Medium) = 2015/6107 = .330
Next, we will calculate the probability that a randomly selected subject has colon cancer and consumed a medium amount of olive oil. Out of 6107 subjects, 397 had colon cancer and consumed a medium amount, so P(Colon Cancer and Medium) = 397/6107 = .065
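The three calculations so far all divide a count from the table by the grand total. A minimal sketch of that arithmetic follows; it uses only the four counts quoted in the text, and the variable and function names are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

# Counts quoted in the text (the remaining table cells are not used here).
TOTAL = 6107             # all subjects
COLON = 1225             # Colon Cancer row total
MEDIUM = 2015            # Medium column total
COLON_AND_MEDIUM = 397   # cell: Colon Cancer and Medium


def probability(count, total=TOTAL):
    """Relative frequency of an event: its count over the grand total."""
    return Fraction(count, total)


p_colon = probability(COLON)                        # marginal (row total)
p_medium = probability(MEDIUM)                      # marginal (column total)
p_colon_and_medium = probability(COLON_AND_MEDIUM)  # joint (single cell)

for label, p in [
    ("P(Colon Cancer)", p_colon),
    ("P(Medium)", p_medium),
    ("P(Colon Cancer and Medium)", p_colon_and_medium),
]:
    print(f"{label} = {p} = {float(p):.3f}")
```

Keeping the values as exact fractions until the final print avoids rounding error if they are combined later.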
To calculate the probability that a randomly selected subject has colon cancer or consumed a medium amount of olive oil, we use the General Addition Rule. The 397 subjects who have colon cancer and consumed a medium amount are counted in both totals, so they must be subtracted once: P(Colon Cancer or Medium) = P(Colon Cancer) + P(Medium) - P(Colon Cancer and Medium) = 1225/6107 + 2015/6107 - 397/6107 = 2843/6107 = .466
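As a check on the General Addition Rule, the sketch below evaluates it with exact fractions from the counts quoted in the text. The function name general_addition is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

# Counts quoted in the text.
TOTAL = 6107
COLON = 1225
MEDIUM = 2015
COLON_AND_MEDIUM = 397


def general_addition(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_colon_or_medium = general_addition(
    Fraction(COLON, TOTAL),
    Fraction(MEDIUM, TOTAL),
    Fraction(COLON_AND_MEDIUM, TOTAL),
)

print(f"P(Colon Cancer or Medium) = {p_colon_or_medium} "
      f"= {float(p_colon_or_medium):.3f}")
```

For disjoint events the third argument is 0, and the same function reduces to the Special Addition Rule.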
The events ‘Colon Cancer’ and ‘Rectal Cancer’ are disjoint (mutually exclusive): no subject is counted in both rows, so P(Colon Cancer and Rectal Cancer) = 0. To find the probability that a randomly selected subject has either Colon Cancer or Rectal Cancer, we can therefore use the Special Addition Rule: P(Colon Cancer or Rectal Cancer) = P(Colon Cancer) + P(Rectal Cancer) = 1225/6107 + (Rectal Cancer total)/6107
Now we will calculate some conditional probabilities. For example, we might want to know the probability that a randomly selected subject has Colon Cancer, given that the subject consumed a Medium amount of olive oil. We are then interested only in the ‘Medium’ column, which corresponds to the given condition. We see that, out of 2015 subjects who consumed a Medium amount of olive oil, 397 have Colon Cancer: P(Colon Cancer | Medium) = 397/2015 = .197
The probability that a randomly selected subject consumed a Medium amount of olive oil, given that the subject has Colon Cancer, is different. This time, the given condition is that the subject has Colon Cancer, so we are interested only in the ‘Colon Cancer’ row. Out of 1225 subjects with Colon Cancer, 397 consumed a Medium amount of olive oil: P(Medium | Colon Cancer) = 397/1225 = .324
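The two conditional probabilities share a numerator and differ only in which total is used as the denominator. The sketch below computes both from the counts quoted in the text and confirms that restricting attention to a row or column agrees with the usual definition P(A | B) = P(A and B) / P(B). The names are illustrative assumptions.

```python
from fractions import Fraction

# Counts quoted in the text.
TOTAL = 6107
COLON = 1225
MEDIUM = 2015
COLON_AND_MEDIUM = 397


def conditional(joint_count, given_count):
    """P(A | B): the cell count for 'A and B' over the total for B."""
    return Fraction(joint_count, given_count)


# Condition on the Medium column, then on the Colon Cancer row.
p_colon_given_medium = conditional(COLON_AND_MEDIUM, MEDIUM)
p_medium_given_colon = conditional(COLON_AND_MEDIUM, COLON)

# Same value via the definition P(A | B) = P(A and B) / P(B).
by_definition = Fraction(COLON_AND_MEDIUM, TOTAL) / Fraction(MEDIUM, TOTAL)

print(f"P(Colon Cancer | Medium) = {float(p_colon_given_medium):.3f}")
print(f"P(Medium | Colon Cancer) = {float(p_medium_given_colon):.3f}")
```

The grand total cancels in the definition, which is why only the relevant row or column total is needed.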
We might want to know whether the occurrence of Colon Cancer is related to whether or not the subject consumed a Medium amount of olive oil. In other words, are the events ‘Colon Cancer’ and ‘Medium’ independent? To answer this, remember that events A and B are independent if P(A | B) = P(A) and dependent if P(A | B) is different from P(A), so we will compare P(Colon | Medium) with P(Colon). (We could also compare P(Medium | Colon) with P(Medium).) Recall that P(Colon | Medium) = .197, but P(Colon) = .201, so the probability of Colon Cancer is slightly smaller for those who consumed a Medium amount of olive oil than for the entire sample. Because the two probabilities are not equal, we conclude that these two events are dependent.
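The comparison can be sketched as follows, using exact fractions so that the equality test is not affected by rounding. The helper are_independent is a hypothetical name, and it applies the definition literally (exact equality), as the text does. The second check, P(A and B) = P(A) P(B), is the equivalent product form of the same definition.

```python
from fractions import Fraction

# Counts quoted in the text.
TOTAL = 6107
COLON = 1225
MEDIUM = 2015
COLON_AND_MEDIUM = 397

p_colon = Fraction(COLON, TOTAL)
p_medium = Fraction(MEDIUM, TOTAL)
p_colon_and_medium = Fraction(COLON_AND_MEDIUM, TOTAL)
p_colon_given_medium = Fraction(COLON_AND_MEDIUM, MEDIUM)


def are_independent(p_a_given_b, p_a):
    """Events are independent exactly when P(A | B) equals P(A)."""
    return p_a_given_b == p_a


independent = are_independent(p_colon_given_medium, p_colon)

# Equivalent product form: P(A and B) == P(A) * P(B).
product_rule_holds = p_colon_and_medium == p_colon * p_medium

print(f"P(Colon | Medium) = {float(p_colon_given_medium):.3f}")
print(f"P(Colon)          = {float(p_colon):.3f}")
print("Independent" if independent else "Dependent")
```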