PROBABILITY AND THEORETICAL DISTRIBUTIONS
Since medicine is an inexact science, physicians seldom can predict an outcome with absolute certainty. To formulate a diagnosis, a physician must rely on available diagnostic information about a patient (physical examinations or laboratory tests). If the test result is not absolutely accurate, decisions (diagnoses) relying on this result will be uncertain. Probability is a means for quantifying uncertainty.
In probability, an experiment is defined as any planned process of data collection. Assume that an experiment can be repeated many times, with each replication called a trial, and that one or more outcomes can result from each trial. The probability of outcome "A" is written as P(A).
Example: A blood bank reports the distribution of blood group types for 150 subjects in a local area; 64 of them have blood type A. The probability that a randomly selected person has blood type A is: P(A) = 64/150 = 0.427
RULE 1: The sum of the probabilities of all possible outcomes is 1.
RULE 2: Two events are independent if the product of their individual probabilities is equal to the probability that the two events happen together. In this case the outcome of one event has no effect on the outcome of the other. If P(A) * P(B) = P(A and B), then A and B are independent.
Example 1: Assume that 100 subjects were cross-classified according to their body weight status and the presence or absence of coronary heart disease (CHD).

Body Weight Status   CHD +   CHD -   TOTAL
Obese                  10      30      40
Normal                 15      45      60
TOTAL                  25      75     100

Are body weight status and CHD independent?
P(Obese and CHD+) = 10/100 = 0.1
P(Obese) = 40/100 = 0.4
P(CHD+) = 25/100 = 0.25
P(Obese) * P(CHD+) = 0.4 * 0.25 = 0.1 = P(Obese and CHD+)
The two events are independent: obesity does not affect the disease status of the subject.
Example 2: Assume that 100 subjects were cross-classified according to their body weight status and the presence or absence of coronary heart disease (CHD).

Body Weight Status   CHD +   CHD -   TOTAL
Obese                  20      20      40
Normal                  5      55      60
TOTAL                  25      75     100

Are body weight status and CHD independent?
P(Obese and CHD+) = 20/100 = 0.2
P(Obese) = 40/100 = 0.4
P(CHD+) = 25/100 = 0.25
P(Obese) * P(CHD+) = 0.4 * 0.25 = 0.1 ≠ 0.2
The two events are not independent: obesity affects the disease status of the subject.
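The independence check in Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `is_independent` is hypothetical, and the counts are those implied by the stated probabilities for 100 subjects (for instance 0.1 × 100 = 10 obese subjects with CHD in Example 1).

```python
import math

def is_independent(n_both, n_a, n_b, n_total, tol=1e-9):
    """Compare P(A and B) with P(A) * P(B) using counts from a 2x2 table.

    Returns (P(A and B), P(A) * P(B), True if they agree within tol).
    """
    p_both = n_both / n_total
    p_product = (n_a / n_total) * (n_b / n_total)
    return p_both, p_product, math.isclose(p_both, p_product, abs_tol=tol)

# Example 1: 10 obese subjects with CHD, 40 obese, 25 with CHD, 100 in total
example_1 = is_independent(10, 40, 25, 100)
# Example 2: 20 obese subjects with CHD, 40 obese, 25 with CHD, 100 in total
example_2 = is_independent(20, 40, 25, 100)

print("Example 1:", example_1)
print("Example 2:", example_2)
```

In observed data the two quantities will rarely agree exactly, so a tolerance-based comparison like this one is a descriptive check, not a statistical test of independence.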
RULE 3: When two events are not independent, the occurrence of one event depends on whether the other event has occurred. The probability of one event given that the other event has occurred is called the conditional probability. The probability of Event A, given Event B, is written P(A|B):
P(A|B) = P(A and B) / P(B)
Example 3: Using the data in Example 1, we can calculate
P(CHD+ | Obese) = P(CHD+ and Obese) / P(Obese) = 0.1 / 0.4 = 0.25
which is also P(CHD+). Knowing that a subject is obese does not change the likelihood of CHD. In this example the two events were found to be independent.
Example 4: Using the data in Example 2, we can calculate
P(CHD+ | Obese) = P(CHD+ and Obese) / P(Obese) = 0.2 / 0.4 = 0.5
whereas P(CHD+) = 0.25. Knowing that a subject is obese doubles the likelihood of CHD. In this example the two events were found to be dependent: one event is affected by the other.
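The conditional probabilities in Examples 3 and 4 follow directly from the definition P(A|B) = P(A and B) / P(B). A minimal sketch, in which the helper name `conditional_probability` is an illustrative assumption:

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

p_chd = 0.25    # P(CHD+), the same in both examples
p_obese = 0.4   # P(Obese), the same in both examples

ex3 = conditional_probability(0.1, p_obese)  # Example 3: P(CHD+ and Obese) = 0.1
ex4 = conditional_probability(0.2, p_obese)  # Example 4: P(CHD+ and Obese) = 0.2
ratio = ex4 / p_chd                          # how much obesity raises P(CHD+) in Example 4

print(f"Example 3: P(CHD+ | Obese) = {ex3:.2f}")
print(f"Example 4: P(CHD+ | Obese) = {ex4:.2f}, ratio to P(CHD+) = {ratio:.1f}")
```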
In diagnostic medicine, it is important to know whether a test result depends upon the presence or absence of a disease or disorder.

[Table: test result (+, -) cross-classified by disease status (+, -)]

SENSITIVITY OF A TEST: Given a subject with the disease, the probability of a positive test result, P(Test+ | Disease+).
SPECIFICITY OF A TEST: Given a subject without the disease, the probability of a negative test result, P(Test- | Disease-).
FALSE NEGATIVE RATE OF A TEST: Given a subject with the disease, the probability of a negative test result, P(Test- | Disease+) = 1 - sensitivity.
FALSE POSITIVE RATE OF A TEST: Given a subject without the disease, the probability of a positive test result, P(Test+ | Disease-) = 1 - specificity.
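The four test characteristics can be computed from the cells of a test-by-disease table. The sketch below uses hypothetical counts (80, 20, 30, 170), chosen only for illustration and not taken from the slides. The function name `diagnostic_rates` is likewise an assumption.

```python
def diagnostic_rates(tp, fn, fp, tn):
    """Conditional probabilities of test results given true disease status.

    tp: diseased, test positive     fn: diseased, test negative
    fp: not diseased, test positive tn: not diseased, test negative
    """
    diseased = tp + fn
    not_diseased = fp + tn
    return {
        "sensitivity": tp / diseased,              # P(Test+ | Disease+)
        "specificity": tn / not_diseased,          # P(Test- | Disease-)
        "false_negative_rate": fn / diseased,      # P(Test- | Disease+)
        "false_positive_rate": fp / not_diseased,  # P(Test+ | Disease-)
    }

# Hypothetical counts, for illustration only
rates = diagnostic_rates(tp=80, fn=20, fp=30, tn=170)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and the false negative rate are computed among diseased subjects and sum to 1. Specificity and the false positive rate are computed among non-diseased subjects and also sum to 1.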
Variables can take on values by some chance mechanism. Since probability is a measure of chance, probability distributions help us to study the probabilities associated with the outcomes of the variable under study.
Several theoretical probability distributions are important in biostatistics:
I) Binomial
II) Poisson
III) Normal
Discrete probability distribution: the variable takes only integer values.
Continuous probability distribution: the variable has values measured on a continuous scale.
THE BINOMIAL DISTRIBUTION: The variable has only two possible outcomes (male/female; diseased/not diseased; positive/negative), denoted A and B. The probability of A is denoted by p: P(A) = p and P(B) = 1 - p. When the experiment is repeated n times, p remains constant and the outcome of one trial is independent of the others. Such a variable is said to follow a BINOMIAL DISTRIBUTION.
The question is: What is the probability that outcome A occurs x times? Or: What proportion of n outcomes will be A? The probability of x outcomes of type A in a group of size n, if each outcome has probability p and is independent of all other outcomes, is given by the Binomial Probability Function:
$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$
Example 1. For families with 5 children each, what is the probability that
i) there will be one male child? Taking the probability of a male child as p = 0.5, P(X = 1) = [5!/(1! 4!)] (0.5)^1 (0.5)^4 = 0.156. Among families with 5 children each, a proportion of about 0.16 have one male child.
ii) there will be at least one male child? P(X ≥ 1) = 1 - P(X = 0) = 1 - (0.5)^5 = 0.969.
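The family example can be reproduced with a small binomial helper. This is a sketch under the assumption that each child is male with probability 0.5 independently of the others, which is the value consistent with the stated result of 0.16. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x occurrences of A in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n_children = 5
p_male = 0.5  # assumption: male and female births equally likely

p_one_male = binomial_pmf(1, n_children, p_male)
# "At least one" is the complement of "none"
p_at_least_one = 1 - binomial_pmf(0, n_children, p_male)

print(f"P(exactly one male child)  = {p_one_male:.3f}")
print(f"P(at least one male child) = {p_at_least_one:.3f}")
```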
Using the probabilities associated with possible outcomes, we can draw a probability distribution for the event under study:
Example: Among men with a localized prostate tumor and a PSA < 10, the 5-year survival is known to be 0.8. We can use the Binomial Distribution to calculate the probability that any particular number (A), out of n, will survive 5 years. For example, for a new series of 6 such men:
None will survive 5 years: P(A=0) = 0.000064
Only 1 will survive 5 years: P(A=1) = 0.0015
2 will survive 5 years: P(A=2) = 0.015
3 will survive 5 years: P(A=3) = 0.082
4 will survive 5 years: P(A=4) = 0.246
5 will survive 5 years: P(A=5) = 0.393
All will survive 5 years: P(A=6) = 0.262
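The whole survival distribution for n = 6 and p = 0.8 can be tabulated at once, which also gives a way to check that the probabilities sum to 1 (Rule 1). A minimal sketch; the variable names are illustrative:

```python
from math import comb

n, p = 6, 0.8  # series of 6 men, 5-year survival probability 0.8

# Binomial probability of exactly k survivors, for k = 0, ..., n
distribution = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(A={k}) = {prob:.6f}")

total = sum(distribution.values())
expected_survivors = sum(k * prob for k, prob in distribution.items())
print(f"Sum of probabilities: {total:.6f}")
print(f"Expected number of survivors: {expected_survivors:.2f}")
```

The expected number of survivors equals n × p = 4.8, which is why the distribution is concentrated on 4, 5 and 6 survivors.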
THE POISSON DISTRIBUTION: Like the Binomial, the Poisson distribution is a discrete distribution, applicable when the outcome is the number of times an event occurs. If the average number of occurrences of the event is given instead of the probability of an outcome, the associated probabilities can be calculated by using the Poisson Distribution Function, which is defined as:
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$
where λ is the average number of occurrences of the event.
Example. If the average number of hospitalizations for a group of patients is calculated as 3.22, the probability that a patient in the group has zero hospitalizations is
P(A=0) = e^(-3.22) (3.22)^0 / 0! = 0.040
The probability that a patient has exactly one hospitalization is
P(A=1) = e^(-3.22) (3.22)^1 / 1! = 0.129
The probability that a patient will be hospitalized more than 3 times, since the upper limit is unknown, is calculated as
P(A>3) = 1 - P(A≤3)
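The hospitalization example can be worked through numerically with the Poisson probability function e^(-mean) × mean^x / x!. A short sketch, with `poisson_pmf` as an illustrative helper name:

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x occurrences when the average count is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

mean_hospitalizations = 3.22

p_zero = poisson_pmf(0, mean_hospitalizations)
p_one = poisson_pmf(1, mean_hospitalizations)
# No upper limit on the count, so use the complement: 1 - P(A <= 3)
p_more_than_3 = 1 - sum(poisson_pmf(k, mean_hospitalizations) for k in range(4))

print(f"P(A=0) = {p_zero:.3f}")
print(f"P(A=1) = {p_one:.3f}")
print(f"P(A>3) = {p_more_than_3:.3f}")
```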
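NORMAL DISTRIBUTION
The Normal (Gaussian) distribution is the best-known probability distribution for continuous variables. The function of the normal distribution curve is as follows:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty$$
where µ is the mean and σ is the standard deviation of the distribution.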
It is a smooth, bell-shaped curve. It is symmetric around the mean of the distribution, symbolized by µ. Half of the area is to the left of the mean and half is to the right. The total area under the curve (the sum of the probabilities of all possible values) is equal to 1. The mean, median and mode are equal to each other.
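[Graph: areas under the normal curve. About 68.26% of the area lies within µ ± 1σ, 95.44% within µ ± 2σ, and 99.74% within µ ± 3σ.]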
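The areas within one, two and three standard deviations of the mean can be checked with Python's standard library. This is a sketch only. The computed values differ in the last digit from the 95.44% and 99.74% figures, which look like values read from a rounded printed table.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"Within {k} standard deviation(s) of the mean: {100 * area:.2f}%")
```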
The normal distribution is not a single curve: there are many different normal distributions, each determined by its mean and standard deviation.
Graph 1. Three normal distributions with different means and the same standard deviation.
Graph 2. Three normal distributions with the same mean and different standard deviations.
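STANDARD NORMAL DISTRIBUTION
Any normally distributed value x_i can be converted to a standard normal value z_i:
$$z_i = \frac{x_i - \mu}{\sigma}$$
Birthweight (x_i): µ = 3300, σ = 600
Standardized value (z_i): µ = 0, σ = 1.0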
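If it is known that the birthweights of infants are normally distributed with a mean of 3300 g and a standard deviation of 600 g, what is the probability that a randomly selected infant will weigh less than 3000 g? More than 3000 g?
Ans: z = (3000 - 3300)/600 = -0.5. The area between 0 and z = 0.5 is 0.1915, so P(X < 3000) = 0.5 - 0.1915 = 0.3085 ≈ 0.31 and P(X > 3000) = 0.5 + 0.1915 = 0.6915 ≈ 0.69.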
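The birthweight question can be answered without a printed table by using the cumulative distribution function of a normal distribution with mean 3300 and standard deviation 600. A minimal sketch using the standard library; the variable names are illustrative:

```python
from statistics import NormalDist

birthweight = NormalDist(mu=3300, sigma=600)  # grams

threshold = 3000
z_score = (threshold - birthweight.mean) / birthweight.stdev
p_less = birthweight.cdf(threshold)  # area to the left of 3000 g
p_more = 1 - p_less                  # area to the right of 3000 g

print(f"z = {z_score:.2f}")
print(f"P(weight < 3000 g) = {p_less:.4f}")
print(f"P(weight > 3000 g) = {p_more:.4f}")
```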
[Table: standard normal distribution, area between 0 and z]
If the BMI of adult women is normally distributed with a mean of 24 and a standard deviation of 6 units, what proportion of women will have BMI > 30 (what proportion of women will be classified as obese)?
z = (30 - 24)/6 = 1.0. The area between 0 and z = 1.0 is 0.3413, so P(BMI > 30) = 0.5 - 0.3413 = 0.1587. About 16% of adult women will be classified as obese.
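If it is given that 10% of adult women are classified as "thin", what is the borderline for being thin (with the same mean of 24 and standard deviation of 6)?
The z value that leaves 10% of the area in the lower tail is -1.28, so the borderline is 24 - 1.28 × 6 = 16.32. An adult woman will be classified as thin if she has a BMI < 16.32.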
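Both BMI questions can be sketched with the standard library: the upper-tail area for the obesity cutoff, and the inverse cumulative distribution function for the "thin" borderline. The sketch assumes BMI is normally distributed with mean 24 and standard deviation 6. The exact inverse gives a cutoff near 16.31, because it uses z ≈ -1.2816 and not the rounded table value -1.28.

```python
from statistics import NormalDist

bmi = NormalDist(mu=24, sigma=6)  # assumption: BMI of adult women is normal

# Proportion classified as obese: area to the right of BMI = 30
p_obese = 1 - bmi.cdf(30)

# Borderline for "thin": the BMI value below which 10% of women fall
thin_cutoff = bmi.inv_cdf(0.10)

print(f"P(BMI > 30) = {p_obese:.4f}")
print(f"Borderline for thin: BMI < {thin_cutoff:.2f}")
```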