Probability Distributions for Discrete Variables Farrokh Alemi Ph.D. Professor of Health Administration and Policy College of Health and Human Services, George Mason University 4400 University Drive, Fairfax, Virginia 22030 703 993 1929 falemi@gmu.edu
Lecture Outline What is probability? Discrete Probability Distributions Assessment of rare probabilities Conditional independence Causal modeling Case based learning Validation of risk models Examples This is the second lecture in a series of lectures intended to prepare you to do probabilistic risk analysis.
Lecture Outline What is probability? Discrete Probability Distributions Bernoulli Geometric Binomial Poisson Assessment of rare probabilities Conditional independence Causal modeling Case based learning Validation of risk models Examples This lecture introduces you to various probability mass functions, including the Bernoulli, Binomial, Geometric, and Poisson distributions. We show how these are related to each other and how they can be used to describe real processes in health care settings.
Definitions Function Density function Distribution function A function assigns numbers to events. A probability density function gives the probability of a single event or a group of events. In contrast, a cumulative distribution function gives the probability of all events less than or equal to a specific level. Both functions require us to sort events from a low value to a high value.
Definitions
Events                 Probability density function   Cumulative distribution function
0 medication errors    0.90                           0.90
1 medication error     0.06                           0.96
2 medication errors    0.04                           1
Otherwise
In this example, the numbers of medication errors are sorted and listed on the left. The first column of values shows the probability density function. At each value, it provides the probability of the event listed. For example, 90% of patients have no medication errors, 6% of patients have 1 medication error and 4% have 2 medication errors. The cumulative distribution function is given in the right column. It starts at 90% and increases or stays the same thereafter. The cumulative distribution function gives the probability of occurrence of all of the events with values less than or equal to the event. For example, 96% is the probability of having patients with 1 or 0 medication errors. The size of each step in a cumulative distribution function equals the probability of the event at that step.
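The relationship between the two columns can be sketched in a few lines of Python. This is only an illustration using the medication error probabilities from this slide; the variable names are our own.

```python
from itertools import accumulate

# Probability of each number of medication errors (values from the slide)
density = {0: 0.90, 1: 0.06, 2: 0.04}

# Sort events from low to high, then take a running sum of the probabilities
events = sorted(density)
cumulative = dict(zip(events, accumulate(density[e] for e in events)))

for e in events:
    print(e, density[e], round(cumulative[e], 2))
```

Each cumulative value is the previous cumulative value plus the density at that event, which is why the cumulative function never decreases.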
Expected Value The probability density function can be used to calculate the expected value of an uncertain event. The expected value is calculated by multiplying the probability of each event by its value and summing across all possible events:
$E(X) = \sum_i x_i \, p(x_i)$
Here $E(X)$ is the expected value for variable X, $x_i$ is the value of event "i", and $p(x_i)$ is the probability of event "i". The sum is taken over all possible events.
Calculation of Expected Value from Density Function
Events                 Probability density function   Value times probability
0 medication errors    0.90                           0*(0.90)=0
1 medication error     0.06
2 medication errors    0.04
Otherwise
In this example, the numbers of medication errors are sorted and listed on the left. The first column of values shows the probability density function. At each value, it provides the probability of the event listed. For example, 90% of patients have no medication errors, 6% of patients have 1 medication error and 4% have 2 medication errors. To calculate the expected value, first we multiply the value of each event by its probability. In the first row, 0 medication errors is multiplied by the 90% chance to obtain 0.
Calculation of Expected Value from Density Function
Events                 Probability density function   Value times probability
0 medication errors    0.90                           0*(0.90)=0
1 medication error     0.06                           1*(0.06)=0.06
2 medication errors    0.04                           2*(0.04)=0.08
Otherwise
In the second row, 6% is multiplied by 1 to obtain 0.06. In the third row, 4% is multiplied by 2 to obtain 0.08.
Calculation of Expected Value from Density Function
Events                 Probability density function   Value times probability
0 medication errors    0.90                           0
1 medication error     0.06                           0.06
2 medication errors    0.04                           0.08
Otherwise
Total                                                 0.14
The expected number of medication errors is the sum of the products of each event's value and its probability. In this example it is 0 + 0.06 + 0.08 = 0.14.
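The same calculation can be written as a short Python sketch. The function name expected_value is our own choice for illustration; the probabilities are the ones used on this slide.

```python
# Probability of each number of medication errors (values from the slide)
density = {0: 0.90, 1: 0.06, 2: 0.04}

def expected_value(density):
    """Multiply each event's value by its probability and sum across events."""
    return sum(value * prob for value, prob in density.items())

expected_errors = expected_value(density)
print(round(expected_errors, 2))  # 0*0.90 + 1*0.06 + 2*0.04 = 0.14
```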
Exercise Chart the density and distribution functions of the following data for patients with specific number of medication errors & calculate expected number of medication errors
Probability Density & Cumulative Distribution Functions This chart shows the probability density function and the cumulative distribution function for the data on medication errors. A cumulative distribution function shows the probability of events less than or equal to a particular value. It never decreases. The size of each step in the cumulative distribution function is the same as the probability of the event at that step. For example, the rise from 0 to 1 medication error is equal to the probability of 1 medication error. The shape of a distribution function tells a great deal about it. For discrete variables, the type of data we focus on in this course, there are several distinctive distribution functions, each with a signature shape.
Exercise If the chance of a medication error among our patients is 1 in 250, how many medication errors are expected over 7,500 patients? Show the density and cumulative probability functions.
Typical Probability Density Functions Bernoulli Binomial Geometric Poisson Knowing the probability density function is a very important step in deciding what to expect. Naturally, a great deal of thought has gone into recognizing different probability distributions. The most common probability density functions for discrete variables are the Bernoulli, Binomial, Geometric, and Poisson functions. We will describe each of these functions and explain the relationships among them. These functions are useful in describing a large number of events, for example the probability of wrong side surgery, the time to the next gunshot in a hospital, the probability of medication errors over a large number of visits, the arrival rate of security incidents and so on. If our focus remains on events that either occur or do not occur, then these four distributions are sufficient to describe many aspects of these events.
Bernoulli Probability Density Function Mutually exclusive Exhaustive Occurs with probability of p The Bernoulli density function is the most common density function for discrete events. It assumes that two outcomes are possible: either the event occurs or it does not. In other words, the outcomes of interest are mutually exclusive. It also assumes that the possible outcomes are exhaustive, meaning at least one of the outcomes must occur. In a Bernoulli density function, the event occurs with a constant probability of p. Typically, it is assumed that the probability is calculated for a specific number of trials or a specific period of time. For example, we can say that the probability of patient elopement in our facility is 0.05 per day.
Exercise If a nursing home takes care of 350 patients, how many patients will elope in a day if the daily probability of elopement is 0.05?
Independent Repeated Bernoulli Trials Independence means that the probability of occurrence does not change based on what has happened on previous days.
[Diagram: tree of outcomes (patient elopes / no event) on Day 1, Day 2 and Day 3]
Let's now think through a situation where a Bernoulli event is repeatedly tried. Let's assume that in these trials the probability of occurrence of the event is not affected by its past occurrence, in other words that each trial is independent of all others. As we will see shortly, when we are dealing with independent repeated trials of Bernoulli events, a number of common density functions describe various events. For example, the time to the first occurrence of the event has a Geometric distribution.
Let's start by looking at an example of repeated independent trials. In this situation we have 3 repeated trials for tracking patient elopement over time. We are assuming that the probability of elopement does not change if a patient has eloped on prior days. On day 1, the patient may elope or not. On day 2, the same event may repeat. The process continues until day 3. As you can see, the patient may elope on different days, and on each day the probability of elopement is constant and equal to its value on prior days.
Independence cannot be assumed for all repeated trials. For example, the probability of a contagious infection changes if there was an infection on the prior day, so independent trials cannot be assumed in that situation. But in many situations independence can be assumed, and when it can, there is a lot we can tell about the probability function.
Geometric Probability Density Function Number of trials till first occurrence of a repeating independent Bernoulli event
$p(k) = (1-p)^{k-1} \, p$
If a Bernoulli event is repeated until the event occurs, then the number of trials to the occurrence of the event has a Geometric distribution. The Geometric density function is given by multiplying the probability of 1 occurrence of the event, $p$, by the probability of the k-1 non-occurrences that must precede it, $(1-p)^{k-1}$.
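As a rough sketch, the Geometric density function can be computed directly from this description. The function name geometric_pmf is our own, and the daily probability of 0.05 is borrowed from the elopement example purely for illustration.

```python
def geometric_pmf(k, p):
    """Probability that the first occurrence of the event falls on trial k (k = 1, 2, ...)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # k-1 non-occurrences followed by one occurrence
    return (1 - p) ** (k - 1) * p

p = 0.05  # illustrative daily probability of elopement
for k in (1, 2, 3):
    print(k, round(geometric_pmf(k, p), 4))
```

The probabilities shrink by a factor of (1-p) with each additional trial, which gives the Geometric distribution its steadily declining shape.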
Geometric Probability Density Function Expected number of trials to the occurrence of the event = 1/p An interesting property of the Geometric probability density function is that the expected number of trials to the occurrence of the event (counting the trial on which it occurs) is given by dividing 1 by the probability of the occurrence of the event in each trial. As we will see later, this fact is used to estimate the probability of rare events that might occur only once in a decade. If an event has occurred once in the last decade, then the daily probability of the event is estimated as 1/3650, or roughly 3 in 10,000, a very small probability indeed.
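The once-in-a-decade example can be checked with a short sketch. The helper names are hypothetical, and a decade is taken as 3,650 days, an assumption that ignores leap days. The second function approximates the expected number of trials by a truncated sum, as a numerical check of the 1/p property rather than a proof.

```python
def daily_probability(expected_days_to_event):
    """Invert the 1/p relationship: p = 1 / expected number of trials."""
    return 1 / expected_days_to_event

def geometric_mean_trials(p, max_k=10_000):
    """Approximate the expected trial of first occurrence by a truncated sum."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, max_k + 1))

days_in_decade = 10 * 365  # assumption: leap days ignored
p_rare = daily_probability(days_in_decade)
print(round(p_rare, 5))  # 0.00027, roughly 3 in 10,000

# With p = 0.05 the expected number of trials should be close to 1/0.05 = 20
print(round(geometric_mean_trials(0.05), 4))
```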
Exercise No medication errors have occurred in the past 90 days. What is the daily probability of medication error in our facility? The time between patient falls was calculated to be 3 days, 60 days and 15 days. What is the daily probability of patient falls?
Binomial Probability Distribution Independent repeated Bernoulli trials Number of k occurrences of the event in n trials Assume that we repeat the Bernoulli trials and that each time we do so the probability of occurrence of the event stays the same and is independent of what has happened in the past. The Binomial density function gives the probability of having k occurrences of the event in n Bernoulli trials.
Repeated Independent Bernoulli Trials Probability of exactly two elopements in 3 days
On days 1 and 2, not 3:   p p (1-p)
On days 1 and 3, not 2:   p (1-p) p
On days 2 and 3, not 1:   (1-p) p p
For example, having 2 patients elope in 3 days is possible if patients elope on days 1 and 2 but not 3, or on days 1 and 3 but not 2. The third possibility is that patients elope on days 2 and 3 but not 1.
Binomial Probability Distribution
$p(k) = \frac{n!}{k! \, (n-k)!} \, p^k (1-p)^{n-k}$
In this formula, n! is referred to as n factorial. It is equal to 1*2*3*...*(n-1)*n. The ratio n!/(k!(n-k)!) counts the number of different ways k occurrences of the event might be arranged in n trials.
Binomial Probability Distribution The term p to the power of k measures the probability of k occurrences of the event. It multiplies the number of possible ways of getting k occurrences in n trials.
Binomial Probability Distribution The term (1-p) to the power of n-k measures the probability of n-k non-occurrences of the event. Together, the three terms are the number of possible ways of getting k occurrences in n trials, the probability of k occurrences of the event, and the probability of n-k non-occurrences of the event.
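The three terms can be put together in a short Python sketch. The function name binomial_pmf is our own; math.comb (available in Python 3.8 and later) computes n!/(k!(n-k)!). The check at the end compares the formula with the direct enumeration of the three ways of seeing exactly two elopements in 3 days, using the illustrative daily probability of 0.05.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent Bernoulli trials."""
    ways = comb(n, k)                # possible ways of getting k occurrences in n trials
    occurrences = p ** k             # probability of k occurrences
    non_occurrences = (1 - p) ** (n - k)  # probability of n-k non-occurrences
    return ways * occurrences * non_occurrences

p = 0.05  # illustrative daily probability of elopement
two_in_three = binomial_pmf(2, 3, p)

# Direct enumeration: days 1 and 2, days 1 and 3, days 2 and 3
by_enumeration = p * p * (1 - p) + p * (1 - p) * p + (1 - p) * p * p

print(round(two_in_three, 6), round(by_enumeration, 6))
```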
Binomial Density Function for 6 Trials, p=1/2 Let's look at some examples so you can have an intuitive understanding of the Binomial distribution. Here you see a Binomial distribution with 6 trials. There are seven possibilities: either the event never occurs or it occurs once, twice, three, four, five, or six times. The probability density function shows the likely occurrences. The most likely situation is for the event to occur three times in 6 trials. This is also the expected value and can be obtained by multiplying the number of trials by the probability of occurrence of the Bernoulli event. Note that the distribution is symmetric, though as we will see shortly, when p is less than 0.5 the probability piles up on the low counts and the distribution becomes skewed, with a long tail to the right. The expected value of a Binomial distribution is np. The variance is np(1-p).
Binomial Density Function for 6 Trials, p=0.05 Let's now go back to the patient elopement example, where the daily probability of elopement was 5%. Over a 6 day period, we are most likely to see no patients elope (about a 74% chance). There is a 23% chance of seeing one elopement and a 3% chance of seeing two elopements. Note how the probability is concentrated at the low counts, with a tail stretching to the right. This is always the case when p is less than 50%. The further p falls below 50%, the more skewed the distribution becomes.
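The two charts can be reproduced numerically with a sketch like the following. The helper names are our own; the two values of p are the ones used on these slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_table(n, p):
    """Probabilities of 0, 1, ..., n occurrences."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

n = 6
for p in (0.5, 0.05):
    table = binomial_table(n, p)
    mean = sum(k * prob for k, prob in enumerate(table))  # should equal n*p
    print(f"p={p}: mean={mean:.2f}", [round(prob, 3) for prob in table])
```

For p = 0.5 the probabilities are symmetric around 3 occurrences. For p = 0.05 most of the probability sits at 0 occurrences and the remainder trails off toward higher counts.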
Exercise If the daily probability of elopement is 0.05, how many patients will elope in a year? Now we can answer this exercise more fully. If the daily probability of elopement is 0.05, the expected number of elopements over a year is np = 365*0.05 = 18.25.
Exercise If the daily probability of death due to injury from a ventilation machine is 0.002, what is the probability of having 1 or more deaths in 30 days? What is the probability of 1 or more deaths in 4 months?
Number of trials = 30
Daily probability = 0.002
Number of deaths = 0
Probability of 0 deaths = (1 - 0.002)^30 = 0.942
Probability of 1 or more deaths = 1 - 0.942 = 0.058
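The 30 day answer can be verified with a short sketch, which also suggests how the 4 month question might be approached. The function name is hypothetical, and treating 4 months as 120 days is our assumption, not something the exercise states.

```python
def prob_at_least_one(p, n):
    """Probability of 1 or more occurrences in n independent trials: 1 - P(0 occurrences)."""
    return 1 - (1 - p) ** n

p_daily = 0.002

thirty_days = prob_at_least_one(p_daily, 30)
print(round(thirty_days, 3))  # 0.058

# Assumption: 4 months is approximated as 120 days
four_months = prob_at_least_one(p_daily, 120)
print(round(four_months, 3))
```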
Exercise Which is more likely, 2 patients failing to comply with medication orders in 15 days or 4 patients failing to comply with medication orders in 30 days?
Poisson Density Function Approximates Binomial distribution Large number of trials Small probabilities of occurrence As the number of trials increases and the probability of occurrence drops, the Poisson distribution approximates the Binomial distribution. In risk analysis, this occurs often. Typically we are looking at a large number of visits or days and the sentinel event has a very low probability of occurrence. In these circumstances, the number of occurrences of the event can be estimated by the Poisson distribution.
Poisson Density Function
$p(k) = \frac{\Lambda^k e^{-\Lambda}}{k!}$
In this formula, Λ is the expected number of occurrences of the event and is equal to n times p, where n is the number of trials and p is the probability of the occurrence of the event in one trial; k is the number of occurrences of the sentinel event; and e = 2.71828 is the base of natural logarithms.
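To see how closely the Poisson function tracks the Binomial, the two can be compared side by side. This is a sketch under our own choice of example: it reuses the ventilation machine numbers (30 trials, daily probability 0.002), and the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of k occurrences when lam = n*p occurrences are expected."""
    return lam ** k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 30, 0.002   # many trials, small probability of occurrence
lam = n * p        # expected number of occurrences

for k in range(3):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

With these values the two columns agree to about three decimal places, which is the sense in which the Poisson distribution approximates the Binomial.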
Exercise What is the probability of observing one or more security violations when the daily probability of violations is 5% and we are monitoring the organization for 4 months? What is the probability of observing exactly 3 violations in this period?
Take Home Lesson Repeated independent Bernoulli trials are the foundation of many distributions.
Exercise What is the daily probability of relapse into poor eating habits when the patient has not followed her diet on January 1st, May 30th and June 7th? What is the daily probability of security violations when there has not been a security violation for 6 months?
Exercise How many visits will it take to have at least one medication error if the estimated probability of medication error in a visit is 0.03? If viruses infect computers at a rate of 1 every 10 days, what is the probability of having 2 computers infected in 10 days?
Exercise Assess the probability of a sentinel event by interviewing a peer student. Assess the time to sentinel event by interviewing the same person. Are the two responses consistent?