Data Collection Principles

By Farrokh Alemi, Ph.D. This lecture is organized by Dr. Alemi. The lecture is based on the OpenIntro Statistics book.

Sampling Examine a subset of cases
Sampling is the process of examining a subset of cases to infer the rate of some characteristic in the entire population.

Randomization Examine a subset of cases
Sampling randomly helps reduce bias. If someone were permitted to pick and choose exactly which cases were included in the sample, the sample could easily be skewed toward that person's interests. This introduces bias into the sample. Randomization guards against this kind of bias.
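As a rough illustration of sampling at random, the sketch below draws a simple random sample and compares the sample rate with the population rate. The population of 1,000 patients, the 70% satisfaction rate, and the function name simple_random_sample are assumptions made for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: 1 = satisfied patient, 0 = dissatisfied patient.
population = [1] * 700 + [0] * 300

sample = simple_random_sample(population, 100, seed=1)
sample_rate = sum(sample) / len(sample)
population_rate = sum(population) / len(population)

print(f"population rate: {population_rate:.2f}")
print(f"sample rate:     {sample_rate:.2f}")
```

Because no one chooses which patients enter the sample, the sample rate differs from the population rate only by chance, not by anyone's preferences.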

Non-response Bias Angry patients do not take satisfaction surveys
Taking a simple random sample helps minimize bias, but bias can crop up in other ways. Even when people are picked at random, e.g. for surveys, caution must be exercised if the non-response rate is high. For instance, if only 30% of the people randomly sampled for a survey actually respond, then it is unclear whether the results are representative of the entire population. This non-response bias can skew results. For example, angry patients do not take satisfaction surveys, causing non-response bias.
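The effect of non-response can be sketched with a little arithmetic. The numbers below are hypothetical: 60% of patients are truly satisfied, satisfied patients answer the survey 40% of the time, and dissatisfied patients answer only 15% of the time. The function names are also illustrative.

```python
def response_rate(p_satisfied, respond_if_satisfied, respond_if_dissatisfied):
    """Expected share of sampled patients who answer the survey."""
    return (p_satisfied * respond_if_satisfied
            + (1 - p_satisfied) * respond_if_dissatisfied)

def observed_satisfaction(p_satisfied, respond_if_satisfied, respond_if_dissatisfied):
    """Expected share of satisfied patients among those who respond."""
    satisfied_responders = p_satisfied * respond_if_satisfied
    total_responders = response_rate(
        p_satisfied, respond_if_satisfied, respond_if_dissatisfied
    )
    return satisfied_responders / total_responders

# Hypothetical inputs, chosen so that the overall response rate is 30%.
true_rate = 0.60
print(f"response rate:         {response_rate(true_rate, 0.40, 0.15):.2f}")
print(f"observed satisfaction: {observed_satisfaction(true_rate, 0.40, 0.15):.2f}")
print(f"true satisfaction:     {true_rate:.2f}")
```

Under these assumed numbers the survey reports a satisfaction rate well above the true rate, even though the people invited to respond were picked at random.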

Convenience Bias
Another common bias arises from a convenience sample, where individuals who are easily accessible are more likely to be included in the sample. These individuals do not represent the entire picture. For example, people who hang up on automated telephone systems are not included in the count of satisfied customers.

Confounding Variable (diagram labels: Real Cause, Presumed Cause, Effect)
Confounding variables affect both the explanatory variable and the response. If confounding variables are not measured, one might erroneously conclude that the explanatory variable is the cause of the response.

Confounding Variable (diagram labels: Fire, Firemen on scene, Burning house)
For example, one may erroneously conclude that firemen cause fires. Firemen are obviously associated with burning houses: they are always there when a house is burning. It is a mistake to think they are the cause of the fire. They and the burning house co-occur because of a third, confounding variable: the fire.

Blaming Heroes (diagram labels: Severity of Illness, Medication Treatment, Patient Outcomes)
Consider what will occur if confounding variables are not taken into account. A treatment that is working could be judged to hurt patients. Typically, clinicians complain that the analysis of their patients did not adequately account for severity of illness. They claim that the patients under their care would have died anyway, were it not for the few they saved. Inadequate adjustment for the severity of patients' illness leads to the mistake of attributing their deaths to the heroic efforts to save them. This is akin to blaming the fire on the firemen.
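A small numerical sketch shows how ignoring severity can make a helpful treatment look harmful. All counts below are invented for illustration; they assume that treated patients are mostly severe cases and untreated patients are mostly mild cases.

```python
# Hypothetical counts: (patients, deaths) for each treatment and severity group.
data = {
    ("treated", "severe"): (80, 32),
    ("treated", "mild"): (20, 1),
    ("untreated", "severe"): (20, 12),
    ("untreated", "mild"): (80, 8),
}

def death_rate(treatment, severity=None):
    """Death rate for a treatment group, optionally within one severity level."""
    rows = [
        counts
        for (t, s), counts in data.items()
        if t == treatment and severity in (None, s)
    ]
    patients = sum(n for n, _ in rows)
    deaths = sum(d for _, d in rows)
    return deaths / patients

# Crude comparison ignores severity of illness.
print("crude:", death_rate("treated"), "vs", death_rate("untreated"))

# Adjusted comparison looks within each severity level.
for level in ("severe", "mild"):
    print(level + ":", death_rate("treated", level), "vs", death_rate("untreated", level))
```

In this made-up data the crude death rate is higher for treated patients, yet within both the severe and the mild group treated patients die less often. The crude comparison blames the treatment for deaths that are due to severity.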

Think it Through
Suppose 10% of patients report their attitudes on the web and 40% of the patients who write a comment on the web are negative. What percent of our patients are dissatisfied?

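Think it Through
All patients
    Report on web (10%)
        Satisfied (60%): 6% of all patients
        Not satisfied (40%): 4% of all patients
    Do not report (90%)
        Satisfied (60%): 54% of all patients
        Not satisfied (40%): 36% of all patients
Under the assumption that reporting on the web is independent of satisfaction level, we expect 4% of all patients to report on the web and complain (10% × 40%) and 36% to not report on the web and be dissatisfied (90% × 40%). This sets the rate of dissatisfied patients at 40%, the same as the rate among those who comment on the web. The web rate matches the population rate only because of the independence assumption; if dissatisfied patients are more or less likely to post than satisfied ones, the two rates differ.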
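The branch arithmetic can be checked with a short sketch. It multiplies the share of patients in each reporting branch by the dissatisfaction rate within that branch and adds the two products. The function name dissatisfied_rate is illustrative, and the 20% figure in the second call is a hypothetical value used only to show what happens when independence fails.

```python
def dissatisfied_rate(p_report, p_dissatisfied_if_report, p_dissatisfied_if_no_report):
    """Overall share of dissatisfied patients, summed over both reporting branches."""
    report_and_dissatisfied = p_report * p_dissatisfied_if_report
    no_report_and_dissatisfied = (1 - p_report) * p_dissatisfied_if_no_report
    return report_and_dissatisfied + no_report_and_dissatisfied

# Independence: the dissatisfaction rate is the same whether or not patients report.
independent = dissatisfied_rate(0.10, 0.40, 0.40)
print(f"under independence: {independent:.2f}")

# Hypothetical departure from independence: silent patients are less often dissatisfied.
dependent = dissatisfied_rate(0.10, 0.40, 0.20)
print(f"if only 20% of non-reporters are dissatisfied: {dependent:.2f}")
```

Under independence the overall rate equals the rate seen on the web. Once the rate among non-reporters differs, the web comments no longer describe the whole patient population.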

Think it Through
In statistical inference, we want to make sure that the sample represents the population. Otherwise, prototypical patients who are not in the sample are ignored, leading to erroneous conclusions about the population of patients.

Oops
For most managers, randomization is not possible. They therefore have to come up with an alternative method to obtain equivalent groups of patients.

Observational Studies
Since randomization is not always possible, an alternative is to make sense of data as they occur. Studies of this type are called observational studies. Generally, data in observational studies are collected only by monitoring what occurs, while experiments require random assignment. In observational studies, patients and providers do what is best for them, and data on their outcomes are used to find out what worked. Each patient encounter is recorded in the electronic health record, and the outcomes of these encounters provide a signal of whether the interventions are working.

Retrospective Studies
(Diagram labels: Investigator, Cases Exposed, Controls not Exposed, Outcomes, Past, Present, Future, Time)
Observational studies come in two forms: prospective and retrospective studies. A prospective study identifies individuals and collects information as events unfold. For instance, medical researchers may identify and follow a group of similar individuals over many years to assess the possible influences of behavior on cancer. Retrospective studies collect data after events have taken place. For example, managers may review past events in medical records to see which treatments are working. The availability of medical records has made it easier to conduct retrospective studies.

Implied Equivalence (diagram labels: Investigator, Cases Exposed, Controls not Exposed, Matched Otherwise, Outcomes, Past, Present, Future, Time)
Almost all statistical analyses are based on the notion of implied randomness, or equivalence of the treatment and control groups. If observational data are not collected in a random framework from a population, these statistical methods are not reliable unless an effort is made to match cases to controls. For each case in the treatment group, one randomly selects a control patient who is similar in all relevant variables except exposure to treatment.
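One way to picture the matching step is the sketch below. It assumes each patient is a record with a few matching variables (here age group and sex, both hypothetical), picks one unused control at random for each case, and leaves a case unmatched when no comparable control remains. The function name match_controls and the sample records are illustrative, not part of the lecture.

```python
import random

def match_controls(cases, controls, keys, seed=None):
    """For each case, pick one unused control at random that agrees on every key."""
    rng = random.Random(seed)
    available = list(controls)
    pairs = []
    for case in cases:
        candidates = [c for c in available if all(c[k] == case[k] for k in keys)]
        if not candidates:
            continue  # no comparable control is left; this case stays unmatched
        chosen = rng.choice(candidates)
        available.remove(chosen)  # each control is used at most once
        pairs.append((case, chosen))
    return pairs

# Hypothetical records: cases were exposed to treatment, controls were not.
cases = [
    {"id": 1, "age": "Young", "sex": "F"},
    {"id": 2, "age": "Old", "sex": "M"},
]
controls = [
    {"id": 10, "age": "Young", "sex": "F"},
    {"id": 11, "age": "Young", "sex": "F"},
    {"id": 12, "age": "Old", "sex": "M"},
    {"id": 13, "age": "Old", "sex": "F"},
]

pairs = match_controls(cases, controls, keys=("age", "sex"), seed=0)
for case, control in pairs:
    print(f"case {case['id']} matched to control {control['id']}")
```

Exact matching on every variable is the simplest choice for a sketch; with many variables, comparable controls become scarce and some cases will go unmatched.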

Vioxx Study
The Vioxx study is a good example of how retrospective matched case-control studies work. This medication was withdrawn from the market when it was shown that cases treated with it were twice as likely to have a cardiac event as matched controls not treated with it. At the time of the study, Vioxx had sales exceeding a billion dollars.

Take Home Lesson
In observational studies, controls are matched to cases in all relevant variables except for exposure to treatment.

Do One
Identify the cases. For each case, select the two controls that match on age and have the lowest random numbers. Calculate the proportion of successes for the cases and for the matched controls.
Patient ID   Treated   Age     Outcome    Random
1            Yes       Young   Positive   0.24
2            No                           0.85
3                      Old     Negative   0.64
4                                         0.70
5                                         0.87
6                                         0.72
7                                         0.86
8                                         0.16
9                                         0.17
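The procedure in this exercise can be sketched in code. The patient records below are entirely hypothetical and are not the values from the exercise table; they only show the mechanics of picking, for each case, the two same-age controls with the lowest random numbers and then comparing success proportions. The helper names pick_controls and success_rate are illustrative.

```python
# Hypothetical records, not the exercise data.
patients = [
    {"id": 1, "treated": True, "age": "Young", "outcome": "Positive", "random": 0.31},
    {"id": 2, "treated": False, "age": "Young", "outcome": "Positive", "random": 0.62},
    {"id": 3, "treated": False, "age": "Young", "outcome": "Negative", "random": 0.15},
    {"id": 4, "treated": False, "age": "Young", "outcome": "Positive", "random": 0.48},
    {"id": 5, "treated": True, "age": "Old", "outcome": "Negative", "random": 0.77},
    {"id": 6, "treated": False, "age": "Old", "outcome": "Negative", "random": 0.09},
    {"id": 7, "treated": False, "age": "Old", "outcome": "Positive", "random": 0.93},
    {"id": 8, "treated": False, "age": "Old", "outcome": "Negative", "random": 0.40},
]

def pick_controls(case, pool, n=2):
    """Return the n same-age controls with the lowest random numbers."""
    same_age = [p for p in pool if p["age"] == case["age"]]
    return sorted(same_age, key=lambda p: p["random"])[:n]

def success_rate(group):
    """Proportion of patients in the group with a positive outcome."""
    return sum(p["outcome"] == "Positive" for p in group) / len(group)

cases = [p for p in patients if p["treated"]]
pool = [p for p in patients if not p["treated"]]

matched = []
for case in cases:
    chosen = pick_controls(case, pool)
    matched.extend(chosen)
    pool = [p for p in pool if p not in chosen]  # do not reuse a control

print("matched control IDs:", [p["id"] for p in matched])
print("success rate, cases:   ", success_rate(cases))
print("success rate, controls:", success_rate(matched))
```

Sorting by a pre-assigned random number is one simple way to make the choice of controls random yet reproducible, which is what the exercise asks for.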
