Data Collection Principles

Presentation transcript:

Data Collection Principles By Farrokh Alemi, Ph.D. This lecture is organized by Dr. Alemi. The lecture is based on the OpenIntro Statistics book.

Sampling: Examine a subset of cases. Sampling refers to the process of examining a subset of cases to infer the rate in the entire population.

Randomization: Examine a subset of cases. Sampling randomly helps reduce bias. If someone were permitted to pick and choose exactly which cases were included in the sample, it is entirely possible that the sample would be skewed toward that person's interests. This introduces bias into the sample. Randomization avoids this bias.
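The contrast between a random sample and a hand-picked one can be sketched in Python. Everything in this sketch is illustrative: the population of 1,000 patients, the 30% dissatisfaction rate, the sample size of 100, and the helper name `sample_rate` are assumptions, not figures from the lecture.

```python
import random

def sample_rate(cases):
    """Share of cases flagged as dissatisfied (True)."""
    return sum(cases) / len(cases)

# Hypothetical population: 1,000 patients, 300 of them dissatisfied.
population = [True] * 300 + [False] * 700

rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 100)

# A hand-picked sample: someone chooses only satisfied patients.
hand_picked = [p for p in population if not p][:100]

true_rate = sample_rate(population)
random_estimate = sample_rate(random_sample)
hand_picked_estimate = sample_rate(hand_picked)

print(f"true rate:            {true_rate:.2f}")
print(f"random sample:        {random_estimate:.2f}")
print(f"hand-picked sample:   {hand_picked_estimate:.2f}")
```

The random sample should land near the population rate, while the hand-picked sample reports whatever the person choosing wanted it to report.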

Non-response Bias: Angry patients do not take satisfaction surveys. Taking a simple random sample helps minimize bias, but bias can crop up in other ways. Even when people are picked at random, e.g. for surveys, caution must be exercised if non-response is high. For instance, if only 30% of the people randomly sampled for a survey actually respond, then it is unclear whether the results are representative of the entire population. This non-response bias can skew results. For example, angry patients do not take satisfaction surveys, causing a non-response bias.
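A small arithmetic sketch in Python shows how a 30% response rate can distort a survey. The group sizes and the two response rates are hypothetical, chosen only so that the overall response rate comes out at 30%.

```python
# Hypothetical random sample of 1,000 patients.
sampled = {"dissatisfied": 400, "satisfied": 600}

# Assumed response rates: dissatisfied ("angry") patients respond less often.
response_rate = {"dissatisfied": 0.15, "satisfied": 0.40}

respondents = {group: round(n * response_rate[group]) for group, n in sampled.items()}

total_sampled = sum(sampled.values())
total_respondents = sum(respondents.values())

overall_response = total_respondents / total_sampled
true_dissatisfied = sampled["dissatisfied"] / total_sampled
observed_dissatisfied = respondents["dissatisfied"] / total_respondents

print(f"overall response rate:        {overall_response:.0%}")
print(f"true dissatisfaction rate:    {true_dissatisfied:.0%}")
print(f"rate among respondents only:  {observed_dissatisfied:.0%}")
```

With these assumed numbers the survey would report half the true dissatisfaction rate, even though the 1,000 patients were selected at random.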

Convenience Bias: Another common source of bias is a convenience sample, where individuals who are easily accessible are more likely to be included in the sample. These individuals do not represent the entire picture. For example, people who hang up on automated telephone systems are not included in the count of satisfied customers.

Confounding Variable (diagram: real cause, presumed cause, effect). Confounding variables affect both the explanatory variable and the response. If confounding variables are not measured, one might erroneously conclude that the explanatory variable is the cause of the response.

Confounding Variable (diagram: fire, firemen on scene, burning house). For example, one may erroneously conclude that firemen cause fires. Obviously the firemen are associated with burning houses. They are always there when a house is burning. It is a mistake to think they are the cause of the fire. They and the burning house co-occur because of a third, confounding variable: the fire.

Blaming Heroes (diagram: severity of illness, medication treatment, patient outcomes). Consider what will occur if confounding variables are not taken into account. A treatment that is working could be judged to hurt patients. Typically, clinicians complain that the analysis of their patients did not adequately account for severity of illness. They argue that their patients were so ill that most would have died regardless of treatment, apart from the few they saved. Inadequate adjustment for the severity of patients' illness leads to the mistake of attributing their deaths to the heroic efforts to save them. This is akin to blaming the fire on the firemen.
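The effect of ignoring severity can be sketched in Python with a crude comparison and a comparison within severity groups. The counts below are hypothetical and were chosen so that treated patients are mostly severe cases; they are not data from the lecture.

```python
# Hypothetical counts: (deaths, patients) by severity and treatment arm.
counts = {
    "severe": {"treated": (40, 80), "untreated": (14, 20)},
    "mild":   {"treated": (1, 20),  "untreated": (8, 80)},
}

def death_rate(deaths, patients):
    return deaths / patients

# Crude comparison: pool all patients and ignore severity.
crude = {}
for arm in ("treated", "untreated"):
    deaths = sum(counts[severity][arm][0] for severity in counts)
    patients = sum(counts[severity][arm][1] for severity in counts)
    crude[arm] = death_rate(deaths, patients)

# Stratified comparison: compare arms within the same severity level.
stratified = {
    severity: {arm: death_rate(*counts[severity][arm]) for arm in arms}
    for severity, arms in counts.items()
}

print("crude:", crude)
for severity, rates in stratified.items():
    print(severity, rates)
```

In this made-up example the pooled death rate is higher for treated patients, yet within each severity level treated patients do better. The crude comparison blames the treatment for deaths that are driven by severity.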

Think it Through: Suppose 10% of patients report their attitudes on the web and 40% of the patients who write a comment on the web are negative. What percent of our patients are dissatisfied?

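Think it Through:
All patients
  Report on web (10%): satisfied 60% (6% of all patients); not satisfied 40% (4% of all patients)
  Do not report (90%): satisfied 60% (54% of all patients); not satisfied 40% (36% of all patients)
Under the assumption that reporting on the web is independent of satisfaction level, we expect 4% of patients to report on the web and complain (10% × 40%) and 36% to not report on the web and be dissatisfied (90% × 40%). This sets the rate of dissatisfied patients at 40%, the same as the rate among patients who comment on the web. The answer rests entirely on the independence assumption: if dissatisfied patients are more or less likely to post than satisfied ones, the web rate no longer matches the population rate.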
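The branch products in the tree can be checked with a few lines of Python. This is a sketch under the stated independence assumption; the variable names are illustrative.

```python
p_report = 0.10    # share of patients who report on the web
p_negative = 0.40  # share of web comments that are negative

# Under independence, the same 40% applies to patients who do not report.
joint = {
    ("report", "dissatisfied"): p_report * p_negative,
    ("report", "satisfied"): p_report * (1 - p_negative),
    ("no report", "dissatisfied"): (1 - p_report) * p_negative,
    ("no report", "satisfied"): (1 - p_report) * (1 - p_negative),
}

# Overall dissatisfaction is the sum over both reporting branches.
p_dissatisfied = sum(
    share for (_, attitude), share in joint.items() if attitude == "dissatisfied"
)

for branch, share in joint.items():
    print(branch, f"{share:.0%}")
print("dissatisfied overall:", f"{p_dissatisfied:.0%}")
```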

Think it Through: In statistical inference, we want to make sure that the sample represents the population. Otherwise, prototypical patients who are not in the sample are ignored, leading to erroneous conclusions about the population of patients.

Oops: For most managers, randomization is not possible. Therefore, they have to come up with an alternative method to obtain equivalent groups of patients.

Observational Studies: Since randomization is not always possible, an alternative is to make sense of data as they occur. Studies of this type are called observational studies. Generally, data in observational studies are collected only by monitoring what occurs, while experiments require random assignment. In observational studies, patients and providers do what is best for them, and data on their outcomes are used to find out what worked. Each patient encounter is recorded in the electronic health record, and the outcomes of these encounters provide a telltale sign of whether the interventions are working.

Retrospective Studies (diagram: an investigator in the present looks back along a timeline at exposed cases, unexposed controls, and their outcomes). Observational studies come in two forms: prospective and retrospective studies. A prospective study identifies individuals and collects information as events unfold. For instance, medical researchers may identify and follow a group of similar individuals over many years to assess the possible influences of behavior on cancer. Retrospective studies collect data after events have taken place. For example, managers may review past events in medical records to see which treatments are working. The availability of medical records has made it easier to conduct retrospective studies.

Implied Equivalence (diagram: an investigator in the present compares outcomes of exposed cases with outcomes of unexposed controls that are otherwise matched). Almost all statistical analyses are based on the notion of implied randomness or equivalence of treatment and control groups. If observational data are not collected in a random framework from a population, these statistical methods are not reliable unless an effort is made to match cases to controls. For each case in the treatment group, one randomly selects a control patient who is similar in all relevant variables except exposure to treatment.

Vioxx Study: The Vioxx study is a good example of how retrospective matched case-control studies work. This medication was withdrawn from the market when it was shown that cases treated with it were twice as likely to have a cardiac event as matched controls not treated with it. At the time of the study, Vioxx had sales exceeding a billion dollars.

Take Home Lesson: In observational studies, controls are matched to cases in all relevant variables except for exposure to treatment.

Do One: Identify cases. Select two controls that match on age and have the lowest random numbers. Calculate the proportion of successes for the case and for the matched controls.

Patient ID  Treated  Age    Outcome   Random
1           Yes      Young  Positive  0.24
2           No                        0.85
3                    Old    Negative  0.64
4                                     0.70
5                                     0.87
6                                     0.72
7                                     0.86
8                                     0.16
9                                     0.17
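The matching steps in this exercise can be sketched in Python. The patient list below is entirely hypothetical: it is not the exercise table, most of whose cells are missing from this transcript. The function names `match_controls` and `success_rate` are also illustrative. Each treated patient is a case; for each case the sketch picks the two untreated patients of the same age group with the lowest random numbers.

```python
# Hypothetical patients, invented for illustration only.
patients = [
    {"id": 101, "treated": True,  "age": "Young", "outcome": "Positive", "rand": 0.31},
    {"id": 102, "treated": False, "age": "Young", "outcome": "Negative", "rand": 0.58},
    {"id": 103, "treated": False, "age": "Young", "outcome": "Positive", "rand": 0.12},
    {"id": 104, "treated": False, "age": "Young", "outcome": "Positive", "rand": 0.77},
    {"id": 105, "treated": True,  "age": "Old",   "outcome": "Negative", "rand": 0.44},
    {"id": 106, "treated": False, "age": "Old",   "outcome": "Negative", "rand": 0.09},
    {"id": 107, "treated": False, "age": "Old",   "outcome": "Negative", "rand": 0.63},
    {"id": 108, "treated": False, "age": "Old",   "outcome": "Negative", "rand": 0.95},
]

def match_controls(patients, n_controls=2):
    """For each case, pick the n same-age controls with the lowest random numbers."""
    used = set()  # a control is matched to at most one case
    matches = {}
    for case in (p for p in patients if p["treated"]):
        pool = sorted(
            (p for p in patients
             if not p["treated"] and p["age"] == case["age"] and p["id"] not in used),
            key=lambda p: p["rand"],
        )
        chosen = pool[:n_controls]
        used.update(p["id"] for p in chosen)
        matches[case["id"]] = [p["id"] for p in chosen]
    return matches

def success_rate(group):
    """Proportion of patients in the group with a positive outcome."""
    return sum(p["outcome"] == "Positive" for p in group) / len(group)

matches = match_controls(patients)
by_id = {p["id"]: p for p in patients}
cases = [by_id[case_id] for case_id in matches]
controls = [by_id[c] for ids in matches.values() for c in ids]

case_rate = success_rate(cases)
control_rate = success_rate(controls)
print(matches, case_rate, control_rate)
```

Sorting on a pre-assigned random number is one simple way to make the choice among eligible controls random while keeping the selection reproducible.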