Bias in Studies of the Human Genome Thomas A. Pearson, MD, PhD University of Rochester School of Medicine Visiting Scientist, NHGRI.

Lecture 6: Bias in Studies of the Human Genome
1. Consider the causes of heterogeneity of results in gene association studies.
2. Review the types and sources of bias relevant to human genomic research.
3. Provide examples from genome-wide association studies to illustrate biases or potential for bias.
4. Identify strategies in study design, data collection, statistical analysis, and interpretation which could prevent or minimize bias in human genome research.

Larson, G. The Complete Far Side

PLoS Med Aug;2(8):e124.

WSJ, 2004 Sep 14.

Only 6/600 Gene-Disease Associations Significant in >75% of Studies (Hirschhorn J et al, Genet Med 2002; 4:45-61)
Disease/Trait | Gene | Polymorphism | Frequency
DVT | F5 | Arg506Gln | 0.015
Graves’ Disease | CTLA4 | Thr17Ala | 0.62
Type 1 DM | INS | 5’ VNTR | 0.67
HIV/AIDS | CCR5 | 32 bp Ins/Del |
Alzheimer’s | APOE | Epsilon 2/3/ |
Creutzfeldt-Jakob Disease | PRNP | Met129Val | 0.37

Possible Explanations of Heterogeneity of Results in Genetic Association Studies
Biologic mechanisms
– Genetic heterogeneity
– Gene-gene interactions
– Gene-environment interactions
Spurious mechanisms
– Inadequacies of genomic markers
– Type 1 error
– Limited sample sizes and power
– Cohort, age, period (secular) effects
– Bias

Definition of Bias in Human Research
Sackett (1975): “Any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth.”
Gordis (2004): “Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on risk or disease.”

Effects of Bias on GWAS Results
False negatives
False positives
Inaccurate effect sizes
– Underestimates
– Overestimates

Larson, G. The Complete Far Side

Types of Bias in Genome Association Studies
Selection of cases and controls
Information on genotype or phenotype
Analysis and presentation of results
Interpretation of results

20 Types of Biases Potentially Encountered in GWAS
Common to all human observational studies (N=12)
Unique or common in GWAS (N=8)
– Supercase or supercontrol biases
– Latent case bias
– Population stratification
– Hardy-Weinberg disequilibrium
– Genotyping quality bias
– Transmission disequilibrium bias
– Winner’s Curse

Systematic Review of GWAS: NHGRI Catalog of GWAS in Print*
109 studies from 3/05 to 3/08
Genotyping platforms of density >100,000 SNPs
Each study reviewed for:
– Study design
– Description of case and comparison groups
– Collection of genotype and other risk factors
– Presentation of study results
– Interpretation of study results
*

Characteristics of 109 GWAS
Phenotypes
– Discrete outcomes or traits: 91 in 83 studies
– Quantitative traits: 40 in 26 studies
Design of discovery study
– Case-control
– Trio
– Nested case-control
– Cross-sectional/Cohort

Four Key Requirements for a Bias-Free Case-Control Study
Selection Bias
– Cases are representative of all those who develop the disease being studied.
– Controls are representative of all those at risk of developing the disease and eligible to become cases and be included in the study.
– Ancestral geographical origins and predominant environmental exposures of cases do not differ dramatically from controls.
Information Bias
– Collection of risk factor and exposure information is the same for cases and controls.

Selection Biases in GWAS: Criteria for Classification
Misclassification bias: No description of, or failure to use, adequate means to define cases and/or controls.
Nonresponse bias: No description of recruitment and participation rates in cases and/or controls.
Prevalence-incidence bias: Use of prevalent cases of a disease that has sizable short-term case-fatality or remission rates.

Larson, G. The Complete Far Side

Characteristics of 109 GWAS: Selection of Study Subjects
Methods of selection/recruitment frequently (30%) described in a supplement or other publication.
Few baseline descriptors of cases/controls
– Tables comparing cases vs. controls: 36.0%
– Statistical comparison of cases/controls: 3.5%
– Participation rates (cases or controls): 9.0%
– Comparison of participants/nonparticipants: 2.0%
Most cases (67%) were prevalent cases derived from clinical sources, rather than population-based or incident cases.

GWAS of Type II Diabetes in Mexican-Americans*
Case-control study design
– 281 cases with diabetes defined by current Dx/Rx, fasting blood glucose, or 2-hour GTT
– 280 persons from a random population sample whose T2DM status is unknown
112,541 SNPs assayed in each person
4 genes identified
Possible misclassification: a substantial prevalence (7-14%) of T2DM is likely among the controls.
*Hayes MG et al. Diabetes, 9/10/07.
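The misclassification concern can be made concrete with a little arithmetic. The sketch below assumes, hypothetically, that undiagnosed diabetics among the controls carry the risk allele at the same frequency as the diagnosed cases; the allele frequencies and function names are invented for illustration and are not taken from Hayes et al. Under that assumption the control allele frequency drifts toward the case frequency, and the observed odds ratio shrinks toward 1.

```python
def allele_odds_ratio(p_case: float, p_ctrl: float) -> float:
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (p_case / (1.0 - p_case)) / (p_ctrl / (1.0 - p_ctrl))


def contaminated_control_freq(p_case: float, p_ctrl: float, m: float) -> float:
    """Risk-allele frequency in a control group in which a fraction m are
    undiagnosed cases. Assumption: those hidden cases share the case frequency."""
    return (1.0 - m) * p_ctrl + m * p_case


if __name__ == "__main__":
    # Hypothetical frequencies for illustration only (not from Hayes et al.)
    p_case, p_ctrl = 0.30, 0.25
    for m in (0.0, 0.07, 0.14):  # 7-14% is the range quoted on the slide
        observed = allele_odds_ratio(p_case, contaminated_control_freq(p_case, p_ctrl, m))
        print(f"undiagnosed cases among controls: {m:.0%}  observed OR: {observed:.3f}")
```

With these made-up frequencies the observed allelic OR falls from about 1.29 with clean controls to about 1.24 when 14% of controls are undiagnosed cases. Under the stated assumption the bias is always toward the null, because the contaminated control frequency lies between the true case and control frequencies.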

Selection Biases in GWAS: Criteria for Classification
Supercase bias: Use of additional criteria in case selection that increases the chance of a genetic etiology.
Supercontrol bias: Use of additional criteria in control selection that decreases the chance of a genetic etiology.
Latent case bias: Inclusion as controls of persons who could never develop the disease even if they were gene carriers.

A Case-Control GWAS of Prostate Cancer*
Discovery Study
– 1854 cases with symptomatic prostate cancer and diagnosis at age <60 years or a positive family history
– 1894 controls with age >50 years and PSA <0.5 ng/ml
– Genotyping of 541,129 SNPs
– 11 new SNPs associated (P < 10^-6)
Replication Study
– 3268 cases/3354 controls
– Genotyping of 11 SNPs
– 7 SNPs independently associated (P < 10^-7)
*Eeles RA et al. Nat Gen 2/10/08

Prostate Cancer: 7 Novel SNPs in Discovery and Replication Studies
[Table: OR and 95% CI in the discovery and replication studies for each of the 7 SNPs]
Eeles RA et al: Nat Gen 2/10/08

Latent Cases in a GWAS of Prostate Cancer*
[Table: numbers of cases and of male and female controls in the Icelandic discovery study and in replications from the Netherlands, Spain, Sweden, US-Baltimore, US-Chicago, US-Nashville, and US-Rochester]
*Gudmundsson J et al. Nat Gen 2008; 40:281-3

Selection Biases in GWAS: Criteria for Classification
Membership bias: Membership in a group may imply a degree of health which differs systematically from that of the general population.
Population Stratification: Genetic differences between cases and controls unrelated to disease but due to sampling from populations of different ancestries.
Phenotypic variation bias: The use of different definitions of cases or controls between discovery study and subsequent replications.

Wellcome Trust Case Control Consortium (WTCCC)*
Genotyping: 500,000 SNPs (Affymetrix)
Cases: 2000 persons from each of 7 diseases (bipolar disorder, coronary artery disease, Crohn disease, rheumatoid arthritis, T1DM, T2DM, hypertension)
Controls: 3000 persons without disease
– 1500 in the 1958 British Birth Cohort
– 1500 UK blood donors
*WTCCC, Nature 2007; 447:

Population Stratification*
Each population has a unique genetic and social history; ancestral patterns of migration, mating, expansions/bottlenecks, and stochastic variation all yield differences in allele frequencies between populations.
Population stratification: cases and controls have different allele frequencies because of diversity in their populations of origin, unrelated to outcome. This requires that the source populations differ in both:
1) disease prevalence
2) allele frequencies
*Cardon LR, Palmer LJ, Lancet 2003
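The two requirements can be checked with a small calculation. The sketch assumes two hypothetical subpopulations in which carrier status and disease are independent, pools them with equal weights, and computes the crude odds ratio of the pooled sample. All frequencies, weights, and the function name are invented for illustration.

```python
def crude_carrier_or(weights, carrier_freqs, prevalences):
    """Crude carrier-vs-disease odds ratio in a sample pooled from subpopulations.

    Within each subpopulation carrier status and disease are independent,
    so any departure from OR = 1 is produced by the pooling alone.
    """
    # a: carrier & diseased, b: carrier & healthy,
    # c: non-carrier & diseased, d: non-carrier & healthy (expected proportions)
    a = b = c = d = 0.0
    for w, f, p in zip(weights, carrier_freqs, prevalences):
        a += w * f * p
        b += w * f * (1.0 - p)
        c += w * (1.0 - f) * p
        d += w * (1.0 - f) * (1.0 - p)
    return (a * d) / (b * c)


if __name__ == "__main__":
    w = (0.5, 0.5)  # hypothetical equal mixing of two subpopulations
    print(crude_carrier_or(w, (0.10, 0.50), (0.30, 0.10)))  # both differ
    print(crude_carrier_or(w, (0.10, 0.50), (0.20, 0.20)))  # equal prevalence
    print(crude_carrier_or(w, (0.30, 0.30), (0.30, 0.10)))  # equal carrier frequency
```

The first call, where both prevalence and carrier frequency differ, gives an OR of about 0.52 even though neither subpopulation shows any association; the other two, where only one of the two differs, return 1 up to floating-point rounding.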

[Figure: StudentConsult, © 2005 Elsevier]

Population Stratification and Allelic Association
Index of Indian heritage | Gm3;5,13,14 + | Gm3;5,13,14 -
 | | 19.9%
4 | 28.3% | 28.8%
8 | 35.9% | 39.3%
Full-heritage American Indian population: Gm3;5,13,14 prevalence 1%; NIDDM prevalence 40%
Caucasian population: Gm3;5,13,14 prevalence 66%; NIDDM prevalence 15%
NIDDM | Gm3;5,13,14 haplotype + | Gm3;5,13,14 haplotype -
+ | | 29.0%
- | 92.2% | 71.0%
OR = 0.27 [0.18, 0.40]
Cardon LR and Palmer LJ, Lancet 2003; 361:, after Knowler et al 1988.

Unlinked Genetic Markers in Population Stratification
Population stratification (or any non-random mating) allows marker-allele frequencies to vary among population segments.
A disease that is more prevalent in one subpopulation will be associated with any allele that is at high frequency in that subpopulation.
If population stratification exists, it can often be detected by analysis of unlinked marker loci. [Pritchard JK, Rosenberg NA; AJHG 1999; 65:]
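One simple way to use unlinked markers is genomic control (Devlin and Roeder), shown here as a swapped-in technique and not necessarily the test described by Pritchard and Rosenberg. The idea: compute the allelic chi-square at many markers assumed to be unrelated to disease, and divide their median by the null median of a 1-df chi-square (about 0.455). A ratio λ well above 1 suggests stratification. The simulation below is a sketch under hypothetical settings: two subpopulations, the allele-frequency spread between them, and the fraction of cases and controls drawn from each are all invented.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def allelic_chi2(case_alt, n_case_alleles, ctrl_alt, n_ctrl_alleles):
    """1-df allelic chi-square per marker from alternate-allele counts (vectorised)."""
    a = np.asarray(case_alt, dtype=float)
    b = n_case_alleles - a
    c = np.asarray(ctrl_alt, dtype=float)
    d = n_ctrl_alleles - c
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def genomic_control_lambda(chi2):
    """Inflation factor: observed median statistic over its null expectation."""
    return float(np.median(chi2) / CHI2_1DF_MEDIAN)


def simulate_null_chi2(rng, n_markers=2000, n_cases=1000, n_ctrls=1000,
                       case_frac_pop1=0.5, ctrl_frac_pop1=0.5, freq_sd=0.1):
    """Chi-square statistics for markers with no effect on disease, sampled
    from two hypothetical subpopulations whose allele frequencies differ."""
    p1 = rng.uniform(0.1, 0.9, n_markers)
    p2 = np.clip(p1 + rng.normal(0.0, freq_sd, n_markers), 0.01, 0.99)

    def alt_allele_counts(n_people, frac_pop1):
        n1 = int(round(n_people * frac_pop1))
        n2 = n_people - n1
        # two alleles per person, drawn at each subpopulation's frequency
        return rng.binomial(2 * n1, p1) + rng.binomial(2 * n2, p2)

    case_alt = alt_allele_counts(n_cases, case_frac_pop1)
    ctrl_alt = alt_allele_counts(n_ctrls, ctrl_frac_pop1)
    return allelic_chi2(case_alt, 2 * n_cases, ctrl_alt, 2 * n_ctrls)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    matched = simulate_null_chi2(rng)  # cases and controls drawn alike
    stratified = simulate_null_chi2(rng, case_frac_pop1=0.7, ctrl_frac_pop1=0.3)
    print("lambda, matched sampling:   ", round(genomic_control_lambda(matched), 2))
    print("lambda, stratified sampling:", round(genomic_control_lambda(stratified), 2))
```

When cases and controls come from the subpopulations in the same proportions, λ stays close to 1; when cases are drawn mostly from one subpopulation, the unlinked markers show a clearly inflated λ even though none of them affects disease.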

Adjusting for Population Stratification in a GWAS of T2DM*
Case-control study of 661 cases of T2DM and 614 controls from France
Genotyping assayed 392,935 SNPs
SNP 200 kb from the lactase gene on 2q21:
– Strong association with T2DM
– Strong north-south prevalence gradient in France
Used 20,323 SNPs not related to T2DM as a measure of population stratification.
After adjustment for stratification, most of the association was removed.
*Sladek R et al. Nature 2007; 445:
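The slide does not say how the 20,323 SNPs were used. A common approach, sketched here as an assumption and not as the method of Sladek et al., is principal component analysis of standardised genotypes (as in EIGENSTRAT): the leading components capture ancestry and can be entered as covariates in the association model. The "north" and "south" genotypes below are simulated, and the function name is hypothetical.

```python
import numpy as np


def top_principal_components(genotypes, k=2):
    """Leading principal components of an individuals x SNPs matrix of
    0/1/2 allele counts, after per-SNP standardisation."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # estimated allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centred and scaled
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]                  # individual scores on the top k axes


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n_snps = 500
    p_north = rng.uniform(0.1, 0.9, n_snps)  # hypothetical regional frequencies
    p_south = np.clip(p_north + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)
    north = rng.binomial(2, p_north, size=(100, n_snps))
    south = rng.binomial(2, p_south, size=(100, n_snps))
    pcs = top_principal_components(np.vstack([north, south]))
    print("mean PC1, north:", round(float(pcs[:100, 0].mean()), 2))
    print("mean PC1, south:", round(float(pcs[100:, 0].mean()), 2))
```

Individuals from the two simulated regions fall on opposite sides of zero on PC1. In a real analysis those scores would enter the regression model alongside the SNP being tested, so that an allele tracking the north-south gradient is no longer credited with the disease association.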

Phenotypic Variation Bias: Are the Cases Homogeneous?
GWAS of Atrial Fibrillation*
– Sample 1: hospital diagnosis of AF “confirmed by 12-lead ECG.”
– Sample 2: patients with ischemic stroke or TIA; diagnosis of AF “based on 12-lead ECG.”
– Sample 3: patients hospitalized with acute stroke “diagnosed with AF.”
– Sample 4: patients with lone AF or AF plus hypertension referred to an arrhythmia service; “AF documented by ECG.”
*Gudbjartsson et al, Nature 2007; 448:

Information Bias: Systematic Differences in Data Collection Between Cases and Controls
Genotyping quality bias: Lack of a genotyping protocol for excluding SNPs that fail quality-control criteria, or failure to publish call rates.
– Testing for Hardy-Weinberg disequilibrium
– Transmission disequilibrium testing: a differential rate of genotyping error distorts allele frequencies in cases/controls
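A minimal per-SNP quality-control sketch: a call-rate filter and a 1-df chi-square test of Hardy-Weinberg proportions, applied here to controls only. The thresholds (95% call rate, HWE p ≥ 10^-6), the genotype coding, and the function names are hypothetical choices for illustration, not a published protocol.

```python
from math import erfc, sqrt


def call_rate(genotypes):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square for departure from Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)    # reference-allele frequency
    if p in (0.0, 1.0):
        return 0.0                           # monomorphic SNP: nothing to test
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2.0))


def passes_qc(control_genotypes, min_call_rate=0.95, min_hwe_p=1e-6):
    """Hypothetical SNP filter. Genotypes are 0/1/2 copies of the alternate
    allele, or None for a failed call; thresholds are illustrative only."""
    if call_rate(control_genotypes) < min_call_rate:
        return False
    called = [g for g in control_genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    return chi2_1df_pvalue(hwe_chi2(*counts)) >= min_hwe_p


if __name__ == "__main__":
    print(passes_qc([0] * 25 + [1] * 50 + [2] * 25))  # HWE proportions
    print(passes_qc([0] * 40 + [1] * 20 + [2] * 40))  # heterozygote deficit
```

A marked heterozygote deficit like the second example is a classic signature of genotype-calling error, which is why such SNPs are usually excluded before association testing rather than reported as findings.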

Is DNA Collected and Handled Identically in Cases and Controls?
T1DM gene association study examining 6322 SNPs: cases from the GRID Study, controls from the 1958 British Birth Cohort Study.
Samples from lymphoblastoid cell lines extracted using the same protocol in two different laboratories.
Case and control DNAs randomly ordered, with teams masked to case/control status.
Some extreme associations could not be replicated by a second genotyping method.
Clayton DG et al, Nat Genet 2005; 37:
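The random ordering and masking described on this slide can be sketched as follows. Case and control samples are shuffled together, relabelled with neutral codes, and laid out on plates, while the key linking codes to case/control status stays with a coordinator. The 96-well plate size, the coding scheme, and the function name are assumptions for illustration.

```python
import random


def blind_and_randomise(samples, plate_size=96, seed=None):
    """Shuffle cases and controls together, assign coded labels, and lay them
    out on plates. Returns (plates, key): the lab receives only `plates`,
    while `key` (code -> (sample_id, status)) stays with the coordinator.

    samples: list of (sample_id, status) tuples.
    """
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)                       # cases and controls interleaved at random
    plates, key = [], {}
    for i, (sample_id, status) in enumerate(order):
        code = f"S{i + 1:05d}"               # hypothetical coding scheme
        key[code] = (sample_id, status)
        if i % plate_size == 0:
            plates.append([])                # start a new plate
        plates[-1].append(code)              # the lab sees only the coded label
    return plates, key


if __name__ == "__main__":
    samples = ([(f"case{i}", "case") for i in range(150)]
               + [(f"ctrl{i}", "control") for i in range(150)])
    plates, key = blind_and_randomise(samples, seed=2024)
    print([len(p) for p in plates])
```

Interleaving at random makes it unlikely that plate or batch effects line up with case/control status; a real design might additionally balance the case fraction on each plate.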

Biases in the Analysis and Presentation of Data
Environmental exposure information bias: Lack of collection or presentation of known environmental causes of the disease, or of their comparison between cases and controls.
Confounding control bias: Lack of statistical adjustment or stratified analysis in the presence of potential confounding.

Characteristics of 109 GWAS: Confounding
Few studies compared cases and controls for environmental exposures known to predispose to the disease.
– Table comparing cases and controls: 36%
– Statistical comparison of cases/controls: 3.5%
– Statistical adjustment for differences: 16%
– Stratified analysis by confounder group: 16%

Distribution of Three Known Risk Factors for Neovascular AMD in a GWAS
Covariate | Cases (n = 96) | Controls (n = 130)
Male sex (%) | 68 | 33
Age (yrs) | 75 | 74
Smokers (%) | 63 | 26
DeWan A et al, Science 2006; 314:

Confounding
Confounder: “A factor that distorts the apparent magnitude of the effect of a study factor on risk. Such a factor is a determinant of the outcome of interest and is unequally distributed among the exposed and the unexposed” (Last, 1983).
– Associated with exposure
– Independent cause or predictor of disease
– Not an intermediate step in causal pathway
[Diagrams relating confounder (C), exposure (E), and disease (D)]
Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003.

FTO Variants, Type 2 Diabetes, and Obesity*
Diabetes association
Cohort | OR [95% CI] | P
WTCCC phase | | 2 × 10^-8
WTCCC phase | | 5 × 10^-7
DGI | 1.03 | 0.25
*Frayling, 2007 and Zeggini, 2007

FTO Variants, Type 2 Diabetes, and Obesity*
[Figure: BMI by FTO genotype (TT, AT, AA) in WTCCC cases and WTCCC controls]
*Frayling 2007 and Zeggini 2007

FTO Variants, Type 2 Diabetes, and Obesity*
Diabetes association
Cohort | OR [95% CI] | P
WTCCC phase | | 2 × 10^-8
WTCCC phase | | 5 × 10^-7
DGI | 1.03 | 0.25
Diabetes association adjusted for BMI
WTCCC phase | | 0.44
*Frayling TM, et al. Science 2007; 316: Zeggini E, et al. Science 2007; 316:

Dealing with Confounders
In design
– Randomize
– Restrict: confine study subjects to those within specified category of confounder
– Match: select cases and controls so confounders equally distributed
In analysis
– Standardize: for age, gender, time
– Stratify: separate sample into subsamples according to specified criteria
– Multivariate analysis: adjust for many confounders
Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003
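Stratified analysis can be illustrated with the Mantel-Haenszel summary odds ratio, a standard estimator for combining 2×2 tables across strata. The counts below are hypothetical: within each stratum of the confounder the exposure has no effect (OR = 1), but the confounder raises both exposure prevalence and disease risk, so collapsing the strata produces a spurious crude association.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)


def crude_or(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of a confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


if __name__ == "__main__":
    # Hypothetical counts (a, b, c, d) for two strata of a confounder.
    strata = [(80, 20, 80, 20),    # stratum 1: 50% exposed, 80% diseased, OR = 1
              (10, 90, 20, 180)]   # stratum 2: 33% exposed, 10% diseased, OR = 1
    print("crude OR:", round(crude_or(strata), 2))
    print("Mantel-Haenszel OR:", round(mantel_haenszel_or(strata), 2))
```

Here the crude OR is about 1.64, while the Mantel-Haenszel estimate is 1.0, matching the absence of any within-stratum effect.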

Biases in the Analysis and Presentation of Data (Cont.)
Alpha error control bias: Failure to correct the alpha level accepted as significant for the number of tests performed.
Data dredging bias: Lack of replication studies testing hypotheses identified in a discovery study.
The winner’s curse: Overestimation of effect sizes for associations at the extremes of their range in a discovery GWAS, so that replication studies fail to reproduce the odds ratios because they lack adequate power to detect the smaller true odds ratio.
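Both the alpha-error and winner's-curse points can be seen in a short simulation. With an assumed 1,000,000 independent tests, the Bonferroni threshold is 0.05/1,000,000 = 5 × 10^-8. The sketch then repeats a hypothetical discovery study many times for one SNP with a modest true effect (OR 1.15, standard error 0.04 on the log-OR scale, both invented) and keeps only the estimates that reach that threshold. Function names are illustrative.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def winners_curse(true_log_or, se, threshold, n_studies, rng):
    """Simulate repeated discovery studies of one SNP and return only the
    log-OR estimates that pass a two-sided significance threshold."""
    z_crit = NormalDist().inv_cdf(1.0 - threshold / 2.0)
    estimates = rng.normal(true_log_or, se, n_studies)  # sampling error around the truth
    return estimates[np.abs(estimates / se) > z_crit]


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 1_000_000)   # about 5e-8
    true_or, se = 1.15, 0.04                              # hypothetical effect and SE
    rng = np.random.default_rng(0)
    hits = winners_curse(np.log(true_or), se, threshold, 100_000, rng)
    print(f"fraction of studies reaching significance: {hits.size / 100_000:.3f}")
    print(f"true OR: {true_or}, mean OR among significant studies: "
          f"{np.exp(hits.mean()):.2f}")
```

Under these assumptions only a few percent of simulated studies reach significance, and those that do report an average OR of roughly 1.26 rather than 1.15. A replication study sized to detect the discovery estimate would therefore be underpowered for the true effect.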


Larson, G. The Complete Far Side

Interpretation Biases in Genomic Research*
Confirmation bias: evaluating evidence that supports one’s preconceptions differently from evidence that challenges these convictions.
Rescue bias: discounting data by finding selective faults in the experiments.
Mechanism bias: being less skeptical when underlying science furnishes credibility for the data.
*Kaptchuk TJ. BMJ 2003; 326:

Information to be Included in Initial Report
Study information:
– Source of cases and controls
– Methods used for defining disease or trait
– Participation rates and flow chart of selection
– Standard “Table 1,” including rates of missing data
– Success rate of DNA acquisition, comparability
Genotyping and quality control procedures
Results
– Analysis methods in sufficient detail to understand and reproduce what was done
– Simple single-locus and multi-marker (haplotype) association analyses
– Significance of any known 'positive controls'
Chanock, Manolio et al, Nature 2007; 447:

Controlling Bias in Genomic Research: Design
Define population to be studied
Maximize representativeness
Use standard, reproducible methods for assignment of case/control status
Use incident cases
Select controls eligible to become cases
Estimate and maximize participation rates
Apply standard genotyping QC methods
Replicate positive findings on different genotyping platform

Controlling Bias in Genomic Research: Analysis
Describe sources and methods of ascertaining cases and controls
Compare participants and non-participants
Compare cases and controls
Stratify and adjust for important confounders (including population stratification)
Stratify and test for important interactions
Report results of genotyping QC
Report results of prior known associations

Larson, G. The Complete Far Side