Quality control for GWAS

Quality control for GWAS Jeff Barrett

Challenges to GWAS?
- Data quality control
- No common, single-SNP main effects (all epistasis or rare variants or …)
- Sample size too small to detect effects
- Computational burden
- Multiple-testing correction will drown out the signal
- Unmatched controls / population structure
- SNP chips don't cover enough of the genome

What we want to work with

Getting from intensities to genotypes

SNP QC
SNP QC for GWAS aims to systematically identify genotype-calling problems using:
- Hardy-Weinberg equilibrium (the expected frequencies of the three possible genotypes)
- The fraction of missing genotypes
- Frequency differences between separate control groups (if available)
…but the scale is huge: the biggest meta-analyses involve more than 1 trillion genotypes!
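As a minimal sketch of the Hardy-Weinberg check, assuming the genotype counts for one SNP have already been tallied into the three classes, the Python below computes a 1-degree-of-freedom chi-square goodness-of-fit test. The function name `hwe_chisq` is hypothetical, and the chi-square form is a simplification: exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for departure from Hardy-Weinberg equilibrium at one SNP.

    n_aa, n_ab, n_bb are the observed counts of the three genotype classes
    among samples with a call. Returns (statistic, p-value) using 1 degree
    of freedom (3 classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("SNP has no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        # Monomorphic SNP: there is nothing to test.
        return 0.0, 1.0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `hwe_chisq(50, 0, 50)` gives a statistic of 100 and a vanishingly small p-value, because no heterozygotes were seen where 50 were expected. The p-value cut-off for flagging a SNP is study-specific and is left open here.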

Calling wrinkles: > 3 clusters

Plate effects: transition to SSF site

Calling wrinkles: monomorphics

Calling wrinkles: rare SNPs
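Monomorphic and rare SNPs can be picked out from the minor allele frequency (MAF). A minimal sketch, assuming genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call; the helper names and the 1% rare-variant cut-off are hypothetical choices for illustration.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of allele B; None = missing call


def minor_allele_frequency(calls: Sequence[Genotype]) -> Optional[float]:
    """MAF over called samples, or None if every call is missing."""
    called = [g for g in calls if g is not None]
    if not called:
        return None
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1.0 - freq_b)


def classify_snp(calls: Sequence[Genotype], rare_maf: float = 0.01) -> str:
    """Label a SNP as 'all missing', 'monomorphic', 'rare' or 'common'."""
    maf = minor_allele_frequency(calls)
    if maf is None:
        return "all missing"
    if maf == 0:
        return "monomorphic"
    if maf < rare_maf:
        return "rare"
    return "common"
```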

Missing data is a good predictor of bad calling
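Because missingness tracks calling quality, a per-SNP call-rate filter is a natural first pass. This sketch assumes the same hypothetical 0/1/2/`None` coding, with each SNP's calls stored in a dictionary keyed by SNP name; the 95% threshold is an illustrative assumption, not a value given in the slides.

```python
from typing import Mapping, Optional, Sequence


def call_rate(calls: Sequence[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype at this SNP."""
    return sum(c is not None for c in calls) / len(calls)


def flag_low_call_rate(
    snps: Mapping[str, Sequence[Optional[int]]], min_call_rate: float = 0.95
) -> list[str]:
    """Names of SNPs whose call rate falls below min_call_rate."""
    return [name for name, calls in snps.items() if call_rate(calls) < min_call_rate]
```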

Sample QC
Collecting, processing and genotyping thousands of samples (often from many different clinicians, hospitals, countries…) is difficult. Common problems:
- Duplicates
- Unexpected relatives
- Samples with different ancestry
- Low-quality DNA samples
- Sample mix-ups
The good news is that simple analyses at scale are very informative.
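Duplicates are one problem a very simple analysis at scale can catch: compare every pair of samples and measure how often their genotypes agree. The sketch below uses raw genotype concordance over SNPs called in both samples (an identity-by-state style measure). It is a swapped-in simplification of the identity-by-descent estimates that tools such as PLINK compute, and it will not reliably detect relatives. Sample IDs, genotype coding and the 0.99 threshold are hypothetical, and the pairwise loop is quadratic in the number of samples.

```python
from itertools import combinations
from typing import Mapping, Optional, Sequence


def concordance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Fraction of SNPs called in both samples at which the genotypes are identical."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan")
    return sum(x == y for x, y in shared) / len(shared)


def likely_duplicates(
    samples: Mapping[str, Sequence[Optional[int]]], threshold: float = 0.99
) -> list[tuple[str, str, float]]:
    """All sample pairs whose genotype concordance is at least threshold."""
    hits = []
    for (id1, g1), (id2, g2) in combinations(samples.items(), 2):
        c = concordance(g1, g2)
        if c >= threshold:  # NaN (no shared calls) never passes this test
            hits.append((id1, id2, c))
    return hits
```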

Heterozygosity locally and globally
A key advantage of GWAS is the sheer volume of data, which allows simple analyses. A heterozygous sample at one SNP isn't particularly interesting, but what about across the entire genome?
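Across the whole genome, each sample's heterozygosity rate becomes informative: excess heterozygosity can point to contaminated DNA, and a deficit to inbreeding. A minimal sketch, assuming the same hypothetical 0/1/2/`None` coding with 1 meaning heterozygous; the ±3 standard-deviation cut-off is an assumption for illustration, and in practice the rates are usually inspected alongside call rate rather than filtered blindly.

```python
import statistics
from typing import Mapping, Optional, Sequence


def heterozygosity_rate(calls: Sequence[Optional[int]]) -> float:
    """Fraction of a sample's called genotypes that are heterozygous (coded 1)."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("sample has no called genotypes")
    return sum(g == 1 for g in called) / len(called)


def heterozygosity_outliers(
    samples: Mapping[str, Sequence[Optional[int]]], n_sd: float = 3.0
) -> list[str]:
    """Samples whose heterozygosity rate lies more than n_sd SDs from the mean."""
    rates = {sid: heterozygosity_rate(g) for sid, g in samples.items()}
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [sid for sid, r in rates.items() if abs(r - mean) > n_sd * sd]
```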

Bad samples: call rate & heterozygosity

Data cleaning on X: gender
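Outside the pseudoautosomal regions, males carry a single X chromosome, so a correctly recorded male should show almost no heterozygous X calls. A minimal sketch of a sex check built on that idea, assuming calls are restricted to non-pseudoautosomal X SNPs and coded as 0/1/2/`None`; the function names and both heterozygosity thresholds are hypothetical. PLINK's `--check-sex` instead works from an X-chromosome inbreeding coefficient rather than this raw rate.

```python
from typing import Mapping, Optional, Sequence


def x_het_rate(calls: Sequence[Optional[int]]) -> float:
    """Heterozygous fraction of called genotypes at non-PAR X-chromosome SNPs."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no called X-chromosome genotypes")
    return sum(g == 1 for g in called) / len(called)


def sex_mismatches(
    reported: Mapping[str, str],
    x_calls: Mapping[str, Sequence[Optional[int]]],
    male_max_het: float = 0.02,  # hypothetical threshold
    female_min_het: float = 0.10,  # hypothetical threshold
) -> list[tuple[str, str, str, float]]:
    """(sample, reported sex, inferred sex, X het rate) wherever they disagree."""
    problems = []
    for sid, sex in reported.items():
        het = x_het_rate(x_calls[sid])
        if het <= male_max_het:
            inferred = "M"
        elif het >= female_min_het:
            inferred = "F"
        else:
            inferred = "?"  # ambiguous: worth inspecting by hand
        if inferred != sex:
            problems.append((sid, sex, inferred, het))
    return problems
```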

Bad samples: plate effects

Clean data matters!

Hit SNP 1

Hit SNP 2

The missed warning signs

The need for QC never dies

Useful references
- Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007 Jun;447(7145):661-78.
- Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP, Zondervan KT. Data quality control in genetic case-control association studies. Nat Protoc. 2010 Sep;5(9):1564-73.