Why I chose this paper:
On first reading, the results seemed counterintuitive.
The introduction was full of references I didn’t know.
Useful? Or “gee whizz, so what?” ... I needed to read it in detail.
It seemed relevant to our MND study (GWAS + imputation + sequencing).
It is nicely laid out for a journal club presentation.

Localisation success rate = probability that the causal SNP is top ranked within an associated region.
Consider 2 SNPs: one causal SNP from sequencing or imputation (imperfect genotyping accuracy) and one tag SNP from GWAS (perfect genotyping accuracy).
MAF of both SNPs = 0.12; causal SNP OR = 1.25.
Selection at the tag SNP is based on p-value < 0.05 in 1000 cases and 1000 controls.
Parameters: correlation between actual and estimated genotype at the causal SNP; correlation between actual genotype at the causal and genotyped SNPs; call rate at the causal SNP. These generate Figs 1-3.
The association test statistic at the causal or genotyped SNP depends on the joint effects of selection based on p-value, tagging and genotyping accuracy.
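The two-SNP setup above can be explored with a small Monte Carlo sketch. It is not the paper's simulation code. It assumes the two test statistics are bivariate normal with unit variance, that the tag is genotyped perfectly, and that errors at the causal SNP are independent of the tag genotype. The function name and the parameter `lam` (the expected z-score at a perfectly genotyped causal SNP) are hypothetical.

```python
import numpy as np


def localization_success(lam, r, rho, z_select=None, n_sim=200_000, seed=0):
    """Monte Carlo estimate of P(causal SNP outranks the tag SNP).

    lam      -- assumed expected z-score at the causal SNP if genotyped perfectly
    r        -- correlation between true causal and tag genotypes
    rho      -- correlation between true and estimated causal genotypes
    z_select -- if given, keep only replicates with |z_tag| > z_select
    """
    rng = np.random.default_rng(seed)
    # Assumption: imperfect genotyping scales the causal mean by rho,
    # and tagging scales the tag mean by r.
    mean = np.array([rho * lam, r * lam])
    corr = r * rho  # assumed correlation between the two statistics
    cov = np.array([[1.0, corr], [corr, 1.0]])
    z = rng.multivariate_normal(mean, cov, size=n_sim)
    z_causal, z_tag = z[:, 0], z[:, 1]
    if z_select is not None:
        # Selection effect: condition on the tag reaching significance.
        keep = np.abs(z_tag) > z_select
        z_causal, z_tag = z_causal[keep], z_tag[keep]
    return float(np.mean(np.abs(z_causal) > np.abs(z_tag)))


if __name__ == "__main__":
    print(localization_success(lam=3.0, r=0.5, rho=1.0))
    print(localization_success(lam=3.0, r=0.5, rho=1.0, z_select=1.96))
    print(localization_success(lam=3.0, r=0.95, rho=0.6))
```

Under these assumptions, conditioning on a significant tag lowers the estimated success rate, and a well-tagged but poorly genotyped causal SNP is outranked most of the time.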

Figure 1. Tagging effect decreases localization success rates with or without the selection effect.
A & B: Tight linkage disequilibrium between SNPs can obscure the causal SNP.
C & D: Selection at the tag SNP inflates the association evidence at the tag, increasing the probability that it outranks the causal SNP.
Localisation success rate = probability that the causal SNP is top ranked within an associated region.
Parameters: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12.

Fig S8: Tagging effect decreases localization success rates with or without the selection effect. 3 SNPs: 1 tag, 1 causal, 1 non-causal sequencing SNP.
Fig S9: Tagging effect decreases localization success rates with or without the selection effect. 5 SNPs: 1 tag, 1 causal, 3 non-causal sequencing SNPs.
Parameters: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12.
Parameters: causal MAF 0.02; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.5; perfect genotyping accuracy; tag MAF 0.02.
Parameters: causal MAF 0.02; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.5; perfect genotyping accuracy; tag MAF 0.02.

Figure 2. Low genotyping accuracy at the causal SNP further reduces localization success rates with or without the selection effect.
Sequencing or imputation error decreases the localization success rate, with or without tag selection.
Parameters: causal MAF 0.12; OR = 1.25; tag MAF 0.12; perfect genotyping accuracy for the tag SNP.

Fig S4. Low genotyping accuracy at the causal SNP further reduces localization success rates with or without the selection effect: rare causal SNP.
Parameters: causal MAF 0.02; OR = 1.5; tag MAF 0.02; perfect genotyping accuracy for the tag SNP.

Fig S5. Low genotyping accuracy at the causal SNP further reduces localization success rates with or without the selection effect: common causal SNP.
Parameters: causal MAF 0.25; OR = 1.25; tag MAF 0.25; perfect genotyping accuracy for the tag SNP.

Figure 3. Counter-intuitively, increasing sample size can reduce the localization success rate.
Well-tagged causal SNPs sequenced with low accuracy are unlikely to be correctly identified even as sample size increases.
When the causal SNP is less accurately genotyped than one of its highly correlated proxies (i.e. ρ_C < ρ_G and r_CG is large), the proxy SNP may capture the association better than the causal SNP. As a result, this proxy SNP will outrank the causal SNP more than 50% of the time.
Parameters: causal MAF 0.12; correlation between causal and non-causal sequencing SNP 0.9; OR = 1.25; perfect genotyping accuracy; tag MAF 0.12.
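The sample size effect can be made concrete with a worked approximation. This is an illustrative sketch, not the paper's derivation. It assumes the two test statistics are bivariate normal with unit variance, that genotyping errors at the two SNPs are independent, and it compares signed rather than absolute statistics. Here ρ_C and ρ_G are the genotyping accuracies of the causal and proxy SNPs, r_CG is the correlation between their true genotypes, and λ (a symbol introduced here) is the expected z-score at a perfectly genotyped causal SNP, which grows with the square root of the sample size N.

```latex
\begin{aligned}
Z_C &\sim N(\rho_C \lambda,\ 1), \qquad
Z_G \sim N(\rho_G\, r_{CG}\, \lambda,\ 1), \qquad
\operatorname{corr}(Z_C, Z_G) = \rho_C\, \rho_G\, r_{CG}, \\
P(Z_C > Z_G) &= \Phi\!\left(
  \frac{\lambda\,(\rho_C - \rho_G\, r_{CG})}
       {\sqrt{2\,(1 - \rho_C\, \rho_G\, r_{CG})}}
\right), \qquad \lambda \propto \sqrt{N}.
\end{aligned}
```

Under these assumptions, whenever ρ_C < ρ_G r_CG the argument of Φ is negative and becomes more negative as N grows, so the probability that the causal SNP outranks its proxy falls below 50% and keeps shrinking with sample size.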

Panels: MAF = 0.02, MAF = 0.12, MAF = 0.25.
Results so far demonstrate the need to correct for the joint effects of selection, tagging and genotyping accuracy on the localization success rate.
How to correct?

Annotations to the terms of the re-ranking formula (G = tag, S = sequenced):
Test statistic at the sequenced SNP.
Call rates, i.e. missingness: joint vs individual.
Correlation between genotyped and sequenced SNPs in the sample when there are no errors.
Estimate of the selection bias of the genetic effect at the tag SNP (a form of winner’s curse).
Correlation between true genotype and sequenced genotype in the sample.
Revised test statistic at the sequenced SNP.
When low, the difference between the test statistic and the revised test statistic increases.
Missingness rate.
Is zero if independent samples are used for sequencing and identification of the tag SNP.

G = genotyped SNP, C = causal SNP, r_CG = correlation between the genotyped and causal SNPs.
The selection effect is most pronounced when power at the tag SNP is low.
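The link between low power and a strong selection effect can be illustrated with the mean of a truncated normal. This is a simplified sketch, not the paper's estimator: it assumes the tag z-score is N(mu, 1) and that selection is one-sided (z > threshold), whereas a p-value threshold is two-sided. The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def power(mu, threshold):
    """P(Z > threshold) for Z ~ N(mu, 1)."""
    return normal_sf(threshold - mu)


def selection_inflation(mu, threshold):
    """E[Z | Z > threshold] - mu for Z ~ N(mu, 1), i.e. the winner's curse bias."""
    a = threshold - mu
    return normal_pdf(a) / normal_sf(a)


if __name__ == "__main__":
    for mu in (1.0, 2.0, 4.0):
        print(mu, round(power(mu, 1.96), 3), round(selection_inflation(mu, 1.96), 3))
```

Under this model the inflation shrinks toward zero as power approaches 1, which matches the slide's point that selection matters most for weakly powered tag SNPs.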

The higher the correlation between the genotyped and sequenced SNP, the higher the test statistic at the sequenced SNP and the lower its variance.
Unconditional expected association at the sequenced SNP.
Distortion due to the tag SNP selection, propagated through correlation.
SNPs in high LD with the tag are more likely to be top-ranked = “tagging effect”.

Bootstrap resampling at the genome-wide level: incorporates information across the whole genome to account for the effects of LD and rank on bias.
Counts of missingness.
Estimate from the sample.
Mean posterior genotype (e.g. MACH ratio-of-variances estimate) or full genotype posterior probabilities (e.g. BEAGLE r^2).
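As an illustration of the ratio-of-variances idea mentioned for MACH, the sketch below computes an accuracy estimate from mean posterior genotypes (dosages on a 0 to 2 scale). It is a minimal sketch, not MACH's own code; the function name is hypothetical, and it assumes Hardy-Weinberg proportions for the expected variance.

```python
import numpy as np


def dosage_r2(dosages):
    """Ratio of the observed dosage variance to the variance expected for
    perfectly known genotypes, 2p(1-p), with p estimated from the dosages."""
    d = np.asarray(dosages, dtype=float)
    p = d.mean() / 2.0                 # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)     # variance of true genotypes under HWE
    if expected == 0.0:
        return float("nan")            # monomorphic in this sample
    return float(d.var() / expected)


if __name__ == "__main__":
    print(dosage_r2([0, 1, 1, 2]))          # confidently called genotypes
    print(dosage_r2([0.5, 1.0, 1.0, 1.5]))  # dosages shrunk toward the mean
```

Uncertain imputation pulls dosages toward their mean, which lowers the observed variance and hence the ratio.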

Scenario 1: GWAS used for discovery, and sequencing/imputation used for fine-mapping around GWAS “hits” using the same GWAS sample. GWAS-focused design based on the WTCCC Type 1 Diabetes. A significant region is identified by a significant GWAS tag SNP (p < 5 x 10^-7) and followed by fine-mapping with post-GWAS data (sequenced or imputed SNPs) in the region surrounding the tag SNP. The SNP with the largest test statistic in the region is selected as the best candidate causal SNP.
Scenario 2: All GWAS and sequenced/imputed SNPs used for discovery and fine-mapping in the same dataset.
Scenario 3: Discovery and fine-mapping using different datasets.
Scenario 4: Discovery and fine-mapping using different datasets + multiple causal SNPs.
Scenario 5: Discovery and fine-mapping using different datasets + missing data (imperfect call rate).

Table 2. Parameters and parameter values of the main simulation studies.

Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4.
Scenario 1: GWAS used for discovery, and sequencing/imputation used for fine-mapping around GWAS “hits” using the same GWAS sample.
The adverse effects of tagging (down the table) and genotyping accuracy (across the table) are highest when the causal SNP is well tagged (larger r) and less accurately sequenced (low rho), e.g. high density GWAS followed by low density sequencing.
Well-tagged causal SNPs suffer lower localisation success rates because the perfectly genotyped tag captures the association better than the imperfectly sequenced/imputed causal SNP.
Re-ranking is no good if the tag is causal. After re-ranking, the localisation success rate is “similar” to when the tag is not causal. “Minor tradeoff”, as the GWAS SNP is unlikely to be causal.

Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4.
Scenario 2: All GWAS and sequenced/imputed SNPs used for discovery and fine-mapping in the same dataset, i.e. significance is not required at the GWAS SNP.
Impact of sample size, with the correlation between tag and causal SNP fixed.
Genotyping accuracy alone has an impact.
Big impact of re-ranking when sequencing coverage is low and sample size is large.

Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4.
Scenario 3: Discovery and fine-mapping using different datasets.
Very similar rates to Scenario 2.

Table 3. Localization success rates for simulation Scenarios 1, 2, 3, 4.
Scenario 4: Discovery and fine-mapping using different datasets (as in Scenario 3) + multiple causal SNPs.
Re-ranking improves localisation for both causal SNPs.

Table 4. Localization success rates for simulation Scenario 5a.
Scenario 5: Discovery and fine-mapping using different datasets + missing data (imperfect call rate). (The quantity varied across the table has changed.)
Missing data affect localisation success rates in a similar manner to imperfect genotyping accuracy.

Summary from simulation:
GWAS-based region selection or moderate genotype error substantially reduces the probability of correctly identifying the causal SNP.
The proposed re-ranking can recover lost power, increasing localisation success rates by 1.5 to 3 times.
When genotyping accuracy is high, the power lost due to tagging is small, so re-ranking has no effect.

Figure 4. Naïve test statistics and re-ranking statistics for regions surrounding rs in the 8q24.21 region for association with prostate cancer risk.
Original study: Michaela et al, prostate cancer consortium. Different genotyping platforms, imputed to 1000 Genomes, fixed-effect meta-analysis. Cohorts excluded from the association analysis if imputation r2 < 0.8. Report 5 statistically independent regions within the 8q24.21 locus, plus 11q13.3 and 17q24.3.
Re-analysis: selected all SNPs in LD r2 > 0.2 with the index SNP. Didn’t exclude studies based on imputation r2. Only correct for imputation accuracy, i.e. deltaG = 0.
New top SNPs for 8q24.21 and 17q24.3.
8q24.21: 2 SNPs move from lower ranks to the top 10%.

Figure 5. Naïve test statistics and re-ranking statistics for regions surrounding rs in the 17q24.3 region for association with prostate cancer risk.
8 SNPs move from lower ranks to the top 10%.
SNPs naively ranked in the top 10% stay highly ranked.
When most SNPs are well genotyped, re-ranking only makes subtle changes.
One poorly imputed SNP (yellow) moves from rank 245 to 16. The association is driven by one study (rank 10); when that study is removed, the SNP rank is 306, changing to 106.
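The prostate cancer results were combined by fixed-effect meta-analysis, and one re-ranked SNP turned out to be driven by a single study. A minimal sketch of inverse-variance fixed-effect pooling, with Cochran's Q as a simple heterogeneity check, is given below. The function name and the example effect sizes are hypothetical and are not taken from the paper or the consortium data.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling of per-study estimates.

    Returns (pooled beta, pooled SE, z-score, Cochran's Q).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of study estimates from the pool.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, beta / se, q


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors for three studies.
    print(fixed_effect_meta([0.10, 0.12, 0.45], [0.05, 0.06, 0.10]))
```

With k studies, Q is usually compared with a chi-square distribution on k - 1 degrees of freedom; a large value suggests that one or more studies disagree with the pooled estimate.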

DISCUSSION
Tagging and genotyping accuracy are non-trivial sources of bias that could obscure association evidence at the causal SNP.
The proposed re-ranking is simple to implement and can substantially increase the probability of identifying the causal SNP.
For low coverage sequencing, we recommend the re-ranking method.
For imputation and high coverage sequencing, we recommend that unfiltered SNPs in associated regions be used with the re-ranking method.
Large changes in rank should be carefully examined for heterogeneity between studies.
Re-ranking is most beneficial when genotyping accuracy is low.
High density genotyping followed by low density sequencing can generate misleading results. Don’t do it.
Imputation and sequencing software output accurate estimates of the rho needed for the re-ranking.

DISCUSSION
Re-ranking is important when study-specific factors exacerbate GWAS-based selection and genotyping error:
High genetic diversity, so sequence reads are difficult to align.
Low LD among SNPs or lack of a population-specific reference panel, so poor imputation.
Low MAF SNPs tend to suffer from both low power and high genotyping error.
When genotyping accuracy is very poor, re-ranking may not be able to generate useful results. First consider the accuracy thresholds recommended by the genotype calling or imputation algorithm.
Re-ranking only improves localization success when applied to SNPs under the alternative, i.e. SNPs that are themselves causal or in LD with a causal SNP.

Existing methods that incorporate genotyping uncertainty into tests for association do not completely recover lost power.
This paper considered frequentist and Bayesian methods of incorporating uncertainty.
We anticipate that re-ranking to correct for the adverse effects of selection, tagging and differential genotyping accuracy rates will continue to be important, because cost-effective designs are for low-coverage, large sample sizes.