BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27.

Slides:



Advertisements
Similar presentations
Analysis of imputed rare variants
Advertisements

What is an association study? Define linkage disequilibrium
Qualitative and Quantitative traits
SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,
Meta-analysis for GWAS BST775 Fall DEMO Replication Criteria for a successful GWAS P
Genetic Analysis in Human Disease
Multiple Comparisons Measures of LD Jess Paulus, ScD January 29, 2013.
Perspectives from Human Studies and Low Density Chip Jeffrey R. O’Connell University of Maryland School of Medicine October 28, 2008.
Objectives Cover some of the essential concepts for GWAS that have not yet been covered Hardy-Weinberg equilibrium Meta-analysis SNP Imputation Review.
Association Mapping David Evans. Outline Definitions / Terminology What is (genetic) association? How do we test for association? When to use association.
Modeling genetic and phenotypic data with the use of statistics Discovery of phenotypes influenced by the season of birth Can environment modify genetic.
Ingredients for a successful genome-wide association studies: A statistical view Scott Weiss and Christoph Lange Channing Laboratory Pulmonary and Critical.
Lab 13: Association Genetics. Goals Use a Mixed Model to determine genetic associations. Understand the effect of population structure and kinship on.
Dr. Almut Nebel Dept. of Human Genetics University of the Witwatersrand Johannesburg South Africa Significance of SNPs for human disease.
Biomedical Master Introduction to genome-wide association studies Metabolic diseases (B. Thorens) Biomedical Master: Metabolic diseases Lausanne, November.
A coalescent computational platform for tagging marker selection for clinical studies Gabor T. Marth Department of Biology, Boston College
Quantitative Genetics
Biomedical Master Introduction to genome-wide association studies Metabolic diseases (B. Thorens) Biomedical Master: Metabolic diseases Lausanne, October.
MSc GBE Course: Genes: from sequence to function Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue.
Human Migrations Saeed Hassanpour Spring Introduction Population Genetics Co-evolution of genes with language and cultural. Human evolution: genetics,
Computational Complexity The complexity of the MG model for a single SNP is determined by the complexity of the matrix operations in formulas used to iteratively.
Using biological networks to search for interacting loci in genome-wide association studies Mathieu Emily et. al. European journal of human genetics, e-pub.
Genome-Wide Association Studies
Genome-Wide Association Studies Xiaole Shirley Liu Stat 115/215.
Genomewide Association Studies.  1. History –Linkage vs. Association –Power/Sample Size  2. Human Genetic Variation: SNPs  3. Direct vs. Indirect Association.
Give me your DNA and I tell you where you come from - and maybe more! Lausanne, Genopode 21 April 2010 Sven Bergmann University of Lausanne & Swiss Institute.
Quantitative Genetics
Correlation and Regression Analysis
Genetic Analysis in Human Disease. Learning Objectives Describe the differences between a linkage analysis and an association analysis Identify potentially.
Rare and common variants: twenty arguments G.Gibson Homework 3 Mylène Champs Marine Flechet Mathieu Stifkens 1 Bioinformatics - GBIO K.Van Steen.
Modes of selection on quantitative traits. Directional selection The population responds to selection when the mean value changes in one direction Here,
IUMSP Institut universitaire de médecine sociale et préventive, Lausanne Exploring the association of the CYP1A1- CYP1A2 locus with blood pressure in CoLaus.
The Complexities of Data Analysis in Human Genetics Marylyn DeRiggi Ritchie, Ph.D. Center for Human Genetics Research Vanderbilt University Nashville,
A single-nucleotide polymorphism tagging set for human drug metabolism and transport Kourosh R Ahmadi, Mike E Weale, Zhengyu Y Xue, Nicole Soranzo, David.
The medical relevance of genome variability Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen,
©Edited by Mingrui Zhang, CS Department, Winona State University, 2008 Identifying Lung Cancer Risks.
CS177 Lecture 10 SNPs and Human Genetic Variation
What host factors are at play? Paul de Bakker Division of Genetics, Brigham and Women’s Hospital Broad Institute of MIT and Harvard
From Genome-Wide Association Studies to Medicine Florian Schmitzberger - CS 374 – 4/28/2009 Stanford University Biomedical Informatics
Genome-Wide Association Study (GWAS)
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Quantitative Genetics
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (1) J. R. O’Connell 1 and P. M. VanRaden 2 1 University of Maryland School of Medicine,
Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,
Finnish Genome Center Monday, 16 November Genotyping & Haplotyping.
Methods in genome wide association studies. Norú Moreno
Lab 13: Association Genetics December 5, Goals Use Mixed Models and General Linear Models to determine genetic associations. Understand the effect.
Statistical Issues in Genetic Association Studies
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 11: Models Marshall University Genomics Core Facility.
In The Name of GOD Genetic Polymorphism M.Dianatpour MLD,PHD.
An atlas of genetic influences on human blood metabolites Nature Genetics 2014 Jun;46(6)
Simultaneous identification of causal genes and dys-regulated pathways in complex diseases Yoo-Ah Kim, Stefan Wuchty and Teresa M Przytycka Paper to be.
Genome-Wides Association Studies (GWAS) Veryan Codd.
Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA.
1 Finding disease genes: A challenge for Medicine, Mathematics and Computer Science Andrew Collins, Professor of Genetic Epidemiology and Bioinformatics.
Power and Meta-Analysis Dr Geraldine M. Clarke Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for.
Date of download: 7/2/2016 Copyright © 2016 American Medical Association. All rights reserved. From: How to Interpret a Genome-wide Association Study JAMA.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 10: Comparing Models.
University of Colorado at Boulder
SNPs and complex traits: where is the hidden heritability?
David Daniel Rico Sarvenaz Zóltan.
Genome Wide Association Studies using SNP
Epidemiology 101 Epidemiology is the study of the distribution and determinants of health-related states in populations Study design is a key component.
Beyond GWAS Erik Fransen.
Medical genomics BI420 Department of Biology, Boston College
Medical genomics BI420 Department of Biology, Boston College
Evan G. Williams, Johan Auwerx  Cell 
GWAS-eQTL signal colocalisation methods
Presentation transcript:

BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: cell:

Overview Population stratification Associations: Basics Whole genome associations Genotype imputation Uncertain genotypes New Methods

Overview Population stratification Associations: Basics Whole genome associations Genotype imputation Uncertain genotypes New Methods

6’189 individuals Phenotypes 159 measurement 144 questions Genotypes SNPs CoLaus = Cohort Lausanne Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… Genetic variation in SNPs (Single Nucleotide Polymorphisms)

Analysis of Genotypes only Principle Component Analysis reveals SNP-vectors explaining largest variation in the data

Example: 2PCs for 3d-data Raw data points: {a, …, z}

Example: 2PCs for 3d-data Normalized data points: zero mean (& unit std)!

Example: 2PCs for 3d-data Identification of axes with the most variance Most variance is along PCA1 The direction of most variance perpendicular to PCA1 defines PCA2

Ethnic groups cluster according to geographic distances PC1 PC2

PCA of POPRES cohort

Overview Population stratification Associations: Basics Whole genome associations Genotype imputation Uncertain genotypes New Methods

Phenotypic variation:

What is association? chromosomeSNPstrait variant Genetic variation yields phenotypic variation Population with ‘ ’ allele Distributions of “trait”

Quantifying Significance

T-test t-value (significance) can be translated into p-value (probability)

Association using regression genotypeCoded genotype phenotype

Regression analysis X Y “response” “feature(s)” “intercept” “coefficients” “residuals”

Regression formalism (monotonic) transformation phenotype (response variable) of individual i effect size (regression coefficient) coded genotype (feature) of individual i p(β=0) error (residual) Goal: Find effect size that explains best all (potentially transformed) phenotypes as a linear function of the genotypes and estimate the probability (p-value) for the data being consistent with the null hypothesis (i.e. no effect)

Overview Population stratification Associations: Basics Whole genome associations Genotype imputation Uncertain genotypes New Methods

Whole Genome Association

Current microarrays probe ~1M SNPs! Standard approach: Evaluate significance for association of each SNP independently: significance

Whole Genome Association significance Manhattan plot observed significance Expected significance Quantile-quantile plot Chromosome & position GWA screens include large number of statistical tests! Huge burden of correcting for multiple testing! Can detect only highly significant associations ( p < α / #(tests) ~ )

GWAS: >20 publications in 2006/2007 Massive!

Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zolt á n Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdotti, Stefania Bandinelli, John Chambers, Kari Stefansson, G é rard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann

Current insights from GWAS: Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations Many of these associations have been replicated in independent studies

Current insights from GWAS: Each locus explains but a tiny (<1%) fraction of the phenotypic variance All significant loci together explain only a small (<10%) of the variance David Goldstein: “~93,000 SNPs would be required to explain 80% of the population variation in height.” Common Genetic Variation and Human Traits, NEJM 360;17

The “Missing variance” (Non-)Problem Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait? … and it doesn ’ t have to as long as Genome-wide Association Studies are meant to as an undirected approach to elucidate new candidate loci that impact the trait!

1.Other variants like Copy Number Variations or epigenetics may play an important role 2.Interactions between genetic variants (GxG) or with the environment (GxE) 3.Many causal variants may be rare and/or poorly tagged by the measured SNPs 4.Many causal variants may have very small effect sizes 5.Overestimation of heritabilities from twin-studies? So what do we miss?

Overview Population stratification Associations: Basics Whole genome associations Genotype imputation Uncertain genotypes New Methods

Intensity of Allele G Intensity of Allele A Genotypes are called with varying uncertainty

Some Genotypes are missing at all …

… but are imputed with different uncertainties

… using Linkage Disequilibrium! Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles. Marker 123n LD D

Conclusion Genotypic markers are always measured or inferred with some degree of uncertainty Association methods should take into account this uncertainty

Two easy ways dealing with uncertain genotypes 1.Genotype Calling: Choose the most likely genotype and continue as if it is true (p 11 =10%, p 12 =20% p 22 =70% => G=2) 2.Mean genotype: Use the weighted average genotype (p 11 =10%, p 12 =20% p 22 =70% => G=1.6)

Overview Associations: Basics Whole genome associations Population stratification Genotype imputation Uncertain genotypes New Methods

1.Improve measurements: - measure more variants (e.g. by UHS) - measure other variants (e.g. CNVs) - measure “molecular phenotypes” 2.Improve models: - proper integration of uncertainties - include interactions - multi-layer models How could our models become more predictive?

Towards a layered Systems Model We need intermediate (molecular) phenotypes to better understand organismal phenotypes

Organisms Data types Conditions Developmental Physiological Environmental Experimental Clinical –Genotypic data –Gene expression data –Metabolomics data –Interaction data –Pathways The challenge of many datasets: How to integrate all the information?

Network Approaches for Integrative Association Analysis Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

Using expression data from Hapmap panel we computed all pairwise interactions for several expression values: The strongest interactions do not occur for SNPs with high (marginal) significance! Bioinformatics Advance Access published online on April 7, 2010 Bioinformatics, doi: /bioinformatics/btq147

Transcription Modules reduce Complexity SB, J Ihmels & N Barkai Physical Review E (2003) /ExpressionView

Association of (average) module expression is often stronger than for any of its constituent genes

Overview Associations: Basics Whole genome associations Population stratification Genotype imputation Uncertain genotypes New Methods

Analysis of genome-wide SNP data reveals that population structure mirrors geography Genome-wide association studies elucidate candidate loci for a multitude of traits, but have little predictive power so far Future improvement will require –better genotyping (CGH, UHS, …) –New analysis approaches (interactions, networks, data integration) Take-home Messages: