University of Colorado at Boulder

Presentation transcript:

Assumptions in using SNPs to estimate trait heritability among ‘unrelated’ individuals
Matthew Keller, Teresa de Candia
University of Colorado at Boulder

Outline
- Overview of estimating genetic variance tagged by SNPs
- How it works: HE-regression example
- How to interpret SNP h2
- Assumptions using the GREML approach
- Assortative mating & GREML/HE biases

Regression estimates of VA
Outcome: product of centered scores (here, z-scores) for each pair of individuals.
The slope of the regression is an estimate of standardized VA (i.e., h2).

Regression estimates of VA
Predictor: average correlation between 2 genotypes across ALL MEASURED SNPs.
The slope of the regression is an estimate of standardized VA (i.e., h2).

Regression estimates of VA
The slope of the regression is an estimate of standardized VA (i.e., h2).

Regression estimates of VA
COV(MZ)
The slope of the regression is an estimate of standardized VA (i.e., h2).

Regression estimates of VA
COV(DZ)
The slope of the regression is an estimate of standardized VA (i.e., h2).

Regression estimates of VA
2*[COV(MZ)-COV(DZ)] = h2 = slope
The slope of the regression is an estimate of standardized VA (i.e., h2).
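The twin formula on this slide follows from reading the slope off two points on the regression line. As a sketch, assume (the slide does not state this) that the x-axis is additive genetic relatedness, equal to 1 for MZ pairs and 1/2 for DZ pairs, and that the scores are z-scores:

```latex
\[
\text{slope}
  = \frac{\mathrm{COV}(MZ) - \mathrm{COV}(DZ)}{1 - \tfrac{1}{2}}
  = 2\,[\mathrm{COV}(MZ) - \mathrm{COV}(DZ)]
  = h^2
\]
```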


Regression estimates of VA_SNP
The slope of the regression is an estimate of standardized VA_SNP (i.e., h2_SNP).
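The regression described on these slides can be sketched in a few lines. This is a minimal, illustrative simulation and not the authors' code: the function names `grm` and `he_regression`, the sample size, the number of SNPs, and the allele-frequency range are all assumptions. Every simulated SNP is treated as causal, so the slope should land near the simulated h2.

```python
import numpy as np

def grm(genotypes):
    """Pairwise pihats from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0                 # sample allele frequencies
    poly = (p > 0) & (p < 1)                         # skip monomorphic SNPs
    z = (genotypes[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    return z @ z.T / poly.sum()

def he_regression(pheno, A):
    """Regress products of phenotype z-scores on pihat, one point per pair."""
    z = (pheno - pheno.mean()) / pheno.std()
    upper = np.triu_indices_from(A, k=1)             # each pair once, no self-pairs
    x, y = A[upper], np.outer(z, z)[upper]
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))    # ordinary least-squares slope

# Hypothetical simulation settings (assumptions, not taken from the slides).
rng = np.random.default_rng(1)
n, m, h2_true = 1000, 500, 0.5
freqs = rng.uniform(0.1, 0.9, size=m)
geno = rng.binomial(2, freqs, size=(n, m)).astype(float)

A = grm(geno)
std_geno = (geno - geno.mean(axis=0)) / geno.std(axis=0)
g = std_geno @ rng.normal(0.0, 1.0, size=m)          # additive genetic values
e = rng.normal(0.0, np.sqrt(g.var() * (1 - h2_true) / h2_true), size=n)
h2_est = he_regression(g + e, A)
print(f"simulated h2 = {h2_true}, HE-regression slope = {h2_est:.2f}")
```

With unrelated individuals the pihats vary very little, so the slope is noisy; that is why large samples are needed for this kind of estimate.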

But how to interpret this “SNP” h2 (h2snp)?
What we use: the average correlation between 2 genotypes across ALL (k=1…m) MEASURED SNPs.
To estimate total narrow-sense h2, we’d need the average correlation between 2 genotypes across ALL (k=1…q) CAUSAL VARIANTS.
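The two averages can be written out explicitly. The form below is the commonly used genomic-relationship estimator and is an assumption here, since the slide text does not spell the formula out. The symbols are introduced for illustration: x_{ik} is individual i's allele count (0, 1, or 2) at variant k, and p_k is that variant's allele frequency.

```latex
\[
\hat{\pi}_{ij}^{\mathrm{SNP}}
  = \frac{1}{m}\sum_{k=1}^{m}\frac{(x_{ik}-2p_k)(x_{jk}-2p_k)}{2p_k(1-p_k)},
\qquad
\pi_{ij}^{\mathrm{CV}}
  = \frac{1}{q}\sum_{k=1}^{q}\frac{(x_{ik}-2p_k)(x_{jk}-2p_k)}{2p_k(1-p_k)}
\]
```

The first sum runs over the m measured SNPs and is what the regression actually uses; the second runs over the q causal variants, which are typically unobserved.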

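But how to interpret this “SNP” h2 (h2snp)?
h2snp underestimates h2 to the degree that the average correlation across causal variants isn’t predicted well by the average correlation across measured SNPs.
WHEN is relatedness at causal variants not well predicted by relatedness at measured SNPs?
This is actually the usual case, and it is not a bad thing!
It happens to the degree that (typically unmeasured) causal variants aren’t well tagged by (in high LD with) measured SNPs, e.g., rare causal variants.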

Interpreting h2 estimated from SNPs (h2snp)
If close relatives are included (e.g., sibs), h2snp ≅ h2 estimated from a family-based method. Thus, interpret h2snp as you would h2 from an AE model (including all biases!).
If you use only distant ‘relatives’ (the usual approach; e.g., pihat < .05), h2snp = proportion of VP due to VA captured by common SNPs.
- It is the upper bound of the % of VP that GWAS can detect.
- It gives an idea of the aggregate importance of causal variants tagged by SNPs (mostly common ones, because rare ones are poorly tagged by common SNPs).
- By not using relatives who also share environmental effects: (a) the VA estimate is 'uncontaminated' by VC & VNA; (b) it does not rely on assumptions required in family studies (e.g., r(MZ) > r(DZ) for only genetic reasons).
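Restricting the sample to distant 'relatives' amounts to dropping individuals until no remaining pair exceeds the pihat cutoff. The sketch below is one simple greedy way to do that; the function name `prune_relatives` and the toy matrix are hypothetical, and this is not claimed to be the procedure used by any particular software package.

```python
import numpy as np

def prune_relatives(A, cutoff=0.05):
    """Return indices of individuals kept so that no retained pair has pihat > cutoff."""
    related = np.asarray(A, dtype=float) > cutoff    # new boolean matrix
    np.fill_diagonal(related, False)                 # ignore self-relatedness
    keep = np.ones(len(related), dtype=bool)
    while True:
        # Count, for each retained person, how many retained relatives remain.
        counts = (related & keep[None, :] & keep[:, None]).sum(axis=1)
        if counts.max() == 0:
            break
        keep[counts.argmax()] = False                # drop the most-connected person
    return np.flatnonzero(keep)

# Toy pihat matrix: persons 0 and 1 look like sibs, 1 and 2 like half-sibs.
A = np.array([
    [1.00, 0.50, 0.01, 0.00],
    [0.50, 1.00, 0.25, 0.02],
    [0.01, 0.25, 1.00, 0.01],
    [0.00, 0.02, 0.01, 1.00],
])
kept = prune_relatives(A, cutoff=0.05)
print(kept)  # person 1 is dropped because they are related to two others
```

Dropping the most-connected individual first tends to retain more of the sample than dropping an arbitrary member of each related pair.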

Big picture: using SNPs to estimate h2
It is an independent approach to estimating heritability.
- Its assumptions are different from those of family-based models.
- It takes increasingly tortuous reasoning to suggest trait X isn’t heritable because of methodological flaws in estimating h2.
It provides a downwardly biased estimate of h2, and this is good!
- h2snp helps elucidate the genetic architecture of complex traits.
- The “still missing” h2 (h2family – h2snp) provides insight into the importance of rare variants, non-additive genetics, or over-estimation of h2twin.
But it is not a panacea! Many issues still need to be worked out.
- As with twin designs, we can be misled under certain scenarios (e.g., assortative mating, maternal effects, etc.).
- This is an active area of research.

Unimportant assumptions in estimating h2snp
See the excellent paper by Speed et al. (2012), who investigated the following assumptions and found that violating them had little effect on estimates:
- Genetic and error effects are normally distributed.
- Effect sizes of causal variants are proportional to 1/(2pq) (low-MAF variants have bigger effects).
- All SNPs have an association with the phenotype.

Important assumptions in estimating h2snp
- There is no correlation between environmental similarity and SNP_pihats.
  - A common way this assumption is violated is in stratified samples: e.g., estimating h2snp of diabetes in a sample that includes Native Americans and Caucasians would give a serious inflation in h2snp. This can be dealt with by controlling for principal components of the GRM.
  - Another way: cases and controls genotyped separately.
- The average LD between causal variants (CVs) & SNPs is the same as the average LD between SNPs and other SNPs. If E[r(CV,SNP)] > E[r(SNP,SNP)], you overestimate h2snp (and vice versa).
- The average LD between CVs is the same as between SNPs. If E[r(CV,CV)] > E[r(SNP,SNP)], you overestimate h2snp (and vice versa).
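Controlling for principal components of the GRM can be illustrated with a small simulation. This is a hedged sketch under made-up settings: two hypothetical subpopulations with different allele frequencies and a purely environmental difference in the phenotype mean. The function name `residualize_on_pcs` is an assumption, and residualizing the phenotype is used here only as a simple stand-in for entering the PCs as covariates.

```python
import numpy as np

def grm(genotypes):
    """Pairwise pihats from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0
    poly = (p > 0) & (p < 1)                     # skip monomorphic SNPs
    z = (genotypes[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    return z @ z.T / poly.sum()

def residualize_on_pcs(pheno, A, n_pcs=2):
    """Remove the part of the phenotype explained by the top PCs of the GRM."""
    _, vecs = np.linalg.eigh(A)                  # eigenvalues are returned ascending
    pcs = vecs[:, -n_pcs:]                       # so the top PCs are the last columns
    X = np.column_stack([np.ones(len(pheno)), pcs])
    beta, *_ = np.linalg.lstsq(X, pheno, rcond=None)
    return pheno - X @ beta

# Hypothetical stratified sample (all settings are assumptions).
rng = np.random.default_rng(7)
n_per_group, m = 200, 1000
f1 = rng.uniform(0.2, 0.8, size=m)
f2 = np.clip(f1 + rng.normal(0.0, 0.2, size=m), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, f1, size=(n_per_group, m)),
                  rng.binomial(2, f2, size=(n_per_group, m))]).astype(float)
group = np.repeat([0.0, 1.0], n_per_group)

# The phenotype has no genetic component: only a group (environmental) shift.
pheno = 2.0 * group + rng.normal(size=2 * n_per_group)
resid = residualize_on_pcs(pheno, grm(geno))

r_before = np.corrcoef(pheno, group)[0, 1]
r_after = np.corrcoef(resid, group)[0, 1]
print(f"r(phenotype, group): before = {r_before:.2f}, after = {r_after:.2f}")
```

Because the leading PC of the GRM tracks subpopulation membership, removing it strips out most of the group difference that would otherwise be mistaken for a SNP effect.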


Assortative mating & estimating h2snp
As discussed Thursday, AM biases estimates of VA down and VC up in twin studies. What kind of effect does it have on SNP-heritability estimates?
- Two years ago at this workshop, there was an interesting debate (and bet; no one won) about the effects AM would have on GCTA estimates.
- AM leads to very long-range LD between CVs themselves, but not between other parts of the genome (SNPs that don’t tag CVs).
After much work by de Candia, Eaves, Carey, Evans, and myself: AM leads to an upward bias in estimates of equilibrium h2snp.
- This occurs because AM creates covariances between CVs, and these are correctly reflected in phenotypic covariances between individuals but poorly reflected in the pihat matrix.
- Thus, the variance of pihats is underestimated. Underestimating the variance of a predictor leads to overestimates of the coefficients associated with that predictor.
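The last step is the usual property of a regression slope. As a simplified sketch, write x for pihat and y for the product of phenotype z-scores, and let c (a symbol introduced here for illustration) be the factor by which the pihat variance understates the variance of relatedness at causal variants, holding the covariance fixed:

```latex
\[
\hat{\beta} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)},
\qquad
\frac{\mathrm{Cov}(x, y)}{c\,\mathrm{Var}(x)} = \frac{\hat{\beta}}{c} > \hat{\beta}
\quad \text{for } 0 < c < 1 .
\]
```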

Background and Previous Results
- AM increases genetic variance in the population by creating correlations among trait-increasing alleles within individuals.
- Previously we had simulated very simplistic "genomes" of multivariate normally distributed causal variants.
- We analytically predicted and observed that, under AM, HE regression produces overestimates of equilibrium genetic variance.
- We also observed, but did not analytically prove, that REML estimates are unstable: they decrease with increasing sample size and increase with increasing number of markers.

Current Results - Simulations
Simulated populations over 20 generations of mating under varying levels of AM for a single trait using GeneEvolve (Rasool Tahmasbi).
- No. populations: 3
- CVs: 1000
- Heritability: 0.5
- Relative pruning: >.05
- Spousal phenotypic correlation: .4

Current Results - Simulations

Current Results – Height, Raw Phenotypic Variance: 85