1
Slides of this talk: google “Alkes HSPH”
Impact of negative selection on common variant disease architectures
Alkes L. Price
Harvard School of Public Health
October 19, 2018
2
What is negative selection?
Negative selection is the selective pressure that lowers the allele frequencies of mutations that reduce fitness.
[Figure: allele frequency vs. time (two panels)]
Kryukov et al Am J Hum Genet, Kiezun et al PLoS Genet
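As a minimal illustration of this pressure, the sketch below assumes a simple deterministic haploid model (our assumption, much simpler than the forward simulations used later in the talk): an allele with relative fitness 1 + s changes frequency as p' = p(1 + s)/(1 + ps), with no drift and no mutation. The function names are hypothetical.

```python
def next_freq(p: float, s: float) -> float:
    """One generation of haploid selection for an allele with relative fitness 1 + s."""
    mean_fitness = 1.0 + p * s
    return p * (1.0 + s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Allele frequency over time under deterministic selection (no drift, no mutation)."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s))
    return freqs


# A deleterious allele (s < 0) declines in frequency; a neutral allele (s = 0) stays put.
deleterious = trajectory(p0=0.10, s=-0.01, generations=500)
neutral = trajectory(p0=0.10, s=0.0, generations=500)
print(f"s = -0.01: {deleterious[0]:.3f} -> {deleterious[-1]:.5f}")
print(f"s =  0.00: {neutral[0]:.3f} -> {neutral[-1]:.5f}")
```

Under this model the odds p/(1 − p) are multiplied by (1 + s) every generation, which is why even weak selection drives deleterious alleles to low frequency over many generations.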
3
Negative selection: causal effect on trait is larger for rare and low-frequency variants
Let p be minor allele frequency (MAF). Let β be per-allele causal effect on disease/trait. Let $h^2 = 2p(1-p)\beta^2$ be variance explained by a SNP.
Selection (α model): $\mathrm{Var}(\beta) \propto [p(1-p)]^{\alpha}$, so $E(h^2) \propto [p(1-p)]^{1+\alpha}$; rare variants explain more h2 if α < 0 (negative selection) than if α = 0 (no selection).
Speed et al Am J Hum Genet; also see Schoech et al. biorxiv 09/13/17 (analytical derivations support validity of α model for p above a threshold)
4
Negative selection: causal effect on trait is larger for rare and low-frequency variants
Let p be minor allele frequency (MAF). Let β be per-allele causal effect on disease/trait. Let $h^2 = 2p(1-p)\beta^2$ be variance explained by a SNP.
Selection (α model): $\mathrm{Var}(\beta) \propto [p(1-p)]^{\alpha}$, so $E(h^2) \propto [p(1-p)]^{1+\alpha}$; rare variants explain more h2 if α < 0 (negative selection).
α = −0.38 across 25 UK Biobank traits (rare variants: larger causal effects but smaller per-SNP h2 vs. common variants)
Schoech et al. biorxiv 09/13/17; also see Zeng et al Nat Genet
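The α model can be checked numerically. This sketch drops proportionality constants and compares a hypothetical rare SNP (MAF 1%) with a common SNP (MAF 50%) at α = 0 and at α = −0.38; the two MAF values are illustrative choices, not taken from the cited studies.

```python
def var_beta(p: float, alpha: float) -> float:
    """Relative variance of the per-allele causal effect under the alpha model: [p(1-p)]^alpha."""
    return (p * (1.0 - p)) ** alpha


def expected_h2(p: float, alpha: float) -> float:
    """Relative expected per-SNP h2 = 2p(1-p) * Var(beta), proportional to [p(1-p)]^(1+alpha)."""
    return 2.0 * p * (1.0 - p) * var_beta(p, alpha)


rare, common = 0.01, 0.50  # illustrative MAFs
for alpha in (0.0, -0.38):
    beta_ratio = var_beta(rare, alpha) / var_beta(common, alpha)
    h2_ratio = expected_h2(rare, alpha) / expected_h2(common, alpha)
    print(f"alpha = {alpha:+.2f}: Var(beta) rare/common = {beta_ratio:.2f}, "
          f"E(h2) rare/common = {h2_ratio:.3f}")
```

With α = −0.38 the rare SNP has a larger causal-effect variance than the common SNP but still a smaller expected per-SNP h2, matching the pattern described on the slide; with α = 0 the rare SNP's per-SNP h2 is smaller still.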
5
Rare variants explain limited trait heritability, despite larger causal effects + many rare variants
[Figure: labels “MAF = 5%” and “α = 0 (no selection)”] Schoech et al. biorxiv 09/13/17; also see Zeng et al Nat Genet
6
Beyond the α model: does negative selection impact common variant disease architectures?
[Figure: labels “MAF = 5%” and “α = 0 (no selection)”] Schoech et al. biorxiv 09/13/17; also see Zeng et al Nat Genet
7
Outline
1. LD-dependent architectures
2. Functional architectures
3. Polygenicity
9
What does “LD-dependent architecture” mean?
10
What does “LD-dependent architecture” mean?
• SNPs with higher LD have higher average χ2 association statistics due to increased tagging of causal variants. Pritchard & Przeworski 2001 Am J Hum Genet
11
What does “LD-dependent architecture” mean?
• SNPs with higher LD have higher average χ2 association statistics due to increased tagging of causal variants. “LD-dependent architecture”: dependence of causal effect sizes on the level of LD of a SNP. Speed et al Am J Hum Genet
12
What does “LD-dependent architecture” mean?
• Common SNPs have higher LD and higher causal variance than rare SNPs => SNPs with higher LD have higher causal variance. Schoech et al. biorxiv 09/13/17; also see Zeng et al Nat Genet
13
What does “LD-dependent architecture” mean?
• Common SNPs have higher LD and higher causal variance than rare SNPs => SNPs with higher LD have higher causal variance. “LD-dependent architecture”: dependence of causal effect sizes on the level of LD of a SNP, after conditioning on MAF.
14
Inferring LD-dependent architectures from summary statistics using S-LDSC
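Extend S-LDSC (Finucane et al Nat Genet) to continuous annotations q: $E(\chi^2_m) = 1 + N \sum_q \tau_q \, \mathrm{LDscore}_q(m)$, where $\mathrm{LDscore}_q(m) = \sum_{m'} a_{m',q} \, r^2_{m m'}$
• $\tau^*_q = M \, \mathrm{sd}(a_q) \, \tau_q / h^2$ = normalized conditional effect of annotation q (proportionate change in per-SNP h2 per 1 s.d. increase in annotation q)
• $a_{m,q}$ = value of annotation q at SNP m
• $\tau_q$ = conditional effect of annotation q
• $h^2$ = genome-wide trait heritability; M = number of SNPs
Gazal et al Nat Genet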
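A minimal sketch of the continuous-annotation extension, assuming $\mathrm{LDscore}_q(m) = \sum_{m'} a_{m',q} r^2_{mm'}$ and $\tau^*_q = M \, \mathrm{sd}(a_q) \, \tau_q / h^2$, with h² taken as the sum of per-SNP heritabilities. It uses noiseless toy χ2 values and unweighted least squares with the intercept fixed at 1; real S-LDSC adds regression weights, a fitted intercept, a much larger reference panel, and block-jackknife standard errors. All data and function names here are hypothetical.

```python
import numpy as np


def ld_scores(r2: np.ndarray, annot: np.ndarray) -> np.ndarray:
    """LDscore_q(m) = sum over m' of annot[m', q] * r2[m, m'], returned as an M x Q matrix."""
    return r2 @ annot


def fit_tau(chi2: np.ndarray, ldsc: np.ndarray, n: int) -> np.ndarray:
    """Unweighted least squares for E(chi2) = 1 + N * sum_q tau_q * LDscore_q (intercept fixed at 1)."""
    tau, *_ = np.linalg.lstsq(n * ldsc, chi2 - 1.0, rcond=None)
    return tau


def tau_star(tau: np.ndarray, annot: np.ndarray, h2: float) -> np.ndarray:
    """Normalized effect tau*_q = M * sd(a_q) * tau_q / h2 (zero for a constant annotation)."""
    m = annot.shape[0]
    return m * annot.std(axis=0) * tau / h2


# Toy example: 200 SNPs, LD estimated from a simulated reference panel of 500 samples.
rng = np.random.default_rng(0)
M, N = 200, 50_000
geno = rng.normal(size=(500, M))
r2 = np.corrcoef(geno, rowvar=False) ** 2
annot = np.column_stack([np.ones(M), rng.normal(size=M)])  # base + one continuous annotation
true_tau = np.array([1e-6, -2e-7])

ldsc = ld_scores(r2, annot)
chi2 = 1.0 + N * ldsc @ true_tau  # noiseless expected chi2 statistics
tau = fit_tau(chi2, ldsc, N)
h2 = float((annot @ tau).sum())   # sum of per-SNP heritabilities under the fitted model
print("tau:", tau)
print("tau* of continuous annotation:", tau_star(tau, annot, h2)[1])
```

Because the toy χ2 values are noiseless, the fit recovers the simulated τ exactly; with real summary statistics the same design matrix is used but estimates carry sampling noise.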
15
Inferring LD-dependent architectures using continuous LLD annotation
Level of LD (LLD): MAF-adjusted LD score (MAF-stratified quantile normalization); $\mathrm{LDscore}_{\mathrm{LLD}}(m) = \sum_{m'} \mathrm{LLD}_{m'} \, r^2_{m m'}$ for the continuous LLD annotation
• Include “baseline model” annotations (Finucane et al Nat Genet)
• Also include binary annotations for 10 common SNP MAF bins
• Simulations confirm robust results (not shown)
Gazal et al Nat Genet
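One way to implement MAF-stratified quantile normalization is sketched below: within each MAF bin, SNPs are ranked by LD score and the ranks are mapped to standard normal quantiles, so LLD compares a SNP's LD only with SNPs of similar MAF. The function name `lld`, the bin edges, and the (rank + 0.5)/n quantile rule are assumptions for illustration; the exact procedure in Gazal et al may differ (e.g., in tie handling or bin boundaries).

```python
from statistics import NormalDist


def lld(ld_scores: list[float], mafs: list[float], bin_edges: list[float]) -> list[float]:
    """Hypothetical LLD: rank-based normal quantile normalization of LD scores within MAF bins."""
    out = [0.0] * len(ld_scores)
    bins: dict[int, list[int]] = {}
    for i, maf in enumerate(mafs):
        # index of the MAF bin [edge_k, edge_{k+1}) containing this SNP
        k = sum(maf >= e for e in bin_edges) - 1
        bins.setdefault(k, []).append(i)
    std_normal = NormalDist()
    for idx in bins.values():
        order = sorted(idx, key=lambda i: ld_scores[i])
        n = len(order)
        for rank, i in enumerate(order):
            # (rank + 0.5) / n keeps quantiles strictly inside (0, 1)
            out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


mafs = [0.06, 0.07, 0.08, 0.30, 0.35, 0.40]
ldsc = [10.0, 50.0, 30.0, 80.0, 20.0, 60.0]
edges = [0.05, 0.20, 0.50]
print([round(x, 2) for x in lld(ldsc, mafs, edges)])
```

Within each bin the SNP with the highest LD score gets the highest LLD, regardless of how its raw LD score compares with SNPs in other MAF bins.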
16
SNPs with lower MAF-adjusted level of LD (LLD) have larger causal effect sizes
Same sign of effect across all 56 traits (average N=101K)
17
Many annotations correlated to LD could contribute to LD-dependent architectures
LD-related annotations:
• Predicted allele age (ARGweaver; Rasmussen et al PLoS Genet)
• LLD in Africans (LLD-AFR)
• Recombination rate (±10kb window; Hussin et al Nat Genet)
• GC-content (±1Mb window; Loh et al. 2015b Nat Genet)
• Replication timing (Koren et al Am J Hum Genet)
• Background selection (1 − B statistic; McVicker et al PLoS Genet)
• Nucleotide diversity (SNPs per kb; ±10kb window)
• CpG content (±50kb window)
Functional annotations (Finucane et al Nat Genet):
• Coding, regulatory, conserved, etc.
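Several LD-related annotations are window-based. As an illustration, nucleotide diversity (SNPs per kb in a ±10kb window) could be computed as below; the function name is hypothetical, the count includes the focal SNP, and chromosome ends are not handled, so this is a sketch rather than the annotation used in the paper.

```python
from bisect import bisect_left, bisect_right


def snps_per_kb(positions: list[int], window_bp: int = 10_000) -> list[float]:
    """For each SNP, number of SNPs within +/- window_bp (focal SNP included) per kb of window."""
    pos = sorted(positions)
    width_kb = 2 * window_bp / 1000.0
    density = []
    for x in positions:
        lo = bisect_left(pos, x - window_bp)   # first SNP at or after x - window
        hi = bisect_right(pos, x + window_bp)  # first SNP after x + window
        density.append((hi - lo) / width_kb)
    return density


print(snps_per_kb([1_000, 5_000, 12_000, 30_000, 100_000]))
```

Sorting once and using binary search keeps the cost at O(n log n) for n SNPs, instead of comparing every pair of positions.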
18
Many annotations correlated to LD could contribute to LD-dependent architectures
[Figure: LD-related annotations vs. functional annotations from “baseline model” (Finucane et al Nat Genet)]
21
Many LD-related annotations impact causal effect sizes
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
Meta-analysis of 31 independent traits
22
Many LD-related annotations impact causal effect sizes
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
Recombination rate has discordant sign of effect (Hill & Robertson 1966 Genet Res). Heritability is enriched in SNPs with low LLD in low recombination rate regions. r = −0.63
Meta-analysis of 31 independent traits
23
Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
Meta-analysis of 31 independent traits
24
Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
LLD effect is 0.37x smaller when including annotations from baseline model: some, but not all, of the LD-dependent architecture is due to DHS, enhancers, etc.
Meta-analysis of 31 independent traits
25
Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
LLD effect is 0.51x smaller after adding baseline model. Predicted allele age has largest effect.
Meta-analysis of 31 independent traits
26
Many LD-related annotations impact causal effect sizes in joint fit with baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF (baseline-LD model)]
Meta-analysis of 31 independent traits
27
Many LD-related annotations impact causal effect sizes in joint fit with baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF (baseline-LD model)]
LLD effect is 0.51x smaller after adding baseline model; 6 significant annotations in joint fit
Meta-analysis of 31 independent traits
28
Many LD-related annotations impact causal effect sizes in joint fit with baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF (baseline-LD model)]
LLD effect is 0.51x smaller after adding baseline model; predicted allele age has largest effect
Meta-analysis of 31 independent traits
29
Forward simulations show that negative selection explains LD-dependent architectures
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF; Forward simulations: impact on s. Panels: 31 traits; Simulations]
• Forward simulations using SLiM (Messer 2013 Genetics) under African-European demographic model (Gravel et al PNAS)
• Jointly regress selection coefficient s on 4 LD-related annotations and minor allele frequency
30
Quintiles illustrate large effects of LD-related annotations from baseline-LD model
[Figure: proportion of heritability (0% to 40%) by annotation quintile]
Youngest 20% of SNPs (by predicted allele age) explain 3.8x more heritability than oldest 20%, vs. 1.8x for MAF
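The quintile comparison can be reproduced from per-SNP heritabilities, as sketched below. In practice per-quintile heritability is estimated with S-LDSC rather than known per SNP; the function name, allele ages, and h2 values here are hypothetical toy numbers.

```python
def quintile_h2_shares(annot: list[float], h2: list[float]) -> list[float]:
    """Share of total h2 carried by each annotation quintile, from lowest to highest value."""
    order = sorted(range(len(annot)), key=lambda i: annot[i])
    n = len(order)
    total = sum(h2)
    shares = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        shares.append(sum(h2[i] for i in idx) / total)
    return shares


ages = list(range(1, 11))          # hypothetical predicted allele ages (youngest first)
h2_toy = [11 - a for a in ages]    # hypothetical per-SNP h2, larger for younger SNPs
shares = quintile_h2_shares(ages, h2_toy)
print([round(s, 3) for s in shares])
print(f"youngest/oldest quintile h2 ratio = {shares[0] / shares[-1]:.2f}")
```

The ratio between the first and last quintile is the summary statistic quoted on the slide (3.8x for predicted allele age vs. 1.8x for MAF in the real data).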
31
Quintiles illustrate large effects of TMRCA annotation inferred using ASMC
• ASMCavg annotation: average TMRCA inferred by Ascertained Sequentially Markovian Coalescent (ASMC) in GoNL WGS data
• Jointly statistically significant with other LD-related annotations (τ* = −0.25 ± 0.01)
[Figure: proportion of heritability by ASMCavg quintile] Low-TMRCA 20% explain 3.8x more heritability than high-TMRCA 20%
Palamara et al Nat Genet
32
LD-dependent architectures can lead to bias in estimates of heritability and functional enrichment
Modeling LD-dependent architectures is critically important. Speed et al Am J Hum Genet, Gusev et al PLoS Genet, Yang et al Nat Genet, Speed et al Nat Genet, Gazal et al Nat Genet
33
How well does the baseline-LD model fit the data?
34
How well does the baseline-LD model fit the data?
Idea (Speed et al Nat Genet): use out-of-sample likelihoods for formal model comparisons.
Speed et al Nat Genet: LDAK model > infinitesimal model (“GCTA model”) in analysis of 1000G SNPs
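Model comparison by likelihood reduces to evaluating the same phenotype vector under the covariance matrices implied by each model. The sketch below assumes a zero-mean Gaussian phenotype with covariance h2·K + (1 − h2)·I and uses two hypothetical per-SNP weighting schemes as stand-ins for competing models; it does not reproduce the LDAK, GCTA, or baseline-LD weightings, nor the REML fitting and held-out design of Speed et al. All names and data are hypothetical.

```python
import numpy as np


def gaussian_loglik(y: np.ndarray, cov: np.ndarray) -> float:
    """Log density of y under a zero-mean multivariate normal with covariance cov."""
    n = y.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = float(y @ np.linalg.solve(cov, y))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def model_cov(k: np.ndarray, h2: float) -> np.ndarray:
    """Covariance of a standardized phenotype under one kinship matrix: h2 * K + (1 - h2) * I."""
    return h2 * k + (1.0 - h2) * np.eye(k.shape[0])


def kinship(geno: np.ndarray, snp_weights: np.ndarray) -> np.ndarray:
    """Weighted kinship X W X^T / sum(w); the weights encode a model's per-SNP variance assumptions."""
    return (geno * snp_weights) @ geno.T / snp_weights.sum()


# Hypothetical samples and two hypothetical per-SNP weighting schemes.
rng = np.random.default_rng(1)
n_samples, n_snps, h2 = 300, 1000, 0.5
geno = rng.normal(size=(n_samples, n_snps))
k_a = kinship(geno, np.ones(n_snps))                     # model A: equal per-SNP variance
k_b = kinship(geno, rng.uniform(0.1, 2.0, size=n_snps))  # model B: unequal per-SNP variance
y = rng.multivariate_normal(np.zeros(n_samples), model_cov(k_a, h2))  # simulated under A

delta = gaussian_loglik(y, model_cov(k_a, h2)) - gaussian_loglik(y, model_cov(k_b, h2))
print(f"log likelihood (model A) - log likelihood (model B) = {delta:.2f}")
```

`slogdet` and `solve` are used instead of an explicit determinant and inverse to avoid overflow and loss of precision for large sample sizes.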
35
2.8M 1000G SNPs: LDAK model > GCTA model
y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits); “GCTA” = infinitesimal model
Panels: 2.8M SNPs from 1000G; 4.6M SNPs from HRC
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18
36
4.6M HRC SNPs: GCTA model (>) LDAK model
y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits); “GCTA” = infinitesimal model
Panels: 2.8M SNPs from 1000G; 4.6M SNPs from HRC
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18
37
baseline-LD > LDAK and GCTA in both SNP sets
y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits)
• “GCTA” = infinitesimal model (also see Yang et al Nat Genet; LDMS model)
• “Gazal-LD” = LD + MAF annotations only from baseline-LD model
• “baseline-LD+LDAK” = model with baseline-LD + LDAK annotations
Panels: 2.8M SNPs from 1000G; 4.6M SNPs from HRC
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18
38
Outline
1. LD-dependent architectures
2. Functional architectures
3. Polygenicity
(image from Shlyueva et al Nat Rev Genet)
39
Common variant functional architectures: coding + regulatory (tissue-specific)
Coding variants explain ~10% of common variant h2; regulatory variant enrichments are often tissue/cell-type-specific. Finucane et al Nat Genet; also see Finucane et al Nat Genet
40
Low-frequency variant functional architectures: ??? + ???
Coding variants explain ??? of low-frequency variant h2; regulatory variants explain ??? of low-frequency variant h2.
Coding variants likely important for low-frequency variant architectures: UK10K 2015 Nature, Astle et al Cell, Marouli et al Nature
41
Inferring low-frequency variant functional architectures by extending S-LDSC
Multi-linear regression: $E(\chi^2) = 1 + \sum_q (N\tau_q) \, \mathrm{LDscore}_q$
• Separate annotations for common and low-frequency SNPs
• Also include binary annotations for 5 low-frequency MAF bins
• UK Biobank target samples + UK10K LD reference samples
• Simulations confirm robust results (not shown)
ASHG 2018 poster 2699/F Gazal + Gazal et al Nat Genet
42
Inferring low-frequency variant functional architectures by extending S-LDSC
Common variant enrichment (CVE) of an annotation = (proportion of $h^2_c$) / (proportion of common SNPs)
Low-frequency variant enrichment (LFVE) of an annotation = (proportion of $h^2_{lf}$) / (proportion of low-frequency SNPs)
• Separate annotations for common and low-frequency SNPs
• Also include binary annotations for 10 low-frequency MAF bins
• UK Biobank target samples + UK10K LD reference samples
• Simulations confirm robust results (not shown)
ASHG 2018 poster 2699/F Gazal + Gazal et al Nat Genet
43
LFVE is correlated with CVE; LFVE > CVE when CVE is large
33 main annotations: r(LFVE, CVE) = 0.79. Meta-analysis across 40 UK Biobank traits (average N = 363K); association method: BOLT-LMM (Loh et al Nat Genet)
[Figure: low-frequency variant enrichment (LFVE) vs. common variant enrichment (CVE)]
44
LFVE is correlated with CVE; LFVE > CVE when CVE is large
33 main annotations: r(LFVE, CVE) = 0.79
Non-synonymous variants: 17.3% of $h^2_{lf}$ vs. 2.1% of $h^2_c$ (even larger LFVE for non-synonymous variants • predicted as damaging: PolyPhen-2 • in genes under strong selection: $s_{het}$)
[Figure: low-frequency variant enrichment (LFVE) vs. common variant enrichment (CVE)]
45
LFVE ≈ CVE for most regulatory annotations but LFVE > CVE for brain annotations
637 cell-type-specific (CTS) annotation-trait pairs with significant CVE (Finucane et al Nat Genet); 55 brain annotation-trait pairs with LFVE/CVE > 2x
[Figure: low-frequency variant enrichment (LFVE) vs. common variant enrichment (CVE)]
46
LFVE ≈ CVE for most regulatory annotations but LFVE > CVE for brain annotations
H3K4me3 in brain DPFC-Neuroticism: 56.9% of $h^2_{lf}$ vs. 11.7% of $h^2_c$ (P = )
[Figure: low-frequency variant enrichment (LFVE) vs. common variant enrichment (CVE)]
47
LFVE/CVE ratio depends primarily on strength of selection
$s_{dn}$ = average selection coefficient of deleterious de novo variants; π = probability that a de novo variant is causal for the trait
Forward simulations (SLiM2 + Eyre-Walker τ model) [Figure: LFVE/CVE ratio vs. proportion of causal variants (π)]
• Non-synonymous variants: LFVE/CVE = 5x, $s_{dn}$ = −0.003
• 55 brain annotation-trait pairs: LFVE/CVE > 2x, $s_{dn}$ < −0.0006 (potentially useful for WGS)
48
Outline
1. LD-dependent architectures
2. Functional architectures
3. Polygenicity
(image from Evangelou et al Nat Genet)
49
Complex traits are extremely polygenic
Systolic blood pressure: GWAS of 1 million people identifies 901 genome-wide significant loci explaining 5.7% of trait variance (vs. total SNP-heritability = 21%) Evangelou et al Nat Genet; also see Purcell et al Nature, Yang et al Nat Genet, Stahl et al Nat Genet, PGC-SCZ 2014 Nature, Loh et al. 2015b Nat Genet, Zhang et al Nat Genet
50
Omnigenic model: polygenicity arises from extraordinary biological complexity
Boyle et al Cell; also see Wray et al Cell, Liu et al. biorxiv 09/24/18
51
Flattening hypothesis: polygenicity arises because negative selection flattens common variant effect sizes (common variant effect sizes are always small). ASHG 2018 poster 3528/W O’Connor + O’Connor et al. biorxiv 09/18/18
52
New definition of polygenicity: effective number of associated SNPs (Ma)
No LD between causal SNPs: (M = #SNPs, β = normalized effect size)
53
New definition of polygenicity: effective number of associated SNPs (Ma)
No LD between causal SNPs: (M = #SNPs, β = normalized effect size)
• If SNP effects follow a normal distribution: Ma = number of SNPs
• If SNP effects follow a point-normal distribution: Ma = number of causal SNPs
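The formula for Ma did not survive in this transcript. A definition consistent with both properties stated above is $M_a = 3M\,E(\beta^2)^2 / E(\beta^4)$; we assume it here as our reconstruction, to be checked against O’Connor et al. For normally distributed β, $E(\beta^4) = 3E(\beta^2)^2$, giving $M_a = M$; for a point-normal model with causal fraction π and causal variance σ², $E(\beta^2) = \pi\sigma^2$ and $E(\beta^4) = 3\pi\sigma^4$, giving $M_a = \pi M$. The sketch checks both cases; function names are hypothetical.

```python
import random


def ma_from_moments(m: int, e_beta2: float, e_beta4: float) -> float:
    """Effective number of associated SNPs, assuming Ma = 3 * M * E[beta^2]^2 / E[beta^4]."""
    return 3.0 * m * e_beta2 ** 2 / e_beta4


def ma_from_effects(betas: list[float]) -> float:
    """Plug-in estimate from normalized per-SNP effect sizes (assumes no LD between SNPs)."""
    m = len(betas)
    e2 = sum(b * b for b in betas) / m
    e4 = sum(b ** 4 for b in betas) / m
    return ma_from_moments(m, e2, e4)


# Point-normal: a fraction pi of SNPs are causal with effects ~ N(0, sigma2); Ma = pi * M.
M, pi, sigma2 = 100_000, 0.01, 1.0
ma_point_normal = ma_from_moments(M, pi * sigma2, 3 * pi * sigma2 ** 2)
print(ma_point_normal)

# Normal (every SNP causal): the plug-in estimate should be close to M.
rng = random.Random(0)
normal_betas = [rng.gauss(0.0, 1.0) for _ in range(M)]
ma_normal = ma_from_effects(normal_betas)
print(ma_normal)
```

The factor 3 is the fourth moment of a standard normal, so Ma counts SNPs relative to a fully infinitesimal (normal) architecture; heavier-tailed effect distributions give smaller Ma.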
54
New definition of polygenicity: effective number of associated SNPs (Ma)
No LD between causal SNPs: (M = #SNPs, β = normalized effect size)
• If SNP effects follow a normal distribution: Ma = number of SNPs
• If SNP effects follow a point-normal distribution: Ma = number of causal SNPs
Estimates of the number of causal SNPs under a point-normal model are sample size dependent (Zhang et al Nat Genet, O’Connor et al. biorxiv 09/18/18)
55
New method to estimate Ma: Stratified LD 4th moments regression (S-LD4M)
S-LDSC (Finucane et al Nat Genet):
• Regress χ2 statistics on stratified LD scores ($\sum r^2$)
• Include baseline-LD model annotations (Gazal et al Nat Genet)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared χ2 statistics on stratified LD 4th moments ($\sum r^4$)
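The two regressions differ in the LD summary used as the regressor. This sketch computes both from a signed LD (r) matrix and a SNP-by-annotation matrix; it assumes in-sample LD and omits the regression itself, the weights, the baseline-LD annotations, and the fourth-moment modeling used by S-LD4M. The function name and toy matrices are hypothetical.

```python
import numpy as np


def stratified_ld_moments(r: np.ndarray, annot: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (sum_k a[k,q] r_jk^2, sum_k a[k,q] r_jk^4) for every SNP j and annotation q."""
    r2 = r ** 2
    r4 = r2 ** 2
    return r2 @ annot, r4 @ annot


# Toy 3-SNP LD matrix and two annotations: all SNPs, and a hypothetical "coding" indicator.
r = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
annot = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 0.0]])
l2, l4 = stratified_ld_moments(r, annot)
print("stratified LD scores (sum r^2):\n", l2)
print("stratified LD 4th moments (sum r^4):\n", l4)
```

Raising r to the fourth power down-weights weak LD much more than squaring does, which is what lets the fourth moment carry information about how concentrated effects are, beyond total heritability.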
56
New method to estimate Ma: Stratified LD 4th moments regression (S-LD4M)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared χ2 statistics on stratified LD 4th moments ($\sum r^4$)
• Include baseline-LD model annotations (Gazal et al Nat Genet)
Applicable to genome-wide SNPs or categories of SNPs (e.g. low-frequency SNPs, coding SNPs, etc.)
57
New method to estimate Ma: Stratified LD 4th moments regression (S-LD4M)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared χ2 statistics on stratified LD 4th moments ($\sum r^4$)
• Include baseline-LD model annotations (Gazal et al Nat Genet)
Applicable to genome-wide SNPs or categories of SNPs (e.g. low-frequency SNPs, coding SNPs, etc.)
Robust results in simulations
58
Approaches to understanding polygenicity
“Love is Understanding.” -- Madonna
“Data is Understanding.” -- Alkes
59
Brain-related traits are particularly polygenic (Number of children is even more polygenic)
Results sub-selected from 33 diseases and complex traits (average N = 361K)
60
Common variants are more polygenic than low-frequency variants
[Figure: polygenicity (Ma) of common vs. low-frequency SNPs]
Common variants: ~4x more polygenic than low-frequency variants (evolutionary modeling: ≥30x more polygenic than de novo variants)
61
Functional categories are more polygenic (in proportion to heritability enrichment)
Main functional categories from baseline-LD model; results aggregated across common + low-frequency variants.
Heritability enrichment reflects differences in the number of associations rather than their magnitude (which is constrained by selection).
62
Flattening hypothesis: polygenicity arises because negative selection flattens common variant effect sizes (common variant effect sizes are always small). ASHG 2018 poster 3528/W O’Connor + O’Connor et al. biorxiv 09/18/18
63
Flattening hypothesis: implications for GWAS
• GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene.
64
Flattening hypothesis: implications for GWAS
• GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes.
65
Flattening hypothesis: implications for GWAS
• GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes. 37 fine-mapped IBD loci (Huang et al Nature): 0/8 candidate genes with fine-mapped coding variants vs. 12/29 candidate genes near fine-mapped non-coding variants were loss-of-function intolerant (pLI ≥ 0.9; Lek et al Nature) (P = for difference)
66
Flattening hypothesis: implications for GWAS
• GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes. • Rare variant association studies (in very large sample sizes) will usefully complement GWAS, as rare variant architectures are less impacted by flattening due to negative selection.
67
Outline
1. LD-dependent architectures
2. Functional architectures
3. Polygenicity
68
Conclusions
• Low-LD variants have larger causal effect sizes (at a given MAF), consistent with negative selection (Gazal et al Nat Genet); the baseline-LD model attains higher likelihoods than other models in formal model comparisons (Gazal et al. biorxiv 10/16/18). Modeling LD-dependent architectures is critically important.
• Non-synonymous + conserved + some brain-related annotations have LFVE >> CVE, consistent with strong negative selection (Gazal et al Nat Genet).
• Common variants are more polygenic than low-frequency variants + common variants are far more polygenic than de novo variants, due to negative selection (O’Connor et al. biorxiv 09/18/18).
69
Acknowledgements
Harvard T.H. Chan School of Public Health:
• Steven Gazal
• Luke O’Connor
Broad Institute:
• Hilary Finucane
BWH/Harvard Medical School:
• Shamil Sunyaev
• Po-Ru Loh
All authors of Gazal et al Nat Genet, Gazal et al. biorxiv 10/16/18, Gazal et al Nat Genet, O’Connor et al. biorxiv 09/18/18
Additional thanks to UK Biobank and 23andMe