Slides of this talk: google “Alkes HSPH”


Impact of negative selection on common variant disease architectures
Alkes L. Price, Harvard School of Public Health, October 19, 2018

What is negative selection?
Negative selection is the negative pressure on allele frequencies of mutations that reduce fitness.
[Figure: two panels of allele frequency over time]
Kryukov et al. 2007 Am J Hum Genet, Kiezun et al. 2013 PLoS Genet

Negative selection: causal effect on trait is larger for rare and low-frequency variants
Let $p$ be minor allele frequency (MAF).
Let $\beta$ be per-allele causal effect on disease/trait.
Let $h^2 = \beta^2 \cdot 2p(1-p)$ be variance explained by a SNP.
Selection ($\alpha$ model): $\mathrm{Var}(\beta) \propto [p(1-p)]^{\alpha}$, so $E(h^2) \propto [p(1-p)]^{1+\alpha}$
Rare variants explain more $h^2$ if $\alpha < 0$ (negative selection) than if $\alpha = 0$ (no selection)
Speed et al. 2012 Am J Hum Genet; also see Schoech et al. biorxiv 09/13/17 (analytical derivations support validity of $\alpha$ model for $p$ above a threshold)

Negative selection: causal effect on trait is larger for rare and low-frequency variants
Let $p$ be minor allele frequency (MAF).
Let $\beta$ be per-allele causal effect on disease/trait.
Let $h^2 = \beta^2 \cdot 2p(1-p)$ be variance explained by a SNP.
Selection ($\alpha$ model): $\mathrm{Var}(\beta) \propto [p(1-p)]^{\alpha}$, so $E(h^2) \propto [p(1-p)]^{1+\alpha}$
Rare variants explain more $h^2$ if $\alpha < 0$ (negative selection)
$\alpha = -0.38$ across 25 UK Biobank traits (rare variants: larger causal effects but smaller per-SNP $h^2$ vs. common variants)
Schoech et al. biorxiv 09/13/17; also see Zeng et al. 2018 Nat Genet
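A minimal numeric sketch of the α model above may help make the "larger causal effects but smaller per-SNP h2" statement concrete. It evaluates the two proportionalities at α = −0.38 for one rare and one common variant. The two MAF values (1% and 30%) are illustrative assumptions, not numbers from the talk, and proportionality constants are dropped because only ratios are compared.

```python
def per_allele_effect_variance(p, alpha):
    """Relative Var(beta) under the alpha model: [p(1-p)]^alpha (constant dropped)."""
    return (p * (1.0 - p)) ** alpha


def expected_snp_h2(p, alpha):
    """Relative E(h2) = Var(beta) * 2p(1-p), i.e. proportional to [p(1-p)]^(1+alpha)."""
    return per_allele_effect_variance(p, alpha) * 2.0 * p * (1.0 - p)


ALPHA = -0.38              # estimate quoted on the slide
RARE, COMMON = 0.01, 0.30  # illustrative MAFs (assumed, not from the talk)

effect_ratio = (per_allele_effect_variance(RARE, ALPHA)
                / per_allele_effect_variance(COMMON, ALPHA))
h2_ratio = expected_snp_h2(RARE, ALPHA) / expected_snp_h2(COMMON, ALPHA)

print(f"Var(beta), rare vs. common: {effect_ratio:.2f}x")
print(f"E(h2) per SNP, rare vs. common: {h2_ratio:.2f}x")
```

With these assumed MAFs, the rare variant has roughly 3x the per-allele effect variance of the common variant but only about 0.15x the expected per-SNP variance explained.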

Rare variants explain limited trait heritability, despite larger causal effects + many rare variants
[Figure annotations: MAF = 5%; $\alpha = 0$ (no selection)]
Schoech et al. biorxiv 09/13/17; also see Zeng et al. 2018 Nat Genet

Beyond the $\alpha$ model: does negative selection impact common variant disease architectures?
[Figure annotations: MAF = 5%; $\alpha = 0$ (no selection)]
Schoech et al. biorxiv 09/13/17; also see Zeng et al. 2018 Nat Genet

Outline 1. LD-dependent architectures 2. Functional architectures 3. Polygenicity

What does “LD-dependent architecture” mean?
• SNPs with higher LD have higher average $\chi^2$ association statistics due to increased tagging of causal variants.
Pritchard & Przeworski 2001 Am J Hum Genet

What does “LD-dependent architecture” mean?
• SNPs with higher LD have higher average $\chi^2$ association statistics due to increased tagging of causal variants.
“LD-dependent architecture”: dependence of causal effect sizes on the level of LD of a SNP.
Speed et al. 2012 Am J Hum Genet

What does “LD-dependent architecture” mean? • Common SNPs have higher LD and higher causal variance than rare SNPs => SNPs with higher LD have higher causal variance. Schoech et al. biorxiv 09/13/17; also see Zeng et al. 2018 Nat Genet

What does “LD-dependent architecture” mean? • Common SNPs have higher LD and higher causal variance than rare SNPs => SNPs with higher LD have higher causal variance. “LD-dependent architecture”: dependence of causal effect sizes on the level of LD of a SNP, after conditioning on MAF.

Inferring LD-dependent architectures from summary statistics using S-LDSC
Extend S-LDSC (Finucane et al. 2015 Nat Genet) to continuous annot. $q$:
$E(\chi^2_m) = 1 + N \sum_q \tau_q \, \mathrm{LDscore}_q(m)$
$\mathrm{LDscore}_q(\text{SNP } m)$ = [formula not recovered from slide]
$\tau^*_q$ = [formula not recovered from slide] = normalized conditional effect of annot. $q$ (proportionate change in trait $h^2$ per 1 s.d. increase in annot. $q$)
$a_{m,q}$ = value of annot. $q$ at SNP $m$
$\tau_q$ = conditional effect of annot. $q$
$h^2$ = genome-wide trait heritability
Gazal et al. 2017 Nat Genet
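The LD score formula on this slide did not survive extraction. As a hedged illustration, the sketch below uses the standard stratified LD score from the S-LDSC literature: the LD score of SNP m for annotation q is the sum over SNPs m' of a(m', q) times r²(m, m'). The toy genotypes, the three-SNP layout and the function names are assumptions made for illustration. A real analysis would use a reference panel and a genomic window rather than all SNP pairs.

```python
import random
from statistics import mean, pstdev


def squared_correlation(x, y):
    """Squared Pearson correlation r^2 between two genotype vectors."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return (cov / (pstdev(x) * pstdev(y))) ** 2


def stratified_ld_scores(genotypes, annotation):
    """LD score of each SNP m: sum over SNPs k of annotation[k] * r^2(m, k).

    genotypes: one genotype vector (list of 0/1/2) per SNP.
    annotation: one annotation value per SNP (binary or continuous).
    """
    n_snps = len(genotypes)
    return [
        sum(annotation[k] * squared_correlation(genotypes[m], genotypes[k])
            for k in range(n_snps))
        for m in range(n_snps)
    ]


# Toy data (hypothetical): 3 SNPs genotyped in 200 individuals.
rng = random.Random(0)
n_individuals = 200
snp0 = [rng.choice([0, 1, 2]) for _ in range(n_individuals)]
snp1 = list(snp0)  # in perfect LD with snp0
snp2 = [rng.choice([0, 1, 2]) for _ in range(n_individuals)]  # drawn independently

annotation = [1.0, 0.0, 0.0]  # only snp0 belongs to the annotation
scores = stratified_ld_scores([snp0, snp1, snp2], annotation)
print(scores)
```

In this toy setup snp1 receives the same LD score as snp0 because it tags the annotated SNP perfectly, while the independently drawn snp2 receives a score near zero.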

Inferring LD-dependent architectures using continuous LLD annotation
Level of LD (LLD): MAF-adjusted LD score (MAF-stratified quantile normalization)
$\mathrm{LDscore}_{LLD}(\text{SNP } m)$ for continuous LLD annotation = [formula not recovered from slide]
• Include “baseline model” annotations (Finucane et al. 2015 Nat Genet)
• Also include binary annotations for 10 common SNP MAF bins
• Simulations confirm robust results (not shown)
Gazal et al. 2017 Nat Genet
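The phrase "MAF-stratified quantile normalization" can be sketched as follows. This is one plausible reading, not the authors' code: SNPs are split into equal-sized MAF bins, and within each bin the LD scores are replaced by standard-normal quantiles of their ranks. The bin count, the tie handling (ties keep input order) and the toy data are assumptions.

```python
import random
from statistics import NormalDist


def maf_stratified_quantile_normalize(ld_scores, mafs, n_bins=10):
    """Map LD scores to standard-normal quantiles within equal-sized MAF bins."""
    n = len(ld_scores)
    by_maf = sorted(range(n), key=lambda i: mafs[i])
    bins = [by_maf[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
    std_normal = NormalDist()
    lld = [0.0] * n
    for members in bins:
        ranked = sorted(members, key=lambda i: ld_scores[i])
        for rank, i in enumerate(ranked):
            # (rank + 0.5) / size keeps the quantile strictly inside (0, 1).
            lld[i] = std_normal.inv_cdf((rank + 0.5) / len(ranked))
    return lld


# Toy data (hypothetical): LD score rises with MAF, plus noise.
rng = random.Random(1)
mafs = [rng.uniform(0.05, 0.5) for _ in range(1000)]
ld_scores = [10.0 + 100.0 * maf + rng.gauss(0.0, 5.0) for maf in mafs]
lld = maf_stratified_quantile_normalize(ld_scores, mafs)
```

After normalization each MAF bin has the same distribution of LLD values, so the annotation carries information about LD relative to other SNPs of similar frequency rather than about frequency itself.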

SNPs with lower MAF-adjusted level of LD (LLD) have larger causal effect sizes Same sign of effect across all 56 traits (average N=101K)

Many annotations correlated to LD could contribute to LD-dependent architectures
LD-related annotations:
• Predicted allele age (ARGweaver; Rasmussen et al. 2014 PLoS Genet)
• LLD in Africans (LLD-AFR)
• Recombination rate (±10kb window; Hussin et al. 2015 Nat Genet)
• GC-content (±1Mb window; Loh et al. 2015b Nat Genet)
• Replication timing (Koren et al. 2012 Am J Hum Genet)
• Background selection (1 − B statistic; McVicker et al. 2009 PLoS Genet)
• Nucleotide diversity (SNPs per kb; ±10kb window)
• CpG content (±50kb window)
Functional annotations (Finucane et al. 2015 Nat Genet):
• Coding, regulatory, conserved, etc.

Many annotations correlated to LD could contribute to LD-dependent architectures
[Figure labels: LD-related annotations; functional annotations from “baseline model” (Finucane et al. 2015 Nat Genet)]

Many LD-related annotations impact causal effect sizes
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
• Recombination rate has discordant sign of effect (Hill & Robertson 1966 Genet Res)
• Heritability is enriched in SNPs with low LLD in low recombination rate regions
• r = −0.63
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
• LLD effect is 0.37x smaller when including annotations from baseline model
• Some, but not all, of LD-dependent architecture due to DHS, enhancers, etc.
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes after conditioning on baseline model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF]
• LLD effect is 0.51x smaller after adding baseline model
• Predicted allele age has largest effect.
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes in joint fit with baseline model → baseline-LD model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF]
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes in joint fit with baseline model → baseline-LD model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF]
• LLD effect is 0.51x smaller af[...]
• 6 significant annotations in joint fit
Meta-analysis of 31 independent traits

Many LD-related annotations impact causal effect sizes in joint fit with baseline model → baseline-LD model
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF]
• LLD effect is 0.51x smaller af[...]
• Predicted allele age has largest effect
Meta-analysis of 31 independent traits

Forward simulations show that negative selection explains LD-dependent architectures
[Figure legend: Annotation + MAF; Annotation + baseline model + MAF; Joint-fit annotations + baseline model + MAF. Panels: 31 traits; Simulations]
Forward simulations: impact on $s$
• Forward simulations using SLiM (Messer 2013 Genetics) under African-European demographic model (Gravel et al. 2011 PNAS)
• Jointly regress selection coefficient $s$ on 4 LD-related annotations and minor allele frequency

Quintiles illustrate large effects of LD-related annotations from baseline-LD model
[Figure: proportion of heritability (axis from 0% to 40%) by annotation quintile]
Youngest 20% explain 3.8x more heritability than oldest 20% (vs. 1.8x for MAF)

Quintiles illustrate large effects of TMRCA annotation inferred using ASMC
• ASMCavg annotation: average TMRCA inferred by Ascertained Sequentially Markovian Coalescent (ASMC) in GoNL WGS data
• Jointly statistically significant with other LD-related annotations ($\tau^* = -0.25 \pm 0.01$)
[Figure: proportion of heritability by annotation quintile]
Low-TMRCA 20% explain 3.8x more heritability than high-TMRCA 20%
Palamara et al. 2018 Nat Genet

LD-dependent architectures can lead to bias in estimates of heritability and functional enrichment Modeling LD-dependent architectures is critically important. Speed et al. 2012 Am J Hum Genet, Gusev et al. 2013 PLoS Genet, Yang et al. 2015 Nat Genet, Speed et al. 2017 Nat Genet, Gazal et al. 2017 Nat Genet

How well does the baseline-LD model fit the data?

How well does the baseline-LD model fit the data? Idea (Speed et al. 2017 Nat Genet): use out-of-sample likelihoods for formal model comparisons Speed et al. 2017 Nat Genet: LDAK model > infinitesimal model (“GCTA model”) in analysis of 1000G SNPs

2.8M 1000G SNPs: LDAK model > GCTA model
[Figure: y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits); panels: 2.8M SNPs from 1000G, 4.6M SNPs from HRC]
“GCTA” = infinitesimal model
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18

4.6M HRC SNPs: GCTA model (>) LDAK model
[Figure: y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits); panels: 2.8M SNPs from 1000G, 4.6M SNPs from HRC]
“GCTA” = infinitesimal model
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18

baseline-LD > LDAK and GCTA in both SNP sets
[Figure: y-axis = change in log likelihood vs. LDAK model (16 UK Biobank traits); panels: 2.8M SNPs from 1000G, 4.6M SNPs from HRC]
“GCTA” = infinitesimal model (also see Yang et al. 2015 Nat Genet; LDMS model)
“Gazal-LD” = LD + MAF annotations only from baseline-LD model
“baseline-LD+LDAK” = model with baseline-LD + LDAK annotations
ASHG 2018 poster 3432/W Price + Gazal et al. biorxiv 10/16/18

Outline 1. LD-dependent architectures 2. Functional architectures 3. Polygenicity image from Shlyueva et al. 2014 Nat Rev Genet

Common variant functional architectures: coding + regulatory (tissue-specific)
Coding variants explain ~10% of common variant $h^2$
Regulatory variant enrichments are often tissue/cell-type-specific
Finucane et al. 2015 Nat Genet; also see Finucane et al. 2018 Nat Genet

Low-frequency variant functional architectures: ??? + ???
Coding variants explain ??? of low-frequency variant $h^2$
Regulatory variants explain ??? of low-frequency variant $h^2$
Coding variants likely important for low-frequency variant architectures: UK10K 2015 Nature, Astle et al. 2016 Cell, Marouli et al. 2017 Nature

Inferring low-frequency variant functional architectures by extending S-LDSC
Multi-linear regression: $\chi^2_m = 1 + \sum_q (N\tau_q)\,\mathrm{LDscore}_{m,q}$
• Separate annotations for common and low-frequency SNPs
• Also include binary annotations for 5 low-frequency MAF bins
• UK Biobank target samples + UK10K LD reference samples
• Simulations confirm robust results (not shown)
ASHG 2018 poster 2699/F Gazal + Gazal et al. 2018 Nat Genet
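The multi-linear regression on this slide can be illustrated with a deliberately small sketch: two annotations, intercept fixed at 1, and ordinary least squares solved through the 2x2 normal equations. Everything here is a toy assumption (sample size, τ values, simulated LD scores, noise-free χ² statistics). The published method additionally uses regression weights and a block jackknife for standard errors, neither of which is shown.

```python
import math
import random


def fit_two_annotation_regression(chi2, ld1, ld2, n_samples):
    """Fit chi2_m = 1 + N*tau_1*ld1_m + N*tau_2*ld2_m by ordinary least squares.

    The intercept is held at 1, matching the slide's equation.
    Returns the estimates (tau_1, tau_2).
    """
    y = [c - 1.0 for c in chi2]
    # Normal equations for a no-intercept regression on two predictors.
    s11 = sum(a * a for a in ld1)
    s12 = sum(a * b for a, b in zip(ld1, ld2))
    s22 = sum(b * b for b in ld2)
    s1y = sum(a * v for a, v in zip(ld1, y))
    s2y = sum(b * v for b, v in zip(ld2, y))
    det = s11 * s22 - s12 * s12
    coef1 = (s1y * s22 - s2y * s12) / det  # estimate of N * tau_1
    coef2 = (s2y * s11 - s1y * s12) / det  # estimate of N * tau_2
    return coef1 / n_samples, coef2 / n_samples


# Toy inputs (hypothetical values throughout).
rng = random.Random(0)
N = 100_000
TRUE_TAU = (2e-7, 5e-7)
ld1 = [rng.uniform(20.0, 200.0) for _ in range(1000)]
ld2 = [rng.uniform(0.0, 30.0) for _ in range(1000)]
chi2 = [1.0 + N * (TRUE_TAU[0] * a + TRUE_TAU[1] * b) for a, b in zip(ld1, ld2)]

tau_hat = fit_two_annotation_regression(chi2, ld1, ld2, N)
print(tau_hat)
```

Because the toy χ² values are generated without noise, the fit recovers the τ values used to simulate them. With real summary statistics the estimates are noisy and the weighting scheme matters.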

Inferring low-frequency variant functional architectures by extending S-LDSC
Common variant enrichment (CVE) of an annotation = (prop. of $h^2_c$) / (prop. of common SNPs)
Low-frequency variant enrichment (LFVE) of an annotation = (prop. of $h^2_{lf}$) / (prop. of low-frequency SNPs)
• Separate annotations for common and low-frequency SNPs
• Also include binary annotations for 10 low-frequency MAF bins
• UK Biobank target samples + UK10K LD reference samples
• Simulations confirm robust results (not shown)
ASHG 2018 poster 2699/F Gazal + Gazal et al. 2018 Nat Genet
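As a small worked example of the two enrichment definitions, the sketch below divides a heritability proportion by a SNP proportion. The heritability proportions (17.3% and 2.1%) are the non-synonymous figures quoted elsewhere in this talk. The SNP proportions are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
import math


def enrichment(prop_h2, prop_snps):
    """Enrichment of an annotation: share of heritability / share of SNPs."""
    return prop_h2 / prop_snps


# Heritability shares quoted in the talk for non-synonymous variants.
PROP_H2_LOW_FREQ = 0.173
PROP_H2_COMMON = 0.021

# SNP shares are hypothetical placeholders, not values from the talk.
PROP_LOW_FREQ_SNPS = 0.005
PROP_COMMON_SNPS = 0.003

lfve = enrichment(PROP_H2_LOW_FREQ, PROP_LOW_FREQ_SNPS)
cve = enrichment(PROP_H2_COMMON, PROP_COMMON_SNPS)
print(f"LFVE = {lfve:.1f}x, CVE = {cve:.1f}x, LFVE/CVE = {lfve / cve:.1f}")
```

An enrichment of 1 means the annotation explains heritability in proportion to its size, so values well above 1 mark annotations that carry a disproportionate share.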

LFVE is correlated to CVE
LFVE > CVE when CVE is large
33 main annotations: r(LFVE,CVE) = 0.79
Meta-analysis across 40 UK Biobank traits (average N = 363K); assoc. method: BOLT-LMM (Loh et al. 2018 Nat Genet)
[Figure axes: low-frequency variant enrichment (LFVE); common variant enrichment (CVE)]

LFVE is correlated to CVE
LFVE > CVE when CVE is large
33 main annotations: r(LFVE,CVE) = 0.79
Non-synonymous variants: 17.3% of $h^2_{lf}$ vs. 2.1% of $h^2_c$
Even larger LFVE for n.s. variants:
• predicted as damaging (PolyPhen-2)
• in genes under strong selection ($s_{het}$)
[Figure axes: low-frequency variant enrichment (LFVE); common variant enrichment (CVE)]

LFVE ≈ CVE for most regulatory annotations but LFVE > CVE for brain annotations
637 cell-type-specific (CTS) annotation-trait pairs with significant CVE (Finucane et al. 2018 Nat Genet)
55 brain annotation-trait pairs with LFVE/CVE > 2x
[Figure axes: low-frequency variant enrichment (LFVE); common variant enrichment (CVE)]

LFVE ≈ CVE for most regulatory annotations but LFVE > CVE for brain annotations
H3K4me3 in brain DPFC-Neuroticism: 56.9% of $h^2_{lf}$ vs. 11.7% of $h^2_c$ (P = 0.0002)
[Figure axes: low-frequency variant enrichment (LFVE); common variant enrichment (CVE)]

LFVE/CVE ratio depends primarily on strength of selection
$s_{dn}$ = avg selection coefficient of deleterious de novo variants
$\pi$ = prob. that de novo variant is causal for trait
Forward simulations (SLiM2 + τEyre-Walker)
[Figure axes: LFVE/CVE ratio; proportion of causal variants ($\pi$)]
Non-synonymous variants: LFVE/CVE = 5x, $s_{dn} = -0.003$
55 brain annotation-trait pairs: LFVE/CVE > 2x, $s_{dn} < -0.0006$ (potentially useful for WGS)

Outline 1. LD-dependent architectures 2. Functional architectures 3. Polygenicity image from Evangelou et al. 2018 Nat Genet

Complex traits are extremely polygenic Systolic blood pressure: GWAS of 1 million people identifies 901 genome-wide significant loci explaining 5.7% of trait variance (vs. total SNP-heritability = 21%) Evangelou et al. 2018 Nat Genet; also see Purcell et al. 2009 Nature, Yang et al. 2010 Nat Genet, Stahl et al. 2012 Nat Genet, PGC-SCZ 2014 Nature, Loh et al. 2015b Nat Genet, Zhang et al. 2018 Nat Genet

Omnigenic model: polygenicity arises from extraordinary biological complexity
Boyle et al. 2017 Cell; also see Wray et al. 2018 Cell, Liu et al. biorxiv 09/24/18

Flattening hypothesis: polygenicity arises because negative selection flattens common variant effect sizes (always small) ASHG 2018 poster 3528/W O’Connor + O’Connor et al. biorxiv 09/18/18

New definition of polygenicity: effective number of associated SNPs ($M_a$)
No LD between causal SNPs: [formula not recovered from slide] ($M$ = #SNPs, $\beta$ = normalized effect size)

New definition of polygenicity: effective number of associated SNPs ($M_a$)
No LD between causal SNPs: [formula not recovered from slide] ($M$ = #SNPs, $\beta$ = normalized effect size)
If SNP effects follow normal distribution: $M_a$ = number of SNPs
If SNP effects follow point-normal distribution: $M_a$ = number of causal SNPs

New definition of polygenicity: effective number of associated SNPs ($M_a$)
No LD between causal SNPs: [formula not recovered from slide] ($M$ = #SNPs, $\beta$ = normalized effect size)
If SNP effects follow normal distribution: $M_a$ = number of SNPs
If SNP effects follow point-normal distribution: $M_a$ = number of causal SNPs
Estimates of the number of causal SNPs under a point-normal model are sample size dependent (Zhang et al. 2018 Nat Genet, O’Connor et al. biorxiv 09/18/18)
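The slide's formula for the effective number of associated SNPs was an image and is not in the transcript. The sketch below therefore assumes a kurtosis-based form, 3·M·E[β²]² / E[β⁴], chosen because it reproduces both properties stated on the slide: it equals the number of SNPs for normally distributed effects and the number of causal SNPs for point-normal effects. This is an assumption for illustration only, and the cited preprint should be consulted for the authors' actual definition. The simulated effect sizes are likewise hypothetical.

```python
import random


def effective_number_of_snps(betas):
    """Assumed kurtosis-based definition: 3 * M * E[beta^2]^2 / E[beta^4]."""
    m = len(betas)
    second_moment = sum(b ** 2 for b in betas) / m
    fourth_moment = sum(b ** 4 for b in betas) / m
    return 3.0 * m * second_moment ** 2 / fourth_moment


rng = random.Random(0)
M = 50_000
N_CAUSAL = 1_000

# Every SNP has a normally distributed effect.
normal_effects = [rng.gauss(0.0, 1.0) for _ in range(M)]
# Point-normal: N_CAUSAL SNPs have normal effects, the rest have none.
point_normal_effects = ([rng.gauss(0.0, 1.0) for _ in range(N_CAUSAL)]
                        + [0.0] * (M - N_CAUSAL))

ma_normal = effective_number_of_snps(normal_effects)
ma_point_normal = effective_number_of_snps(point_normal_effects)
print(round(ma_normal), round(ma_point_normal))
```

Up to sampling noise, the first value is close to M and the second is close to the number of causal SNPs, which is the behaviour the slide describes.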

New method to estimate $M_a$: Stratified LD 4th moments regression (S-LD4M)
S-LDSC (Finucane et al. 2015 Nat Genet):
• Regress $\chi^2$ statistics on stratified LD scores ($\sum r^2$)
• Include baseline-LD model annotations (Gazal et al. 2017 Nat Genet)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared $\chi^2$ statistics on stratified LD 4th moments ($\sum r^4$)

New method to estimate $M_a$: Stratified LD 4th moments regression (S-LD4M)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared $\chi^2$ statistics on stratified LD 4th moments ($\sum r^4$)
• Include baseline-LD model annotations (Gazal et al. 2017 Nat Genet)
Applicable to genome-wide SNPs or categories of SNPs (e.g. low-frequency SNPs, coding SNPs, etc.)

New method to estimate $M_a$: Stratified LD 4th moments regression (S-LD4M)
S-LD4M (O’Connor et al. biorxiv 09/18/18):
• Regress squared $\chi^2$ statistics on stratified LD 4th moments ($\sum r^4$)
• Include baseline-LD model annotations (Gazal et al. 2017 Nat Genet)
Applicable to genome-wide SNPs or categories of SNPs (e.g. low-frequency SNPs, coding SNPs, etc.)
Robust results in simulations

Approaches to understanding polygenicity Love is Understanding. -- Madonna Data is Understanding. -- Alkes

Brain-related traits are particularly polygenic (Number of children is even more polygenic) Results sub-selected from 33 diseases and complex traits (average N = 361K)

Common variants are more polygenic than low-frequency variants
[Figure: polygenicity ($M_a$) of common vs. low-frequency SNPs]
Common variants: ~4x more polygenic than low-frequency variants (evolutionary modeling: ≥30x more polygenic than de novo variants)

Functional categories are more polygenic (in proportion to heritability enrichment)
Main functional categories from baseline-LD model
Results aggregated across common + low-frequency variants
Heritability enrichment reflects differences in the number of associations rather than their magnitude (which is constrained by selection)

Flattening hypothesis: polygenicity arises because negative selection flattens common variant effect sizes (always small) ASHG 2018 poster 3528/W O’Connor + O’Connor et al. biorxiv 09/18/18

Flattening hypothesis: implications for GWAS • GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene.

Flattening hypothesis: implications for GWAS • GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes.

Flattening hypothesis: implications for GWAS • GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes. 37 fine-mapped IBD loci (Huang et al. 2017 Nature): 0/8 candidate genes with fine-mapped coding variants vs. 12/29 candidate genes near fine-mapped non-coding variants were loss-of-function intolerant (pLI ≥ 0.9; Lek et al. 2016 Nature) (P = 0.006 for difference)

Flattening hypothesis: implications for GWAS • GWAS effect sizes are largely determined by negative selection, not just the biological importance of the implicated gene. • Weak perturbations to strongly constrained genes will yield more insights than strong perturbations to weakly constrained genes. • Rare variant association studies (in very large sample sizes) will usefully complement GWAS, as rare variant architectures are less impacted by flattening due to negative selection.

Outline 1. LD-dependent architectures 2. Functional architectures 3. Polygenicity

Conclusions • Low-LD variants have larger causal effect sizes (at a given MAF), consistent with negative selection (Gazal et al. 2017 Nat Genet); the baseline-LD model attains higher likelihoods than other models in formal model comparisons (Gazal et al. biorxiv 10/16/18). Modeling LD-dependent architectures is critically important. • Non-synonymous + conserved + some brain-related annotations have LFVE >> CVE, consistent with strong negative selection (Gazal et al. 2018 Nat Genet). • Common variants are more polygenic than low-frequency variants + common variants are far more polygenic than de novo variants, due to negative selection (O’Connor et al. biorxiv 09/18/18).

Acknowledgements Harvard T.H. Chan School of Public Health: • Steven Gazal • Luke O’Connor Broad Institute: • Hilary Finucane BWH/Harvard Medical School: • Shamil Sunyaev • Po-Ru Loh • All authors of Gazal et al. 2017 Nat Genet, Gazal et al. biorxiv 10/16/18, Gazal et al. 2018 Nat Genet, O’Connor et al. biorxiv 09/18/18 Additional thanks to UK Biobank and 23andMe