Genetics Journal Club Robert C. Bauer October 22nd, 2015
“Focus on Genomes of Icelanders”
May 2015, Volume 47, No. 5
4 linked papers:
- “Large-scale whole-genome sequencing of the Icelandic population.”
- “Loss-of-function variants in ABCA7 confer risk of Alzheimer's disease”
- “The Y-chromosome point mutation rate in humans”
- “Identification of a large set of rare complete human knockouts”
Why Iceland?
- First settled in 874 by Norse Explorers (VIKINGS!)
- Extensive genealogical record keeping
- Íslendingabók – Lists 435 initial settlers
- Extensive record back to the 17th Century
- 1703 Census lists population at 50,358.
- 2008 Census – 320,000 people
- deCODE Genetics – Founded in 1996
- Created genealogical database – the NEW Íslendingabók
- Contains 819,410 individuals dating back to 740 A.D.
- 471,284 Icelanders from 20th century
- 91.1% recorded father, 93.7% recorded mother
- Spouses usually from same geographical region
- Consanguineous unions are uncommon
- Genetic Founder Effect
Founder Populations for Studying Genetics
- Small population base increases genetic homogeneity
- Higher prevalence of rare variants that might otherwise be selected out due to smaller pool of spouses
- Increased frequencies of rare recessive disorders and functionally deleterious SNPs
- Higher MAFs allow for greater power with smaller N
- Example of a Founder Population close to UPenn: The Amish
- Work at UPenn in Amish Communities
- Dr. Dwight Stambolian and AMD
- Dr. Maja Bucan and Mental Illness
Questions Set Out To Answer: What is the population frequency of homozygous loss-of-function mutations in the germline genome? How frequently do these occur without deleterious phenotypic consequences?
Variants identified in WGS of 185 Individuals as part of the 1000 Genomes Pilot. Multiple populations do not allow for population-wide analysis. Instead focus on statistics of individuals.
Experimental Outline
Whole Genome Sequencing of 2,636 Icelanders to mean depth of 20X
- Paired-end libraries using Illumina TruSeq (300bp fragments)
- Sequencing done on Illumina HiSeq 2000 and GAIIx machines
- Alignment of WGS to hg18 using Burrows-Wheeler Aligner
- Variants called using GATK v2.3.9
Genotyping and Imputation in 104,220 Subjects
- Variety of Illumina SNP Arrays (Illumina HumanHap300, HumanCNV370, HumanHap610, HumanHap1M, HumanHap660, Omni-1, Omni 2.5 or Omni Express bead chips)
- SNPs excluded if (i) yield less than 95%, (ii) minor allele frequency (MAF) less than 1% in the population, (iii) significant deviation from Hardy-Weinberg equilibrium (P < 0.001), (iv) produced an excessive inheritance error rate (over 0.001), (v) substantial difference in allele frequency between chip types
- All samples with a call rate below 97% were excluded from the analysis.
RNA-Seq for Allelic Imbalance Studies
- cDNA prepared using Illumina TruSeq RNA Sample Prep Kit
- Paired-end sequencing on Illumina GAIIx machines
- Alignment and gene expression using TopHat and Cufflinks
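The SNP and sample exclusion criteria listed under Genotyping and Imputation can be read as a simple pass/fail filter. The sketch below is a minimal illustration in Python, assuming each SNP arrives with precomputed summary statistics. The class and field names (`SnpStats`, `yield_rate`, `hwe_p`, and so on) are hypothetical and are not taken from the paper's pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    # All field names are illustrative assumptions, not from the paper.
    yield_rate: float              # fraction of samples with a genotype call
    maf: float                     # minor allele frequency in the population
    hwe_p: float                   # P value for deviation from Hardy-Weinberg equilibrium
    inheritance_error_rate: float  # rate of Mendelian inheritance errors
    chip_freq_differs: bool        # substantial allele-frequency difference between chip types


def passes_snp_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives exclusion criteria (i) to (v) from the slide."""
    if s.yield_rate < 0.95:               # (i) yield less than 95%
        return False
    if s.maf < 0.01:                      # (ii) MAF less than 1%
        return False
    if s.hwe_p < 0.001:                   # (iii) significant deviation from HWE
        return False
    if s.inheritance_error_rate > 0.001:  # (iv) excessive inheritance error rate
        return False
    if s.chip_freq_differs:               # (v) allele frequency differs between chips
        return False
    return True


def passes_sample_qc(call_rate: float) -> bool:
    """Samples with a call rate below 97% are excluded."""
    return call_rate >= 0.97
```

A SNP with 99% yield, 5% MAF, HWE P = 0.5, no inheritance errors, and consistent frequencies across chips would pass; dropping the yield to 94% would exclude it.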
Phasing and Imputation Final set of 676,913 autosomal SNPs used for long-range phasing Sex chromosomes excluded from study Using long-range phasing and variants from WGS, can then impute all variants in the 104,220 genotyped subjects
Definition of Loss Of Function (LoF) Variants “Impact” determined by Ensembl Variant Effect Predictor (VEP) MAF cutoff of 2% based on prevalence of CF alleles in population Table S1
LoF Variants Identified in WGS Most LoF variants were rare – 85% had MAF < 0.5% LoF mutations had highest fraction of rare variants Table 1
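As a rough illustration of the "fraction of rare variants" statistic on this slide, the sketch below computes the share of variants in a class whose MAF falls under a threshold. It assumes "rare" means MAF < 0.5%, as quoted above; the function name and the toy frequencies are made up for the example and are not data from the paper.

```python
def fraction_rare(mafs, threshold=0.005):
    """Fraction of variants whose MAF is strictly below `threshold` (0.5% by default)."""
    mafs = list(mafs)
    if not mafs:
        raise ValueError("need at least one variant")
    rare = sum(1 for maf in mafs if maf < threshold)
    return rare / len(mafs)


# Toy MAFs for one hypothetical variant class (not real data).
toy_lof_mafs = [0.001, 0.002, 0.004, 0.01]
print(fraction_rare(toy_lof_mafs))  # 3 of the 4 toy variants are below 0.5%
```

Comparing this fraction across annotation classes (LoF, missense, synonymous, and so on) is what lets the slide say that LoF variants have the highest fraction of rare variants.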
Sanger Sequencing Validation 134 Individuals homozygous for LoF (96% success) 155 Individuals with unique homozygous LoF (98% success) Tables S5 and S6
Identification of human complete knockouts after phasing and imputation (6.1% of all genes) (7.7% of participants in study) Of the 8,041 KOs, 6,885 were homozygotes and 1,249 were compound heterozygotes. Table 2
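The distinction between homozygotes and compound heterozygotes depends on phase: a gene counts as completely knocked out only when both haplotypes carry a LoF variant. A minimal sketch of that logic follows, assuming phased data are available as the set of LoF variant IDs on each haplotype of a gene. The function name, the labels, and the variant IDs are hypothetical.

```python
def classify_gene_knockout(hap_a, hap_b):
    """Classify one gene in one individual from phased LoF variants.

    hap_a, hap_b: iterables of LoF variant IDs carried on each haplotype.
    Returns 'homozygote', 'compound_heterozygote', or 'not_knocked_out'.
    """
    a, b = set(hap_a), set(hap_b)
    if not a or not b:
        # At least one haplotype has no LoF variant, so one copy is intact.
        return "not_knocked_out"
    if a & b:
        # The same LoF variant sits on both haplotypes.
        return "homozygote"
    # Both haplotypes are hit, but by different LoF variants.
    return "compound_heterozygote"


print(classify_gene_knockout({"var1"}, {"var1"}))      # homozygote
print(classify_gene_knockout({"var1"}, {"var2"}))      # compound_heterozygote
print(classify_gene_knockout({"var1", "var2"}, set())) # not_knocked_out
```

The third call shows why phasing matters: two LoF variants on the same haplotype leave the other copy of the gene intact. The two counts on the slide sum to more than 8,041; one possible reading, which the slide does not state, is that an individual can be a homozygote for one gene and a compound heterozygote for another.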
What are the genes at the tail end? Most Genes Had ≥ 5 Complete KO Individuals (790 of 1,171 genes) Figure S4
Complete KO by LoF Variants tracks with Meiotic distance between parents Of the 8,041 individuals with at least 1 gene completely knocked out by rare LoF variant, 90.1% were not the children of parents who were second cousins or closer. Figure S6
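For intuition about why complete knockouts track with meiotic distance, the textbook path-counting formula for the inbreeding coefficient can be sketched as below. This is not the paper's analysis. It assumes that meiotic distance means the number of meioses on the path linking the two parents through a shared ancestor, that the parents share a pair of ancestors (for example, a grandparent couple for first cousins), and that those ancestors are not themselves inbred. The function name is made up for illustration.

```python
from fractions import Fraction


def inbreeding_coefficient(meiotic_distance: int, shared_ancestors: int = 2) -> Fraction:
    """Probability that a child carries two copies of one ancestral allele at a locus.

    Each shared ancestor contributes (1/2) ** (meiotic_distance + 1),
    assuming that ancestor is not inbred.
    """
    if meiotic_distance < 2:
        raise ValueError("parents must be separated by at least 2 meioses")
    return shared_ancestors * Fraction(1, 2) ** (meiotic_distance + 1)


print(inbreeding_coefficient(4))  # first cousins
print(inbreeding_coefficient(6))  # second cousins
```

Under these assumptions first cousins (distance 4) give 1/16 and second cousins (distance 6) give 1/64, so the chance that a rare LoF allele from a shared ancestor meets itself in a child drops by a factor of four for each added generation of separation.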
Homozygous LoF Genes By Tissue Genes expressed in one or more tissues but not all 27 tested Table 3
Homozygous LOF Genes By Tissue Genes expressed in only 1 of 27 tissues tested Cardiac genes jump to top when restricted to tissue-specific genes? Table S9
Phenotypic Effects of Complete KO Genes based on KO of Mouse Orthologs Data collected from the Mouse Genome Informatics Database (Jax) Olfaction/taste most susceptible to complete KO in humans Table S10
Complete KO of olfactory genes based on gene ontology Almost 3% of all genes that were knocked out were olfactory This accounted for 3.1% (251) complete KO individuals Table S7
Transmission of LOF variants deviates from expected Mendelian ratios Only 23.6% transmission of LoF to homozygotes (25% expected) Observed 0 (19.1 expected) homozygotes for DHCR7 splice site mutation Figure 1
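To get a feel for how surprising 0 observed homozygotes is against 19.1 expected, one rough approach is a Poisson approximation. This is an illustrative assumption, not the test used in the paper, and the function name is hypothetical. The same snippet also expresses the 23.6% versus 25% transmission figures as an observed-to-expected ratio.

```python
import math


def poisson_prob_zero(expected_count: float) -> float:
    """P(X = 0) for a Poisson-distributed count with the given mean."""
    if expected_count < 0:
        raise ValueError("expected count must be non-negative")
    return math.exp(-expected_count)


# Figures quoted on the slide.
p_zero = poisson_prob_zero(19.1)    # chance of seeing no homozygotes at all
transmission_ratio = 0.236 / 0.25   # observed vs. Mendelian expectation

print(f"P(0 homozygotes | 19.1 expected) = {p_zero:.2e}")
print(f"Observed / expected homozygote transmission = {transmission_ratio:.3f}")
```

Under the Poisson assumption the probability of seeing no homozygotes is on the order of 10^-9, which is consistent with the slide's point that homozygosity for this DHCR7 variant is strongly selected against.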
Nonsense Mediated Decay (NMD) Not effective for mutations in the final exon (and ~50bp upstream of the last exon junction) Wilkinson and Shyu, Nat. Cell Biol., 2002
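The positional rule on this slide can be written as a small predicate. The sketch below is a simplification under stated assumptions: positions are in spliced mRNA coordinates, the ~50bp window is treated as a hard cutoff, and a transcript with no exon junctions is treated as escaping NMD. The function and parameter names are hypothetical.

```python
def escapes_nmd(stop_pos: int, exon_junctions: list, window: int = 50) -> bool:
    """Predict whether a premature stop codon is expected to escape NMD.

    stop_pos: position of the premature stop in spliced mRNA coordinates.
    exon_junctions: positions of exon-exon junctions in the same coordinates.
    window: distance upstream of the last junction that still escapes (~50bp).
    """
    if not exon_junctions:
        # Assumption: with no exon junction there is nothing to trigger NMD.
        return True
    last_junction = max(exon_junctions)
    # True for stops in the final exon or within `window` bases upstream of it.
    return stop_pos >= last_junction - window


junctions = [100, 300, 500]          # toy transcript with four exons
print(escapes_nmd(520, junctions))   # final exon
print(escapes_nmd(470, junctions))   # within 50bp upstream of the last junction
print(escapes_nmd(200, junctions))   # middle of the transcript
```

In this toy transcript the first two stops are predicted to escape NMD and the third is not.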
Abundance of LOF variants and FRV by protein position Enrichment for LoF variants later in protein (escape NMD) High FRV means increased negative selection – highest in middle of protein – most deleterious? Figure 2
Allele Specific Expression of LOF alleles based on exon position 262 Individuals with blood RNA sequencing data Middle of gene again most deleterious with respect to ASE Figure 3
Summary
- Icelandic population, given size, mating patterns, and founder effect, is an ideal population for the identification and study of rare LoF variants.
- Performed WGS and long-range phasing in 2,636 individuals, then used that data to impute variants in 101,584 more individuals.
- Identified 6,795 LoF variants in 4,924 genes, and 8,041 individuals with complete KO of 1 or more genes.
- Genes involved in olfaction were most likely to harbor LoF variants, while brain and placental genes were most resistant.
- LoF variants behave as one might expect (enriched in children of closely related parents, not transmitted to offspring in expected ratios, selected against when present in regions of genes causing NMD or ASE).
- All highly suggestive of deleterious impact.
Questions Set Out To Answer:
1) What is the population frequency of homozygous loss-of-function mutations in the germline genome?
- In Icelandic populations, 7.7% will be complete KO for 1 or more genes.
- 6.1% of genes will harbor these LoF variants.
2) How frequently do these occur without deleterious phenotypic consequences?
- 1,149 of an expected 1,285 (89.4%) double transmissions of LoF observed.
- So ~10% of LoF variants in Icelandic population incompatible with reproduction.
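The arithmetic behind the 89.4% figure and the roughly 10% shortfall can be checked directly from the two counts on this slide. The variable names below are illustrative only.

```python
# Counts quoted on the slide.
observed_double_transmissions = 1149
expected_double_transmissions = 1285

ratio = observed_double_transmissions / expected_double_transmissions
deficit = 1 - ratio

print(f"Observed / expected = {ratio:.1%}")  # share of expected homozygotes actually seen
print(f"Deficit             = {deficit:.1%}")  # share that appear to be missing
```

The ratio rounds to 89.4% and the deficit to 10.6%, which is where the slide's figure of about 10% comes from.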
Lingering Questions
- Likely underestimating the number of LoF variants
- Role for missense mutations?
- Alternative splicing products?
- How applicable is this to more heterogeneous populations?
- Cutoff of MAF <2% better filter?
- Losing relevant data?
- Where is the biology?
“Another opportunity provided by the new technology is to turn the classic paradigm upside down and rather than searching for sequence variants that are responsible for phenotypic characteristics to search for phenotypic characteristics that are caused by variants in the sequence.”
Using KO Humans to Study Gene Function A KO human will always be more relevant to human disease than a KO mouse Can use biometrics, tissue collection, iPS cells to study function of known genes Opportunity to study KO effect in genes rarely mutated
Published Genome-Wide Associations through 12/2013 ~12,000 SNP-trait associations NHGRI GWA Catalog