1
Next Generation Sequencing
Michelle Luciano
2
Outline
- What is next generation sequencing?
- Rare variants and general cognitive ability
- Rare variants and years of education
- Sequencing the Wellderly
3
1. Next generation sequencing
- NGS platforms sequence millions of small DNA fragments in parallel
- Bioinformatics tools assemble the fragments by mapping the reads to the human reference genome
- Each of the ~3 billion bases is sequenced multiple times; the greater the depth, the more accurate the data
- Can sequence entire genomes, specific regions (e.g., exomes) or individual genes
- Detects small base changes (substitutions), insertions and deletions of DNA, large genomic deletions of exons or whole genes, and rearrangements such as inversions and translocations
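The link between the number of reads and "depth" can be sketched with the classic Lander-Waterman approximation: mean depth C = N × L / G for N reads of length L over a target of G bases, with per-base depth roughly Poisson(C). This is a minimal sketch assuming idealised, uniform sampling of reads (real exome and genome data are less uniform); the read count and read length below are hypothetical, not taken from the slides.

```python
import math

def mean_depth(n_reads: int, read_length: int, target_size: float) -> float:
    """Lander-Waterman mean depth: total sequenced bases / size of the target."""
    return n_reads * read_length / target_size

def frac_at_least(k: int, depth: float) -> float:
    """Fraction of bases covered by >= k reads, assuming per-base depth ~ Poisson(depth)."""
    p_below = sum(math.exp(-depth) * depth**i / math.factorial(i) for i in range(k))
    return 1.0 - p_below

if __name__ == "__main__":
    genome = 3e9                    # ~3 billion bases
    reads = 600_000_000             # hypothetical run: 600 million reads of 150 bp
    c = mean_depth(reads, 150, genome)
    print(f"mean depth: {c:.0f}x")
    for k in (1, 10, 20):
        print(f"P(depth >= {k}) = {frac_at_least(k, c):.4f}")
```

Under this model, raising the mean depth sharply reduces the fraction of bases seen too few times to call a genotype reliably, which is why greater depth gives more accurate data.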
5
www.ccace.ed.ac.uk Rare Genetic Variants
- < 0.5% frequency in the population
- Variation is younger: mutations arise every generation at a rate of … ×10^-8 per base. Given ~3×10^9 bases in the human genome, a person should have, on average, between … and … de novo mutations
- Increased population specificity: sharing of rare variants is about 10-30% among populations on different continents and 70-80% within the same continent
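The de novo estimate is a simple product: per-base mutation rate × bases per genome copy × number of copies (a person inherits two). The sketch below leaves the rate as a parameter; the two rates in the demo are illustrative assumptions, not the value from the original slide.

```python
def expected_de_novo(rate_per_base: float, haploid_bases: float = 3e9, ploidy: int = 2) -> float:
    """Expected number of new mutations in one person.

    Each of the `ploidy` inherited genome copies (~haploid_bases each) carries
    new mutations at `rate_per_base` per generation.
    """
    return rate_per_base * haploid_bases * ploidy

if __name__ == "__main__":
    for rate in (1e-8, 2e-8):     # illustrative rates only, not the slide's value
        print(f"rate {rate:.0e}/base -> ~{expected_de_novo(rate):.0f} de novo mutations")
```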
6
Rare Genetic Variants - Exomes
- Protein-coding regions of genes: 1-2% of the human genome
- Stronger negative selection on rare alleles in coding regions than in intergenic regions
- Deep exome sequencing study (Tennessen et al., 2012) of 1,351 Europeans and 1,088 African Americans showed:
  - Most variants were rare (86% had a minor allele frequency < 0.5%)
  - 82% of variants were previously unknown
  - 82% were population-specific
- Attributed to the combined effects of recent, explosive population growth and weak purifying selection
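"Minor allele frequency" (MAF) is the frequency of the less common allele at a site. The sketch below assumes biallelic sites with genotypes coded as alt-allele counts (0, 1, 2). The 0.5% "rare" cut-off follows the slide; the 5% "low-frequency" boundary is an assumed convention added for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF at one biallelic site from per-person alt-allele counts (0, 1 or 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)

def frequency_class(maf, rare_cutoff=0.005, low_cutoff=0.05):
    """Bin a MAF; 0.5% follows the slide's definition, 5% is an assumed convention."""
    if maf < rare_cutoff:
        return "rare"
    if maf < low_cutoff:
        return "low-frequency"
    return "common"

if __name__ == "__main__":
    site = [1] + [0] * 999          # one heterozygote among 1,000 people
    maf = minor_allele_frequency(site)
    print(maf, frequency_class(maf))
```

Taking the minimum of the two allele frequencies means a site where the reference allele is the rare one is still classified correctly.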
7
Genetic variation in general cognitive ability (g)
- g is related to fitness traits, so the genetic variation we observe today has likely been shaped by directional selection
- Genetic variation can be maintained when new, mostly deleterious mutations arise at the same rate at which selection removes them: mutation–selection balance
- This predicts no common variants of large effect, as supported by GWAS
- GCTA shows that at least half of the genetic variance is due to common SNPs, or rare SNPs in LD with them
- More recent mutations, including family-specific and private de novo variants, could explain the remaining genetic variance
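Why mutation–selection balance predicts rare rather than common deleterious alleles can be shown with a standard deterministic one-locus model: mutation A → a at rate μ, fitnesses AA = 1, Aa = 1 − hs, aa = 1 − s. For partially dominant alleles the equilibrium frequency is approximately q̂ ≈ μ/(hs), which stays very small for realistic μ. This is a textbook sketch, not a model fitted to g; the parameter values are illustrative assumptions.

```python
def next_generation(q: float, mu: float, s: float, h: float) -> float:
    """One generation: mutation A -> a at rate mu, then viability selection.
    Fitnesses: AA = 1, Aa = 1 - h*s, aa = 1 - s."""
    q = q + mu * (1.0 - q)                      # new deleterious mutations
    p = 1.0 - q
    w_het, w_hom = 1.0 - h * s, 1.0 - s
    w_bar = p * p + 2 * p * q * w_het + q * q * w_hom    # mean fitness
    return (p * q * w_het + q * q * w_hom) / w_bar       # freq of a among survivors

def simulate_equilibrium(mu, s, h, generations=5000):
    q = 0.0                                      # start with no deleterious alleles
    for _ in range(generations):
        q = next_generation(q, mu, s, h)
    return q

if __name__ == "__main__":
    mu, s, h = 1e-6, 0.02, 0.5                   # illustrative values only
    print(f"simulated q = {simulate_equilibrium(mu, s, h):.3e}")
    print(f"mu/(h*s)    = {mu / (h * s):.3e}")
```

Weaker selection (smaller s) lets the allele drift to a higher equilibrium frequency, but each such allele then has a smaller fitness cost and, plausibly, a smaller trait effect.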
8
2. Rare variants and g
9
Sample Selection
Generation Scotland: Scottish Family Health Study
High g:
- PCA of Logical Memory (immediate and delayed, summed), Digit Symbol, Verbal Fluency, and Mill Hill Vocabulary
- 1st unrotated principal component explained 42% of the variance; a composite score, g, was formed
- Top 76 female and top 74 male age- and sex-residualised g scores: 2.34 to 3.97 SD above the mean
[Figure: Age- and sex-residualised g scores]
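The selection steps above (first unrotated principal component, residualisation on age and sex, then the top scorers within each sex) can be sketched with NumPy. This is an illustration on simulated data under assumptions: tests are standardised before PCA, residualisation is a linear regression on age and sex, and sex is coded 0 = female, 1 = male. It is not the study's own pipeline.

```python
import numpy as np

def first_pc(tests: np.ndarray):
    """Scores on the 1st unrotated principal component of standardised tests,
    plus the proportion of variance it explains."""
    z = (tests - tests.mean(axis=0)) / tests.std(axis=0, ddof=1)
    _, sv, vt = np.linalg.svd(z, full_matrices=False)   # PCA via SVD
    share = sv[0] ** 2 / np.sum(sv ** 2)
    loadings = vt[0] if vt[0].sum() >= 0 else -vt[0]   # sign: higher score = higher g
    return z @ loadings, share

def residualise(y, age, sex):
    """Standardised residuals of y after OLS on intercept, age and sex (0/1)."""
    X = np.column_stack([np.ones_like(age, dtype=float), age, sex])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return (r - r.mean()) / r.std(ddof=1)

def top_by_sex(scores, sex, n_female, n_male, female_code=0):
    """Indices of the highest-scoring n_female females and n_male males."""
    fem = np.flatnonzero(sex == female_code)
    mal = np.flatnonzero(sex != female_code)
    return (fem[np.argsort(scores[fem])[-n_female:]],
            mal[np.argsort(scores[mal])[-n_male:]])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 5000
    age = rng.uniform(18, 90, n)
    sex = rng.integers(0, 2, n)
    latent = rng.normal(size=n)                         # simulated "true" g
    tests = 0.7 * latent[:, None] + rng.normal(0, 0.7, (n, 5)) - 0.01 * age[:, None]
    g, share = first_pc(tests)
    g_res = residualise(g, age, sex)
    top_f, top_m = top_by_sex(g_res, sex, 76, 74)
    chosen = g_res[np.r_[top_f, top_m]]
    print(f"PC1 explains {share:.0%}; selected {len(top_f)} F, {len(top_m)} M")
    print(f"selected range: {chosen.min():.2f} to {chosen.max():.2f} SD")
```

Selecting within each sex after residualising keeps the extreme group balanced even if raw test scores differ by sex or age.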
10
Sample Selection
- Control Group 1 (GS): major depression cases (81) or relatives of cases (27) with age- and sex-residualised g scores < 0.34 SD from the mean
- Control Group 2 (GS): 223 obesity controls
- Control Group 3: Scottish cancer patients reporting no education (N = 123) or a high-school senior certificate or equivalent with low/intermediate SES based on postcode (N = 32)
11
Exome sequencing
- Exon capture, then sequencing on an Illumina HiSeq (average read depth 38x and 86x in Generation Scotland, 39x in cancer controls)
- Read alignment to the 1000G (v37) reference genome with BWA (Li & Durbin, 2009)
- Genotype calls: GATK's UnifiedGenotyper (DePristo et al., 2011)
- Putative false-positive SNP calls filtered out using GATK's VariantRecalibrator
- After quality-control filtering, variants were annotated using SnpEff (version 2.0.5; Cingolani et al., 2012)
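After variant recalibration, each VCF record's FILTER column is either PASS or the name of the filter it failed, and downstream steps typically keep only PASS records. The sketch below parses tab-separated VCF text with the standard library only; the two records, their positions and the tranche-style filter name are made-up toy values, not data from the study.

```python
DEMO_VCF = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t69511\t.\tA\tG\t900\tPASS\t.
1\t69897\t.\tT\tC\t35\tVQSRTrancheSNP99.90to100.00\t.
"""

def pass_variants(vcf_lines):
    """Yield (chrom, pos, ref, alt) for records whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                               # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt == "PASS":
            yield chrom, int(pos), ref, alt

if __name__ == "__main__":
    for record in pass_variants(DEMO_VCF.splitlines()):
        print(record)
```

For real files a dedicated VCF library is safer (multi-allelic ALT fields, compressed input), but the filtering rule is the same.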
12
Analysis
Case (high g) vs control (low to average g) design
- Lack of power to detect individual single nucleotide variants (SNVs)
- Do multiple SNVs in protein-coding genes contribute to the trait of interest?
- Burden: collapse rare variants into a single burden variable
- C-alpha: rare variants are a mixture of phenotypically deleterious, protective and neutral variants
Biological pathways analysis
- GOrilla: enrichment for genes in specific biological pathways
- Biological processes: molecular events with a defined beginning and end
- Molecular functions: activities that occur at the molecular level
- Cellular components: locations within an anatomical structure or gene product group
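The C-alpha test (Neale et al., 2011) asks whether the case counts of a gene's rare variants are more dispersed than expected under the null, which is what a mix of risk and protective variants produces even when a burden sum cancels out. For variant i with n_i copies, y_i of them in cases, and null case share p0, T = Σ[(y_i − n_i p0)² − n_i p0(1 − p0)], standardised by its exact binomial variance. This is a minimal one-gene sketch assuming copies are independent and p0 is the case fraction of the sample; the demo counts are hypothetical.

```python
from math import comb, erfc, sqrt

def c_alpha(case_copies, total_copies, p0):
    """C-alpha test for one gene.

    case_copies[i]  : copies of rare variant i seen in cases (y_i)
    total_copies[i] : copies of variant i in cases + controls (n_i)
    p0              : expected share of copies falling in cases under the null
    Returns (T, Z, one-sided p-value).
    """
    # T: summed excess of each variant's squared deviation over its binomial variance
    T = sum((y - n * p0) ** 2 - n * p0 * (1 - p0)
            for y, n in zip(case_copies, total_copies))
    # null variance of T: exact expectation over Binomial(n_i, p0) for each variant
    c = 0.0
    for n in total_copies:
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            c += ((u - n * p0) ** 2 - n * p0 * (1 - p0)) ** 2 * prob
    if c == 0:
        raise ValueError("no informative variants (e.g. only singletons with p0 = 0.5)")
    z = T / sqrt(c)
    return T, z, 0.5 * erfc(z / sqrt(2))

if __name__ == "__main__":
    # hypothetical gene: some variants seen only in cases, others only in controls
    T, z, p = c_alpha([3, 0, 2, 0, 1], [3, 3, 2, 2, 2], p0=0.5)
    print(f"T = {T:.2f}, Z = {z:.2f}, one-sided p = {p:.3g}")
```

A burden test on the same gene would sum 3 + 0 + 2 + 0 + 1 case copies and see little excess, whereas C-alpha picks up the opposing directions.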
13
All variants
- Analysis of all SNVs with < 1% frequency in cases versus combined controls included 24,514 gene sets comprising 339,231 variants
- No significant associations were found after family-wise error rate (FWER) correction
- Pathway analysis revealed no enrichment for gene ontology (GO) terms (16,205 genes associated with a GO term) after FWER correction
- Similar results for < 5% frequency
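Controlling the family-wise error rate means bounding the chance of even one false positive across all gene-set tests. The slides do not say which FWER procedure was used; the sketch below shows two standard ones, Bonferroni and Holm's step-down method, as illustrations rather than the study's method.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test p-value threshold that keeps the FWER at or below alpha over m tests."""
    return alpha / m

def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure; returns one boolean per test (reject H0?)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break                                        # stop at the first non-rejection
        reject[i] = True
    return reject

if __name__ == "__main__":
    # 24,514 gene sets were tested for variants with < 1% frequency
    print(f"Bonferroni threshold: {bonferroni_threshold(0.05, 24514):.3g}")
```

With ~24,500 tests, a gene set would need p of roughly 2×10^-6 or smaller to survive Bonferroni correction; Holm is never less powerful and controls the same error rate.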
14
Non-Synonymous, Splice and Frameshift Variants
Variants with frequency <1% (134,751 variants in 20,791 gene sets) or <5% were not significant
15
Synonymous Variants
- No gene associations were found for variants with frequency < 1% (73,738 variants in 18,533 gene sets) or < 5% (84,374 variants in 19,135 gene sets)
- Gene ontology enrichment not significant
16
Burden
- Total minor alleles at < 1% frequency per individual ranged from 765 to 2,544; high g cases (M = …, SD = …) were higher than controls (M = …, SD = 87.56)
- Total minor alleles at < 5% frequency per individual ranged from 2,265 to 4,479; high g cases (M = 2,564.18, SD = …) were higher than controls (M = 2,537.56, SD = …)
- Burden tests including only non-synonymous variants (total N ranging … for < 5% frequency and … for < 1% frequency) were not significant
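The per-individual burden above is a count of minor alleles at variants below a frequency cut-off. A minimal sketch, assuming a genotype matrix of alt-allele counts (0/1/2) with frequencies computed in the pooled sample; sites where the alt allele is the major one are recoded so that only minor alleles are counted. The simulated data in the demo are hypothetical.

```python
import numpy as np

def rare_allele_burden(genotypes, max_maf=0.01):
    """Per-individual count of minor alleles at variants with MAF < max_maf.

    genotypes: (n_individuals, n_variants) alt-allele counts coded 0/1/2.
    """
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.sum(axis=0) / (2 * g.shape[0])
    flip = alt_freq > 0.5                     # alt allele is the major allele here
    minor_counts = np.where(flip, 2 - g, g)   # recode to minor-allele counts
    maf = np.where(flip, 1 - alt_freq, alt_freq)
    return minor_counts[:, maf < max_maf].sum(axis=1)

def group_summary(burden, is_case):
    """Mean and SD of the burden score in cases and in controls."""
    is_case = np.asarray(is_case, dtype=bool)
    return {label: (burden[mask].mean(), burden[mask].std(ddof=1))
            for label, mask in (("cases", is_case), ("controls", ~is_case))}

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    freqs = rng.uniform(0.0005, 0.05, 2000)            # simulated variant frequencies
    geno = rng.binomial(2, freqs, size=(500, 2000))
    burden = rare_allele_burden(geno, max_maf=0.01)
    print(group_summary(burden, np.arange(500) < 150))
```

A formal burden test would then regress case status on this score (or compare groups by permutation) rather than just comparing means.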
17
Limitations
- Control samples not ideal
- Population-based sample: de novo mutational influences cannot be identified
- Extreme-trait designs are important for identifying variants that are rare and have modest to high effect sizes
Future
- Plomin, Hsu, and Bowen: 1,600 from the Study of Mathematically Precocious Youth and 500 recruited online; 4,000 controls from the UK10K Project
- 'Project Einstein' (Rothberg & Tegmark): sequencing of 400 mathematicians and theoretical physicists
18
3. Rare variants and years of education
19
Research aim
- Ultra-rare inherited and de novo disruptive variants in highly constrained (HC) genes are enriched in neurodevelopmental disorders (autism, schizophrenia)
- H1: these variants also influence general cognitive ability, measured indirectly by years of education (YOE)
- 14,133 individuals with whole-exome or whole-genome sequencing data
20
URVs (ultra-rare variants)
- Variants observed only once (singletons) in each study and absent from the 60,706 exomes sequenced by the Exome Aggregation Consortium
- To maximise the expected deleteriousness of the included variants (due to purifying selection), three classes were used:
  - Disruptive: putative loss-of-function variants, including premature stop codons, essential splice-site mutations and frameshift indels; one or more observed in 25% of individuals
  - Damaging: missense variants classified as damaging by 7 different in silico prediction algorithms; 24%
  - Negative control: synonymous variants not predicted to change the encoded protein; 78%
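The URV definition combines two filters: allele count of exactly one within the study, and absence from a reference panel such as ExAC. A standard-library sketch, assuming calls are supplied as one (sample, variant) entry per observed alternate-allele copy (so a homozygous carrier appears twice) and that functional classes come from a separate annotation lookup; all identifiers in the demo are hypothetical.

```python
from collections import Counter

def ultra_rare_variants(calls, reference_ids):
    """Variant IDs seen as a single allele copy in the study and absent from the reference.

    calls: (sample_id, variant_id) pairs, one per observed alternate-allele copy.
    reference_ids: set of variant IDs present in the reference panel (e.g. ExAC).
    """
    counts = Counter(v for _, v in calls)
    return {v for v, c in counts.items() if c == 1 and v not in reference_ids}

def urv_count_per_sample(calls, urvs, annotation, wanted_class):
    """Number of URVs of one functional class (e.g. 'disruptive') carried by each sample."""
    per_sample = Counter()
    for sample, v in calls:
        if v in urvs and annotation.get(v) == wanted_class:
            per_sample[sample] += 1
    return per_sample

if __name__ == "__main__":
    calls = [("s1", "v1"), ("s2", "v2"), ("s3", "v2"), ("s1", "v3"), ("s4", "v4")]
    urvs = ultra_rare_variants(calls, reference_ids={"v4"})
    annotation = {"v1": "disruptive", "v3": "synonymous"}
    print(sorted(urvs), dict(urv_count_per_sample(calls, urvs, annotation, "disruptive")))
```

The per-sample counts are the predictor used in the association analysis; the synonymous class gives a negative control with the same rarity filter.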
21
Analysis
- Generalized linear regression model testing for association of YOE with the number of disruptive or damaging URVs in HC genes, controlling for year of birth, sex, the first 10 ancestry principal components, and schizophrenia status
- Results meta-analysed across studies
- Gene-expression data used to restrict to genes enriched for brain expression
- Gene-based burden test implemented in SKAT (sequence kernel association test), with an exome-wide significance threshold of p < 1×10^-6
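For a continuous outcome such as YOE, a Gaussian GLM with identity link is ordinary least squares, so the per-study step can be sketched as OLS of YOE on URV count plus covariates, followed by fixed-effect inverse-variance-weighted meta-analysis. This sketch assumes fixed effects across studies and uses only two covariates in the simulated demo (the real model also included 10 ancestry PCs and schizophrenia status); all data are simulated and the effect size is arbitrary.

```python
import numpy as np

def ols_effect(y, x, covariates):
    """Coefficient and SE of x in an OLS model y ~ 1 + x + covariates."""
    X = np.column_stack([np.ones(len(y)), x, covariates])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])     # residual variance
    return beta[1], np.sqrt(sigma2 * xtx_inv[1, 1])

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis: (estimate, SE)."""
    w = 1.0 / np.asarray(ses) ** 2
    b = np.sum(w * np.asarray(betas)) / np.sum(w)
    return b, 1.0 / np.sqrt(np.sum(w))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    betas, ses = [], []
    for n in (3000, 5000, 6000):                        # three hypothetical studies
        urv = rng.poisson(0.5, n)                       # URV count per person
        yob = rng.integers(1930, 1990, n) - 1960        # centred year of birth
        sex = rng.integers(0, 2, n)
        yoe = 14 + 0.03 * yob - 0.25 * urv + rng.normal(0, 3, n)
        b, se = ols_effect(yoe, urv, np.column_stack([yob, sex]))
        betas.append(b)
        ses.append(se)
    b, se = ivw_meta(betas, ses)
    print(f"meta-analysed effect: {b:.3f} years ({b * 12:.1f} months) per URV, SE {se:.3f}")
```

Multiplying the per-URV coefficient (in years) by 12 expresses it in months of education.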
22
3.1 months less education for each additional mutation
25
4. Sequencing the Wellderly
26
Wellderly sequencing study
- Healthy ageing is a complex polygenic trait, related to but distinct from longevity
- Healthy ageing is associated with decreased genetic risk for select diseases
- Healthy ageing is potentially linked to protection against cognitive decline
27
The Wellderly: aged > 80 years with no chronic diseases and not on chronic medication
28
Methods
- Whole-genome sequencing of 600 Wellderly (56x) compared with 1,507 adults aged 20 to 44 years from the Inova Translational Medicine Institute (ITMI) (55x)
- > 94% European ancestry, maximum relatedness of 12.5%: 511 Wellderly vs 686 ITMI
- ~57 million raw variants reduced to 24,205,551 after filtering
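A maximum relatedness of 12.5% (the coefficient of relatedness of first cousins) implies dropping samples until no retained pair exceeds that value. The slides do not describe how samples were removed; the sketch below is one common greedy heuristic, assuming pairwise relatedness estimates are already available: repeatedly drop the sample involved in the most over-threshold pairs, breaking ties by ID. Sample IDs are hypothetical.

```python
from collections import Counter

def prune_related(samples, relatedness, max_rel=0.125):
    """Greedily drop samples until no retained pair has relatedness > max_rel.

    relatedness: dict mapping (id1, id2) -> estimated coefficient of relatedness.
    """
    keep = set(samples)
    while True:
        offending = [(a, b) for (a, b), r in relatedness.items()
                     if r > max_rel and a in keep and b in keep]
        if not offending:
            return keep
        counts = Counter(x for pair in offending for x in pair)
        # drop the sample in the most over-threshold pairs (ties broken by sorted ID)
        worst = max(sorted(counts), key=lambda s: counts[s])
        keep.remove(worst)

if __name__ == "__main__":
    rel = {("a", "b"): 0.5, ("b", "c"): 0.25, ("c", "d"): 0.01}
    print(sorted(prune_related(["a", "b", "c", "d"], rel)))
```

Removing the most-connected sample first tends to keep more of the cohort than dropping one member of each related pair at random.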
29
Results
- Longevity variants did not differ in frequency between the Wellderly and controls (ITMI or the 1000 Genomes European sample)
- No difference in genetic risk for cancer, stroke or type 2 diabetes
- Lower genetic risk for Alzheimer disease (P = 9.84×10^-4) and coronary heart disease (P = 2.54×10^-3)
- No common variants associated in GWAS, correcting for population stratification
- Top region contained SNPs associated with cognitive traits
- Rare (< 0.5% frequency) pathogenic variants for monogenic disease, cancer and hereditary dementia were unrelated to Wellderly status
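Disease "genetic risk" comparisons like these are often summarised with an additive polygenic risk score: the sum of risk-allele dosages weighted by published per-SNP effect sizes. The slides do not say how risk was computed, so this is an assumed, illustrative approach; the dosages and effect sizes in the demo are hypothetical.

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """Additive score per person: sum over SNPs of dosage (0-2) x effect size (e.g. log OR)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(effect_sizes, dtype=float)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    effects = rng.normal(0, 0.05, 100)                   # hypothetical per-SNP log ORs
    freqs = rng.uniform(0.1, 0.9, 100)
    group_a = rng.binomial(2, freqs, size=(500, 100))    # e.g. one cohort
    group_b = rng.binomial(2, freqs, size=(700, 100))    # e.g. a comparison cohort
    prs_a = polygenic_risk_score(group_a, effects)
    prs_b = polygenic_risk_score(group_b, effects)
    print(f"mean PRS: {prs_a.mean():.3f} vs {prs_b.mean():.3f}")
```

Group differences in such a score would then be tested formally, for example with logistic regression of group membership on the score plus ancestry covariates.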
30
Results
Rare coding variants (MAF < 1%) tested using the SKAT-O method
- No genome-wide significant associations (correcting for 10,447 individual gene tests)
- Top gene was COL25A1 (P = 1.56×10^-5)
- 9 ultra-rare variants carried by 10 Wellderly individuals: 8 observed as singletons and 1 observed in two individuals
- None of these variants were observed in the ITMI sample
- Many of the mutations result in highly non-conservative amino acid substitutions
- COL25A1 encodes a brain-specific, secreted collagenous protein associated with amyloid plaques
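SKAT scores each variant's association with the phenotype and sums their weighted squares, so risk and protective variants do not cancel; SKAT-O additionally blends this with a burden test and computes analytic p-values, which this sketch does not do. Below is the plain SKAT statistic Q = Σ_j w_j (Σ_i g_ij (y_i − ȳ))² with the default Beta(1, 25) MAF weights (√w_j = Beta density at MAF_j), assuming an intercept-only null model (no covariates) so that a simple permutation p-value is valid. Real analyses should use a dedicated SKAT implementation.

```python
from math import gamma
import numpy as np

def skat_q(y, genotypes, a1=1.0, a2=25.0):
    """SKAT score statistic for one gene under an intercept-only null model.

    y: (n,) phenotype; genotypes: (n, p) minor-allele counts (0/1/2).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(genotypes, dtype=float)
    maf = g.mean(axis=0) / 2
    beta_const = gamma(a1 + a2) / (gamma(a1) * gamma(a2))
    sqrt_w = beta_const * maf ** (a1 - 1) * (1 - maf) ** (a2 - 1)   # up-weights rarer variants
    score = g.T @ (y - y.mean())                                   # per-variant score
    return float(np.sum((sqrt_w * score) ** 2))

def skat_permutation_p(y, genotypes, n_perm=999, seed=0):
    """Permutation p-value: shuffle phenotypes, count Q at least as large as observed."""
    rng = np.random.default_rng(seed)
    q_obs = skat_q(y, genotypes)
    hits = sum(skat_q(rng.permutation(y), genotypes) >= q_obs for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    g = rng.binomial(2, 0.005, size=(400, 9))                # hypothetical rare variants
    y = np.r_[np.ones(200), np.zeros(200)]                   # e.g. case/control coding
    print(f"Q = {skat_q(y, g):.2f}, permutation p = {skat_permutation_p(y, g, 199):.3f}")
```

For reference, a Bonferroni threshold over 10,447 gene tests is 0.05 / 10,447 ≈ 4.8×10^-6, which the top P of 1.56×10^-5 does not reach.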
31
Prospects of NGS
- Hundreds of terabytes of data
- SNPs and structural variants
- Meta-analyses to increase power
- Functional genomics: gene expression profiling, genome annotation, small ncRNA discovery and profiling, and detection of aberrant transcription
32
Questions?