Genome Analysis: Future Directions
Christian Marshall
Genome Diagnostics
[Figure: diagnostic methods placed by total size of genomic change (axis ticks: 1bp, 100bp, 1kb, 10kb, 100kb, 1Mb, >10Mb) and by scope (genome wide or targeted)]
CYTOGENETICS
KARYOTYPE: balanced or unbalanced
MICROARRAY: copy number variants
FISH: targeted rearrangements
MOLECULAR GENETICS
SANGER: SNV + indels
MLPA: exonic CNVs
GENOTYPING: targeted variants
NGS PANELS: SNV + indels + exonic CNV
WES: SNV + indels + CNV?
WGS: SNV + indels + CNV + inversions and translocations
Future shift and merging of molecular genetics and cytogenetics -> genome diagnostics
Annual Microarray Test Volumes at SickKids
Indication: neurodevelopmental disorders and/or congenital anomalies.
3-fold increase in diagnostic yield (~10-12%) versus G-banding.
Genome-wide screen, but the majority of patients do not reach a diagnosis.
[Figure: number of tests reported per year]
Whole Genome Sequencing in Diagnostics
The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA).
SickKids runs >4000 microarrays per year, with ~600 referred through the SickKids Division of Clinical and Metabolic Genetics, where the diagnostic yield is ~10-12%.
Negative microarray cases are often sent for clinical WES (diagnostic yield ~25%).
Aim: To investigate the diagnostic yield of WGS for patients referred for microarray in a pediatric hospital.
Study Design: Perform WGS in parallel with CMA on 100 consecutive prospective patients referred for neurodevelopmental disorders and/or congenital malformations.
Diagnostic Yield of WGS compared to conventional testing
CMA identified a clinically significant CNV in 8% of patients.
An average of two additional routine genetic tests (single gene/gene panel) were ordered by physicians, increasing the diagnostic yield to 13%.
WGS resulted in a diagnosis in 34% of patients. How did it do for CNV detection?
[Figure: Diagnostic Yield (% of cases diagnosed): WGS 34%, CMA + routine genetic testing 13%, CMA 8%; p=1.42e-05]
Stavropoulos et al 2016
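The fold increases implied by these yields can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `fold_increase` is made up for illustration, and the inputs are just the percentages quoted on the slide (the cohort is 100 patients, so percentages equal case counts).

```python
def fold_increase(new_yield: float, baseline_yield: float) -> float:
    """Ratio of two diagnostic yields (both in percent)."""
    if baseline_yield <= 0:
        raise ValueError("baseline yield must be positive")
    return new_yield / baseline_yield


# Yields quoted on the slide, in percent of the 100-patient cohort.
yields = {"CMA": 8, "CMA + routine genetic testing": 13, "WGS": 34}

for baseline in ("CMA", "CMA + routine genetic testing"):
    ratio = fold_increase(yields["WGS"], yields[baseline])
    print(f"WGS vs {baseline}: {ratio:.2f}-fold")
```

Under these numbers WGS is a bit over 4-fold above CMA alone and a bit over 2.5-fold above CMA plus routine testing.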
Binned Size and CNV counts for WGS and CMA experiments
              CMA      CNV     SV
#/sample      ~
Mean size     366kb    30kb    14kb
Median size   116kb    10kb    496bp
CNV detection from CG WGS comes from both read depth average and read based (mate pair mapping and split reads) methods.
The large majority of CNVs found in WGS are beyond the resolution of CMA -> convergence at larger sizes.
[Figure: WGS CNV counts by binned size]
Stavropoulos et al 2016
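To make the read depth idea concrete, here is a toy sketch, not the pipeline used in the study. It assumes depth has already been averaged over fixed windows, compares each window to the sample median, and merges adjacent windows with the same call. The function names and both log2 thresholds are illustrative assumptions.

```python
import math
from itertools import groupby
from statistics import median


def call_windows(window_depths, loss_log2=-0.6, gain_log2=0.4):
    """Label each window as loss, gain or neutral from its log2 depth ratio.

    Thresholds are assumed values: a heterozygous deletion gives a
    log2 ratio near -1, a heterozygous duplication near +0.58.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in window_depths:
        if depth <= 0:
            calls.append("loss")  # no reads at all: treat as a loss
            continue
        ratio = math.log2(depth / baseline)
        if ratio <= loss_log2:
            calls.append("loss")
        elif ratio >= gain_log2:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


def merge_calls(calls):
    """Merge runs of identical non-neutral calls into (state, first, last)."""
    segments = []
    index = 0
    for state, run in groupby(calls):
        length = len(list(run))
        if state != "neutral":
            segments.append((state, index, index + length - 1))
        index += length
    return segments


depths = [30, 31, 29, 15, 14, 30, 46, 45, 30]
print(merge_calls(call_windows(depths)))
```

Read based methods (mate pair mapping, split reads) work differently and are what give breakpoint-level resolution for the small events; they are not covered by this sketch.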
Comparison of clinically relevant CNV detection from array and WGS
CMA (Chromosome Microarray): 647 Total CNVs -> [Freq >= 3%] -> 294 CNVs -> [>= 70% Overlap Seg dups] -> 149 CNVs
WGS (CNVs by Sequence Depth): 24,529 Total CNVs -> [Freq >= 3%] -> 1,622 CNVs -> [>= 70% Overlap Seg dups] -> 1,261 CNVs
Overlap between CMA and WGS: 85%
All pathogenic CNVs detected, but 15% of rare variants not detected.
The majority of CNVs unique to WGS were below the resolution of CMA. None were diagnostic, but carriers existed (e.g. heterozygous deletion in the CLN3 gene).
Stavropoulos et al 2016
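The two filter steps on this slide can be sketched as follows. The slide does not say which side of each threshold is kept, so this sketch assumes that CNVs with population frequency >= 3% and CNVs with >= 70% of their length inside segmental duplications are removed. The `Cnv` class, the function names and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    frequency: float  # population frequency as a fraction (0.03 = 3%)


def overlap_fraction(cnv, regions):
    """Fraction of the CNV covered by regions on the same chromosome.

    `regions` is a list of (chrom, start, end) tuples assumed not to
    overlap each other, so covered bases can simply be summed.
    """
    covered = 0
    for chrom, start, end in regions:
        if chrom != cnv.chrom:
            continue
        covered += max(0, min(cnv.end, end) - max(cnv.start, start))
    return covered / (cnv.end - cnv.start)


def keep_rare_unique(cnvs, seg_dups, max_freq=0.03, max_overlap=0.70):
    """Keep CNVs below both thresholds (assumed direction of the filter)."""
    return [
        c for c in cnvs
        if c.frequency < max_freq and overlap_fraction(c, seg_dups) < max_overlap
    ]


seg_dups = [("chr1", 1000, 2000)]
cnvs = [
    Cnv("chr1", 900, 1900, 0.01),    # 90% inside the seg dup: dropped
    Cnv("chr1", 0, 1500, 0.01),      # about 33% inside: kept
    Cnv("chr2", 1000, 2000, 0.05),   # common (5%): dropped
    Cnv("chr2", 1000, 2000, 0.001),  # rare, no seg dup: kept
]
print(keep_rare_unique(cnvs, seg_dups))
```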
Summary of WGS as a diagnostic Test
In 26% of patients, WGS revealed clinically significant SNV or indel mutations presenting as:
Dominant (63%; including variants in EP300, GDF5, PIK3R2, PACS1, CCM2, SPTAN1, CBL)
Recessive (37%; including variants in PANK2, LARP7, TSEN54 and NGLY1)
We found that 4/100 (4%) of cases had variants in at least two loci causing distinct genetic disorders.
[Figure: Sequence Mutations Mode of Inheritance (63% dominant, 37% recessive)]
[Figure: Distribution of Variant Types Among Patients (SNV/Indel, CNV, SNV/Indel + CNV, 2 Genes)]
Stavropoulos et al 2016
Take home messages from study
WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA alone (8%) and a >2-fold increase over CMA plus targeted gene sequencing (13%).
WGS identified all rare clinically significant CNVs that were detected by CMA and all pathogenic SNVs detected through targeted sequencing.
WGS detected clinically relevant structural variants missed by CMA and/or targeted sequencing.
We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harbouring both a pathogenic CNV and a pathogenic SNV.
What about the 66% of cases without a diagnosis? What did we learn?
Identification and Interpretation of CNVs and structural variants from WGS
Venter genome (with Venter team: Science 2003, Nature Genetics 2006, PLOS Biology 2007, Genome Biology 2010)
3,213,401 bp (0.11%) variable due to SNVs
40,568,593 bp (807,999 events; 1.35%) variable due to CNV/indel
[Figure: #/category]
Rare 1.7 kb exonic deletion in SCN2A
Through WGS, we detected a rare 1.7kb exonic loss at SCN2A in a proband with autism.
SCN2A encodes a voltage-gated sodium channel; the gene is linked to seizure disorders.
[Figure: 1.7kb deletion; exon length: 155bp]
Yuen et al 2013
Rare 1.7 kb exonic deletion in SCN2A
Both affected females carry the ‘de novo’ deletion in exon 18, which causes a frameshift of the rest of the protein.
Confirmed by two probes in exon 18 using qPCR.
Germline mosaic or more complex rearrangement?
[Figure: pedigree with both affected females labelled "SCN2A 1.7kb del"]
Yuen et al 2013
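Why losing this exon shifts the reading frame comes down to divisibility by three. The sketch below assumes the 1.7kb loss removes exactly the 155bp exon quoted on the earlier slide from the mature transcript and nothing else from the coding sequence; the function name is made up for illustration.

```python
def exon_loss_effect(exon_length_bp: int) -> str:
    """Classify the loss of a whole coding exon by its length modulo 3."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    # A length that is not a multiple of 3 shifts every downstream codon.
    return "in-frame" if exon_length_bp % 3 == 0 else "frameshift"


# Exon length quoted on the slide: 155 = 51 * 3 + 2, so two bases are left over.
print(155 % 3, exon_loss_effect(155))
```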
Rare 1.7 kb exonic deletion in SCN2A
Sanger sequencing showed that the variant is inherited from the father (but balanced).
Need longer read technologies and better strategies to find complex rearrangements in WGS.
[Figure: Sanger traces for proband and father at the breakpoint of the deletion]
Yuen et al 2013
De novo Genome Assembly needed to get ‘true’ individual genome
Need longer read technologies and better strategies to find complex rearrangements in WGS.
Interpretation of the Whole Genome Sequence
Clinical Features: 20 year old male presenting with:
Multiple parenchymal cavernomas
Developmental delay
Hydrocephalus
Family history involving father and paternal uncle; father with cavernous hemangioma
Stavropoulos et al 2016
Interpretation of the Whole Genome Sequence
WGS detected a heterozygous LOF variant (p.Gly352Valfs*2) in CCM2. It is paternally inherited and explains the cerebral cavernous malformation (CCM).
Does it explain the rest of the phenotype?
Stavropoulos et al 2016
Interpretation of the Whole Genome Sequence
WGS also detected a de novo 1.16Mb deletion at 8q22.1, explaining the other features.
Stavropoulos et al 2016
Individualized Annotation
Clinical Notes: Male 11 year old patient with lipodystrophy and developmental delay (speech delay). Has also developed umbilical hernia, anxiety disorder, renal microcysts and hyperkalemia.
Human Phenotype Ontology terms:
Lipodystrophy
Umbilical hernia
Renal cyst
Global developmental delay
Hyperkalemia
Standardize clinical descriptions and use them to prioritize variants.
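One simple way to use standardized terms for prioritization is to score each candidate gene by how much its known phenotype terms overlap the patient's. The sketch below uses plain set overlap (Jaccard index) on term names. The gene names `GENE_A` to `GENE_C` and their term sets are invented placeholders, not real associations, and tools used in practice typically apply ontology-aware similarity rather than exact matching.

```python
def rank_genes(patient_terms, gene_to_terms):
    """Rank genes by Jaccard overlap between patient and gene term sets."""
    patient = set(patient_terms)
    scored = []
    for gene, terms in gene_to_terms.items():
        gene_terms = set(terms)
        union = patient | gene_terms
        score = len(patient & gene_terms) / len(union) if union else 0.0
        scored.append((gene, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Patient terms as listed on the slide (term names only).
patient_terms = [
    "Lipodystrophy",
    "Umbilical hernia",
    "Renal cyst",
    "Global developmental delay",
    "Hyperkalemia",
]

# Hypothetical gene-to-phenotype annotations, for illustration only.
gene_to_terms = {
    "GENE_A": {"Lipodystrophy", "Hyperkalemia", "Renal cyst"},
    "GENE_B": {"Global developmental delay", "Seizure"},
    "GENE_C": {"Hearing impairment"},
}

for gene, score in rank_genes(patient_terms, gene_to_terms):
    print(f"{gene}: {score:.2f}")
```

A scheme like this is only as good as its inputs: a missing or imprecise term on either side lowers the score of the true gene.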
Phenotype can be noisy (both OMIM and clinician input)
Whole genome sequencing was combined with comprehensive phenotyping (Phenotips).
Diagnostic rate of 34%, with 26 patients carrying a causative SNV.
HPO terms were used to prioritize genes, but the diagnostic variant was often NOT in the prioritized list.
Cause: noisy phenotype information or noisy OMIM phenotype.
WGS will not have all the answers
Two cases had a molecular diagnosis not detected by WGS:
UPD14 (heterodisomy) [need parents]
Russell-Silver Syndrome (hypomethylation of H19 locus) [need methylation status]
Ancillary tests may be needed as an overlay for genomic sequencing:
Methylation tests
Gene expression tests
Stavropoulos et al 2016
Interpretation of non-coding variants: overlaying gene expression on genome
Combining RNA sequencing with genome sequencing
Hernan Gonorazky, Jennifer Orr, Peter Ray, James Dowling
A deep intronic T>G variant in DMD creates a novel splice acceptor site, which then pairs with a cryptic splice donor site in the intron to create an aberrant 51 base exon.
[Figure: RNA sequencing, control versus case]
Interpretation of non-coding variants
DNMs in non-coding regions contribute to ASD
Xiong et al. Science (2015)
De novo variants in non-coding regions contribute to ASD
[Figure: percentage of DNMs, intronic/intergenic variants versus exonic variants; * : p<0.05, ** : p<0.005]
Yuen et al submitted
Detection of Somatic changes in the genome
[Table: de novo SNVs, de novo indels and de novo CNVs, in Genomic and Exonic columns, with a Somatic section in which de novo indels and de novo CNVs are NA]
[Figure: allelic ratio (%); 33%]
Somatic changes can be detected by WGS and some may be clinically relevant.
Yuen et al submitted
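An allelic ratio well below 50% is what distinguishes a candidate somatic change from a germline heterozygous one, and whether that difference is detectable depends on read depth. A minimal sketch, assuming reads are independent draws and a germline heterozygote has an expected alternate allele fraction of 0.5; the function names and the `alpha` cutoff are illustrative choices, not taken from the slides.

```python
from math import comb


def binom_lower_tail(alt_reads: int, total_reads: int, p: float = 0.5) -> float:
    """P(X <= alt_reads) for X ~ Binomial(total_reads, p)."""
    return sum(
        comb(total_reads, k) * p**k * (1 - p) ** (total_reads - k)
        for k in range(alt_reads + 1)
    )


def looks_mosaic(alt_reads: int, total_reads: int, alpha: float = 0.01) -> bool:
    """True if the alt allele fraction is significantly below 50%."""
    return binom_lower_tail(alt_reads, total_reads) < alpha


# The same ~33% allelic ratio at two different depths.
for alt, total in [(10, 30), (33, 100)]:
    tail = binom_lower_tail(alt, total)
    print(f"{alt}/{total} reads: p={tail:.4f}, mosaic call={looks_mosaic(alt, total)}")
```

At 30 reads a 33% ratio cannot be separated from a heterozygous site at this cutoff, while at 100 reads it can.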
Detection of clinically relevant Somatic changes in the genome
Genomics Era: Sequencing of Genomes
US: 1 million genomes
UK: 100,000 genomes
The next few years will see an explosion in the number of genomes sequenced.
Available SNV and CNV data from diverse populations (e.g. ExAC) is crucial for interpreting genomes.
Summary and Future Directions
1. Structural Variation: new technologies, algorithms and strategies are needed to more accurately identify all variation in a genome
2. De novo Genome Assembly: new (cost effective) methods are needed to get the true complement of a single genome
3. ~5% of individuals tested will have two different genetic syndromes, and CNVs will contribute to this: full medical interpretation of a genome requires detection and interpretation of all variation
4. Individualized annotation and individualized variant prioritization based on phenotypic input is currently noisy and does not work well
5. Ancillary tests like methylation and RNA expression can help interpret non-coding variants, as can new splicing algorithms
6. Somatic and mosaic variants are difficult to identify: read depth and algorithms are needed to enable accurate detection
7. The best way to interpret and annotate genomes is to sequence more genomes