An Introduction to molecular diagnostics Dr Catherine Cargo HMDS
Overview
- Basic molecular biology
- Mutations
- DNA sequencing
- Sanger sequencing
- Next generation/high throughput sequencing
- Impact on haematology
What is deoxyribonucleic acid (DNA)?
- DNA is a nucleic acid that contains all of our genetic make-up, inherited from each parent
- Located in the nucleus of every (nucleated) cell
What are the DNA ‘building blocks’?
- Double-stranded structure: a sugar (deoxyribose) phosphate backbone and four nitrogenous bases forming a ‘genetic code’
- The four nucleotide bases are:
  - Purines: adenine (A) and guanine (G)
  - Pyrimidines: cytosine (C) and thymine (T)
- The two strands are held together by hydrogen bonds
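The base-pairing rule behind the double strand (A with T, G with C) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sequence is assumed to contain only the letters A, C, G and T.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of the given strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand as it would be read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```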
Human genome
- 3 billion base pairs
- Arranged into 46 chromosomes
- Information carried in pieces of DNA called genes
- Only 1.5% of the human genome is protein coding (exons)
- Some DNA sequences that do not code for protein may still be involved in the regulation of gene expression or play structural roles in chromosomes
Human genetic variation
- All humans are on average 99.5% similar to other humans
- 1000 Genomes Project: “a typical individual genome differs from the reference human genome at 4.1 million to 5.0 million sites … affecting 20 million bases of sequence”
- Mostly single nucleotide polymorphisms (SNPs)
Central dogma of genetics
- DNA sequence -> transcribed -> translated -> protein synthesis
- The flow is unidirectional
- Genes are simply the codes for making proteins
Transcription
- Occurs in the nucleus
- RNA polymerase separates the DNA strands and synthesises a complementary RNA copy from one of the DNA strands
- The RNA strand contains the nucleotide uracil (U) wherever thymine would have occurred in DNA
  - Thymine is identical to methylated uracil (5-methyluracil)
  - RNA is short lived, so any uracil-related errors do not lead to lasting damage
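As a rough sketch of the copying rule described above, the following Python builds the complementary RNA from a DNA template strand, placing uracil wherever thymine would otherwise appear. The function name is hypothetical, and strand direction (5'/3') is deliberately ignored to keep the example short.

```python
# Template DNA base -> complementary RNA base (A on the template pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the RNA complement of a DNA template strand (orientation ignored)."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


print(transcribe("TTC"))  # AAG
print(transcribe("ATC"))  # UAG
```

The two example triplets are the same DNA/mRNA pairs used in the point mutation table later in the talk.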
Transcription
- DNA -> transcription (RNA polymerase) -> pre-mRNA: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3
- Pre-mRNA -> 5’ capping, RNA splicing, 3’ polyadenylation -> mRNA with a poly(A) tail, AAAA(150-250)
- mRNA is transported to the cytosol, where it is translated into protein
Translation
- Process in which cellular ribosomes create proteins
- mRNA is decoded by a ribosome, which reads the RNA sequence by base pairing the messenger RNA to transfer RNAs that carry amino acids
- Based on 3 letter ‘words’ called codons
- Produces a specific amino acid chain, or polypeptide
- The polypeptide later folds into an active protein and performs its function in the cell
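Reading an mRNA three letters at a time can be sketched as below. The codon table is deliberately partial (a handful of entries from the standard genetic code, enough for the example), the function name is hypothetical, and codons missing from the table are reported as "Xaa" rather than guessed.

```python
# Partial codon table: a few entries from the standard genetic code.
CODON_TABLE = {
    "AUG": "Met", "AAA": "Lys", "AAG": "Lys", "AGG": "Arg", "ACG": "Thr",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Decode an mRNA codon by codon until a stop codon or the end of the sequence."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):          # step through complete codons only
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "Xaa")  # Xaa = not in this partial table
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


print(translate("AUGAAGACGUAGAAA"))  # ['Met', 'Lys', 'Thr']
```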
DNA damage
- DNA is prone to damage from environmental insults and also during DNA replication
- Protective mechanisms are in place to eliminate detrimental abnormalities:
  - DNA repair through various pathways
  - Proof-reading by DNA polymerase
- If these pathways fail, the cell’s final ‘line of defence’ is apoptosis (i.e. cell death)
Survival advantage
- Genes that regulate cell growth and differentiation are altered, e.g. a gene that is part of the protective process (a ‘tumour suppressor’)
- Having evaded protection, the cell may be open to more abnormalities, becoming increasingly unstable: ‘genetic instability’
What is a mutation?
- Permanent alteration of the nucleotide sequence of the genome
- Mutations may be:
  - Inherited
  - Errors during DNA replication
  - Introduced during DNA repair
  - Induced: by chemicals, or by physical agents such as radiation from UV rays/X-rays and extreme heat
- Damaging effects of a DNA mutation are observed in the protein
- Germline: can be passed on to descendants through reproductive cells; present in all cells
- Somatic: acquired; not inherited from a parent or passed on to offspring
How do DNA mutations cause malignancy?
- Specific tissue
- More than 1 abnormality is usually necessary for carcinogenesis
(Steensma et al, Blood, 2015)
Myeloid Malignancy (adapted from Cazzola et al, 2013)
- DNA Methylation: TET2, DNMT3A, IDH1/2
- RNA Splicing: SF3B1, SRSF2, U2AF1, ZRSR2
- Chromatin Modification: ASXL1, EZH2
- Signalling: KRAS, NRAS, FLT3, KIT, CBL
- Receptors/Kinases: JAK2
- Transcription: RUNX1, BCOR
- Tumour Suppressors: TP53, WT1
- Cohesin: STAG2
Hypermethylation of genes leads to transcription silencing (gene inactivation)
Types of mutations
- 3 main types of mutations: substitutions, insertions, deletions
- They have varying effects on the codons (‘3 letter words’) read during translation
- Original: THE CAT ATE THE RAT
- Substitution: THE BAT ATE THE RAT
- Insertion (S added): TSH ECA TAT ETH ERA
- Deletion (A removed): THE CTA TET HER AT
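The sentence analogy can be reproduced with a short Python sketch, which shows why an insertion or deletion shifts every later 3 letter word while a substitution changes only one. The helper names are hypothetical, and positions are 0-based indexes into the sentence with spaces removed.

```python
def read_frames(letters: str, size: int = 3) -> str:
    """Split a run of letters into consecutive words of `size` letters."""
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


def substitute(letters: str, pos: int, new: str) -> str:
    return letters[:pos] + new + letters[pos + 1:]


def insert(letters: str, pos: int, new: str) -> str:
    return letters[:pos] + new + letters[pos:]


def delete(letters: str, pos: int) -> str:
    return letters[:pos] + letters[pos + 1:]


sentence = "THECATATETHERAT"
print(read_frames(sentence))                      # THE CAT ATE THE RAT
print(read_frames(substitute(sentence, 3, "B")))  # THE BAT ATE THE RAT
print(read_frames(insert(sentence, 1, "S")))      # TSH ECA TAT ETH ERA T
print(read_frames(delete(sentence, 4)))           # THE CTA TET HER AT
```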
Types of mutations: point mutations

               No mutation  Silent  Nonsense  Missense (conservative)  Missense (non-conservative)
DNA level      TTC          TTT     ATC       TCC                      TGC
mRNA level     AAG          AAA     UAG       AGG                      ACG
Protein level  Lys          Lys     STOP      Arg                      Thr
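The categories in this slide can be expressed as a small decision rule on the reference and mutated codons. The sketch below covers only the five mRNA codons shown on the slide. The side-chain grouping used to separate conservative from non-conservative changes is a simplification chosen for illustration, and the function name is hypothetical.

```python
# Only the codons that appear on the slide (mRNA level).
CODONS = {"AAG": "Lys", "AAA": "Lys", "UAG": "STOP", "AGG": "Arg", "ACG": "Thr"}

# Simplified side-chain classes, used here to judge whether a change is conservative.
SIDE_CHAIN = {"Lys": "basic", "Arg": "basic", "Thr": "polar uncharged"}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change, assuming the reference codon encodes an amino acid."""
    if ref_codon == alt_codon:
        return "no mutation"
    ref_aa, alt_aa = CODONS[ref_codon], CODONS[alt_codon]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "STOP":
        return "nonsense"
    if SIDE_CHAIN[ref_aa] == SIDE_CHAIN[alt_aa]:
        return "missense (conservative)"
    return "missense (non-conservative)"


for alt in ("AAG", "AAA", "UAG", "AGG", "ACG"):
    print(alt, classify("AAG", alt))
```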
DNA sequencing
- An essential tool in the molecular biology toolkit is the ability to read the base sequence of DNA molecules
- Fred Sanger developed an elegant method to sequence DNA using the DNA polymerase enzyme (for which he was awarded the Nobel Prize in 1980)
- The Sanger method is also known as the chain termination method
- Takes advantage of the process of DNA synthesis
Sanger Method
- Chain-terminator method
- Key principle: use of dideoxynucleotide triphosphates (ddNTPs) as chain terminators
- DNA is divided into 4 separate sequencing reactions, each containing dATP, dGTP, dCTP, dTTP and DNA polymerase
- To each reaction is added 1 of 4 dideoxynucleotides: ddATP, ddGTP, ddCTP or ddTTP
- Results in DNA fragments of varying length
- Fragments are heat denatured and separated by gel electrophoresis
- Autoradiography
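The logic of the four reactions can be illustrated with an idealised Python sketch. It assumes that each ddNTP reaction yields one fragment for every position where that base is incorporated into the newly synthesised strand, so sorting all fragments by length (as the gel does) spells out the sequence. Real reactions are stochastic; the function names and the example sequence are hypothetical.

```python
def sanger_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the fragment lengths it can produce."""
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        # Synthesis can terminate here only in the reaction containing this ddNTP.
        reactions[base].append(length)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Read the four lanes from shortest to longest fragment to recover the sequence."""
    lane_by_length = {
        length: base for base, lengths in reactions.items() for length in lengths
    }
    return "".join(lane_by_length[length] for length in sorted(lane_by_length))


lanes = sanger_fragments("GATTACA")
print(lanes["A"])       # [2, 5, 7]
print(read_gel(lanes))  # GATTACA
```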
Sanger vs. NGS
- Sanger sequencing: one sample, one amplicon & one sequence
- NGS sequencing: multiple samples (48), hundreds of amplicons/fragments (361), millions of sequences (40,000,000 paired end reads) & 4 x 10^9 bases
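A quick back-of-envelope check of the NGS figures, as a sketch only. It assumes the 40,000,000 figure counts individual reads and that reads are spread evenly across every sample and amplicon, which a real run will not achieve exactly.

```python
samples = 48
amplicons = 361
reads = 40_000_000
bases = 4 * 10**9

# Average read length implied by the totals.
bases_per_read = bases / reads
print(bases_per_read)  # 100.0

# Average number of reads per amplicon per sample, if reads were evenly distributed.
reads_per_amplicon_per_sample = reads / (samples * amplicons)
print(round(reads_per_amplicon_per_sample))  # 2308
```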
NGS = reduced sequencing costs
NGS: The basics
- Library Preparation
- Cluster generation / bead capture
- Sequencing
- Data analysis (Bioinformatics)
Library Preparation The Library is a pooled tube of ALL the barcoded DNA fragments from ALL the patients.
Individual samples barcoded
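Because every fragment in the pooled library carries its patient's barcode, reads can be sorted back to the right sample after sequencing. A minimal sketch follows, assuming the barcode is simply the first few bases of each read and all barcodes have the same length. The barcode sequences, sample names and function name are all made up for illustration.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading barcode and strip the barcode off."""
    barcode_length = len(next(iter(barcodes)))  # assumes equal-length barcodes
    by_sample = defaultdict(list)
    for read in reads:
        tag, fragment = read[:barcode_length], read[barcode_length:]
        by_sample[barcodes.get(tag, "undetermined")].append(fragment)
    return dict(by_sample)


barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}  # hypothetical barcodes
pooled = ["ACGTGATTACA", "TGCACCGG", "ACGTTTTT", "GGGGAAAA"]
print(demultiplex(pooled, barcodes))
```

Reads whose barcode matches no sample are kept under "undetermined" rather than discarded, so they can still be counted.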
The black box…..
Low throughput vs. high throughput workflow: sample preparation -> assay -> interpretation of results
Analysis pipeline
- Extract DNA
- Library prep
- Sequence
- Coverage
- Alignment
- Read-level QC
- Variant detection
- CNV detection
Burrows-Wheeler alignment
- Each fragment/amplicon will generate its own SAM/BAM file
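Burrows-Wheeler aligners index the reference using the Burrows-Wheeler transform (BWT), a reversible reordering of the sequence. The sketch below shows only the transform and its inverse in their simplest form. It is not an aligner, it is far too slow for a genome, and it assumes the end marker "$" does not occur in the input.

```python
def bwt(text: str, end: str = "$") -> str:
    """Burrows-Wheeler transform: last column of the sorted rotations of text + end marker."""
    text += end
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, end: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in last_column:
        table = sorted(char + row for char, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]  # drop the end marker


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("GATTACA")))  # GATTACA
```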
Coverage and depth
- IGV coverage plots (pile up)
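Depth at a position is the number of aligned reads that overlap it, which is what a pile-up plot displays. A minimal sketch, assuming each alignment is given as a 0-based start and an exclusive end within a single region. The function name and the example coordinates are hypothetical.

```python
def depth_per_position(alignments: list[tuple[int, int]], region_length: int) -> list[int]:
    """Count how many aligned reads cover each position of the region."""
    depth = [0] * region_length
    for start, end in alignments:  # 0-based start, end-exclusive
        for position in range(max(start, 0), min(end, region_length)):
            depth[position] += 1
    return depth


reads = [(0, 4), (2, 6), (3, 5)]
depth = depth_per_position(reads, region_length=6)
print(depth)                    # [1, 1, 2, 3, 2, 1]
print(sum(depth) / len(depth))  # mean depth across the region
```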
Detecting variation
- Able to detect SNPs, indels and copy number variation (CNV)
Filtering
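One common kind of filter keeps a candidate variant only if enough reads cover the site and a large enough fraction of them carry the variant (the variant allele frequency, VAF). The sketch below is an illustration, not the pipeline described in this talk: the thresholds, field names and read counts are all invented, and gene names are used only as labels.

```python
def variant_allele_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    return alt_reads / depth if depth else 0.0


def filter_calls(calls: list[dict], min_depth: int = 100, min_vaf: float = 0.05) -> list[dict]:
    """Keep calls with sufficient depth and VAF; thresholds are illustrative assumptions."""
    kept = []
    for call in calls:
        vaf = variant_allele_frequency(call["alt_reads"], call["depth"])
        if call["depth"] >= min_depth and vaf >= min_vaf:
            kept.append({**call, "vaf": vaf})
    return kept


calls = [  # made-up counts for illustration
    {"gene": "TET2", "alt_reads": 250, "depth": 1000},
    {"gene": "JAK2", "alt_reads": 3, "depth": 900},
    {"gene": "TP53", "alt_reads": 20, "depth": 40},
]
print(filter_calls(calls))
```

Here the second call fails on VAF and the third on depth, so only the first is kept.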
Potential applications of NGS in haematology
- Diagnostic tool
- Objective evidence of disease
- Subclassification
- Prognostic tool
- Disease monitoring
- Identify targets for therapy (in future!)
How will this impact on you?
- Potential to greatly impact on patient care
  - Early diagnosis
  - Information on prognosis
  - Access to therapies
- Early diagnosis = more patients in clinic
- Patients will become increasingly aware of this
  - They will ask questions on the significance of mutations
  - Personalised medicine
- Quality results require quality samples!
Conclusions
- NGS is set to become a key tool in the diagnostic armoury of the laboratory
- NGS is cost effective, rapid, reliable and uses minimal amounts of clinical material
- Targeted sequence analysis of key myeloid and lymphoid driver mutations will form the basis of this testing
- Guiding treatment based on the identification of some of these driver mutations is already happening in the clinic
Fluidigm Access Array system: uses microfluidics
Fluidigm Principle
Library preparation: 27 genes, 370 amplicons
- DNA Methylation: TET2, DNMT3A, IDH1, IDH2
- Chromatin Modification: ASXL1, EZH2
- Splicing: SF3B1, SRSF2, U2AF1, ZRSR2
- Transcription Factors: NPM1, RUNX1, BCOR, WT1, TP53
- Signalling: FLT3, NRAS, KRAS, CBL, cKIT, JAK2, MPL, CSF3R, STAT3
- Cohesin complex: STAG2
- Other: SETBP1, CALR
1 run will perform nearly 18 thousand reactions; this would take weeks to do by old technology…
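The "nearly 18 thousand reactions" figure is consistent with a simple product, sketched here under two assumptions that the slide does not state: 48 samples per run (the number quoted on the Sanger vs. NGS slide) and one reaction per sample and amplicon pair.

```python
samples_per_run = 48  # assumption: taken from the Sanger vs. NGS slide
amplicons = 370       # from this slide

reactions_per_run = samples_per_run * amplicons
print(reactions_per_run)  # 17760
```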