Whole Genome Sequencing for Epidemiologists – A Brief Introduction Joel R Sevinsky, PhD
Objectives: microbial genomes; common isolate identification techniques using molecular biology; whole genome sequencing (WGS); example of WGS for outbreak investigation; questions.
Microbial Genomes: E. coli genome size varies from 4.56 to 5.70 Mb. How big is 5 Mb?
Harry Potter Story: Long story, some big books! 1,084,440 words in all seven books. Average word length ~5 letters. ~5,422,200 letters total in the box set. E. coli genomes range from 4.56 to 5.70 Mb. A single E. coli genome ≈ 1 box set. A single human genome ≈ 1,000 box sets!
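The box-set arithmetic can be checked with a short sketch. The word count, word length, and E. coli sizes come from the slide; the human genome size of about 3.1 billion base pairs (one haploid copy) is an assumption added here for illustration. With that figure the result is several hundred box sets, and counting both copies of the genome roughly doubles it, which is the same order of magnitude as the slide's round number.

```python
# Figures taken from the slide
WORDS_IN_BOX_SET = 1_084_440
AVG_LETTERS_PER_WORD = 5
letters_per_box_set = WORDS_IN_BOX_SET * AVG_LETTERS_PER_WORD


def box_sets(genome_size_bp: float) -> float:
    """Express a genome size (in base pairs) as a number of box sets."""
    return genome_size_bp / letters_per_box_set


ECOLI_MIN_BP = 4.56e6
ECOLI_MAX_BP = 5.70e6
# Assumption (not from the slides): haploid human genome of about 3.1 billion bp.
HUMAN_HAPLOID_BP = 3.1e9

print(f"Letters per box set: {letters_per_box_set:,}")
print(f"E. coli: {box_sets(ECOLI_MIN_BP):.2f} to {box_sets(ECOLI_MAX_BP):.2f} box sets")
print(f"Human (haploid, assumed size): {box_sets(HUMAN_HAPLOID_BP):.0f} box sets")
```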
PFGE (Pulsed Field Gel Electrophoresis) What do these bands really mean???
PFGE (Pulsed Field Gel Electrophoresis): [Figure: a 5 Mb genome cut at 1, 2, 3, or 4 restriction enzyme sites, with the resulting bands labeled 5 Mb, 1 Mb, and 0.5 Mb; axis: genome size in Mb]
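How cut sites turn into bands can be sketched as follows, assuming a circular genome (as in most bacteria), so that a single cut yields one full-length fragment. The function name and the cut positions are hypothetical and chosen only for illustration; they are not the positions behind the figure.

```python
def digest_circular(genome_size_bp: int, cut_sites: list[int]) -> list[int]:
    """Return fragment sizes (largest first) after cutting a circular genome.

    cut_sites are positions in [0, genome_size_bp). With no sites the circle
    stays intact, so the whole genome is returned as one piece.
    """
    sites = sorted(set(cut_sites))
    if not sites:
        return [genome_size_bp]
    fragments = []
    for i, start in enumerate(sites):
        end = sites[(i + 1) % len(sites)]
        # Modulo handles the fragment that wraps past the origin.
        size = (end - start) % genome_size_bp
        # A size of 0 only happens with a single site: one full-length piece.
        fragments.append(size if size else genome_size_bp)
    return sorted(fragments, reverse=True)


GENOME = 5_000_000  # 5 Mb, as on the slide
# Hypothetical cut positions, for illustration only
for sites in ([0], [0, 1_000_000], [0, 1_000_000, 1_500_000]):
    print(len(sites), "site(s):", digest_circular(GENOME, sites))
```

Each additional site adds one band, and the band sizes always sum to the genome size.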
Harry Potter Story: A specific word = a restriction enzyme site. Word frequency determines the banding pattern. Different words represent different enzymes. What does PFGE really tell you then?
Table 1. Frequency (n) of "Voldemort" by book
Book | Voldemort (n)
Sorcerer’s Stone | 31
Chamber of Secrets | 20
Prisoner of Azkaban | 37
Table 2. Frequency (n) of four words by book
Book | Broomstick (n) | Spell (n) | Wand (n) | Wizard (n)
Sorcerer’s Stone | 27 | 14 | 62 | 41
Chamber of Secrets | 12 | 6 | 107 | 44
Prisoner of Azkaban | 20 114 39 [one of the four values is missing]
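The word analogy can be made concrete with a minimal sketch: cut a text at every occurrence of a chosen word and record only the lengths of the pieces, which is roughly what a banding pattern captures. The two sentences below are invented for illustration (they are not quotations from the books), and `fragment_lengths` is a hypothetical helper.

```python
def fragment_lengths(words: list[str], site_word: str) -> list[int]:
    """Cut a word list at every occurrence of site_word.

    Returns the piece lengths (in words), largest first, ignoring empty pieces.
    """
    lengths, current = [], 0
    for word in words:
        if word == site_word:
            lengths.append(current)
            current = 0
        else:
            current += 1
    lengths.append(current)
    return sorted((n for n in lengths if n > 0), reverse=True)


# Invented toy texts: different content, same position of the word "wand"
text_a = "the wizard raised his wand and cast a spell".split()
text_b = "a broomstick flew past wand then it hit voldemort".split()

print(fragment_lengths(text_a, "wand"))    # pattern for "enzyme" wand
print(fragment_lengths(text_b, "wand"))    # same pattern, different text
print(fragment_lengths(text_a, "wizard"))  # a different "enzyme" gives a different pattern
```

Two texts with different content produce the same pattern when the chosen word sits in the same places, which is one way to read the slide's question: the pattern reflects where the sites are, not what lies between them.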
Isolate Identification Techniques
Technique | Full name | What is examined
Serotyping | - | Protein
PFGE | Pulsed Field Gel Electrophoresis | Total gDNA fragments
16S rRNA | Ribosomal RNA Sequencing | 1 gene
MLST | Multi Locus Sequence Typing | 7 genes
wgMLST | Whole Genome Multi Locus Sequence Typing | Thousands of reference genes plus pan genome
wgSNP or hqSNP | Whole Genome Single Nucleotide Polymorphism Typing | Total gDNA
The DNA sequencing methods (16S rRNA, MLST, wgMLST, wgSNP/hqSNP) provide increasing information; wgMLST and wgSNP/hqSNP require WGS.
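To illustrate how a 7-gene MLST comparison works, here is a minimal sketch that counts the loci at which two isolates carry different allele numbers. The locus names are those of the commonly used 7-gene E. coli scheme; the allele numbers and isolate names are hypothetical, made up for this example.

```python
# Loci of the commonly used 7-gene E. coli MLST scheme
MLST_LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def allele_differences(profile_a: dict[str, int], profile_b: dict[str, int]) -> int:
    """Count loci whose allele numbers differ between two isolates."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


# Hypothetical allele numbers, for illustration only
isolate_x = dict(zip(MLST_LOCI, (6, 6, 5, 136, 9, 7, 7)))
isolate_y = dict(zip(MLST_LOCI, (6, 6, 5, 136, 9, 7, 7)))
isolate_z = dict(zip(MLST_LOCI, (6, 4, 5, 136, 9, 7, 2)))

print(allele_differences(isolate_x, isolate_y))  # identical 7-gene profiles
print(allele_differences(isolate_x, isolate_z))  # profiles differing at two loci
```

wgMLST applies the same comparison across thousands of loci instead of seven, so two isolates with identical 7-gene profiles can still be told apart.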
Whole Genome Sequencing 40 box sets
Whole Genome Sequencing ATGCGTGATCTAGTAGTCTAGGAGCTGACCGATTA
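If "40 box sets" is read as sequencing depth (about 40 copies of the genome's worth of letters, an interpretation rather than something the slides state), the relationship between reads, read length, and genome size can be sketched as below. The 250-base read length is a hypothetical value chosen for the example.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Minimum number of reads to reach the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 5_000_000  # about one box set
READ_LENGTH_BP = 250        # hypothetical read length
TARGET = 40                 # "40 box sets", read here as 40x depth (assumption)

n = reads_needed(TARGET, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"{n:,} reads give {mean_coverage(n, READ_LENGTH_BP, GENOME_SIZE_BP):.0f}x coverage")
```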
WGS for Outbreak Investigations Salmonella enterica serovar Enteritidis JEGX01.004 JEGX01.002
WGS for Outbreak Investigations JEGX01.002 JEGX01.004
WGS for Outbreak Investigations: A = suspect isolate, same time/PFGE; B = same patient over 5 weeks
WGS for Outbreak Investigations: C = suspect isolate for outbreak 5; D = environmental isolate, egg farm swab
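The idea behind SNP-based comparison of isolates can be sketched as a pairwise count of differing positions in aligned sequences. The isolate names and the 10-base sequences below are hypothetical and do not represent isolates A to D or any real data; real hqSNP pipelines also filter low-quality positions, which this sketch does not attempt.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with anything other than A, C, G, or T (gaps, N) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        a != b and a in "ACGT" and b in "ACGT"
        for a, b in zip(seq_a, seq_b)
    )


# Hypothetical aligned sequences, for illustration only
aligned = {
    "isolate_1": "ATGCGTGATC",
    "isolate_2": "ATGCGTGATC",
    "isolate_3": "ATGCGTGTTC",
    "isolate_4": "TTGCGAGTTG",
}

for name_a, name_b in combinations(aligned, 2):
    print(name_a, name_b, snp_distance(aligned[name_a], aligned[name_b]))
```

Isolates separated by very few SNPs are more likely to share a recent common source than isolates separated by many, although the threshold depends on the organism and the outbreak.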
WGS Beyond Outbreak Investigations “…comparison of these 61 genomes sequences revealed that neither the 16S gene, nor the gene fragments usually used for MLST, provides biologically meaningful information on the relatedness of the sequenced isolates. The best way to analyze this is by taking into account all the genomic content, rather than looking at one or a few individual genes.”
WGS Beyond Outbreak Investigations: Genome size varies from 4.56 to 5.70 Mb. This size variation means two isolates can differ by about 1 Mb of genomic content (1.14 Mb at the extremes). 1 Mb ≈ 1,000 genes.
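As a rough check of that estimate, the slide's rule of thumb (1 Mb ≈ 1,000 genes, so about one gene per 1,000 bases) can be applied to the size range. This is a back-of-the-envelope sketch, not a gene count from an annotated genome.

```python
SMALLEST_GENOME_BP = 4_560_000
LARGEST_GENOME_BP = 5_700_000
BP_PER_GENE = 1_000  # slide's rule of thumb: 1 Mb is roughly 1,000 genes

size_difference_bp = LARGEST_GENOME_BP - SMALLEST_GENOME_BP
estimated_gene_difference = size_difference_bp // BP_PER_GENE

print(f"Size difference: {size_difference_bp / 1e6:.2f} Mb")
print(f"Roughly {estimated_gene_difference:,} genes may be present in one isolate but not the other")
```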
Reference Characterization by WGS: “One Shot” Characterization of STEC
Tools: ANI, SerotypeFinder, VirulenceFinder, 7-gene MLST, ResFinder, Phylogenetic ID
GENUS/SPECIES: Escherichia coli
SEROTYPE: O104:H4
PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EAEC)
VIRULENCE PROFILE: stx2a, aggR, aggA, sigA, sepA, pic, aatA, aaiC, aap
SEQUENCE TYPE: ST678
ANTIMICROBIAL RESISTANCE GENES: blaTEM-1, blaCTX-M-15, strAB, sul2, tet(A)A, dfrA7
wgMLST CODE: 102:45.26.35.3
Summary of Potential WGS Applications: outbreak investigation (sporadic vs outbreak; not just clusters but phylogenetic relationships); microbial source tracking (MST); microbial surveillance (food; environment: animals, soil, food prep areas, hospitals, etc.); antibiotic resistance monitoring (genotype predicts phenotype; mobile vs integrated); virulence gene monitoring. What else?
Questions?