Viral Metagenomics
Andrew Millard, Warwick Medical School, University of Warwick
Email: a.d.millard@warwick.ac.uk Twitter: @milja001
Overview
- Brief history
- Methods
- Analysis
- Examples
Brief History
- First viral metagenome, marine seawater sample: Breitbart 2002, PNAS
- Human faeces: Breitbart 2003, J Bacteriology
- Four ocean biomes: Angly 2006, PLoS Biology
- Global Ocean Survey: Williamson 2008, PLoS ONE
- Human faeces from monozygotic twins: Reyes 2010, Nature
- Pacific Ocean Virome: Hurwitz 2013, PLoS ONE
- TARA Oceans: Brum 2015, Science
1. Collection of viral fraction
2. Concentration
3. Nucleic acid extraction
4. Library preparation
5. Analysis
Virus Collection
- Separation from larger particulate matter
  - Water samples: easy
  - Faecal matter: less fun
- Iron chloride flocculation (Poulos 2015)
- Filtration: tangential flow filtration (TFF)
- Both methods work
- Choice depends on samples: TFF can filter from 0.5 l to 100s of litres of water
Concentration: Caesium Chloride
- Expensive
- Specialised equipment (ultracentrifuge)
- Low throughput
- Does not remove all free bacterial DNA
- [Figure label: viral band]
Concentration: Columns
- Amicon columns
- Higher throughput
- Cheaper
- Less specialised equipment
Viral nucleic acid types
- dsDNA: linear, circular, linear with single-strand breaks
- ssDNA
- dsRNA and ssRNA: linear, positive strand, negative strand, segmented
- One library preparation method will NOT target all viruses
Bacterial Contamination (?)
- Does it matter?
- Removal of bacterial DNA by DNase I treatment, prior to viral lysis
- Check for contamination with PCR or qPCR: 16S primers, rpoB etc.
- BUT is detection the same as contamination?
  - Transduction rates vary for different bacteriophages
  - 1 x 10^9 VLP in seawater
  - Transduction rate of 1 x 10^-6
  - 1 x 10^3 particles may contain host DNA: is this contamination?
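The transduction argument is a single multiplication, sketched here with the figures quoted on the slide. The function name is illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Figures quoted on the slide; the sample volume is not stated there.
VLP_COUNT = 10**9                         # virus-like particles in the seawater sample
TRANSDUCTION_RATE = Fraction(1, 10**6)    # fraction of particles packaging host DNA


def expected_transducing_particles(vlp_count, rate):
    """Expected number of particles that carry host (bacterial) DNA."""
    return vlp_count * rate


expected = expected_transducing_particles(VLP_COUNT, TRANSDUCTION_RATE)
print(f"Particles expected to carry host DNA: {int(expected)}")
```

On these numbers a 16S or rpoB signal could come from transducing particles rather than free bacterial DNA, which is why detection alone does not prove contamination.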
Sequencing Library Preparation Options
- Will depend on nucleic acid type
- DNA: dsDNA, Nextera XT
- Options: RASL, LASL, Nextera XT, TruSeq ...
- Amount of DNA is dependent on method used
- Amplification with phi29 can cause 10,000x differences in the resultant population (Zhang 2006)
- How much sequence is enough? Depends on the question
To assemble or not?
- Dependent on question of interest
- Assembly: MetaVelvet, CLC, Ray Meta, etc.
- Annotation: EBI, MG-RAST, Prokka (Seemann 2014) with custom bacteriophage database
Analysis: The Dark Matter
- Known sequence
- Unknown sequence
- Data will look something like this
Figure caption: Viral metagenomic sequences from human faeces, a marine sediment sample and two seawater samples were compared to the GenBank non-redundant database at the date of publication and in December 2004. The percentage of each library that could be classified as Eukarya, Bacteria, Archaea, viruses or showed no similarities (E-value >0.001) is shown.
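The classification described in the caption can be sketched as a tally of best database hits, where any read whose best hit has an E-value above 0.001 (or no hit at all) counts as "No similarity". The input layout, function name, and toy reads below are hypothetical; only the cutoff comes from the caption.

```python
from collections import Counter

E_VALUE_CUTOFF = 0.001  # threshold quoted in the figure caption


def summarise_hits(best_hits, cutoff=E_VALUE_CUTOFF):
    """Return the percentage of reads per category.

    best_hits: iterable of (read_id, category, e_value); category and
    e_value are None when the search returned no hit for that read.
    """
    counts = Counter()
    for _read_id, category, e_value in best_hits:
        if category is None or e_value is None or e_value > cutoff:
            counts["No similarity"] += 1
        else:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical toy data, not taken from any real library.
toy_hits = [
    ("read1", "Viruses", 1e-20),
    ("read2", "Bacteria", 1e-5),
    ("read3", "Bacteria", 0.5),   # hit too weak: counted as no similarity
    ("read4", None, None),        # no hit at all
]
summary = summarise_hits(toy_hits)
for category, pct in sorted(summary.items()):
    print(f"{category}: {pct:.1f}%")
```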
Viral Diversity
- Majority of sequences will have no similarity to current databases
- Estimates of viral protein clusters vary: ~2 billion protein clusters (Rohwer 2003); 3.9 million (Espinoza 2013)
- ~5,746 virus populations in the ocean, only 39 previously identified
Analysis Issues: Databases
- Databases are crucial
- Very small database of bacteriophage and viral genomes
  - 4026 complete viral genomes (http://www.ebi.ac.uk/genomes/virus.html)
  - ~96 Mb total sequence
  - ~20 kb mean (7 kb median) viral genome size
- In contrast, >40,000 Salmonella genomes!
- 1 MiSeq run with V3 chemistry yields ~15 Gb, which would allow sequencing of all complete viral genomes to reasonable coverage (in theory)
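The "reasonable coverage" claim can be checked with back-of-the-envelope arithmetic on the rounded slide figures. The sketch assumes an idealised run in which every base is usable and spread evenly across genomes, which real data will not match.

```python
# Rounded figures from the slide.
N_GENOMES = 4026                     # complete viral genomes in the database
TOTAL_DB_BASES = 96_000_000          # ~96 Mb of total viral sequence
MISEQ_RUN_BASES = 15_000_000_000     # ~15 Gb from one MiSeq V3 run

# Mean genome size implied by the two rounded totals.
mean_genome_size = TOTAL_DB_BASES / N_GENOMES

# Idealised fold coverage if one run were spread evenly over every genome.
fold_coverage = MISEQ_RUN_BASES / TOTAL_DB_BASES

print(f"Implied mean genome size: {mean_genome_size / 1000:.1f} kb")
print(f"Idealised fold coverage: {fold_coverage:.0f}x")
```

The implied mean of roughly 24 kb is of the same order as the ~20 kb quoted, and the idealised coverage comes out above 150x.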
Analysis: Assembly
Values per mixture are: fully assembled genomes, no misassembly / partially assembled genomes, no misassembly / genomes with assembly errors.
- Mycobacterium phages: 88% (169/192) / 3.6% (7/192) / 8.4% (16/192)
- Pseudomonas phages: 85.5% (164/192) / 8.3% / 6.2% (12/192)
- Mycobacterium & Pseudomonas phages: 100% (192/192) / 0% (0/192)
- Mycobacterium & Pseudomonas & Synechococcus phages: 92.7% (267/288) / 3.47% (10/288) / 3.81% (11/288)
- Mycobacterium & Pseudomonas & Synechococcus & Bacillus phages: 87.32% (335/384) / 4.42% (17/384) / 8.33% (32/384)
In silico modelling suggests misassembly of phage genomes will occur in mixtures of closely related phages.
Viral Metagenomics: Lake Bogoria, Kenya
Sophie Clough & Martha Clokie
Background
- Flamingos eat Arthrospira sp. blooms
- Highly specialised diet
- Crashes in blooms result in less food
- Flamingos die (Kaggwa et al. 2013)
- Flamingos bring tourism to the local community
- Cyanophage implicated in lysis of blooms (Peduzzi et al. 2014)
Data Collected
- Viral numbers
- Identification of Arthrospira sp. present
- Abundance data of Arthrospira
- Viral DNA samples
- CTD, pH
- Samples are continually being collected on a monthly basis
Viral Metagenomics
- Iron chloride concentration
- Nextera XT library preparation
- Analysis
  - Assembly: CLC Workbench
  - MetaVir2 analysis
  - Annotation against custom viral database
[Figure: Diversity. Axes: number of sequence clusters, number of sequences]
Sample Summary
Sample  Details  Contigs  Genes predicted  Circular contigs  Unaffiliated contigs  Affiliated contigs  % Affiliated
a22     C/0      67315    88990            42                52070                 15245               29.2
a23     N/0      48677    71352            21                36112                 12565               34.7
a25     S/0      109987   157761           67                88809                 21178               23.8
b22     C/25     79880    12744            54                59185                 20695               34.9
b23     N/25     52125    73752            17                40182                 11943               29.7
b25     S/25     87002    132177           53                67179                 19823               29.5
c22     C/50     12936    18973            4                 9265                  3671                39.6
c23     N/50     50327    71111            -                 39724                 10603               26.6
c25     S/50     43624    62006            10                34821                 8803                25.8
d22     C/75     59922    85208            -                 47640                 12082               25.3
d25     N/75     8355     12065            6                 5537                  2818                50.8
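As a cross-check on the sample summary, the sketch below recomputes the affiliated share for three rows in two ways. The function name and row selection are illustrative. On these rows the tabulated "% Affiliated" appears to track affiliated contigs relative to unaffiliated contigs rather than relative to all contigs. That is an inference from the numbers, not something the slides state.

```python
# (contigs, unaffiliated contigs, affiliated contigs, % affiliated as tabulated)
ROWS = {
    "a22": (67315, 52070, 15245, 29.2),
    "c22": (12936, 9265, 3671, 39.6),
    "d25": (8355, 5537, 2818, 50.8),
}


def affiliated_shares(contigs, unaffiliated, affiliated):
    """Return the affiliated share computed against two possible denominators."""
    return {
        "of_all_contigs": 100.0 * affiliated / contigs,
        "vs_unaffiliated": 100.0 * affiliated / unaffiliated,
    }


for sample, (contigs, unaff, aff, tabulated) in ROWS.items():
    shares = affiliated_shares(contigs, unaff, aff)
    print(
        f"{sample}: tabulated {tabulated}%, "
        f"of all contigs {shares['of_all_contigs']:.1f}%, "
        f"vs unaffiliated {shares['vs_unaffiliated']:.1f}%"
    )
```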
Pelagibacter phage HTVC008M Mycobacterium phage PattyP Synechococcus phage ACG-2014c
Novel phage 1
- [Figure legend: no similarity; phage associated protein]
- Identification of multiple novel phage assemblies
Depth Analysis
- North Basin vertical stratification comparisons
- Viral community changes with depth
Flamingo Summary
- Viral metagenomes are diverse
- Majority of genes are unknown
- Identification of multiple novel phage isolates
Are phage a reservoir of antibiotic resistance genes in slurry?
- [Figure labels: grass in; 70 litres a day]
- ~30 million tonnes of slurry per year is spread onto farmland
Assaying for phage carriage of antibiotic resistance genes in cattle slurry
- Isolate bacteriophage, then sequence their genomes
- Add phage fraction to bacterial isolates, then assay for antibiotic resistance
  - Selects for temperate phage only
  - Adding phage increased the occurrence of resistant colonies (0.005% of cells when using E. coli)
- Metagenomics of viral fraction: search for antibiotic resistance genes
Phage Isolation on E. coli
- Isolated 20 phage
- Purify, sequence, annotate
- [Figure labels: Stdev 80; 14 plaques; 305 pfu/ml]
Diversity of phage genomes isolated on E. coli
- T4-like
- T4-like
- HK587-like
- T5-like
- Shivani
- Seurat
- Novel
No Antibiotic Resistance genes
- Bacteriophage T4
- Gp37: tail fibre proteins are variable
- Alt: NAD+:protein ADP-ribosyltransferase
Slurry Metagenome
- 2,171,116 PE reads
- ~217,735 contigs
- 56,824 genes
- 12,236 are annotated as known phage genes (BLASTp 1e-5)
- Only 10% of reads map to any known viral genome
- BUT 1% of reads map to a single new phage genome
- [Figure: coverage]
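The slurry figures can be turned into proportions with a few lines of arithmetic. This is a minimal sketch that treats "2,171,116 PE reads" as the read count to which the 10% and 1% apply, which the slide does not spell out. Variable names are illustrative.

```python
# Counts quoted on the slide.
PE_READS = 2_171_116
GENES = 56_824
KNOWN_PHAGE_GENES = 12_236   # BLASTp hits at 1e-5

# Share of predicted genes annotated as known phage genes.
known_phage_pct = 100.0 * KNOWN_PHAGE_GENES / GENES

# Approximate read counts behind the mapping percentages on the slide.
reads_known_viral = round(PE_READS * 0.10)   # map to any known viral genome
reads_novel_phage = round(PE_READS * 0.01)   # map to the new phage genome

print(f"Known phage genes: {known_phage_pct:.1f}% of predicted genes")
print(f"Reads mapping to known viral genomes: ~{reads_known_viral:,}")
print(f"Reads mapping to the novel phage genome: ~{reads_novel_phage:,}")
```

Roughly a fifth of predicted genes match known phage genes, while one unassembled-until-now genome accounts for about a tenth as many reads as all known viral genomes together.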
Kraken Analysis
- 99.25% of reads unassigned
- Partitiviridae: viruses of plants and fungi
Analysis of assembled contigs: MetaVir
- 70% of contigs have no "known" viral genes
- 30% of contigs are classified as viral contigs
- Dominated by Siphoviridae
Assembled phage genomes (?)
[Figure legend: known phage protein; no similarity; bacterial/phage protein]
Antibiotic Resistance Genes
- 190 genes have similarity to known antibiotic resistance genes
- Database from http://ardb.cbcb.umd.edu/
- None are localised on a contig with a "known" phage gene
Conclusions III
- Cattle slurry harbours a vast diversity of unknown bacteriophages, just like every other viral metagenome!
- Isolation of 20 phage from a single host did not isolate the "same" phage twice
- Viral fraction can transfer antibiotic resistance to E. coli
- Viral metagenomics reveals the presence of antibiotic resistance genes
Acknowledgements
- Prof Martha Clokie
- Sophie Clough
- Becky Smith
- Marie O Hara