Metagenomic analysis of bacteriophage populations in the human gut


1 Metagenomic analysis of bacteriophage populations in the human gut
APC Bioinformatics Workshop, 23/03/2018. Andrey Shkoporov, Gut Phageomics Spoke 1

2 Why study gut phageome?
How do gut phages impact the gut bacteriome and host physiology?
The human gut virome is composed of 10^15 viruses; >98% are bacteriophages.
6-20% of faecal DNA is of bacteriophage origin.
The majority of bacterial strains contain prophages.
Little is known about whether and how bacteriophages regulate the composition and functional repertoire of the gut bacteriome.
There is evidence that sterile-filtered faecal water can recapitulate the effect of FMT. Role of phage?
Phage transplants as alternatives to FMT? Phages are resistant to damaging factors, easy to store and do not need anaerobic conditions.
Can phages be used as biomarkers of bacteriome composition?
Faecal bacteriome composition, but not phageome composition, is strongly influenced by colonic residence time.
Faecal phage composition is resistant to storage and repeated freeze-thaw cycles.
Faecal phages can serve as biomarkers of microbiota in the upper gut.
Figure panels: Bacteria, Phages, Prevotella; storage at RT; repeated freeze-thaw cycles.

3 Gut phage metagenomics: typical workflow
Isolation of VLPs
Purification of viral nucleic acids
Library preparation
Next generation sequencing (NGS)
Processing of reads, assembly and filtration of NGS data*
Alignment of reads to closed reference DB
DB of viral contigs: analysis of specific genes, host prediction, viral taxonomy
Alignment of reads to the DB of assembled contigs
Count matrix
Statistical tests to identify drivers of diversity
Analysis of α- and β-diversity
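The last steps of the workflow above (count matrix, then α- and β-diversity) can be illustrated with a minimal Python sketch. It assumes the Shannon index as the α-diversity measure and Bray-Curtis dissimilarity as the β-diversity measure; the slides do not say which indices were used, and the sample names and counts below are made up for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's per-contig read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Contigs with zero reads contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same contig order."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same set of contigs")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical count matrix: one row per sample, one column per viral contig.
count_matrix = {
    "sample_A": [10, 10, 0, 0],
    "sample_B": [5, 5, 5, 5],
}

if __name__ == "__main__":
    for name, counts in count_matrix.items():
        print(name, "Shannon:", round(shannon_index(counts), 3))
    print("Bray-Curtis A vs B:",
          bray_curtis(count_matrix["sample_A"], count_matrix["sample_B"]))
```

In practice counts are usually normalised (for example to relative abundance or by rarefaction) before samples are compared, since sequencing depth differs between samples.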

4 Isolation of VLPs
Procedure:
Original faecal sample homogenised in SM buffer
Centrifugation at 5 krpm (x2)
Filtration through 0.45 μm pore filter (x2)
Precipitation with PEG/NaCl, extraction with chloroform (may result in removal of enveloped viruses)
Treatment with DNase and RNase
Caveats:
Different morphologies and physical properties of viral particles affect isolation efficiency
Bacterial DNA removal is never complete
Ribosomes may be co-purified with virions

5 Purification of nucleic acids and library prep
Workflow: VLP fraction; extraction of nucleic acids (dsDNA, ssDNA, RNA, circular or linear); reverse transcription, WGA; fragmentation and library prep.
Viral genomes can be dsDNA, ssDNA, dsRNA or ssRNA, and circular, linear or fragmented.
Nucleic acid yields from faecal VLPs are typically low.
Traditionally used Illumina kits (TruSeq Nano, Nextera XT) require whole-genome amplification (WGA), typically by MDA.
MDA introduces significant bias.
Alternative kits optimized for low input of dsDNA/ssDNA should be considered.

6 Processing of NGS reads (Illumina)
Illumina MiSeq: 2x300 bp (Nextera XT library)
Illumina HiSeq 2500: High Output 2x150 bp (TruSeq Nano or Swift Biosciences Accel-NGS kit)
Typically >1M reads per sample
Adaptor and quality trimming plus filtering increase mapping rates and assembly efficiency
Tools:
FastQC: quality control checks on raw and processed .fastq files
Kraken: fast classification of reads against a human database
Cutadapt: removal of adaptors
Trimmomatic: removal of adaptors, quality trimming, cropping and filtering
Figure panels: raw Illumina MiSeq reads; reads after trimming and filtration.
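To make the idea of quality trimming and length filtering concrete, here is a simplified Python sketch of a sliding-window trim. It is an illustration of the general technique only, not Trimmomatic's actual implementation; the window size, quality threshold and minimum length are hypothetical defaults, and Phred+33 quality encoding is assumed.

```python
def phred33(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_q, scanning from the 5' end."""
    scores = phred33(qual)
    if len(seq) != len(scores):
        raise ValueError("sequence and quality strings differ in length")
    cut = len(seq)
    for start in range(0, len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            cut = start
            break
    return seq[:cut], qual[:cut]


def passes_length_filter(seq, min_len=36):
    """Discard reads that became too short after trimming."""
    return len(seq) >= min_len


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual, passes_length_filter(trimmed_seq))
```

Reads shorter than the window are returned unchanged by this sketch; a real pipeline would rely on the length filter or on the tool's own handling for such reads.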

7 Assembly of NGS reads
Closed reference database vs. assembled contigs as template for read mapping.
Bacteriophages are the most abundant biological entities on Earth and outnumber bacteria 10:1. Despite that, bacteriophages are under-represented in sequence DBs.
Bacteria: 134,397 genomes of 13,537 species. Viruses: 7,490 genomes of 4,853 species.
Mapping of reads to viral reference DBs identifies <2% of the real diversity in a sample.
The assembly-based approach uses the same reads to build contigs, which are then used as the template for read alignment.
Mapping reads to the database of contigs recruits a much higher fraction of reads.
“Viral dark matter”: the uncharacterised component of the phageome.
Differentiating between viral and non-viral DNA can be challenging.

8 Challenges associated with assembly of phage metagenomic reads
Assembly of NGS reads (workflow):
Reads from individual samples (up to Sample n) and from pooled samples are assembled (SPAdes, MEGAHIT, IDBA)
Contigs are pooled
BLAST contigs all-vs-all
Remove shorter contigs from pairs with >90% homology and >90% overlap
Result: nonredundant contigs DB
Challenges:
Variable complexity of samples (number of virotypes)
Highly uneven relative abundance of viruses
Assemblers tend to fragment genomes when coverage is either too low or too high
Strain-level variations are hard to capture
Best performing assemblers: SPAdes, metaSPAdes, MEGAHIT. Velvet, MIRA, VICUNA, SOAPdenovo and ABySS performed badly.
Other considerations:
Individual sample and pooled assemblies could both be used
Different assembler programs
Different fixed k-mer lengths
Random sub-setting of reads to improve assembly of highly over-represented sequences
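The redundancy-removal rule above (drop the shorter contig from pairs with >90% homology and >90% overlap) could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the authors' script: "overlap" is taken to mean alignment length divided by the length of the shorter contig, each hit is a single alignment in the style of a BLAST tabular row (query, subject, percent identity, alignment length), and the contig names are invented.

```python
def nonredundant_contigs(lengths, hits, min_identity=90.0, min_overlap=0.90):
    """Return contig names that survive redundancy removal.

    lengths: dict mapping contig name -> length in bp
    hits:    iterable of (query, subject, percent_identity, alignment_length)
    """
    redundant_pairs = []
    for query, subject, pident, aln_len in hits:
        if query == subject:
            continue  # ignore self-hits from the all-vs-all search
        # Order the pair by length (name breaks ties deterministically).
        shorter, longer = sorted((query, subject), key=lambda n: (lengths[n], n))
        overlap = aln_len / lengths[shorter]
        if pident > min_identity and overlap > min_overlap:
            redundant_pairs.append((longer, shorter))

    # Greedy pass: let the longest contigs claim their shorter copies first,
    # and never let an already-removed contig remove another one.
    redundant_pairs.sort(key=lambda pair: lengths[pair[0]], reverse=True)
    removed = set()
    for longer, shorter in redundant_pairs:
        if longer not in removed:
            removed.add(shorter)
    return [name for name in lengths if name not in removed]


if __name__ == "__main__":
    lengths = {"c1": 10000, "c2": 5000, "c3": 4000}
    hits = [
        ("c1", "c1", 100.0, 10000),  # self-hit, ignored
        ("c2", "c1", 98.0, 4800),    # 96% of c2 aligned at 98% identity
        ("c3", "c1", 85.0, 3900),    # identity too low, c3 is kept
    ]
    print(nonredundant_contigs(lengths, hits))
```

A real all-vs-all BLAST run often reports several alignments per contig pair; combining them into one overlap figure is left out here to keep the sketch short.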

9 Improving virome assembly through hybrid sequencing
Technologies such as single molecule real-time (SMRT) sequencing generate long, low-coverage reads with relatively high error rates. Hybrid sequencing uses these long reads in conjunction with accurate short reads to generate high-quality assemblies, addressing the individual failings of each sequencing technology. As a proof of concept, we obtained improved assembly of putative viral genomes from a human gut virome using a hybrid sequencing approach.
Thomas Sutton, Bioinformatics PhD student. Platform: PacBio Sequel.

10 Removing non-viral contigs and generating count matrix
As contamination with bacterial DNA is inevitable, a procedure to remove contigs of non-viral origin is employed.
Starting from the nonredundant contigs DB, contigs are selected if they:
hit the viral RefSeq DB
are positive by the VirSorter classifier
have >2 hits to the pVOGs DB per 10 kb (flowchart: >2 hits per 10 kb, >3 hits in total)
are circular
“Viral dark matter” is also included: contigs negative by the other tests but not hitting anything in NCBI nt (flowchart: contigs >3 kb with no hits to NCBI nt DB > 100 nt).
Reads are mapped back to the selected viral contigs with Bowtie2 in end-to-end mode to produce the count matrix.
Downstream: bar chart of viral contigs, ordination plots, α- and β-diversity analyses.
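The selection criteria above can be written down as a small Python decision function. This is a hedged sketch rather than the lab's pipeline: the `Contig` fields are hypothetical and assumed to have been filled in beforehand from the RefSeq, VirSorter, pVOGs and NCBI nt searches, and the two pVOG thresholds (>2 hits per 10 kb, >3 hits in total) are assumed to apply together.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical per-contig summary of upstream search results."""
    name: str
    length: int                       # contig length in bp
    refseq_viral_hit: bool = False    # hit to the viral RefSeq DB
    virsorter_positive: bool = False  # called viral by VirSorter
    pvog_hits: int = 0                # number of hits to the pVOGs DB
    circular: bool = False
    nt_hit: bool = False              # any qualifying hit in NCBI nt


def pvog_criterion(contig):
    """>2 pVOG hits per 10 kb and >3 hits in total (assumed to be combined)."""
    if contig.length <= 0:
        return False
    hits_per_10kb = contig.pvog_hits / (contig.length / 10_000)
    return hits_per_10kb > 2 and contig.pvog_hits > 3


def classify(contig):
    """Return 'viral', 'dark_matter', or None if the contig is discarded."""
    if (contig.refseq_viral_hit or contig.virsorter_positive
            or pvog_criterion(contig) or contig.circular):
        return "viral"
    if contig.length > 3_000 and not contig.nt_hit:
        return "dark_matter"
    return None


if __name__ == "__main__":
    contigs = [
        Contig("a", 20_000, pvog_hits=5, nt_hit=True),
        Contig("b", 20_000, pvog_hits=4, nt_hit=True),
        Contig("c", 5_000),
        Contig("d", 2_000),
    ]
    for contig in contigs:
        print(contig.name, classify(contig))
```

Keeping "viral" and "dark_matter" as separate labels makes it easy to report the dark-matter fraction on its own in downstream plots.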

11 Viral taxonomy and gene annotation
Demovir (Feargal Ryan, Bioinformatics Post-Doc)
Performs homology searches of proteins encoded by contigs using Usearch against the viral portion of the UniProtKB/TrEMBL database.
Uses a voting approach to determine the taxonomy of a contig.
Extremely accurate (99% in a 10-fold cross-validation test with the UniProtKB/TrEMBL database).
Viga (Enrique Gonzales Tortuero, Bioinformatics Post-Doc)
Predicts open reading frames, tRNA and rRNA genes, and repeats by wrapping the Prodigal, ARAGORN, INFERNAL, PILERCR, TRF and IRF tools.
Assigns functions to protein-coding genes using BLAST/HMM searches against the nr, UniProtKB and PFAM databases.
Convenient output in .gbk format.
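A voting approach to contig taxonomy, as mentioned for Demovir above, can be illustrated with a simple majority vote in Python. The slides do not describe Demovir's actual voting rule, so this is only a generic sketch: each protein on a contig is assumed to contribute the taxon of its best database hit (or `None` if it has no hit), and the `min_fraction` threshold is a hypothetical parameter.

```python
from collections import Counter


def vote_taxonomy(protein_taxa, min_fraction=0.5):
    """Assign a contig the taxon supported by more than min_fraction of its
    proteins that have a hit; otherwise return 'Unassigned'."""
    votes = Counter(taxon for taxon in protein_taxa if taxon is not None)
    if not votes:
        return "Unassigned"
    taxon, count = votes.most_common(1)[0]
    if count / sum(votes.values()) > min_fraction:
        return taxon
    return "Unassigned"


if __name__ == "__main__":
    # Best-hit taxa for the proteins of one hypothetical contig.
    taxa = ["Siphoviridae", "Siphoviridae", "Myoviridae", None]
    print(vote_taxonomy(taxa))
```

Requiring a strict majority avoids assigning a taxon when the proteins of a contig disagree evenly, at the cost of leaving more contigs unassigned.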

12 Acknowledgements
Gut Phageomics Lab
PIs: Prof. Colin Hill, Prof. Paul Ross
Bioinformaticians: Dr. Feargal Ryan, Dr. Adam Clooney, Dr. Enrique Gonzales Tortuero, Thomas Sutton (PhD student)
“Wet lab” scientists: Dr. Lorraine Draper, Dr. Stephen Stockdale
As well as all other members of the Gut Phageomics Lab and the APC community.

