Metagenomic analysis of bacteriophage populations in the human gut

Presentation transcript:

Metagenomic analysis of bacteriophage populations in the human gut. APC Bioinformatics Workshop, 23/03/2018. Andrey Shkoporov, Gut Phageomics Spoke, andrey.shkoporov@ucc.ie

Why study the gut phageome? How do gut phages affect the gut bacteriome and host physiology? The human gut virome is composed of 10^15 viruses, of which >98% are bacteriophages. 6-20% of faecal DNA is of bacteriophage origin. The majority of bacterial strains contain prophages. Little is known about whether and how bacteriophages regulate the composition and functional repertoire of the gut bacteriome. There is evidence that sterile-filtered faecal water can recapitulate the effect of FMT. What is the role of phage? Could phage transplants be an alternative to FMT? Phages are resistant to damaging factors, easy to store, and do not need anaerobic conditions. Can phages be used as biomarkers of bacteriome composition? Faecal bacteriome composition, but not phageome composition, is strongly influenced by colonic residence time. Faecal phage composition is resistant to storage and to repeated freeze-thaw cycles. Faecal phages can serve as biomarkers of the microbiota in the upper gut. [Figure panels: Bacteria, Phages, Prevotella; storage at RT, repeated freeze-thaw cycles.]

Gut phage metagenomics: typical workflow
Isolation of VLPs
Purification of viral nucleic acids
Library preparation
Next generation sequencing (NGS)
Processing of reads, assembly and filtration of NGS data
Alignment of reads to closed reference DB
DB of viral contigs: analysis of specific genes, host prediction, viral taxonomy
Alignment of reads to the DB of assembled contigs
Count matrix
Statistical tests to identify drivers of diversity
Analysis of α- and β-diversity
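The last steps of the workflow, from count matrix to α- and β-diversity, can be illustrated with a minimal sketch. The slides do not name the diversity metrics used, so the Shannon index (α-diversity) and Bray-Curtis dissimilarity (β-diversity) are assumptions chosen as common examples, and the count matrix below is hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * log2(p_i)) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same contig order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical count matrix: rows are samples, columns are viral contigs.
count_matrix = {
    "sample_1": [50, 50, 0, 0],
    "sample_2": [25, 25, 25, 25],
}

alpha = {name: shannon_index(row) for name, row in count_matrix.items()}
beta = bray_curtis(count_matrix["sample_1"], count_matrix["sample_2"])
print(alpha, beta)
```

In practice the counts would usually be normalised for sequencing depth and contig length before comparing samples; that step is left out here.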

Isolation of VLPs. The original faecal sample is homogenised in SM buffer, centrifuged at 5 krpm (x2) and filtered through a 0.45 μm pore filter (x2). VLPs are then precipitated with PEG/NaCl and extracted with chloroform (this may result in removal of enveloped viruses), followed by treatment with DNase and RNase. Caveats: the different morphologies and physical properties of viral particles affect isolation efficiency; bacterial DNA removal is never complete; ribosomes may be co-purified with virions.

Purification of nucleic acids and library prep. Starting from the VLP fraction, the steps are: extraction of nucleic acids (dsDNA, ssDNA, RNA, circular or linear); reverse transcription and whole-genome amplification (WGA); fragmentation and library prep. Viral genomes can be dsDNA, ssDNA, dsRNA, ssRNA, circular, linear or fragmented. Nucleic acid yields from faecal VLPs are typically low. The traditionally used Illumina kits (TruSeq Nano, Nextera XT) require WGA, typically multiple displacement amplification (MDA), to be done first, and MDA introduces significant bias. Alternative kits optimised for low input of dsDNA/ssDNA should be considered.

Processing of NGS reads (Illumina). Illumina MiSeq: 2x300 bp (Nextera XT library). Illumina HiSeq 2500, High Output: 2x150 bp (TruSeq Nano or Swift Biosciences Accel-NGS kit). Typically >1M reads per sample. Adaptor and quality trimming plus filtering increase mapping rates and assembly efficiency. Tools: FastQC, quality control checks on raw and processed .fastq files; Kraken, fast classification of reads against a human database; Cutadapt, removal of adaptors; Trimmomatic, removal of adaptors, quality trimming, cropping and filtering. [Figure: raw Illumina MiSeq reads vs. reads after trimming and filtration.]
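To make the idea of quality trimming and filtering concrete, here is a minimal sketch of sliding-window trimming on a single read. It is not the code of Trimmomatic or Cutadapt; the window size, quality threshold and minimum length are hypothetical defaults, and Phred+33 quality encoding is assumed.

```python
def phred33(quality_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_q; return the kept (sequence, quality) pair."""
    scores = phred33(qual)
    if len(scores) < window:
        # Read shorter than one window: keep or drop it as a whole.
        if scores and sum(scores) / len(scores) >= min_mean_q:
            return seq, qual
        return "", ""
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


def passes_length_filter(seq, min_len=50):
    """Length filter applied after trimming (min_len is a hypothetical cutoff)."""
    return len(seq) >= min_len


# Toy read: six high-quality bases ('I' = Q40) followed by four poor ones ('#' = Q2).
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual, passes_length_filter(trimmed_seq))
```

Real tools also handle paired reads and adaptor sequences, which this sketch ignores.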

Assembly of NGS reads. Closed reference database vs. assembled contigs as template for read mapping. Bacteriophages are the most abundant biological entities on Earth and outnumber bacteria 10:1. Despite that, bacteriophages are under-represented in sequence DBs (bacteria: 134,397 genomes of 13,537 species; viruses: 7,490 genomes of 4,853 species). Mapping reads to viral reference DBs identifies <2% of the real diversity in a sample. The assembly-based approach uses the same reads to build contigs, which are then used as the template for read alignment. Mapping reads to the database of contigs recruits a much higher fraction of reads. "Viral dark matter" is the uncharacterised component of the phageome. Differentiating between viral and non-viral DNA can be challenging.

Challenges associated with assembly of phage metagenomic reads. Assembly of NGS reads: reads from individual samples (1 to n) and from pooled samples are assembled (SPAdes, MEGAHIT, IDBA); the pooled contigs are compared by all-vs-all BLAST, and the shorter contig is removed from each pair with >90% homology and >90% overlap, giving a nonredundant contigs DB. Challenges: variable complexity of samples (30-3000 virotypes); highly uneven relative abundance of viruses; assemblers tend to fragment genomes when coverage is either too low or too high; strain-level variation is hard to capture. Best-performing assemblers: SPAdes, metaSPAdes, MEGAHIT. Velvet, MIRA, VICUNA, SOAPdenovo and ABySS performed badly. Other considerations: individual-sample and pooled assemblies could both be used; different assembler programs; different fixed k-mer lengths; random sub-setting of reads to improve assembly of highly over-represented sequences.
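The redundancy-removal rule (drop the shorter contig from pairs with >90% homology and >90% overlap) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the talk: each contig pair is assumed to be summarised by a single hit, "overlap" is interpreted as alignment length divided by the length of the shorter contig, and the contig names and hit values are hypothetical.

```python
def dereplicate(contig_lengths, hits, min_identity=90.0, min_overlap=0.90):
    """Return the sorted names of contigs kept after redundancy removal.

    contig_lengths: dict mapping contig name to length in bp.
    hits: iterable of (query, subject, percent_identity, alignment_length),
          e.g. parsed from all-vs-all BLAST tabular output.
    """
    removed = set()
    for query, subject, identity, aln_len in hits:
        if query == subject:
            continue  # ignore self-hits
        # Pick the shorter contig of the pair (ties broken by name for determinism).
        shorter = min((query, subject), key=lambda name: (contig_lengths[name], name))
        overlap = aln_len / contig_lengths[shorter]
        if identity > min_identity and overlap > min_overlap:
            removed.add(shorter)
    return sorted(name for name in contig_lengths if name not in removed)


# Hypothetical contigs and hits.
lengths = {"c1": 10000, "c2": 5000, "c3": 4000}
blast_hits = [
    ("c1", "c1", 100.0, 10000),  # self-hit, ignored
    ("c2", "c1", 98.5, 4900),    # c2 is 98% covered by c1: redundant
    ("c3", "c1", 95.0, 2000),    # only 50% of c3 is covered: kept
]
kept = dereplicate(lengths, blast_hits)
print(kept)
```

Summing several alignments per contig pair, rather than using one hit, would be needed for fragmented matches and is not handled here.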

Improving virome assembly through hybrid sequencing. Technologies such as single molecule real-time (SMRT) sequencing generate long, low-coverage reads with relatively high error rates. Hybrid sequencing uses these long reads in conjunction with accurate short reads to generate high-quality assemblies, addressing the individual failings of each sequencing technology. As a proof of concept, we obtained improved assembly of putative viral genomes from a human gut virome using a hybrid sequencing approach. Thomas Sutton, Bioinformatics PhD student. Platform: PacBio Sequel.

Removing non-viral contigs and generating count matrix. As contamination with bacterial DNA is inevitable, a procedure to remove contigs of non-viral origin is employed. From the nonredundant contigs DB, contigs are selected if they hit the viral RefSeq DB, are positive by the VirSorter classifier, have >2 hits to the pVOGs DB per 10 kb (the diagram adds: >3 hits in total), or are circular. "Viral dark matter" is also included: contigs >3 kb that are negative by the other tests but have no hits >100 nt to the NCBI nt DB. Reads are then mapped back to the selected viral contigs with Bowtie2 in end-to-end mode to produce a count matrix, which feeds the bar chart of viral contigs, ordination plots, and α- and β-diversity analyses.
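The selection rules above can be written down as a small decision function. This is a sketch only: the record fields are hypothetical names, the evidence is assumed to be precomputed by the respective tools, and requiring both pVOG thresholds (>2 hits per 10 kb and >3 hits in total) is an assumption about how the two numbers on the slide combine.

```python
from dataclasses import dataclass


@dataclass
class ContigEvidence:
    """Precomputed evidence for one contig (all field names are hypothetical)."""
    length: int               # contig length in bp
    refseq_hit: bool          # hit to viral RefSeq DB
    virsorter_positive: bool  # flagged as viral by the classifier
    pvog_hits: int            # number of hits to the pVOGs DB
    circular: bool            # contig appears circular
    longest_nt_hit: int       # longest alignment to NCBI nt in nt, 0 if none


def pvog_positive(c):
    """Assumed rule: >2 pVOG hits per 10 kb AND >3 hits in total."""
    hits_per_10kb = c.pvog_hits / (c.length / 10_000)
    return hits_per_10kb > 2 and c.pvog_hits > 3


def classify(c):
    """Return 'viral', 'dark_matter' or 'discard' for one contig."""
    if c.refseq_hit or c.virsorter_positive or pvog_positive(c) or c.circular:
        return "viral"
    # Dark matter: >3 kb, negative by other tests, no nt hit longer than 100 nt.
    if c.length > 3000 and c.longest_nt_hit <= 100:
        return "dark_matter"
    return "discard"


examples = [
    ContigEvidence(12000, False, False, 5, False, 0),
    ContigEvidence(5000, False, False, 0, False, 0),
    ContigEvidence(2000, False, False, 1, False, 1500),
]
labels = [classify(c) for c in examples]
print(labels)
```

Contigs labelled "viral" or "dark_matter" would then form the reference for read mapping and the count matrix.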

Viral taxonomy and gene annotation. Demovir (Feargal Ryan, Bioinformatics Post-Doc): performs homology searches of proteins encoded by contigs using Usearch against the viral portion of the UniProtKB/TrEMBL database, and uses a voting approach to determine the taxonomy of a contig. Extremely accurate (99% in a 10-fold cross-validation test with the UniProtKB/TrEMBL database). Viga (Enrique Gonzales Tortuero, Bioinformatics Post-Doc): predicts open reading frames, tRNA and rRNA genes, and repeats by wrapping the Prodigal, ARAGORN, INFERNAL, PILERCR, TRF and IRF tools, and assigns functions to protein-coding genes using BLAST/HMM searches against the nr, UniProtKB and PFAM databases. Convenient output in .gbk format.
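A voting approach to contig taxonomy can be illustrated with a minimal sketch. The slides do not describe Demovir's actual voting rules, so the majority threshold, the "Unassigned" label and the example family names below are all assumptions for illustration.

```python
from collections import Counter


def vote_taxonomy(protein_taxa, min_fraction=0.5):
    """Assign a contig the taxon supported by more than min_fraction of its
    proteins that had a database hit; otherwise return 'Unassigned'.

    protein_taxa: one taxon label per predicted protein, None for no hit.
    """
    votes = Counter(t for t in protein_taxa if t is not None)
    if not votes:
        return "Unassigned"
    taxon, n_votes = votes.most_common(1)[0]
    if n_votes / sum(votes.values()) > min_fraction:
        return taxon
    return "Unassigned"


# Hypothetical best-hit taxa for the proteins of two contigs.
contig_a = vote_taxonomy(["Siphoviridae", "Siphoviridae", "Myoviridae", None])
contig_b = vote_taxonomy(["Siphoviridae", "Myoviridae"])
print(contig_a, contig_b)
```

Requiring a strict majority keeps contigs with conflicting protein hits unassigned instead of forcing a label.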

Acknowledgements. Gut Phageomics Lab PIs: Prof. Colin Hill, Prof. Paul Ross. Bioinformaticians: Dr. Feargal Ryan, Dr. Adam Clooney, Dr. Enrique Gonzales Tortuero, Thomas Sutton (PhD student). "Wet lab" scientists: Dr. Lorraine Draper, Dr. Stephen Stockdale. As well as all other members of the Gut Phageomics Lab and the APC community.