Nicholas Edgington, Ph.D. Southern Connecticut State University Dept. of Biology, New Haven CT, USA RNA-Seq analysis of Mycobacterium smegmatis and its phage pathogen during infection

Phages are viruses that attack bacteria. They are everywhere and represent an amazing amount of biomass on Earth. In fact, it is estimated that there are viruses in the world’s oceans alone. Mycobacteriophages specifically attack mycobacteria, which include the important human pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis), as well as the harmless Mycobacterium smegmatis (M. smeg). According to the WHO, tuberculosis is the second-largest killer (after HIV) caused by a single infectious agent, and one third of the Earth’s human population is infected with latent tuberculosis.

Background
- To date, over 500 mycobacteriophage genomes have been sequenced, mostly through the HHMI Science Education Alliance’s Phage Hunters Advancing Genomics & Evolutionary Science program (HHMI SEA-PHAGES) in conjunction with Dr. Graham Hatfull’s laboratory at the University of Pittsburgh (phagesdb.org) (Pope et al., 2011).
- About 90% of highly conserved mycobacteriophage gene “phamilies” have no known function!
- Let’s use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations and determine the temporal pattern of gene expression during infection.

Module Research Goals
- Use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations.
- Determine the temporal pattern of gene expression of the mycobacteriophage during infection.
- Determine the temporal pattern of gene expression of the host M. smegmatis.
- Identify the ‘repressor’ gene of a temperate phage by analyzing a lysogen.
(Croucher and Thomson, 2010)

Student Learning Goals
- Be able to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.
- Understand the advantages of using NGS technologies (including RNA-Seq) to elucidate gene expression patterns.
- Be able to perform an analytic pipeline in a Galaxy environment in order to discover gene expression patterns in a ‘dual RNA-Seq’ experiment.
- Be able to perform and understand the statistical implications of RNA-Seq experiments.

Vision and Change Core Competencies
- Ability to apply the process of science: Perform the analysis of a dual RNA-Seq dataset.
- Ability to use quantitative reasoning: Perform quantitative analysis and apply mathematical reasoning to the analysis of an RNA-Seq dataset.
- Ability to use modeling and simulation: Be able to explain the complex systems that regulate host-phage interactions. Be able to run simulations of RNA-Seq datasets and observe the effects of modifying program parameters.
- Ability to tap into the interdisciplinary nature of science: NGS technologies represent an interdisciplinary science that intersects with physics, computer science, engineering, statistical inference, and information science.
- Ability to communicate and collaborate with other disciplines: Collaborate to identify the gene expression patterns of a phage and its host, and present the data to peers.
- Ability to understand the relationship between science and society: Understand that bacteriophages profoundly affect global ecosystems, can be used to treat human diseases, and can produce useful biotechnological tools for the benefit of humans.

GCAT-SEEK sequencing requirements
- The organism is Mycobacterium smegmatis mc2 155 +/- phage infection.
- The Mycobacterium smegmatis mc2 155 genome is a single circular chromosome of 6.99 Mb with 6,742 genes, and the ABCat phage genome is Kb with 145 predicted genes.
- Samples would be pelleted and resuspended in the RNeasy Mini Kit (Qiagen).
- The suggested kit for rRNA depletion is RiboZero for Gram-positive bacteria (Epicentre).
- Need to get around 200 million reads/sample, with around 160 million reads coming from the host (M. smeg) and, depending on time point, between million reads from the bacteriophage genome (therefore the phage transcripts would represent ~0.2-2% of the reads).
- Generate single-end libraries from the TruSeq Illumina kit, without multiplexing.
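The read budget on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function name `read_budget` and its parameters are hypothetical, and the phage read counts it prints are derived from the stated ~0.2-2% range rather than taken from the original slide.

```python
def read_budget(total_reads, host_reads, phage_fraction_range):
    """Summarize a dual RNA-Seq read budget (hypothetical helper).

    total_reads: reads requested per sample
    host_reads: reads expected to map to the host genome
    phage_fraction_range: (low, high) fraction of reads expected from the phage
    """
    low, high = phage_fraction_range
    return {
        "host_fraction": host_reads / total_reads,
        "phage_reads_low": round(total_reads * low),
        "phage_reads_high": round(total_reads * high),
    }


# Figures from the slide: ~200 million reads/sample, ~160 million from the
# host, and phage transcripts at roughly 0.2-2% of the reads.
budget = read_budget(200_000_000, 160_000_000, (0.002, 0.02))
for key, value in budget.items():
    print(key, value)
```

Under these assumptions the host accounts for 80% of the reads, and the phage share works out to a few hundred thousand to a few million reads, depending on the time point.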

Computer/program requirements for data analysis
- Internet connection; Mac OS or Linux (Ubuntu is nice)
- Web browser (excluding IE)
- Computer programs:
  - Galaxy (and pre-compiled RNA-Seq/NGS tools): public, local, or Amazon EC2 instance; url: ‘usegalaxy.org’
  - R (if using Bioconductor NGS packages)
  - Python 2.7 (if using ‘bcbio-nextgen’ or ‘biopython’ modules)

Student Assessments
- Pre- and post-tests for:
  - understanding of statistical methods, calculations, and considerations
  - comprehension of RNA-Seq methodology (wet-lab techniques and in silico analysis)
  - ability to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system
- Assessment of student confidence in using NGS computational tools and in navigating a Linux environment.

Timeline
WEEK 1:
- Learn to navigate and use “usegalaxy.org” (i.e., Galaxy) to create a workflow for the analysis of dual RNA-Seq data from Mycobacterium smegmatis mc2 155 and a mycobacteriophage.
- Import into Galaxy the RNA-Seq datasets that will be received from the GCAT-SEEK sequencing facility in late summer.
- Convert the GenBank files for Mycobacterium smegmatis mc2 155 and the selected mycobacteriophage to GFF format using the Rätsch lab’s Galaxy instance, or use the “bcbio-nextgen” Python module.
- Import “GTF” or “GFF3” formatted reference genomes for Mycobacterium smegmatis mc2 155.
- Import a FASTA file of the genomic sequence of the mycobacteriophage ABCcat.
WEEKS 2-4:
- Use the Galaxy RNA-Seq tools to map reads to the two reference genomes: bowtie, cufflinks, rsem.
- Identify genes that are differentially expressed: Cuffmerge, Cuffdiff.
- Use R to visualize alignments and differential gene expression plots.
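Once reads have been mapped to both reference genomes, a first sanity check is how the alignments divide between host and phage. The sketch below is a minimal illustration, not part of the Galaxy workflow itself: it reads SAM-formatted text lines and counts primary alignments per reference. The reference names `host` and `phage`, the function name, and the example records are all hypothetical.

```python
from collections import Counter


def count_reads_by_reference(sam_lines):
    """Count primary alignments per reference sequence in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # bitwise FLAG column
        rname = fields[2]       # reference sequence name
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or rname == "*":
            counts["unmapped"] += 1
        else:
            counts[rname] += 1
    return counts


# Hypothetical single-end records: two host reads, one phage read,
# one unmapped read, and one secondary alignment that is ignored.
example_sam = [
    "@SQ\tSN:host\tLN:1000",
    "@SQ\tSN:phage\tLN:100",
    "r1\t0\thost\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\thost\t50\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tphage\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tphage\t7\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

counts = count_reads_by_reference(example_sam)
mapped = counts["host"] + counts["phage"]
phage_fraction = counts["phage"] / mapped
print(dict(counts), round(phage_fraction, 3))
```

On real data the same tally, taken at each time point, would show how the phage share of the transcriptome changes over the course of infection.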

Discussion & Lecture Topics
- ELSI of NGS technologies
- Bacterial host-pathogen interactions
- Bacteriophage replication mechanisms
- Lytic versus lysogenic lifestyles
- The connection between mycobacteriophage genome architecture and temporal gene expression patterns
- Determining gene phylogeny through sequence comparisons using bioinformatic tools

References
Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol 13, 619–624.
Dedrick, R.M., Marinelli, L.J., Newton, G.L., Pogliano, K., Pogliano, J., and Hatfull, G.F. (2013). Functional requirements for bacteriophage growth: gene essentiality and expression in mycobacteriophage Giles. Mol Microbiol.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15, 1451–
Goecks, J., Nekrutenko, A., Taylor, J., Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11, R86.
Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734.
Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat. Biotechnol. 28, 421–423.
Hatfull, G.F. (2012). The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288.
Henry, M., and Debarbieux, L. (2012). Tools from viruses: bacteriophage successes and beyond. Virology 434, 151–161.
Jacobs-Sera, D., Marinelli, L.J., Bowman, C., Broussard, G.W., Guerrero Bustamante, C., Boyle, M.M., Petrova, Z.O., Dedrick, R.M., Pope, W.H., Science Education Alliance Phage Hunters Advancing Genomics And Evolutionary Science Sea-Phages Program, et al. (2012). On the nature of mycobacteriophage diversity and host preference. Virology.
Pope, W.H., Jacobs-Sera, D., Russell, D.A., Peebles, C.L., Al-Atrache, Z., Alcoser, T.A., Alexander, L.M., Alfano, M.B., Alford, S.T., Amy, N.E., et al. (2011). Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS ONE 6, e
Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 14, 91.
Westermann, A.J., Gorski, S.A., and Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nat. Rev. Microbiol. 10, 618–630.