Introductory RNA-seq Transcriptome Profiling

Before we start: Align sequence reads to the reference genome. The most time-consuming part of the analysis is aligning the reads (in Sanger FASTQ format) for all replicates against the reference genome.

RNA-seq in the Discovery Environment. Overview: This training module is designed to provide hands-on experience in using RNA-Seq for transcriptome profiling. Questions: How well is the annotated transcriptome represented in RNA-seq data from Arabidopsis WT and hy5 genetic backgrounds? How can we compare gene expression levels in the two samples?

Scientific Objective: LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper transcription factor (TF). Mutations in the HY5 gene cause aberrant phenotypes in Arabidopsis morphology, pigmentation and hormonal response. We will use RNA-seq to compare the transcriptomes of seedlings from WT and hy5 genetic backgrounds to identify HY5-regulated genes.

Samples: Experimental data were downloaded from the NCBI Short Read Archive (GEO:GSM and GEO:GSM613466). There are two replicates each of RNA-seq runs for wild-type and hy5 mutant seedlings.

Specific Objectives: By the end of this module, you should
1) Be more familiar with the Discovery Environment (DE) user interface
2) Understand the starting data for RNA-seq analysis
3) Be able to align short sequence reads with a reference genome in the DE
4) Be able to analyze differential gene expression in the DE
5) Be able to use DE text manipulation tools to explore the gene expression data

RNA-Seq Conceptual Overview Image source:

@SRR HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
HWUSI-EAS455:3:1:2:1592 length=41
GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT
+
@SRR HWUSI-EAS455:3:1:2:869 length=41
TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA
+
HWUSI-EAS455:3:1:4:1075 length=41
CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC
HWUSI-EAS455:3:1:5:238 length=41
AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA
+
@SRR HWUSI-EAS455:3:1:5:1871 length=41
GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG
HWUSI-EAS455:3:1:5:1981 length=41
GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC
+

Bioinformagician
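For orientation, a complete FASTQ record spans four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string with one character per base. The sketch below counts records and bases and checks that each quality string matches its sequence in length. It assumes strict four-line records; the read IDs and quality strings in the example are invented for illustration, and `fastq_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarise a FASTQ stream read from stdin: number of records and total bases.
# Assumes strict four-line records (no wrapped sequence lines).
fastq_summary() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0); bases += seqlen }
    NR % 4 == 0 {
      if (length($0) != seqlen) { print "length mismatch at line " NR; bad = 1; exit }
      reads++
    }
    END { if (!bad) printf "reads=%d bases=%d\n", reads, bases }
  '
}

# Hypothetical two-record example: IDs and quality values are made up.
qual=$(printf '%41s' '' | tr ' ' 'I')   # 41 identical quality characters
printf '%s\n' \
  '@read1 length=41' 'CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC' '+' "$qual" \
  '@read2 length=41' 'GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT' '+' "$qual" \
  | fastq_summary
```

On a real file the same function could be fed with `fastq_summary < reads.fastq`, where `reads.fastq` stands for any uncompressed FASTQ file.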

From your RNA-Seq data to your transformed RNA-Seq data:

# Align the paired-end reads of each replicate (conditions C1 and C2)
$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C2_R3_2.fq
# Assemble transcripts for each replicate
$ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
# Merge the assemblies listed in assemblies.txt with the reference annotation
$ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
# Test for differential expression between C1 and C2
$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
    ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
    ./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam
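Because the alignment commands differ only in the sample name, they can be generated in a loop, which keeps the two mate files of each sample consistent. The following is a minimal sketch rather than part of the original workflow: the flags and file names follow the commands above, while `run`, `align_all` and the `DRY_RUN` switch are hypothetical helpers added for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Print (or run) one tophat command per condition/replicate pair.
# DRY_RUN=1 (the default here) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"      # show the command line
  else
    "$@"                    # execute it
  fi
}

align_all() {
  local cond rep sample
  for cond in C1 C2; do
    for rep in R1 R2 R3; do
      sample="${cond}_${rep}"
      # Both mate files are derived from the same sample name.
      run tophat -p 8 -G genes.gtf -o "${sample}_thout" genome \
        "${sample}_1.fq" "${sample}_2.fq"
    done
  done
}

align_all
```

Setting `DRY_RUN=0` would execute the commands, which requires TopHat, the `genome` index and the FASTQ files to be present.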

RNA-Seq Analysis Workflow. Tools: TopHat (Bowtie), Cufflinks, Cuffmerge, Cuffdiff, CummeRbund. Platform components: Your Data (FASTQ), iPlant Data Store, Discovery Environment, Atmosphere.

Quick Summary: Find Differentially Expressed Genes
- Download reads from SRA
- Export reads to FASTQ
- Align to genome: TopHat
- View alignments: IGV
- Differential expression: CuffDiff

Pre-Configured: Getting the RNA-seq Data. Import SRA data from NCBI SRA. Extract FASTQ files from the downloaded SRA archives.

Examining Data Quality with FastQC

RNA-Seq Workflow Overview

Step 1: Align Reads to the Genome. Align the four FASTQ files to the Arabidopsis genome using TopHat. Reference options: built-in reference genomes or user-provided reference genomes. Input options: a single FASTQ file or a folder with one or more FASTQ files.

TopHat: TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the Bowtie aligner internally. Alternatives include BWA, MAQ, OLego, Stampy and Novoalign.

RNA-seq Sample Read Statistics. Genome alignments from TopHat were saved as BAM files, the binary version of SAM (samtools.sourceforge.net/). Reads retained by TopHat are shown below.

Sequence run    WT-1          WT-2          hy5-1         hy5-2
Reads           10,866,702    10,276,268    13,410,011    12,471,462
Seq. (Mbase)

Prepare BAM files for viewing: Index BAM files using SAMtools.
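On the command line, indexing is done with `samtools index`, which writes a `.bai` index file next to each BAM file. The sketch below only prints one such command per TopHat output directory. The directory names are hypothetical (modelled on the four sample names), `index_commands` is an illustrative helper, and paths are assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Print one "samtools index" command per TopHat output directory.
# accepted_hits.bam is the alignment file that TopHat writes into each
# output directory; the directory names passed below are made up.
index_commands() {
  local dir
  for dir in "$@"; do
    printf 'samtools index %s/accepted_hits.bam\n' "$dir"
  done
}

index_commands WT-1_thout WT-2_thout hy5-1_thout hy5-2_thout
```

Piping the output to `sh` would run the commands once SAMtools is installed. Viewers such as IGV expect the resulting index file to sit alongside the BAM file.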

Using IGV in Atmosphere
1. We have already launched an instance of NGS Viewers in Atmosphere.
2. Use a VNC client to connect to your remote desktop.

Pre-configured VM for NGS Viewers

Integrative Genomics Viewer (IGV): The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. Use IGV to inspect outputs from TopHat.

ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant background (> 9-fold, p = 0). Compare it with the gene on the right, which shows no differential expression.
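Fold changes such as the one quoted above are usually reported on a log2 scale, where a nine-fold decrease corresponds to about -3.17. A minimal sketch of that arithmetic follows. The expression values are invented for illustration and are not taken from the slides, and `log2fc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# log2 fold change of a mutant expression value relative to wild type.
# Usage: log2fc WT_VALUE MUTANT_VALUE   (for example, FPKM values)
log2fc() {
  awk -v wt="$1" -v mut="$2" 'BEGIN {
    if (wt <= 0 || mut <= 0) { print "NA"; exit }   # log is undefined at zero
    printf "%.2f\n", log(mut / wt) / log(2)
  }'
}

log2fc 90 10   # hypothetical values: nine-fold lower in the mutant
```

With these made-up inputs the function prints `-3.17`. A value that is four times higher in the mutant, for example `log2fc 10 40`, gives `2.00`.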

RNA-Seq Workflow Overview

CuffDiff: Cufflinks is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide. CuffDiff is a program within the Cufflinks package that compares transcript abundance between samples.

Examining Differential Gene Expression

Examining the Gene Expression Data

Differentially expressed genes: Filter CuffDiff results for up- or down-regulated gene expression in hy5 seedlings.

Example filtered CuffDiff results generated with the Filter_CuffDiff_Results app to
1) Select genes with a minimum two-fold expression difference
2) Select genes with significant differential expression (q <= 0.05)
3) Add gene descriptions
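The first two filtering steps can be approximated on the command line with `awk`. This is a sketch under stated assumptions, not the implementation of Filter_CuffDiff_Results: it assumes the tab-separated `gene_exp.diff` layout written by Cuffdiff, with `log2(fold_change)` in column 10 and `q_value` in column 13, and it does not add gene descriptions. The demo table is invented, and `filter_cuffdiff` and `demo_table` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Keep rows with at least a two-fold change (|log2 fold change| >= 1)
# and q_value <= 0.05. Reads a gene_exp.diff style table from stdin.
filter_cuffdiff() {
  awk -F '\t' '
    NR == 1 { print; next }                        # keep the header line
    {
      lfc = $10
      if (lfc == "inf" || lfc == "-inf") big = 1   # no expression in one sample
      else { v = lfc + 0; big = (v >= 1 || v <= -1) }
      if (big && $13 + 0 <= 0.05) print
    }
  '
}

# Invented four-gene table in the assumed 14-column layout.
demo_table() {
  printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n'
  printf 'g1\tg1\t-\t1:1-100\tWT\thy5\tOK\t90\t10\t-3.17\t-5.0\t0.0001\t0.001\tyes\n'
  printf 'g2\tg2\t-\t1:200-300\tWT\thy5\tOK\t10\t11\t0.14\t0.3\t0.6\t0.8\tno\n'
  printf 'g3\tg3\t-\t1:400-500\tWT\thy5\tOK\t5\t20\t2\t3.1\t0.04\t0.2\tno\n'
  printf 'g4\tg4\t-\t1:600-700\tWT\thy5\tOK\t0\t12\tinf\t4.2\t0.0001\t0.01\tyes\n'
}

demo_table | filter_cuffdiff | cut -f 1,10,13
```

In the demo, g1 and g4 pass both filters, g2 changes too little, and g3 fails the q-value cut-off. On real output the function could be used as `filter_cuffdiff < diff_out/gene_exp.diff`.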

Coming Soon: Downstream Analysis with cummeRbund

Coming Soon: Support for Paired-End Reads and Other Sequencing Platforms?