An Introduction to RNA-Seq Transcriptome Profiling with iPlant (https://pods.iplantc.org/wiki/x/axO)

Slides:

Advertisements

Similar presentations

IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy

Advertisements

Advanced ChIP-seq Identification of consensus binding sites for the LEAFY transcription factor Explain that you can use your own data Explain that data.

DEG Mi-kyoung Seo.

RNAseq analysis Bioinformatics Analysis Team

Introduction To Next Generation Sequencing (NGS) Data Analysis

RNA-seq Analysis in Galaxy

RNA-Seq data analysis Qi Liu Department of Biomedical Informatics

Bacterial Genome Assembly | Victor Jongeneel Radhika S. Khetani

Before we start: Align sequence reads to the reference genome

NGS Analysis Using Galaxy

DNA Subway Green Line Overview. Growth of Sequence Read Archive (SRA) 2.2 Quadrillion bases Log Scale!

An Introduction to RNA-Seq Transcriptome Profiling with iPlant

RNA-Seq Visualization

Introduction to RNA-Seq and Transcriptome Analysis

Expression Analysis of RNA-seq Data

Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.

BIF Group Project Group (A)rabidopsis: David Nieuwenhuijse Matthew Price Qianqian Zhang Thijs Slijkhuis Species: C. Elegans Project: Advanced.

Advanced ChIPseq Identification of consensus binding sites for the LEAFY transcription factor.

RNAseq analyses -- methods

Introduction to RNA-Seq & Transcriptome Analysis

Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.

TopHat Mi-kyoung Seo. Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center.

Transcriptome Analysis

Next Generation Sequencing. Overview of RNA-seq experimental procedures. Wang L et al. Briefings in Functional Genomics 2010;9: © The Author.

RNA-Seq in Galaxy Igor Makunin QAAFI, Internal Workshop, April 17, 2015.

An Introduction to RNA-Seq Transcriptome Profiling with iPlant.

Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.

Introduction to RNA-Seq

The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq using the Discovery Environment And COGE.

Introduction To Next Generation Sequencing (NGS) Data Analysis

BIF Group Project Group (A)rabidopsis: David Nieuwenhuijse Matthew Price Qianqian Zhang Thijs Slijkhuis.

Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.

RNA-Seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis is doing the.

Introduction to RNAseq

The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq visualization with cummeRbund.

The iPlant Collaborative

The iPlant Collaborative

No reference available

RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015.

Short read alignment BNFO 601. Short read alignment Input: –Reads: short DNA sequences (upto a few hundred base pairs (bp)) produced by a sequencing machine.

Canadian Bioinformatics Workshops

RNA-Seq visualization with CummeRbund

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Overview of Genomics Workflows

Canadian Bioinformatics Workshops

RNA Seq Analysis Aaron Odell June 17 th Mapping Strategy A few questions you’ll want to ask about your data… - What organism is the data from? -

Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana.

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Arrays How do they work ? What are they ?. WT Dwarf Transgenic Other species Arrays are inverted Northerns: Extract target RNA YFG Label probe + hybridise.

Transcriptomics History and practice.

Introductory RNA-seq Transcriptome Profiling

GCC Workshop 9 RNA-Seq with Galaxy

Cancer Genomics Core Lab

WS9: RNA-Seq Analysis with Galaxy (non-model organism )

Advanced ChIP-seq Identification of consensus binding sites for the LEAFY transcription factor Explain that you can use your own data Explain that data.

RNA-Seq visualization with CummeRbund

S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.

How to store and visualize RNA-seq data

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Introductory RNA-Seq Transcriptome Profiling

Introduction To Next Generation Sequencing (NGS) Data Analysis

Transcriptomics History and practice.

Additional file 2: RNA-Seq data analysis pipeline

Transcriptomics – towards RNASeq – part III

RNA-Seq Data Analysis UND Genomics Core.

Presentation transcript:

An Introduction to RNA-Seq Transcriptome Profiling with iPlant (

Overview: This training module is designed to demonstrate a workflow in the iPlant Discovery Environment using RNA-Seq for transcriptome profiling. Question: How can we compare gene expression levels using RNA-Seq data in Arabidopsis WT and hy5 genetic backgrounds? RNA-seq in the Discovery Environment

Scientific Objective LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper transcription factor (TF). Mutations cause aberrant phenotypes in Arabidopsis morphology, pigmentation and hormonal response. We will use RNA-seq to compare WT and hy5 to identify HY5-regulated genes. Source:

Samples Experimental data downloaded from the NCBI Short Read Archive (GEO:GSM and GEO:GSM613466) Two replicates each of RNA-seq runs for Wild- type and hy5 mutant seedlings.

RNA-Seq Conceptual Overview Image source:

RNA-seq Sample Read Statistics Genome alignments from TopHat were saved as BAM files, the binary version of SAM (samtools.sourceforge.net/). Reads retained by TopHat are shown below Sequence runWT-1WT-2hy5-1hy5-2 Reads10,866,70210,276,26813,410,01112,471,462 Seq. (Mbase)

RNA-Seq HWUSI-EAS455:3:1:1:1096 length=41 CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGG C HWUSI-EAS455:3:1:2:1592 length=41 GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT + @SRR HWUSI-EAS455:3:1:2:869 length=41 TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA + HWUSI-EAS455:3:1:4:1075 length=41 CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC HWUSI-EAS455:3:1:5:238 length=41 AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA + @SRR HWUSI-EAS455:3:1:5:1871 length=41 GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG HWUSI-EAS455:3:1:5:1981 length=41 GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC + : …Now What?

@SRR HWUSI-EAS455:3:1:1:1096 length=41 CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGG C HWUSI-EAS455:3:1:2:1592 length=41 GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT + @SRR HWUSI-EAS455:3:1:2:869 length=41 TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA + HWUSI-EAS455:3:1:4:1075 length=41 CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC HWUSI-EAS455:3:1:5:238 length=41 AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA + @SRR HWUSI-EAS455:3:1:5:1871 length=41 GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG HWUSI-EAS455:3:1:5:1981 length=41 GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC + : Bioinformatician

The Tuxedo Protocol

$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq $ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq $ tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq $ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C1_R1_2.fq $ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C1_R2_2.fq $ tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C1_R3_2.fq $ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam $ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam $ cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam $ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam $ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam $ cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam $ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt $ cuffdiff -o diff_out -b genome.fa -p 8 –L C1,C2 -u merged_asm/merged.gtf \./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,\./C1_R3_thout/accepted_hits.bam \./C2_R1_thout/accepted_hits.bam,\./C2_R3_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam Your RNA-Seq Data Your transformed RNA-Seq Data

RNA-Seq Analysis Workflow Tophat (bowtie) Cufflinks Cuffmerge Cuffdiff CummeRbund Your Data iPlant Data Store FASTQ Discovery Environment Atmosphere

The iPlant Discovery Environment

Staged Data

Tophat

TopHat in the Discovery Environment

TopHat TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the BOWTIE aligner internally. Other alternatives are BWA, MAQ, OLego, Stampy, Novoalign, etc.

Assembling the Transcripts

Cufflinks in the Discovery Environment

Merging the Transcriptomes

Cufffmerge in the Discovery Environment

Comparing wild-type to hy5 transcriptomes

Cuffdiff in the Discovery Environment

Cuffdiff Results

Differentially expressed genes Example filtered Cuffdiff results generated in the Discovery Environment.

Differentially expressed transcripts Example filtered Cuffdiff results generated in the Discovery Environment.