RNA Sequencing Day 7 Wooohoooo!


RNA Sequencing Day 7 Wooohoooo! Aaron Odell June 17th 2015

Outline For The Day
Map raw fastq files to the reference genome
Convert mapped data to visualization-ready files
“Quantify” mapped data over reference genome genes/annotations
Differential expression

Data Set
Mouse RNA-Seq
PolyA, strand-specific library; ethanol treatment
2 x 100 paired-end reads = 2 fastq files
mm10 reference genome (.fasta file)
Ensembl mm10 gene annotation file (.gtf)
Due to time and compute constraints we will only be working with chromosome 19
Pull up what a gtf looks like
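To make "what a gtf looks like" concrete, here is a minimal Python sketch that splits one GTF record into its nine tab-separated columns and unpacks the attribute column. The sample line, including the gene and transcript names, is hypothetical and is not taken from the workshop's chr19.gtf; the helper name parse_gtf_line is also just an illustration.

```python
# Hypothetical GTF record for illustration only (not from the workshop's chr19.gtf).
SAMPLE_GTF_LINE = (
    'chr19\texample_source\texon\t1000\t1200\t.\t+\t.\t'
    'gene_id "GENE_A"; transcript_id "GENE_A.1"; exon_number "1";'
)

# The nine tab-separated columns of a GTF record, in order.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attributes",
]


def parse_gtf_line(line):
    """Split one GTF line into a dict; the attribute column becomes its own dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based and inclusive on both ends.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attributes"].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record


record = parse_gtf_line(SAMPLE_GTF_LINE)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```

The gene_id attribute is the field that ties each exon back to a gene when reads are counted later in the day.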

What Mapping Program To Use?
Need a program that is able to handle spliced reads: Tophat2, GSNAP, STAR, etc.
We will be using Tophat2: https://ccb.jhu.edu/software/tophat/manual.shtml
Excellent evaluation of spliced alignment programs: http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2722.html
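A spliced read shows up in SAM/BAM output with an N operation in its CIGAR string, which marks a stretch of reference (typically an intron) that the read skips. The sketch below, with hypothetical CIGAR strings and helper names, shows how a 100 bp read can cover far more than 100 bp of genome once it crosses a splice junction.

```python
import re

# One CIGAR operation: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference genome.
REFERENCE_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Return a list of (length, operation) tuples, e.g. [(60, 'M'), (2500, 'N')]."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings that contain anything other than valid operations.
    if "".join("%d%s" % (length, op) for length, op in ops) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return ops


def reference_span(cigar):
    """Number of reference bases covered from the first to the last aligned base."""
    return sum(length for length, op in parse_cigar(cigar) if op in REFERENCE_CONSUMING)


def intron_lengths(cigar):
    """Lengths of the skipped regions (N operations) in the alignment."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


# Hypothetical alignments: one unspliced read and one read crossing a junction.
print(reference_span("100M"), intron_lengths("100M"))
print(reference_span("60M2500N40M"), intron_lengths("60M2500N40M"))
```

An aligner that cannot split a read across such a gap tends to leave junction-spanning reads unmapped or misplaced, which is why a splice-aware mapper is needed for RNA-Seq.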

First things first
Log onto vieques and load all the modules we will need to run the analysis successfully

module load commands
    module load python_2.7.3
    module load samtools_0.1.16
    module load numpy_1.6.1
    module load pysam_0.8.1
    module load bowtie_bowtie2-2.0.2
    module load tophat_2.0.6
    module load htseq_0.6.1

Let's begin…
Copy the course files to your home directory:
    cp -r /projects/sreadgrp/RNA /Users/username/
Edit /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs in a text editor
Change all “odella” to your actual username
    Example: /Users/odella/ -> /Users/your_username/
Change "#PBS -M your_email_address" to your actual email address (or, if you don't want the email, do nothing)
From the /Users/username/RNA/SCRIPTS directory:
    qsub ILS_E_tophat_samtools.pbs
Have half the class copy. Then wait a minute or two. Then have the other half of the class do the copy
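If you would rather not make the edits by hand, the substitution can be sketched in a few lines of Python. The script text below is a hypothetical stand-in, not the contents of the real .pbs file, and the function name personalize is only for illustration; the username and email in the example are made up.

```python
# Hypothetical stand-in for a PBS script; the real file's contents will differ.
SAMPLE_PBS = """#PBS -M your_email_address
cd /Users/odella/RNA/SCRIPTS
echo "output goes to /Users/odella/RNA/ILS_E"
"""


def personalize(script_text, username, email=None):
    """Swap every 'odella' for username and, optionally, set the '#PBS -M' address."""
    new_lines = []
    for line in script_text.splitlines():
        line = line.replace("odella", username)
        # Only touch the mail directive when an address was supplied.
        if email is not None and line.startswith("#PBS -M"):
            line = "#PBS -M " + email
        new_lines.append(line)
    return "\n".join(new_lines) + "\n"


# Made-up username and address, for illustration only.
print(personalize(SAMPLE_PBS, "jdoe", "jdoe@example.org"))
```

To apply this to a real script you would read the file, pass its text through the function, and write the result back; checking the result in a text editor before running qsub is still a good idea.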

While we wait…
Edit /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs
Change all “odella” to your actual username
htSeqCount manual: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
Once /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs has run to completion:
    qsub /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs

For those of you who are bored…
Read through the tophat and htseq manuals
Take a look at the various files created by our mapping/quantification pipeline. The Tophat manual explains what all these files are and where they come from
Run this command from the /Users/username/RNA/ILS_E directory:
    samtools flagstat ILS_E_sorted.bam
Do you understand the output? Search the internet or ask someone if you don't
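The categories that flagstat reports (paired, properly paired, and so on) are tallied from the FLAG field of each alignment record, which packs several yes/no properties into one integer. As a rough aid to reading the output, this sketch decodes a FLAG value into the bits defined in the SAM specification; the short label names are my own shorthand, not samtools output.

```python
# Bit values from the SAM specification's FLAG field; labels are informal shorthand.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


# 99 = 1 + 2 + 32 + 64: a properly paired first mate whose partner is on the reverse strand.
print(decode_flag(99))
print(decode_flag(4))
```

A read with the secondary bit set is an additional alignment of a read that mapped to more than one place, which is one reason the number of records in a BAM file can exceed the number of reads in the fastq files.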

Time to visualize!
Copy bam and genome files to student directory
    cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam /projects/sreadgrp/student/username/ILS_E_sorted.bam
    cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam.bai /projects/sreadgrp/student/username/ILS_E_sorted.bam.bai
    cp /Users/username/RNA/GENOME/chr19.fasta /projects/sreadgrp/student/username/chr19.fasta
    cp /Users/username/RNA/GENOME/chr19.fasta.fai /projects/sreadgrp/student/username/chr19.fasta.fai
    cp /Users/username/RNA/GENOME/chr19.gtf /projects/sreadgrp/student/username/chr19.gtf

Load the appropriate files
Load genome from file: chr19.fasta
Load from file: chr19.gtf, ILS_E_sorted.bam

Look at ILS_E_htSeq.txt
Count the number of reads you see mapping in IGV over any given gene and compare to the htSeqCount value
Does it make sense?
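When comparing by eye, it helps to have a rough picture of how a counter decides which gene a read belongs to. The sketch below is a deliberately simplified counting rule, loosely modeled on the idea behind htseq-count's union mode: a read counts toward a gene only if it overlaps the exons of exactly one gene. It ignores strand, read pairing, splice gaps inside a read, and mapping quality, all of which the real tool can take into account, and the gene names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(genes, reads):
    """Count reads per gene.

    genes: dict of gene name -> list of (start, end) exon intervals.
    reads: list of (start, end) aligned intervals.
    Reads hitting no gene or more than one gene go to separate tallies.
    """
    counts = dict((gene, 0) for gene in genes)
    counts["no_feature"] = 0
    counts["ambiguous"] = 0
    for read_start, read_end in reads:
        hits = [
            gene for gene, exons in genes.items()
            if any(overlaps(read_start, read_end, s, e) for s, e in exons)
        ]
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts[hits[0]] += 1
    return counts


# Hypothetical annotation and alignments, for illustration only.
genes = {
    "GENE_A": [(100, 200), (400, 500)],
    "GENE_B": [(450, 600)],
}
reads = [(150, 180), (190, 420), (460, 480), (700, 750), (550, 590)]
print(count_reads(genes, reads))
```

Two things to keep in mind if your IGV tally is higher than the value in the count file: reads that overlap more than one gene are typically set aside rather than counted for both, and with paired-end data IGV draws both mates while htseq-count counts a read pair once.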