RNA Sequencing Day 7 Wooohoooo!

Presentation transcript:

RNA Sequencing Day 7 Wooohoooo! Aaron Odell June 17th 2015

Outline For The Day
- Map raw fastq files to the reference genome
- Convert mapped data to visualization-ready files
- "Quantify" mapped data over reference genome genes/annotations
- Differential expression

Data Set
- Mouse RNA-Seq: polyA, strand-specific library; ethanol treatment
- 2 x 100 paired-end reads = 2 fastq files
- mm10 reference genome (.fasta file)
- Ensembl mm10 gene annotation file (.gtf)
- Due to time and compute constraints, we will only be working with chromosome 19
- Pull up what a gtf looks like
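To get a feel for what a gtf looks like, the sketch below prints a few fields from each record. It relies only on the general GTF layout (nine tab-separated columns, with column 9 holding key "value"; attribute pairs). The function name gtf_summary and the two sample records (GENE_A, TX_A1) are made up for illustration and do not come from the course files.

```shell
#!/bin/sh
# Print seqname, feature, start, end, strand and gene_id for each GTF record.
gtf_summary() {
    awk -F '\t' '
        /^#/ { next }    # skip header/comment lines
        {
            gene = "NA"
            if (match($9, /gene_id "[^"]+"/)) {
                # drop the leading: gene_id " (9 characters) and the closing quote
                gene = substr($9, RSTART + 9, RLENGTH - 10)
            }
            print $1, $3, $4, $5, $7, gene
        }
    ' "$1"
}

# Demo on a throwaway file with two hypothetical records (not real annotation).
sample=$(mktemp)
printf '19\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_name "DemoA";\n' > "$sample"
printf '19\tdemo\texon\t100\t300\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n' >> "$sample"
gtf_summary "$sample"
rm -f "$sample"
```

On the course data, something like gtf_summary chr19.gtf | head should give a quick overview, assuming the file follows the standard layout.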

What Mapping Program To Use?
- We need a program that is able to handle spliced reads: Tophat2, GSNAP, STAR, etc.
- We will be using Tophat2: https://ccb.jhu.edu/software/tophat/manual.shtml
- Excellent evaluation of spliced alignment programs: http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2722.html

First things first
Log onto vieques and load all the modules we will need to run the analysis successfully.

module load commands
    module load python_2.7.3
    module load samtools_0.1.16
    module load numpy_1.6.1
    module load pysam_0.8.1
    module load bowtie_bowtie2-2.0.2
    module load tophat_2.0.6
    module load htseq_0.6.1

Let's begin…
- Copy the course material:
    cp -r /projects/sreadgrp/RNA /Users/username/
- Edit /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs in a text editor
- Change all "odella" to your actual username. Example: /Users/odella/ -> /Users/your_username/
- Change "#PBS -M your_email_address" to your actual email address (or, if you don't want the email, do nothing)
- From the /Users/username/RNA/SCRIPTS directory:
    qsub ILS_E_1_tophat_samtools.pbs
- Instructor note: have half the class copy, wait a minute or two, then have the other half of the class do the copy
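The "change all odella to your username" step can also be done without a text editor. This is a minimal sketch, not part of the original workshop material: the helper name personalize_script is hypothetical, and it assumes the username contains no "/" or other characters that are special to sed. It keeps a .bak copy so the edit can be undone.

```shell
#!/bin/sh
# Replace every "odella" in a script with the given username.
# The original is kept as <script>.bak so the edit can be undone.
personalize_script() {
    script=$1
    user=$2
    cp "$script" "$script.bak"
    sed "s/odella/$user/g" "$script.bak" > "$script"
}

# Demo on a throwaway file; these two lines are a made-up stand-in,
# not the contents of the real PBS script.
demo=$(mktemp)
printf '%s\n' '#PBS -M your_email_address' 'cd /Users/odella/RNA/ILS_E' > "$demo"
personalize_script "$demo" "your_username"
cat "$demo"
# Count the lines that still mention odella (0 means nothing was missed).
grep -c odella "$demo" || true
rm -f "$demo" "$demo.bak"
```

For the real scripts the call would look like personalize_script /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs your_username, followed by the same grep check.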

While we wait…
- Edit /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs
- Change all "odella" to your actual username
- htSeqCount manual: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
- Once /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs has run to completion:
    qsub /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs

For those of you who are bored…
- Read through the tophat and htseq manuals
- Take a look at the various files created by our mapping/quantification pipeline. The Tophat manual explains what all these files are and where they come from
- Run this command from the /Users/username/RNA/ILS_E directory:
    samtools flagstat ILS_E_sorted.bam
- Do you understand the output? Search the internet or ask someone if you don't
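Most of the categories that samtools flagstat reports (paired, properly paired, read1, read2, duplicates, and so on) are tallies of bits in the FLAG column of each alignment. As an aid to reading that output, here is a small sketch that decodes a single FLAG value using the bit meanings from the SAM specification. The function name describe_flag and the short labels are my own, and only the bits up to 1024 are covered.

```shell
#!/bin/sh
# Decode a SAM FLAG value into the names of the bits that are set.
describe_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate; do
        bit=${pair%%:*}     # numeric value of the bit
        name=${pair#*:}     # label used in the output
        if [ $((flag & bit)) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"         # drop the leading space
}

describe_flag 99     # paired proper_pair mate_reverse read1
describe_flag 147    # paired proper_pair reverse read2
describe_flag 4      # unmapped
```

Flags 99 and 147 are what the two mates of a properly paired forward/reverse alignment typically carry, so they are a useful pair to recognise when scanning a SAM file.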

Time to visualize!
Copy the bam and genome files to your student directory:
    cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam /projects/sreadgrp/student/username/ILS_E_sorted.bam
    cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam.bai /projects/sreadgrp/student/username/ILS_E_sorted.bam.bai
    cp /Users/username/RNA/GENOME/chr19.fasta /projects/sreadgrp/student/username/chr19.fasta
    cp /Users/username/RNA/GENOME/chr19.fasta.fai /projects/sreadgrp/student/username/chr19.fasta.fai
    cp /Users/username/RNA/GENOME/chr19.gtf /projects/sreadgrp/student/username/chr19.gtf

Load the appropriate files
- Load genome from file: chr19.fasta
- Load from file: chr19.gtf, ILS_E_sorted.bam

Look at ILS_E_htSeq.txt
- Count the number of reads you see mapping in IGV over any given gene and compare to the htSeqCount value
- Does it make sense?
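To pull the htSeqCount value for one gene out of the count file, a sketch like the following may help. It assumes the usual htseq-count output layout: two tab-separated columns (feature ID, count), with the summary counters at the end of the file prefixed by "__" (for example __no_feature and __ambiguous), as in the HTSeq 0.6 series as far as I know. The function names and the sample IDs and numbers are invented for the demo.

```shell
#!/bin/sh
# Print the count recorded for one feature ID.
gene_count() {
    awk -F '\t' -v gene="$2" '$1 == gene { print $2 }' "$1"
}

# Sum the counts of all real features, skipping the "__" summary counters.
assigned_total() {
    awk -F '\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$1"
}

# Demo on a throwaway file with hypothetical IDs and counts.
counts=$(mktemp)
printf 'GENE_A\t120\nGENE_B\t0\nGENE_C\t35\n__no_feature\t40\n__ambiguous\t5\n' > "$counts"
gene_count "$counts" GENE_C     # 35
assigned_total "$counts"        # 155
rm -f "$counts"
```

When comparing against IGV, keep in mind that for paired-end data htseq-count counts each read pair once, and that it leaves out reads it considers ambiguous, non-uniquely aligned, or below its alignment-quality threshold. A count that is roughly half the number of reads visible over a gene, or somewhat lower, is therefore not necessarily a problem.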