1
RNA Sequencing Day 7 Wooohoooo!
Aaron Odell June 17th 2015
2
Outline For The Day
Map raw fastq files to reference genome
Convert mapped data to visualization-ready files
“Quantify” mapped data over reference genome genes/annotations
Differential Expression
3
Data Set
Mouse RNA Seq
PolyA Strand Specific Library. Ethanol treatment
2 x 100 Paired End Reads = 2 Fastq Files
mm10 Reference Genome (.fasta file)
Ensembl mm10 gene annotation file (.gtf)
Due to time and compute constraints we will only be working with Chromosome 19
Pull up what a gtf looks like
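To make "what a gtf looks like" concrete, here is a minimal Python sketch that splits one GTF record into its nine tab-separated columns. The example record is made up for illustration and is not taken from the course's chr19.gtf; the function name `parse_gtf_line` is hypothetical, and the attribute parsing is a simplification that assumes no semicolons inside quoted values.

```python
def parse_gtf_line(line):
    """Split one GTF record into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Column 9 holds 'key "value";' pairs (simplified parsing, see assumptions).
    attributes = {}
    for item in attr_text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Made-up record for illustration only.
example = (
    "chr19\texample_source\texon\t1000\t1200\t.\t+\t.\t"
    'gene_id "GENE_A"; transcript_id "TX_A1"; exon_number "1";'
)
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"])
```

The `gene_id` attribute is the one to look for: it is the label that quantification tools typically use to group exons into genes.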
4
What Mapping Program To Use?...
Need a program that is able to handle spliced reads: Tophat2, GSNAP, STAR, etc.
We will be using Tophat2
Excellent spliced alignment program evaluation
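What "spliced" means in the mapped output can be sketched in a few lines of Python. In the SAM format, a read that spans an intron carries an `N` operation (skipped reference region) in its CIGAR string. The helper names below are hypothetical, and the CIGAR strings are made-up examples rather than real alignments from this data set.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, operation) tuples for a CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject strings that the pattern did not consume completely.
    if "".join("%d%s" % (length, op) for length, op in ops) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return ops


def is_spliced(cigar):
    """True if the alignment skips a stretch of reference (an intron)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def reference_span(cigar):
    """Number of reference bases covered, introns included."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


# Made-up examples: an unspliced 100 bp read and one split across a 1200 bp intron.
for cigar in ("100M", "40M1200N60M"):
    print(cigar, is_spliced(cigar), reference_span(cigar))
```

An aligner that cannot emit `N` operations would have to either leave the second read unmapped or map only part of it, which is why a splice-aware program is needed for RNA-seq.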
5
First things first
Log onto vieques and load all the modules we will need to run the analysis successfully
6
module load commands
module load python_2.7.3
module load samtools_0.1.16
module load numpy_1.6.1
module load pysam_0.8.1
module load bowtie_bowtie
module load tophat_2.0.6
module load htseq_0.6.1
7
Let’s begin…
cp -r /projects/sreadgrp/RNA /Users/username/
Edit /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs in a text editor
Change all “odella” to your actual username
Example: /Users/odella/ becomes /Users/your_username/
Change the address on the #PBS -M line to your actual email address (or, if you don’t want the email, do nothing)
From the /Users/username/RNA/SCRIPTS directory:
qsub ILS_E_1_tophat_samtools.pbs
Have half the class copy. Then wait a minute or two. Then have the other half of the class do the copy
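If you would rather not do the username edit by hand, the same substitution can be sketched in Python. This is only an illustration: the function name `personalize_script` is hypothetical, the script text below is made up and is not the contents of the real .pbs file, and `student1` stands in for your own username.

```python
def personalize_script(text, old_user="odella", new_user="your_username"):
    """Replace every occurrence of old_user in the script text.

    Returns the new text and the number of replacements made, so you can
    check that something actually changed.
    """
    count = text.count(old_user)
    return text.replace(old_user, new_user), count


# Made-up script fragment for illustration only.
example_script = (
    "cd /Users/odella/RNA/SCRIPTS\n"
    "echo output goes to /Users/odella/RNA/ILS_E\n"
)
new_text, n_replaced = personalize_script(example_script, new_user="student1")
print(n_replaced)
print(new_text)
```

To apply it to a real file, you would read the file into a string, pass it through the function, and write the result back. Keep a copy of the original script first.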
8
While we wait… Edit /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs
Change all “odella” to your actual username
htSeqCount manual
Once /Users/username/RNA/SCRIPTS/ILS_E_tophat_samtools.pbs has run to completion:
qsub /Users/username/RNA/SCRIPTS/ILS_E_htSeqCount.pbs
9
For those of you who are bored…
Read through the tophat and htseq manuals
Take a look at the various files created by our mapping/quantification pipeline. The Tophat manual explains what all these files are and where they come from
Run this command from the /Users/username/RNA/ILS_E directory:
samtools flagstat ILS_E_sorted.bam
Do you understand the output? Search the internet or ask someone if you don’t
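The categories that flagstat reports (paired, properly paired, read1, read2, duplicates and so on) are tallies of bits in the FLAG column of each alignment. Below is a minimal Python sketch that decodes one FLAG value; the bit values come from the SAM specification, while the short names and the function `decode_flag` are hypothetical labels chosen for this example.

```python
# (bit, label) pairs from the SAM specification's FLAG field.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS if flag & bit]


# 99 and 147 are FLAG values commonly seen on the two mates of a properly paired fragment.
for flag in (99, 147):
    print(flag, decode_flag(flag))
```

Reading the flagstat output with this table next to it should make most lines self-explanatory: each one counts the alignments that have a particular bit, or combination of bits, set.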
10
Time to visualize! Copy bam and genome files to student directory
cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam /projects/sreadgrp/student/username/ILS_E_sorted.bam
cp /Users/username/RNA/ILS_E/ILS_E_sorted.bam.bai /projects/sreadgrp/student/username/ILS_E_sorted.bam.bai
cp /Users/username/RNA/GENOME/chr19.fasta /projects/sreadgrp/student/username/chr19.fasta
cp /Users/username/RNA/GENOME/chr19.fasta.fai /projects/sreadgrp/student/username/chr19.fasta.fai
cp /Users/username/RNA/GENOME/chr19.gtf /projects/sreadgrp/student/username/chr19.gtf
11
Load the appropriate files
Load genome from file: chr19.fasta
Load from file: chr19.gtf, ILS_E_sorted.bam
12
Look at ILS_E_htSeq.txt
Count the number of reads you see mapping in IGV over any given gene and compare to the htSeqCount value
Does it make sense?
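For the comparison, it helps to pull the counts file into something you can query. The Python sketch below assumes the htseq-count output is two tab-separated columns (feature name, count) and that the summary counters at the end of the file start with a double underscore, such as `__no_feature`; both are assumptions about this HTSeq version that you should check against your own ILS_E_htSeq.txt. The function name and the example lines are made up.

```python
def parse_htseq_counts(lines):
    """Split htseq-count style lines into per-gene counts and summary counters."""
    gene_counts = {}
    special_counters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.split("\t")
        # Assumption: summary rows are marked with a leading double underscore.
        target = special_counters if name.startswith("__") else gene_counts
        target[name] = int(count)
    return gene_counts, special_counters


# Made-up lines for illustration only; real gene IDs come from the gtf.
example_lines = [
    "GENE_A\t12",
    "GENE_B\t0",
    "__no_feature\t5",
    "__ambiguous\t1",
]
genes, special = parse_htseq_counts(example_lines)
print(genes)
print(special)
```

With a real file you would pass `open("ILS_E_htSeq.txt")` instead of the example list. When comparing against IGV, keep in mind that with paired-end data htseq-count counts each read pair once, so the number of individual reads you see over a gene can be noticeably higher than the reported count.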