RNA-Seq in Galaxy Igor Makunin QAAFI, Internal Workshop, April 17, 2015.


1 RNA-Seq in Galaxy Igor Makunin i.makunin@uq.edu.au QAAFI, Internal Workshop, April 17, 2015

2 About Genomics Virtual Lab
GVL site: https://genome.edu.au
The main aim: facilitate genomics research in Australia.
Learn: Galaxy tutorials and protocols (next-generation sequencing)
Use:
- Galaxy for tutorials: galaxy-tut.genome.edu.au
- Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
Get: "roll your own" GVL platform on Australian government-funded computing infrastructure (NeCTAR cloud):
- virtual computer cluster
- Galaxy
- IPython Notebook
- RStudio

3 Plan
Our goals for today:
1. Analysis of differential gene expression using NGS data (RNA-Seq). The data are from the RNA-Seq tutorial (Basic Galaxy tutorial).
2. Workflows in Galaxy.
Sites:
- Galaxy-tut: http://galaxy-tut.genome.edu.au
- Galaxy-qld: http://galaxy-qld.genome.edu.au
- Genomics Virtual Lab: https://genome.edu.au
- GVL FAQ: https://genome.edu.au/wiki/GVL_FAQ

4 RNA-Seq with Cufflinks package
Basic GVL Galaxy tutorial (https://genome.edu.au/wiki/Learn), based on Trapnell et al. (2012) Nature Protocols; simple version without replicates.
Inputs:
- two FASTQ files (reads from two conditions)
- gene annotation in GTF format
- bowtie genome indices (provided by Galaxy)
The workflow:
1. Import the data (NGS reads in FASTQ format)
2. Align to a reference genome (tophat)
3. Find differentially expressed genes (Cuffdiff)
4. Filter the results
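The last workflow step, "Filter the results", can also be sketched outside Galaxy. The example below is only an illustration, not the tutorial's own method. It assumes the tab-separated layout of Cuffdiff's gene_exp.diff output, with `status` and `significant` columns. The three data rows and the function name `significant_genes` are invented for the example.

```python
import csv
import io

# Column layout assumed to match Cuffdiff's gene_exp.diff file.
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)",
          "test_stat", "p_value", "q_value", "significant"]

# Invented rows, for illustration only.
ROWS = [
    ["XLOC_000001", "XLOC_000001", "geneA", "chr1:100-900", "C1", "C2",
     "OK", "10.5", "11.0", "0.0671", "0.21", "0.83", "0.95", "no"],
    ["XLOC_000002", "XLOC_000002", "geneB", "chr1:2000-4500", "C1", "C2",
     "OK", "5.0", "40.0", "3.0", "4.8", "0.00005", "0.004", "yes"],
    ["XLOC_000003", "XLOC_000003", "geneC", "chr2:50-700", "C1", "C2",
     "NOTEST", "0.1", "0.2", "1.0", "0", "1", "1", "no"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def significant_genes(handle):
    """Return rows that were tested successfully and flagged significant."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader
            if row["status"] == "OK" and row["significant"] == "yes"]


hits = significant_genes(io.StringIO(SAMPLE))
hit_names = [row["gene"] for row in hits]
print(hit_names)
```

For a real analysis, the same function could be given an open file handle to the downloaded Cuffdiff table instead of the in-memory sample.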

5 Getting started and good user practice
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
- Register with your UQ email to get a bigger disk allocation.
- Use FTP for big datasets; it is faster.
- Galaxy recognises .gz compression.
- Do not store unneeded datasets. Delete temporary files such as SAM. Purge deleted datasets.
- Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).
- Create and use workflows for multi-step analysis.
- Specify the quality score encoding for next-generation sequencing data (FASTQ files).
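Because Galaxy recognises .gz compression, a large FASTQ file can be compressed before upload. The following is a minimal sketch using only the Python standard library. The helper name `gzip_file` and the tiny demo file are assumptions for illustration.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dest_path=None):
    """Compress src_path to dest_path (default: src_path + '.gz')."""
    dest_path = dest_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        # Stream in chunks so large files are not loaded into memory.
        shutil.copyfileobj(src, dest)
    return dest_path


# Demo on a throwaway one-read FASTQ file (invented content).
with tempfile.TemporaryDirectory() as tmp:
    fastq = os.path.join(tmp, "reads.fastq")
    with open(fastq, "w") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    packed = gzip_file(fastq)
    with gzip.open(packed, "rt") as fh:
        restored = fh.read()
```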

6 FASTQ quality score in Galaxy
Many old Illumina datasets use a proprietary quality encoding (offset 64). Currently most NGS datasets use Sanger encoding (offset 33).
By default, Galaxy assigns the 'fastq' datatype to uploaded FASTQ files. In this case the offset is not specified, and many tools do not recognise the data.
- fastqillumina: old Illumina quality score encoding (offset 64, Illumina 1.3+)
- fastqsanger: new Illumina 1.8+ / Sanger quality score encoding (offset 33)
Nearly all modern NGS data use Sanger encoding (fastqsanger in Galaxy).
Solution:
- specify the proper format, e.g. fastqsanger or fastqillumina, during the data upload
- change the format via Attributes > Datatype
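When the encoding of a dataset is unknown, the range of characters in its quality lines gives a hint. The sketch below is a heuristic, not a Galaxy feature. It assumes offset-64 data never contain characters below ASCII 64, and that offset-33 data rarely go above 'J' (ASCII 74). The function name `guess_fastq_datatype` is hypothetical.

```python
def guess_fastq_datatype(quality_lines):
    """Guess a Galaxy datatype name from FASTQ quality strings.

    Returns 'fastqsanger', 'fastqillumina', or 'unknown' when the
    observed characters are compatible with both offsets.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return "unknown"
    if min(codes) < 64:
        # Characters below '@' cannot occur with offset 64.
        return "fastqsanger"
    if max(codes) > 74:
        # Characters above 'J' are not expected with offset 33.
        return "fastqillumina"
    return "unknown"


print(guess_fastq_datatype(["!''*((((***+))%%%++"]))
print(guess_fastq_datatype(["hhhhhhhhghhhhhfhhhhh"]))
```

A quality string that uses only characters between '@' and 'J' stays ambiguous under this rule, so checking more reads or the sequencing run details is still needed in that case.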

7 Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au
Contributors and participants:

8 FASTQ quality score encoding
@SRR391845.1639 ILLUMINA-96BC32_0028_FC:3:1:8035:1092/1
TAGCAGCACATCATGGTTTACATCGTATGCCGTCTT
+
IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG
Example: the quality character H has ASCII code 72; with offset 33, Qual. = 72 - 33 = 39.
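The decoding shown on this slide can be reproduced in a few lines: each quality character's ASCII code minus the offset gives the Phred score. This is a minimal sketch, and the function name `phred_scores` is chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


# Quality line of the read shown on the slide (Sanger encoding, offset 33).
quality = "IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG"
scores = phred_scores(quality)
print(scores[:5])  # [40, 40, 39, 40, 35]
```

The third character is H, so the third score is 72 - 33 = 39, matching the slide. Passing `offset=64` would decode old Illumina (fastqillumina) data instead.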

