Published by Felix Leonard. Modified over 9 years ago.
1
RNA-Seq in Galaxy
Igor Makunin, i.makunin@uq.edu.au
DI/TRI, March 9, 2015
2
Genomics Virtual Lab
GVL site: https://genome.edu.au
The main aim: to facilitate genomics research in Australia
Galaxy: tutorials and protocols (nextGen sequencing)
- Galaxy for tutorials: galaxy-tut.genome.edu.au
- Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
"Roll your own" GVL platform on the Australian government-funded computing infrastructure (NeCTAR cloud):
- virtual computer cluster
- Galaxy
- IPython Notebook
- RStudio
Mirror of the UCSC Genome Browser
Learn, Use, Get
3
Plan
Our goals for today:
- Introduction to the Galaxy platform
  - FASTQ quality score encoding in Galaxy
- Analysis of differential gene expression using nextGen sequencing data
- Workflows in Galaxy
Sites:
- Galaxy-tut: http://galaxy-tut.genome.edu.au
- Galaxy-qld: http://galaxy-qld.genome.edu.au
- Genomics Virtual Lab: https://genome.edu.au
All GVL resources are public.
4
Galaxy: what does it look like?
Panels: Tools, Working window, Data
5
Good user practice for Galaxy-qld
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
- Register with your UQ email to get a bigger disk allocation.
- Use FTP for big datasets; it is faster. Galaxy recognises .gz compression.
- Do not store unneeded datasets. Delete temporary files such as SAM. Purge deleted datasets.
- Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).
- Create and use workflows for multi-step analysis.
- Specify the quality score encoding for nextGen sequencing data (FASTQ files).
6
FASTQ quality score encoding
@SRR391845.1639 ILLUMINA-96BC32_0028_FC:3:1:8035:1092/1
TAGCAGCACATCATGGTTTACATCGTATGCCGTCTT
+
IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG
Qual. = 39, Offset = 33: 39 + 33 = 72, and ASCII(72) is H
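The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names decode_phred and encode_phred are made up for illustration and are not part of Galaxy or any library.

```python
def decode_phred(quality, offset=33):
    """Turn a FASTQ quality string into a list of integer quality scores."""
    # Each character stores (score + offset) as its ASCII code.
    return [ord(ch) - offset for ch in quality]


def encode_phred(scores, offset=33):
    """Turn a list of integer quality scores back into a quality string."""
    return "".join(chr(q + offset) for q in scores)


# Quality line of the read shown on the slide (Sanger encoding, offset 33).
quality_line = "IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG"
scores = decode_phred(quality_line)

print(scores[:5])          # [40, 40, 39, 40, 35]
print(encode_phred([39]))  # H
```

The third character of the quality line is H, which decodes to 39, matching the slide.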
7
FASTQ quality score in Galaxy
Many old Illumina datasets use a proprietary quality score encoding (offset 64).
Currently most NGS datasets use Sanger encoding (offset 33).
Galaxy:
- By default Galaxy assigns the 'fastq' data type to uploaded FASTQ files. In this case the offset is not specified, and many tools do not recognise the data.
- fastqillumina: old Illumina quality score encoding (offset 64)
- fastqsanger: new Illumina / Sanger quality score encoding (offset 33)
Nearly all modern NGS data use Sanger encoding (fastqsanger in Galaxy).
Solution:
- specify the proper format, e.g. fastqsanger or fastqillumina, during data upload
- change the format via Attributes > Datatype
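If it is unclear which encoding a file uses, the range of characters in the quality lines often gives it away. The sketch below is a rough heuristic, not what Galaxy itself does. The function name guess_galaxy_fastq_type is hypothetical, and the two thresholds are assumptions: offset-64 data never contains characters below ';' (ASCII 59), and offset-33 Illumina data rarely goes above 'J' (ASCII 74).

```python
def guess_galaxy_fastq_type(quality_lines):
    """Guess 'fastqsanger' or 'fastqillumina' from quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        raise ValueError("no quality characters given")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur with offset 64.
        return "fastqsanger"
    if highest > 74:
        # Assumption: offset-33 scores above 41 ('J') are unusual.
        return "fastqillumina"
    return None  # ambiguous: look at more reads


print(guess_galaxy_fastq_type(["IIHIDIIII#"]))  # fastqsanger
print(guess_galaxy_fastq_type(["hhhhgggf"]))    # fastqillumina
print(guess_galaxy_fastq_type(["IIHID"]))       # None
```

A single high-quality read can be ambiguous, as the last call shows, so it is safer to pass in quality lines from many reads before setting the datatype.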
8
Differential gene expression
Basic GVL Galaxy tutorial based on Trapnell et al. (2012) Nature Protocols:
- Import data
- Align to a reference genome (tophat)
- Find differentially expressed genes (Cuffdiff)
https://genome.edu.au/wiki/Learn
Figure: mRNA, Library, Reads
The number of reads correlates with the gene expression level.
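Raw read counts also depend on gene length and sequencing depth, so expression is usually reported in normalised units such as FPKM (fragments per kilobase of transcript per million mapped fragments), the unit used by the Cufflinks tools. The sketch below shows only the basic formula; Cuffdiff's own estimation is more involved, and the function name and the numbers are illustrative assumptions.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2000 bp gene, 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))  # 12.5
```

With the same fragment count, doubling the gene length or the sequencing depth halves the FPKM value.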
10
Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au
Contributors and participants: