NGS Analysis Using Galaxy

Presentation transcript:

NGS Analysis Using Galaxy Galaxy is a genomics analysis platform that allows researchers to obtain data from various databases such as the UCSC Genome Browser, ENCODE data and many other sources, prepare and manipulate the data, and perform various analyses on the data in ways that might not be possible at the original sites. Galaxy also includes a workflow tool that allows the user to save and share various customized and useful analysis steps.

NGS Analysis Using Galaxy Outline: Sequences and Alignment Format; Galaxy overview and Interface; Getting Data in Galaxy; Analyzing Data in Galaxy; Quality Control; Mapping Data; History and workflow; Galaxy Exercises

FASTQ Format FASTQ variants The range of quality scores in a FASTQ file depends on the technology and the base caller used. If the quality scores contain characters in the range ASCII 33-58, the file can only be Sanger. If the FASTQ file is known to be from an Illumina/Solexa platform and the quality scores contain characters in the range ASCII 59-63, the file can only be Solexa/Illumina 1.0. If ASCII characters 64 or 65 are used in the quality scores, the file cannot be Illumina 1.5+.
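The three rules on this slide can be sketched in Python roughly as follows. This is only an illustration: the function name, the variant labels and the `known_illumina` flag are hypothetical, and the branch for ASCII 59-63 without platform knowledge is an assumption that goes beyond what the slide states.

```python
def candidate_fastq_variants(quality_strings, known_illumina=False):
    """Return the FASTQ variants still possible after applying the slide's rules.

    quality_strings: iterable of quality lines taken from a FASTQ file.
    known_illumina:  True if the file is known to come from an
                     Illumina/Solexa platform (hypothetical flag).
    """
    codes = {ord(ch) for line in quality_strings for ch in line}
    if not codes:
        raise ValueError("no quality characters supplied")

    # Rule 1: any character in ASCII 33-58 means the file can only be Sanger.
    if any(33 <= c <= 58 for c in codes):
        return ["sanger"]

    # Rule 2: characters in ASCII 59-63 from a known Illumina/Solexa
    # platform can only be Solexa/Illumina 1.0.
    if any(59 <= c <= 63 for c in codes):
        if known_illumina:
            return ["solexa"]
        # Assumption (not on the slide): without platform knowledge the
        # file could also be Sanger with uniformly high scores.
        return ["sanger", "solexa"]

    candidates = ["sanger", "solexa", "illumina_1.3", "illumina_1.5"]
    # Rule 3: ASCII 64 or 65 rules out Illumina 1.5+.
    if 64 in codes or 65 in codes:
        candidates.remove("illumina_1.5")
    return candidates
```

For example, a quality line beginning with `!` (ASCII 33) yields only `sanger`, while a file whose lowest character is `B` (ASCII 66) remains ambiguous and returns all four labels.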

SAM Format There are many short-read aligners, and most of them use their own format to output alignments, so downstream tools cannot be exchanged between aligners. To resolve this issue, Li et al. suggested a standardized file format, the Sequence Alignment/Map (SAM) format. It is flexible enough to store all the alignment information generated by various alignment programs, and simple enough to be easily generated by alignment programs or converted from existing alignment formats. For more details: http://samtools.sourceforge.net/SAM1.pdf
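To give a feel for why SAM is simple to generate and convert, here is a minimal sketch that splits one alignment line into the eleven mandatory tab-separated fields defined in the SAM specification. The class and function names are hypothetical, and header lines (those starting with `@`) are assumed to have been skipped by the caller.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (Phred+33)
    tags: tuple  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def is_unmapped(record):
    """Flag bit 0x4 marks a read that did not align."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record):
    """Flag bit 0x10 marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)
```

Calling `parse_sam_line` on a line such as `r001	99	ref	7	30	8M2I4M1D3M	=	37	39	TTAGATAAAGGATACTG	*` gives a record whose `pos` is 7 and whose flag marks it as mapped on the forward strand.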

Galaxy Start http://galaxy.psu.edu/ Galaxy Site If you go to the URL shown here, you will find a short introduction to Galaxy. We’ll start at the bottom of the page, since this is where we will learn what Galaxy is. In the red section at the bottom, you will learn the rationale for Galaxy. Galaxy is not a database, but rather an analysis tool. The amount of data and the number of databases are increasing exponentially, but the tools available to analyze this data are few and have a difficult time keeping up with new data types. Galaxy aims to provide an analysis resource that allows the user to pull data from many different databases and analyze that data. It also aims to give the user a way to reproduce and share that analysis. Galaxy integrates many databases and analysis tools and is continually growing and evolving. Galaxy was created both for bench scientists, who need a relatively simple tool for data analysis, and for bioinformatics developers, who integrate tools and analyses for the user. This tutorial is aimed at that first group of users, the biologists who wish to perform data analysis, but developers might find it useful. We also suggest that developers view the FAQs and screencasts for more information on how to integrate their tools into the Galaxy framework. This integration of databases, tools and workflows makes collaboration possible between scientists, and between scientists and developers. It also allows researchers to share their analyses, and other researchers to reproduce them where otherwise they might not have been able to. The website is available for anyone to use right now if you click the link to go to the public site.

Galaxy Conceptual Framework Obtain data from many data sources including the UCSC Table Browser, BioMart, WormBase, or your own data. Prepare data for further analysis by rearranging or cutting data columns, filtering data and many other actions. Analyze data by finding overlapping regions, determining statistics, phylogenetic analysis and much more. Galaxy is a research tool that is of great use to the researcher and the developer. Through Galaxy, the researcher can obtain data from many different sources and databases, prepare the data for further analysis, and analyze the data using the many included tools or analysis tools added by developers. The data types you start with will vary based on your research interests. The preparations and manipulations you choose will be customized for your needs. The analyses you can perform will be nearly endless. But the basic conceptual framework is: obtain data, prepare data, and analyze the data. In this tutorial we will examine these basic concepts, but you should not feel limited to these examples. You should be able to ask amazing and complex questions of the data using the Galaxy framework.

Galaxy Interface Sections (slide callouts: User, Register) The Galaxy interface is straightforward. There are three sections. The left column contains links to the downloading, preparation and analysis tools. The center column is where the menus and data will appear. The right-hand column will show you the history of your analysis steps, allow you to view data and results, and more. Let’s do a really quick sample step of obtaining data to give you an idea of how the interface works. In the next section we’ll explain more about getting data and the types of data you can obtain, but for now this is just an example. Click the link that says “Get Data.”

Getting Data Click “Get Data.” On the Galaxy interface, the left column includes all the links for getting data into Galaxy, preparing the data and analyzing it. “Get Data” offers many avenues for adding all types of data to Galaxy in order to start your analysis.

Getting Data: Table Browser Get Table Main Right now, let’s upload some data from the UCSC Table Browser database. The UCSC Table Browser includes genomic data from dozens of species. It allows the user to customize a search with powerful filters and intersections of data to obtain exactly the data the user needs. We’ll just go through some basics of getting data, but if you haven’t used the Table Browser before, we suggest you view our tutorial on the UCSC Table Browser to get acquainted with this database search tool.

Getting Data: UCSC Table Browser Settings: clade: Mammal; genome: Human; assembly: Mar. 2006; group: Genes and…; track: UCSC Genes; table: knownGene; region: position, chrX; output format: BED, with “Send output to Galaxy” checked. The Table Browser interface will now appear. We are going to get known gene data from the human X chromosome. To do so, we’ll choose the right data set: human genome assembly “Mar. 2006” and the “UCSC Genes” dataset from the “Genes and Gene Prediction Tracks” group. We’ll enter the position “chrX.” Leaving all the rest as default, including the output format as BED and the “Send output to Galaxy” checkbox checked, we’ll click “Get Output.”
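The BED output requested above is a tab-separated text format whose first three columns are the chromosome, a 0-based start and an end-exclusive end; name, score and strand may follow. A minimal sketch of reading such a line is shown below. The function name is hypothetical, and the example gene line used afterwards is made up for illustration rather than taken from the knownGene table.

```python
def parse_bed_line(line):
    """Parse the first six columns of one BED line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")

    record = {
        "chrom": fields[0],
        "start": int(fields[1]),  # 0-based, inclusive
        "end": int(fields[2]),    # end-exclusive
    }
    # Optional columns 4-6; any further BED columns are ignored here.
    for key, value in zip(("name", "score", "strand"), fields[3:6]):
        record[key] = value

    # With a 0-based start and an exclusive end, length is a plain difference.
    record["length"] = record["end"] - record["start"]
    return record
```

For instance, the invented line `chrX	100	250	example_gene	0	+` parses to a feature on chrX with a length of 150 bases on the `+` strand.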

Getting Data: Upload File Upload or paste file File Format Upload File Species Execute You can also upload your own data. To upload your file, you’ll either paste the contents of your file into the window here, or click the browse button here and find the file on your computer. Here I’ve found and uploaded a file from my desktop. You’ll then choose which file format the file is in. In this case it is in the interval format, but as you see there are many other possible formats. You can also choose “Auto-detect”, which works pretty well to determine the format of a data file if you are not quite sure of the format, or even if you just don’t want to look for the format type on the list. Once you’ve chosen the file format, you choose which species the data has been obtained from. There are many to choose from. Ours is from Homo sapiens, so that is what we will choose. Then click “Execute.”

Getting Data: Upload File You can also specify multiple URLs in the "URL / Text" box.

Analyzing Data: Next Generation Sequencing The Next Generation Sequencing (NGS) Toolbox, which was in beta when this tutorial was developed, offers many new tools. We’ve started from the beginning with a new history here. Let’s open the “QC and manipulation tools.” You’ll notice there are many tools to choose from. For example, we could draw a nucleotide distribution chart from a statistics file we might have uploaded. If you have NGS data to analyze, you may want to explore these more closely.

Analyzing Data: Next Generation Sequencing FASTQ file manipulation, such as format conversion, summary statistics, trimming reads, filtering reads by quality score…
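The kinds of FASTQ manipulation listed on this slide can be illustrated with a small stand-alone sketch. It is not how Galaxy's own tools are implemented; all function names and the default thresholds are hypothetical, and four-line FASTQ records with Sanger (Phred+33) qualities are assumed unless stated otherwise.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        yield header[1:], seq, qual


def illumina13_to_sanger(qual):
    """Format conversion: Phred+64 to Phred+33 shifts each character by 31."""
    return "".join(chr(ord(c) - 31) for c in qual)


def mean_quality(qual, offset=33):
    """Summary statistic: mean Phred score of one quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trimming: drop bases from the 3' end while their score is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(records, min_q=20, min_length=1):
    """Filtering: trim each read, then keep it only if it is long enough
    and its mean quality is still at least min_q."""
    kept = []
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_length and mean_quality(qual) >= min_q:
            kept.append((name, seq, qual))
    return kept
```

A typical use would be `trim_and_filter(read_fastq(open_file), min_q=20, min_length=25)`, where `open_file` is any iterable of FASTQ lines.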

Analyzing Data: Next Generation Sequencing Input: Sanger FASTQ. Output: SAM format.

Analyzing Data: Next Generation Sequencing After alignment, Galaxy supports many downstream analyses. In this workshop, we cover only how to convert a SAM file to a BAM file. We will introduce more tools in future workshops.
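In Galaxy the SAM-to-BAM conversion is done through the web interface. Outside Galaxy, the same step is commonly performed with the `samtools view` command, and the sketch below shows one way to call it from Python. It assumes that samtools is installed and on the PATH and that the SAM file carries its header lines; the function names and file names are hypothetical.

```python
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Build the samtools command line for a SAM-to-BAM conversion.

    -b asks samtools view for BAM output; -o names the output file.
    """
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


def convert_sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)
```

Calling `convert_sam_to_bam("reads.sam", "reads.bam")` would then write a compressed binary copy of the alignments.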

History: History Options List saved histories and shared histories. Work on the current history: create new, clone, share, create workflow, set permissions, show deleted datasets or delete history. There are a lot of options that make histories very helpful. You can list all the analysis histories you’ve created, create a new empty history to start a new analysis, make a history into a workflow, share your history with other users, change the permissions and more. You can show deleted data within the current history or delete the current history. Let’s list all saved histories by clicking the “Saved Histories” link. Copyright OpenHelix. No use or reproduction without express written consent

Workflow Creating a workflow allows the user to repeat an analysis using different datasets.
