Genome Sequence Annotation Server

Presentation transcript:

Genome Sequence Annotation Server
GenSAS v5.1: a web-based platform for structural and functional annotation and curation of genomes
Jodi L. Humann, Taein Lee, Stephen P. Ficklin, Chun-Huai Cheng, Heidi Hough, Sook Jung, Jill Wegrzyn, David Neale, Dorrie Main
jhumann@wsu.edu

What is DNA annotation and why do it?
- Getting the DNA sequence is only the first step.
- Researchers need to know the biological relevance of the DNA sequence.
- Annotated sequence can be used to find putative genes of interest for study.

What scientists want
Current annotation tools:
- Many tools are available, but they run independently of each other.
- Most of the tools are run via the command line and require server access.
Scientists want a platform that:
- Is a single location for DNA annotation
- Does not require management of computing equipment and software tools
- Is easy to use and can be adapted to a variety of DNA sequences

What is GenSAS?
- A single website that combines numerous annotation tools into one interface
- User accounts keep data private and secure, and allow users to share data for collaborative annotation
- Easy-to-use interfaces with integrated instructions allow researchers at all skill levels to annotate DNA

GenSAS annotation process
1. Upload Sequences
2. Create Project
3. Upload Evidence
4. Identify Repeats: RepeatMasker, RepeatModeler
5. Mask Sequences
6. Align Transcripts: BLAST, BLAT, PASA, TopHat
7. Structural Annotation: Augustus, GeneMarkES, Genscan, GlimmerM, SNAP
8. Choose Official Gene Set: EvidenceModeler
9. Refine Gene Models: PASA
10. Functional Annotation: BLAST, InterProScan, Pfam, SignalP, TargetP
11. Manual Curation: Apollo, JBrowse
12. Generate Files for Publication
Notes: The flowchart starts in the upper left corner, runs down the first column, up the second column, and down the third column. The last step is in the lower right corner.

www.gensas.org
Notes: The animation is triggered by click. Three animations help focus on what is being said.
- Circle 1: around the block on the right with links to help for GenSAS: the User’s Guide, other information, and a “Contact Us” link.
- Circle 2: around the link for requesting a new account in the login area. Users request accounts; requests are reviewed (approval is not automatic), and an email is sent when the account is approved.
- Circle 3: around the “Use GenSAS” tab, which is where you click to access the GenSAS interface after logging in.

Notes: The animation is triggered by click. Three animations help focus on what is being said.
- Rectangle 1: around the header, which shows a flowchart of the annotation process and is used to navigate to different sections. The project name is in the upper left; the user name and links to account info are in the upper right.
- Rectangle 2: around the accordion menu on the right. It has four sections: monitor job progress (Job Queue), open Apollo/JBrowse (Browser), share projects with other GenSAS users (Sharing), and links to the User’s Guide (Help).
- Rectangle 3: around the tab section. This is the primary area where users interact with GenSAS. Each step opens a different tab. The GenSAS welcome tab gives users a quick overview of what each of the three screen sections does.

GenSAS account limits
- GenSAS user accounts remain active as long as the user has an active GenSAS project.
- Each user is limited to a total of 250 GB of storage space on the GenSAS server.
- Assembly files must be high quality (<25,000 sequences, with over 50% of sequences longer than 2,500 bases).
- Users can have only seven jobs running at one time, but other jobs can wait in the queue.
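The assembly-quality limits above can be checked locally before uploading. The following is a minimal sketch, not the check GenSAS itself performs: the function names and the exact handling of boundary values are assumptions made for illustration.

```python
import io


def read_fasta_lengths(handle):
    """Return the length of every sequence in a FASTA-formatted text handle."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def meets_assembly_limits(lengths, max_sequences=25_000,
                          min_length=2_500, min_fraction=0.5):
    """Apply the two limits quoted on the slide (hypothetical helper).

    Fewer than max_sequences sequences, and more than min_fraction of them
    strictly longer than min_length bases.
    """
    if not lengths:
        return False
    if len(lengths) >= max_sequences:
        return False
    long_enough = sum(1 for n in lengths if n > min_length)
    return long_enough / len(lengths) > min_fraction


if __name__ == "__main__":
    demo = io.StringIO(">contig1\nACGT\nAC\n>contig2\nA\n")
    print(read_fasta_lengths(demo))
```

For a real assembly, open the FASTA file and pass the file object to `read_fasta_lengths`, then pass the resulting list to `meets_assembly_limits`.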

Sequence Tab:
- View available sequences
- Upload a single-sequence or multi-sequence FASTA file
- Create a sequence subset based on sequence names or minimum size
Project Tab:
- Open an existing project or shared project
- Create a new project
- Edit project info and reset expiration
Notes:
- Sequence tab: users upload single-sequence or multi-sequence FASTA files. A multi-sequence FASTA file can be used to create a sequence subset, filtered by minimum size or by contig names.
- Project tab: a web form is used to create a new project; some fields are required, others are optional. Users can also open previous projects and projects shared with them.
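The subset step described above (filtering a multi-sequence FASTA file by contig names or by minimum size) can be sketched in a few lines. This is an illustration only, not GenSAS code: `parse_fasta` and `subset_records` are hypothetical helper names, and the sequence name is assumed to be the first whitespace-delimited word of the header line.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def subset_records(records, names=None, min_size=None):
    """Keep records whose name is in `names` and whose length is >= min_size.

    Either filter may be omitted (None), in which case it is not applied.
    """
    keep_names = set(names) if names is not None else None
    for name, seq in records:
        if keep_names is not None and name not in keep_names:
            continue
        if min_size is not None and len(seq) < min_size:
            continue
        yield name, seq


if __name__ == "__main__":
    fasta = ">c1 example\nACGT\nAC\n>c2\nA\n>c3\nACGTACGT\n"
    for name, seq in subset_records(parse_fasta(fasta), min_size=6):
        print(name, len(seq))
```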

GFF3 & Evidence Tabs (optional):
- EST and mRNA sequences
- Repeat motifs
- Protein sequences
- NCBI gene structures
- Pre-processed Illumina RNA-Seq reads
All tabs have an Instructions section that can be opened and collapsed.
Notes:
- GFF3 tab: users can upload GFF3 files generated by other annotation tools, or a previous annotation.
- Evidence tab: FASTA files of species-specific repeat, protein, and transcript evidence; pre-processed Illumina RNA-Seq reads for use with TopHat; NCBI gene structure files (for training Augustus).
- The more organism-specific data you have, the better the annotation will be.

Repeats Tab:
- RepeatMasker: evidence-based repeat finder
- RepeatModeler: de novo repeat finder
Masking Tab:
- Check results in JBrowse and choose which set(s) to use to make the masked consensus
Notes:
- Repeats tab: RepeatMasker is evidence based (GenSAS provides Repbase repeat libraries, or users can use a FASTA file of repeats they uploaded); RepeatModeler is de novo. These steps are only available for eukaryotes. Tools can be run multiple times with different settings as long as the user provides a different job name.
- Masking: users can choose a single repeat-masking job result, create a merged consensus between repeat-masking jobs, or use an unmasked sequence.
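The idea of a merged consensus between repeat-masking jobs can be illustrated as a union of repeat intervals applied to the sequence. How GenSAS builds its consensus is not described in the slides, so the sketch below is an assumption: intervals are taken to be 0-based and half-open, and masked bases are replaced with N (hard masking) or lowercased (soft masking).

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def mask_sequence(seq, intervals, soft=False):
    """Mask every position covered by any interval.

    Hard masking writes 'N'; soft masking lowercases the original base.
    """
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical repeat calls from two separate jobs on the same sequence.
    job_a = [(0, 3), (8, 10)]
    job_b = [(2, 4)]
    print(mask_sequence("ACGTACGTAC", job_a + job_b))
    print(mask_sequence("ACGTACGTAC", job_a + job_b, soft=True))
```

Concatenating the interval lists from several jobs before masking gives the merged result; passing the intervals from one job alone corresponds to choosing a single job result.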

- Job status can be monitored through the Job Queue.
- Progress through GenSAS is automatically saved.
- Users can log off GenSAS and jobs will continue running.
- While jobs are running, users can look at the completed results in Apollo/JBrowse.
- Once the project has results, users can share the project with other GenSAS users for collaborative annotation.
Notes (Job Queue): same points as on the slide.

Look at the results of jobs before moving to the next step!
Notes: This slide emphasizes looking at the results, with animated boxes.
- Click 1: a box appears around the job name in the job queue and, a few seconds later, a box around the RepeatMasker results. Users click on the job name in the queue to open the results tab. The results tab contains the raw files generated by the tool (which can be downloaded) and a summary table of the number of predicted features. The summary tables are a good way to see whether the job really ran, but not the best way to look at the results.
- Click 2: highlights the “Browser” section of the accordion menu (which leads into the next slide).
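A summary of predicted features like the one mentioned above can also be reproduced from a downloaded result file. The sketch below counts features by type in GFF3 text, relying only on the standard nine tab-separated GFF3 columns (the feature type is column 3). It is not the code GenSAS uses to build its summary tables, and `count_feature_types` is a hypothetical name.

```python
from collections import Counter


def count_feature_types(gff3_text):
    """Count GFF3 feature lines by the value of column 3 (feature type)."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # skip malformed lines rather than guessing
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    example = "\n".join([
        "##gff-version 3",
        "ctg1\texample\tgene\t1\t900\t.\t+\t.\tID=g1",
        "ctg1\texample\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1",
        "ctg1\texample\texon\t1\t400\t.\t+\t.\tParent=m1",
        "ctg1\texample\texon\t600\t900\t.\t+\t.\tParent=m1",
    ])
    for feature_type, n in sorted(count_feature_types(example).items()):
        print(feature_type, n)
```

As the notes say, counts like these only confirm that a job produced output; the results themselves are better inspected in the browser.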

Look at the results of jobs before moving to the next step!
- In the “Browser” section of the side menu, click “Open Apollo” to open the Apollo tab.
- Turn on the tracks to view them, and make sure the data make sense before using them in downstream steps of the annotation.

Align Tab:
- Align RNA-Seq data for training the gene prediction programs
- Align species-specific transcripts and proteins
Structural Tab:
- Gene prediction programs
- SSR Finder, tRNAscan-SE, RNAmmer, getorf
Notes:
- Align tab: nucleotide BLAST, BLAT, and PASA align transcript/EST evidence (user-uploaded files or the NCBI RefSeq databases provided by GenSAS). TopHat aligns RNA-Seq reads.
- Structural tab, gene prediction tools: Genscan, GlimmerM, SNAP (not trainable); Augustus, which can be trained with user-provided evidence; and GeneMarkES, which is self-training. FGENESH can’t be run through GenSAS, but users can upload output from that tool’s website.
- Structural tab, other feature tools: getorf (general ORF finder), RNAmmer (rRNA), tRNAscan-SE (tRNA), and the SSR tool.

Consensus Tab:
- Use EVidenceModeler to create a consensus gene set
- Can run multiple jobs with different settings
- Users can set weights and choose which tracks to include or exclude
Notes:
- OGS tab: users choose or create the “official gene set” (OGS). All manual curations from Apollo will be merged into the OGS at the publish step. Users either choose a single gene prediction track as the OGS or use EVidenceModeler (EVM) to create a consensus. Users can assign weights to tracks in EVM.
- Refine tab: PASA can be used to align species-specific RNA evidence to the OGS to refine gene models. EST data work best at this step.
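For readers who run EVidenceModeler outside GenSAS, track weights are normally supplied as a plain-text weights file with three columns: evidence class, source name, and weight. The sketch below renders such a file. How GenSAS passes the weights chosen in its interface to EVM is not stated in the slides, and the source names and weight values shown are hypothetical placeholders.

```python
# Evidence classes used in the first column of an EVM weights file.
EVIDENCE_CLASSES = {
    "ABINITIO_PREDICTION",
    "OTHER_PREDICTION",
    "PROTEIN",
    "TRANSCRIPT",
}


def format_evm_weights(entries):
    """Render (evidence_class, source, weight) tuples as tab-separated lines."""
    lines = []
    for evidence_class, source, weight in entries:
        if evidence_class not in EVIDENCE_CLASSES:
            raise ValueError(f"unknown evidence class: {evidence_class}")
        if weight < 0:
            raise ValueError(f"weight must be non-negative: {weight}")
        lines.append(f"{evidence_class}\t{source}\t{weight}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tracks and weights, chosen only to show the layout.
    tracks = [
        ("ABINITIO_PREDICTION", "augustus_job1", 1),
        ("ABINITIO_PREDICTION", "snap_job1", 1),
        ("TRANSCRIPT", "pasa_job1", 10),
        ("PROTEIN", "blastx_job1", 5),
    ]
    print(format_evm_weights(tracks), end="")
```

Excluding a track corresponds to leaving its line out of the file; the source name in the second column has to match the source column of the corresponding GFF3 evidence.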

OGS (Official Gene Set) Tab:
- Sets the gene models for the functional and manual annotation process and final publication

Refine Tab:
- Use PASA and RNA evidence to refine OGS gene models
Functional Tab:
- Gene models from the OGS are functionally annotated
Notes:
- Functional tab: tools can be run on the OGS. Protein BLAST is available (SwissProt, TrEMBL, and NCBI RefSeq protein libraries are provided by GenSAS). Other tools are InterProScan, Pfam, SignalP, and TargetP.

Annotate tab: This is where manual curation of the OGS can occur with Apollo. All changes are tracked by Apollo, so in a shared project you can see which user changed what. Users drag models to the user-created annotation track (click to have the box appear). From there, they use the Apollo editing functions to curate.

GenSAS exports data in GFF3 and FASTA formats
- Manual annotations from Apollo are automatically merged into the OGS at the Publish step.
- The OGS files needed for submitting the annotation are selected for creation automatically, but users can choose to have the files from other tools generated as well.

Future development
- Add an option to create a single merged GFF3 of all annotation data under the Publish step
- Improve how BLAST jobs are submitted to the cluster to reduce run time
- Add the BUSCO tool to assess genome assembly and annotation completeness
- Add a step to check the quality of the final annotation, similar to the NCBI submission check

Supported by