Genome Sequence Annotation Server

GenSAS v5.1: a web-based platform for structural and functional genome annotation and curation of genomes
Jodi L. Humann, Taein Lee, Stephen P. Ficklin, Chun-Huai Cheng, Heidi Hough, Sook Jung, Jill Wegrzyn, David Neale, Dorrie Main

What is DNA annotation and why do it?
- Getting the DNA sequence is only the first step
- Need to know the biological relevance of the DNA sequence
- Annotated sequence can be used to find putative genes of interest for study

What scientists want
Current annotation tools:
- Many tools are available, but they run independently of each other
- Most of the tools are run via the command line and require server access
Scientists want a platform that:
- Is a single location for DNA annotation
- Does not require management of computing equipment and software tools
- Is easy to use and can be adapted to a variety of DNA sequences

What is GenSAS?
- A single website that combines numerous annotation tools into one interface
- User accounts keep data private and secure, and allow users to share data for collaborative annotation
- Easy-to-use interfaces with integrated instructions allow researchers at all skill levels to annotate DNA

GenSAS annotation process
- Upload Sequences
- Create Project
- Upload Evidence
- Identify Repeats: RepeatMasker, RepeatModeler
- Mask Sequences
- Align Transcripts: BLAST, BLAT, PASA, TopHat
- Structural Annotation: Augustus, GeneMarkES, Genscan, GlimmerM, SNAP
- Choose Official Gene Set: EVidenceModeler
- Refine Gene Models: PASA
- Functional Annotation: BLAST, InterProScan, Pfam, SignalP, TargetP
- Manual Curation: Apollo, JBrowse
- Generate Files for Publication
Flowchart starts in the upper left corner, goes down the first column, up the second column, and then down again the third column. The last step is in the lower right corner.

The animation is triggered by click. 3 different animations to help focus on what is being said.
Circle 1: Around the block on the right with links to help for GenSAS. Links to the User’s Guide and other info, along with a “Contact Us” link.
Circle 2: Around the request new account link in the login area. Users request accounts, account requests are reviewed (not automatic), and then an email is sent when the account is approved.
Circle 3: Around the “Use GenSAS” tab. This is where you click to access the GenSAS interface after logging in.

The animation is triggered by click. 3 different animations to help focus on what is being said.
Rectangle 1: Around the header. It shows the flowchart of the annotation process and is used to navigate to different sections. Project name in the upper left, user name and links to account info in the upper right.
Rectangle 2: Around the accordion menu on the right. 4 sections: to monitor job progress (Job Queue), open Apollo/JBrowse (Browser), share projects with other GenSAS users (Sharing), and links to the User’s Guide (Help).
Rectangle 3: Around the tab section. This is the primary area where users interact with GenSAS. Each step opens a different tab. The GenSAS welcome tab gives users a quick overview of what each of the three screen sections does.

GenSAS account limits
- GenSAS user accounts will remain active as long as users have an active GenSAS project
- GenSAS users are limited to a total of 250 GB of storage space on the GenSAS server
- Assembly files must be high quality (<25,000 sequences, over 50% of sequences longer than 2,500 bases)
- Users can only have seven jobs running at one time, but other jobs can be waiting in the queue
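The assembly limits above can be checked locally before uploading. The following is a minimal sketch, not GenSAS code: the function names are hypothetical, and it assumes the thresholds are applied exactly as worded on the slide (strictly fewer than 25,000 sequences, strictly more than 50% of sequences longer than 2,500 bases).

```python
from io import StringIO


def fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def meets_assembly_limits(lengths, max_sequences=25_000,
                          min_length=2_500, min_fraction=0.5):
    """Apply the slide's limits; the exact comparisons are an assumption."""
    total = len(lengths)
    if total == 0:
        return False
    long_enough = sum(1 for n in lengths.values() if n > min_length)
    return total < max_sequences and long_enough / total > min_fraction


# Tiny in-memory example: two long scaffolds and one short contig.
example = ">s1\n" + "A" * 3000 + "\n>s2\n" + "C" * 3000 + "\n>s3\nACGT\n"
print(meets_assembly_limits(fasta_lengths(StringIO(example))))
```

For a real assembly, pass an open file handle instead of the `StringIO` object.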

Sequence Tab:
- View available sequences
- Upload single sequence or multi-sequence FASTA file
- Create sequence subset based on sequence names or minimum size
Project Tab:
- Open existing project or shared project
- Create new project
- Edit project info and reset expiration
Sequence tab:
- upload single or multi-sequence FASTA files
- multi-sequence FASTA files can be used to create a sequence subset, filtered by minimum size or by contig names
Project tab:
- web form used to create a new project. Some fields are required, others are optional
- users can also open previous projects and projects shared with them under the Project tab
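The subset options described above (by sequence name or by minimum size) amount to a simple filter over FASTA records. This is a rough sketch of that idea outside GenSAS; the function names are hypothetical, and treating the minimum size as inclusive is an assumption.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}


def subset_sequences(records, names=None, min_size=None):
    """Keep records whose name is in `names` and whose length is at
    least `min_size`; a filter left as None is not applied."""
    kept = {}
    for name, seq in records.items():
        if names is not None and name not in names:
            continue
        if min_size is not None and len(seq) < min_size:
            continue
        kept[name] = seq
    return kept


records = read_fasta(">contig1\nACGTACGT\n>contig2\nAC\n>contig3\nACGTAC\n")
print(sorted(subset_sequences(records, min_size=5)))
print(sorted(subset_sequences(records, names={"contig2", "contig3"})))
```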

GFF3 & Evidence Tabs (optional):
- EST, mRNA sequences
- Repeat motifs
- Protein sequences
- NCBI gene structures
- Pre-processed Illumina RNA-Seq reads
All tabs have an Instructions section that can be opened and collapsed.
GFF3 tab:
- users can upload GFF3 files generated by other annotation tools, or from a previous annotation
Evidence tab:
- FASTA files of species-specific repeat, protein, and transcript evidence
- pre-processed Illumina RNA-Seq reads for use with TopHat
- NCBI gene structure files (for training Augustus)
The more organism-specific data you have, the better the annotation will be.

Repeats Tab:
- RepeatMasker: evidence-based repeat finder
- RepeatModeler: de novo repeat finder
Masking Tab:
- Check results in JBrowse and choose which set(s) to use to make the masked consensus
Repeats tab:
- RepeatMasker (evidence based; GenSAS provides Repbase repeat libraries, or users can use a FASTA file of repeats they uploaded)
- RepeatModeler (de novo)
- These steps are only available for eukaryotes
- Tools can be run multiple times with different settings as long as the user provides a different job name
Masking:
- Users have the option to choose a single repeat-masking job result, to create a merged consensus between repeat-masking jobs, or to use an unmasked sequence
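One way to picture a merged consensus between repeat-masking jobs is as the union of their masks: a base is masked if any job masked it. The slides do not say how GenSAS builds its consensus, so the sketch below only illustrates that assumption; the function name and the use of `N` as the mask character are hypothetical.

```python
def merge_masked_sequences(original, masked_versions, mask_char="N"):
    """Return `original` with a position masked wherever at least one
    masked version has `mask_char` at that position (union of masks)."""
    for masked in masked_versions:
        if len(masked) != len(original):
            raise ValueError("masked sequences must match the original length")
    merged = []
    for i, base in enumerate(original):
        if any(masked[i] == mask_char for masked in masked_versions):
            merged.append(mask_char)
        else:
            merged.append(base)
    return "".join(merged)


# Two hypothetical job results that masked different regions.
original = "ACGTACGTAC"
job_a = "ACNNACGTAC"
job_b = "ACGTACNNNC"
print(merge_masked_sequences(original, [job_a, job_b]))
```

Passing an empty list of masked versions returns the unmasked sequence, which corresponds to the third option on the slide.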

Job status can be monitored through Job Queue
- Progress through GenSAS is automatically saved
- Users can log off GenSAS and jobs will continue running
- While jobs are running, users can look at the completed results in Apollo/JBrowse
- Once the project has results, users can share the project with other GenSAS users for collaborative annotation
Job Queue: same points as on the slide.

Look at the results of jobs, before moving to next step!
This slide is to emphasize looking at the results, with animated boxes.
Click 1: Box around the job name in the job queue, and a few seconds later a box around the RepeatMasker results. Users click on the job name in the queue to open the results tab. The results tab contains the raw files generated by the tool (which can be downloaded) and a summary table of the number of predicted features. The summary tables are a good way to see if the job really ran, but not the best way to look at the results.
Click 2: Highlights the “Browser” section of the accordion menu (which leads into the next slide).

Look at the results of jobs, before moving to next step!
- In the “Browser” section of the side menu, click on “Open Apollo” to open the Apollo tab
- Turn on the tracks to view them and make sure the data makes sense before using the data in downstream steps of the annotation

Align Tab:
- Align RNA-Seq data for training the gene prediction programs
- Align species-specific transcripts and proteins
Structural Tab:
- Gene prediction programs
- SSR Finder, tRNAscan-SE, RNAmmer, getorf
Align tab:
- Nucleotide BLAST, BLAT, and PASA tools for aligning transcript/EST evidence (user-uploaded files or the NCBI RefSeq databases provided by GenSAS)
- TopHat for aligning RNA-Seq reads
Structural tab:
- Gene prediction tools: Genscan, Glimmer, SNAP (not trainable); Augustus, which can be trained with user-provided evidence; and GeneMarkES, which is self-training. FGENESH can’t be run through GenSAS, but users can upload output from that tool’s website
- Other Features tools: getorf (general ORF finder), RNAmmer (rRNA), tRNAscan-SE (tRNA), and the SSR tool

Consensus Tab:
- Use EVidenceModeler to create a consensus gene set
- Can run multiple jobs, with different settings
- User can set weights and choose which tracks to include/exclude
OGS tab: Users choose or create the “official gene set”. All manual curations from Apollo will be merged into the OGS at the publish step. Users either choose a single gene prediction track as the OGS or use EVidenceModeler (EVM) to create a consensus. Users can assign weights to tracks in EVM.
Refine tab: PASA can be used to align species-specific RNA evidence to the OGS to refine gene models. EST data works best at this step.
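In GenSAS the track weights are set through the web form. For orientation only, stand-alone EVidenceModeler reads its weights from a three-column, tab-delimited file (evidence class, source name, weight). The sketch below writes that layout; the track names and weight values are made-up examples, not GenSAS defaults, and the helper name is hypothetical.

```python
# Evidence classes used in stand-alone EVidenceModeler weights files.
EVM_CLASSES = {"ABINITIO_PREDICTION", "PROTEIN", "TRANSCRIPT", "OTHER_PREDICTION"}


def format_evm_weights(weights):
    """Build weights-file text from (evidence class, source, weight) tuples."""
    lines = []
    for evidence_class, source, weight in weights:
        if evidence_class not in EVM_CLASSES:
            raise ValueError(f"unknown evidence class: {evidence_class}")
        lines.append(f"{evidence_class}\t{source}\t{weight}")
    return "\n".join(lines) + "\n"


# Hypothetical weighting: trust transcript alignments most.
example_weights = [
    ("ABINITIO_PREDICTION", "Augustus", 1),
    ("ABINITIO_PREDICTION", "SNAP", 1),
    ("PROTEIN", "blastx", 5),
    ("TRANSCRIPT", "PASA", 10),
]
print(format_evm_weights(example_weights), end="")
```

A higher weight makes EVM favor that evidence when tracks disagree, which is why transcript evidence is usually weighted above ab initio predictions.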

OGS (Official Gene Set) Tab: Sets the gene models used for the functional and manual annotation process and for final publication.

Refine Tab:
- Use PASA and RNA evidence to refine OGS gene models
Functional Tab:
- Gene models from the OGS are functionally annotated
Functional tab: Tools can be run on the OGS. Protein BLAST (SwissProt, TrEMBL, and NCBI RefSeq protein libraries are provided by GenSAS). Other tools are InterProScan, Pfam, SignalP, and TargetP.

Annotate tab: This is where manual curation of the OGS can occur with Apollo. All changes are tracked by Apollo, so you can see which user (if it is a shared project) changed what. Users drag models to the user-created annotation track (click to have box appear). From there, they use the Apollo editing functions to curate.

- Manual annotations from Apollo are automatically merged into the OGS at the Publish step
- The OGS files needed for submitting the annotation are automatically selected to be created, but users can select to have the files from other tools generated as well
- GenSAS exports data in GFF3 and FASTA formats
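An exported GFF3 file can be sanity-checked by counting features per type, much like the summary tables in the GenSAS results tabs. This is a minimal sketch that relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3); the function name and the example records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff3_text):
    """Count GFF3 features by the type column (column 3)."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop reading features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts


example = (
    "##gff-version 3\n"
    "scaffold_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "scaffold_1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "scaffold_1\texample\texon\t100\t400\t.\t+\t.\tParent=mrna1\n"
    "scaffold_1\texample\texon\t600\t900\t.\t+\t.\tParent=mrna1\n"
)
print(dict(count_feature_types(example)))
```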

Future development
- Add option to create a single merged GFF3 of all annotation data under the Publish step
- Improve how BLAST jobs are submitted to the cluster to reduce run time
- Add BUSCO tool to assess genome assembly and annotation completeness
- Add step to check quality of final annotation, similar to the NCBI submission check

