A web-based platform for structural and functional annotation of model and non-model organisms
  www.gensas.org
  Jodi Humann, Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Heidi Hough, Sook Jung, Jill Wegrzyn, David Neale, Dorrie Main
  jhumann@wsu.edu
What is genome annotation?
  Annotation: predicted gene models to use in lab experiments
What is GenSAS?
  Web-based platform; no software installation by the user
  Requires only a user account, a web browser, and an internet connection
  User accounts keep data private and secure and allow for collaborative annotation projects
  Easy-to-use interfaces and a detailed user manual
Account Limits
  User accounts remain active as long as there is an active project
  Projects expire after 60 days unless the user resets the expiration date
  250 GB of storage space on the server
  Assembly files must be high quality:
    Fewer than 25,000 sequences
    Over 50% of sequences longer than 2,500 bases
  Up to seven jobs can run at one time; additional jobs wait in the queue
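The assembly limits can be checked locally before upload. The following is a minimal Python sketch and is not part of GenSAS; the function names are hypothetical, and reading both thresholds as strict inequalities is an assumption.

```python
# Thresholds taken from the Account Limits slide; strictness is assumed.
MAX_SEQUENCES = 25_000   # assembly must have fewer sequences than this
MIN_LENGTH = 2_500       # bases
MIN_FRACTION = 0.5       # share of sequences that must exceed MIN_LENGTH


def sequence_lengths(handle):
    """Return the length of every record in a FASTA file-like object."""
    lengths = []
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def meets_assembly_limits(lengths):
    """Check the sequence-count and sequence-length limits (hypothetical helper)."""
    if not lengths:
        return False
    long_enough = sum(1 for n in lengths if n > MIN_LENGTH)
    return (len(lengths) < MAX_SEQUENCES
            and long_enough / len(lengths) > MIN_FRACTION)
```

To use it, open the assembly with `open("assembly.fasta")` (a placeholder file name) and pass the handle to `sequence_lengths`, then pass the result to `meets_assembly_limits`.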
Eukaryote annotation workflow
  Upload Sequences: PRINSEQ-lite, BUSCO
  Create Project
  Upload Evidence
  Identify Repeats: RepeatMasker, RepeatModeler
  Mask Sequences
  Align Evidence: BLAST, BLAT, DIAMOND, HISAT2, PASA, TopHat
  Structural Annotation: Augustus, BRAKER2, GeneMarkES, Genscan, GlimmerM, SNAP, RNAmmer, tRNAscan-SE
  Choose Official Gene Set: EvidenceModeler (optional)
  Refine Gene Models: PASA (optional)
  Functional Annotation: BLAST, DIAMOND, InterProScan, Pfam, SignalP, TargetP
  Manual Curation: Apollo, JBrowse
  Generate Files for Publication: BUSCO
Prokaryote annotation workflow
  Upload Sequences: PRINSEQ-lite, BUSCO
  Create Project
  Upload Evidence
  Align Evidence: BLAST, BLAT, DIAMOND
  Structural Annotation: GeneMarkS, Glimmer3, RNAmmer, tRNAscan-SE
  Choose Official Gene Set
  Functional Annotation: BLAST, DIAMOND, InterProScan, Pfam, SignalP
  Manual Curation: Apollo, JBrowse
  Generate Files for Publication: BUSCO
User-provided files
  Required:
    Genome assembly
  Optional:
    Assembled transcripts or ESTs
    Species-specific repeats or proteins
    Species-specific GenBank gene structures
    Filtered Illumina RNA-seq reads
    Aligned RNA-seq reads in the BAM file format
    Previous annotations in the GFF3 format
GenSAS-provided information
  RepeatMasker: Repbase repeat libraries
  Transcript and protein alignment tools:
    NCBI RefSeq transcripts and proteins (archaea, bacteria, fungi, invertebrate, mitochondrion, plant, plasmid, plastid, protozoa, vertebrate-mammalian, vertebrate-other, viral)
    SwissProt
    TrEMBL
GenSAS Homepage
  Request free account
  Log in to GenSAS
  Access User’s Guide and contact us
  Learn about tools and libraries
  Access the GenSAS interface
GenSAS Interface
  Once jobs are in the queue, users can log out of GenSAS
Sequences Step
  Once an assembly is uploaded, its metrics are calculated using PRINSEQ
  Users can run BUSCO on the assembly
Project Step
  Fillable web form
  Select previously uploaded assembly
  Email options
GFF3 Step
Evidence Step
Repeats and Masking Steps
  Masking step produces a repeat consensus, or masking can be skipped
Align Step
Structural Step
Consensus Step
  Optional step using EVM (EvidenceModeler)
  Can adjust and remove weights for:
    Gene Predictions
    Protein Alignments
    Transcript Alignments
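For context on what the weights control: EvidenceModeler itself reads a tab-delimited weights file with an evidence class, a source name, and a numeric weight on each line. GenSAS sets these through its web form, so the Python sketch below is only an illustration of the idea. The source names, the weight values, and the convention that a weight of 0 means "removed" are all hypothetical.

```python
# Hypothetical evidence sources and weights, one per evidence type on the slide.
# The class names follow the EvidenceModeler weights-file convention.
EXAMPLE_WEIGHTS = [
    ("ABINITIO_PREDICTION", "augustus_job", 1),   # gene predictions
    ("PROTEIN", "diamond_job", 5),                # protein alignments
    ("TRANSCRIPT", "pasa_job", 10),               # transcript alignments
]


def format_weights(rows):
    """Render (class, source, weight) rows as tab-delimited weights text.

    Rows with a weight of 0 or less are dropped, which stands in for
    removing that evidence source from the consensus (an assumption).
    """
    lines = [
        f"{evidence_class}\t{source}\t{weight}"
        for evidence_class, source, weight in rows
        if weight > 0
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_weights(EXAMPLE_WEIGHTS), end="")
```

A higher weight gives that evidence source more influence when the consensus gene models are built.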
OGS Step
  Select the “Official Gene Set”
Refine and Functional Steps
  Optional step to further refine the OGS using PASA prior to functional annotation
Annotate Step
  Edits added to “User-created Annotations” will be merged into the final results
Publish Step
  OGS and repeat consensus are automatically prepared in FASTA and GFF formats
  User can select other jobs
Final Annotation Results
  Summary table of annotation project
  Project Summary file with details about tool settings
  Option to create merged GFF3 file
    Add repeats, tRNA, rRNA
    Add functional job annotation to column 9
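Column 9 of a GFF3 file holds semicolon-separated `tag=value` attributes, so adding functional annotation there amounts to appending another pair to each feature line. The Python sketch below shows the general idea and is not how GenSAS builds its merged file. The helper name is hypothetical, and the use of the `Note` tag is an assumption; the tags GenSAS actually writes may differ.

```python
from urllib.parse import quote


def add_attribute(gff3_line, key, value):
    """Append key=value to column 9 of one GFF3 feature line (hypothetical helper)."""
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line has 9 tab-separated columns")
    # Percent-encode characters that are reserved in GFF3 attributes
    # (such as ';', '=', ',' and '&'); spaces are left readable.
    escaped = quote(value, safe=" ")
    new_pair = f"{key}={escaped}"
    attributes = columns[8].rstrip(";")
    # An empty or '.' attribute column is replaced rather than appended to.
    if attributes in ("", "."):
        columns[8] = new_pair
    else:
        columns[8] = f"{attributes};{new_pair}"
    return "\t".join(columns)


if __name__ == "__main__":
    # Made-up feature line for illustration only.
    line = "chr1\tGenSAS\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1"
    print(add_attribute(line, "Note", "kinase; putative"))
```

Escaping matters because an unescaped semicolon in a description would be read as the start of a new attribute.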
Final Annotation Results
  All results files are listed and can be downloaded individually or…
Final Annotation Results
  Use the “Download all” option to get all the files at once
  Option to run BUSCO on proteins from the final annotation
Funding
  GenSAS Poster – PO0085
  www.gensas.org