EMBOSS as an Efficient DAS Annotation Source
  Peter Rice, EBI
  Mahmut Uludag, EBI
  10th March 2009

EMBOSS: History
  European Molecular Biology Open Software Suite
  1996: Started at Sanger Centre
  2000: Release and moved to HGMP
  2005: Moved to EBI (HGMP closed)
  2008: Release

EMBOSS: Status
  Open source package
  Sequence analysis
  200 applications
  100 third-party applications
  Reads 40 sequence formats
  Writes 40 sequence formats
  Reads 6 feature formats
  Writes 10 feature formats

EMBOSS: Interfaces
  Over 100 interfaces / packages containing EMBOSS
  Command line
  Web interfaces
  GUIs
  SOAP Web services (EMBRACE)
  Taverna workflows
  Galaxy

Overview
  EMBOSS produces annotations in DASGFF format
  Protein sequence referencing using UniProt protein identifiers
  Nucleotide sequence referencing using Ensembl gene identifiers
  MyDAS-based annotation server
    Executes EMBOSS programs based on the incoming requests

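As a rough illustration of what a DASGFF annotation document looks like, the sketch below builds one from a list of features using the element names of the DAS 1.5 features response (DASGFF, GFF, SEGMENT, FEATURE and its children). The function name `build_dasgff`, the `href` value, and the feature values are hypothetical; this is not the code used by EMBOSS or by the MyDAS server.

```python
import xml.etree.ElementTree as ET


def build_dasgff(segment_id, seq_length, features,
                 href="http://example.org/das/emboss/features"):
    """Return a DASGFF document (as a string) for one annotated segment.

    `features` is a list of dicts with keys id, type, method, start, end
    and optionally label, score, orientation, phase. The href default is
    a placeholder, not a real server.
    """
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF", version="1.0", href=href)
    segment = ET.SubElement(gff, "SEGMENT", id=segment_id,
                            start="1", stop=str(seq_length))
    for f in features:
        feat = ET.SubElement(segment, "FEATURE",
                             id=f["id"], label=f.get("label", f["id"]))
        ET.SubElement(feat, "TYPE", id=f["type"]).text = f["type"]
        ET.SubElement(feat, "METHOD", id=f["method"]).text = f["method"]
        ET.SubElement(feat, "START").text = str(f["start"])
        ET.SubElement(feat, "END").text = str(f["end"])
        # "-" marks a value that does not apply; "0" means no strand.
        ET.SubElement(feat, "SCORE").text = str(f.get("score", "-"))
        ET.SubElement(feat, "ORIENTATION").text = f.get("orientation", "0")
        ET.SubElement(feat, "PHASE").text = f.get("phase", "-")
    return ET.tostring(root, encoding="unicode")


# Illustrative values only, not real pepcoil output.
example = build_dasgff(
    "P12345", 430,
    [{"id": "pepcoil_1", "type": "coiled_coil", "method": "pepcoil",
      "start": 10, "end": 37, "score": 1.5}],
)
print(example)
```

The segment id plays the role of the UniProt or Ensembl identifier that the request refers to, and the METHOD element records which program produced the feature.
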
Protein sequence annotation, EMBOSS programs used so far
  pepcoil: predicted coiled coil regions in protein sequences
  patmatmotifs: motifs from the PROSITE database
  helixturnhelix: nucleic acid-binding motifs in protein sequences
  garnier: predicted protein secondary structures using the GOR method
  sigcleave: predicted signal cleavage sites in protein sequences
  digest: protein proteolytic enzyme or reagent cleavage sites
  antigenic: predicted antigenic regions in protein sequences

Nucleotide sequence annotation, EMBOSS programs used so far
  equicktandem, etandem: tandem repeats in nucleotide sequences
  silent: restriction enzyme sites in a nucleotide sequence which can be inserted (mutated) without changing the translation
  jaspscan: transcription factor binding sites from the JASPAR database
  marscan: matrix/scaffold recognition (MRS) signatures in DNA sequences
  restrict: restriction enzyme cleavage sites in nucleotide sequences
  tcode: protein-coding regions identified using the Fickett TESTCODE statistic

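One way to picture how a server could turn a request into EMBOSS runs is a lookup from sequence type to the programs listed on the two slides above, followed by building one command line per program. The sketch below only builds argument lists and does not execute anything. The helper names, the output file naming, and the `uniprot:` database prefix are hypothetical, and the uniform use of `-sequence`, `-outfile`, `-rformat gff`, and `-auto` is an assumption: some programs name their input qualifier differently or need additional parameters, so each program's documentation should be checked.

```python
# Program names come from the slides (etandem is the EMBOSS program
# behind the "tandem" entry); all other details are assumptions.
PROGRAMS = {
    "protein": ["pepcoil", "patmatmotifs", "helixturnhelix", "garnier",
                "sigcleave", "digest", "antigenic"],
    "nucleotide": ["equicktandem", "etandem", "silent", "jaspscan",
                   "marscan", "restrict", "tcode"],
}


def emboss_command(program, sequence, outfile, report_format="gff"):
    """Build (but do not run) an argument list for one EMBOSS program.

    Assumes the program takes -sequence and writes a report to -outfile;
    this does not hold for every program in PROGRAMS.
    """
    return [program, "-sequence", sequence, "-outfile", outfile,
            "-rformat", report_format, "-auto"]


def commands_for(sequence_type, sequence):
    """Return one command per annotation program for the sequence type."""
    if sequence_type not in PROGRAMS:
        raise ValueError(f"unknown sequence type: {sequence_type!r}")
    return [emboss_command(name, sequence, f"{name}.gff")
            for name in PROGRAMS[sequence_type]]


# "uniprot:P12345" assumes a database named uniprot is configured locally.
for cmd in commands_for("protein", "uniprot:P12345"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` on a machine with EMBOSS installed; keeping command construction separate from execution makes the mapping easy to inspect and test.
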
Other EMBOSS programs that can be used for annotation
  26 EMBOSS programs producing graphical outputs
    Possibly using stylesheet support in Ensembl & DAS
  13 EMBOSS alignment programs
    DAS 1.53E has an alignment extension

Test clients used
  Dasty2; for protein annotations
    Good in displaying individual features
    Useful links for further exploration
    Links to ontology terms used
    Links to original DAS responses
  Ensembl; for gene and protein annotations
    Displays features in genomic context
    Possible to use DAS resources that are not in the registry

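DAS clients such as these retrieve annotations through the DAS `features` command, addressed as `PREFIX/das/SOURCE/features?segment=ID:start,stop`. The sketch below builds such a URL; the server address and the source name `emboss` are placeholders rather than the address of the experimental server, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def features_url(server, source, segment_id, start=None, stop=None):
    """Build a DAS features request URL for a whole or partial segment."""
    if start is None or stop is None:
        segment = segment_id                      # whole sequence
    else:
        segment = f"{segment_id}:{start},{stop}"  # sub-range
    # Keep ':' and ',' unescaped so the segment stays readable.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Placeholder server and source name; the identifier is a UniProt accession.
print(features_url("http://example.org", "emboss", "P12345", 1, 100))
```

Opening such a URL in a browser is also a quick way to look at the original DAS response that a client links to.
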
Example Dasty screen:
Example Ensembl screen:
Work in progress
  Need to register on dasregistry.org
  Experimental DAS server available at
  DAS servers as data sources
  Common coordinate systems

The EMBOSS Team
  Peter Rice
  Alan Bleasby
  Jon Ison
  Mahmut Uludag