Peter Rice and Mahmut Uludag EMBOSS as an Efficient DAS Annotation Source Peter Rice, EBI Mahmut Uludag, EBI 10th March.


EMBOSS as an Efficient DAS Annotation Source
Peter Rice, EBI
Mahmut Uludag, EBI
10th March 2009

EMBOSS: History
European Molecular Biology Open Software Suite
- 1996: Started at Sanger Centre
- 2000: Release and moved to HGMP
- 2005: Moved to EBI (HGMP closed)
- 2008: Release

EMBOSS: Status
- Open source package
- Sequence analysis
- 200 applications
- 100 third-party applications
- Reads 40 sequence formats
- Writes 40 sequence formats
- Reads 6 feature formats
- Writes 10 feature formats

EMBOSS: Interfaces
Over 100 interfaces / packages containing EMBOSS
- Command line
- Web interfaces
- GUIs
- SOAP Web services (EMBRACE)
- Taverna workflows
- Galaxy

Overview
- EMBOSS produces annotations in DASGFF format
- Protein sequences are referenced by UniProt protein identifiers
- Nucleotide sequences are referenced by Ensembl gene identifiers
- MyDAS-based annotation server
- The server executes EMBOSS programs in response to incoming requests
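As a rough illustration of what a client of such an annotation server does, the sketch below builds a DAS `features` request for one segment and reads features out of a DASGFF response. The server URL, the source name `emboss_pepcoil`, the accession and the embedded sample document are all hypothetical and are not taken from the slides; only the general DAS 1.5x request form and the DASGFF element names are assumed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical server and source names, used only for illustration.
BASE_URL = "http://example.org/das"
SOURCE = "emboss_pepcoil"


def features_url(segment_id, start=None, stop=None):
    """Build a DAS 'features' request URL for a whole segment or a sub-range."""
    segment = segment_id if start is None else f"{segment_id}:{start},{stop}"
    # Keep ':' and ',' readable in the query string.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{BASE_URL}/{SOURCE}/features?{query}"


# Hand-written sample response (not real server output).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/emboss_pepcoil/features">
    <SEGMENT id="P05067" start="1" stop="100" version="1.0">
      <FEATURE id="pepcoil_1" label="coiled coil">
        <TYPE id="coiled_coil">coiled coil</TYPE>
        <METHOD id="pepcoil">pepcoil</METHOD>
        <START>20</START>
        <END>47</END>
        <SCORE>1.5</SCORE>
        <ORIENTATION>0</ORIENTATION>
        <PHASE>-</PHASE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_dasgff(xml_text):
    """Return one dict per FEATURE element found in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "type": feat.find("TYPE").get("id"),
                "method": feat.find("METHOD").get("id"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return features


if __name__ == "__main__":
    print(features_url("P05067", 1, 100))
    for feature in parse_dasgff(SAMPLE_DASGFF):
        print(feature)
```

A real client would fetch the URL over HTTP and pass the response body to `parse_dasgff`; the sample string stands in for that step so the sketch runs without a server.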

Protein sequence annotation, EMBOSS programs used so far
- pepcoil: predicted coiled coil regions in protein sequences
- patmatmotifs: motifs from the PROSITE database
- helixturnhelix: nucleic acid-binding motifs in protein sequences
- garnier: predicted protein secondary structures using GOR method
- sigcleave: predicted signal cleavage sites in protein sequences
- digest: protein proteolytic enzyme or reagent cleavage sites
- antigenic: predicted antigenic regions in protein sequences

Nucleotide sequence annotation, EMBOSS programs used so far
- equicktandem, etandem: tandem repeats in nucleotide sequences
- silent: restriction enzyme sites that can be introduced into a nucleotide sequence by mutation without changing the translation
- jaspscan: transcription factor binding sites from the JASPAR database
- marscan: matrix/scaffold recognition (MRS) signatures in DNA sequences
- restrict: restriction enzyme cleavage sites in nucleotide sequences
- tcode: protein-coding regions identified using Fickett TESTCODE statistic
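The way a server might turn an incoming request into an EMBOSS invocation can be sketched as follows. This is not the authors' MyDAS code. The program names come from the two slides above (the tandem-repeat programs are named `equicktandem` and `etandem` in EMBOSS); everything else is an assumption: the function names are hypothetical, the `-sequence`, `-outfile`, `-rformat` and `-auto` qualifiers follow common EMBOSS conventions but vary between programs (check each with `-help`), and the `uniprot:` database prefix presumes a locally configured EMBOSS database of that name.

```python
# Program lists taken from the slides; the mapping to a sequence type
# decides which identifier namespace a request is expected to use.
PROTEIN_PROGRAMS = {
    "pepcoil", "patmatmotifs", "helixturnhelix", "garnier",
    "sigcleave", "digest", "antigenic",
}
NUCLEOTIDE_PROGRAMS = {
    "equicktandem", "etandem", "silent", "jaspscan",
    "marscan", "restrict", "tcode",
}

# Assumption: most programs take -sequence; digest is believed to take
# -seqall. Verify against each program's -help output before relying on it.
INPUT_QUALIFIER = {"digest": "-seqall"}


def sequence_kind(program):
    """Return 'protein' or 'nucleotide' for a supported program name."""
    if program in PROTEIN_PROGRAMS:
        return "protein"
    if program in NUCLEOTIDE_PROGRAMS:
        return "nucleotide"
    raise ValueError(f"unsupported EMBOSS program: {program}")


def build_command(program, sequence_usa, outfile, rformat="gff"):
    """Build an argument list for one EMBOSS run (nothing is executed here)."""
    sequence_kind(program)  # reject programs outside the supported sets
    qualifier = INPUT_QUALIFIER.get(program, "-sequence")
    return [
        program,
        qualifier, sequence_usa,
        "-outfile", outfile,
        "-rformat", rformat,  # report format; assumed to apply to every program listed
        "-auto",              # suppress interactive prompts, accept defaults
    ]


if __name__ == "__main__":
    print(build_command("antigenic", "uniprot:P05067", "P05067.gff"))
```

The resulting list could be handed to `subprocess.run`; building it separately keeps the request-to-command mapping testable without an EMBOSS installation.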

Other EMBOSS programs that can be used for annotation
- 26 EMBOSS programs producing graphical outputs
  - Possibly using stylesheet support in Ensembl & DAS
- 13 EMBOSS alignment programs
  - DAS 1.53E has an alignment extension

Test clients used
- Dasty2, for protein annotations
  - Good at displaying individual features
  - Useful links for further exploration
    - Links to ontology terms used
    - Links to original DAS responses
- Ensembl, for gene and protein annotations
  - Displays features in genomic context
  - Possible to use DAS resources that are not in the registry

Example Dasty screen:

Example Ensembl screen:

Work in progress
- Need to register on dasregistry.org
- Experimental DAS server available at
- DAS servers as data sources
- Common coordinate systems

The EMBOSS Team
- Peter Rice
- Alan Bleasby
- Jon Ison
- Mahmut Uludag