Annotator Interface
Sharon Diskin
GUS 3.0 Workshop, June 18-21, 2002


Outline
- Current annotation efforts
- Motivation for a new annotation tool
- Requirements for a new annotation tool
- Thoughts on design and implementation
- Future plans

Current Annotation Efforts

Overview of Current Efforts
- Automated annotation has been applied to the DoTS transcripts:
  – Predicted gene ownership (clustering of assemblies)
  – BlastX against NR: automated assignment of descriptions based on similarity
  – BlastX against ProDom and RPS-BLAST against CDD: predicted GO functions
  – Framefinder: predicted protein sequences
  – BLAT alignments
  – EPCR, index words, etc.
- Manual annotation efforts have focused on:
  – validating the automated annotation, and
  – adding further information at the central dogma level
- Manual annotation of the gene index uses an annotation tool, the GUS Annotator Interface, which directly updates the GUSdev database.

Figure: DoTS transcript assembly pipeline
1. Incoming sequences (EST/mRNA) from GenBank and dbEST
2. Make quality (remove vector, polyA, NNNs) → “quality” sequences
3. Block with RepeatMasker → blocked sequences
4. BLASTn to cluster sequences → “unassembled” clusters
5. Assemble sequences with CAP4 (generate consensus sequences) → DoTS consensus sequences (DoTS RNA transcripts)
6. BLASTn of DoTS consensus sequences (98% identity, 150 bp) → gene clusters (the RNAs in a gene)
The assembly of sequences generates a consensus sequence, or DoTS transcript.
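The clustering steps above group sequences whose BLASTn hits pass an identity and length cutoff. A minimal sketch of that idea, assuming single-linkage clustering (any qualifying hit joins two sequences) and reading the slide's "98% identity, 150 bp" as "at least 98% identity over an alignment of at least 150 bp"; the hit tuple layout and function name are hypothetical, and the actual DoTS criteria may differ:

```python
from collections import defaultdict

# Cutoffs taken from the slide; their exact interpretation is an assumption.
MIN_IDENTITY = 98.0
MIN_LENGTH = 150


def cluster_by_hits(seq_ids, hits, min_identity=MIN_IDENTITY, min_length=MIN_LENGTH):
    """Single-linkage clustering of sequences from pairwise BLASTn-style hits.

    seq_ids: iterable of sequence identifiers.
    hits: iterable of (query, subject, percent_identity, alignment_length).
    Returns sorted clusters, each a sorted list of ids.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Only hits that pass both cutoffs link two sequences.
    for query, subject, identity, length in hits:
        if identity >= min_identity and length >= min_length:
            union(query, subject)

    groups = defaultdict(list)
    for s in parent:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    ids = ["DT.1", "DT.2", "DT.3", "DT.4"]
    hits = [
        ("DT.1", "DT.2", 99.1, 420),  # passes both cutoffs
        ("DT.2", "DT.3", 98.0, 150),  # passes exactly at the boundary
        ("DT.3", "DT.4", 97.5, 800),  # identity too low, no link
    ]
    print(cluster_by_hits(ids, hits))
```

With single linkage, DT.1 and DT.3 end up in the same cluster through DT.2 even without a direct hit; that transitive behaviour is the main design consequence to keep in mind when validating gene membership by hand.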

Current Efforts: Gene Annotation (1)
Figure: Gene_A is linked to RNA_1 through RNA_5 (RNA table); each RNA is linked through an RNAInstance (Instance_1 through Instance_5) to an RNAFeature (Feature_1 through Feature_5) on a DoTS assembly (Assembly_1 through Assembly_5).
- Generate DoTS transcripts
- Task 1: Validation of gene membership

Current Efforts: Gene Annotation (2)
Figure: the same Gene → RNA → RNAInstance → RNAFeature → Assembly structure, with some of the RNAs moved to a new gene, Gene_B.
- Removing RNAs from the cluster results in the creation of a new Gene
- An entry is made in the MergeSplit table for tracking purposes
- A similar process is followed when an RNA is added to a Gene
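The split step can be sketched as a small in-memory model: RNAs removed from a gene form a new gene, and a MergeSplit record keeps the old and new genes traceable. This is an illustration only; the `GeneIndex` class and the `MergeSplit` fields are hypothetical and are not the actual GUS table columns or object-layer API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeSplit:
    # Hypothetical columns for illustration; the real GUS MergeSplit
    # table may record this differently.
    old_gene_id: str
    new_gene_id: str
    is_merge: bool


class GeneIndex:
    """Toy in-memory stand-in for Gene -> RNA membership."""

    def __init__(self, genes):
        # gene id -> set of RNA ids
        self.genes = {gene_id: set(rnas) for gene_id, rnas in genes.items()}
        self.merge_split = []

    def split(self, gene_id, rna_ids, new_gene_id):
        """Remove rna_ids from gene_id and place them in a new gene."""
        moving = set(rna_ids)
        current = self.genes[gene_id]
        if new_gene_id in self.genes:
            raise ValueError(f"{new_gene_id} already exists")
        # A split must move some, but not all, of the gene's RNAs.
        if not moving or not moving < current:
            raise ValueError("rna_ids must be a non-empty proper subset of the gene")
        current -= moving  # in-place, so self.genes[gene_id] is updated too
        self.genes[new_gene_id] = moving
        # Record the split so the change stays traceable.
        self.merge_split.append(MergeSplit(gene_id, new_gene_id, is_merge=False))
        return new_gene_id


if __name__ == "__main__":
    index = GeneIndex({"Gene_A": ["RNA_1", "RNA_2", "RNA_3", "RNA_4", "RNA_5"]})
    # Suppose review shows RNA_4 and RNA_5 belong to a different gene.
    index.split("Gene_A", ["RNA_4", "RNA_5"], "Gene_B")
    print(sorted(index.genes["Gene_A"]), sorted(index.genes["Gene_B"]))
    print(index.merge_split)
```

Adding an RNA to an existing gene would follow the same pattern, moving the RNA between two existing genes and appending a record with `is_merge=True`.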

Current Efforts: Gene Annotation (3)
- Task 2: Assign reference RNA
  – the reference RNA will be annotated further
  – RNA table
- Task 3: Assign approved gene name/symbol
  – Gene table
  – Evidence: comment (specifies database link)
- Task 4: Assign gene description
  – Gene table
  – Evidence: comment
- Task 5: Associate known gene synonyms
  – GeneSynonym table
  – Evidence: comment

Current Efforts: RNA Annotation
Annotation of the “Reference Sequence”:
- Task 1: Assign/confirm description of assembly
  – RNA table
- Task 2: Confirm/add/delete GO functions
  – ProteinGOFunction (in GUSdev; the GO tables have been redesigned in GUS 3.0)
  – Evidence: comments or similarity (ProDom, CDD-Pfam, CDD-Smart, or NR)

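Current Annotator Interface Architecture
- The Annotator Interface (Java servlet) queries GUSdev through JDBC (query only).
- The Annotator Interface writes the annotator's changes to an “XML” file.
- The Submitter reads the XML file and executes the GA-Plugin.
- The GA-Plugin uses the Perl object layer, which inserts, updates, and deletes rows in GUSdev through DBI.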
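Because the interface only reads through JDBC, all writes go through an intermediate file that the Submitter picks up. A minimal sketch of that hand-off, assuming each pending edit is a (table, action, attributes) tuple; the element names, the `action` attribute, and both function names are hypothetical, since the slides do not show the actual file format:

```python
import xml.etree.ElementTree as ET


def build_update_xml(updates):
    """Interface side: serialize pending edits as XML text.

    updates: list of (table, action, attributes), where action is
    "insert", "update" or "delete". The layout is hypothetical.
    """
    root = ET.Element("annotations")
    for table, action, attrs in updates:
        row = ET.SubElement(root, table, {"action": action})
        for name, value in attrs.items():
            ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_update_xml(text):
    """Submitter side: recover (table, action, attributes) tuples.

    Attribute values come back as strings; type conversion would be the
    loader's job (the GA-Plugin, in the architecture above).
    """
    root = ET.fromstring(text)
    return [
        (row.tag, row.get("action"), {child.tag: child.text for child in row})
        for row in root
    ]


if __name__ == "__main__":
    text = build_update_xml([("Gene", "update", {"gene_id": 42, "name": "example"})])
    print(text)
    print(read_update_xml(text))
```

Keeping the write path in a file decouples the servlet from the Perl object layer: the interface never needs database write permissions, and a batch can be reviewed before it is submitted.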

Current Annotator Interface

Current Gene Annotation
Screenshot: Validate Cluster and Assign Reference RNA/Assembly

Current Gene Annotation (cont.)
Screenshot panels: Assign Gene Name/Symbol; Assign Gene Description; Assign Gene Synonym(s); Evidence

Current RNA (and Protein) Annotation
Screenshot panels: RNA Description; GO Functions; Evidence

Allgenes Display of Gene Annotation

Allgenes Display of RNA Annotation
Screenshot panels: RNA Description; GO Functions (confirmed or manually added)

Status of Current Annotation (as of June 20, 2002)
- 1289 manually reviewed genes
  – 1003 with gene name
  – 697 with gene synonyms
  – 1046 with description
- 6146 manually reviewed RNAs/DoTS assemblies
- 949 ‘proteins’ with reviewed GO function
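Expressed as fractions of the 1289 manually reviewed genes, the counts above give the coverage of each annotation type. A short worked calculation (the `coverage` helper is just for illustration):

```python
def coverage(counts, total):
    """Percentage of `total` covered by each count, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


reviewed_genes = 1289
print(coverage({"gene name": 1003, "gene synonyms": 697, "description": 1046}, reviewed_genes))
# {'gene name': 77.8, 'gene synonyms': 54.1, 'description': 81.1}
```

So roughly four in five reviewed genes carry a description, and just over half have synonyms recorded.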

Motivation for new tool
- Want to annotate using genomic sequence
- Create “curated” gene models specifying structure
- Increase structure of annotation in GUS
- Annotation of proteins
- Redefinition of annotation tasks
- Current interface not designed for this purpose

Some Other Annotation Tools
- Artemis
  – Developed and used at Sanger
  – Reads and writes flat files
  – Supports a rich set of annotations
  – Saves in EMBL format
- Apollo
  – Combined effort including members from Sanger and Berkeley
  – Flat files (CORBA access to Ensembl)
  – 2 versions, currently being merged: Sanger (annotation viewer) and Berkeley (focus on editing)
- No existing tool meets all of our needs

Requirements At a High Level

Requirements: Graphical View
- Provide alignment of features on genomic sequence
  – could potentially display any feature type currently stored in GUS 3.0
  – features can be selected and used to generate “curated” features
  – similar to the display and functionality in Apollo
- Toggle (or configure) the display of each feature type
- Zoom to the sequence level, with links to functionality relevant to the highlighted feature
- Also support creation of features “from scratch”
  – based on literature, etc.
- Detail editors provide the ability to change endpoints, etc.
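The per-type toggle and the zoomed view both reduce to one query: which features of the enabled types overlap the visible window. A minimal sketch, assuming features carry a type label and 1-based inclusive genomic coordinates; the `Feature` class and `visible_features` function are hypothetical, not part of GUS 3.0:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_type: str  # e.g. "assembly", "blat" (illustrative labels)
    start: int         # genomic coordinate, 1-based inclusive (assumption)
    end: int


def visible_features(features, shown_types, view_start, view_end):
    """Return features whose type is toggled on and which overlap the view.

    Results are sorted by start so they can be drawn left to right.
    """
    hits = [
        f for f in features
        if f.feature_type in shown_types
        and f.start <= view_end and f.end >= view_start  # interval overlap
    ]
    return sorted(hits, key=lambda f: (f.start, f.end))


if __name__ == "__main__":
    track = [Feature("assembly", 100, 500), Feature("blat", 450, 900),
             Feature("assembly", 1000, 1200)]
    # Only assemblies toggled on; window covers 400..1100.
    print(visible_features(track, {"assembly"}, 400, 1100))
```

Zooming in then just narrows `view_start`/`view_end` and re-runs the same query; for large genomic regions an interval index would replace the linear scan.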

Gene Annotation
- Create curated gene model
  – specify gene boundaries
  – specify location of exons (and thus introns)
      5' exon boundary (putative transcription start site)
      3' exon boundary (including polyadenylation signal)
  – automatic creation of Gene entry
  – merge with existing gene instances through the GeneInstance table
  – tables/views affected: GeneFeature, ExonFeature, GeneInstance, Gene, MergeSplit
  – evidence: features used to create the model, PubMed ID
  – should be as easy as clicking on existing features and choosing “make curated” (endpoints, etc. can then be modified if needed)
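Once the exons are specified, the gene boundaries and the introns follow directly, which is what "location of exons (and thus introns)" implies. A minimal sketch, assuming forward-strand, 1-based inclusive (start, end) exon coordinates; `gene_model` is a hypothetical helper, not a GUS 3.0 function:

```python
def gene_model(exons):
    """Derive the gene span and introns from exon (start, end) pairs.

    Coordinates are assumed to be forward-strand, 1-based and inclusive.
    The gene runs from the 5'-most exon boundary to the 3'-most one;
    each intron is the gap between two consecutive exons.
    """
    if not exons:
        raise ValueError("a gene model needs at least one exon")
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Overlapping or touching exons would leave no intron between them.
        if next_start <= prev_end + 1:
            raise ValueError("exons must be separated by at least one base")
    gene = (ordered[0][0], ordered[-1][1])
    introns = [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return gene, introns


if __name__ == "__main__":
    print(gene_model([(500, 650), (100, 200), (300, 400)]))
    # ((100, 650), [(201, 299), (401, 499)])
```

Deriving introns instead of storing them separately means editing an exon endpoint in a detail editor automatically keeps the introns and gene boundaries consistent.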

Gene Annotation (2)
- Assign (HUGO or MGI approved) abbreviated gene name/symbol
  – Gene table
  – Evidence: ExternalDatabaseLink
- Assign full gene name (MGI or HUGO full gene name)
  – Gene table
  – Evidence: ExternalDatabaseLink
- Assign abbreviated gene name/symbol synonyms (non-approved gene symbols)
  – GeneSynonym table
  – Evidence: ExternalDatabaseLink
- Assign full gene name aliases
  – GeneAlias table
  – Evidence: ExternalDatabaseLink

Gene Annotation (3)
- Assign gene category (e.g. non-coding)
  – Gene table
  – Evidence: ExternalDatabaseLink/literature reference; similarity (e.g. to a known non-coding RNA)
- Confirm/assign gene chromosomal location
  – GeneChromosomalLocation
  – Evidence: ExternalDatabaseLink/literature reference; RH mapping data; alignments/features
- OMIM link assignment (verification if computationally determined)
  – ExternalDatabaseLink

RNA Annotation (1)
- Create “curated RNAs”
  – Define RNA transcript forms of the gene (create RNAs)
  – Use exons defined by the curated gene
  – 5' and 3' UTRs
  – Automatic creation of RNA entry
  – Merge existing RNA instances
  – Tables affected: RNAFeature, UTRFeature, RNAInstance, RNA
  – Evidence: features used to create the RNA
- Assign RNA categories to created RNAs (e.g. alternative form)
  – RNARNACategory table
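Defining the 5' and 3' UTRs of a curated RNA amounts to cutting its exons at the coding-region boundaries. A minimal sketch, assuming a forward-strand transcript, 1-based inclusive genomic coordinates, and known CDS start and end positions; `split_utrs` is a hypothetical helper, and the output only mirrors the idea behind UTRFeature rows, not their schema:

```python
def split_utrs(exons, cds_start, cds_end):
    """Split a forward-strand transcript's exons into 5' UTR, CDS and 3' UTR pieces.

    exons: (start, end) pairs, 1-based inclusive (assumption).
    cds_start, cds_end: genomic positions of the first and last coding base.
    Returns three lists of (start, end) segments.
    """
    five_utr, cds, three_utr = [], [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Part of this exon lies upstream of the coding region.
            five_utr.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Part of this exon lies downstream of the coding region.
            three_utr.append((max(start, cds_end + 1), end))
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:
            cds.append((lo, hi))
    return five_utr, cds, three_utr


if __name__ == "__main__":
    print(split_utrs([(100, 200), (300, 400), (500, 650)], cds_start=150, cds_end=450))
    # ([(100, 149)], [(150, 200), (300, 400)], [(500, 650)])
```

On the reverse strand the same cuts apply, but the labels swap: the segment before `cds_start` in genomic order becomes the 3' UTR.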

RNA Annotation (2)
- Assign (or confirm computed) RNA description
  – RNA table
  – Evidence: gene from which it is derived
- Anatomy expression assignment(s)
  – RNAAnatomy
  – RNAAnatomyLOE
  – Evidence: ExternalDatabaseLink/literature references; assembly anatomy percent from DoTS; RAD experiments
- Assign GO terms to curated RNA (non-coding RNAs, e.g. small RNA involved in splicing)
  – GOTermAssociation
  – GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink, literature references
- Computational analysis performed on curated RNA sequences
  – Annotation workflow: Framefinder translation, GO terms, similarities, etc.

Requirements: Protein Annotation
- Confirm/assign GO Function
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or literature references
- Confirm/assign GO Biological Process
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or literature references
- Confirm/assign GO Cellular Component
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or literature references
- Assign protein name
  – Protein table
  – Evidence: ExternalDatabaseLink, literature references, similarities
- Assign protein name synonyms
  – Protein table
  – Evidence: ExternalDatabaseLink, literature references, similarities
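All three GO requirements share one pattern: either confirm an existing (possibly computed) association or create a new one, and attach evidence either way. A minimal sketch of that pattern, assuming associations are held in memory and evidence is a list of (kind, value) pairs; the `GOTermAssociation` dataclass here only borrows the table's name and is not the GUS 3.0 schema or object-layer API:

```python
from dataclasses import dataclass, field

# The three GO ontologies listed above.
GO_ASPECTS = {"function", "process", "component"}


@dataclass
class GOTermAssociation:
    protein_id: str
    go_id: str
    aspect: str
    # Stand-in for GOTermAssociationEvid rows, e.g. ("ExternalDatabaseLink", "...").
    evidence: list = field(default_factory=list)


def assign_go(associations, protein_id, go_id, aspect, evidence):
    """Confirm an existing GO association or create a new one, with evidence."""
    if aspect not in GO_ASPECTS:
        raise ValueError(f"unknown GO aspect: {aspect}")
    if not evidence:
        raise ValueError("every annotation must carry evidence")
    for assoc in associations:
        if assoc.protein_id == protein_id and assoc.go_id == go_id:
            # Confirming an association adds the curator's evidence to it.
            assoc.evidence.extend(evidence)
            return assoc
    assoc = GOTermAssociation(protein_id, go_id, aspect, list(evidence))
    associations.append(assoc)
    return assoc


if __name__ == "__main__":
    assocs = []
    # GO:0003674 is the molecular_function root term, used here as a placeholder.
    assign_go(assocs, "P1", "GO:0003674", "function", [("ExternalDatabaseLink", "example")])
    assign_go(assocs, "P1", "GO:0003674", "function", [("Literature", "example")])
    print(assocs)
```

Refusing an assignment without evidence reflects the rule that evidence will be associated with all annotation.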

Protein Annotation (2)
- Assign protein category (post-translational modifications)
  – ProteinProteinCategory
  – Evidence: ExternalDatabaseLink, literature references
- Assign protein-protein interactions
  – Interaction
  – InteractionInteractionLOE
  – Evidence: PubMed ID, etc.
- Protein pathway assignments
  – PathwayInteraction (for newly created interactions)
  – Still under consideration: what is the best way to link with an existing pathway? For example, a pathway is represented in DoTS, and we want to say that this curated protein is really the same as a protein in that pathway.
- Assign post-translational modification category
- Assign interactions involving this protein
- Assign pathways the protein is known to be involved in
- Assign protein family
- Ability to modify and/or delete a curated protein
Evidence will be associated with all annotation.

Next Steps / Open Issues
- Completion of the Java object layer
- Decision regarding BioJava wrappers
  – What exactly will this give us to aid in interface development (e.g. FeatureRenderer, etc.)?
- Discussion on layout of interface
  – Joan’s input after experimentation with other tools
- Depending on the above:
  – Client-side portion that communicates with the remote GUS server
  – Interface implementation