
1 GUS Overview June 18, 2002

2 GUS-3.0 (Genomics Unified Schema)
–Supports application and data integration
–Uses an extensible architecture
–Is object-oriented even though it uses an underlying relational database management system (Oracle)
–Warehouse instead of federation for local stable copy
–Uses standards for bulk data exchange (e.g., MAGE)

3 GUS Usage
Annotation
–of genomes: gene models, sequence features
–of genes: gene function, gene expression, gene regulation
Data mining
–Develop algorithms and queryable resource
Publish
–Map identifiers with other resources/databases
–URL for entry retrieval/ad hoc queries in web interface

4 GUS-3.0 Name Spaces
GUS has 5 name spaces compartmentalizing different types of information.
Namespace  Domain                   Features
Core       Data provenance          Workflows
SRes       Shared resources         Ontologies
DoTS       Sequence and annotation  Central dogma
RAD        Gene expression          MIAME
TESS       Gene regulation          Grammars

5 Application Integration: PlasmoDB
[Architecture diagram; labels in extraction order]
Front end: Automated Analysis & Integration; WWW queries, browsing, & download; Java Servlets & Perl CGI; GenePlot Software; GenePlot CD
Database: DoTS, RAD, Core, SRes, TESS; Oracle/SQL; Object Layer
Data types: Genomic Sequence; microArray & SAGE Experiments; Mapping Data; GenBank, InterPro, GO, etc; GSSs & ESTs; Annotation; QTL, POP, SNP, Clinical
Sources: TIGR, Sanger, Stanford; Plasmodium Investigators; Public Databases; Annotator’s Interface
Legend: Existing implementation; Future implementation

6 GUS Supports Multiple Projects
[Diagram labels: AllGenes, PlasmoDB, EPConDB; Core, SRes, TESS, RAD, DoTS; Oracle RDBMS; Object Layer for Data Loading; Java Servlets; Other sites, other projects]

7 Main Aspects of GUS Development
Choice of development tools
–Schema: CREATE TABLE statements
  Documentation plug-in: input is tab-delimited text
  UML - Rational Rose, PowerDesigner
–Code: CVS
Areas to emphasize
–Plug-ins
–Work flow
–TESS
–Proteomics
–Images
Preferred type of user interface
–JSP
–PHP

8 Data Integration
[Diagram; labels in extraction order]
SRes: Ontologies (GO, Species, Tissue, Dev. Stage)
Core: Data Provenance, Ownership, Protection, Algorithms, Similarity, Versioning, Workflow
DoTS: Genomic Sequence (Genes, gene models; STSs, repeats, etc; Cross-species analysis)
DoTS: Transcribed Sequence (Characterize transcripts, RH mapping, Library analysis, Cross-species analysis)
DoTS: Protein Sequence (Domains, Function, Structure, Cross-species analysis)
Transcription factors
RAD: Transcript Expression (Arrays, SAGE, Conditions)
TESS: Gene Regulation (Binding Sites, Patterns, Grammars)
Overlaid text fragments: "acute myeloid leukemia", "with sequence similarity to c-fos", "up-regulated in", "and common promoter motifs"

9 [Diagram labels: GUS; RAD; TESS; EST clustering and assembly; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites]

10 GUS Approach to Schema
Think objects
–Parents and children
–Subclassing with views
Views
–Start with generic Imp table (e.g., NAFeatureImp) that contains base attributes plus generic attributes of various datatypes
–Superclass view (e.g., NAFeature) just has base attributes
–Subclass views (e.g., RNAFeature) have additional attributes using generic attributes
Strongly-typed
–Tend to avoid “name-value” pairs
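
The view-based subclassing on this slide can be sketched roughly as follows. This is only an illustration, run against Python's built-in sqlite3 rather than Oracle. Apart from the names NAFeatureImp, NAFeature, and RNAFeature, every column name here (subclass_view, string1, int1, rna_type, number_of_exons) is an assumption made for the example and is not taken from the GUS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic implementation table: base attributes plus generic typed columns.
-- All column names are hypothetical.
CREATE TABLE NAFeatureImp (
    na_feature_id  INTEGER PRIMARY KEY,
    na_sequence_id INTEGER NOT NULL,
    subclass_view  TEXT NOT NULL,   -- which subclass view the row belongs to
    name           TEXT,
    string1        TEXT,            -- generic attribute, meaning set per subclass
    int1           INTEGER          -- generic attribute, meaning set per subclass
);

-- Superclass view: exposes only the base attributes, for every row.
CREATE VIEW NAFeature AS
SELECT na_feature_id, na_sequence_id, subclass_view, name
FROM NAFeatureImp;

-- Subclass view: base attributes plus generic columns under typed names.
CREATE VIEW RNAFeature AS
SELECT na_feature_id, na_sequence_id, name,
       string1 AS rna_type,
       int1    AS number_of_exons
FROM NAFeatureImp
WHERE subclass_view = 'RNAFeature';
""")

conn.execute(
    "INSERT INTO NAFeatureImp VALUES (1, 100, 'RNAFeature', 'rna-1', 'mRNA', 4)"
)
conn.execute(
    "INSERT INTO NAFeatureImp VALUES (2, 100, 'GeneFeature', 'gene-1', NULL, NULL)"
)

# The superclass view sees both rows; the subclass view sees only its own.
all_features = conn.execute(
    "SELECT name FROM NAFeature ORDER BY na_feature_id"
).fetchall()
rna_features = conn.execute(
    "SELECT name, rna_type, number_of_exons FROM RNAFeature"
).fetchall()
print(all_features)
print(rna_features)
```

Under these assumptions, a query against the subclass view reads like a query against a strongly-typed table, which is one way to avoid "name-value" pairs while still storing all subclasses in a single Imp table.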

11 DoTS Central Dogma
[Diagram labels: NA Sequence, NA Feature, AA Sequence, AA Feature; Gene, RNA, Protein; Genomic Sequence, RNA Sequence, Protein Sequence; Gene Feature, RNA Feature, Protein Feature; Gene Instance, RNA Instance, Protein Instance]

12 DoTS Schema Has Been Driven By Building Gene Indices
[Pipeline diagram; labels in extraction order]
Functional predictions; Genomic Sequence; DoTS consensus Sequences; mRNA/EST Sequence; Clustering and Assembly; Predicted Genes; Gene Index; Merge Genes; Gene/RNA cluster assignment; SIM4 or BLAT; Proteins; RNAs; Gene predictions; GenScan/HMMer, PHAT; GO Functions; Protein Motifs; BLAST Similarities; PFAM, Smart, ProDom; BLASTP; BLASTX
Other computed annotation (EPCR, AssemblyAnatomyPercent, Index Key Words, SNP analysis); Annotate DoTS; Manual Annotation Tasks; translation; framefinder

13 DoTS Gene Indices Are Based on Clustering and Assembling ESTs

14 RAD 3.0 Schema Incorporates MAGE and Experience With Microarrays LIMS for Data Analysis. Also holds SAGE.

15 Status of GUS Namespaces
Core
–Tables exist, Workflow documented
SRes
–Tables exist
DoTS
–Tables exist, some documentation
RAD
–Version 3.0 to include MAGE, experience. Pretty much complete
–Tables exist, mostly documented
TESS
–Tables ready but not created

16 Schema Development
Releases on SourceForge:
–CREATE TABLE statements
–Table dumps from Core::TableInfo, Core::DatabaseDocumentation
–GIFs of ER diagrams
Adding tables between releases
–In CVS tree?
–Use message forum for discussion

17 Documentation
Schema Browser looks at TableInfo
Plug-in
–Populates DatabaseDocumentation
–Input:
  Table\t\tDescription of table
  Table\tAttribute\tDescription of attribute
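
The tab-delimited input format on this slide could be read along these lines. This is a hedged sketch, not the actual GUS plug-in: the function name parse_documentation, the returned dictionaries, and the sample descriptions are all hypothetical. The only thing taken from the slide is the three-column layout, where an empty middle column marks a table-level description.

```python
def parse_documentation(text):
    """Split tab-delimited lines into table-level and attribute-level docs.

    Expected line shapes (from the slide):
        Table<TAB><TAB>Description of table
        Table<TAB>Attribute<TAB>Description of attribute
    """
    table_docs = {}
    attribute_docs = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        table, attribute, description = (field.strip() for field in fields)
        if not table:
            continue  # a description without a table name cannot be stored
        if attribute:
            attribute_docs[(table, attribute)] = description
        else:
            table_docs[table] = description
    return table_docs, attribute_docs


# Hypothetical sample input; the descriptions are invented for illustration.
sample = (
    "NAFeature\t\tFeatures located on nucleic acid sequences\n"
    "NAFeature\tname\tName of the feature\n"
    "\n"
    "not a valid line\n"
)
table_docs, attribute_docs = parse_documentation(sample)
print(table_docs)
print(attribute_docs)
```

A real plug-in would then write these entries to DatabaseDocumentation; that step is omitted here because the slide does not describe the table's columns.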

18 GUS Schema Browser
http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30
Points at GUS30 on CBIL development database server (erebus).
–Need to move? Maintain release view?
DoTS Tables:
–Central dogma
–Evidence/Similarity
–ProjectLink
–SequenceGroupImp/SequenceGroupExperimentImp
–Plasmomap?
Other tables of interest?

