Rationale for GUS Answer queries:

Presentation transcript:

Rationale for GUS
Answer queries:
‘Identify all the “spots” on an array that represent genes on chromosome 1 that are predicted to be transcription factors’
‘Identify tissues that “express” at least half the components of the interleukin-6 pathway’
‘Identify image clones that represent genes for which there is evidence for expression in the pancreas’
Facilitate gene expression and pathways/networks analyses.
Data mining of diverse genomics data.
Gene index (gene-centric view of biological data).
Pragmatic: combine CBIL databases and thus unify effort.
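The first example query above can be sketched as a three-way join. This is only an illustration of why a unified schema helps: the table and column names (gene, predicted_function, array_spot) and all rows are invented for the example and are not the actual GUS schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables (not the GUS schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT);
        CREATE TABLE predicted_function (gene_id INTEGER, go_term TEXT);
        CREATE TABLE array_spot (spot_id TEXT PRIMARY KEY, gene_id INTEGER);
        INSERT INTO gene VALUES (1, 'geneA', '1'), (2, 'geneB', '1'), (3, 'geneC', '2');
        INSERT INTO predicted_function VALUES
            (1, 'transcription factor'), (2, 'kinase'), (3, 'transcription factor');
        INSERT INTO array_spot VALUES ('S1', 1), ('S2', 2), ('S3', 3), ('S4', 1);
    """)
    return conn


def tf_spots_on_chromosome(conn, chromosome):
    """Return spot IDs whose gene lies on `chromosome` and is a predicted transcription factor."""
    rows = conn.execute(
        """
        SELECT s.spot_id
        FROM array_spot s
        JOIN gene g ON g.gene_id = s.gene_id
        JOIN predicted_function f ON f.gene_id = g.gene_id
        WHERE g.chromosome = ? AND f.go_term = 'transcription factor'
        ORDER BY s.spot_id
        """,
        (chromosome,),
    ).fetchall()
    return [row[0] for row in rows]
```

With array, gene location, and functional prediction data in one database, the question becomes a single query rather than a merge across separate resources.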

GUS: Genomics Unified Schema
[Schema overview diagram; labels as recovered from the slide]
Genomic Sequence: genes, gene models; STSs, repeats, etc.; cross-species analysis
Transcribed Sequence (DOTS): characterize transcripts; RH mapping; library analysis; cross-species analysis
Transcript Expression (RAD, RNA Abundance DB): arrays; SAGE; conditions
Protein Sequence: domains; function; structure; cross-species analysis
Pathways, Networks (under development): representation; reconstruction
Ontologies, free text: GO; species; tissue; dev. stage
Special Features: ownership; protection; algorithm; evidence; similarity; versioning

GUS Data
Public sequence data: GenBank, SwissProt, Prodom, PFAM, UCSC Golden Path, TransFac, ...
Public mapping data: radiation hybrid
Annotation:
DoTS (assembled transcribed sequences)
Gene predictions (via Plasmodium consortium collaborations)
Functional predictions (GOFunction)
Some comparative genomics (mouse/human)
Transcription factor binding site and promoter analyses
Data sets (some proprietary) from collaborators (primarily in RAD)

GUS system
[Architecture diagram. Components: external data sources; data integration; computational annotation; validation; lightweight Perl object layer; data warehouse; annotator's interface; browser & bioWidgets; Java servlet (views)]

High Level Flow Diagram of GUS Annotation
[Flow diagram; elements as recovered from the slide]
Genomic sequence; ORNL gene predictions (GRAIL/GenScan); predicted genes
mRNA/EST sequence; clustering and assembly; DOTS consensus sequences
BLAST/SIM4; merge genes; gene/RNA cluster assignment; gene index
Gene families, orthologs; assign gene name, manual annotation, ...
Predicted RNAs; framefinder / DIANA; predicted proteins
BLASTX; BLASTP; PFAM, SignalP, TMPred, ProDom, etc.; BLAST similarities; protein features/motifs
Algorithms for functional predictions; GO functions
Other annotation (EPCR, AssemblyAnatomyPercent, index key words, SNP analysis)

Incremental Updates of DoTS Sequences
1. Incoming sequences (EST/mRNA).
2. Make quality (remove vector, polyA, NNNs), giving “quality” sequences (AssemblySequence).
3. Block with RepeatMasker, giving blocked sequences.
4. Assign to DOTS consensus sequences (blastn at 40 bp length, 92% identity).
5. Cluster incoming sequences that are not covered by a consensus sequence, giving “unassembled” clusters (consensus sequences and new).
6. Assemble DOTS consensus sequences and incoming sequences with CAP4 - initially reassemble CAP4 assemblies.
7. Calculate new DOTS consensus sequence using weighted consensus sequence(s) and new CAP4 assembly, giving new consensus sequences.
8. Update GUS database.
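The assignment step of the DoTS update (blastn at 40 bp length, 92% identity) can be sketched as a filter over alignment hits. This is a minimal illustration, not the CBIL pipeline: the Hit record, the reading of 40 bp and 92% as minimum thresholds, and the rule of keeping only the best hit per incoming sequence are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one blastn hit of an incoming sequence against a consensus.
Hit = namedtuple("Hit", "query_id consensus_id align_len pct_identity")

MIN_ALIGN_LEN = 40        # bp, from the slide (assumed to be a minimum)
MIN_PCT_IDENTITY = 92.0   # percent, from the slide (assumed to be a minimum)


def assign_to_consensus(incoming_ids, hits):
    """Split incoming sequences into those assigned to a consensus and those left over.

    Returns (assigned, unassigned): `assigned` maps sequence ID to consensus ID,
    `unassigned` lists sequences with no qualifying hit; these would go on to
    the clustering step.
    """
    best = {}
    for hit in hits:
        if hit.align_len < MIN_ALIGN_LEN or hit.pct_identity < MIN_PCT_IDENTITY:
            continue  # hit does not meet the length/identity thresholds
        current = best.get(hit.query_id)
        # Assumption: prefer higher identity, then longer alignment.
        if current is None or (hit.pct_identity, hit.align_len) > (
            current.pct_identity,
            current.align_len,
        ):
            best[hit.query_id] = hit
    assigned = {qid: best[qid].consensus_id for qid in incoming_ids if qid in best}
    unassigned = [qid for qid in incoming_ids if qid not in best]
    return assigned, unassigned
```

Sequences returned in `unassigned` correspond to the incoming sequences "not covered by a consensus sequence" that the next step clusters among themselves.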

Different Views of GUS
Focused annotation of specific organisms and biological systems:
Organisms: Human; Mouse; Plasmodium falciparum
Biological systems: Endocrine pancreas; CNS; Hematopoiesis
*not drawn to scale*

WWW.ALLGENES.ORG

Summary of AllGenes.org content

Boolean Query - Finding Surface Antigens
Can combine any of the ‘primitive’ queries on the genes page.
Query forms are ‘bookmarkable’.
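Combining primitive queries amounts to set algebra over the gene IDs each query returns, and a bookmarkable form amounts to encoding the query in the URL. The sketch below illustrates both ideas; the primitive query names, the gene IDs, and the query-string parameter names are invented for the example and do not describe the actual site.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical result sets of three primitive gene queries (gene IDs are invented).
has_signal_peptide = {"g1", "g2", "g3", "g5"}
has_transmembrane_domain = {"g2", "g3", "g4"}
similar_to_host_protein = {"g3"}


def q_and(a, b):
    """Genes returned by both queries."""
    return a & b


def q_or(a, b):
    """Genes returned by either query."""
    return a | b


def q_not(a, b):
    """Genes returned by query a but not by query b."""
    return a - b


# Example combination: signal peptide AND transmembrane domain, NOT similar to a host protein.
candidates = q_not(
    q_and(has_signal_peptide, has_transmembrane_domain),
    similar_to_host_protein,
)


def to_query_string(op, primitives):
    """Encode a Boolean query so it can be saved as part of a URL (parameter names assumed)."""
    return urlencode({"op": op, "q": primitives}, doseq=True)


def from_query_string(query_string):
    """Recover the operator and primitive query names from a saved query string."""
    params = parse_qs(query_string)
    return params["op"][0], params["q"]
```

Because the whole query lives in the query string, reopening a saved link reruns the same combination of primitives.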

StemCellDB-GUS website

Assembly Validation
Alignment to genomic sequence via BLAST/sim4: preliminary data look good.
Assembly consistency.

[Diagram linking RAD, GUS, and TESS. Labels: EST clustering and assembly; genomic alignment and comparative sequence analysis; identify shared TF binding sites]

Acknowledgements
CBIL: Chris Overton; Brian Brunk; Jonathan Crabtree; Sharon Diskin; Steve Fischer (Doubletwist); Mark Gibson (GeneLogic); Greg Grant; Elisabetta Manduchi; Joan Mazzarelli; Debbie Pinney; Angel Pizarro; Jonathan Schug; Jian Wang (Celera); Chris Stoeckert
PlasmoDB collaborators: David Roos; Martin Fraunholz; Jesse Kissinger; Jules Milgram; Dan Lawson (Sanger); Ross Koppel (Monash U.); Malaria Genome Consortium
Allgenes.org collaborators: Ed Uberbacher, ORNL; Doug Hyatt, ORNL
EPConDB collaborators: Klaus Kaestner; Marie Scearce; John Doug Melton, Harvard; Alan Permutt, Wash. U
Comparative Sequence Analysis collaborators: Maja Bucan; Tim Wiltshire; A. Lengeling, L. Tarantino, S. Kanes; Whitehead/MIT Center for Genome Research

StemCellDB Architecture
All sequences are entered into the flat-file db first, with an efficient mechanism for filtering public vs. private.
Sequences marked public are incorporated into GUS and, at regular intervals, also submitted to dbEST (GenBank).
Private sequences and ones not yet dealt with stay in the flat-file db (current system).
Public sequences will be removed from the flat-file db to decrease overhead and query times.
StemCell static pages should be maintained by Princeton.
Automated annotation is applied by CBIL, and manual annotation in StemCell flat files is moved over to GUS by semi-automatic methods.
CBIL annotators may prioritize StemCellDB entries for annotation.
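The public/private routing described above can be sketched as a simple partition of flat-file entries. The SequenceEntry type and its status values are assumptions made for illustration; the slide does not specify how entries are marked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceEntry:
    """Hypothetical flat-file record; the status values are assumed, not from the slide."""
    seq_id: str
    status: str  # "public", "private", or "unreviewed"


def route_entries(entries):
    """Partition flat-file entries into those to load into GUS and those that stay behind.

    Public entries move to GUS (and would later be submitted to dbEST);
    private and not-yet-reviewed entries remain in the flat-file db.
    """
    to_gus = [entry for entry in entries if entry.status == "public"]
    stay_in_flat_file = [entry for entry in entries if entry.status != "public"]
    return to_gus, stay_in_flat_file
```

Removing the public entries from the flat-file side after loading is what keeps that store small, as the slide notes.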