RAD (RNA Abundance Database)


Stoeckert, C.J. Jr., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, J., Shen-Orr, S., Overton, G.C. A relational schema for array and non-array based gene expression data. Bioinformatics. In press. 2001.
The Computational Biology and Informatics Laboratory

Issues
  Accurate experiment description
  Data preprocessing issues
    clean-up
    calibration
    normalization
    other transformations
Selecting, interpreting, and comparing experiments appropriately requires knowledge of how the experiments were performed and the samples that were used in sufficient detail to assess their quality and their degree of similarity. The most appropriate criteria for spot selection, normalization, etc., depend on the experiments under study and on the questions investigated.

www.mged.org

RAD
  Multiple labs
  Multiple biological systems
  Multiple platforms
  Multiple image quantification software
RAD
  Expressed genes
  Differentially-expressed genes
  Class discovery
  Class prediction
  Gene networks

RAD versatility
  Platforms
    2-channel microarrays
    Filter arrays
    Affymetrix
    SAGE
  Image quantification software
    ScanAlyze
    GEMTools
    BioImage
    …

Views
A “view” renames attributes of a low-level generic table for specific implementations. Common fields keep the same attribute names in every implementation, while implementation-specific fields rename generic attributes of the appropriate data type. These views are not materialized views, which store precalculated values to improve database query performance.
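The renaming idea can be sketched as follows. This is a minimal illustration using SQLite, not the RAD schema itself: the columns spot_result_id, spot_id, subclass_view, float1 and float2, and the view names TwoChannelSpotResult and FilterSpotResult, are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic low-level table: common fields plus generic typed slots.
CREATE TABLE SpotResultImp (
    spot_result_id INTEGER PRIMARY KEY,
    spot_id        INTEGER,  -- common field, same name in every view
    subclass_view  TEXT,     -- hypothetical: which view owns the row
    float1         REAL,     -- generic attribute
    float2         REAL      -- generic attribute
);

-- Each view renames the generic attributes for one implementation.
CREATE VIEW TwoChannelSpotResult AS
    SELECT spot_result_id, spot_id,
           float1 AS channel1_intensity,
           float2 AS channel2_intensity
    FROM SpotResultImp
    WHERE subclass_view = 'TwoChannelSpotResult';

CREATE VIEW FilterSpotResult AS
    SELECT spot_result_id, spot_id,
           float1 AS signal,
           float2 AS background
    FROM SpotResultImp
    WHERE subclass_view = 'FilterSpotResult';
""")


def view_columns(name):
    """Return the column names that a table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {name} LIMIT 0")
    return [col[0] for col in cursor.description]


# Made-up row, inserted after the views were defined.
conn.execute(
    "INSERT INTO SpotResultImp VALUES (1, 101, 'TwoChannelSpotResult', 1520.0, 980.0)"
)
two_channel_rows = conn.execute("SELECT * FROM TwoChannelSpotResult").fetchall()
```

Because these are ordinary (non-materialized) views, the row inserted after the views were created is visible through TwoChannelSpotResult immediately, and nothing is stored twice.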

A SpotResult View

RAD strengths
  sample description
  use of ontologies
  information about array elements
  links to GUS (http://www.allgenes.org)
  storing of raw and processed data
  captures all available information; history and parameter tracking
  storing of public and proprietary data
  user-group-other read/write permissions
The schema is compliant with the minimum annotations recommended by MGED.
Controlled vocabularies:
  Taxonomy: the Taxon table uses the NCBI taxonomy obtained from GSDB in relational form.
  Anatomy: the Anatomy table is modified from the stage 28 (adult) mouse anatomy from the Mouse Gene Expression Database (MGD) at Jackson Laboratory to include human terms and expand description of systems such as hematopoiesis using medical textbooks as sources (e.g. the 37th edition of Gray’s Anatomy) as well as the expertise of the biologists at CBIL and in collaborating laboratories. The Anatomy hierarchy includes “cell lines” as a substructure for many different types of cells to distinguish immortalized from primary cells.
  Disease: the Disease table uses the KEGG representation of the CDC ICD-9 classification. The KEGG representation has associated MIM identifiers with many of the ICD-9 terms.

Information to be captured
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14

Categories of tables
Experiment Raw Data Platform Algorithm Metadata Processed Data

Experiment Tables
[Figure with regions labeled A and B]
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14

Experiment Tables (A)
[Schema diagram; tables shown:] Label, Sample, Treatment, Disease, Devel. Stage, Hybridization Conditions, ExperimentSample, Taxon, Anatomy, RelExperiments, Exp.ControlGenes, ControlGenes, Experiment, ExpGroups, Groups

Experiment Tables (B)
Tables: Experiment, ExpImageImp, ExpResultImp
Views of ExpImageImp: PhosphorImager, ScanAlyzeImage, GEMImage, StanfordScanner, AffymetrixScanner, SAGESequence, …
Views of ExpResultImp: BioImage, ScanAlyzeAnalysis, GEMResult, StanfordAnalysis, AffymetrixAnalysis, SAGEAnalysis, …

Platform Tables
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14

Platform Tables
  SpotFamilyImp
  SpotImp
  Array

SpotFamily views (comparisons)
  SAGESpotFamily: spot_family_id, tag, ext_db_id, cluster_id, …
  GEMSpotFamily: spot_family_id, ext_db_id, source_id, plate_id, plate_row, plate_column, …
  AffymetrixSpotFamily: spot_family_id, ext_db_id, accession, …
Each is a view of the SpotFamily table
Link to data with spot_family_id
Integrate through gene index (http://www.allgenes.org)
GUS EST assemblies mRNA
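One way to picture the cross-platform integration described above is a join from each platform view to a gene-index entry. The sketch below uses SQLite with made-up rows. The subclass_view and string1 columns, the GeneIndexLink table, and the GENE_A/GENE_B identifiers are hypothetical stand-ins. They do not represent how RAD and GUS are actually linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SpotFamilyImp (
    spot_family_id INTEGER PRIMARY KEY,
    subclass_view  TEXT,     -- hypothetical discriminator column
    ext_db_id      INTEGER,  -- common field
    string1        TEXT      -- generic attribute, renamed per view
);
CREATE VIEW SAGESpotFamily AS
    SELECT spot_family_id, ext_db_id, string1 AS tag
    FROM SpotFamilyImp WHERE subclass_view = 'SAGESpotFamily';
CREATE VIEW AffymetrixSpotFamily AS
    SELECT spot_family_id, ext_db_id, string1 AS accession
    FROM SpotFamilyImp WHERE subclass_view = 'AffymetrixSpotFamily';
-- Hypothetical link from an array element to a gene-index entry.
CREATE TABLE GeneIndexLink (spot_family_id INTEGER, gene_index_id TEXT);
""")

# Made-up example rows.
conn.executemany("INSERT INTO SpotFamilyImp VALUES (?, ?, ?, ?)", [
    (1, "SAGESpotFamily", 10, "AAACCCGGGT"),
    (2, "AffymetrixSpotFamily", 20, "X00001"),
    (3, "AffymetrixSpotFamily", 20, "X00002"),
])
conn.executemany("INSERT INTO GeneIndexLink VALUES (?, ?)", [
    (1, "GENE_A"), (2, "GENE_A"), (3, "GENE_B"),
])


def elements_for_gene(gene_index_id):
    """List (platform, spot_family_id, identifier) for one gene-index entry."""
    return conn.execute("""
        SELECT 'SAGE' AS platform, s.spot_family_id, s.tag AS identifier
        FROM SAGESpotFamily s
        JOIN GeneIndexLink g ON g.spot_family_id = s.spot_family_id
        WHERE g.gene_index_id = ?
        UNION ALL
        SELECT 'Affymetrix', a.spot_family_id, a.accession
        FROM AffymetrixSpotFamily a
        JOIN GeneIndexLink g ON g.spot_family_id = a.spot_family_id
        WHERE g.gene_index_id = ?
        ORDER BY 2
    """, (gene_index_id, gene_index_id)).fetchall()
```

Here elements_for_gene("GENE_A") returns one SAGE tag and one Affymetrix accession, so the same transcript can be followed across platforms even though each platform identifies its elements differently.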

Raw Data Tables
  SpotImp
  SpotResultImp
  SpotFamilyImp
  ExpResultImp
  SpotFamilyResult

Processed Data/Algorithm Tables
  SpotResultImp        raw spot value
  SpotFamilyResult     summary of raw values
  Algorithm            type of program used
  AlgImplementation    actual program used
  AlgoInvocation       usage of the algorithm
  AlgParamKeyType      parameter data type
  AlgParamKey          parameter description
  AlgParam             value used
  SpotResAnalysis      processed spot result
  SpotFamResAnalysis   processed spot family result
  AnalysisType         type of processing
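The relationship between these tables can be sketched as plain Python objects. The class names mirror the table names above, but every field name, the PARAM_TYPES set, and the example program "example_normalizer" are assumptions made for illustration. None of it is taken from the RAD schema.

```python
from dataclasses import dataclass, field

# AlgParamKeyType: data types a parameter may have (hypothetical set).
PARAM_TYPES = {"float": float, "int": int, "string": str}


@dataclass(frozen=True)
class Algorithm:              # type of program used
    name: str


@dataclass(frozen=True)
class AlgImplementation:      # actual program used
    algorithm: Algorithm
    program: str
    version: str


@dataclass(frozen=True)
class AlgParamKey:            # parameter description
    implementation: AlgImplementation
    name: str
    key_type: str             # one of PARAM_TYPES


@dataclass
class AlgoInvocation:         # one usage of the algorithm
    implementation: AlgImplementation
    params: dict = field(default_factory=dict)  # AlgParam: key name -> value used

    def set_param(self, key, value):
        if key.implementation != self.implementation:
            raise ValueError("parameter belongs to a different implementation")
        if not isinstance(value, PARAM_TYPES[key.key_type]):
            raise TypeError(f"{key.name} expects {key.key_type}")
        self.params[key.name] = value


@dataclass
class SpotResAnalysis:        # processed spot result
    spot_result_id: int
    value: float
    analysis_type: str        # AnalysisType: type of processing
    invocation: AlgoInvocation

    def provenance(self):
        impl = self.invocation.implementation
        params = ", ".join(
            f"{k}={v}" for k, v in sorted(self.invocation.params.items())
        )
        return f"{self.analysis_type} via {impl.program} {impl.version} ({params})"


# Made-up example: one normalization run applied to one spot result.
normalization = Algorithm("normalization")
impl = AlgImplementation(normalization, "example_normalizer", "1.0")
scale_key = AlgParamKey(impl, "scale_factor", "float")
run = AlgoInvocation(impl)
run.set_param(scale_key, 1.5)
processed = SpotResAnalysis(1, 0.42, "normalized", run)
```

Every processed value points at the invocation that produced it, so the program, its version and the parameter values used stay recoverable. That is the history and parameter tracking that these tables are meant to support.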

Query RAD by Sample or by Experiment
  Access by Experiment groups
  Sample info
  ontologies
  Image info

What genes are expressed in the top 20% of normal B-lymphocytes and mapped to Chromosome 19?
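A query like this combines an expression ranking from RAD with a map location from the gene index. The sketch below shows the logic on in-memory data only. It assumes "top 20%" means the highest-ranked 20% of genes by expression value within one sample. The function name, gene names, values and chromosome assignments are all made up.

```python
def top_percent_on_chromosome(expression, chromosome_of, chromosome, percent=20):
    """Genes in the top `percent` by expression that map to `chromosome`.

    expression:    gene -> expression value in one sample
    chromosome_of: gene -> chromosome label (genes may be unmapped)
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, len(ranked) * percent // 100)
    return [gene for gene in ranked[:n_top] if chromosome_of.get(gene) == chromosome]


# Made-up values: gene1 = 100, gene2 = 90, ..., gene10 = 10.
expression = {f"gene{i}": 110 - 10 * i for i in range(1, 11)}
chromosome_of = {"gene1": "19", "gene2": "7", "gene3": "19"}

hits = top_percent_on_chromosome(expression, chromosome_of, "19")
```

With ten genes the top 20% is gene1 and gene2, and only gene1 maps to chromosome 19, so hits is ["gene1"]. Although gene3 is also on chromosome 19, it falls outside the top 20% and is excluded.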

The allgenes (GUS) index provides annotation of array elements in RAD
EST clustering and assembly
  Different representations of the same RNA are identified.
  EST/mRNA annotations are combined.
  Consensus sequence is annotated (e.g., gene function).

GUS: Genomics Unified Schema
[Schema overview diagram; labels grouped as they appear in the figure:]
  Ontologies: GO, Species, Tissue, Dev. Stage
  Genomic Sequence: Genes, gene models; STSs, repeats, etc; Cross-species analysis
  Transcribed Sequence (DOTS): Characterize transcripts; RH mapping; Library analysis; Cross-species analysis
  Transcript Expression (RAD, RNA Abundance DB): Arrays; SAGE; Conditions
  Special Features: Ownership, Protection, Algorithm, Evidence, Similarity, Versioning
  Protein Sequence: Domains; Function; Structure; Cross-species analysis
  Pathways, Networks: Representation; Reconstruction
  (under development)

Different Views of RAD
Focused annotation of specific organisms and biological systems:
  organisms: Human, Mouse, Plasmodium falciparum
  biological systems: Endocrine pancreas, CNS, Hematopoiesis
*not drawn to scale*

WWW.CBIL.UPENN.EDU/EPCONDB

Continuing Work and Future Issues
Analysis perspective:
  ontologies
  data preprocessing
  cross-platform comparisons
  utilize other types of high-throughput data (e.g. protein expression)
DB perspective:
  capture conclusions from analyses in a structured way
  integrate other types of high-throughput data

RAD: www.cbil.upenn.edu/RAD2
  Elisabetta Manduchi
  Angel Pizarro
  Shannon McWeeney
Allgenes: www.allgenes.org
  Brian Brunk
  Ed Uberbacher, ORNL
  Jonathan Crabtree
  Doug Hyatt, ORNL
  Sharon Diskin
  Joan Mazzarelli
  Jonathan Schug
EPConDB: www.cbil.upenn.edu/EPConDB
  Greg Grant
  Klaus Kaestner, Penn
  Phillip Le
  Marie Scearce, Penn
  Debbie Pinney
  Doug Melton, Harvard
  Alan Permutt, Wash U
MGED: www.mged.org