1 Integrated Microarray Database System NHLBI-MGH-PGA.

Slides:



Advertisements
Similar presentations
Garnet.arabidopsis.org.uk Beatrice Schildknecht NASC Data Availability and NASC tools NASC Nottingham Arabidopsis Stock Centre
Advertisements

1 IMDS Tutorial Integrated Microarray Database System.
Application of available statistical tools Development of specific, more appropriate statistical tools for use with microarrays Functional annotation of.
The Rice Functional Genomics Program of China cDNA microarray database (RIFGP-CDMD) consists of complete datasets, including the probe sequences, microarray.
Abstract BarleyBase ( is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression.
How to Work With Affymetrix .Cel Files in geWorkbench
Microarray Simultaneously determining the abundance of multiple(100s-10,000s) transcripts.
Microarray Data Analysis Using BASE Danny Park MGH Microarray Core March 15, 2004.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
The MGED Ontology Is An Experimental Ontology Bio-Ontologies Aug 8, 2002 Chris Stoeckert, Helen Parkinson and the MGED Ontology Working Group.
Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Educational Initiatives and Data Analysis in the Microarray Core Danny Park Bioinformatics (Sidney St) Lipid Metabolism Unit (Freeman)
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
DNA Arrays …DNA systematically arrayed at high density, –virtual genomes for expression studies, RNA hybridization to DNA for expression studies, –comparative.
Microarray Analysis Software at NIH. BRB ArrayTools Visualization and Statistical analysis of gene expression data Features –Excel Add-in –Flexible Data.
Attribute databases. GIS Definition Diagram Output Query Results.
Using ArrayExpress. ArrayExpress is an international public repository for well-annotated microarray data, including gene expression, comparative genomic.
Midterm project Course: Statistics in Bioinformatics Date: 指導教授 : 陳光琦 學生 : 吳昱賢.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
ACAT 2008 Erice, Sicily WebDat: Bridging the Gap between Unstructured and Structured Data Jerzy M. Nogiec, Kelley Trombly-Freytag, Ruben Carcagno Fermilab,
Image Quantitation in Microarray Analysis More tomorrow...
ArrayExpress and Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL
Affymetrix vs. glass slide based arrays
Copyright 2000, Media Cybernetics, L.P. Array-Pro ® Analyzer Software.
Microrray Data Standardisation Microarray Gene Expression Database group -- MGED December, 2000.
Support for MAGE-TAB in caArray 2.0 Overview and feedback MAGE-TAB Workshop January 24, 2008.
The following slides have been adapted from to be presented at the Follow-up course on Microarray Data Analysis.
Gene Expression Omnibus (GEO)
Test1 April 2004 Microarray Data Management Jianwei (Jerry) Li.
Clustering of DNA Microarray Data Michael Slifker CIS 526.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
Copyright OpenHelix. No use or reproduction without express written consent1.
Rahul Raman, Ram Sasisekharan Bioinformatics Core Massachusetts Institute of Technology Glue Grants Bioinformatics Meeting April 22-23, 2004 San Diego,
The European Bioinformatics Institute MGED ontology for consistent annotation of microarray experiments Manchester Bioinformatics Week Ontologies Workshop1.
Abstract BarleyBase is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix.
ITS-VIP SPRING 2012 FINAL PRESENTATION DATA MINING GROUP PHP?HTML INTERFACE Mide Ajayi Nakul Dureja Data Miners Rakesh Kumar David Fleischhauer.
Microarray - Leukemia vs. normal GeneChip System.
Copyright OpenHelix. No use or reproduction without express written consent1.
1 maxdLoad The maxd website: © 2002 Norman Morrison for Manchester Bioinformatics.
Scenario 6 Distinguishing different types of leukemia to target treatment.
3/24/2005 TIGP 1 Bioinformatics for Microarray Studies at IBS Pei-Ing Hwang, Ph.D. Mar. 24, 2005.
MIAMExpress and the development of annotation ontologies for gene expression experiments Ele Holloway Microarray Informatics European Bioinformatics Institute.
Microarrays and Gene Expression Analysis. 2 Gene Expression Data Microarray experiments Applications Data analysis Gene Expression Databases.
K Phone: Web: A Software Package for the Design and Analysis of Microbial Functional.
1 FINAL PROJECT- Key dates –last day to decided on a project * 11-10/1- Presenting a proposed project in small groups A very short presentation (Max.
Using Cvt2Mae to Convert GenePix Array Data for MAExplorer Using Cvt2Mae to Convert GenePix Array Data for MAExplorer
PROGNOCHIP-BASE, FORTH-ICS 1 PrognoChip-BASE: An Information System for the Management of Spotted DNA MicroArray Experiments Extension of BASE v
Lao H. Saal 1,3,*, Carl Troein 2,*, Johan Vallon-Christersson 1,*, Sofia Gruvberger 1, Björn Samuelsson 2, Åke Borg 1 and Carsten.
Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge,
Analysis of GEO datasets using GEO2R Parthav Jailwala CCR Collaborative Bioinformatics Resource CCR/NCI/NIH.
Gene Expression Omnibus (GEO)
1 Outline Standardization - necessary components –what information should be exchanged –how the information should be exchanged –common terms (ontologies)
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Applied Bioinformatics Week 9 Jens Allmer. Theory I Gene Expression Microarray.
Introduction and Applications of Microarray Databases Chen-hsiung Chan Department of Computer Science and Information Engineering National Taiwan University.
SAGExplore web server tutorial. The SAGExplore server has three different modules …
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Presentation on Database management Submitted To: Prof: Rutvi Sarang Submitted By: Dharmishtha A. Baria Roll:No:1(sem-3)
ArrayExpress Ugis Sarkans EMBL - EBI
Expression Data Integration Microarray Gene Expression Database Meeting Sunday 14th November 1999.
MESA A Simple Microarray Data Management Server. General MESA is a prototype web-based database solution for the massive amounts of initial data generated.
Using Cvt2Mae to Convert User-Defined Array Data for MAExplorer Using Cvt2Mae to Convert User-Defined Array Data for MAExplorer
GEO (Gene Expression Omnibus) Deepak Sambhara Georgia Institute of Technology 21 June, 2006.
GO! with Microsoft Office 2016
Transcriptomics on Bio-Linux
Using ArrayExpress.
GO! with Microsoft Access 2016
Microarray Technology and Applications
Gene Expression Omnibus (GEO)
Presentation transcript:

1 Integrated Microarray Database System NHLBI-MGH-PGA

2 Desired Features for Database Ability to accept data from MGH Core Facility and Core Facilities of remote collaborators Ability to store both spotted array data and Affymetrix data Web-accessibility Flexibility to accommodate various types of experiments and the descriptions of those experiments Tools for analyzing data and exporting data as tab-delimited files and XML (GEML)

3 Database Users MGH researchers (able to submit data) Collaborators (able to submit data through MGH collaborator) Scientific community (able to access published data through the web interface)

4 Types of Tools for Database Tools for visualization of the array image (TIFF or proxy GIF file) as a clickable image map –Browse individual spots –Evaluate the placement of the grid used during data acquisition –Change the flag status of any of the spots Normalization tools Clustering analysis tools

5 Experimental design General information about a series of experiments with the goal of answering a biological question Biological samples Target preparation Hybridization Slide elements Slide manufacturing Data acquisition Raw data Partially password protected data, multiple scan per slide Processed data Expression data A fixed expression data format, can be published on the web Final analyzed data Data format that will answer the question asked in the experimental design and be published in a scientific journal Tools Parameters retrieved and presented with data Tools Filtering, Statistical tools, Hierarchial clustering, SOMs, Pathway analysis, data mining software, … Filtering, Normalization, Averaging, Extrapolation (Maslint), Statistical tools, Quality assessment, … Parameters stored in DB Each box contains a set of tables Data stored in DB Data to be manipulated by tools to different levels (not all data will end in a publication). Data has to be viewed and monitored in the process to determine the necessity to continue the analysis and filter out data points. Experimental parameters and external web resources may need to be called upon in the process. Links to external web resources and other software packages, data mining tools, … ”Eric’s lines”

6 Background: Related Software and Other Implementations Stanford Microarray Database Express DB Array Express/Expression Profiler MaxD

7 Stanford Microarray Database Strengths –Open source system –Supports spotted microarrays –Sophisticated data normalization tools Weaknesses –Affymetrix data format not supported –RDBMS is Oracle, with Oracle-specific functions in the source code

8 Express DB Strengths –Supports both spotted microarrays and Affymetrix data Weaknesses –RDBMS is Sybase 11 –Used as a demonstration system with Saccharomyces, but not yet adapted for other organisms

9 Array Express/Expression Profiler Strengths –Supports both spotted microarrays and Affymetrix data –Implements the MIAME data specification Weaknesses –No storage of raw luminosity data –RDBMS is Oracle –More tables would need to be added to contain data pertaining to sample preparation, hybridization and other experimental details

10 MaxD Strengths –Implementation of Array Express table structure suitable for SQL92-complaint databases, thus supporting MySQL –Java based software with source code available for download on the web –Strengths of Array Express Weaknesses –Weaknesses of Array Express –Not open source

11 Formats of Data Input Automatically entered when spotted arrays are scanned by the core facility –Array ID, chip layout, spot intensities, software used by the Arrayer Directly entered by users –Experiment names, hybridization conditions, procedures Imported from flat files –Spot layout of chips, normalization intensities generated by third party software packages (Affymetrix)

12 Critical Data to Be Stored Description of each experiment Information about the submitter Description of the hybridization Description of the array design Description of experiment info related to Affymetrix chips or the core Axon Arrayer Description of the sample and target

13 Critical Data to Be Stored: Experiment Unique experiment ID Human-readable experiment name Classification of experiment type Free text description of experiment Date of entry References to publications Submitter ID

14 Critical Data to Be Stored: Submitter Submitter ID Submitter’s name Institution Laboratory Principal Investigator Grant address Postal address Phone number

15 Critical Data to Be Stored: Hybridization Hybridization ID Reference to the associated experiment and arrays Free text description of a particular hybridization Hybridization protocol Ordinal number for a particular hybridization if the hybridization is part of a sequential set of hybridizations

16 Critical Data to Be Stored: Array Design Array Design ID Human-readable name of the chip design Indication of the type of probe used (i.e., spotted vs. synthesized, cDNA vs. oligos) Size of array (number of rows and columns and total spots) Kind of chip used (e.g., glass, nylon) Type of Array (Affymetrix or Axon) Supplier who produced the slide (company, individual) Protocol to create the chip or provider information if purchased

17 Critical Data to Be Stored: Affymetrix Name of chip Sample applied to chip Probe used with chip Experimental information found in Affymetrix.EXP files

18 Critical Data to Be Stored: Axon Arrayer Description of information from core Axon Arrayer that is also stored in the core microarray database

19 Critical Data to Be Stored: Sample Description of the sample used to make the target that is applied to the chip Description of the source of the sample (which may include the following information as applicable to a given sample: ID, genus, species, strain, ecotype, organism, organ, tissue, cell type, cell line, cell culture, developmental stage, sex, genetic variation)

20 Critical Data to Be Stored: Target Description extract used to make the target Description of the extraction protocol Description of the labeling method (if any)

21 Database Schema for Integrated Microarray Database System

22 I. Submitter Information: Summitter Name: (blank text field to type in name of person who is submitting the experiment (not the data entry person, if different) Organization: MGH, other Laboratory: Ausubel, Freeman, Pier, Seed, other *Grant: PGA, other *Grant Number: PI of Grant: Ausubel, Freeman, Pier, Seed, other Address: Lipid Metabolism Unit, Massachusetts General Hospital, 32 Fruit Street, GRJ 1328, Boston, MA (blank text field) Phone: (xxx) xxx-xxxx (blank text field) Experiment name: name of experiment (blank text field) Abstract: one line description of experiment (blank text field)

23 II. Taxonomy: Organism: Mouse (pull-down choices) Genus: Mus (pull-down choices) Species: musculus (pull-down choices) Genotype: wild type, mutant, transgenic (pull-down choices) Strain: Organ/Tissue: lungs, liver (text field) Cell type: text field Cell line: text field Cell culture: text field Developmental Stage: text field Sex: Male, Female, hermaphrodite Genetic Variation: link to supplemental database if needed Free Text: Mutant Name: tlr4 (free text) *Name of mutated gene: toll-like receptor 4 (free text) Gene abbreviation: tlr4 (free text) Allele name: free text Dominance: dominant, recessive, semi-dominant, other (pull-down choices) Mutant type: gain of function, loss of function, null, overexpressor, suppressor, unknown, other (pull-down choices) Description: free text

24 III. Sample Treatment: Sample Description: free text *Is this experiment a time course? Yes or No (radio buttons) Hours after treatment: 2, 4, other (free text) Temperature: Type of Treatment: pathogen, hormone, chemical, serum, growth-factor, other (pull-down choices) Compound: name of chemical, hormone, pathogen, etc. (free text) *Dose: free text *Concentration: free text Treatment Protocol: free text RNA extraction method: free text Amount of RNA obtained: free text Hybridization: free text Number of Hybridization: (if more than one hybridization per chip) free text of a number Hybridization protocol: free text Labeling method for target: free text Labeling protocol: free text Amount of sample used to make target: free text Supplemental Database: (pull-down choice) plant

25 Example Queries 1) List all experiments performed by a single user. 2) Retrieve all experiments entered into the database since October 31, ) Retrieve normalized data for two arrays in an experiment and graph the luminosity values on a log-log scatter plot.

26 Example Queries 4) List all experiments from a particular lab, or operator. 5) List all experiments using a particular protocol. 6) List all experiments performed on an extract from a particular tissue type.

27 Example Queries 7) Which genes are expressed in response to pathogen A, but not pathogen B in a given host? 8) Compare the results of multiple treatments and produce a Venn diagram showing sets of genes induced or repressed by these different treatments or pathogens. 9) Calculate distance matrices to analyze the extent of differences between treatments, time points or mutants.

28 Tools Cluster (Stanford): clustering on large datasets (hierarchical, SOMs, kmeans, PCA) TreeView (Stanford): view cluster output EPCLUST (EBI): hierarchical clustering of gene expression datasets

29 IMDS Development Team Harry Bjorkbacka (End User/Feature Consultant) Cheri Chen (End User) Lance Davidow (Developer/User) Julia Dewdney (End User/Feature Consultant) Chen Liu (Developer) Christina Powell (Developer/End User) Sean Quinlan (Database/Program Developer) Jonathan M. Urbach (Program Developer) Eric VanHelene (Manager)