Introduction and Applications of Microarray Databases
Chen-hsiung Chan
Department of Computer Science and Information Engineering, National Taiwan University

MIAME (Minimum Information About a Microarray Experiment)  MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al, Nature Genetics]Brazma et al, Nature Genetics

MIAME  raw data (CEL or GPR files)  final processed (normalized) data  essential sample annotation including experimental factors and their values  experimental design including sample data relationships  sufficient annotation of the array  essential laboratory and data processing protocols

Databases using MIAME  ArrayExpress at EBI  GEO at NCBI  CIBEX at DDBJ

ArrayExpress  Stores transcriptomics and related data  Data warehouse stores gene indexed expression profiles  In accordance with MGED recommendations: MIAME

ArrayExpress statistics  Experiment repository: 2,914 experiments (each with at least 6 microarrays) and growing  Expression profiles: including 267 experiments, 121,891 genes  Data warehouse updated everyday

Searching ArrayExpress  Keywords: breast cancer, cell cycle, … etc.  Accession numbers: E-XXXX-d, e.g. E-AFFY-1281, E-TIGR-372, … etc.  Secondary accession numbers: GEO accession, e.g. GSE5389.  Species names mainly in Latin names (e.g. Homo sapiens), common names may be used as well (e.g. human).

ArrayExpress interface

ArrayExpress Search/Browse Result (keyword: lung cancer)

ArrayExpress Search/Browse Result: Detailed view

Expression Profile results  Thumbnail view  BigPlot view  Gene ranking (most differentially expressed experiments are top ranked)  Similarity search: search genes with similar expression levels

Take a break …

Gene Expression Omnibus (GEO)  Gene expression/molecular abundance repository  MIAME compliant  Supports browsing, query and retrieval

GEO record types  Platform  Sample  Series  DataSet  Profile

GEO Platform  Platform record defines the list of elements that may be detected and quantified in that experiment (e.g., cDNAs, oligonucleotide probesets)  Each Platform record is assigned a unique and stable GEO accession number (GPLxxx)  A Platform may reference many Samples that have been submitted by multiple submitters

GEO Sample  Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it  Each Sample record is assigned a unique and stable GEO accession number (GSMxxx)  A Sample entity must reference only one Platform and may be included in multiple Series

GEO Series  A Series record links together a group of related Samples and provides a focal point and description of the whole study  Series records may also contain tables describing extracted data, summary conclusions, or analyses  Each Series record is assigned a unique and stable GEO accession number (GSExxx)

GEO DataSet  Assembled in NCBI  Samples are all equivalently measured and normalized  Can be viewed and analyzed with NCBI ’ s advanced data display and analysis tool

GEO Profile  Profile consists of the expression measurements for an individual gene across all Samples in a DataSet  Profiles can be searched using Entrez GEO Profiles  Similar to Expression Profile in ArrayExpress

SOFT (Simple Omnibus Format in Text)  Text based  Line based  Easily parsed with text processing languages, including Perl, Python, Ruby, PHP, … etc.

Take a break …

Network Biology Visualization and Analysis

Cytoscape  Open source network visualization and analysis software  ‘ Core ’ features include network layout and query, also integrate visualizations with state data  Can be extended by plugins

Cytoscape developers  University of California at San Diego (Trey Ideker)  Institute for Systems Biology (Leroy Hood)  Memorial Sloan-Kettering Cancer Center (Chris Sander)  Institut Pasteur (Benno Schwikowski)  Agilent Technologies (Annette Adler)  University of California at San Francisco (Bruce Conklin)

Cytoscape  A java application  Require Java 5 or 6 (JDK5/6 or JRE5/6)

Simple Interaction Format (SIF)  Each line denotes one interaction InteractorA xx Interactor B  ‘ xx ’ are interaction types: pp: protein-protein interaction pd: protein-DNA interaction (transcription factor/regulation) pr (protein-reaction), rc (reaction- compound), cr (compound-reaction), gl (genetic-lethal), pm (protein-metabolite), mp (metabolite-protein)

Other interaction formats supported  GML  XGMML  SBML  BioPAX  PSI-MI  Tab-delimited text table and excel

Cytoscape Demonstration

Applications of Gene Expression  Gene selection (differentially expressed genes)  State annotation in networks (expression level)  Gene regulatory network identification