Making Sense of Public Domain Expression Data: Genevestigator


Making Sense of Public Domain Expression Data: Genevestigator
Metsada Pasmanik-Chor, TAU Bioinformatics Unit, 19/3/09

On the Agenda
- Microarray databases: characteristics, pros and cons
- Examples: GEO and ArrayExpress
- Genevestigator: a meta-analytical approach

Meta-data in Microarray Experiments
Gene expression studies generate large amounts of data.
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#268,6,Capturing Data and Meta-data in Microarray Experiments

Properties of High-throughput Data
Microarray databases can accept, store and export (share) large quantities of data.
Stored data contain:
- Many genes
- Many samples
- Various organisms and tissues
- A variety of biological phenomena
- Time courses
- Replicates
- Different technologies, and therefore various data formats
Data retrieval: user-friendly web-based interfaces.
Links to analysis tools.

Gene Expression Matrix
The final gene expression matrix (on the right) is needed for higher-level analysis and mining.
[Figure: images and spots are reduced to spot/image quantitations, and these to a matrix of gene expression levels whose two axes are genes and samples.]
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#271,8,Gene Expression Matrix
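As a rough illustration of the step from spot quantitations to the expression matrix, the sketch below averages replicate spots and arranges the result with one row per gene and one column per sample. The sample names, gene names and intensities are invented for the example, and averaging is only one possible way of summarizing replicate spots; the slide does not say which method is used.

```python
from collections import defaultdict

# Hypothetical spot-level quantitations: (sample, gene, intensity).
# A gene may be measured by several spots on the same array.
spot_values = [
    ("sample_1", "geneA", 100.0),
    ("sample_1", "geneA", 120.0),
    ("sample_1", "geneB", 40.0),
    ("sample_2", "geneA", 200.0),
    ("sample_2", "geneB", 50.0),
]


def build_expression_matrix(spots):
    """Return (genes, samples, matrix): one row per gene, one column per sample."""
    grouped = defaultdict(list)
    for sample, gene, intensity in spots:
        grouped[(gene, sample)].append(intensity)

    genes = sorted({gene for gene, _ in grouped})
    samples = sorted({sample for _, sample in grouped})

    matrix = []
    for gene in genes:
        row = []
        for sample in samples:
            values = grouped.get((gene, sample))
            # Average replicate spots; None marks a gene not measured in a sample.
            row.append(sum(values) / len(values) if values else None)
        matrix.append(row)
    return genes, samples, matrix


genes, samples, matrix = build_expression_matrix(spot_values)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Keeping the gene and sample labels alongside the matrix matters in practice, because later analysis steps refer to rows and columns by name.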

Microarray Data Precision and Loss
[Figure labels: Electron microscopy; Only provided in 0.1% of public experiments]
Processed data loses precision.
90% of CEL files generated from microarray experiments have never been deposited in any repository.
Stokes et al. BMC Bioinformatics 2008, 9(Suppl 6):S18
http://www.bio-miblab.org/arraywiki

Microarray Data Formats
A. Raw image data: the intensity of the signal at each spot is proportional to the expression level of the gene under test. Image intensities are quantified using image analysis software.
B. Raw numerical data (signal intensities).
C. Processed data.
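To make the step from raw numerical data (B) to processed data (C) concrete, here is a minimal sketch of one common kind of processing: a log2 transform followed by median centering. This is an assumption chosen for illustration; the slide does not say which processing method is meant, and real pipelines are usually more elaborate. The intensities are invented.

```python
import math
from statistics import median


def log2_median_center(intensities):
    """Log2-transform raw intensities, then subtract the array's median log value."""
    logged = [math.log2(value) for value in intensities]  # requires values > 0
    center = median(logged)
    return [value - center for value in logged]


# Hypothetical raw signal intensities for three spots on one array.
raw = [64.0, 256.0, 1024.0]
processed = log2_median_center(raw)
print(processed)
```

The processed values no longer carry the original scale of the array, which is one reason the raw data cannot be recovered from them.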

Problem: Raw Data
A complete description of complex experiments is desired.
We don't always know what's important: "noise" probes could end up being informative (e.g. detection of a splice variant).
The Future
- Better (more accurate) summarization algorithms will emerge.
- New uses for raw data may emerge.
Challenge: store the raw data in an accessible form.
Different labs have different needs, so a central system is needed.

Complexity and Categories of Data, and the 6 Parts of MIAME
The MIAME (Minimum Information About a Microarray Experiment) guidelines set standards for the information to be published with a microarray experiment.
Brazma et al. (2001), Nature Genetics 29(4), 365-71
[Figure labels: Publication; Experimental design; Source (e.g., Taxonomy); Sample (source and treatment, prep. and labelling); Hybridisation; Array design; Gene (e.g., EMBL); Normalization; Data measurements]
Source: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/docs/workshop/Helen_Parkinson-RDMW0608.ppt#429,18,Slide 18
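A submission record can be checked against the six MIAME parts before it is sent to a repository. The sketch below is only an illustration: the dictionary keys are hypothetical field names chosen for this example, not the schema of any repository, and the record contents are invented.

```python
# The six MIAME parts, under hypothetical field names.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization",
)


def missing_miame_sections(record):
    """Return the MIAME parts that are absent or empty in a submission record."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]


# Invented example record with two parts left unfilled.
record = {
    "experimental_design": "treated vs. control, 3 replicates each",
    "array_design": "hypothetical array layout",
    "samples": ["s1", "s2"],
    "hybridizations": "",
    "measurements": None,
    "normalization": "described in methods",
}
missing = missing_miame_sections(record)
print(missing)
```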

Microarray Database Repositories are Biased
The relative size of each pie corresponds to the number of experiments contained in each repository.
[Figure labels: All human data; Mostly custom arrays; Mostly old data; Mostly human data; Mainly Affy chips]
Stokes et al. BMC Bioinformatics 2008, 9(Suppl 6):S18
http://www.biomedcentral.com/1471-2105/9/S6/S18

Overlaps of Data Between Repositories
Total experiments: 2376 (August 2005 to June 2006)
Stokes et al. BMC Bioinformatics 2008, 9(Suppl 6):S18
http://www.biomedcentral.com/1471-2105/9/S6/S18

User-Friendly Microarray Databases
Many gene expression databases exist, both commercial and non-commercial. Most focus on a particular technology, a particular organism, or both.
We will discuss the most promising ones:
- ArrayExpress (AE; EBI)
- The Gene Expression Omnibus (GEO; NCBI)
- Genevestigator

The Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/
The Gene Expression Omnibus is a public repository within the Entrez system that holds high-throughput gene expression data. It is hosted at the National Library of Medicine (NIH).
GEO was designed to accommodate diverse types of data.

Gene Expression Omnibus: experiment-centered view (GDS)

Gene Expression Omnibus: gene-centered view
Example: GDS563. Expression profile of the Dystrophin gene in a DataSet examining skeletal muscle biopsies from 12 Duchenne muscular dystrophy patients and 12 normal subjects.
- Red bars: the level of abundance of an individual transcript across the Samples that make up a DataSet. Values are presented in arbitrary units. Single channel: values are normalized signal count data. Dual channel: submitted values are normalized log ratios.
- Blue squares: rank order, an indication of where the expression of that gene falls with respect to all other genes on that array (enrichment).
- Faded bars/squares: these correspond to an Affymetrix 'Detection call' of Absent.
[Figure labels: Duchenne; Normal; Experimental design]
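The rank-order idea can be sketched as a percentile rank of each gene within one sample. This is a simplified illustration, not GEO's own computation, which the slide does not describe; the gene labels and values are invented, and tied values are not given any special treatment.

```python
def percentile_ranks(values_by_gene):
    """Map each gene to its percentile rank (0-100] among all genes in one sample."""
    ordered = sorted(values_by_gene, key=values_by_gene.get)
    total = len(ordered)
    return {gene: 100.0 * (position + 1) / total
            for position, gene in enumerate(ordered)}


# Invented expression values for four genes measured on one array.
sample_values = {"gene_w": 5.0, "gene_x": 900.0, "gene_y": 700.0, "gene_z": 50.0}
ranks = percentile_ranks(sample_values)
print(ranks)
```

A gene with a low percentile rank in every sample is weakly expressed relative to the rest of the array, whatever its absolute value.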

http://www.ebi.ac.uk/microarray-as/ae/

Query ArrayExpress
[Figure labels: Annotations; Experiments and description; Condition; Click; Gene name; Species]
Results: a list of all experiments, ordered by p-value. For each experiment: a short description, experimental factors and gene expression.

Query ArrayExpress: similarly expressed genes
Select the 'find 3 closest genes' option.
IER2, FOS and JUN have expression similar to that of nfkbia.
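A "closest genes" query can be thought of as ranking all other genes by how similar their expression profiles are to the query gene. The sketch below uses Pearson correlation as the similarity measure, which is an assumption: the slide does not say which measure ArrayExpress uses. The gene labels and profiles are invented toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def closest_genes(query, profiles, k=3):
    """Return the k genes whose profiles correlate best with the query gene."""
    others = [gene for gene in profiles if gene != query]
    others.sort(key=lambda gene: pearson(profiles[query], profiles[gene]),
                reverse=True)
    return others[:k]


# Invented expression profiles across four conditions.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.1],
    "g2": [4.0, 3.0, 2.0, 1.0],
    "g3": [1.0, 2.1, 2.9, 4.2],
    "g4": [1.0, 3.0, 2.0, 4.0],
    "g5": [2.0, 2.0, 3.0, 2.0],
}
top3 = closest_genes("query", profiles)
print(top3)
```

Correlation ignores absolute expression level, so a gene expressed at twice the level of the query can still be its closest match if the two rise and fall together.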

HeatMap Atlas Output
[Figure labels: Number of up/down regulated genes; Experimental condition]
http://www.ebi.ac.uk/microarray-as/atlas/qr?q_gene=saa4&q_updn=updn&q_orgn=MUS+MUSCULUS&q_expt=%28all+conditions%29&view=heatmap&view=

Genevestigator: a reference expression database and meta-analysis system
https://www.genevestigator.com/gv/index.jsp

Genevestigator: a system for the meta-analysis of microarray data
A database and web-browser data mining interface for Affymetrix GeneChip data, based on the new concept of "Meta-Profiles", which rely on reference expression databases.
It allows biologists to study the expression and regulation of genes in a broad variety of contexts by summarizing information from hundreds of manually curated microarray experiments.
Workspaces and views can be stored in files and re-opened for another analysis session (*.gvw, which stands for Genevestigator Workspace).
[Figure labels: Application server; Java application; Analysis output]
Source: http://bar.utoronto.ca/ICAR19/ICAR19_BioinfoWorkshop%20-%20Genevestigator.ppt#257,2,Overview of the Genevestigator system

Database Content and Quality
The database consists of a large and varied collection of manually curated and quality-controlled Affymetrix chips.
Quality control of each experiment is done manually by Genevestigator curators, using a pipeline of Bioconductor packages that perform normalization and probe-level analysis.
Low-quality arrays are those that:
- fall out of range relative to the other arrays from the same experiment,
- exhibit higher RNA degradation,
- are particularly noisy,
- do not correlate with replicate samples.
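The first criterion, an array falling out of range relative to the others in its experiment, can be illustrated with a simple robust outlier check. This is not the Genevestigator curation pipeline, which relies on Bioconductor packages and manual review; it is a minimal sketch that assumes each array has already been summarized by one number (here a hypothetical median log intensity) and uses an arbitrary threshold of 3 median absolute deviations.

```python
from statistics import median


def flag_out_of_range(array_medians, max_deviations=3.0):
    """Return arrays whose summary value lies far from the experiment's median.

    Distance is measured in units of the median absolute deviation (MAD).
    """
    values = list(array_medians.values())
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        return []  # no spread among arrays, so nothing can be judged out of range
    return sorted(name for name, value in array_medians.items()
                  if abs(value - center) / mad > max_deviations)


# Invented per-array median log2 intensities for one experiment.
array_medians = {
    "array_1": 7.9,
    "array_2": 8.0,
    "array_3": 8.1,
    "array_4": 8.0,
    "array_5": 11.5,
}
flagged = flag_out_of_range(array_medians)
print(flagged)
```

A flagged array is a candidate for closer inspection rather than automatic removal, in keeping with the manual curation described above.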

User Hardware Requirements
Genevestigator is a web-based application running in Java. A Java applet provides several advantages:
- users don't have to install any software
- users always work with the latest software release
- Java is more powerful than HTML/JavaScript for data manipulation
To run the application, client machines must have the Java Runtime Environment (JRE; version 1.4.2 or higher) installed (usually available by default on PCs). The JRE is freely available for download from Sun Microsystems (http://www.Java.com).
To work optimally with the Genevestigator application, we recommend:
- screen resolution: 1024 x 768 or higher
- memory: preferably 512 MB RAM or more

Genevestigator Species Availability

Mammals
Species      Arrays                                       Number of arrays
Human        Human 133_2 & Human Genome 10k, 20k, 47k     1109, 3786, 2782
Mouse        Mouse Genome 12k, 40k                        3071, 1967
Rat          Rat Genome 8k, 31k                           2146, 858

Plants
Species      Arrays                                       Number of arrays
Arabidopsis  Arabidopsis Genome 22k                       3110
Barley       Barley Genome 22k                            706
Rice         Rice Genome 22k                              -
Soybean      (no array or count listed)

Data Sources and Referencing
The Genevestigator analysis platform comprises a large database of manually curated microarray experiments collected from the public domain or from individual contributors. The array annotations necessary for data analysis were retrieved from public repositories and, where these were insufficient, from the authors themselves.
Genevestigator contains data from the following repositories and databases:
- Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
- ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
- ChipperDB: http://chipperdb.chip.org/adb/adb-home
- The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
- MUSC Microarray Database: http:proteogenomics.musc.eduma
- Public Expression Profiling Resource (PEPR): http://pepr.cnmcresearch.org
- NASC Microarray Database (NASCArrays): http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl
- NIH Neuroscience Microarray Consortium: http://arrayconsortium.tgen.org/np2/home.do
- Gene Expression Open Source System (GEOSS): https://genes.med.virginia.edu/intro to geoss.html
- RNA Abundance Database (RAD): http://www.cbil.upenn.edu/RAD/php/index.php

Genevestigator: focus on gene expression in the context of:
- Time (gene expression during stages of development or the life cycle).
- Space (tissue-specific expression).
- Response (expression caused by stimuli: biotic stress, abiotic stress, chemical, hormone, light, drug treatment, disease).
Users can query the database to retrieve the expression patterns of individual genes across chosen environmental conditions, growth stages, or organs.
Conversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs.
Access: free / by license
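The idea of viewing one gene's expression by context can be sketched as grouping annotated samples by a context label (for example tissue) and summarizing each group. This is a loose illustration only: the slides do not describe how Genevestigator actually computes its profiles, the use of the mean is an assumption, and the annotations and values below are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented annotated samples for a single gene: each has context labels
# and an expression value.
annotated_samples = [
    {"tissue": "root", "stage": "seedling", "expression": 10.0},
    {"tissue": "root", "stage": "adult", "expression": 12.0},
    {"tissue": "leaf", "stage": "seedling", "expression": 2.0},
    {"tissue": "leaf", "stage": "adult", "expression": 4.0},
    {"tissue": "flower", "stage": "adult", "expression": 7.0},
]


def profile_by_context(samples, context_key):
    """Average one gene's expression over all samples sharing a context label."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[context_key]].append(sample["expression"])
    return {context: mean(values) for context, values in groups.items()}


by_tissue = profile_by_context(annotated_samples, "tissue")
by_stage = profile_by_context(annotated_samples, "stage")
print(by_tissue)
print(by_stage)
```

Switching `context_key` from "tissue" to "stage" gives the Space and Time views of the same data, which is why consistent sample annotation is so important for this kind of analysis.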

http://sbw.kgi.edu/

Thank you!
Dr. Metsada Pasmanik-Chor
Bioinformatics Unit, Life Science, TAU
Tel: x 6992
E-mail: metsada@bioinfo.tau.ac.il
Bioinfo. Unit webpage: http://bioinfo.tau.ac.il