Published by Noreen Hudson. Modified over 9 years ago.
1
DESPRAD subproject
Alvis Brazma, EMBL-EBI
Hinxton, October 20, 2003
2
DESPRAD – Development and Establishment of Standards and Prototype Repository for Array Data
3
Participants
- EBI
- UMC Utrecht
- University of Bergen
- RZPD
- Cambridge University
- EMBL Heidelberg
- University of Marseille (CIML)
- University of Madrid (CMB)
4
Three major sets of WPs:
- Developing standards and an international infrastructure for microarray data sharing (WP1 – WP4)
- Establishing a public repository for microarray data, ArrayExpress (WP4 – WP9)
- Research in gene expression data analysis and gene networks (WP9 – WP12)
5
ArrayExpress goals
- Serving as an archival repository for microarray data supporting publications
- Providing the research community with easy access to microarray data in a structured and standardised format
- Facilitating the sharing of microarray designs and protocols
6
ArrayExpress approach
- To collect the information the user needs to understand how to interpret the data
- To represent the information in a structured way, potentially allowing automated analysis and mining
- To work towards a community agreement on representing microarray data in a standard way: founding of the MGED Society
7
1. Standards
- Founding the Microarray Gene Expression Data (MGED) Society
- Development of the standards
  - MIAME
  - MAGE
  - MGED ontology
8
Sharing microarray data – which data?
[Diagram labels: array scans; spots; quantitations; genes; samples; stages marked A, B, C, D]
9
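Gene expression matrix
[Diagram: a genes x samples matrix of gene expression levels, with sample annotations and gene annotations attached. The slide marks the annotations as "problem 1" and the gene expression levels as "problem 2".]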
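The matrix on this slide can be pictured as a small data structure. The following is a minimal sketch, not the ArrayExpress schema: the class name `ExpressionMatrix`, its field names, and all gene, sample and annotation values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix with annotation tables for both axes (illustrative)."""
    genes: List[str]
    samples: List[str]
    values: List[List[float]]  # values[i][j]: expression level of gene i in sample j
    gene_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)
    sample_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        # The matrix must have one row per gene and one column per sample.
        if len(self.values) != len(self.genes):
            raise ValueError("expected one row per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("expected one column per sample")

    def level(self, gene: str, sample: str) -> float:
        """Look up a single expression level by gene and sample name."""
        return self.values[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> List[str]:
        """Select samples by annotation, e.g. all samples from one tissue."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


# Hypothetical toy data: two genes measured in three samples.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    values=[[0.1, 1.2, -0.4],
            [2.0, 0.0, 0.3]],
    gene_annotations={"geneA": {"function": "kinase"}},
    sample_annotations={"s1": {"tissue": "liver"},
                        "s2": {"tissue": "brain"},
                        "s3": {"tissue": "liver"}},
)
liver_samples = matrix.samples_where("tissue", "liver")
```

Selecting samples by annotation only works when the annotations are recorded consistently, which is one way to read why the slide lists annotations as a problem in their own right.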
10
MGED Society
- MGED 1, Hinxton, November 1999
- MGED 2, Heidelberg, May 2000
- MGED 3, Stanford University, April 2001
- MGED 4, Boston, February 2002
- MGED 5, Tokyo, September 2002
- MGED 6, Aix-en-Provence, September 2003
- MGED 7, Toronto, September 2004
The Microarray Gene Expression Data Society is an international organisation for facilitating the sharing of functional genomics and proteomics array data.
Board of directors: EBI, Stanford, UCB, TIGR, Affymetrix, Rosetta, …
11
[Diagram: an experiment made up of several hybridisations. Each hybridisation is drawn with the labels sample, RNA extract, labelled nucleic acid, and array (microarray). Further labels: array design, protocol, normalization, integration, genes, gene expression data matrix.]
12
The first database model - developed in collaboration with DKFZ in 1999
13
MGED standards - MIAME
14
Nature editorial
15
MGED standards – MAGE-ML
16
The organisations and software supporting MAGE-ML include
- Affymetrix
- Agilent
- Biodiscovery (Imagene5.5)
- BASE (open source project coordinated at Lund)
- Iobion (Gene Traffic)
- Manchester University (MAXDB)
- Molmine (J-Express)
- NCI
- NIEHS
- Rosetta Biosoftware (Rosetta Resolver)
- RZPD
- Sanger Institute LIMS (MIDAS)
- Silicon Genetics (GeneNet)
- Stanford University (SMD)
- TIGR (MADAM)
- UC at Berkeley
- University of Pennsylvania (RAD)
- UMC Utrecht
19
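Data in ArrayExpress
[Chart: number of hybridisations (hybs) over 2002 to 2004, y-axis ticks at 1000, 2000 and 3000. Labelled time points: February, April, September, November, September. Labelled values: 6, ~100, ~250, 1172, ~3000.]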
20
ArrayExpress content (experiments)
[Chart: content by experiment. Note on slide: +1 Drosophila experiment.]
21
Submissions by labs (in hybs)
22
Submissions by country (in experiments)
27
Expression Profiler (component interface)
[Screenshot labels: SUBSELECT and CLUSTER components, marked 1 and 2]
28
ArrayExpress web-page hits
- 2002: 49 245
- 2003: 274 983 (by 12 September)
29
ArrayExpress components
[Diagram components: ArrayExpress; MIAMExpress (online submission tool); Expression Profiler (online analysis tool); large-scale microarray facilities; smaller labs; MAGE-ML; Internet (www) for submissions, queries and analysis; export to local analysis tools]
32
MIAMExpress
- Online since December 1, 2002
  - 2002: 15 951 hits
  - 2003: 112 871 hits by 12 September
- So far ~20 submissions completed through MIAMExpress, i.e., about 25% of all experiments in ArrayExpress
- MIAMExpress is open source software, installed in at least 15 labs (EMBL, RZPD, Leipzig, Leuven, Vancouver, VIB)
- Tox-MIAMExpress: a specialised version for toxicology
33
ArrayExpress infrastructure
[Diagram components: array manufacturers (Affymetrix, Agilent); LIMS (EMBL, TIGR); desktop data analysis software; local databases (RZPD, Stanford); MIAMExpress local installations (Cambridge, …); MAGE-ML pipelines; MIAMExpress (MySQL); ArrayExpress repository (Oracle); query interface (Tomcat); Expression Profiler; www access for submissions, queries and MAGE-ML retrieval]
34
Submissions by pipeline (in hybs)
35
ArrayExpress development
[Diagram: two layers.
Repository (MAGE-OM model): submissions, curation; simple queries (species, author, lab, array types, etc.); hyperlinks to other databases.
Warehouse (simple gene-centric model): curation; more complex queries (genes, expression levels, etc.); links back to the evidence; database integration; Ensmart.]
36
Gene expression data matrix
[Diagram: a genes x samples matrix of gene expression levels, with sample annotations and gene annotations attached]
37
ArrayExpress development
[Diagram: three layers.
Repository (MAGE-OM model): submissions, curation; simple queries (species, author, lab, array types, etc.); hyperlinks to other databases.
Warehouse (simple gene-centric model): curation; more complex queries (genes, expression levels, etc.); links back to the evidence; database integration; Ensmart.
Gene Expression Atlas: curation; summarised information about which gene is expressed where; links back to the evidence; database integration.]
38
New in ArrayExpress
- Password-protected logins
- Can be used to support anonymous refereeing of microarray papers
- Discussions with Nature
39
Data growth in ArrayExpress
[Chart: hybs (y-axis ticks at 1000, 2000, 3000 and 4000) against the years 2002 to 2004, ending in a question mark]
40
Distributed data collection
[Diagram: ArrayExpress connected to national microarray centres, to named sites (Stanford, TIGR, EMBL, Sanger) and to many small labs]
41
Data analysis tools
- Expression Profiler: complete redevelopment of the earlier tool, with a new interface, new functionality and XML-based modularity; a beta version will be ready in month 24
- J-Express (developed in Bergen): see the talk by Inge Jonassen
42
Research
- Microarray-based gene network analysis: 2 publications out, 1 in print, 1 submitted
- S. pombe gene expression data analysis (in collaboration with the Sanger Institute): publication in preparation
- New algorithms for clustering and cluster comparison: 2 publications in preparation
43
Transcription factor binding network
- Chromatin IP experiments on a chip (ChIP on chip)
  - Using microarrays to find genomic (intergenic) sequences, a few hundred bp long, where a particular transcription factor is likely to bind
- ChIP by Lee et al. (Science 2002): binding site location data in the yeast genome for 107 transcription factors (of about 250 yeast transcription factors in total)
- Identified around 4500 binding locations
44
ChIP on chip network by Lee et al
45
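Gene disruption network
[Diagram: a network over gene A, gene B, gene C and gene D. Other labels on the slide: A, B, C, D, AA, BB, CC. The arcs are not recoverable from the transcript.]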
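One common way to build such a network, offered here as an assumption rather than as the method behind this slide, is to draw an arc from a disrupted gene to every gene whose expression changes noticeably in that mutant. The sketch below uses hypothetical log ratios and a hypothetical threshold; the function name `disruption_network` is also made up for illustration.

```python
def disruption_network(log_ratios, threshold=1.0):
    """Return arcs (disrupted_gene, affected_gene).

    log_ratios maps each disrupted gene to the expression log ratios
    (mutant vs. wild type) measured for the other genes. An arc is added
    when the absolute log ratio reaches the threshold. Both the input
    layout and the threshold rule are assumptions of this sketch.
    """
    arcs = set()
    for disrupted, profile in log_ratios.items():
        for gene, ratio in profile.items():
            if gene != disrupted and abs(ratio) >= threshold:
                arcs.add((disrupted, gene))
    return arcs


# Hypothetical measurements for three disruption experiments.
toy_log_ratios = {
    "geneA": {"geneB": 1.8, "geneC": -0.2, "geneD": -1.3},
    "geneB": {"geneA": 0.1, "geneC": 2.4, "geneD": 0.0},
    "geneC": {"geneA": 0.3, "geneB": -0.4, "geneD": 0.2},
}
toy_arcs = disruption_network(toy_log_ratios)
```

An arc in this kind of network records an effect, direct or indirect, of the disruption; it does not by itself show that the disrupted gene binds or directly regulates the affected gene.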
46
Data for over 200 gene disruptions in yeast (Hughes et al., Cell 102, 2000)
47
Mutation network for S. cerevisiae
48
Three networks in yeast
- ChIP network (Lee et al.)
- Mutation network (Hughes et al.)
- In silico network: matching 38 experimentally known transcription factor binding sites (Pilpel et al.) against the yeast genome sequence
49
Intersection of the networks
- Red: 39 arcs present in all networks
- Green: arcs present in at least 2 networks and adjacent to one of SWI4, SWI6 or MBP1
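The two colour classes on this slide correspond to simple set operations once each network is stored as a set of arcs. The sketch below is illustrative only: the arcs are invented toy data (the target names `g1` to `g4` and the factor `TFX` are hypothetical), and the helper names are not taken from the talk.

```python
from collections import Counter


def arcs_in_all(networks):
    """Arcs present in every network (the 'red' class on the slide)."""
    return set.intersection(*networks)


def arcs_in_at_least(networks, k):
    """Arcs present in at least k of the networks."""
    counts = Counter(arc for net in networks for arc in net)
    return {arc for arc, n in counts.items() if n >= k}


def adjacent_to(arcs, hubs):
    """Arcs with at least one endpoint among the given hub genes."""
    return {(src, dst) for src, dst in arcs if src in hubs or dst in hubs}


# Hypothetical toy networks; each arc is (regulator, target).
chip = {("SWI4", "g1"), ("SWI6", "g1"), ("MBP1", "g2"), ("TFX", "g3")}
mutation = {("SWI4", "g1"), ("MBP1", "g2"), ("TFX", "g4")}
in_silico = {("SWI4", "g1"), ("SWI6", "g1"), ("TFX", "g3")}
networks = [chip, mutation, in_silico]

red = arcs_in_all(networks)
green = adjacent_to(arcs_in_at_least(networks, 2), {"SWI4", "SWI6", "MBP1"})
```

In this toy example `red` is a subset of `green`, because any arc found in all three networks is also found in at least two; whether the slide draws such arcs in red only is not stated in the transcript.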
50
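How do ChIP-chip and disruption networks relate?
[Diagram: within the set of all genes, the transcription factors and the disrupted genes are marked. A transcription factor t has a regulation set; a disrupted gene h has an effectual set.]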
51
How do ChIP-chip and disruption networks relate?
[Diagram: all genes, containing transcription factors (107), disrupted genes (220) and their intersection (11). For a gene g in the intersection, the regulation set of g and the effectual set of g are shown.]
The overlap between the regulation and effectual sets is higher than expected for only 3 of the 11:
- STE12: pheromone response
- GCN4: aa/purine starvation
- SWI5: cell cycle
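The slide does not say how "higher than expected" was assessed. A standard choice for comparing two gene sets drawn from the same genome is the hypergeometric tail probability, sketched below as an assumption; the function names are hypothetical and no numbers from the study are used.

```python
from math import comb


def expected_overlap(n_genes, n_regulated, n_effectual):
    """Mean overlap if the two sets were chosen independently at random."""
    return n_regulated * n_effectual / n_genes


def overlap_p_value(n_genes, n_regulated, n_effectual, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null model.

    n_genes:      total number of genes considered
    n_regulated:  size of the regulation set (ChIP targets of g)
    n_effectual:  size of the effectual set (genes affected by disrupting g)
    n_overlap:    observed number of genes in both sets
    """
    total = comb(n_genes, n_effectual)
    upper = min(n_regulated, n_effectual)
    tail = sum(
        comb(n_regulated, k) * comb(n_genes - n_regulated, n_effectual - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value for a given transcription factor would mean that its ChIP targets and the genes affected by its disruption share more members than chance alone would explain.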
52
Pairs of genes sometimes tend to have highly overlapping effects
[Diagram: within the set of all genes, the transcription factors and the disrupted genes are marked. A transcription factor t has a regulation set; a disrupted gene h has an effectual set.]
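To make "highly overlapping effects" concrete, one can score every pair of genes by how similar their effect sets are. The sketch below uses the Jaccard index, which is an assumption of this note and not necessarily the measure used in the talk; the gene and target names and the cutoff are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(effect_sets, min_overlap=0.5):
    """List gene pairs whose effect sets reach the given Jaccard score."""
    pairs = []
    for g, h in combinations(sorted(effect_sets), 2):
        score = jaccard(effect_sets[g], effect_sets[h])
        if score >= min_overlap:
            pairs.append((g, h, score))
    return pairs


# Hypothetical effect sets: the genes affected by each gene.
toy_effect_sets = {
    "geneA": {"t1", "t2", "t3"},
    "geneB": {"t2", "t3", "t4"},
    "geneC": {"t9"},
}
toy_pairs = overlapping_pairs(toy_effect_sets)
```

Pairs found this way are only candidates for a relationship; the following slides discuss independent evidence that could be used to check them.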
53
How to verify the possible relationships?
- Protein-protein interaction (Y2H, Cellzome, etc.)
- MIPS (C. v. Mering "reference set")
- Co-citation network (PubMed)
54
Overlapping target sets can be explained by p/p-interactions
[Figure legend: known interaction; predicted interaction; target set overlap small; target set overlap large]
56
Predicted relationships
[Diagram labels: pheromone response genes; cell cycle genes]
57
WP13 – dissemination
- www.ebi.ac.uk/microarray, www.ebi.ac.uk/arrayexpress, www.mged.org
- Talks at over 40 international conferences and workshops
- At least 5 peer-reviewed publications (counting only those with a major contribution from TEMBLOR)
- Contribution to two EMBO courses on microarray data analysis
- Trainee students at the EBI
- A press release in preparation
60
Summary
- The ArrayExpress public repository is up and running ahead of schedule; the number of submissions is growing, and we already have 'hits' and requests from users (though there is not very much data in yet)
- The MIAME and MAGE-ML standards are established (and being finalised). Several TEMBLOR partners have implemented these standards, and ArrayExpress is receiving submissions
- Data analysis tools are under development, prototypes are working, and a number of peer-reviewed papers are out