DESPRAD subproject Alvis Brazma EMBL-EBI Hinxton, October 20, 2003.

1 DESPRAD subproject Alvis Brazma EMBL-EBI Hinxton, October 20, 2003

2 DESPRAD – Development and Establishment of Standards and Prototype Repository for Array Data

3 Participants
- EBI
- UMC Utrecht
- University of Bergen
- RZPD
- Cambridge University
- EMBL Heidelberg
- University of Marseille (CIML)
- University of Madrid (CMB)

4 Three major sets of WPs:
- Developing standards and an international infrastructure for microarray data sharing (WP1 – WP4)
- Establishing a public repository for microarray data, ArrayExpress (WP4 – WP9)
- Research in gene expression data analysis and gene networks (WP9 – WP12)

5 ArrayExpress goals
- Serving as an archival repository for microarray data supporting publications
- Providing the research community with easy access to microarray data in a structured and standardised format
- Facilitating the sharing of microarray designs and protocols

6 ArrayExpress approach
- To collect the information the user needs to understand how to interpret the data
- To represent the information in a structured way, potentially allowing automated analysis and mining
- To work towards a community agreement on representing microarray data in a standard way: founding of the MGED society

7 1. Standards
- Founding the Microarray Gene Expression Data (MGED) society
- Development of the standards
  - MIAME
  - MAGE
  - MGED ontology

8 Sharing microarray data – which data?
[Diagram labels: array scans; spots; quantitations; genes; samples; panels A, B, C, D]

9 Gene expression matrix
[Diagram labels: samples; genes; gene expression levels (problem 2); annotations, i.e. sample annotations and gene annotations (problem 1)]

10 MGED Society
- MGED 1, Hinxton, November 1999
- MGED 2, Heidelberg, May 2000
- MGED 3, Stanford University, April 2001
- MGED 4, Boston, February 2002
- MGED 5, Tokyo, September 2002
- MGED 6, Aix-en-Provence, September 2003
- MGED 7, Toronto, September 2004
The Microarray Gene Expression Data Society is an international organisation for facilitating the sharing of functional genomics and proteomics array data.
Board of directors: EBI, Stanford, UCB, TIGR, Affymetrix, Rosetta,…

11 [Diagram labels: experiment; sample; RNA extract; labelled nucleic acid; hybridisation; array (microarray); array design; protocol; normalization; integration; genes; gene expression data matrix. The sample / RNA extract / labelled nucleic acid / hybridisation / array chain is repeated once per array in the experiment.]

12 The first database model - developed in collaboration with DKFZ in 1999

13 MGED standards - MIAME

14 Nature editorial

15 MGED standards – MAGE-ML

16 The organisations and software supporting MAGE-ML include
- Affymetrix
- Agilent
- Biodiscovery (Imagene5.5)
- BASE (open source project coordinated at Lund)
- Iobion (Gene Traffic)
- Manchester University (MAXDB)
- Molmine (J-Express)
- NCI
- NIEHS
- Rosetta Biosoftware (Rosetta Resolver)
- RZPD
- Sanger Institute LIMS (MIDAS)
- Silicon Genetics (GeneNet)
- Stanford University (SMD)
- TIGR (MADAM)
- UC at Berkeley
- University of Pennsylvania (RAD)
- UMC Utrecht

19 Data in ArrayExpress
[Chart: number of hybs (axis ticks 1000, 2000, 3000) over 2002 to 2004; time points labelled February, April, September, November, September; plotted values 6, ~100, ~250, 1172, ~3000]

20 ArrayExpress content (experiments)
[Chart: by experiment; +1 Drosophila experiment]

21 Submissions by labs (in hybs)

22 Submissions by country (in experiments)

27 Expression Profiler (component interface)
[Screenshot labels: SUBSELECT; CLUSTER; steps 1 and 2]

28 ArrayExpress web-page hits
- 2002: 49 245
- 2003: 274 983 (by 12 September)

29 ArrayExpress components
[Diagram labels: ArrayExpress; MIAMExpress, the online submission tool; Expression Profiler, the online analysis tool; MAGE-ML; internet (www); submissions; queries, analysis; large-scale microarray facilities; smaller labs; export to local analysis tools]

32 MIAMExpress
- Online since December 1, 2002
  - 2002: 15 951 hits
  - 2003: 112 871 hits by 12 September
- So far ~20 submissions have been completed through MIAMExpress, i.e. about 25% of all experiments in ArrayExpress
- MIAMExpress is open source software, installed in at least 15 labs (including EMBL, RZPD, Leipzig, Leuven, Vancouver, VIB)
- Tox-MIAMExpress: a specialised version for toxicology

33 ArrayExpress infrastructure
[Diagram labels: array manufacturers (Affymetrix, Agilent); LIMS (EMBL, TIGR); desktop data analysis software; local databases (RZPD, Stanford); MIAMExpress local installations (Cambridge,…); MAGE-ML pipelines; MIAMExpress (MySQL); ArrayExpress repository (Oracle); query interface (Tomcat); Expression Profiler; www; submissions; queries; access; MAGE-ML retrieval]

34 Submissions by pipeline (in hybs)

35 ArrayExpress development
- Repository (MAGE-OM model): submissions; curation; simple queries (species, author, lab, array types, etc.); hyperlinks to other databases
- Warehouse (simple gene-centric model): curation; more complex queries (genes, expression levels, etc.); database integration; Ensmart
- Links back to the evidence

36 Gene expression data matrix
[Diagram labels: samples; genes; gene expression levels; sample annotations; gene annotations]
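The matrix on this slide can be sketched as a small data structure: a genes-by-samples table of expression levels with separate annotation records for genes and for samples. This is only an illustrative sketch, not the ArrayExpress or MAGE-OM schema; the class name, field names, and annotation keys are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Hypothetical container: rows are genes, columns are samples."""

    genes: list[str]
    samples: list[str]
    levels: list[list[float]]  # levels[i][j] = level of genes[i] in samples[j]
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject matrices whose shape does not match the gene and sample lists.
        if len(self.levels) != len(self.genes):
            raise ValueError("one row of levels is required per gene")
        if any(len(row) != len(self.samples) for row in self.levels):
            raise ValueError("one level is required per sample in every row")

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> list[str]:
        """Return the samples whose annotation `key` equals `value`."""
        return [
            s
            for s in self.samples
            if self.sample_annotations.get(s, {}).get(key) == value
        ]
```

Keeping the annotations apart from the numeric levels mirrors the slide's split between the matrix and its two annotation blocks: a query can select samples by annotation first and only then read the corresponding columns.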

37 ArrayExpress development
- Repository (MAGE-OM model): submissions; curation; simple queries (species, author, lab, array types, etc.); hyperlinks to other databases
- Gene Expression Atlas: curation; summarised information about which gene is expressed where; database integration
- Warehouse (simple gene-centric model): curation; more complex queries (genes, expression levels, etc.); database integration; Ensmart
- Links back to the evidence

38 New in ArrayExpress
- Password protected logins
- Can be used to support anonymous refereeing of microarray papers
- Discussions with Nature

39 Data growth in ArrayExpress?
[Chart: hybs (axis ticks 1000, 2000, 3000, 4000) against year (2002, 2003, 2004)]

40 Distributed data collection
[Diagram labels: ArrayExpress; three national microarray centres; Stanford; TIGR; EMBL; Sanger; several small labs]

41 Data analysis tools
- Expression Profiler: complete redevelopment of the earlier tool, with a new interface, new functionality, and XML-based modularity; the beta version will be ready in month 24
- J-Express (developed in Bergen): talk by Inge Jonassen

42 Research
- Microarray-based gene network analysis: 2 publications out, 1 in print, 1 submitted
- S. pombe gene expression data analysis (in collaboration with the Sanger Institute): publication in preparation
- New algorithms for clustering and cluster comparison: 2 publications in preparation

43 Transcription factor binding network
- Chromatin IP experiments on a chip (ChIP on chip)
  - Using microarrays to find genomic (intergenic) sequences, a few hundred bp long, where a particular transcription factor is likely to bind
- ChIP by Lee et al. (Science 2002): binding site location data in the yeast genome for 107 transcription factors (out of about 250 yeast transcription factors in total)
- Identified around 4500 binding locations

44 ChIP on chip network by Lee et al

45 Gene disruption network
[Diagram labels: A, B, C, D; AA, BB, CC; gene A, gene B, gene C, gene D]

46 Data for over 200 gene disruptions in yeast: Hughes et al., Cell 102 (2000)

47 Mutation network for S. cerevisiae

48 Three networks in yeast
- ChIP network (Lee et al)
- Mutation network (Hughes et al)
- In silico network: matching 38 experimentally known transcription factor binding sites (Pilpel et al) against the yeast genome sequence

49 Intersection of the networks
- Red: 39 arcs present in all networks
- Green: arcs present in at least 2 networks and adjacent to one of SWI4, SWI6 or MBP1
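The two arc selections on this slide can be sketched with plain set operations, treating each network as a set of (source, target) arcs. This is a minimal sketch under that assumption; the function names and the toy arcs in the usage note are invented for illustration and are not data from the three yeast networks.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Arc = Tuple[str, str]  # (source gene, target gene); hypothetical representation


def arcs_in_all(networks: list[Set[Arc]]) -> Set[Arc]:
    """Arcs present in every network (the 'red' arcs on the slide)."""
    if not networks:
        return set()
    return set.intersection(*networks)


def arcs_in_at_least(
    networks: list[Set[Arc]], k: int, hubs: Optional[Iterable[str]] = None
) -> Set[Arc]:
    """Arcs present in at least k networks, optionally only those touching a hub."""
    counts = Counter(arc for net in networks for arc in net)
    selected = {arc for arc, n in counts.items() if n >= k}
    if hubs is not None:
        hub_set = set(hubs)
        # An arc is adjacent to a hub if either endpoint is one of the hubs.
        selected = {a for a in selected if a[0] in hub_set or a[1] in hub_set}
    return selected
```

For example, with toy networks `chip = {("SWI4", "g1"), ("MBP1", "g2"), ("tf9", "g3")}`, `mutation = {("SWI4", "g1"), ("tf9", "g3")}` and `in_silico = {("SWI4", "g1"), ("MBP1", "g2")}`, `arcs_in_all` keeps only `("SWI4", "g1")`, while `arcs_in_at_least(..., 2, hubs={"SWI4", "SWI6", "MBP1"})` keeps the two arcs that touch SWI4 or MBP1.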

50 How do ChIP-chip and disruption networks relate?
[Diagram labels: all genes; transcription factors; disrupted genes; a transcription factor t and the regulation set of t; a disrupted gene h and the effectual set of h]

51 How do ChIP-chip and disruption networks relate?
[Diagram labels: all genes; transcription factors (107); disrupted genes (220); genes in both groups (11); regulation set of g; effectual set of g]
The overlap between the regulation and effectual sets is higher than expected for only 3 of the 11:
- STE12: pheromone response
- GCN4: amino acid/purine starvation
- SWI5: cell cycle
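The slide does not say how "higher than expected" was assessed. One common way to score such an overlap is a hypergeometric tail probability: the chance of seeing at least the observed number of shared genes if the effectual set were drawn at random from all genes. The sketch below assumes that test; the function name and parameters are illustrative, not taken from the talk.

```python
from math import comb


def overlap_p_value(
    n_genes: int, regulation_size: int, effectual_size: int, overlap: int
) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(n_genes, regulation_size, effectual_size).

    Models the effectual set as `effectual_size` genes drawn without
    replacement from `n_genes`, of which `regulation_size` are in the
    regulation set.
    """
    max_overlap = min(regulation_size, effectual_size)
    total = comb(n_genes, effectual_size)
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    tail = sum(
        comb(regulation_size, k)
        * comb(n_genes - regulation_size, effectual_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value for a gene g would indicate that its regulation and effectual sets share more genes than chance alone explains. With 11 genes tested, a correction for multiple testing would normally be applied as well.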

52 Pairs of genes sometimes tend to have highly overlapping effects
[Diagram labels: all genes; transcription factors; disrupted genes; a transcription factor t and the regulation set of t; a disrupted gene h and the effectual set of h]
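One simple way to find pairs of genes with highly overlapping effects is to compare their effectual sets pairwise. The sketch below uses the Jaccard index as the overlap score, which is an assumption: the slide does not name a measure, and the function names, threshold, and toy gene names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def highly_overlapping_pairs(
    effectual_sets: dict[str, set[str]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (gene1, gene2, score) for every pair scoring at least `threshold`."""
    pairs = []
    # Iterate over gene names in sorted order so the output is deterministic.
    for g1, g2 in combinations(sorted(effectual_sets), 2):
        score = jaccard(effectual_sets[g1], effectual_sets[g2])
        if score >= threshold:
            pairs.append((g1, g2, score))
    return pairs
```

For instance, with `{"geneA": {"t1", "t2", "t3"}, "geneB": {"t2", "t3", "t4"}, "geneC": {"t9"}}` and a threshold of 0.5, only the pair (geneA, geneB) is reported, since the two sets share 2 of 4 distinct targets.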

53 How to verify the possible relationships?
- Protein-protein interaction (Y2H, cellzome, etc.)
- MIPS (C. v. Mering "reference set")
- Co-citation network (PubMed)

54 Overlapping target sets can be explained by protein-protein interactions
[Figure legend: known interaction; predicted interaction; target set overlap small; target set overlap large]

56 Predicted relationships
[Figure labels: pheromone response genes; cell cycle genes]

57 WP13 – dissemination
- www.ebi.ac.uk/microarray, www.ebi.ac.uk/arrayexpress, www.mged.org
- Talks at over 40 international conferences and workshops
- At least 5 peer-reviewed publications (counting only those with a major contribution from TEMBLOR)
- Contribution to two EMBO courses on microarray data analysis
- Trainee students at the EBI
- A press release in preparation

60 Summary
- The ArrayExpress public repository is up and running ahead of schedule, the number of submissions is growing, and we already have 'hits' and requests from users (though there is not very much data in it yet)
- The MIAME and MAGE-ML standards are established (and being finalised). Several TEMBLOR partners have implemented these standards and ArrayExpress is getting submissions
- Data analysis tools are under development, prototypes are working, and a number of peer-reviewed papers are out

