European Bioinformatics Institute MGED Society Establishing the infrastructure for sharing microarray data Alvis Brazma European Bioinformatics Institute.

1 Establishing the infrastructure for sharing microarray data
Alvis Brazma, European Bioinformatics Institute (EMBL-EBI); Microarray Gene Expression Data (MGED) Society

2 Outline
- Establishing the infrastructure for sharing microarray data: MGED, MIAME, MAGE-ML, databases
- Microarray informatics at the EBI

3 Microarrays - a tool for the golden age of genome discoveries

4 Some questions for the golden age of genomics
- How does gene expression differ between cell types?
- How does gene expression change as the organism develops and cells differentiate?
- How does gene expression differ between a normal and a diseased (e.g., cancerous) cell?
- How does gene expression change when a cell is treated with a drug?
- How is gene expression regulated: which genes regulate which, and how?

5 Potential amounts of microarray data
- Experiments:
  - ~30 000 genes in a human genome
  - ~320 cell types in a human organism
  - 2000 compounds for screening
  - 2 concentrations
  - 3 time points
  - 5 replicates
- Data: ~10^12 data points, roughly 1 terabyte
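The estimate on this slide can be checked with a few lines of arithmetic. This is only a sketch: the factors are the rough figures from the slide, and the one-byte-per-data-point storage cost is an assumption made here so that the data-point count can be compared with the terabyte figure.

```python
from math import prod

# Factors as listed on the slide (all are rough estimates).
factors = {
    "genes": 30_000,
    "cell_types": 320,
    "compounds": 2_000,
    "concentrations": 2,
    "time_points": 3,
    "replicates": 5,
}

data_points = prod(factors.values())

# Assumption (not from the slide): about one byte per data point.
terabytes = data_points / 1e12

print(f"{data_points:.2e} data points, ~{terabytes:.2f} TB at 1 byte each")
```

The product is a little under 10^12, which matches the order of magnitude quoted on the slide.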

6 Making microarray data available to the public
- Authors' web sites
- Local, lab-based public databases (Stanford University, Whitehead, …)
- Journal web sites
- There is wide community consensus on the need for public repositories for microarray data, analogous to DDBJ/EMBL/GenBank for sequence data

7 Which data to share?
(Diagram labels: raw data, array scans; spots, quantitations, quantitation matrices; genes, samples, gene expression levels, gene expression data matrix)

8 Gene expression matrix and annotations
(Diagram labels: samples; genes; gene expression levels; problem 2; sample annotations; problem 1; gene annotations)
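The gene expression matrix with its two kinds of annotation could be held in a structure like the one below. This is a minimal sketch, not a format proposed in the talk: the class name, field names, gene and sample identifiers, and annotation keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix plus free-form annotations (hypothetical layout)."""
    genes: List[str]
    samples: List[str]
    levels: List[List[float]]  # levels[i][j]: gene i measured in sample j
    gene_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)
    sample_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]


# Made-up example data.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["sample1", "sample2", "sample3"],
    levels=[[0.1, 1.8, -0.4], [2.2, 0.0, -1.5]],
    gene_annotations={"geneA": {"function": "transcription factor"}},
    sample_annotations={"sample2": {"cell_type": "liver", "treatment": "drug"}},
)

print(matrix.level("geneB", "sample3"))  # -1.5
```

Keeping the annotations next to the numbers is the point of the slide: the matrix alone cannot be interpreted without knowing what each row and column is.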

9 (Diagram of a microarray experiment. Labels: source, sample, treatment, RNA extract, labelled nucleic acid, hybridisation, array, design, elements (spots), protocols, image, quantitation matrix, sample annotation, gene annotation)

10 (Diagram: an experiment made up of five hybridisations, each labelled with sample, RNA extract, labelled nucleic acid, hybridisation, array, design and elements (spots). Further labels: experiment, gene expression measurements, transformation, integration, gene expression data matrix)

11 Problem 4
- The gene expression data and annotations described above are complex in nature and structure
- For public repositories to make maximum use of these data, standards for representing and communicating them should be established

12 Standards for microarray data
- Agreement on what data and annotations should be provided
- Standard controlled vocabularies (ontologies) that can be used in such annotations
- A standard format for exchanging annotated data
- An understanding of how to compare different datasets

13 The Microarray Gene Expression Database meeting was organised in Cambridge, UK, in November 1999 to discuss these problems

14 MGED 1: some participants
Affymetrix, DDBJ, DKFZ, EMBL, Gene Logic, Incyte, Max Planck Institute, NCGR, NHGRI, Sanger Centre, Stanford University, University of Pennsylvania, University of Washington (Seattle), Whitehead Institute

15 MGED working groups
- Experiment annotation
- Data exchange format and modelling
- Ontologies
- Data normalisation and transformations
- Queries

16 MGED meetings
- MGED 2, Heidelberg, May 2000
- MGED 3, Stanford University, April 2001
- MGED 4, Boston, February 2002
- MGED 5, Tokyo, September 2002

17 The MGED Society was founded in June 2002
The Microarray Gene Expression Data (MGED) Society is an international organisation for facilitating the sharing of functional genomics and proteomics array data
Board of 17 directors
www.mged.org

18 MGED standards
- Annotation content: MIAME
- Data representation and exchange format: MAGE-OM (MAGE-ML), jointly with OMG

19 MIAME: Minimum Information About a Microarray Experiment
An attempt to outline the minimum information required to interpret unambiguously, and potentially reproduce and verify, an array-based gene expression experiment
www.mged.org/miame

20 MGED standards

21 MIAME: the content (annotation) of all boxes and lines should be given
(Diagram: an experiment made up of five hybridisations, each labelled with sample, RNA extract, labelled nucleic acid, hybridisation, array, design and elements (spots). Further labels: experiment, normalization, integration, gene expression data matrix)

22 MIAME ‘checklist’ for authors and reviewers
- Experimental design
- Samples used, RNA extraction and labelling
- Hybridisation
- Measurement data and specifications
  - (Raw images)
  - Image quantitation (data and specification)
  - Gene expression data matrix (data and transformations)
- Array design
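A reviewer or a submission tool could apply the checklist mechanically. The sketch below only illustrates that idea: the section names follow the slide, but the dictionary keys, the `missing_sections` helper and the example submission are hypothetical and are not part of MIAME itself.

```python
# Section names follow the checklist on the slide; the dictionary keys
# themselves are hypothetical and not part of any MIAME specification.
MIAME_SECTIONS = [
    "experimental_design",
    "samples_extraction_labelling",
    "hybridisation",
    "measurement_data",
    "array_design",
]


def missing_sections(submission):
    """Return the checklist sections that are absent or empty in a submission."""
    return [section for section in MIAME_SECTIONS if not submission.get(section)]


# Made-up submission: hybridisation is empty and array design is absent.
submission = {
    "experimental_design": "time course, 3 time points",
    "samples_extraction_labelling": "yeast culture, total RNA, Cy3/Cy5",
    "hybridisation": "",
    "measurement_data": {"quantitation": "spots.txt", "matrix": "matrix.txt"},
}

print(missing_sections(submission))  # ['hybridisation', 'array_design']
```

A check like this only tests presence, not quality; whether the supplied annotation is sufficient still needs a human reader.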

23 MIAME ‘checklist’
- An open letter was sent to the journals last week: all the information in the MIAME ‘checklist’ should be made available as a requirement for accepting publications
- The Lancet has indicated that it will adopt the MIAME checklist as a requirement
- Nature will adjust its policy in line with the MIAME recommendations

24 A need for a supporting infrastructure
- MIAME itself will not solve the problem
- A standard format is needed for representing and exchanging this information

25 MGED standards 2
- Data exchange format: MicroArray Gene Expression Markup Language (MAGE-ML), an XML-based file format able to capture all MIAME-required information
- Based on the object model MAGE-OM (Paul Spellman, Michael Miller, Jason Stewart, Ugis Sarkans, …)
- Adopted by OMG as a standard for microarrays
www.mged.org/mage
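To give a feel for what an XML-based exchange format means in practice, here is a small sketch that writes an annotated experiment record as XML and reads it back. The element and attribute names, identifiers and values are invented for illustration; they do not follow the MAGE-ML DTD, which should be consulted for the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, for illustration only;
# they do NOT follow the real MAGE-ML DTD.
root = ET.Element("ExperimentRecord", {"id": "E-EXAMPLE-1"})
design = ET.SubElement(root, "ExperimentalDesign")
design.text = "drug treatment time course"
sample = ET.SubElement(root, "Sample", {"organism": "Saccharomyces cerevisiae"})
ET.SubElement(sample, "Treatment").text = "compound X, 2 h"
ET.SubElement(root, "ArrayDesign", {"ref": "A-EXAMPLE-1"})

# Serialise to text, as a submitting lab might do.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# A receiving database can parse the same text back into a tree.
parsed = ET.fromstring(xml_text)
print(parsed.find("Sample").get("organism"))
```

The value of an agreed format is that the writing and reading sides can be developed independently, which is what lets LIMS, analysis tools and repositories exchange data.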

26 UML packages of MAGE
(Diagram labels: Treatment, Transformation, BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, Array, QuantitationType, Description, Protocol, Measurement, AuditAndSecurity, BQS)

27 MAGE – an example diagram

28 Use case of MAGE: ArrayExpress architecture
(Diagram components: ArrayExpress (Oracle), browser, MIAMExpress, MAGE-ML (DTD), MAGE-OM, MAGE-ML (doc), data loader, Velocity template engine, Castor object/relational mapping, web page templates, Java servlets, Tomcat)

29 MGED standards 3
- MGED ontologies: organism part, cell type, disease state, genotype, chemical compounds (Chris Stoeckert, Helen Parkinson, Susanna Sansone, …)
- Symposium “Standards and Ontologies for Functional Genomics”, November 17-20, Cambridge, UK
www.mged.org/ontology

30 MGED standards 4
- Data transformation and normalisation (Cathy Ball, John Quackenbush, Gavin Sherlock, …)
www.mged.org/normalization

31 Infrastructure for sharing microarray data
- Standard for experiment annotation
- Standard for data exchange
- Public repositories
- Local databases and LIMS
- Ways of comparing the data

32 ArrayExpress: a MIAME/MAGE supportive public repository for microarray data at EBI
(Diagram labels: ArrayExpress, MIAMExpress, Expression Profiler, MAGE-ML, Internet, www, submissions, queries, analysis)

33 Microarray data sharing infrastructure
(Diagram labels: public repositories; MAGE-ML; www; data queries, retrieval, and analysis; data submissions; array descriptions (from manufacturers); data analysis software; MIAMExpress local installations; LIMS; html; other databases)

34 MIAME/MAGE supportive software
- Sanger Institute LIMS (MIDAS)
- TIGR LIMS
- Gene Traffic (Iobion)
- Affymetrix
- MAXDB (Manchester)
- Rosetta Resolver (Rosetta Biosoftware)
- Base (Lund)
- J-Express (Molmine)
- MIAMExpress (EBI)
- ArrayExpress (EBI)

35 Acknowledgements
- MGED board: Cathy Ball (Stanford), Helen Causton (Imperial College), Terry Gaasterland (Rockefeller), Jason Gonzales (Iobion), Pascal Hingamp (Marseille), Barbara Jasny (Science), Helen Parkinson (EBI), John Quackenbush (TIGR), Martin Ringwald (Jackson), Gavin Sherlock (Stanford), Paul Spellman (Berkeley), Jason Stewart (Open Inf), Chris Stoeckert (University of Pennsylvania), Yoshio Tateno (DDBJ), Ron Taylor (Colorado), Charles Troup (Agilent)
- MGED supporters: Rob Andrews (Sanger), Wilhelm Ansorge (EMBL), Mike Cherry (Stanford), Peter Dansky (Affymetrix), David Hancock (Manchester), Frank Holstege (Utrecht), Michael Miller (Rosetta), Kate Rice (Sanger), Christian Schwager (EMBL), Joe White (TIGR), Rick Young (MIT)
- EBI Microarray Team: Niran Abeygunawardena, Helen Parkinson, Philippe Rocca-Serra, Susanna Sansone, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo

36 Microarray informatics at the EBI
- ArrayExpress (Helen Parkinson)
- Expression Profiler data analysis tool and promoter analysis (Jaak Vilo)
- Reconstructing and analysing gene networks

37 Gene networks as graphs: nodes are genes, arcs are relationships

38 Different ways to build a gene network
- G1 → G2: the product of gene G1 is a transcription factor that binds to the promoter of gene G2 (physical interaction network)
- G1 → G2: the disruption of gene G1 changes the expression level of gene G2 (data interpretation network)
- G1 → G2: gene G2 is mentioned in a paper about gene G1 (literature network)

39 Data for over 200 gene disruptions in Yeast Hughes et al, Cell, 102 (2000)

40 Discretization of the data
The normalized expression log(ratios) X are discretized using different thresholds γ = 2, 2.1, …, 4:
  X < −γ  ⇒  d(X) = −1
  −γ ≤ X ≤ γ  ⇒  d(X) = 0
  X > γ  ⇒  d(X) = 1
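The discretization step, and the way it turns disruption data into a network, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the threshold is called `gamma` here, the input layout (one expression profile per disrupted gene), the toy values, and the choice to skip the disrupted gene's own measurement are all assumptions of the sketch.

```python
def discretize(x, gamma):
    """Map a normalized log ratio to -1, 0 or +1 using a symmetric threshold."""
    if x < -gamma:
        return -1
    if x > gamma:
        return 1
    return 0


def disruption_network(log_ratios, gamma):
    """Build signed arcs (disrupted gene -> affected gene) from log ratios.

    log_ratios maps each disrupted gene to {gene: normalized log ratio}.
    Assumption of this sketch: the disrupted gene itself is skipped.
    """
    arcs = {}
    for disrupted, profile in log_ratios.items():
        for gene, x in profile.items():
            if gene == disrupted:
                continue
            d = discretize(x, gamma)
            if d != 0:
                arcs[(disrupted, gene)] = d
    return arcs


# Made-up values for two disruption mutants.
toy = {
    "A": {"A": -5.0, "B": 2.6, "C": -0.3, "D": -3.1},
    "B": {"A": 0.2, "B": -4.0, "C": 2.0, "D": 0.1},
}

print(disruption_network(toy, gamma=2))
# {('A', 'B'): 1, ('A', 'D'): -1}
```

Raising `gamma` keeps only the strongest effects, so the network becomes sparser; a value exactly at the threshold, such as gene C in mutant B, is treated as unchanged.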

41 Gene disruption network
(Diagram: genes A, B, C and D; disruption mutants ΔA, ΔB, ΔC)

42 Data for over 200 gene disruptions in Yeast Hughes et al, Cell, 102 (2000)

43 Mutation network for S. cerevisiae

44 Mutation network, γ = 2, filtered for the genes marked in red (mating)
Thomas Schlitt, Johan Rung

45 Comparison to literature network derived from YPD
Result: the overlap between the calculated networks and the YPD graph is always larger than the overlap between randomised networks and the YPD graph
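A comparison of this kind can be sketched by counting shared arcs and repeating the count for randomised versions of the calculated network. The slide does not say how the networks were randomised, so the target-shuffling scheme below is an assumption chosen for simplicity, and the arc sets are made up.

```python
import random


def overlap(arcs_a, arcs_b):
    """Number of directed arcs present in both networks."""
    return len(set(arcs_a) & set(arcs_b))


def randomise(arcs, rng):
    """Shuffle arc targets among the sources (a simple scheme assumed here).

    Duplicate arcs produced by the shuffle collapse, so the result may
    contain fewer arcs than the input.
    """
    ordered = sorted(arcs)
    sources = [s for s, _ in ordered]
    targets = [t for _, t in ordered]
    rng.shuffle(targets)
    return set(zip(sources, targets))


# Made-up arc sets standing in for a calculated and a literature network.
calculated = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D")}
literature = {("A", "B"), ("C", "D"), ("D", "A")}

rng = random.Random(0)
random_overlaps = [overlap(randomise(calculated, rng), literature)
                   for _ in range(1000)]

print("calculated vs literature:", overlap(calculated, literature))
print("mean randomised overlap:", sum(random_overlaps) / len(random_overlaps))
```

With real data the interesting quantity is how far the observed overlap lies above the distribution of randomised overlaps, not the raw count.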

46 Network modularity
- Is there one “big” dominant connected component and possibly a number of small components, or several components of comparable sizes?
- Can the network be broken down into several components of comparable size by removing nodes of high degree (i.e., nodes with many incoming or outgoing edges)?
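Both questions can be explored by computing connected components before and after removing the highest-degree nodes. The sketch below is one way to do that, under assumptions of its own: arc direction is ignored when forming components, degree counts incoming plus outgoing arcs, ties are broken by node name, and the toy network is made up.

```python
from collections import defaultdict


def component_sizes(nodes, arcs):
    """Sizes of connected components (arc direction ignored), largest first."""
    adjacent = defaultdict(set)
    for a, b in arcs:
        if a in nodes and b in nodes:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, sizes = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            current = stack.pop()
            size += 1
            for neighbour in adjacent[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def without_hubs(nodes, arcs, fraction):
    """Return the node set after dropping the given fraction of highest-degree nodes."""
    degree = {n: 0 for n in nodes}
    for a, b in arcs:
        degree[a] += 1
        degree[b] += 1
    k = int(len(nodes) * fraction)
    hubs = sorted(nodes, key=lambda n: (-degree[n], n))[:k]
    return set(nodes) - set(hubs)


# Made-up network: node "h" is a hub holding most of it together.
nodes = {"h", "a", "b", "c", "d", "e", "f", "g", "i", "j"}
arcs = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("h", "f"),
        ("a", "b"), ("d", "e"), ("f", "g"), ("i", "j")]

print(component_sizes(nodes, arcs))                            # [8, 2]
print(component_sizes(without_hubs(nodes, arcs, 0.10), arcs))  # [2, 2, 2, 2, 1]
```

In the toy case removing a single hub shatters the large component; whether a real network behaves this way is exactly the question the slide raises.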

50 Number of connected components in the networks
Each cell lists the values extracted for the rows largest, second, total.

γ     full network   1% removed   5% removed   10% removed
2.0   5383 1         4707 1       3682 2       2614 5 2
3.0   3556 2         2461 2       1385 4 9     764 6 17
4.0   2354 3 4       1205 3 7     542 6 22     45 28 51

51 Other opinions
- Wagner, 2002 (Genome Res): there exist many independent modules
- Featherstone, 2002 (Bioessays): there is only one giant module
- It all depends on the definition of a ‘module’

52 Disruption network properties
- In- and out-degrees of genes are distributed according to a power law
- There are no obvious modules in this particular network
- ‘Local’ networks make sense
(J. Rung, T. Schlitt et al., to appear in the ECCB special issue of Bioinformatics)
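The in- and out-degree distributions mentioned here are straightforward to tabulate from a list of arcs. The sketch below only counts degrees on a made-up arc list; it does not test for a power law, and genes with degree zero are not represented in the histograms.

```python
from collections import Counter


def degree_histograms(arcs):
    """Return (in-degree histogram, out-degree histogram).

    Each histogram maps a degree to the number of genes with that degree.
    Genes with degree zero do not appear.
    """
    out_degree = Counter(source for source, _ in arcs)
    in_degree = Counter(target for _, target in arcs)
    return Counter(in_degree.values()), Counter(out_degree.values())


# Made-up arcs: A affects three genes, B affects one.
arcs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

in_hist, out_hist = degree_histograms(arcs)
print("in-degree:", dict(in_hist))
print("out-degree:", dict(out_hist))
```

For a power-law check one would plot these histograms on log-log axes over the full network and look for an approximately straight line.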

53 Microarray Informatics at the EBI
Gaurab Mukherjee, Alvis Brazma, Gonzalo Garcia Lara, Ugis Sarkans, Koichi Tazaki, Ahmet Ociamen, Helen Parkinson, Mohammadreza Shojatalab, Thomas Schlitt, Katja Kivinen, Misha Kapushesky, Ele Holloway, Nastja Samsonova, Philippe Rocca-Serra, Johan Rung, Niran Abeygunawardena, Susanna Sansone, Jaak Vilo

