CASIMIR Networking Meeting, Heathrow, July 2007
CASIMIR WP4: Data Representation
John Hancock, Duncan Davidson

Objectives
– Assessment of technical aspects of database interoperability as a barrier to scientific and financial sustainability
– Assessment of the variability of practice in the semantics of biological data representation, e.g. genotype, gene expression
– Assessment of emerging standards and current practice for data representation, annotation and ontologies

– 4.1: D9, classified list of data representations in European mouse-centric and related databases
– 4.4: Network meeting 1 (June-Sep 07), bringing together bioinformatics representatives from (EU-funded) mouse projects to discuss data representation
– 4.4: Joint work package meeting to discuss results (4-5 Oct 07)
– 4.5: Report of network meeting (Sep-Dec 07)
– 4.6: Present conclusions at meetings

Discussion Points
– What do we understand by "data representation"? Is it just controlled vocabularies (CVs) and ontologies?
– Interaction with other work packages
– What kinds of data?
– What ontologies? How many on the PRIME list do you use? Do you use others? Do you use OBO ontologies by default?
– What processes are projects involved in elsewhere to discuss or unify data representation?

Future: Cross-Species Interactions
– Mouse-Human must be a priority because of the disease angle
– Mouse-Rat is already quite well integrated (to what extent?) because of MGI-RGD-OBO interactions
– Other important models:
   – Chick (ChickEST (UK), ChickVD (CN), Ensembl, others?)
   – Xenopus
   – Zebrafish
   – Drosophila
   – C. elegans
   – Yeast, E. coli
– In the longer term, meet community representatives to discuss similarities and differences

Extant Resources
– PRIME Expert Group report and outcomes
– Euromouse
– Interphenome discussion group and pilots
– EUMORPHIA/EUMODIC bioinformaticians

PRIME Expert Group
Draft lists of:
– Databases
– Ontologies

Interphenome
Phenotype data:
– Common data description
– Common protocol description
– Standard for data exchange

Interphenome: Current Status
Ontologies
– Investigate cross-mapping of current approaches and possible eventual convergence (?)
Protocols
– Develop a format that can accommodate all the information needed to describe a protocol
– Encode this as an XML schema
– PPML?
Data Exchange
– Develop an XML schema that allows structured exchange of phenotype data and metadata; work on this has started in EUMODIC
Publication: "Integration of Mouse Phenome Data Resources", The Mouse Phenotype Database Integration Consortium, Mammalian Genome 18, 157-163 (March 2007)

WP4: First Actions
– Update the PRIME list of European mouse projects
– Also identify "mouse-related" projects
– Identify contacts
– To hold a meaningful dialogue, bring as many of them as possible to a networking meeting

Ontologies: So Far
– We have a little list
– Test how many of these are actually in use (questionnaire)
– Check how up to date it is, and track developments (e.g. Relationships Ontology, potential Synapse Ontology)

The CASIMIR Questionnaire
1a. Are you using a relational database, an object database or flat files?
1b. If relational, what is your chosen RDBMS (Relational Database Management System)?
2a. Does your database provide external links to other online resources, e.g. via URL/HTTP? If yes, please name them.
2b. Does your database support or provide web services? If yes, please name them. Do you plan to install or develop web services in the near future?

The CASIMIR Questionnaire
3a. Please list the sorts of data entities you store (e.g. protein sequence data, mouse strain information).
4a. Can you provide a brief explanatory description or schema of your data and data structure?
4b. Are you willing to provide an entity relationship diagram, and would you be willing to provide it under an open source license?

The CASIMIR Questionnaire
5a. Are you currently using, or do you intend to use, any ontologies or controlled vocabularies to describe your data?
5b. Do you plan to expand your use of ontologies in future?
5c. Do you use OBO ontologies?
5d. Do you perceive the need for additional ontologies to serve your domain of knowledge?

The CASIMIR Questionnaire
6. Do you make use of Minimum Information standards (such as MIAME for microarray experiments) to describe any data? If so, which ones? If you do not make use of these standards, are you likely to do so in future?

Minimum Standards
– MIAME: Brazma et al. (2001) Nat. Genet. 29, 365-371

The CASIMIR Questionnaire
7. What do you perceive as the main limiting factor in data representation and interoperability in European bioinformatics databases?
8. Do you have any comments or thoughts on standards for data representation that need to be developed, or that you would like discussed in CASIMIR?

The CASIMIR Questionnaire
Please fill it in as soon as humanly possible! We will be chasing up database coordinators over the next few months to make sure we gather as much information as possible.

Agenda for Today
Reports from some databases:
– MUGEN: Christina Chandras
– EMMA: Glenn Proctor
– EUMODIC: Niels Adams
– EUCLIS: Eduardo Mendoza
Discussion, e.g.:
– Comments on the questionnaire and CASIMIR's aims
– How to get the widest possible participation
– What do people see as the main obstacles to integrating all this data?

Mouse to Human
[Diagram: Human and Mouse, with labels DISEASE, Phenotypic Attributes, PHENOTYPING, Phenotypic Measures]

