The Royal Society, London, May 19-21st, 2010
Mouse models for human disease
Phenotype database interoperability and integration
Damian Smedley, EBI



Why do we need data integration and interoperability?

Centralised vs distributed solutions
–Data categories shown: Genomics, IKMC projects, Phenotype/Expression, Strains
–Resources shown: MGI, Ensembl, KOMP, EUCOMM, NorCOMM, Eurexpress/GXD etc., JaxMice, IMSR, EMMA, Europhenome, TIGM portal
–Architectures compared: Centralised warehouse v1, Centralised warehouse v2, Distributed solution
–Diagram labels: Central database, nightly data syncs, web services

Centralised solutions
Advantages
–Better query performance for large datasets
–Easier to analyse raw data in one location
Disadvantages
–Regular data deposition is non-trivial
–Designing a single schema to store different types of data is not simple
–Persuading people to “give up” their data/databases/websites
–Will still need to be made interoperable with other data sources
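The centralised trade-off can be illustrated with a minimal sketch, which is not taken from the slides. It assumes a single SQLite warehouse that is reloaded from source exports on each sync. The table names, the `nightly_sync` helper and the `MGI:0000001`-style rows are hypothetical placeholders, not real data or any project's actual schema.

```python
import sqlite3

# Hypothetical nightly exports from two source databases (placeholder rows).
GENOMICS_EXPORT = [("MGI:0000001", "GeneA"), ("MGI:0000002", "GeneB")]
PHENOTYPE_EXPORT = [("MGI:0000001", "example phenotype term")]


def nightly_sync(conn, genes, phenotypes):
    """Reload the central warehouse from the latest source exports."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS gene (mgi_id TEXT PRIMARY KEY, symbol TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS phenotype (mgi_id TEXT, term TEXT)"
        )
        # Full reload: every source must deposit its data on each sync.
        conn.execute("DELETE FROM gene")
        conn.execute("DELETE FROM phenotype")
        conn.executemany("INSERT INTO gene VALUES (?, ?)", genes)
        conn.executemany("INSERT INTO phenotype VALUES (?, ?)", phenotypes)


def genes_with_phenotypes(conn):
    """Join both data types locally, which is where a warehouse is fast."""
    rows = conn.execute(
        "SELECT g.mgi_id, g.symbol, p.term "
        "FROM gene g JOIN phenotype p ON p.mgi_id = g.mgi_id "
        "ORDER BY g.mgi_id"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
nightly_sync(conn, GENOMICS_EXPORT, PHENOTYPE_EXPORT)
result = genes_with_phenotypes(conn)
print(result)
```

The join runs in one place, which matches the performance advantage listed above, but every new data type needs a table in the shared schema and a deposition step in the sync.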

Distributed solutions
Advantages
–Domain expertise at production site exploited
–Different types of data easily integrated as long as they share something in common such as a gene identifier
–No need for nightly data flow to keep data up to date
–No need for redundant data in each database
–Easier to persuade people to collaborate in a distributed scenario
Disadvantages
–Technical knowledge required to deploy the web services
–Potential query performance problems for large datasets (may need to provide summary level data)
–Potential problems performing analysis over all datasets
–Problems with services going down
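A rough sketch of the distributed approach follows, assuming each resource exposes a lookup keyed on a shared gene identifier. The service functions, the `federated_lookup` helper and the placeholder records are all hypothetical. Real deployments would wrap web-service calls rather than in-memory dictionaries.

```python
from typing import Callable, Dict, List

# A "service" is any callable that takes a gene identifier and returns records.
Service = Callable[[str], List[dict]]

# Hypothetical in-memory stand-ins for remote resources (placeholder data).
_STRAINS = {"MGI:0000001": [{"strain": "example-strain-1"}]}
_PHENOTYPES = {"MGI:0000001": [{"term": "example phenotype term"}]}


def strain_service(mgi_id: str) -> List[dict]:
    return _STRAINS.get(mgi_id, [])


def phenotype_service(mgi_id: str) -> List[dict]:
    return _PHENOTYPES.get(mgi_id, [])


def broken_service(mgi_id: str) -> List[dict]:
    # Simulates the "services going down" disadvantage.
    raise ConnectionError("service unavailable")


def federated_lookup(mgi_id: str, services: Dict[str, Service]) -> dict:
    """Query every service with the shared identifier and merge the answers."""
    merged: dict = {"mgi_id": mgi_id, "unavailable": []}
    for name, service in services.items():
        try:
            merged[name] = service(mgi_id)
        except ConnectionError:
            # Record the outage instead of failing the whole query.
            merged["unavailable"].append(name)
    return merged


report = federated_lookup(
    "MGI:0000001",
    {
        "strains": strain_service,
        "phenotypes": phenotype_service,
        "expression": broken_service,
    },
)
print(report)
```

No data is copied between sites, so nothing needs syncing. The cost is that each query depends on every service being reachable, so the sketch reports outages rather than hiding them.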

1000 Genomes - centralisation

International Cancer Genome Consortium
–Canada: Pancreas
–Australia: Pancreas
–China: Stomach
–Japan: Liver (virus related)
–France: Liver (alcohol-related), Breast (HER2+ve)
–UK: Breast (several subtypes)
–Spain: CLL
–India: Oral Cavity

ICGC - distributed

Joint Ensembl and EurExpress query

IKMC portal: knockoutmouse.org
–Resources shown: GXD, Eurexpress, NorCOMM, EUCOMM, KOMP, TIGM, EMMA, KOMP rep, CMMR, IMSR, Ensembl, CREATE, Europhenome

IKMC interoperability strategy
–IKMC (Sanger, UK)
–ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada)
–Phenotype (EuroPhenome etc.) (Harwell, UK)
–MGI (JAX, USA)
–EURExpress (Edinburgh, UK)
–Ensembl (Sanger, UK)
–GXD (JAX, USA)
–CREATE (EBI, UK)
–BioMart query interface(s)
–Link label: MGI ID


Europhenome: raw and summary data

Possible strategy for phenotype data
–BioMart query interface(s)
–IKMC (Sanger, UK)
–ES cells + lines: EMMA (UK), KOMP (USA), CMMR (Canada)
–MGI (JAX, USA)
–EURExpress (Edinburgh, UK)
–Ensembl (Sanger, UK)
–GXD (JAX, USA)
–CREATE (EBI, UK)
–Central database
–High throughput phenotyping centres
–Presentation of raw results
–Analysis to assign phenotypes to genes
–High throughput phenotyping
–Link label: MGI ID

Linking from IKMC portal
–Phenotyping
–Phenotype searches

Linking from IKMC portal


Acknowledgements
–The whole CASIMIR consortium and in particular: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock: MouseFinder tool
–MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
–BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora