EnVisioning Data Integration
SME forum 2009, Vienna
Henning Hermjakob
Enfin Experiment Model EnCore
Use cases
1. Target user group: bioinformaticians, programmatic access
2. Simple
   1. Set of “interesting” Affymetrix IDs
   2. Get the relevant UniProt accession numbers
   3. Get the surrounding interaction networks from IntAct
3. A bit more
   1. Set of differentially expressed proteins in PRIDE
      - Find experiments with “similar” set of regulated genes
      - Get Reactome pathways
      - Expand protein set by IntAct, then get Reactome pathways
4. Even more: EnVision
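The "simple" use case above is a chain of two lookups. A minimal sketch of that chain follows, assuming in-memory dictionaries in place of the real mapping and interaction services. All identifiers and function names here are hypothetical placeholders, not real Affymetrix, UniProt or IntAct data, and not the EnCORE API.

```python
from typing import Dict, Iterable, List, Set

# Placeholder lookup tables (assumption): stand-ins for the probe-to-protein
# mapping service and the protein interaction service.
PROBE_TO_PROTEIN: Dict[str, Set[str]] = {
    "probe_1": {"prot_A"},
    "probe_2": {"prot_B", "prot_C"},
}
INTERACTIONS: Dict[str, Set[str]] = {
    "prot_A": {"prot_B", "prot_X"},
    "prot_B": {"prot_A"},
    "prot_C": set(),
}


def map_probes_to_proteins(probe_ids: Iterable[str]) -> Set[str]:
    """Step 2: collect protein accessions for the given probe set IDs."""
    proteins: Set[str] = set()
    for probe in probe_ids:
        # Unknown probes simply contribute nothing.
        proteins |= PROBE_TO_PROTEIN.get(probe, set())
    return proteins


def expand_by_interactions(proteins: Set[str]) -> Set[str]:
    """Step 3: add the direct interaction partners of each protein."""
    network = set(proteins)
    for protein in proteins:
        network |= INTERACTIONS.get(protein, set())
    return network


def simple_use_case(probe_ids: Iterable[str]) -> List[str]:
    """Run steps 1 to 3 and return the resulting protein set, sorted."""
    return sorted(expand_by_interactions(map_probes_to_proteins(probe_ids)))
```

In a real client each function would wrap a call to the corresponding service; the chaining logic stays the same.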
Red edges: Bouwmeester et al., 2005
Green edges: Rual et al., 2005
Violet edges: Stelzl et al., 2005
Infrastructure
Shallow integration
- easy addition of resources
- independent resources
- minimal centralisation
- easier to maintain
- very flexible
Common Service Interface
- established standards
- well defined schema
Diverse web service world
[Diagram: external services, database access, analysis tools]
Technologies: SOAP, XML, REST, CSV, plain text, Perl API, Java API
- Multiple manual connections with possibly multiple technologies
- Multiple result files which have to be combined manually
- Difficult to keep audit trail
- Much work to reproduce
EnCORE
[Diagram: heterogeneous external world (external services); standardised EnCORE world (EnCORE services exchanging Enfin XML); EnVISION: user interface & representation]
- Single entry point
- One technology
- No manual combination of results
- Audit trail built in
- Visualisation built in
- Easy to reproduce
ENFIN XML
enXml: the EnCORE data exchange format
- XML schema
- standard interface to services
- simple and easy to understand structure
- generic to allow various data types
- stores service results and keeps an audit trail
- minimal restrictions for data representation
- high degree of freedom modelling user data
- need for modelling guidelines to ensure service interoperability
enXml document graph
[Diagram labels: EnsMart, IntAct; start, toUniProt, ppiExpand; s2, s12, s26, s27, s28, s29; Molecules, Sets, Experiments; 1993_s_at, BRCA1, BRAP, Q5ST83, H2AFX; legend: Source relation]
Existing EnCore web services
- AffyMetrix: probe set ID to protein ID mapping
- ArrayExpress: micro array data
- BioModels: search for biological models
- CellMINT: protein localization information
- g:GOSt: protein grouping, functional profiling
- IntAct: protein interactions
- KEGG pathway: pathway search
- PICR: Protein Identifier Cross Reference
- PRIDE: protein identification
- Reactome: pathway search
- UniProt: protein information retrieval
- Utility: generation of ENFIN XML from protein IDs
Synchronous communication
- doService performs service with standard parameters
- doServiceAdv performs service with custom parameters
- doServiceTest only echoes the input
[Diagram: client calls service with ENFIN XML; service returns ENFIN XML]
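The three synchronous operations can be illustrated with a small in-memory stub. This is a sketch only: the operation names come from the slide, but the class `StubService`, its default parameters, and the use of a plain dictionary in place of an ENFIN XML document are assumptions made for illustration, not the real EnCORE interface.

```python
import copy
from typing import Any, Dict

# Assumption: a plain dict stands in for an ENFIN XML document.
Document = Dict[str, Any]


class StubService:
    """Hypothetical in-memory service offering the three synchronous calls."""

    name = "stub-service"
    default_params = {"limit": 10}  # illustrative default, not from the source

    def doService(self, doc: Document) -> Document:
        """Run the service with its standard parameters."""
        return self.doServiceAdv(doc, self.default_params)

    def doServiceAdv(self, doc: Document, params: Dict[str, Any]) -> Document:
        """Run the service with caller-supplied parameters."""
        result = copy.deepcopy(doc)  # never modify the caller's document
        # Record what was done, mimicking the audit trail kept in the document.
        result.setdefault("audit", []).append(
            {"service": self.name, "params": dict(params)}
        )
        return result

    def doServiceTest(self, doc: Document) -> Document:
        """Echo the input unchanged, useful for connectivity checks."""
        return copy.deepcopy(doc)
```

Because every call takes a document and returns a document, the output of one service can be passed directly to the next.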
Domaination
- protein domain prediction tool
- analysis tool, not only data retrieval service
- possible long run times
- sync communication inadequate
- initiator for async communication model
Asynchronous web services
- doServiceAsync submits service with standard parameters & returns job ticket
- getStatus reports the status of the job with specified ticket
- retrieveResult returns the result of job with specified ticket
[Diagram: client submits ENFIN XML and receives a ticket number; client loops on status; client retrieves the ENFIN XML result if status OK]
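The submit, loop, retrieve pattern above can be sketched as a small polling client. The operation names and the "OK" status come from the slide; the stub class, the "RUNNING" status string, the integer tickets and the `run_async` helper are assumptions for illustration, not the real service behaviour.

```python
import itertools
import time
from typing import Dict


class StubAsyncService:
    """Hypothetical in-memory service; a job finishes after a few status polls."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._tickets = itertools.count(1)
        self._jobs: Dict[int, dict] = {}
        self._polls_until_done = polls_until_done

    def doServiceAsync(self, doc: str) -> int:
        """Submit a job and return its ticket."""
        ticket = next(self._tickets)
        self._jobs[ticket] = {"doc": doc, "polls_left": self._polls_until_done}
        return ticket

    def getStatus(self, ticket: int) -> str:
        """Report job status; each poll moves the simulated job forward."""
        job = self._jobs[ticket]
        if job["polls_left"] > 0:
            job["polls_left"] -= 1
            return "RUNNING"  # assumed status name
        return "OK"

    def retrieveResult(self, ticket: int) -> str:
        """Return the result of a finished job."""
        job = self._jobs[ticket]
        if job["polls_left"] > 0:
            raise RuntimeError("job not finished")
        return job["doc"] + " [processed]"


def run_async(service, doc: str, poll_interval: float = 0.0, max_polls: int = 100) -> str:
    """Submit, poll until the status is OK, then retrieve the result."""
    ticket = service.doServiceAsync(doc)
    for _ in range(max_polls):
        if service.getStatus(ticket) == "OK":
            return service.retrieveResult(ticket)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {ticket} did not finish after {max_polls} polls")
```

A real client would use a poll interval of seconds or more; the upper bound on polls keeps the loop from running forever if a job never completes.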
EnCore use
- Primarily designed as framework for bioinformaticians
- Write your own client to access one or multiple services (example clients available in different programming languages)
- Very flexible access, can be tailored to your specific needs
- Full control over the client and its functionality
- Create your own services to extend the functionality of EnCORE
- Semi-automatic WSDL wrapper generation for services
- Workflow control with Taverna (Prototype)
EnVision: Application of EnCore in a semi-fixed data flow
- Easier to demonstrate functionality than by showing a bunch of WSDLs
- Production application for the analysis of (proteomics) datasets
- Source for biologist feedback
- EnVision(1): Technically oriented demonstrator, access to XML configuration files, XSLT output generation
- EnVision2: “Friendly” end user application (beta version)
Protein Identifier Space Translation
- PICR translates between ca. 20 protein identifier spaces
- Based on sequence identity
- Shows all known sequence-identifier associations, both historic and current
- Based on UniParc archive of 18 million public protein sequences
- Interactive use and computational access (web service, REST)
Côté RG, et al.: The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinformatics Oct 18;8:401.
Protein Identifier Space Translation
Ontology Lookup Service
Côté RG, Jones P, Martens L, Apweiler R, Hermjakob H.: The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res May 8.
Tying databases together: DAS
[Diagram: DAS Infrastructure: User, DAS Registry, DAS Proxy, DAS Servers]
- Lightweight integration
Acknowledgements
EU FP6 LSHG-CT
Pascal Kahlem
?
Examples of data modelled in enXml
- Enfin IntAct service: find interaction partners (enfin-intact; ID2, ID56)
- Enfin Reactome service: find pathways from protein list (enfin-reactome; ID8, ID13, ID14; true)
EnVision