Open PHACTS “Data integration for all” Andrew Leach.


Task, workflow and results
Task: create a focussed set to identify leads against voltage-gated potassium channels
– AUREUS search targets: voltage-gated potassium channels
– Apply filters (MW, cLogP, Lipinski + remove undesirable target) ⇒ ~1000 molecules
– Similarity searches (RG, TP, Daylight)
– Cluster analysis ⇒ ~ molecules selected
– IonWorks © single shot screening
– 240 single shot hits progressed into full curve assay
– 5 full curve actives (in at least one test occasion)
– Series for lead optimisation
Stefan Senger, ca
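The filtering step in this workflow can be illustrated with a small Python sketch. It is not the procedure actually used: it assumes the descriptors (molecular weight, cLogP, hydrogen-bond donors and acceptors) have already been computed, applies the standard rule-of-five thresholds, and uses a hypothetical `Molecule` record with made-up example values and a hypothetical `hits_undesirable_target` flag.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    # Hypothetical record; descriptor values are assumed to be precomputed.
    name: str
    mw: float          # molecular weight in Da
    clogp: float       # calculated logP
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors
    hits_undesirable_target: bool = False


def lipinski_violations(m: Molecule) -> int:
    """Count rule-of-five violations (MW > 500, cLogP > 5, HBD > 5, HBA > 10)."""
    checks = [m.mw > 500, m.clogp > 5, m.h_donors > 5, m.h_acceptors > 10]
    return sum(checks)


def passes_filters(m: Molecule, max_violations: int = 0) -> bool:
    """Keep molecules within the violation budget that avoid the undesirable target."""
    return lipinski_violations(m) <= max_violations and not m.hits_undesirable_target


# Made-up candidates, for illustration only.
candidates = [
    Molecule("mol-A", 342.4, 2.1, 2, 5),
    Molecule("mol-B", 612.8, 6.3, 1, 9),
    Molecule("mol-C", 410.5, 3.8, 3, 6, hits_undesirable_target=True),
]
selected = [m.name for m in candidates if passes_filters(m)]
```

The `max_violations` parameter is a design choice of this sketch: some teams tolerate one violation, so the budget is left configurable rather than hard-coded.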

We (may) know where the data is, but integrating it is a pain, bespoke, and often only for experts
– Q: Identify all oxidoreductase inhibitors with an activity <100 nM in both mouse and human
– Q: The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
– Q: For a given interaction profile, give me compounds similar to it.
Sources: ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, etc.; Internal
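Once the data has been integrated, the first question on this slide reduces to a simple filter. The Python sketch below assumes activity records have already been pulled into one flat list with harmonised units; the record layout, field names and values are hypothetical and do not reflect the Open PHACTS data model.

```python
from collections import defaultdict

# Hypothetical, already-integrated activity records (values are made up).
activities = [
    {"compound": "cmpd-1", "target_class": "oxidoreductase", "species": "human", "value_nm": 12.0},
    {"compound": "cmpd-1", "target_class": "oxidoreductase", "species": "mouse", "value_nm": 85.0},
    {"compound": "cmpd-2", "target_class": "oxidoreductase", "species": "human", "value_nm": 40.0},
    {"compound": "cmpd-2", "target_class": "oxidoreductase", "species": "mouse", "value_nm": 450.0},
    {"compound": "cmpd-3", "target_class": "serine protease", "species": "human", "value_nm": 5.0},
    {"compound": "cmpd-3", "target_class": "serine protease", "species": "mouse", "value_nm": 7.0},
]


def active_in_all_species(records, target_class, species, threshold_nm):
    """Return compounds with an activity below the threshold in every listed species."""
    species_seen = defaultdict(set)
    for r in records:
        if r["target_class"] == target_class and r["value_nm"] < threshold_nm:
            species_seen[r["compound"]].add(r["species"])
    required = set(species)
    return sorted(c for c, seen in species_seen.items() if required <= seen)


hits = active_in_all_species(activities, "oxidoreductase", ["human", "mouse"], 100.0)
```

The query itself is a few lines; the difficulty the slide describes lies in getting records from many sources into a shape where such a filter can run at all.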

The Innovative Medicines Initiative
– Biggest public-private partnership in area of medicine
– Collaboration between European Commission and European Federation of Pharmaceutical Industries and Associations (EFPIA)
– Promotion of medical innovation in Europe
– Tackle key bottlenecks
– Recognises “in kind” contributions
– Focus on key problems: Efficacy, Safety, Education & Training, Knowledge Management

Public Domain Drug Discovery Data
Pharma are accessing, processing, storing & re-processing
Why repeat at each company? GSK, AZ, Pfizer, Merck

Information Tombs
– Built for primary use-case
– Tailored indexes
– Tailored GUIs
– Unique language & metadata
– Poor interoperability/integration
Literature, HR, Synthesis, Portfolio, SAR, Docs, Safety, In vivo, Etc

Project Partners
Pfizer Limited (Coordinator); Universität Wien (Managing entity); Technical University of Denmark; University of Hamburg, Center for Bioinformatics; BioSolveIT GmbH; Consorci Mar Parc de Salut de Barcelona; Leiden University Medical Centre; Royal Society of Chemistry; Vrije Universiteit Amsterdam; Spanish National Cancer Research Centre; University of Manchester; Maastricht University; Aqnowledge; University of Santiago de Compostela; Rheinische Friedrich-Wilhelms-Universität Bonn; AstraZeneca; GlaxoSmithKline; Esteve; Novartis; Merck Serono; H. Lundbeck A/S; Eli Lilly; Netherlands Bioinformatics Centre; Swiss Institute of Bioinformatics; ConnectedDiscovery; EMBL-European Bioinformatics Institute; Janssen; OpenLink

A use-case driven approach, focussed on delivery for the real world
– Main architecture, technical implementation and primary capabilities driven by a set of prioritised research questions
– Based on the main research questions, define prioritised data sources
– Develop three Exemplars to demonstrate the capabilities of the Open PHACTS System and to define interfaces and input/output standards

Work Streams
– Build: Service layer and resource integration
– Drive: Development of exemplar work packages & Applications
– Sustain: Community engagement and long-term sustainability

Platform Explorer Standards Apps API

Prioritised research questions (Kamal Azzaoui et al, DDT in press 2013)
– All oxidoreductase inhibitors active <100 nM in both human and mouse
– Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
– Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives
– For a given interaction profile, give me compounds similar to it
– The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X
– Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not)
– A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature
– Give me all active compounds on a given target with the relevant assay data
– Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
– Identify all known protein-protein interaction inhibitors

Pathways Pharmacological Activities Biological Processes Transcripts Pathological Processes Diseases Genes Proteins Interactions Clinical Drug Applications Indications Drugs Compounds Chemicals

Open PHACTS will be built upon semantic technologies and standards, providing an opportunity to:
– Demonstrate that semantic technologies can perform to the same degree as existing systems
– Provide an open platform to address common drug discovery questions; expose pharma’s use-cases and knowledge
– Create a pre-competitive infrastructure that can be sustained and expanded into new areas; providing the platform for future collaboration
Why Semantic Technologies?
– Rapidly developing technology, powerful algorithms for integration and querying of data
– “Schema free”
– Open standards, facilitating sharing (public, private, commercial)
– A community of developers, leverage work going on elsewhere

Key architecture components: User Interfaces & Applications; Linked Data API; Linked Data Cache; Identity Mapping Service; Identity Resolution Service; Domain Specific Services; Data

Core Platform (architecture diagram)
– Applications: Open PHACTS Explorer; 1st Gen Apps; Partner Apps; App Framework
– Linked Data API (RDF/XML, TTL, JSON)
– Semantic Workflow Engine (LARKC)
– Data Cache (Triple Store)
– Domain Specific Services
– Identity Resolution Service (ConceptWiki)
– Identifier Management Service (BridgeDb+)
– Chemistry Normalisation & Q/C; ChemSpider
– Data Import: Db, Nanopub and VoID inputs
– Content: Public Content; Commercial; Public Ontologies; User Annotations
– Example identifiers: P12374, EC, CS4532, “Adenosine receptor 2a”, Oct

Building Quality
– High quality chemical names and synonyms. Leverage ChemSpider and ConceptWiki curation, Q/C and mapping
– ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues
– Basic curation interface for editing concept terms available through ConceptWiki
– Data quality issues detected in data sources reported back to depositors for their evaluation

Quantitative Data Challenges

STANDARD_TYPE   UNIT_COUNT
AC50            7
Activity        421
EC50            39
IC50            46
ID50            42
Ki              23
Log IC50        4
Log Ki          7
Potency         11
log IC50        0

STANDARD_TYPE   STANDARD_UNITS    COUNT(*)
IC50            nM
IC50            ug.mL
IC
IC50            ug/ml             2038
IC50            ug ml
IC50            mg kg
IC50            molar ratio       178
IC50            ug                117
IC50            %                 113
IC50            uM well-1         52
IC50            p.p.m.            51
IC50            ppm               36
IC50            uM-1              25
IC50            nM kg-1           25
IC50            milliequivalent   22
IC50            kJ m-2            20

~ 100 units, >5000 types
Implemented using the Quantities, Dimension, Units, Types Ontology
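To make the unit problem concrete, here is a small Python sketch of converting reported values to nanomolar. It is an illustration only and does not describe how Open PHACTS implements its unit handling: the conversion tables, the `to_nanomolar` function, and the accepted unit spellings are assumptions of this sketch. Mass concentrations such as ug/ml can only be converted when a molecular weight is supplied, and units such as "%" or "ppm" are returned as unconvertible.

```python
# Conversion factors from molar units to nM (assumed spellings, case-sensitive).
MOLAR_TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}

# Conversion factors from mass-per-volume units to g/L (keys are lower-cased).
MASS_TO_G_PER_L = {"ug/ml": 1e-3, "mg/ml": 1.0}


def to_nanomolar(value, unit, mol_weight=None):
    """Convert a concentration to nM, or return None if it cannot be converted.

    mol_weight is in g/mol and is required for mass-per-volume units.
    """
    if unit in MOLAR_TO_NM:
        return value * MOLAR_TO_NM[unit]
    key = unit.lower()
    if key in MASS_TO_G_PER_L:
        if mol_weight is None:
            return None  # mass concentration needs a molecular weight
        grams_per_litre = value * MASS_TO_G_PER_L[key]
        return grams_per_litre / mol_weight * 1e9  # mol/L to nM
    return None  # e.g. "%", "ppm", "molar ratio": not a plain concentration


half_micromolar = to_nanomolar(0.5, "uM")
mass_based = to_nanomolar(1.0, "ug/ml", mol_weight=500.0)
unconvertible = to_nanomolar(10.0, "%")
```

Returning `None` instead of raising keeps unconvertible records visible, so they can be counted and reported back rather than silently dropped.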

Chemistry within Open PHACTS
The challenges associated with handling chemistry data require the support of a publicly accessible platform to integrate, standardise and host the data. ChemSpider, an online database from the Royal Society of Chemistry, hosts the chemical compound collection underpinning Open PHACTS and is responsible for standardising the chemical compounds and providing both regular updates and ongoing data curation. To serve the Open PHACTS platform, a structure validation and standardisation platform (CVSP) has been developed to ensure chemical structures are normalised to rules derived from the FDA structure standardisation guidelines and modified based on input from the EFPIA members.

The many challenges of chemistry representation…

Identities within Open PHACTS
Open PHACTS integrates information from multiple different databases, many of which use unique identifiers. The Identity Mapping Service (IMS) ensures these identifiers are linked and available for use interchangeably throughout the Open PHACTS platform. To maintain vocabulary heterogeneity and provide interoperability, the ConceptWiki is used. The ConceptWiki is an open access system that accepts essentially unlimited numbers of synonyms, in multiple languages, and then maps all the terms correctly back to one unique concept identifier, alleviating vocabulary problems and identifier differences.
Example (Explorer, IMS):
– Synonyms: Aspirin; Dispril; 2-Acetoxybenzoic acid; Acetyl salicylic acid; Salicylic acid, acetyl-
– ChemSpider ID: 2157
– FDA:
– ChEBI ID: CHEBI:15365
– DrugBank ID: APRD00264
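The two-step idea on this slide, resolving a synonym to one concept and then mapping that concept to database identifiers, can be sketched in Python with plain dictionaries. The aspirin synonyms and identifiers are the ones shown on the slide; the concept key `concept:aspirin` and the function names are hypothetical and do not reflect the actual ConceptWiki or IMS interfaces.

```python
# Synonym table: every term maps to one concept key (the key is hypothetical).
SYNONYMS = {
    "aspirin": "concept:aspirin",
    "dispril": "concept:aspirin",
    "2-acetoxybenzoic acid": "concept:aspirin",
    "acetyl salicylic acid": "concept:aspirin",
    "salicylic acid, acetyl-": "concept:aspirin",
}

# Identifier table: one concept, many database identifiers (values from the slide).
CONCEPT_IDS = {
    "concept:aspirin": {
        "ChemSpider": "2157",
        "ChEBI": "CHEBI:15365",
        "DrugBank": "APRD00264",
    },
}


def resolve(term):
    """Identity resolution: free-text term to concept key, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def map_identifier(term, database):
    """Return the identifier for a term in the requested database, or None."""
    concept = resolve(term)
    if concept is None:
        return None
    return CONCEPT_IDS.get(concept, {}).get(database)


def cross_reference(source_db, source_id, target_db):
    """Identity mapping: translate an identifier from one database to another."""
    for ids in CONCEPT_IDS.values():
        if ids.get(source_db) == source_id:
            return ids.get(target_db)
    return None


chebi_id = map_identifier("Acetyl salicylic acid", "ChEBI")
drugbank_id = cross_reference("ChemSpider", "2157", "DrugBank")
```

Keeping resolution and mapping as separate lookups mirrors the split between the two services: synonyms can grow without touching the identifier table, and new databases can be added without touching the synonyms.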

Why Provenance Matters
– Using a community specification known as “VoID” (Vocabulary of Interlinked Datasets)
– Record version, author, derivations
– Builds trust with users: know what you are querying (and why it might have changed)
– Provides mechanism to provide usage statistics back to providers, helping them understand the value
– Easier to track errors and ensure quality
– Actively participating in community provenance programme (W3C)
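As an illustration of recording version, author and derivation, the Python sketch below assembles a minimal VoID-style dataset description in Turtle. The vocabulary terms (`void:Dataset`, `void:triples`, `dcterms:title`, `dcterms:creator`, `pav:version`, `prov:wasDerivedFrom`) are real, but the choice of properties is an assumption of this sketch rather than the Open PHACTS dataset description guidelines, and every URI and value in the example is a placeholder.

```python
PREFIXES = [
    "@prefix void: <http://rdfs.org/ns/void#> .",
    "@prefix dcterms: <http://purl.org/dc/terms/> .",
    "@prefix pav: <http://purl.org/pav/> .",
    "@prefix prov: <http://www.w3.org/ns/prov#> .",
]


def void_description(uri, title, version, creator, derived_from, triples):
    """Build a Turtle description of one dataset (no escaping of literals)."""
    body = [
        f"<{uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f'    pav:version "{version}" ;',               # which release is loaded
        f"    dcterms:creator <{creator}> ;",           # who produced it
        f"    prov:wasDerivedFrom <{derived_from}> ;",  # upstream source
        f"    void:triples {triples} .",                # size of the dataset
    ]
    return "\n".join(PREFIXES + [""] + body)


# Placeholder values, for illustration only.
ttl = void_description(
    uri="http://example.org/dataset/example-rdf",
    title="Example dataset (RDF conversion)",
    version="1.0",
    creator="http://example.org/people/converter",
    derived_from="http://example.org/dataset/example-source",
    triples=1000,
)
```

A production version would build the graph with an RDF library so that literals are escaped properly; string formatting is used here only to keep the sketch dependency-free.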

What does Open PHACTS do?
Open PHACTS draws together multiple sources of publicly-available pharmacological and chemical data, allowing public access to the information via the Open PHACTS Explorer, an intuitive interface.

Currently integrated databases
Database                  Number of triples (million)
ACD Labs / ChemSpider
ChEBI                     0.91
ChEMBL_v
ConceptWiki               3.74
DrugBank                  0.52
Enzyme                    0.07
Gene Ontology             0.85
SwissProt
WikiPathways              0.14
TOTAL

Licensing: 3 “public” databases
– Comparative Toxicogenomics Database
– OMIM
– DrugBank
All are available as “open” RDF you can download right now. But:

“CUTTING THE GORDIAN KNOT”
What are the problems with licensing we had to address?
– To make the data and software generated by the project usable and reusable
– Multiplicity of unclear or non-standard licenses on original data sources: ‘Public’ can mean use but not redistribute, use in commercial environment; legal position on use and reuse extremely unclear; different issues than just linking to data
– What is the legal status of integrated collections of the above, and of derived knowledge from such a collection?
– Appropriate software license selection
– Legal clarity for EFPIA and end users
– Approaches for commercial data integration, EFPIA in-house data
AIM: to enable maximum possible dissemination and usability of the integrated data and architecture generated by the project, with approaches that will be applicable in other data integration projects

Data Licensing Solution
– Chose John Wilbanks as consultant
– A framework built around STANDARD well-understood Creative Commons licences, and how they interoperate
– Deal with the problems by: interoperable licences; appropriate terms; declaring expectations to users and data publishers
– One size won’t fit all requirements

Open PHACTS and the scientific community
Development partnerships
– Influence on API developments
– Opportunities to demo ideas & use cases to core team
– Need MoU and annexe
Associated partners
– Support, information
– Exchange of ideas, data, technology
– Opportunities to demo at community webinars
– Need MoU
Consortium
– 28 current members

Example applications
Advanced analytics
– ChemBioNavigator: Navigating at the interface of chemical and biological data with sorting and plotting options
– TargetDossier: Interconnecting Open PHACTS with multiple target centric services. Exploring target similarity using diverse criteria
– PharmaTrek: Interactive polypharmacology space of experimental annotations
– UTOPIA: Semantic enrichment of scientific PDFs
Predictions
– GARFIELD: Prediction of target pharmacology based on the Similarity Ensemble Approach
– eTOX collector: Automatic extraction of data for building predictive toxicology models in eTOX project

ChemBioNavigator (Matthias Rarey et al); PharmaTrek (Jordi Mestres et al)

Call for expressions of interest
Open PHACTS ENSO proposal: Open PHACTS intends to submit a proposal for IMI ENSO funding. We are currently drafting our ENSO proposal and invite all EFPIA companies with an interest in Open PHACTS to contact us to discuss opportunities for involvement.
The Open PHACTS Foundation: Open PHACTS has a successor organisation, the Open PHACTS Foundation. Please register your interest with us for further information on membership and other opportunities to get involved within Open PHACTS.
For more information and/or to register interest, contact us at

Acknowledgements: Stefan Senger, Gerhard Ecker, The Open PHACTS consortium

– Data: Targets; Chemistry; Pharmacology; Literature; Patents
– Standards: Ontology/taxonomy; Minimum information guide; Dictionaries; Interchange mapping
– Assertions: e.g. Gene-to-Disease; Compound-to-Target; Compound-to-ADR
– Application (Knowledge): Fact Visualisation, e.g. Target Dossiers; SAR Visualisation
– SERVICES
After Barnes et al, Nature Reviews Drug Discovery 2009, doi /nrd2944

Nanopublications – Capturing scientific information in the Triple Store