Open PHACTS: a precompetitive infrastructure for pharmacological research Bryn Williams-Jones.

Slides:



Advertisements
Similar presentations
Supporting Engagement in Open Access: a Publishers Perspective
Advertisements

ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
The Open Innovation Center Susie Stephens, Principal Research Scientist, Eli Lilly.
Antonis Loizou (some slides created by Paul Groth) VU University Amsterdam LDBC TUC Meeting.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 6 2/13/2015.
Bringing it all Together Sierra as Library Services Platform Today and Tomorrow Next Generation Library Services Platform Steven Nielsen Vice President-
Riding the Wave: a Perspective for Today and the Future APA Conference, November 2011 Monica Marinucci EMEA Director for Research, Oracle.
Royal Society of Chemistry developments to support open drug discovery Antony Williams, Ken Karapetyan, Valery Tkachenko, Colin Batchelor Alexey Pshenichnov.
The FI-WARE Project – Base Platform for Future Service Infrastructures OCTOBER 2011 Presentation at proposers day.
Systems Engineering in a System of Systems Context
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
Advanced Data Mining and Integration Research for Europe ADMIRE – Framework 7 ICT ADMIRE Overview European Commission 7 th.
SaaS, PaaS & TaaS By: Raza Usmani
Open Cloud Sunil Kumar Balaganchi Thammaiah Internet and Web Systems 2, Spring 2012 Department of Computer Science University of Massachusetts Lowell.
Plan Introduction What is Cloud Computing?
#acquia Commons The Open Alternative for Social Business Software Name Title Acquia Month XXth, 2011.
OpenMDR: Generating Semantically Annotated Grid Services Rakesh Dhaval Shannon Hastings.
Open PHACTS “Data integration for all” Andrew Leach.
Moving forward our shared data agenda: a view from the publishing industry ICSTI, March 2012.
DYNAMICS CRM AS AN xRM DEVELOPMENT PLATFORM Jim Novak Solution Architect Celedon Partners, LLC
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
The Open PHACTS Discovery Platform Open PHACTS for Academia.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
EGI-Engage EGI-Engage Engaging the EGI Community towards an Open Science Commons Project Overview 9/14/2015 EGI-Engage: a project.
Dr. Nikos Houssos| National Documentation Centre / NHRF European Network of National Contact Points for Research Infrastructures moving forward The CERIF-based.
Paul Groth VU University Amsterdam Convergence Meeting: Semantic Interoperability for Clinical Research & Patient.
Climate Sciences: Use Case and Vision Summary Philip Kershaw CEDA, RAL Space, STFC.
U.S. Department of the Interior U.S. Geological Survey Next Generation Data Integration Challenges National Workshop on Large Landscape Conservation Sean.
The Innovative Medicines Initiative (IMI) High level the IMI Concept, Strategic Research Agenda and Call topics Eva Lindgren.
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
The Open Pharmacological Concepts Triple Store
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Open PHACTS in a few slides. Why? Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing each company x.
Big Data Supporting Drug Discovery Cautionary Tales from the World of Chemistry for Translational Informatics Valery Tkachenko RSC-CSIR/OSDD meeting Pune,
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Pathway Interaction Database (PID) Market Research BioPortals Tiger Team Meeting Mervi Heiskanen January 31, 2013.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
ResistVir-Db The database of ResistVir European Project Co-ordination of Research on Genetic Resistance to Plant Pathogenic Viruses, and their Vectors,
Exploring ‘Workspaces’ Tom Visser, SARA compute and networking services, Amsterdam Garching Workshop 21 st September 2010.
Using Open Data to Create Value for Citizens. Data.gov Provides instant access to ~400,000 datasets in easy to use formats Contributions from UN, World.
Datalayer Notebook Allows Data Scientists to Play with Big Data, Build Innovative Models, and Share Results Easily on Microsoft Azure MICROSOFT AZURE ISV.
An Enterprise Clinical Data Search Solution. is Designed for: Informatics professionals, clinicians, statisticians, data managers and process/quality.
The Open PHACTS Ecosystem Fostering a user community Open PHACTS Community Workshop June 2014.
B2A Pharma Prototype Implementation of an industrial-strength pharmaceutical workflow in a Grid environment Falk Zimmermann NEC Europe Ltd. IT Research.
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
MDL Information Systems, Inc. Powering the Process of Invention Donna del Rey Director, Business Planning
Active Directory Domain Services (AD DS). Identity and Access (IDA) – An IDA infrastructure should: Store information about users, groups, computers and.
On Demand RDF Databases in the Cloud Presentation for the Ontology Forum March 3, 2016.
The opportunities and challenges of sharing genomics data with the pharmaceutical industry Shahid Hanif, Head of Health Data & Outcomes, ABPI DNA digest.
Q K-12 Blueprint Overview. 2 The K-12 Blueprint offers resources for education leaders involved in planning and implementing personalized learning.
Data Sources & Using VIVO Data Visualizing Science VIVO provides network analysis and visualization tools to maximize the benefits afforded by the data.
Agenda  What is Cloud Computing?  Milestone of Cloud Computing  Common Attributes of Cloud Computing  Cloud Service Layers  Cloud Implementation.
Implementing chemistry platform for OpenPHACTS: Lessons learned
Building linked-data, large-scale chemistry platform: challenges, lessons and solutions Valery Tkachenko, Alexey Pshenichnov, Aileen Day, Colin Batchelor,
Ian Bruno, Suzanna Ward The Cambridge Crystallographic Data Centre
EUDAT: collaborative pan-European infrastructure providing research data services, training and consultancy This work is licensed.
Joslynn Lee – Data Science Educator
National e-Infrastructure Vision
Steering Group Member, Link Digital
ATOM Accelerating Therapeutics for Opportunities in Medicine
Sponsored by the University of Southampton
Tailor slide to customer industry/pain points
An ecosystem of contributions
AWS Cloud Computing Masaki.
Brian Matthews STFC EOSCpilot Brian Matthews STFC
Salesforce.com Salesforce.com is the world leader in on-demand customer relationship management (CRM) services Manages sales, marketing, customer service,
Remedy Integration Strategy Leverage the power of the industry’s leading service management solution via open APIs February 2018.
Presentation transcript:

Open PHACTS: a precompetitive infrastructure for pharmacological research Bryn Williams-Jones

Fundamental issue: There is a *lot* of science outside your walls It’s a chaotic space Scientists want to find information quickly and easily Often they just “can’t get there” (or don’t even know where “there” is) And you have to manage it all (or not)

Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data each company x Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, doi: /nrd2944

The Project The Innovative Medicines Initiative EC funded public-private partnership for pharmaceutical research Focus on key problems –Efficacy, Safety, Education & Training, Knowledge Management The Open PHACTS Project Create a semantic integration hub (“Open Pharmacological Space”)… to start with, moving to broader biomedical topics later Not just another project; Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements 23 academic partners, 8 pharmaceutical companies, 3 biotechs >120 people. Delivered production system, live, useful (and being used) within 18 months Strong, active participation from pharma companies – not passengers

Open PHACTS Project Partners Pfizer Limited – Coordinator Universität Wien – Managing entity Technical University of Denmark University of Hamburg, Center for Bioinformatics BioSolveIT GmBH Consorci Mar Parc de Salut de Barcelona Leiden University Medical Centre Royal Society of Chemistry Vrije Universiteit Amsterdam Spanish National Cancer Research Centre University of Manchester Maastricht University Aqnowledge University of Santiago de Compostela Rheinische Friedrich-Wilhelms-Universität Bonn AstraZeneca GlaxoSmithKline Esteve Novartis Merck Serono H. Lundbeck A/S Eli Lilly Netherlands Bioinformatics Centre Swiss Institute of Bioinformatics ConnectedDiscovery EMBL-European Bioinformatics Institute Janssen OpenLink PHACTS

ChEMBL DrugBank Gene Ontology Wikipathways UniProt ChemSpider UMLS ConceptWiki ChEBI TrialTrove GVKBio GeneGo TR Integrity “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” “What is the selectivity profile of known p38 inhibitors?” “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

Business Question Driven Approach Drug Discovery Today…paper in press

Data Integration Approaches Data Source Data Source Data Source Data Source Data Warehouse Queries Extract Transform Load Data Source Data Source Data Source Data Source Federator Queries Query Reformulation SpeedMaintenance

A Hybrid Model –Cache data locally that requires computing over. “Cache” rather than “Warehouse” – obliterate & rebuild at will (think Google) –Bring in ancillary properties by wiring in web-services –Both provide an opportunity for secure data (see later) Use Semantic Technology –“Schema Free” means you don’t need to change your warehouse when the data changes (as just happened for ChEMBL) –Open standards increase opportunities (not tied to any particular vendor) and shared, interoperable data models (code public & internal data to the same abstract standard) The Open PHACTS Approach

Nanopub Db VoID Data Cache (Virtuoso Triple Store) Data Cache (Virtuoso Triple Store) Semantic Workflow Engine Linked Data API (RDF/XML, TTL, JSON) Domain Specific Services Identity Resolution Service Chemistry Registration Normalisation & Q/C Identifier Management Service Indexing Core Platform P12374 EC CS4532 “Adenosine receptor 2a” VoID Db Nanopub Db VoID Db VoID Nanopub VoID Public Content Commercial Public Ontologies User Annotations Apps

Present Content - Pharmacology SourceInitial RecordsTriplesProperties Chembl1,149,792 ~1,091,462 cmpds ~8845 targets 146,079,19417 cmpds 13 targets DrugBank19,628 ~14,000 drugs ~5000 targets 517,58474 UniProt536,789156,569,76478 ENZYME6,18773,8382 ChEBI35,584905,1892 GO/GOA38,13724,574,77442 ChemSpider/ACD1,194,437161,336,85722 ACD, 4 CS ConceptWiki2,828,9663,739,8841 > 1 billion triples

Data Cache (Virtuoso Triple Store) Data Cache (Virtuoso Triple Store) Semantic Workflow Engine Infrastructure Hardware (development) - 2 x Intel Xeon E GB DDR3 1333MHz RAM TB SSD - 3TB 7200rpm Triple Store -Virtuoso 7 column store -Shown to scale to > 100 billion triples -Project aiming for billion mark Network -AMX-IS -Extensive memcache

Are These Two Molecules The Same(*) *Really: Is it sensible to combine data associated with these two molecules? Yeah No way!

Chemistry Registration Normalisation & Q/C Chemistry Registration Existing chemistry registration system uses standard ChemSpider deposition system: includes low-level structure validation and manual curation service by RSC staff. New Registration System in Development Utilizes ChemSpider Validation and Standardization platform including collapsing tautomers Utilizes FDA rule set as basis for standardization (GSK lead) Will generate Open PHACTS identifier (OPS ID)

ChemSpider Validation & Standardization Platform Quality Assurance

Developer Centric API dev.openphacts.org

Access to a wide range of interconnected data – easily jump between pharmacology, chemistry, disease, pathways and other databases without having to perform complex mapping operations Query by data type, not by data source (“Protein Information” not “Uniprot Information”) API queries that seamlessly connect data (for instance the Pharmacology query draws data from Chembl, ChemSpider, ConceptWiki and Drugbank) Strong chemistry representation – all chemicals reprocessed via Open PHACTS chemical registry to ensure consistency across databases Built using open community standards, not an ad-hoc solution. Developed in conjuction with 8 major pharma (so your app will speak their language!) Simple, flexible data-joining (join compound data ignoring salt forms, join protein data ignoring species) Provenance everywhere – every single data point tagged with source, version, author, etc Nanopublication-enabled. Access to a rich dataset of established and emerging biomedical “assertions” Community-curation tools to enhance and correct content Access to a rich application network (many different App builders) Toolkits to support many different languages, workflow engines and user applications Developer-friendly JSON/XML methods. Consistent API for multiple services Seamless data upgrades. We manage updates so you don’t have to Professionally Hosted (Continually Monitored) Private and secure, suitable for confidential analyses Neutral Party. Active & still growing through a unique public-private partnership Benefits Of the Open PHACTS API

Commercial Data Pilot (aka Authentication) Open PHACTS Applications Open PHACTS Applications

Out of scope of current 3 year project Very much in our minds Direction will be set by foundation members (get in there now!) Options (not exhaustive): –Load data into authenticated portion of public system –Create a private clone in the cloud –VM (cloud or internal) –Web-service calls –Configure workflows to retain separation (SD file export helps here) [Remember, current system is fully secure & private via EFPIA led agreement] –All the software is open… Internal Data Integration

Creating A Biomedical “App Store”.. How far have we come?

explorer.openphacts.org

Links Home Page: Papers/Publications: Developer API: Explorer: GSK/Pharmatrek in use video: iPhone app video: Accelrys Community Open PHACTS group:

A Precompetitive Knowledge Framework Integration Pharma Needs Inputs Sustainability Stability Security Sustainability Stability Security Management / Governance Data Mining Services/Alg orithms Mapping & Populating Architecture Interfaces & Services Interfaces & Services Content Structured & Unstructured Content Structured & Unstructured Vocabularies & Identifiers (URIs) Community KD Innovation Community KD Innovation

The Ecosystem is …. Industry Academia Data Provider Software Provider

The Open PHACTS community ecosystem

Sustaining Impact “Software is free like puppies are free - they both need money for maintenance” …and more resource for future development

Kick-Starting Sustainability Collaboration Grants Industry Open PHACTS API Users Apps API

Becoming part of the Open PHACTS Foundation Members membership offers early access to platform updates and releases the opportunity to steer research and development directions receive technical support work with the ecosystem of developers and semantic data integrators around Open PHACTS tiered membership familiar business and governance model A UK-based not-for-profit member owned company

Timeline Kick-off 6mo Prototype Hosted System Running Hosting Tender Version 1.0 release! v1.1 Public API Release Updates & eAPPs Original Project Close Pathways Ontology-based queries Nanopublications (incl. publisher data) Human genetics & disease Human drug data (e.g. adverse events) Commercial Data Pilot results Beyond 2013 Internal data integration Full commercial data implementation Advanced analytics Translational data Other IMI integration? ….. Open PHACTS Foundation est. Open PHACTS Foundation running business as usual

Conclusions There is a lot of public data out there. To deal with it you must: –Talk to the providers (EBI, NCBI, NIH, NBIC, UofM, Publishers, SMEs) –Identify the use cases –Promote data standards –Physically integrate the data –Manage the nightmare of different identifiers –Manage the complexity of equality –Maintain the data –Identify quality issues, have a plan to address them –Develop apps, build scientific success stories Open PHACTS provides a cost-effective way to accomplish greater impact of public (and beyond) scientific data by sharing this burden across industry

Open PHACTS Project Partners Pfizer Limited – Coordinator Universität Wien – Managing entity Technical University of Denmark University of Hamburg, Center for Bioinformatics BioSolveIT GmBH Consorci Mar Parc de Salut de Barcelona Leiden University Medical Centre Royal Society of Chemistry Vrije Universiteit Amsterdam Spanish National Cancer Research Centre University of Manchester Maastricht University Aqnowledge University of Santiago de Compostela Rheinische Friedrich-Wilhelms-Universität Bonn AstraZeneca GlaxoSmithKline Esteve Novartis Merck Serono H. Lundbeck A/S Eli Lilly Netherlands Bioinformatics Centre Swiss Institute of Bioinformatics ConnectedDiscovery EMBL-European Bioinformatics Institute Janssen OpenLink PHACTS