Presentation is loading. Please wait.

Presentation is loading. Please wait.

Implementing chemistry platform for OpenPHACTS: Lessons learned

Similar presentations


Presentation on theme: "Implementing chemistry platform for OpenPHACTS: Lessons learned"— Presentation transcript:

1 Implementing chemistry platform for OpenPHACTS: Lessons learned
Colin Batchelor, Alexey Pshenichnov, Jon Steele, Valery Tkachenko Royal Society of Chemistry ACS Spring 2016 San Diego, CA March 17th 2016

2 Open PHACTS Mission: Integrate Multiple Research Biomedical Data Resources Into A Single Open & Sustainable Access Point

3 Acknowledgements Open PHACTS Practical Semantics
GlaxoSmithKline – Coordinator Universität Wien – Managing entity Technical University of Denmark University of Hamburg, Center for Bioinformatics BioSolveIT GmBH Consorci Mar Parc de Salut de Barcelona Leiden University Medical Centre Royal Society of Chemistry Vrije Universiteit Amsterdam Novartis Merck Serono H. Lundbeck A/S Eli Lilly Netherlands Bioinformatics Centre Swiss Institute of Bioinformatics ConnectedDiscovery EMBL-European Bioinformatics Institute Janssen Esteve Almirall OpenLink Scibite The Open PHACTS Foundation Spanish National Cancer Research Centre University of Manchester Maastricht University Aqnowledge University of Santiago de Compostela Rheinische Friedrich-Wilhelms-Universität Bonn AstraZeneca Pfizer @Open_PHACTS

4 Why is it so hard to…. IP? What’s the structure? Are they in our file?
What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? IP? Remember this, some of these questions are easier to answer than others Competitors?

5 How do R&D companies use public data?
Literature PubChem Genbank Patents Databases Downloads Data Analysis Data Integration Firewalled Databases Using available public data is critical to drug discovery

6 Expanding Integration
Research Informatics Translational Informatics Medical Informatics (RWD)

7 Business Question Driven Approach
Number sum Nr of 1 Question 15 12 9 All oxidoreductase inhibitors active <100nM in both human and mouse 18 14 8 Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound? 24 13 Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives. 32 For a given interaction profile, give me compounds similar to it. 37 The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X. 38 Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not). 41 A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature. 44 Give me all active compounds on a given target with the relevant assay data 46 Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease) 59 Identify all known protein-protein interaction inhibitors 10 Can go get everything Open PHACTS not a repo of the world, specific sources

8 Mx/psa, how calculated who did it? Mash up. With your data too,
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors” “What is the selectivity profile of known p38 inhibitors?” “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” ChEMBL DrugBank Gene Ontology Wikipathways GeneGo Open PHACTS was developed to support the key questions of drug discovery Business questions have been at the heart of Open PHACTS and have driven the development of the platform Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial ChEBI UniProt UMLS neXtProt GVKBio ConceptWiki ChemSpider DisGeNet TrialTrove TR Integrity ChEMBL Target Class ENZYME FDA adverse events SureChEMBL 8

9 Open PHACTS was developed to support the key questions of drug discovery
Business questions have been at the heart of Open PHACTS and have driven the development of the platform Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial Data provided by many publishers Originally in many formats: relational, SD files and RDF Worked closely with publishers Data licensing was a major issue Over 5 billion triples – 14 datasets & growing Hosted on beefy hardware; data in memory (aim) Extensive memcaching Pose complex queries to extract data @gray_alasdair Big Data Integration

10 Patent annotations in Open PHACTS
Huge amount of knowledge hidden in patent corpus Most of which will never be published elsewhere Substantial lag between patent and scientific literature SureChEMBL system already extracts chemical entities from full-text patent documents Text (title, abstract, description, claims), images, molfiles Complemented with gene and disease entity annotations Using the Termite text-mining tool by SciBite Relevance scoring to reduce noise Tested for recall Patent, compound, gene, disease info available via API 4 million full-text patent documents annotated • USPTO, WIPO, EPO – English language • Life-sciences relevant • Patents mapped to SureChEMBL IDs (e.g. EP A2) • Title, publication date, classification codes • Compounds mapped to SCHEMBL IDs (e.g. SCHEMBL15064) • Genes mapped to HGNC symbols (e.g. FDFT1) • Diseases mapped to MeSH terms (e.g. D009765)

11 Open PHACTS Expanding EcoSystem
Data Warrior Further Apps Db Stds :which ones (later) Access (API). Driven by the API. Acelerate bulding if apps

12 Want to run Open PHACTS within your environment?
Want to load your data into Open PHACTS? VM install of Open PHACTS Docker Image is now available Updating to ver 2.0 Open PHACTS Allows you to customise and load your own data into the environment

13 Open PHACTS discover platform is now supported by a Foundation

14 Usage >500 million queries
Seen a growing usage of the platform both in volume and in registered applications API remains the cornerstone of the delivery

15 All Users by Sector Type
Connected to different consumer groups

16 Upgrading Challenge of migrating between versions of the API
Once users get connected to an API, they tend to stick with it. We are listening to this and will have a version independent URL Upgrading

17

18 Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna

19 Explorer.openphacts.org openphactsfoundation.org/apps.html

20 Here we see the RSC dataset in Open PHACT’s data repository, which is running the open source Artifactory. The repository understands Maven metadata, and also maintain and verifies checksums of data artifacts. We see here it includes the suggested <dependency> setting for using the dataset from a different Maven project – while this would be a bit exotic perhaps, doing so would put the dataset directly on the classloader without any worrying about downloads or file paths. We can see the hierarchy of the dataset on the left – the repository has expanded the archive for us. The .ro folder contains the Research Object manifest, the void file is the Dataset description – the rest is the “actual data”. One power of Maven is the ease of setting up mirroring – the dataset above is actually from a mirror of the Maven repository of the build server in Manchester.

21 Apps Linked Data API (RDF/XML, TTL, JSON) Data Cache RDF RDF RDF RDF
Identity Resolution Service Linked Data API (RDF/XML, TTL, JSON) “Adenosine receptor 2a” Domain Specific Services Semantic Workflow Engine Identifier Management Service P12374 Core Platform EC2.43.4 CS4532 Data Cache (Virtuoso Triple Store) Chemistry Registration Normalisation & Q/C All have probably seen this slide. Want to pick out some of the key changes and tomorrow will here more Indexing VoID VoID VoID VoID VoID RDF RDF Nanopub Nanopub Nanopub RDF RDF RDF Public Ontologies Db Db Db Db User Annotations Public Content Commercial

22 The Royal Society of Chemistry’s role in Open PHACTS
We integrate, standardize and host the chemical compound collection underpinning Open PHACTS. We have developed a structure validation and normalization platform (CVSP) to ensure chemical structures are normalized to rules derived from the FDA structure normalization guidelines and modified based on input from members of EFPIA.

23 CVSP and the Open Pharmacological Space Chemical Registration System (OPS CRS)
Freely-available (requires logging in) chemical validation system for: Structure validation: warning on query atoms, pseudoatoms, nonsensical or unclear stereo Standardization workflows.

24 Chemical data sources Data source Number of records in source DrugBank
6828 PDB ligands 18681 MeSH (extracted by text mining) 24381 ChEBI 40503 HMDB 41494 ChEMBL 20 SureChEMBL 1.0

25 Royal Society of Chemistry data provided to Open PHACTS
We generate RDF that: Describes synonyms and identifiers Provides linksets between our data sources and the OPS identifiers Describes molecule–molecule relations of interest to the pharma industry Delivers calculated physicochemical properties of compounds Lists the validation and standardization issues found by CVSP.

26 Principles Use standard ontologies where possible (CHEMINF for cheminformatics properties, QUDT for units, OBO ontologies elsewhere) Use an event-based pattern for cheminformatics outputs. This enables us to add arbitrary provenance information.

27 1. Synonyms and identifiers
Use the CHEMINF ontology: Validated ChemSpider synonyms, Unvalidated ChemSpider synonyms, Validated database identifiers, Unvalidated database identifiers, InChI, InChIKey, SMILES, preferred ChemSpider name

28 2. Linksets: Vocabulary of Interlinked Datasets
Metadata describing the RDF: Can be used to build a directory of the RDF available Find what’s there without having to download all of it first Describes how Datasets are linked by the Linksets using SKOS. Recommendations here:

29 3. Molecule–molecule relations in CHEMINF
We relate molecules to “parent” forms, variously, those which are: uncharged not isotopically-specified not stereochemically-specified the preferred tautomer the largest fragment the “superparent” (all of the above)

30 4. Calculated physicochemical properties
log P, log D (at pH 5.5 and 7.4), bioconcentration factor, KOC (at pH 5.5 and 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point a 1 atm, enthalpy of vaporization at STP, vapour pressure at STP.

31 5. Issues from validation and standardization
We use the CHEMINF ontology again. We distinguish between information, warnings and errors. Only serious failures to process, such as a structure having an invalid atom, count as errors.

32 This is the world we live in

33 Data quality issue and CVSP
Robochemistry Proliferation of errors in public and private databases ChemSpider PubChem DrugBank KEGG ChEBI/ChEMBL Automated quality control system

34 Chemistry Validation and Standardization Platform

35 Chemistry Validation and Standardization Platform

36 Thank you Slides: 36


Download ppt "Implementing chemistry platform for OpenPHACTS: Lessons learned"

Similar presentations


Ads by Google