Open PHACTS 1.3 Release (2 701 602 484 triples) Blue font indicates new data and new data sources available in 1.3
1.3 Release Summary
- Integration of WikiPathways to provide a series of pathway-based API calls
- A refresh of ChEMBL to ChEMBL_16
- Extension and better support for units and filtering of pharmacology data
- Support for pChEMBL filtering
- Ability to query hierarchy data in queries (GO, ChEBI, ChEMBL target, ENZYME)
- All-new chemistry processing using the Open PHACTS Chemistry registry and new chemistry "lenses" for flexible mapping
- Further development of KNIME and Pipeline Pilot nodes
- Further development of our support portal at support.openphacts.org
Open PHACTS 1.3 Data Content
Open PHACTS 1.3 Supported Identifiers
We now support an increased range of public database identifiers for proteins and compounds. For the supported identifiers, please see openphacts.cs.man.ac.uk:9093/QueryExpander/mappingSet?lensUri=All (home page: openphacts.cs.man.ac.uk). They include HGNC symbols, GeneOntology, ChEMBL ID, Ensembl, EntrezGene, and many more.
Details on new methods & data
Compound API Methods
Compound Info Data
- SMILES, InChI, InChIKey
- logP, hydrogen bond acceptor or donor, Rule of 5 violations, polar surface area
- Rotatable bonds, molecular weight, molecular formula, freebase molecular weight
- Biotransformation, description, protein binding, toxicity, melting point, drug name, drug type (approved, experimental, etc.)
- Compound name
Compound Info Data Sources OPS chemical registration system (OCRS) ChEMBL (ChEMBL_16) 82 003 819 molecule dataset triples DrugBank ConceptWiki (Sept 9, 2013 ChemSpider)
Compound Class Data Sources ChEBI: Count, List, Compound Classifications GeneOntology: Compound Classifications, Target Classification ChEMBL Targets: Target Classification (103 021 triples) ENZYME: Target Classification
Compound Pharmacology Data Compound data: Activity (type, value, units, comment), pChEMBL, Assay (type, organism, description), URIs (OCRS, ChEMBL, ConceptWiki, DrugBank), Drug type, Generic name, SMILES, InChI, InChIKey, molecular weight, Rule of 5, Literature (DOI, PMID) Target data: Name, organism, target component, target type
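The pChEMBL field listed above can be used as a single comparable potency value when filtering pharmacology results. The sketch below is illustrative only and does not call the Open PHACTS API. It assumes the usual ChEMBL convention that pChEMBL is the negative base-10 logarithm of a molar activity value, and the record layout (`compound`, `pchembl`) and function names are hypothetical.

```python
import math


def pchembl_from_nanomolar(value_nm):
    """Return -log10 of a concentration in molar units, given the value in nM."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def filter_by_min_pchembl(records, minimum):
    """Keep records whose pChEMBL value is present and at least `minimum`."""
    return [
        r for r in records
        if r.get("pchembl") is not None and r["pchembl"] >= minimum
    ]


# Hypothetical records; real results would come from a pharmacology call.
records = [
    {"compound": "cpd_a", "pchembl": pchembl_from_nanomolar(10.0)},    # 10 nM
    {"compound": "cpd_b", "pchembl": pchembl_from_nanomolar(5000.0)},  # 5 uM
    {"compound": "cpd_c", "pchembl": None},                            # no value
]

potent = filter_by_min_pchembl(records, 6.0)
```

With these toy values only `cpd_a` passes a minimum of 6.0; records without a pChEMBL value are dropped rather than treated as zero.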
Compound Pharmacology Data Sources OPS chemical registration system (OCRS) ChEMBL 184 257 585 activity dataset triples 34 571 481 assay dataset triples DrugBank (765 936 triples)
Compound Class Pharmacology Activity values for compounds from a given ChEBI compound class
Chemical Structure Mapping Methods
Chemical Structure Conversion and Search Data
- Conversion of InChI, InChIKey, and SMILES to OCRS URIs
- Chemical similarity search: type of search (Tanimoto, Tversky, Euclidean), search threshold (alpha, beta)
- Chemical substructure search
- Chemical structure exact search: options for tautomers, same skeleton including or excluding H, all isomers
- Relevance score for each structure search result returned
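For readers unfamiliar with the similarity measures named above, here is a minimal sketch of the Tanimoto and Tversky coefficients computed on sets of fingerprint features. It assumes that alpha and beta are the two Tversky weights; the fingerprints actually used by the search service are not described here, so the feature sets below are made up.

```python
def tversky(a, b, alpha, beta):
    """Tversky index: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def tanimoto(a, b):
    """Tanimoto coefficient, i.e. the Tversky index with alpha = beta = 1."""
    return tversky(a, b, 1.0, 1.0)


# Hypothetical "on" bits of two fingerprints.
query_bits = {1, 2, 3, 4}
hit_bits = {3, 4, 5}

symmetric = tanimoto(query_bits, hit_bits)            # 2 / (2 + 2 + 1)
query_weighted = tversky(query_bits, hit_bits, 1, 0)  # 2 / (2 + 2)
```

Setting beta to 0 ignores features that only the hit has, which makes the score asymmetric and closer to a "how much of the query is covered" measure.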
Target API Methods
Target Info Data URIs (ConceptWiki, UniProt, ChEMBL, DrugBank) Name, synonyms Sequence, Protein existence, Mass, Functional annotation, GO terms, UniProt curated Protein-Protein Interactions, links to PDB and IntAct, Number of residues, Theoretical pI, Cellular location ChEMBL target component description
Target Info Data Sources ConceptWiki (including WikiPathways, Pathway Ontology concepts) UniProt (including UniParc) GOA (update Sept 9, 2013) ChEMBL (92 473 target dataset triples, 438 945 target component triples) DrugBank IntAct
Target Class Data Sources GeneOntology Annotations (GOA): Classification of Targets ChEMBL Targets: Classification of Targets ENZYME: Classification of Targets
Target Pharmacology Data Target data: Name, organism, target component, target type (API method for determining 16 different target types which can be used for filtering results) Compound data: Activity (type, value, units, comment), pChEMBL, Assay (type, organism, description), URIs (OCRS, ChEMBL, ConceptWiki, DrugBank), Drug type, Generic name, SMILES, InChI, InChIKey, molecular weight, Rule of 5, Literature (DOI, PMID)
Target Class Pharmacology Activity values for targets found in a given class in the supported hierarchies: ENZYME Classification ChEMBL Target Classification GeneOntology
Pathway API Methods
Pathway Info Data URIs (WikiPathways, Pathway Ontology, ConceptWiki) Title, Description, Annotations, Organism Pathway participants Compounds Proteins Literature
Pathway Info Data Sources WikiPathways contains Curated pathways, converted KEGG pathways Metabolite identifiers: HMDB, ConceptWiki, Target identifiers: GeneID, UniProt, ConceptWiki, Ensembl Publication identifiers: DOI, PMID NCBI taxonomy URIs and textual names
Open PHACTS API Methods for Hierarchies 1.3 release methods for hierarchies and classifications: ENZYME, GeneOntology, ChEBI, and ChEMBL targets
Hierarchy Methods Visualization of hierarchy data by hierarchy structure: ENZYME ChEBI ChEMBL target classification GeneOntology
Basic hierarchy API methods
- ENZYME: EC 1.-.-.-, EC 2.-.-.-, EC 3.-.-.-, EC 4.-.-.-, EC 5.-.-.-, EC 6.-.-.-
- ChEMBL Target Hierarchy (Root)
- GeneOntology: Biological process, Molecular function, Cellular component
- ChEBI: Subatomic particle, Chemical entity, Has role
Basic hierarchy API methods DNA methylation DNA methylation or demethylation regulation of gene expression, epigenetic DNA alkylation macromolecule methylation cellular response to hypoxia regulation of transcription from RNA polymerase II promoter in response to hypoxia hypoxia-inducible factor-1alpha signaling pathway
Query of ChEMBL Target Classification
ChEMBL Target Hierarchy
- Query for Protein Kinase class: CHEMBL_PC_320
- 646 results
- Traversal of all nodes below Protein Kinase to retrieve all leaves (targets), e.g. all 3 proteins in the Histk node will be returned
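The traversal described above can be pictured as a recursive walk that collects every leaf below a class node. The sketch below uses a small toy tree; the node names echo classes mentioned in these slides, but the tree shape and the target labels are assumptions for illustration and are not the real ChEMBL hierarchy or an API response.

```python
# Toy child map: class node -> child nodes (leaves are targets).
CHILDREN = {
    "Protein Kinase": ["Atypical", "Ser_Thr"],
    "Atypical": ["Histk"],
    "Histk": ["target_1", "target_2", "target_3"],
    "Ser_Thr": ["Agc"],
    "Agc": ["target_4"],
}


def collect_leaves(node, children=CHILDREN):
    """Return the set of all leaves (targets) at or below `node`."""
    kids = children.get(node, [])
    if not kids:
        # A node without children is itself a leaf.
        return {node}
    leaves = set()
    for kid in kids:
        leaves |= collect_leaves(kid, children)
    return leaves


kinase_targets = collect_leaves("Protein Kinase")
histk_targets = collect_leaves("Histk")
```

In this toy tree a query on a parent class whose only child is `Histk` returns the same three targets as a query on `Histk` itself, because the result depends only on the leaves reachable from the node.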
Closer look: HistK class members Query with classes: Atypical: CHEMBL_PC_1451 or Histk: CHEMBL_PC_267 Same 3 results for both classes: 2 different types of targets
“Protein family” target: PKC Alpha Target Class Members List: Query for Alpha class CHEMBL_PC_317 PKC alpha (P17252) is part of the “protein family” target CHEMBL2093867, which is composed of members from several other PKC classes
Target Classification: PKC Alpha Target Classification: Query for protein target PKC alpha (P17252)
Classification API for ChEMBL targets
- A “protein family” target can be represented in several classes
- P17252 is a “single protein” target as well as part of a “protein family” target
- To retrieve only “single protein” targets, filter results by using the target_type restriction with the “single_protein” parameter
- The Target Classification API method returns all classes that have been annotated to contain the target
- P17252, as part of the “protein family” target, is found in the classes Ser_Thr, Agc, Pkc, Alpha, as well as Camk and Pkd
- P17252, as part of the “protein family” target, is found in the ENZYME classes EC 2.7.10.2 and EC 2.7.11.13
Target Classification: GO terms Target Classification method returns all GO classes that have been annotated to contain the target. Similar to ChEMBL, super classes are returned.
Query for protein target PKC alpha (P17252) Classes of compounds that have pharmacology with a target: ChEBI results Query for protein target PKC alpha (P17252) Target query: PKC Alpha Compound class result: monohydroxyquinoline
Members of ChEBI Compound Class: monohydroxyquinoline Compound Class Members List: Query for monohydroxyquinoline (ChEBI_38775) The members of the compound class monohydroxyquinoline (indacaterol, chloroxine, etc.) can be used in Compound Classifications methods
Compound Classification: Indacaterol Compound Classification: Query for compound Indacaterol (OPS1278639) Concepts from the 3 branches of the ontology can be returned. Has role concepts: beta-adrenergic agonist and bronchodilator agent Chemical entity (type) concepts: quinoline
Hierarchical Activity Clustering Use Case
Sorafenib: 2755 assay points in total
- Activity-only cutoff returned 707 results
- Human-only cutoff returned 634 results
- Single-protein-only cutoff returned 482 results
- 120 distinct human targets
But what does the distribution look like?
- When you run pharmacology on a promiscuous compound, you are going to get hits on many different targets
- You may want to cluster those hits by family, to see whether they are concentrated in particular sets of protein types
- This example starts with the compound Sorafenib and filters for human, activity-type pharmacology, and single proteins only, which reduces the pool to 120 distinct protein-compound links
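The successive cutoffs in this use case amount to a filter cascade followed by a distinct count. The following sketch shows that shape on six made-up records. The field names (`target`, `organism`, `target_type`, `activity_value`) and the reading of "activity only" as "has an activity value" are assumptions, not the actual Open PHACTS response format.

```python
# Hypothetical pharmacology records for one compound.
records = [
    {"target": "T1", "organism": "Homo sapiens", "target_type": "single_protein", "activity_value": 38.0},
    {"target": "T1", "organism": "Homo sapiens", "target_type": "single_protein", "activity_value": 90.0},
    {"target": "T2", "organism": "Homo sapiens", "target_type": "single_protein", "activity_value": None},
    {"target": "T3", "organism": "Rattus norvegicus", "target_type": "single_protein", "activity_value": 12.0},
    {"target": "T4", "organism": "Homo sapiens", "target_type": "protein_family", "activity_value": 5.0},
    {"target": "T2", "organism": "Homo sapiens", "target_type": "single_protein", "activity_value": 7.0},
]


def filter_cascade(rows):
    """Apply the three cutoffs in order and report the size after each step."""
    with_activity = [r for r in rows if r["activity_value"] is not None]
    human = [r for r in with_activity if r["organism"] == "Homo sapiens"]
    single = [r for r in human if r["target_type"] == "single_protein"]
    distinct_targets = {r["target"] for r in single}
    sizes = (len(rows), len(with_activity), len(human), len(single), len(distinct_targets))
    return sizes, distinct_targets


sizes, distinct_targets = filter_cascade(records)
```

The last number is smaller than the one before it because several assay points can hit the same target, which is also why 482 results collapse to 120 distinct targets in the Sorafenib example.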
- We developed a small script: for each target, get the hierarchy [A => B => C] and set a counter for each level [A=1, B=1, C=1]. Then get the next protein hit and its full hierarchy [A => B => D], and update the counter for each level [A=2, B=2, C=1, D=1]
- This results in a "tree node to count of distinct target proteins" mapping
- For each node, we used the basic API functionality (Hierarchies: Parent nodes) to get the full path from the top of the tree and concatenated it, then put path -> count into Excel and sorted
- The result shows that the target proteins group on the two main kinase branches, and on other specific nodes such as Tie, Src, and Eph, where there are many hits
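The original script is not reproduced in these slides, so the following is only a minimal sketch of the counting idea, using the same A/B/C/D example. The `PARENT` map stands in for repeated "Hierarchies: Parent nodes" calls, and the target names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical parent map: child node -> parent node (the root has no entry).
PARENT = {"B": "A", "C": "B", "D": "B"}


def path_from_root(node, parent=PARENT):
    """Follow parent links upward and return the path ordered from the root down."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


def count_targets_per_node(target_to_class, parent=PARENT):
    """Map each concatenated tree path to the number of distinct targets below it."""
    counts = Counter()
    for leaf_class in target_to_class.values():  # one entry per distinct target
        path = path_from_root(leaf_class, parent)
        for depth in range(1, len(path) + 1):
            counts[" => ".join(path[:depth])] += 1
    return counts


# Two protein hits: the first sits in class C, the second in class D.
counts = count_targets_per_node({"protein_1": "C", "protein_2": "D"})

# Sorted path -> count rows, ready to paste into a spreadsheet.
rows = sorted(counts.items())
```

Keying the counter on the full concatenated path rather than the bare node name keeps nodes with the same label in different branches apart, and sorting the paths groups each subtree together.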
Extra Methods Map URL: visualization of mappings between URLs Data sources: visualization of VoID for integrated data sources