Open PHACTS 1.3 Release

Open PHACTS 1.3 Release (2 701 602 484 triples)
Blue font indicates new data and new data sources available in 1.3.

1.3 Release Summary
- Integration of WikiPathways to provide a series of pathway-based API calls
- A refresh of ChEMBL to ChEMBL_16
- Extended and improved support for units and filtering of pharmacology data
- Support for pChEMBL filtering
- Ability to query hierarchy data (GO, ChEBI, ChEMBL target, ENZYME)
- All-new chemistry processing using the Open PHACTS Chemistry Registry, with new chemistry "lenses" for flexible mapping
- Further development of KNIME and Pipeline Pilot nodes
- Further development of our support portal at support.openphacts.org

Open PHACTS 1.3 Data Content

Open PHACTS 1.3 Supported Identifiers
We now support an increased range of public database identifiers for proteins and compounds, including HGNC symbols, GeneOntology, ChEMBL ID, Ensembl, EntrezGene and many more. For the supported identifiers, see:
- openphacts.cs.man.ac.uk:9093/QueryExpander/mappingSet?lensUri=All
- Home page: openphacts.cs.man.ac.uk

Details on new methods & data

Compound API Methods

Compound Info Data
- SMILES, InChI, InChIKey
- logP, hydrogen bond acceptors and donors, Rule of 5 violations, polar surface area
- Rotatable bonds, molecular weight, molecular formula, Freebase molecular weight
- Biotransformation, description, protein binding, toxicity, melting point, drug name, drug type (approved, experimental, etc.)
- Compound name

Compound Info Data Sources
- OPS Chemical Registration System (OCRS)
- ChEMBL (ChEMBL_16): 82 003 819 molecule dataset triples
- DrugBank
- ConceptWiki (Sept 9, 2013, ChemSpider)

Compound Class Data Sources
- ChEBI: Count, List, Compound Classifications
- GeneOntology: Compound Classifications, Target Classification
- ChEMBL Targets: Target Classification (103 021 triples)
- ENZYME: Target Classification

Compound Pharmacology Data
- Compound data: activity (type, value, units, comment), pChEMBL, assay (type, organism, description), URIs (OCRS, ChEMBL, ConceptWiki, DrugBank), drug type, generic name, SMILES, InChI, InChIKey, molecular weight, Rule of 5, literature (DOI, PMID)
- Target data: name, organism, target component, target type
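pChEMBL, listed above, is the negative base-10 logarithm of a molar activity value (IC50, Ki, Kd and similar), so higher numbers mean more potent compounds. The minimal sketch below, assuming activities arrive as (type, value, units) tuples with the unit spellings shown, converts each value to molar, computes pChEMBL and applies a threshold in the spirit of the 1.3 pChEMBL filter. The helper names and record shape are hypothetical, not Open PHACTS API parameters, and ChEMBL applies further conditions before assigning a pChEMBL value that this sketch ignores.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical unit spellings mapped to their molar scale factor.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

# Activity types for which ChEMBL documents a pChEMBL value.
PCHEMBL_TYPES = {"IC50", "XC50", "EC50", "AC50", "Ki", "Kd", "Potency"}


def pchembl(activity_type: str, value: float, units: str) -> Optional[float]:
    """Return -log10(molar activity), or None when no pChEMBL applies."""
    if activity_type not in PCHEMBL_TYPES or units not in UNIT_TO_MOLAR or value <= 0:
        return None
    return -math.log10(value * UNIT_TO_MOLAR[units])


def filter_by_pchembl(
    records: List[Tuple[str, float, str]], min_pchembl: float
) -> List[Tuple[str, float, str, float]]:
    """Keep (type, value, units) records whose pChEMBL is at least min_pchembl."""
    kept = []
    for activity_type, value, units in records:
        p = pchembl(activity_type, value, units)
        if p is not None and p >= min_pchembl:
            kept.append((activity_type, value, units, round(p, 2)))
    return kept


records = [("IC50", 10, "nM"), ("Ki", 5, "uM"), ("Inhibition", 80, "%")]
print(filter_by_pchembl(records, 6.0))  # [('IC50', 10, 'nM', 8.0)]
```

The Ki record (5 uM, pChEMBL about 5.3) falls below the cutoff, and the percent-inhibition record never receives a pChEMBL value at all, which is why filtering on pChEMBL also discards activity types that cannot be compared on this scale.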

Compound Pharmacology Data Sources
- OPS Chemical Registration System (OCRS)
- ChEMBL: 184 257 585 activity dataset triples; 34 571 481 assay dataset triples
- DrugBank (765 936 triples)

Compound Class Pharmacology
Activity values for compounds from a given ChEBI compound class.

Chemical Structure Mapping Methods

Chemical Structure Conversion and Search Data
- Conversion of InChI, InChIKey and SMILES to OCRS URIs
- Chemical similarity search: type of search (Tanimoto, Tversky, Euclidean), search threshold, alpha and beta parameters
- Chemical substructure search
- Chemical structure exact search: options for tautomers, same skeleton (including or excluding H), all isomers
- Relevance score for each structure search result returned
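The three similarity types differ in how they treat bits present in only one fingerprint. As a sketch, assuming fingerprints are plain Python sets of "on" bit positions (the slides do not say which fingerprint or scoring the Open PHACTS chemistry service uses), the measures and a threshold search look like this. similarity_search is a hypothetical helper, and alpha and beta are read here as the Tversky weights.

```python
import math
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared bits divided by the bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a: Set[int], b: Set[int], alpha: float, beta: float) -> float:
    """Asymmetric similarity: alpha weights bits only in a, beta bits only in b."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def euclidean(a: Set[int], b: Set[int]) -> float:
    """Euclidean distance between the 0/1 vectors: sqrt of the differing bits."""
    return math.sqrt(len(a ^ b))


def similarity_search(
    query: Set[int],
    library: Dict[str, Set[int]],
    threshold: float,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (name, score) hits at or above threshold, best first.

    With alpha = beta = 1 the Tversky index equals the Tanimoto coefficient.
    """
    hits = []
    for name, bits in library.items():
        score = tversky(query, bits, alpha, beta)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = {1, 2, 3, 4}
library = {"x": {1, 2, 3, 4}, "y": {1, 2, 5}, "z": {7, 8}}
print(similarity_search(query, library, threshold=0.3))             # [('x', 1.0), ('y', 0.4)]
print(similarity_search(query, library, threshold=0.3, alpha=0.0))  # [('x', 1.0), ('y', 0.667)]
```

Setting alpha to 0 stops penalising query bits that a library compound lacks, so compounds whose bits are mostly contained in the query score higher (y rises from 0.4 to 0.667). The Euclidean function returns a distance, so a cutoff on it would keep results below, not above, the threshold.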

Target API Methods

Target Info Data
- URIs (ConceptWiki, UniProt, ChEMBL, DrugBank)
- Name, synonyms
- Sequence, protein existence, mass, functional annotation, GO terms, UniProt-curated protein-protein interactions, links to PDB and IntAct, number of residues, theoretical pI, cellular location
- ChEMBL target component description

Target Info Data Sources
- ConceptWiki (including WikiPathways and Pathway Ontology concepts)
- UniProt (including UniParc)
- GOA (updated Sept 9, 2013)
- ChEMBL (92 473 target dataset triples; 438 945 target component triples)
- DrugBank
- IntAct

Target Class Data Sources
- GeneOntology Annotations (GOA): classification of targets
- ChEMBL Targets: classification of targets
- ENZYME: classification of targets

Target Pharmacology Data
- Target data: name, organism, target component, target type (an API method lists the 16 different target types, which can be used to filter results)
- Compound data: activity (type, value, units, comment), pChEMBL, assay (type, organism, description), URIs (OCRS, ChEMBL, ConceptWiki, DrugBank), drug type, generic name, SMILES, InChI, InChIKey, molecular weight, Rule of 5, literature (DOI, PMID)

Target Class Pharmacology
Activity values for targets found in a given class of the supported hierarchies: ENZYME Classification, ChEMBL Target Classification, GeneOntology.

Pathway API Methods

Pathway Info Data
- URIs (WikiPathways, Pathway Ontology, ConceptWiki)
- Title, description, annotations, organism
- Pathway participants: compounds, proteins
- Literature

Pathway Info Data Sources
WikiPathways contains curated pathways and converted KEGG pathways, with:
- Metabolite identifiers: HMDB, ConceptWiki
- Target identifiers: GeneID, UniProt, ConceptWiki, Ensembl
- Publication identifiers: DOI, PMID
- NCBI taxonomy URIs and textual names

Open PHACTS API Methods for Hierarchies
The 1.3 release adds methods for these hierarchies and classifications: ENZYME, GeneOntology, ChEBI and ChEMBL targets.

Hierarchy Methods
Visualization of hierarchy data by hierarchy structure: ENZYME, ChEBI, ChEMBL target classification, GeneOntology.

Basic hierarchy API methods
Top-level nodes of each hierarchy (figure):
- ENZYME: EC 1.-.-.-, EC 2.-.-.-, EC 3.-.-.-, EC 4.-.-.-, EC 5.-.-.-, EC 6.-.-.-
- ChEMBL Target Hierarchy: (Root)
- GeneOntology: Biological process, Molecular function, Cellular component
- ChEBI: Subatomic particle, Chemical entity, Has role
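ENZYME classes follow the four-level EC numbering, in which a dash marks an unspecified level (as in the EC 1.-.-.- to EC 6.-.-.- roots above). The small sketch below works purely on the strings and does not call the Open PHACTS API; it derives the parent classes of an EC number. ec_ancestors is a hypothetical helper name.

```python
from typing import List


def ec_ancestors(ec_number: str) -> List[str]:
    """Return the ENZYME classes above an EC number, most general first."""
    code = ec_number.replace("EC", "").strip()
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated EC levels, got {ec_number!r}")
    # Levels written as "-" are unspecified; only the specified prefix has ancestors.
    specified = parts.index("-") if "-" in parts else 4
    return ["EC " + ".".join(parts[:depth] + ["-"] * (4 - depth)) for depth in range(1, specified)]


print(ec_ancestors("2.7.11.13"))  # ['EC 2.-.-.-', 'EC 2.7.-.-', 'EC 2.7.11.-']
shared = set(ec_ancestors("EC 2.7.10.2")) & set(ec_ancestors("EC 2.7.11.13"))
print(sorted(shared))             # ['EC 2.-.-.-', 'EC 2.7.-.-']
```

The two EC numbers in the usage lines are the ENZYME classes reported for PKC alpha (P17252) later in this deck; intersecting their ancestors shows where in the ENZYME tree they meet.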

Basic hierarchy API methods
GeneOntology terms shown in the example figure: DNA methylation; DNA methylation or demethylation; regulation of gene expression, epigenetic; DNA alkylation; macromolecule methylation; cellular response to hypoxia; regulation of transcription from RNA polymerase II promoter in response to hypoxia; hypoxia-inducible factor-1alpha signaling pathway.

Query of ChEMBL Target Classification
- ChEMBL Target Hierarchy query for the Protein Kinase class (CHEMBL_PC_320): 646 results
- Traversal of all nodes below Protein Kinase retrieves all leaves (targets); e.g., all 3 proteins in the Histk node will be returned
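Retrieving "all leaves below a class" is a plain tree (or DAG) traversal. The sketch below runs over a toy hierarchy, assuming the hierarchy is available as a child-list dictionary. The class names are borrowed from these slides, but the structure and the target_1 to target_4 leaves are made up, and leaves_below is a hypothetical helper rather than an Open PHACTS method.

```python
from typing import Dict, List, Set

# Toy hierarchy: class names borrowed from the slides, structure and targets made up.
CHILDREN: Dict[str, List[str]] = {
    "Protein Kinase": ["Atypical", "Agc"],
    "Atypical": ["Histk"],
    "Histk": ["target_1", "target_2", "target_3"],
    "Agc": ["Pkc"],
    "Pkc": ["Alpha"],
    "Alpha": ["target_4"],
}


def leaves_below(children: Dict[str, List[str]], node: str) -> List[str]:
    """Collect every leaf (target) reachable below node with an iterative DFS."""
    leaves: List[str] = []
    stack = [node]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current in seen:  # a node can be reached twice in a DAG
            continue
        seen.add(current)
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(current)
    return sorted(leaves)


print(leaves_below(CHILDREN, "Protein Kinase"))  # ['target_1', 'target_2', 'target_3', 'target_4']
print(leaves_below(CHILDREN, "Atypical") == leaves_below(CHILDREN, "Histk"))  # True
```

The second check mirrors the HistK example on the next slide: when a class contains a single child class, querying either one returns the same leaves.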

Closer look: HistK class members
- Query with the classes Atypical (CHEMBL_PC_1451) or Histk (CHEMBL_PC_267)
- Both classes return the same 3 results, covering 2 different types of targets

“Protein family” target: PKC Alpha
- Target Class Members List: query for the Alpha class (CHEMBL_PC_317)
- PKC alpha (P17252) is part of the “protein family” target CHEMBL2093867, which comprises members from several other PKC classes

Target Classification: PKC Alpha
Target Classification: query for protein target PKC alpha (P17252)

Classification API for ChEMBL targets
- A “protein family” target can be represented in several classes.
- P17252 is a “single protein” target as well as part of a “protein family” target.
- To retrieve only “single protein” targets, filter results using the target_type restriction with the “single_protein” parameter.
- The Target Classification API method returns all classes that have been annotated to contain the target.
- As part of the “protein family” target, P17252 is found in the classes Ser_Thr, Agc, Pkc and Alpha, as well as Camk and Pkd.
- As part of the “protein family” target, P17252 is found in the ENZYME classes EC 2.7.10.2 and EC 2.7.11.13.

Target Classification: GO terms
The Target Classification method returns all GO classes that have been annotated to contain the target. As with ChEMBL, superclasses are also returned.
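Returning superclasses means walking upward through every parent, and in GeneOntology a term can have more than one parent. Here is a minimal sketch, assuming the ontology is loaded as a child-to-parents dictionary. The edges between the term names (taken from the earlier example hierarchy slide) are illustrative and not copied from GO, and ancestors and classify_target are hypothetical helpers.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Illustrative edges between term names from the hierarchy slide; not copied from GO.
PARENTS: Dict[str, List[str]] = {
    "DNA methylation": ["DNA alkylation", "macromolecule methylation",
                        "DNA methylation or demethylation"],
    "DNA methylation or demethylation": ["regulation of gene expression, epigenetic"],
    "DNA alkylation": ["biological_process"],
    "macromolecule methylation": ["biological_process"],
    "regulation of gene expression, epigenetic": ["biological_process"],
}


def ancestors(parents: Dict[str, List[str]], term: str) -> Set[str]:
    """Breadth-first walk up a DAG in which a term may have several parents."""
    found: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.add(parent)
            queue.extend(parents.get(parent, []))
    return found


def classify_target(direct_terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Direct annotations plus every superclass, as the slide describes."""
    classes: Set[str] = set()
    for term in direct_terms:
        classes.add(term)
        classes |= ancestors(parents, term)
    return classes


print(sorted(classify_target(["DNA methylation"], PARENTS)))
```

The found set keeps a shared ancestor such as biological_process from being visited once per path, which matters in a DAG where several routes lead to the same term.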

Classes of compounds that have pharmacology with a target: ChEBI results
- Target query: PKC alpha (P17252)
- Compound class result: monohydroxyquinoline

Members of ChEBI Compound Class: monohydroxyquinoline
- Compound Class Members List: query for monohydroxyquinoline (ChEBI_38775)
- The members of the monohydroxyquinoline compound class (indacaterol, chloroxine, etc.) can be used in the Compound Classifications methods

Compound Classification: Indacaterol
- Compound Classification: query for the compound indacaterol (OPS1278639)
- Concepts from the 3 branches of the ontology can be returned
- Has role concepts: beta-adrenergic agonist and bronchodilator agent
- Chemical entity (type) concepts: quinoline

Hierarchical Activity Clustering Use Case
Sorafenib:
- 2755 assay points in total
- Activity-only cutoff: 707 results
- Human-only cutoff: 634 results
- Single-protein-only cutoff: 482 results
- 120 distinct human targets
But what does the distribution look like?
When you run pharmacology on a promiscuous compound, you get hits against many different targets. You may want to cluster those hits by family to see whether they are concentrated in particular sets of protein types. This example starts with the compound sorafenib and filters for human targets, activity-type pharmacology and single proteins only, which reduces the pool to 120 distinct protein-compound links.
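The successive cutoffs above amount to a chain of filters with a count after each step. Below is a sketch on made-up records, assuming each record carries target, value, organism and target_type fields (only the "single_protein" value comes from the slides) and reading "activity only" as "has an activity value". apply_filters is a hypothetical helper.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]

# Toy activity records; field names and values are made up for illustration.
RECORDS: List[Record] = [
    {"target": "T1", "value": 12.0, "organism": "Homo sapiens", "target_type": "single_protein"},
    {"target": "T1", "value": None, "organism": "Homo sapiens", "target_type": "single_protein"},
    {"target": "T2", "value": 3.5, "organism": "Mus musculus", "target_type": "single_protein"},
    {"target": "T3", "value": 40.0, "organism": "Homo sapiens", "target_type": "protein_family"},
    {"target": "T1", "value": 8.0, "organism": "Homo sapiens", "target_type": "single_protein"},
    {"target": "T4", "value": 1.0, "organism": "Homo sapiens", "target_type": "single_protein"},
]

# Each step is (label, predicate); records failing the predicate are dropped.
FILTERS: List[Tuple[str, Callable[[Record], bool]]] = [
    ("activity only", lambda r: r["value"] is not None),
    ("human only", lambda r: r["organism"] == "Homo sapiens"),
    ("single protein only", lambda r: r["target_type"] == "single_protein"),
]


def apply_filters(
    records: List[Record], filters: List[Tuple[str, Callable[[Record], bool]]]
) -> Tuple[List[Tuple[str, int]], Set[object]]:
    """Apply filters in order, recording how many records survive each step."""
    counts = [("total", len(records))]
    for name, keep in filters:
        records = [r for r in records if keep(r)]
        counts.append((name, len(records)))
    return counts, {r["target"] for r in records}


counts, targets = apply_filters(RECORDS, FILTERS)
print(counts)           # [('total', 6), ('activity only', 5), ('human only', 4), ('single protein only', 3)]
print(sorted(targets))  # ['T1', 'T4']
```

As in the sorafenib numbers, the last record count (3) is larger than the number of distinct targets (2), because one target can have several surviving activity points.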

- Developed a small script: for each target, get the hierarchy [A => B => C] and set a counter for each level [A=1, B=1, C=1]. Then take the next protein hit, get its full hierarchy [A => B => D], and update the counters [A=2, B=2, C=1, D=1].
- The result maps each tree node to a count of distinct target proteins.
- Use the basic API functionality (Hierarchies: Parent nodes) for each node to get the full path from the top of the tree, and concatenate it. Put path -> count in Excel and sort.
- This shows that the target proteins group on the two main kinase branches, and on other specific nodes such as Tie, Src and Eph, where there are many hits.
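The counting script described above can be sketched in a few lines, assuming the parent links are already held in a child-to-parent dictionary (standing in for repeated calls to the Hierarchies: Parent nodes method) and that each distinct target maps to one class. The tree and targets are toy data loosely named after the slide, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy single-parent tree (child -> parent); stands in for "Parent nodes" lookups.
PARENT: Dict[str, str] = {
    "Ser_Thr": "Protein Kinase",
    "Tyr": "Protein Kinase",
    "Agc": "Ser_Thr",
    "Tie": "Tyr",
    "Src": "Tyr",
}

# Toy hits: each distinct target protein mapped to the class it sits in.
TARGET_CLASS: Dict[str, str] = {"T1": "Tie", "T2": "Tie", "T3": "Src", "T4": "Agc"}


def path_from_root(parent: Dict[str, str], node: str) -> List[str]:
    """Follow parent links up to the root, then reverse to read top-down."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def count_targets_per_node(
    target_class: Dict[str, str], parent: Dict[str, str]
) -> List[Tuple[str, int]]:
    """Count distinct targets under every node, keyed by the node's full path."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for target, cls in target_class.items():
        for node in path_from_root(parent, cls):
            hits[node].add(target)  # a set, so each target counts once per node
    rows = [(" => ".join(path_from_root(parent, node)), len(targets))
            for node, targets in hits.items()]
    # Highest counts first; ties broken alphabetically by path.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


for path, count in count_targets_per_node(TARGET_CLASS, PARENT):
    print(f"{count}\t{path}")
```

Storing a set of targets per node, rather than a bare counter, keeps a target from being counted twice if it appears in several hits; the sorted, tab-separated rows play the role of the Excel sheet.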

Extra Methods
- Map URL: visualization of mappings between URLs
- Data sources: visualization of VoID for integrated data sources