PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Global Entrez Search Page: All[Filter]

Overall Goal: an online resource providing comprehensive information on the biological activities of small molecules

Why Are Small Molecules Important?
- Constituents of all macromolecules (DNA, RNA, proteins, carbohydrates, etc.)
- Serve as cofactors and signaling molecules for thousands of proteins
- The chemistry part of “biochemistry”
- Most drugs are small molecules
- Most biomarkers used in clinical chemistry are small molecules

PubChem Databases and Tools

The Molecular Libraries Roadmap: An Integrated Initiative
- Chemical Diversity
- Technology Development
- Screening Instrumentation
- Assay Development
- Predictive ADMET
- Compound Repository (MLSMR)
- Informatics
- Cheminformatics Research Centers
- Molecular Libraries Screening Centers Network (MLSCN)

PubChem =
- Repository for small molecules and bioactivity assay data
- Part of the Entrez search and linking system
- Links to other NCBI databases, e.g.:
  - PubMed, MeSH
  - Protein structures (MMDB)
  - Protein/nucleotide sequences (GenPept/GenBank)
- Contains complete chemical structures:
  - Standardized for uniformity
  - Small set of computed properties
  - Structure similarity searching
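Because PubChem is part of Entrez, its databases can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an esearch request URL and does not send it; it assumes the Entrez database names pccompound, pcsubstance and pcassay, and the helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the three PubChem databases.
PUBCHEM_DBS = {
    "substance": "pcsubstance",
    "compound": "pccompound",
    "bioassay": "pcassay",
}


def esearch_url(database: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch URL for an Entrez query on one PubChem database."""
    params = {"db": PUBCHEM_DBS[database], "term": term, "retmax": retmax}
    # urlencode percent-encodes the quotes, brackets and colons in the query term.
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url("compound", '"dopamine"[CompleteSynonym]'))
```

Sending such a request (for example with urllib.request.urlopen) returns, by default, an XML document listing the matching UIDs.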

Other Depositors to PubChem (and more…)

PubChem: Bird’s Eye View. Diagram elements: Depositors, PubChem Substance, PubChem Compound, PubChem BioAssays, Chemical Structure Similarity

How does data get into PubChem?

PubChem Integration in Entrez
- Linked record types: Protein Sequences, Literature, 3D Structures, Small Molecule Structures, Bioactivity Assay Results
- Neighboring methods: VAST Structure Similarity, Term Frequency Statistics, Chemical Structure Similarity, Activity Profile Similarity

Primary Database

Depositor Data
- No “global” rules or standards:
  - Based on organizational needs
  - Lots of data overlap
  - Often based on individual scientists' preferences
- PubChem accepts data from many organizations:
  - Previously unseen data representations
  - A combinatorial explosion of ways to draw the same structure

Redundancy, Mixtures

Derivative Database

Chemical Structures May Be Represented in Many Different Ways

Compound vs. Substance

Stereochemistry in Compound vs. Substance records: known stereochemistry; unknown stereo; unknown E/Z isomers

Substances (Heterogens) from Protein 3D Structures (PDB)
- The PDB lacks chemical detail:
  - No bond-order information
  - No hydrogens
- Each deposited structure receives:
  - Bond information
  - Hydrogens
  - Stereochemistry (where possible)
- Result: most molecules come out right, even complex ones (e.g., vancomycin, dopamine)
- Sometimes problems need fixing, e.g., bond orders (heme bond orders had to be fixed)

PubChem Compound Processing
- Chemical data verification:
  - Atom description (is the label a valid element?)
  - Functional group clean-up
  - Atom valence verification to prevent nonsense structures
- “Normalize” and “standardize”:
  - Valence-bond canonicalization (for tautomer invariance)
  - Aromaticity detection and self-consistency
  - Stereochemistry detection
  - Explicit hydrogen assignment
- Calculation:
  - 2-D coordinate generation
  - Image depictions
  - Fingerprints
  - IUPAC name
  - SMILES, InChI, hash codes
  - XLogP, TPSA (topological polar surface area), HBD/HBA (hydrogen bond donors/acceptors), MW (molecular weight), MF (molecular formula)
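Atom valence verification can be illustrated with a toy check. This is a minimal sketch, not PubChem's actual procedure: it assumes explicit hydrogens, neutral atoms, and a small hypothetical table of maximum valences (elements outside the table raise KeyError), and it flags any atom whose summed bond orders exceed that maximum.

```python
# Hypothetical maximum valences for a few neutral elements. A simplification:
# real toolkits also handle formal charges, hypervalent S and P, radicals, etc.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_errors(atoms: list[str], bonds: list[tuple[int, int, int]]) -> list[int]:
    """Return indices of atoms whose summed bond orders exceed the allowed valence.

    atoms: element symbols, with hydrogens listed explicitly
    bonds: (i, j, order) tuples using 0-based atom indices
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        # Each bond contributes its order to both of its atoms.
        used[i] += order
        used[j] += order
    return [k for k, element in enumerate(atoms) if used[k] > MAX_VALENCE[element]]


# Formaldehyde, H2C=O: every atom is within its valence.
print(valence_errors(["C", "O", "H", "H"], [(0, 1, 2), (0, 2, 1), (0, 3, 1)]))
# A carbon with five single bonds to hydrogen is flagged.
print(valence_errors(["C"] + ["H"] * 5, [(0, k, 1) for k in range(1, 6)]))
```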

Chemical Structure “Sanitization”
- Chemical structures that fail sanitization:
  - Are not part of the aggregated PubChem Compound database
  - Are still searchable via the PubChem Substance database
- Keeps the PubChem Compound database “clean” for cheminformatics analysis
- Collapses structures represented in various ways into a uniform, identical representation
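The Substance-to-Compound collapse can be pictured as grouping records by a standardized structure key. In this sketch the keys are assumed to come from an upstream standardization step, a failed sanitization is modeled as `None`, and the function name, SIDs and keys are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def collapse_substances(substances: Iterable[tuple[int, Optional[str]]]):
    """Group substance SIDs by standardized structure key.

    Each item is (sid, key); key is None when the structure failed sanitization.
    Returns (compounds, substance_only): compounds maps key -> sorted SIDs, and
    substance_only lists SIDs that remain searchable only as Substance records.
    """
    compounds: dict[str, list[int]] = defaultdict(list)
    substance_only: list[int] = []
    for sid, key in substances:
        if key is None:
            substance_only.append(sid)
        else:
            compounds[key].append(sid)
    return {key: sorted(sids) for key, sids in compounds.items()}, sorted(substance_only)


# Hypothetical SIDs: two depositions of the same structure, one that failed sanitization.
records = [(101, "KEY_A"), (102, "KEY_A"), (103, None), (104, "KEY_B")]
print(collapse_substances(records))
```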

Compound for a Mixture and Its Component Compounds

Components of a Mixture
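In SMILES, the disconnected components of a mixture are separated by a dot, so a first approximation of component extraction is a split on '.'. This sketch assumes no ring-closure bond spans a dot and does no canonicalization, so two differently written SMILES for the same component would not be merged; the function name is hypothetical.

```python
def mixture_components(smiles: str) -> list[str]:
    """Split a dot-disconnected SMILES into its distinct components, keeping first-seen order.

    Assumes no ring-closure bond spans a '.', and compares components as plain
    strings (no canonicalization).
    """
    components: list[str] = []
    for part in smiles.split("."):
        if part and part not in components:
            components.append(part)
    return components


print(mixture_components("[Na+].[Cl-]"))  # sodium chloride written as two ions
print(mixture_components("CCO.O.O"))      # ethanol with two waters
```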

Substance vs. Compound: Substance summary and Compound summary pages

Examples of Queries
- "InChI=1/Ca.3H2O/h;3*1H2/q+2;;;/p-3/fCa.3HO/h;3*1h/qm;3*-1"[InChI]
- 200[MW]
- 300:500[MW]
- "dopamine"[CompleteSynonym]
- "pcsubstance structure"[Filter]
- "ca"[Element] AND 300:500[MW] AND "chemidplus"[SourceName]
- "lipinski"[Filter] AND "antineoplastic agents"[PharmAction]

Lipinski rule of 5: a molecule is more likely to be orally active (drug-like) if it has:
- No more than 5 hydrogen bond donors (OH and NH groups)
- No more than 10 hydrogen bond acceptors (N or O atoms)
- A molecular weight under 500 Da
- A logP not greater than 5
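The rule-of-five thresholds translate directly into a checker. This sketch follows Lipinski's criteria (at most 5 donors, at most 10 acceptors, MW under 500, logP not above 5) and returns the list of violated criteria, so a caller can require zero violations or tolerate one; it does not claim to reproduce how PubChem's "lipinski"[Filter] is computed, and the function names are hypothetical.

```python
def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> list[str]:
    """List the rule-of-five criteria that a molecule fails."""
    violations = []
    if hbd > 5:
        violations.append("more than 5 H-bond donors")
    if hba > 10:
        violations.append("more than 10 H-bond acceptors")
    if mw >= 500:
        violations.append("molecular weight 500 or more")
    if logp > 5:
        violations.append("logP greater than 5")
    return violations


def passes_lipinski(mw: float, logp: float, hbd: int, hba: int) -> bool:
    """Strict reading: no criterion may be violated."""
    return not lipinski_violations(mw, logp, hbd, hba)


print(lipinski_violations(mw=650.0, logp=6.2, hbd=2, hba=8))
print(passes_lipinski(mw=180.2, logp=1.2, hbd=1, hba=4))
```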

Examples of PubChem Index Fields…
- All [ALL] -- All of the following fields are searched; the default search field.
- Uid [UID] -- The integer represents the SID in the PCSubstance database. By default, an integer without a field alias is recognized as a UID. Same as [SID].
- Filter [Filter] -- Limits the records to various indexed filters.
- ActiveAid [AA] -- Active BioAssay identifier, integer.
- ActiveAidCount [AC, ACNT] -- Number of bioassays in which the record tested active.
- AtomChiralCount [ACC, ACCNT] -- Total count of chiral atoms in a given compound.
- BioAssayID [BAID, AID] -- BioAssay identifier.
- BondChiralCount [BCC, BCCNT] -- Number of chiral bonds.
- Comment [CMT] -- Substance or bioassay comment.
- CompleteSynonym [CSYN, CSYNO] -- Exactly matching name for a substance/compound.
- CompoundID [CID] -- Compound identifier, integer.
- DepositDate [DDAT, DEPDAT] -- Deposition timestamp for a substance.
- Element [ELMT, EL] -- Chemical element in a substance/compound.
- ExactMass [EMAS, EXMASS] -- The calculated mass of an ion or molecule with the most likely isotopic composition for a single random molecule, corresponding to the mass of the most intense ion/molecule peak in a mass spectrum. A real number.
- HeavyAtomCount [HAC, HACNT] -- Count of atoms in a compound other than hydrogen, integer.
- HydrogenBondAcceptorCount [HBAC, HBACNT] -- Hydrogen bond acceptors for a compound, integer.
- HydrogenBondDonorCount [HBDC, HBDCNT] -- Hydrogen bond donors for a compound, integer.
- InChI [inchi] -- IUPAC International Chemical Identifier.
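Field tags compose into query strings like the ones in the query examples. The helpers below are hypothetical conveniences that reproduce the quoting, range (low:high) and AND conventions those examples use; they do not escape quotes inside values.

```python
def field_term(value: str, field: str) -> str:
    """Quote a value and attach an Entrez field tag, e.g. '"ca"[Element]'."""
    return f'"{value}"[{field}]'


def range_term(low: float, high: float, field: str) -> str:
    """Build a numeric range term, e.g. '300:500[MW]'."""
    return f"{low:g}:{high:g}[{field}]"


def all_of(*terms: str) -> str:
    """Combine query terms with Boolean AND."""
    return " AND ".join(terms)


query = all_of(
    field_term("ca", "Element"),
    range_term(300, 500, "MW"),
    field_term("chemidplus", "SourceName"),
)
print(query)  # "ca"[Element] AND 300:500[MW] AND "chemidplus"[SourceName]
```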

Examples of PubChem Index Fields, contd.
- IUPACName [UPAC, IUPAC] -- Standard IUPAC name for a compound.
- MeSHDescription [MHD]
- MeSHTerm [MSHT, MESHT] -- Medical Subject Heading term.
- MeSHTreeNode [MSHN, MESHTN] -- Medical Subject Heading tree node (tree structures).
- MolecularWeight [MW, MWT, MOLWT] -- Mass of a molecule calculated using the average mass of each element, weighted by its natural isotopic abundance. E.g., carbon has two natural isotopes, 12 and 13, with relative abundances of 98.9% and 1.1%, giving an average mass of 12.011 g/mol. A real number.
- MonoisotopicMass [MMAS, MIMASS] -- Mass of a molecule calculated using the mass of the most abundant isotope of each element. E.g., carbon has a monoisotopic mass of 12.000 g/mol. A real number.
- PharmAction [PHMA, PHARMA] -- MeSH pharmacological action heading.
- RotatableBondCount [RBC, RBCNT] -- Number of rotatable bonds.
- SourceCategory [SRCC, SRCCAT, SRCCATG] -- Depositor categories.
- SourceID [SRID, SRCID] -- Depositor's external ID.
- SourceName [SRC, SRCNAM, SRCNAME] -- Official depositor name.
- SubstanceID [SID] -- Substance ID. Same as [UID].
- Synonym [SYNO] -- Synonyms for a substance.
- TautomerCount [TC, TCNT, TTMC] -- Possible tautomer count for each given structure, ≤ 200.
- TotalFormalCharge [TFC, CHG, CHRG] -- Total formal charge.
- TPSA [TPSA] -- Topological polar surface area.
- XLogP [XLGP, LOGP] -- Computed octanol/water partition coefficient (logP).
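The MolecularWeight and MonoisotopicMass definitions differ only in how each element's mass is chosen: an abundance-weighted average versus the mass of the most abundant isotope. A minimal sketch, with rounded isotope masses and abundances hard-coded for H and C only (a real implementation would use a full isotope table); all names are hypothetical.

```python
# Isotope masses (u) and natural abundances, rounded; only H and C are included.
ISOTOPES = {
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "C": [(12.000000, 0.989), (13.003355, 0.011)],
}


def average_mass(element: str) -> float:
    """Abundance-weighted mean of the isotope masses (basis of MolecularWeight)."""
    return sum(mass * abundance for mass, abundance in ISOTOPES[element])


def monoisotopic_mass(element: str) -> float:
    """Mass of the most abundant isotope (basis of MonoisotopicMass)."""
    return max(ISOTOPES[element], key=lambda iso: iso[1])[0]


def formula_masses(formula: dict[str, int]) -> tuple[float, float]:
    """Return (molecular weight, monoisotopic mass) for an element -> count mapping."""
    mw = sum(n * average_mass(el) for el, n in formula.items())
    mono = sum(n * monoisotopic_mass(el) for el, n in formula.items())
    return mw, mono


def heavy_atom_count(formula: dict[str, int]) -> int:
    """Count atoms other than hydrogen (as in HeavyAtomCount)."""
    return sum(n for el, n in formula.items() if el != "H")


print(round(average_mass("C"), 3))  # 12.011
mw, mono = formula_masses({"C": 1, "H": 4})  # methane
print(round(mw, 3), round(mono, 4), heavy_atom_count({"C": 1, "H": 4}))
```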

Preview/Index Tab

History Tab: substances within a given MW range (Da) that have antineoplastic properties and obey the Lipinski rule of 5

Links: for the whole set or only for selected records

Property Report

SDF Format
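Records downloaded in SDF format carry their properties as data items: a header line of the form '> <NAME>', the value line(s), and a blank line, with each record terminated by '$$$$'. A minimal pure-Python reader for those data items is sketched below; it skips the connection table, the function name is hypothetical, and the sample record is made up (PUBCHEM_COMPOUND_CID and PUBCHEM_MOLECULAR_FORMULA are used as illustrative field names).

```python
def sdf_data_fields(sdf_text: str) -> list[dict[str, str]]:
    """Return the data items of each record in an SD file as a name -> value dict.

    Multi-line values are joined with newlines. The molfile block (atoms and
    bonds) that precedes the data items is skipped.
    """
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # text after the final '$$$$'
        fields: dict[str, str] = {}
        name = None
        values: list[str] = []
        for line in block.splitlines():
            if line.startswith(">") and "<" in line:
                start = line.index("<") + 1
                end = line.find(">", start)
                name = line[start:end] if end != -1 else line[start:]
                values = []
            elif name is not None:
                if line.strip():
                    values.append(line)
                else:  # a blank line closes the current data item
                    fields[name] = "\n".join(values)
                    name = None
        if name is not None:  # last item had no trailing blank line
            fields[name] = "\n".join(values)
        records.append(fields)
    return records


# A hypothetical one-record file with an empty connection table.
SAMPLE_SDF = """example
  hypothetical record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PUBCHEM_COMPOUND_CID>
12345

> <PUBCHEM_MOLECULAR_FORMULA>
C8H11NO2

$$$$
"""

print(sdf_data_fields(SAMPLE_SDF))
```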

Medical Subject Headings (MeSH)
- MeSH is the National Library of Medicine's controlled vocabulary thesaurus.
- It consists of sets of terms naming descriptors in a hierarchical and alphabetic structure, e.g., “Mental Disorders”, “Pharmacological action”, “Catecholamine hormones”.
- Permits searching at various levels of specificity.
- The MeSH thesaurus is used for indexing articles for the MEDLINE/PubMed database.
- MeSH is continually updated.
- PubChem assigns MeSH headings to Compound records.
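Searching at various levels of specificity works because MeSH tree numbers are dot-separated paths, so a descendant's tree number begins with its ancestor's. The sketch below uses hypothetical tree numbers and term names (not real MeSH assignments) to show the prefix test.

```python
def under_node(tree_number: str, node: str) -> bool:
    """True if a MeSH tree number equals `node` or lies beneath it in the hierarchy."""
    # Requiring the trailing '.' stops "X01.1" from matching "X01.100".
    return tree_number == node or tree_number.startswith(node + ".")


def terms_under(index: dict[str, list[str]], node: str) -> list[str]:
    """Return the terms having at least one tree number at or below `node`, sorted."""
    return sorted(
        term for term, numbers in index.items()
        if any(under_node(number, node) for number in numbers)
    )


# Hypothetical tree numbers for illustration only.
INDEX = {
    "Broad Category": ["X01"],
    "Narrower Category": ["X01.100"],
    "Specific Term": ["X01.100.200"],
    "Unrelated Term": ["X02.300"],
}

print(terms_under(INDEX, "X01"))      # broad search: the whole subtree
print(terms_under(INDEX, "X01.100"))  # narrower search
```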

PubChem—Substance, Compound, BioAssay  Contains bioactivity screens of chemical substances described in PubChem Substance  Provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to a screening protocol  Depositor decides on data definitions and interpretation  Data can be plotted as graphs of statistical histograms  Cross-indexed to other Entrez databases Primary Database

NCBI FTP >> PubChem Folder

Entrez PubChem: Help and Tabs

Brief Summary
- PubChem is part of the NIH Molecular Libraries Roadmap for Medicine Initiative
- PubChem consists of 3 databases (Substance, Compound and BioAssay) and a powerful structure search engine
- Substance = samples; Compound = calculated structures and properties
- PubChem is integrated into NCBI's Entrez search and linking system of databases
- Records are indexed using a number of terms
- Records are linked to each other and to other databases at NCBI

For More Information…
- General telephone: Voice: +1 (301); Fax: +1 (301); addresses
- The (free!) NCBI Newsletter
- The NCBI Handbook
- The NCBI Education Page (follow the link from the NCBI Home Page)