ChemModLab: A Web-based Cheminformatics Modeling Laboratory S. Stanley Young + ECCR and ChemSpider Teams.

ChemSpider: A Web-based Chemical Informatics Resource

3 What is ChemSpider?
ChemSpider is a molecular structure-centric web service for chemists:
- Chemical structure drawing, manipulation, visualization, modeling & databasing
- Web location to deposit, curate and enhance data associated with chemical structures
- Web structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data

4 How do people generally use ChemSpider?
- Searching for chemical structures, in rank order, via:
  - Registry numbers, trade names and synonyms
  - Structure identifiers such as SMILES or InChI
  - Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  - Systematic names: IUPAC or CAS Index name
- Generation of physicochemical properties
- Text-based searching of Open Access articles
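
The mass-based search mentioned above can be illustrated with a minimal sketch. This is not ChemSpider's implementation: the in-memory `RECORDS` table, the function names and the 5 ppm default tolerance are all assumptions made for illustration. The idea is to compute a monoisotopic mass from an elemental composition and return the records that fall inside a tolerance window around a measured mass.

```python
# Monoisotopic masses (u) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "F": 18.9984032,
}


def monoisotopic_mass(composition):
    """composition maps element symbol -> atom count, e.g. {"C": 17, "H": 18}."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def mass_search(records, target_mass, tol_ppm=5.0):
    """Return (name, mass) pairs within tol_ppm of target_mass, closest first."""
    window = target_mass * tol_ppm / 1e6
    hits = [(name, monoisotopic_mass(comp)) for name, comp in records.items()]
    hits = [(name, mass) for name, mass in hits if abs(mass - target_mass) <= window]
    return sorted(hits, key=lambda hit: abs(hit[1] - target_mass))


# Hypothetical in-memory stand-in for a structure database.
RECORDS = {
    "fluoxetine": {"C": 17, "H": 18, "F": 3, "N": 1, "O": 1},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "aspirin": {"C": 9, "H": 8, "O": 4},
}

for name, mass in mass_search(RECORDS, 309.1340):
    print(name, round(mass, 4))
```

A ppm window scales with the target mass, which is why it is usually preferred over a fixed absolute tolerance for this kind of lookup.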

5 ChemSpider Status August 2007
- Online database of over 16.5 million structures
- Systems in place for:
  - Single structure and data collection depositions
  - Association of analytical data with structures
  - Ability to curate data for each individual record
- Indexing of and integration to:
  - Over 70 individual databases
  - Patents from the US, European and Asian patent offices
- Text-based searching of over 50,000 Open Access articles
- Over a thousand unique users access ChemSpider per day

6 Flexible Boolean Searching

7 Predicted Properties Details “Prozac”

8 Search result: 49 hits in 2.8 seconds

9 Integrated Visualization Tools

10 External Integrations - Wikipedia The links between Wikipedia and ChemSpider are formed automatically

11 What is ChemModLab?
ChemModLab is a web service for building and evaluating QSAR models.
- Send your data: assay results and an SD file.
- Use any or all of five descriptor types (2D), or use your own descriptors.
- Use any or all of 16 statistical modeling methods.
- Predict the potency of untested compounds.

12 Virtual Screening ChemSpider ChemModLab

13 ChemModLab Dialog (1) Data Input

14 ChemModLab Dialog (2) Five 2D Descriptor Sets

15 ChemModLab Dialog (3) 16 Modeling Methods

16 ChemModLab Modeling Methods
16 statistical modeling methods:
- Trees: randomForest, rpart, tree
- Neural networks
- k-nearest neighbors
- Support vector machines
- Partial least squares
- Partial least squares with linear discriminant analysis
- Least angle regression
- Ridge regression
- Elastic net
- Principal components regression
- Family ensemble of k-nearest neighbors, using 70% selection
- Family ensemble of tree, using 70% selection
- Family ensemble of rpart, using 70% selection
- randomForest using 70% selection
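
Choosing "any or all" of the five descriptor sets and 16 methods amounts to fitting a grid of descriptor/method combinations. The sketch below only enumerates that grid; it is not ChemModLab code. The method labels are taken from the slide, while the descriptor-set names are hypothetical placeholders, since this slide gives only their count.

```python
from itertools import product

# Hypothetical placeholder names: the slides give the count (five 2D sets),
# not a definitive list of descriptor-set names.
DESCRIPTOR_SETS = [f"descriptor_set_{i}" for i in range(1, 6)]

# Labels follow the slide; "(70%)" marks the variants using 70% selection.
METHODS = [
    "randomForest", "rpart", "tree",
    "neural network",
    "k-nearest neighbors",
    "support vector machine",
    "partial least squares",
    "partial least squares + LDA",
    "least angle regression",
    "ridge regression",
    "elastic net",
    "principal components regression",
    "family ensemble of k-nearest neighbors (70%)",
    "family ensemble of tree (70%)",
    "family ensemble of rpart (70%)",
    "randomForest (70%)",
]


def model_grid(descriptor_sets, methods):
    """Every (descriptor set, modeling method) pair a full run would fit."""
    return list(product(descriptor_sets, methods))


grid = model_grid(DESCRIPTOR_SETS, METHODS)
print(len(grid))  # 5 descriptor sets x 16 methods = 80 candidate models
```

Selecting a subset of either list shrinks the grid accordingly, for example two descriptor sets and three methods give six models to compare.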

17 ChemModLab + ChemSpider Plan
1. The user submits data to ChemModLab to get QSAR model(s).
2. The model is sent to ChemSpider.
3. ChemSpider computes a “virtual screen”.
4. The hit list is clustered and sent to the user.
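
The four steps of the plan can be sketched end to end on toy data. Everything here is an assumption made for illustration: the slides do not say which model is transferred or how ChemSpider clusters hits, so a k-nearest-neighbor scorer on Tanimoto similarity stands in for the QSAR model and a greedy leader algorithm stands in for the clustering step. Fingerprints are represented as sets of on-bits.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(fp, training, k=3):
    """Mean activity of the k training compounds most similar to fp."""
    nearest = sorted(training, key=lambda t: tanimoto(fp, t[0]), reverse=True)[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


def virtual_screen(library, training, threshold=0.5, k=3):
    """Score every library compound; keep those at or above threshold, best first."""
    scored = [(name, fp, knn_score(fp, training, k)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[2] >= threshold]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def leader_cluster(hits, cutoff=0.5):
    """Greedy leader clustering: a hit joins the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of hits; element 0 is the leader
    for hit in hits:
        for cluster in clusters:
            if tanimoto(hit[1], cluster[0][1]) >= cutoff:
                cluster.append(hit)
                break
        else:
            clusters.append([hit])
    return [[name for name, _, _ in cluster] for cluster in clusters]


# Toy data (hypothetical): (fingerprint, activity) pairs and a small library.
TRAINING = [
    ({1, 2, 3}, 1), ({1, 2, 4}, 1), ({2, 3, 4}, 1),
    ({7, 8, 9}, 0), ({8, 9, 10}, 0), ({20, 21, 22}, 1),
]
LIBRARY = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 4, 5},
    "cmpd_C": {7, 8, 9, 10},
    "cmpd_D": {20, 21, 23},
}

hits = virtual_screen(LIBRARY, TRAINING)   # step 3: the "virtual screen"
clusters = leader_cluster(hits)            # step 4: cluster the hit list
print(clusters)
```

In the plan on the slide the model is built in ChemModLab and the screen runs inside ChemSpider; the sketch collapses both sides into one process purely to show the data flow.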

18 Accumulation Curves: compare descriptor sets, given a method

19 Accumulation Curves: compare modeling methods, given a descriptor set
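
An accumulation curve, as the term is commonly used in screening, plots the number of actives found against the number of compounds tested when compounds are tested in order of decreasing predicted score. The sketch below computes such curves for two models; the scores and activity labels are made-up illustration data, not results from the talk.

```python
def accumulation_curve(scores, is_active):
    """Cumulative count of actives found when testing in order of decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, found = [], 0
    for i in order:
        found += is_active[i]
        curve.append(found)
    return curve


# Hypothetical predictions from two models on the same six held-out compounds.
actives = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]
model_b = [0.3, 0.9, 0.8, 0.7, 0.2, 0.1]

print(accumulation_curve(model_a, actives))  # [1, 2, 3, 3, 3, 3]
print(accumulation_curve(model_b, actives))  # [0, 1, 1, 2, 2, 3]
```

A curve that rises faster finds the actives earlier, so in this toy case model_a would be preferred. Comparing descriptor sets for a fixed method, or methods for a fixed descriptor set, means overlaying one such curve per candidate.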

20 Diversity Map Cluster Active Compounds Modeling Methods

21 Continuous Response

22 Continuous Response

23 Continuous Response

24 Model Evaluation
Take detailed looks at which models? AID348 (NCGC):
- KNN – Ph
- ENet – CAP
- RF – B#
- RF – CAP
- RF – FF
- Tree – CAP
- Tree – Ph
- Tree – FF
- PLS – CAP

25 Summary
1. ChemSpider is a web chemical informatics center.
2. ChemModLab is a free web service for QSAR.
3. Together they support sophisticated virtual screening.
* ChemModLab is supported by the NCI RoadMap project.

26 Group
ChemModLab Team: Jacqueline M. Hughes-Oliver, Atina D. Brooks, Gary W. Howell, Kirtesh Patil, Stan Young, Qianyi Zhang
ChemSpider Team: Antony Williams (project lead); a rotating team of advisors and developers, including many contributions from the Open Source community
eccr.stat.ncsu.edu