Crowdsourced Curation of Chemistry Data. How Bad is Online Chemistry Data? Antony Williams Wolfram Summit, September 2010.


A Pragmatic Vision “Build a Structure Centric Community”
- Integrate chemistry across the internet based on “chemical structure”
- A “structure-based hub” to information and data
- Let chemists contribute their own data
- Allow the community to curate/correct data

We Answer Questions for Chemists
Questions a chemist might ask…
- What is the melting point of n-heptanol?
- What is the chemical structure of Xanax?
- Chemically, what is phenolphthalein?
- What are the stereocenters of cholesterol?
- Where can I find publications about xylene?
- What are the different trade names for Aspirin?
- What is the NMR spectrum of Benzoic Acid?
- What are the safety handling issues for toluene?

Search for a Chemical…by name

Available Information…
- Linked to vendors, safety data, toxicity, metabolism

Available Information….

Search for chemicals

ChemSpider Today
- 24.8 million structures
- 400 data sources
- Grows daily
- Community annotation and curation
- We curate, edit, change, enhance data daily

Linked Data on the Web

Three Years of Experience
- Internet-based chemistry is a mess!
- Most public compound databases on the web are contaminated. Including ours!
- The annotation/curation of data online is difficult
- Most database hosts are non-responsive to feedback – “We are a host/repository of data”
- Who cares?

Where is chemistry online?
- Encyclopedic articles (Wikipedia)
- Chemical vendor databases
- Metabolic pathway databases
- Property databases
- Patents with chemical structures
- Drug Discovery data
- Scientific publications
- Compound aggregators
- Blogs/Wikis and Open Notebook Science

What is the Structure of Vitamin K?

MeSH – Medical Subject Headings
- A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione). Vitamin K3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.

What is the Structure of Vitamin K1?

Chemical Abstracts “Common Chemistry” Database

Wikipedia

Incorrect Structures

Wow!

Lack of Stereochemistry

Does stereochemistry matter?
- Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

PubChem

“2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”
Variants of systematic names on PubChem
- 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
- 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
- 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
- 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
- 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
- 2-methyl-3-[(E)-3,7,11,15-tetramethyl
- 2-methyl-3-(3,7,11,15-tetramethyl
- 2-methyl-3-[(E)-3,7,11,15-tetramethyl
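The PubChem name variants on this slide differ only in their stereodescriptors. As a rough illustration (not how PubChem or ChemSpider actually compare records), the sketch below strips those descriptors from the truncated names exactly as they appear on the slide and shows that they all collapse to one stereo-free skeleton. The regular expression and the helper names `stereo_descriptor` and `skeleton` are assumptions made for this example; real deduplication would compare structures, not name strings.

```python
import re

# Truncated systematic names exactly as shown on the slide (not full names).
VARIANTS = [
    "2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
    "2-methyl-3-(3,7,11,15-tetramethyl",
]

# Matches a stereodescriptor prefix such as "(E)-" or "(E,7R,11R)-".
# Locant lists like "(3,7,11,15-" do not match because they lack R/S/E/Z.
STEREO = re.compile(r"\((?:[EZ]|\d+[RS])(?:,(?:[EZ]|\d+[RS]))*\)-")


def stereo_descriptor(name):
    """Return the descriptor text, e.g. 'E,7R,11R', or '' if none is present."""
    match = STEREO.search(name)
    return match.group(0)[1:-2] if match else ""


def skeleton(name):
    """Drop stereodescriptors and normalise square brackets to parentheses."""
    stripped = STEREO.sub("", name)
    return stripped.replace("[", "(").replace("]", ")")


# Group every variant under its stereo-free skeleton name.
groups = {}
for name in VARIANTS:
    groups.setdefault(skeleton(name), []).append(stereo_descriptor(name) or "none")

for key, descriptors in groups.items():
    print(key, "->", descriptors)
```

Every variant lands in a single group, which is the point of the slide: one name without stereochemistry can stand for several distinct, partially specified, or unspecified stereoisomers.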

ChEBI – Manual Curation

What’s Methane?

What ELSE is Methane???

The EXPERTS must get it right?!

Wikipedia, C&E News, PubChem C&E News (from ACS)

Internet-Based Chemistry is a Mess
- Algorithms can get you so far
- Human curation is necessary
- Only the crowds can help with big data… ChemSpider is approaching 25 million compounds

Search “Vitamin H”

“Curate” Identifiers

- General curation activities
  - Remove incorrect names
  - Correct spellings
  - Add multilingual names
  - Add alternative names
- In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually
- 130 people have participated in validation or annotation. “Crowds” can be quite small!

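The curation activities listed above (removing incorrect names, adding alternative and multilingual names, validating structure-identifier relationships) can be pictured with a small data model. This is a minimal sketch under stated assumptions: the class names, the three status values, and the record ID are hypothetical and are not ChemSpider's actual schema or API. The example record uses biotin, for which “Vitamin H” is an alternative name.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    name: str
    language: str = "en"
    # Hypothetical states: "unvalidated", "validated", or "rejected".
    status: str = "unvalidated"


@dataclass
class CompoundRecord:
    record_id: int
    identifiers: list = field(default_factory=list)

    def add_name(self, name, language="en"):
        """Add an alternative or multilingual name as an unvalidated identifier."""
        ident = Identifier(name, language)
        self.identifiers.append(ident)
        return ident

    def set_status(self, name, status):
        """Record a curator's decision about one structure-identifier relationship."""
        for ident in self.identifiers:
            if ident.name == name:
                ident.status = status

    def visible_names(self):
        """Names still shown to users: everything that has not been rejected."""
        return [i.name for i in self.identifiers if i.status != "rejected"]


record = CompoundRecord(record_id=1)        # hypothetical record ID
record.add_name("Biotin")
record.add_name("Vitamin H")                # alternative name
record.add_name("Vitamin K1")               # deliberately wrong name for this record
record.add_name("Biotine", language="fr")   # multilingual name

record.set_status("Biotin", "validated")
record.set_status("Vitamin K1", "rejected")  # curator removes the incorrect name

print(record.visible_names())
```

Keeping rejected names in the record with a status, instead of deleting them, is one possible design choice: it leaves a trace of what was wrong so the same error is not re-imported from a source database.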
Crowdsourced “Annotations”
- Registered Users can add
  - Descriptions/Syntheses/Commentaries
  - Links to articles, blogs, wikis etc
  - Add spectral data
  - Add photos
  - Add MP3 files
  - Add Videos

Data Validation – Not Vitamin K1

Data Validation – Not Beclomethasone Dipropionate (DailyMed Article)

Data Validation …NOT Cholesterol

Data Validation – ONE Cymarin
Question Quality in Big Databases

First request to Database Hosts!
- Every public compound database host should add ONE feature – “Leave Comments”

Second request to Database Hosts! Show Comments
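The two requests to database hosts, letting users leave comments and then showing them, amount to a very small feature. The sketch below is only an illustration of that idea: `CommentStore`, its method names, and the record ID are assumptions made for this example and do not describe any real database's interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


class CommentStore:
    """In-memory comments keyed by a database record ID (hypothetical)."""

    def __init__(self):
        self._by_record = defaultdict(list)

    def leave_comment(self, record_id, author, text):
        """First request: let any user attach feedback to a record."""
        text = text.strip()
        if not text:
            raise ValueError("comment text must not be empty")
        self._by_record[record_id].append(Comment(author, text))

    def show_comments(self, record_id):
        """Second request: display the feedback next to the record."""
        return list(self._by_record.get(record_id, []))


store = CommentStore()
store.leave_comment("record-42", "curator1", "Depicted structure lacks stereochemistry.")

for comment in store.show_comments("record-42"):
    print(f"{comment.author}: {comment.text}")
```

Showing comments alongside the record matters as much as collecting them: a visible note warns the next reader even before the host corrects the underlying data.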

Always Question Online Chemistry

Thank you
Twitter: ChemConnector
Blog:
Personal Blog:
SLIDES: