Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Nature Publishing Group 11/2008 Antony Williams.

Presentation transcript:

Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Nature Publishing Group 11/2008 Antony Williams

Building a Structure Centric Community for Chemists
Imagine a time when…
- The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
- Chemistry articles are indexed and searchable by a free online service
- The web is linked together through the “language of chemistry”
- Publicly funded research data can be shared and discussed in the open, maybe as ONS?
- Cheminformatics has as much of a public face as bioinformatics

ChemSpider: A Search Engine for Chemists
Questions a chemist might ask…
- What is the melting point of n-butanol?
- What is the chemical structure of Xanax?
- Chemically, what is phenolphthalein?
- What are the stereocenters of cholesterol?
- Where can I find publications about xylene?
- What are the different trade names for Ketoconazole?
- What is the NMR spectrum of Aspirin?
- What are the safety handling issues for Thymol Blue?
ChemSpider can answer all of these questions

What is a Structure? Ask a computer… ask a chemist

Tell Me About Glutathione (six slides)

Link outs

Links out to KEGG (Kyoto Encyclopedia of Genes and Genomes)

How many names does a compound have?

ChemSpider Data Content
- Over 21.5 million unique chemical structures from ca. 150 data sources
- Online databases: PubChem, DrugBank, KEGG, Wikipedia
- Literature: PubMed, J Het Chem, Nature, RSC, Open Access
- Chemical vendors: over 40 different vendors and growing
- Personal depositions: individual contributions
- Content database vendors
- Analytical data collections
- Patents
- Web scraping
Content is linked back to the original data sources

Other Searches
- What compounds have a mass of 300 ± 0.001?
- or search a combination of intrinsic/predicted properties
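A mass-window query like the one on this slide can be sketched as a simple tolerance filter. This is only an illustration, not ChemSpider's implementation: the `Compound` record, the `search_by_mass` helper and the three sample masses are hypothetical, and the slide does not say which kind of mass (average or monoisotopic) is searched.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; the real database schema is not shown in the slides."""
    name: str
    mass: float  # mass type (average vs. monoisotopic) is an assumption left open


def search_by_mass(compounds: List[Compound], target: float, tolerance: float) -> List[Compound]:
    """Return the compounds whose mass lies within target +/- tolerance."""
    return [c for c in compounds if abs(c.mass - target) <= tolerance]


# Invented sample data, for illustration only.
records = [
    Compound("hypothetical A", 300.0004),
    Compound("hypothetical B", 300.0020),
    Compound("hypothetical C", 299.9993),
]

hits = search_by_mass(records, target=300.0, tolerance=0.001)
hit_names = [c.name for c in hits]
```

Combining this filter with further predicates (for example on predicted properties) would give the "combination" search mentioned in the second bullet.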

Other Searches

Complex Search

The Quality of Data Online…
- Aggregating data opens up quality issues
- Structure-identifier associations are “dirty”
- Structures are COMMONLY incorrect
- Manual curation of small databases is enough work; what about millions of structures?
- Structures are far from perfect. What is a “correct structure”?
- Full stereochemistry?
- Historical timeline of structure?
- Who is the authority?

Who holds THE Quality Authority?
- Chemical Abstracts Service is the structural authority today employees, world standard in chemistry information
- 101 years of knowledge, process and expertise.
- How can an online, free access system peacefully co-exist with the authority?

Quality is a Major Issue: Search Butanol (OLD EXAMPLE, now fixed)

Wikipedia Chemistry Curation project
- Only ca organic structures, 7000 total structures
- Almost a year of work so far for a team of 6 people
- Many errors removed in the process. Curation process is a daily event for users/depositors
- Slow and torturous process
- IUPAC_Name_and_structure

Wikipedia Curation
- Looking for self-consistency across a Wikipedia page
- Primary key is the article TITLE
- The chemical shown needs to match the title
- Cyclic self-consistency, and decisions must get made

Viagra or Sildenafil

Other issues…

Charges

Sugars: Machine Readable vs Aesthetics (Haworth, Stereo, Fischer)

Wikipedia: Crowdsourcing Chemistry

Thymol Blue on ChemSpider
Data online includes:
- UV-vis spectrum
- Measured experimental properties
- Link to Wikipedia article
- Links to chromatography details
- Multiple identifiers/trade names etc.
- Links to vendors/suppliers/other databases
- Safety information

Differences between ChemSpider/Wikipedia

ChemSpider | Wikipedia
>21 million unique structures | ~5000 organics, 2000 others
Complex queries: properties, text, structure/substructure, OA publishers, data sources, … | Text
Prediction of properties | No
Analytical data | No, but links
Active depositors/curators: 30 | Active editors > 50 (?)
6000 people/day; 1900 registered | ????
Compound monographs linked | Detailed compound monographs

Differences between Wikipedia/ChemSpider

Wikipedia | ChemSpider
Supported by tried and tested MediaWiki platform | Primarily Microsoft .NET technologies with OS components
Established infrastructure and Wikimedia Foundation team | “Out of a basement” on three servers and 5 volunteers
Chemistry is a subset of the ‘Pedia | Chemistry is the focus of ‘Spider
GFDL licensing for everything | Mixed “licensing”
Strong team of WP:Chem advocates, curators and admins | Growing team of advocates, curators and users
Worldwide reputation as quality source, good and bad | Growing reputation as focused on quality

Crowd-sourcing Curation
- How to curate data for millions of structures?
- Robot processes can clean up depositions
- Search for Chloride and check molecular formula for Cl
- Check for stereochemistry and remove names with stereo
- Provide a simple-to-use platform to curate, annotate and tag data
- Provide curator administration to prevent vandalism (Veropedia)
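The "search for Chloride and check molecular formula for Cl" robot check could look roughly like the sketch below. It is a toy under stated assumptions: the `(name, formula)` record shape, the helper names and the sample records are hypothetical, and the element parser only handles plain formulae without brackets, charges or isotopes.

```python
import re
from typing import List, Set, Tuple


def elements_in_formula(formula: str) -> Set[str]:
    """Return the element symbols in a simple molecular formula such as 'C7H7Cl'."""
    # An element symbol is one capital letter optionally followed by one lowercase letter.
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_chloride_mismatches(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Flag records whose name mentions 'chloride' but whose formula lacks Cl."""
    flagged = []
    for name, formula in records:
        if "chloride" in name.lower() and "Cl" not in elements_in_formula(formula):
            flagged.append((name, formula))
    return flagged


# Invented records; the last formula is deliberately wrong for illustration.
records = [
    ("sodium chloride", "NaCl"),
    ("benzyl chloride", "C7H7Cl"),
    ("methyl chloride", "CH4"),
]

suspects = flag_chloride_mismatches(records)
```

A check like this only raises candidates for review; deciding whether the name or the structure is wrong would still fall to a curator.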

Post Comments
Anyone can “Post Comments” associated with a structure. To curate data we require login to track

Multi-level Curation and Approval

Crowd-sourcing Chemistry
- Crowd-sourced curation: identify and tag errors, edit names, synonyms, identify records for deprecation
- ALSO
- Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data)

DailyMed

Quality of Structures

Quality of Structures!!!

Structure-Centric
- We want to search “information” by structure, substructure, similarity of structure
- Specific focus on Open Chemistry at present
- Standard approaches would be:
- Identify chemical names: “entity extraction”
- Convert chemical names to structures and index
- ChemSpider has a validated dictionary of structure-name pairs
- Use name extraction, name-conversion and dictionary look-up. THEN curate.

“Entity Extraction”
- Rule-based recognition of systematic names:
- Use a lexicon of name fragments
- Rules for identifying bounds of a name
- Look-up dictionary:
- Drug names
- Trivial names
- Numbers: Registry IDs, EINECS/ELINCS
- Massive look-up dictionary of validated identifiers on ChemSpider
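The two strategies on this slide, dictionary look-up and rule-based recognition from name fragments, can be illustrated with a toy extractor. Everything here is an assumption made for the sketch: the tiny `DICTIONARY` and `FRAGMENTS` sets stand in for a validated dictionary and a fragment lexicon, tokens are split on whitespace and punctuation rather than by real name-boundary rules, and "two or more fragments" is an arbitrary threshold.

```python
import re
from typing import List, Tuple

# Hypothetical stand-ins; a real dictionary and lexicon would be far larger.
DICTIONARY = {"aspirin", "ethanol"}
FRAGMENTS = ("chloro", "methan", "amino", "phenyl", "benz", "hydroxy")

# Keep digits, commas, brackets and hyphens inside a token so that locants
# such as "3,4-" stay attached to the name they belong to.
TOKEN = re.compile(r"[A-Za-z0-9,()\-]+")


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Return (token, source) pairs, where source is 'dictionary' or 'rule'."""
    found = []
    for token in TOKEN.findall(text):
        word = token.lower()
        if word in DICTIONARY:
            found.append((token, "dictionary"))
        elif sum(fragment in word for fragment in FRAGMENTS) >= 2:
            # Toy rule: at least two known fragments suggests a systematic name.
            found.append((token, "rule"))
    return found


sentence = "The mixture was washed with dichloromethane and the patient took aspirin."
entities = extract_entities(sentence)
```

In this sketch ordinary words never match because they are neither in the dictionary nor built from two fragments; with a large real dictionary, everyday words that happen to be registered names become the main source of false positives.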


Name Recognition
Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol). The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystallized from ethanol 95% (1.28 g, 91%)

How Many Chemical Names?
“She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”

How Many Chemical Names?
Highlighted: drive, success, versed, Karate, in, tartan, dagger, spat, of, commando, advantage, release, recoil, He, aspirin, the

ChemMantis
Chemical Markup And Nomenclature Transformation Integrated System

Making Open Access Articles Searchable: Proof of Concept
- Can we HOST Chemistry Open Access articles on ChemSpider and add value?
- Can we identify chemical names in Open Access articles in a user-friendly manner?
- Can we convert names to structures in Open Access articles, expand ChemSpider and provide structure searching of Open Access chemistry articles?
- Can we provide an environment for chemists to mark up their own articles and crowd-source markup of an archive?

Document markup
- ChemSpider now hosting Open Access articles from MDPI, Molecular Diversity Preservation International
- Hosting the Molbank collection at present

A Standard for Document Markup?
- NLM-DTD: National Library of Medicine; Document Type Definition
- Approved markup definitions to apply to journal articles, extended as necessary for our purposes

NLM/DTD markup

Chemistry and Biology
Menus can be extended as necessary

Document markup

Markup: 3 seconds!

On the fly conversion

Shorthand Formulae Supported

One Click to more Info…

Structure Image Conversion

Two Seconds Later

Not Always Perfect…

A Platform for Markup
Can we provide a platform for document markup for chemists?
Workflow:
- Upload Word docs, RTF files or point to HTML and load
- Apply entity extraction, convert names to structures, mark up automatically and ask for user participation
- Publish final version with NLM-DTD markup
- Deposit all structures on ChemSpider under embargo and wait for article DOI to release
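The last workflow step, holding deposited structures under embargo until the article has a DOI, can be modelled in a few lines. This is a minimal sketch, assuming that "release" simply means the record becomes public once a DOI is attached; the `EmbargoedDeposition` class, its fields and the placeholder DOI are hypothetical and do not describe ChemSpider's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmbargoedDeposition:
    """Hypothetical record for structures extracted from one unpublished article."""
    structures: List[str]              # e.g. SMILES strings from name conversion
    article_doi: Optional[str] = None  # stays None while the article is unpublished

    @property
    def is_public(self) -> bool:
        # Assumption: a deposition is visible only once a DOI has been attached.
        return self.article_doi is not None

    def release(self, doi: str) -> None:
        """Attach the article DOI, which lifts the embargo."""
        if not doi.strip():
            raise ValueError("a non-empty DOI is required to lift the embargo")
        self.article_doi = doi


deposition = EmbargoedDeposition(structures=["CCO"])
before = deposition.is_public
deposition.release("10.0000/hypothetical-doi")  # placeholder, not a real DOI
after = deposition.is_public
```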

Challenges
- Computer software can generate chemical names better than the majority of chemists
- The majority of chemical names are generated by humans and can be incorrect: they convert to the wrong structure or are ambiguous
- One name, multiple structures

Names and Structures
- Dichloroacetone
- Trichloromethylsilane
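The "one name, multiple structures" problem can be made concrete with a look-up that returns every candidate rather than a single answer. The slide's own structure drawings are not in the transcript, so the candidates below are an illustrative reading of why these two names are ambiguous (an unspecified locant in one case, unclear bracketing in the other); the `CANDIDATES` table and helper names are hypothetical.

```python
from typing import Dict

# Candidate structures as SMILES, keyed by an unambiguous name.
CANDIDATES: Dict[str, Dict[str, str]] = {
    "dichloroacetone": {
        "1,1-dichloroacetone": "CC(=O)C(Cl)Cl",
        "1,3-dichloroacetone": "ClCC(=O)CCl",
    },
    "trichloromethylsilane": {
        "trichloro(methyl)silane": "C[Si](Cl)(Cl)Cl",
        "(trichloromethyl)silane": "[SiH3]C(Cl)(Cl)Cl",
    },
}


def resolve(name: str) -> Dict[str, str]:
    """Return all candidate structures for a name (empty if the name is unknown)."""
    return CANDIDATES.get(name.lower(), {})


def is_ambiguous(name: str) -> bool:
    """A name is ambiguous when it maps to more than one structure."""
    return len(resolve(name)) > 1


ambiguous_names = [name for name in CANDIDATES if is_ambiguous(name)]
```

Returning all candidates keeps the ambiguity visible so that a curator, rather than the converter, picks the intended structure.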

Ambiguity

Ambiguity in Abbreviations: DPA

Ambiguity in Abbreviations: THF

Import is Easy
- Make articles Public/Private (embargo date soon)
- Auto-markup and check by user

IUPAC PAC Articles

Supports Word .DOC, HTML, RTF

Drexel University Documents (three slides)

Patents

- Single configuration file defines entities for markup
- Algorithms can be built for certain entities but the majority are dictionaries: vendors, phys properties, analytical
- We can extend our system to support your needs based on dictionaries. What does NPG need/not need?

Nature Publications

Entity Balloons
- Structures are the language of chemistry
- Show structures to chemists and search/link from there

Other Dictionaries: Species
We are considering:
- Bacteria
- Fungi
- Enzymes
- Viruses
- PDB codes…

Integrations Out to Other Sources (two slides)

Reactions

Manual Curation is Always Necessary

Text-Indexing and ChemSpider?
- ChemSpider text-indexes almost 500,000 Open Access and Free Access articles
- Collection is growing and more publishers have already agreed. Including theses in the future.

Open Access Literature Search

Conclusions
- The quality of structure-based data online should always be questioned, and that includes ChemSpider
- Data on ChemSpider are being added and curated on a daily basis but we need more eyeballs helping always
- ChemSpider has a large validated structure-name dictionary
- Chemical name extraction and document markup is very enabling

Oops…