Dealing with the complex challenge of managing diverse chemistry data online. Antony Williams, Valery Tkachenko, Alexey Pshenichnov and Ken Karapetyan.

Presentation transcript:

Dealing with the complex challenge of managing diverse chemistry data online. Antony Williams, Valery Tkachenko, Alexey Pshenichnov and Ken Karapetyan. ACS San Francisco, August 2014

CAS Counter http://www.cas.org/content/counter

About Me… as a Chemist: I've performed a few dozen chemical syntheses. I've run thousands of analytical spectra. I've generated thousands of NMR assignments. I've probably published less than 5% of all that work, and most of it has been lost. But things can be different today… though the data still need to be associated with me…

Think about chemistry a mo'. If we imagine that permission exists (i.e. forget IP, chemical and pharma companies etc.; think students): How many syntheses are performed? How many spectra are run? How many properties are measured? How many compounds are made? How many, how much, how big? Let's go manage it all!

Consider a shift to Openness

Open Access funder mandates… Times have changed…

Publishers are responding

The world of Open Data is here

Open Data are everywhere. Are openness and social sharing changing the world? Cultural experiments in Open Data and exchange happen almost daily, and mobile platforms enhance participation. And then what of chemistry data?

An Experiment - ChemSpider: ChemSpider allowed the community to participate in linking the internet of chemistry and in crowdsourcing data. It was a successful experiment in terms of building a central hub for integrated web search. More people are "users" than "contributors", yet basic feedback and game-play help.

An Experiment - CSSP (ChemSpider SyntheticPages)

An EPSRC Call “…the identification of the need for a UK national service for the provision of a searchable, electronic chemical database for the UK academic research community.”

National Chemical Database Service

We set a vision… Manage "all" of the chemistry data associated with chemical substances, PUBLISHED and UNPUBLISHED. Based on user-selected licensing, make the data downloadable, reusable and interactive. Build a platform that enables the scientist: data storage, validation, standardization and curation, plus collaborative data sharing. Provide a data platform that can enable and enhance the publishing of scientific papers.

Data Repository: registration of chemical compounds; deposition of chemical syntheses; addition of analytical data; integration with electronic notebooks; rewards and recognition for data sharing; document processing; hosting of data as private, embargoed or public.

Development of Data Repository: The data repository should not just be a data dump; it should not be a "big disk". It should be a searchable, integrated, segregated repository of data types, with data access levels including private, shared, embargoed and public, and it should deliver models derived from the data.
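The four access levels listed on this slide (private, shared, embargoed, public) amount to a small visibility policy. The sketch below is a hypothetical illustration of such a policy, not the RSC repository's implementation: the `Deposition` record, the `can_view` function and the rule that collaborators may see embargoed data early are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional, Set


class Access(Enum):
    # The four access levels named on the slide
    PRIVATE = "private"
    SHARED = "shared"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Deposition:
    # Hypothetical deposition record; field names are illustrative only
    owner: str
    access: Access
    shared_with: Set[str] = field(default_factory=set)
    embargo_until: Optional[date] = None


def can_view(dep: Deposition, user: str, today: date) -> bool:
    """Return True if `user` may see `dep` on `today` (assumed policy)."""
    if user == dep.owner:
        return True  # owners always see their own data
    if dep.access is Access.PUBLIC:
        return True
    if dep.access is Access.SHARED:
        return user in dep.shared_with
    if dep.access is Access.EMBARGOED:
        # Assumption: named collaborators see embargoed data early;
        # everyone else sees it once the embargo date is reached.
        if user in dep.shared_with:
            return True
        return dep.embargo_until is not None and today >= dep.embargo_until
    return False  # PRIVATE: owner only
```

For example, a deposition embargoed until 1 January 2015 is hidden from other users in August 2014 and becomes visible on the embargo date. Keeping the policy in one pure function makes it easy to test independently of storage.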

New Repository Architecture doi: 10.1007/s10822-014-9784-5

New Repository Architecture

Input data pipeline

Compounds

Reactions

Analytical data

Crystallography data

For Deposition of Data: quality of data at source means ensuring that chemicals are correct (VALIDATION); that reactions map and balance as appropriate (VALIDATION and STANDARDIZATION); that file formats for analytical data types are handled, since binary file formats are proprietary (STANDARDIZATION); and that data are validly interpreted (VALIDATION and ANNOTATION).
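One part of reaction validation, checking that a reaction balances, can be sketched by counting atoms on each side. This is a simplified illustration, not the platform's actual check: it assumes plain molecular formulas without parentheses, hydrates or charges, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# One element symbol followed by an optional count, e.g. "C2", "Cl", "O"
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O' or 'CH3COOH'."""
    if _TOKEN.sub("", formula):
        # Leftover characters mean the formula is outside this sketch's scope
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for element, n in _TOKEN.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def side_counts(side: Iterable[Tuple[int, str]]) -> Counter:
    """Sum atom counts over (coefficient, formula) pairs."""
    total: Counter = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(reactants, products) -> bool:
    return side_counts(reactants) == side_counts(products)
```

For instance, acetic acid plus ethanol giving ethyl acetate and water, `is_balanced([(1, "C2H4O2"), (1, "C2H6O")], [(1, "C4H8O2"), (1, "H2O")])`, balances; leaving out the water does not. Atom mapping, the other half of the slide's requirement, needs structures rather than formulas and is beyond this sketch.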

Input data pipeline

Depositions Gateway User Interface

Deposition of Data

Validate and Standardize

CVSP Filtering

CVSP Filtering of DrugBank

ChEMBL (1.3 million records): 11,020 records with 4 bonds and zero charge (e.g. CHEMBL501101 or CHEMBL501973); 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine; 6,177 records where the direction of a bond makes no sense (e.g. CHEMBL12760 and CHEMBL34704).
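Issues like "4 bonds and zero charge" or hypervalent atoms can be caught by comparing each atom's bond-order sum with a usual maximum valence. The sketch below shows the idea on a hypothetical minimal molecule representation; it is not CVSP's rule set. The valence table is an assumption, only neutral atoms are checked, and only explicit bonds are counted (implicit hydrogens are ignored).

```python
from typing import Dict, List, Tuple

# Assumed usual maximum valence for *neutral* atoms; P and S are given
# their expanded valences so phosphates and sulfones pass this sketch.
NEUTRAL_MAX_VALENCE: Dict[str, int] = {
    "H": 1, "B": 3, "C": 4, "N": 3, "O": 2,
    "F": 1, "Cl": 1, "Br": 1, "I": 1, "P": 5, "S": 6,
}

Atom = Tuple[str, int]       # (element symbol, formal charge)
Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def valence_problems(atoms: List[Atom], bonds: List[Bond]) -> List[str]:
    """List neutral atoms whose explicit bond orders exceed the table."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    problems = []
    for idx, (element, charge) in enumerate(atoms):
        limit = NEUTRAL_MAX_VALENCE.get(element)
        if charge == 0 and limit is not None and used[idx] > limit:
            problems.append(
                f"atom {idx} ({element}): valence {used[idx]} > {limit} "
                "with zero charge"
            )
    return problems
```

As a hypothetical example, a tetramethylammonium ion drawn with a neutral nitrogen (four single bonds to carbon) is flagged, while the same structure with a +1 formal charge on nitrogen passes.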

Depositions User Interface

The challenges of analytical data: Vendors produce complex proprietary data formats, so standard formats are required (JCAMP, NetCDF, AniML). ChemSpider already hosts thousands of JCAMP spectra, support for "assigned spectra" is in place, and data validation approaches are understood. But there is a myriad of analytical data types…
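JCAMP-DX files are plain text built from labelled data records of the form `##LABEL=value`, with `$$` introducing comments. The sketch below reads only those header records, stopping where the numeric data table starts. It assumes a single-block file, and its label normalization (upper-casing and removing spaces) is a simplification of the JCAMP-DX label-matching rules; decoding the compressed data tables is not attempted.

```python
from typing import Dict

# Labels after which numeric data follow (a subset, chosen for this sketch)
DATA_LABELS = ("XYDATA", "PEAKTABLE", "XYPOINTS")


def parse_jcamp_header(text: str) -> Dict[str, str]:
    """Collect ##LABEL=value records up to the first data table."""
    header: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # drop $$ comments
        if not line.startswith("##") or "=" not in line:
            continue  # continuation or data line
        label, value = line[2:].split("=", 1)
        key = label.strip().upper().replace(" ", "")
        header[key] = value.strip()
        if key in DATA_LABELS:
            break  # a full reader would decode the data table here
    return header
```

Applied to a small 1H NMR file, this yields keys such as `TITLE`, `DATATYPE` and `.OBSERVENUCLEUS`, which a deposition system could check against the compound and experiment being deposited.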

ChemSpider ID 24528095 1H NMR

ChemSpider ID 24528095 13C NMR

ChemSpider ID 24528095 H,H-COSY

ChemSpider ID 24528095 HSQC

ChemSpider ID 24528095 HMBC

Managing Assignments?

Depositions User Interface

Depositions from ELNs: development work integrating chemistry into the Southampton LabTrove notebook, including stoichiometry table development and analytical data integration. "ChemTrove" was rolled out to a small test group in January.
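A notebook stoichiometry table essentially converts each reagent's mass into an amount and expresses it as equivalents of a reference reagent. The sketch below is a hypothetical illustration of that arithmetic, not ChemTrove's code; the `Reagent` class, the `stoichiometry_table` function and the choice of reference reagent by name are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reagent:
    name: str
    mass_g: float
    molar_mass: float  # g/mol

    @property
    def mmol(self) -> float:
        # amount (mmol) = 1000 * mass (g) / molar mass (g/mol)
        return 1000.0 * self.mass_g / self.molar_mass


def stoichiometry_table(
    reagents: List[Reagent], reference: str
) -> List[Tuple[str, float, float]]:
    """Return (name, mmol, equivalents) rows relative to `reference`."""
    by_name = {r.name: r for r in reagents}
    if reference not in by_name:
        raise ValueError(f"unknown reference reagent: {reference}")
    ref_mmol = by_name[reference].mmol
    return [(r.name, r.mmol, r.mmol / ref_mmol) for r in reagents]


if __name__ == "__main__":
    rows = stoichiometry_table(
        [Reagent("benzoic acid", 1.2212, 122.12),
         Reagent("methanol", 0.6408, 32.04)],
        reference="benzoic acid",
    )
    for name, mmol, eq in rows:
        print(f"{name:<14}{mmol:8.2f} mmol{eq:6.2f} eq")
```

Here 1.2212 g of benzoic acid is 10.00 mmol and 0.6408 g of methanol is 20.00 mmol, i.e. 2.00 equivalents. Capturing the table in structured form, rather than as free text, is what would let such values be validated on deposition.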

Document deposition/processing

Experimental data checker

User Interface Approach

User Interface Approach

Display Widgets

Work in Progress

Work in Progress

User Interface Approach

A Compounds Repository Interface

A Reactions/Document Interface

The PharmaSea Website

The Open PHACTS community ecosystem

Open Source Drug Discovery India

What can drive participation? What can drive scientists to participate and contribute? Ensuring provenance of their data for reuse; mandates from funding agencies; improved systems to ease contribution; additional contributions to science; improved publishing processes; recognition for contributions.

Scientists are Increasingly Quantified…

AltMetrics as Scientist Impact

AltMetrics

Detailed Usage Statistics

Rewards and Recognition: The First Step badge is awarded when a user submits (and has published) their 1st CSSP article. "Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said 'A journey of a thousand miles begins with a single step'. In the same way, we hope that this will be the first of many submissions that you make to CSSP."

http://orcid.org/0000-0002-2668-4821

AltMetrics Feeds: For our data repository, we will ensure that contributed data feed out to the AltMetrics platforms. Every data point, and every data download, use and reuse, will be associated with the scientist. Data will be DOI'ed (presently under review), and the services provided will allow for AltMetrics use.
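Associating every data point with a scientist depends on reliable identifiers such as the ORCID iD on the previous slide. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a repository can reject mistyped iDs at deposition time. The sketch below implements that published checksum; the function names are hypothetical, and it checks format only, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept a bare iD or an orcid.org URL; verify length and checksum."""
    digits = orcid.rsplit("/", 1)[-1].replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()
```

Both `is_valid_orcid("0000-0002-2668-4821")` and the URL form `is_valid_orcid("http://orcid.org/0000-0002-2668-4821")` pass, while changing the final digit makes the check fail.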

What do we have in place? We are testing an early form of the data repository on our own data: ChemSpider and our archive of publications. We are working with collaborators to define needs and testing and enhancing deposition systems. In place: a chemical validation and standardization platform and analytical data handling formats. And lots more in development…

The Challenges Ahead: Chemistry is NOT just nicely defined structures! There are materials, minerals, substances attached to beads, polymers and ambiguous materials, as well as domain-specific measurements. File format standards are limited in application. Scientists must be encouraged to free up their data through AltMetrics, open data mandates and better systems. And the data explosion continues.

But it's not easy, of course. Not everything we would like for data handling is in place yet. Many systems, tools and platforms are already available, but we don't know about them, and even if we did, contributing is "more work". "What's in it for me?", "It's my data", "It's too much work", "What credit do I get?"

And yes…we know…

Thank you. Email: williamsa@rsc.org; ORCID: 0000-0002-2668-4821; Twitter: @ChemConnector; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams