Royal Society of Chemistry activities to develop a data repository for chemistry-specific data Aileen Day, Alexey Pshenichnov, Ken Karapetyan, Colin Batchelor,

Presentation transcript:

Royal Society of Chemistry activities to develop a data repository for chemistry-specific data
Aileen Day, Alexey Pshenichnov, Ken Karapetyan, Colin Batchelor, Peter Corbett, Jon Steele, Valery Tkachenko and Antony Williams
ACS Dallas, March 2014

Data in a Scientific Publication
This is not new, you know the story…
So much data of value is contained within a publication and delivered in PDF form.
PDF files, and especially unclear licensing, don’t let me get at the data so I can rework, reuse, repurpose, text mine etc.
I specialize in XXXX. I want a database of YYYY extracted from publications and made available, for free, with the capabilities I need, and the publishers should just do it.

Many useful discussions…

Many good visions…

And over the years, progress…
There is much progress with open access, data access, licensing, enhanced articles, open data, free online tools, open source codes, publishers waking up, scientists contributing.
We should be excited at what is available now, what the future holds, what opportunities exist in front of us.

But it’s NOT easy… technology

But it’s not easy… US
Not everything we would like around data handling is there, for sure.
Many systems, tools and platforms are already available, but we don’t know about them, or even if we did, contributing is “more work”.
“What’s in it for me?”, “It’s my data”, “It’s too much work”, “What credit do I get?”

An Initial “Vague” Vision Set
Manage “all” of the chemistry data associated with chemical substances
Data to be downloadable, reusable, interactive
Build a platform that enables the scientist
Data storage, validation, standardization and curation
Collaborative data sharing
Provide a data platform that can enable and enhance publishing of scientific papers

Data Repository
Registration of chemical compounds
Deposition of chemical syntheses
Addition of analytical data
Integration to electronic notebooks
Rewards and recognition for data sharing
Document processing
Hosting of data as private, embargoed or public
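The "private, embargoed or public" hosting modes on this slide can be illustrated with a small access check. This is only a sketch under assumptions: the slides do not describe how the repository models visibility, so the `Deposition` class, its fields and the rule that an embargo lifts on a fixed date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Deposition:
    """Hypothetical record for one deposited dataset (not the RSC schema)."""
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when EMBARGOED

    def can_view(self, user: str, today: date) -> bool:
        # The depositor can always see their own data.
        if user == self.owner:
            return True
        if self.visibility is Visibility.PUBLIC:
            return True
        # Assumption: embargoed data opens to everyone once the date is reached.
        if self.visibility is Visibility.EMBARGOED and self.embargo_until is not None:
            return today >= self.embargo_until
        return False
```

A collaborative mode (sharing with a named group) would need an extra list of permitted users; it is left out here to keep the sketch short.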

Solving for Authors

I hate text mining data
DERA: Developing pipelining tools for text-mining so we will be able to process documents for mark-up
Compound extraction/markup
Reaction extraction/conversion
Convert “text spectra” to generate spectral libraries… AGGHHHHH!

“Where is the real data please?” FIGURE DATA

Data Preferences - total bias
Views of a spectroscopist
Give me the data – an interactive, downloadable spectrum is way more valuable to me (processed spectrum and FID available)
Spectral header in JCAMP standard is very incomplete (and most spectral standards)
I want ASSIGNED/ANNOTATED spectra if possible – don’t “textify” a spectrum!

Solving the problem here…
Binary file formats are problematic – think of the variations in instrumentation and software
Standards can be defined – are they correctly implemented?
CIF and its Checking, Spectral standards - JCAMP versions, Structure formats, etc…
Metadata is crucial

…and what does it solve?
“Fixing the data” – data can’t be faked as easily
Reprocessing of analytical data can be done… weighting functions, baseline correction, deconvolution etc.
I can convert and store it locally
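To make the reprocessing point concrete, here is a minimal sketch of one such step, baseline correction, applied to a downloaded list of intensities. It assumes the simplest possible model (a straight-line baseline anchored on the two ends of the spectrum); the function name and the `edge` parameter are illustrative and are not taken from the presentation or from any particular spectroscopy package.

```python
from statistics import mean
from typing import List, Sequence


def subtract_linear_baseline(y: Sequence[float], edge: int = 3) -> List[float]:
    """Subtract a straight line anchored on the mean of the first and last `edge` points."""
    n = len(y)
    if edge < 1 or n < 2 * edge:
        raise ValueError("need at least 2 * edge points and edge >= 1")
    left = mean(y[:edge])
    right = mean(y[-edge:])
    # Anchor positions are the centres of the two edge windows.
    x_left = (edge - 1) / 2
    x_right = n - 1 - (edge - 1) / 2
    slope = (right - left) / (x_right - x_left)
    return [yi - (left + slope * (i - x_left)) for i, yi in enumerate(y)]
```

This only works when the raw numbers are available; a spectrum published as an image or as a text peak list cannot be reprocessed this way, which is the argument the slide is making.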

But solve it for many things
I want molecules as structure formats not images
Please don’t make us hack tables of data
Tell us how you generated your files – software version, software libraries, etc.
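The last request on this slide (record the software version and libraries used to generate a file) can be met with a small provenance record saved next to the data. The sketch below uses only the Python standard library; the field names are an assumption for illustration and do not follow any metadata standard mentioned in the talk.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata
from typing import Dict, Iterable


def provenance_record(libraries: Iterable[str]) -> Dict[str, object]:
    """Describe the software environment that produced a data file."""
    versions: Dict[str, object] = {}
    for name in libraries:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed; recorded explicitly
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "libraries": versions,
    }


def to_json(record: Dict[str, object]) -> str:
    """Serialize the record so it can be written as a sidecar file."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Calling `to_json(provenance_record(["numpy"]))` gives a JSON document that could be deposited alongside the data file it describes.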

Input data pipeline

Depositions Gateway User Interface

Document processing

Input data pipeline

Depositions Gateway User Interface

User Interface Approach

Addition of Analytical Data
Spectral Container is in development using componentized widgets for display
NIST spectra converted into standardized JCAMP format for deposition - 296,103 spectra deposited
10% of remaining NIST spectra need to be curated as there are obvious structure issues
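Since the deposited spectra are in JCAMP format, a first curation check is to read the header records and flag missing fields. JCAMP-DX files store metadata as `##LABEL=value` lines; the sketch below reads those lines and reports which of an assumed short list of core labels is absent. The `REQUIRED` list and the function names are illustrative only, multi-line values are not handled, and this is not the validation used by the RSC pipeline.

```python
import re
from typing import Dict, List

# Assumed minimal set of core header labels, for illustration only.
REQUIRED = ["TITLE", "JCAMPDX", "DATATYPE", "ORIGIN", "OWNER"]


def normalize(label: str) -> str:
    """Upper-case a label and drop spaces, dashes, slashes and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_header(text: str) -> Dict[str, str]:
    """Collect single-line ##LABEL=value records; data lines are skipped."""
    header: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("##"):
            continue
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue
        # Text after "$$" is treated as an inline comment and dropped.
        header[normalize(label)] = value.split("$$")[0].strip()
    return header


def missing_labels(header: Dict[str, str]) -> List[str]:
    """Return the assumed-required labels that are absent or empty."""
    return [key for key in REQUIRED if not header.get(key)]
```

A file whose header lacks, say, an owner or origin would be flagged for curation before deposition rather than rejected silently.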

Electronic Notebook Data
Development work integrating chemistry into the Southampton Labtrove notebook
Stoichiometry table development
Analytical data integration
“ChemTrove” rolled out to a small test group in January

Present activities – ACS Fall
Deposition process development of compounds, reactions and spectral data by Spring
FTP, DropBox, Web-upload, ELN integration
Compounds, Reactions, Spectral data search, display, download
Data sharing – private, public, collaborative
Metadata, metadata, metadata standards!
Open Sourcing CRD and CVSP

Acknowledgments
Jeremy Frey and Simon Coles, University of Southampton
Will Dichtel and Leah McEwan, Cornell University
Stuart Chalk, University of North Florida
Bob Hanson and Bob Lancashire, Jmol and JSpecView

Thank you ORCID: Twitter: Personal Blog: SLIDES: