Big Data Supporting Drug Discovery Cautionary Tales from the World of Chemistry for Translational Informatics Valery Tkachenko RSC-CSIR/OSDD meeting Pune, India February 3 rd 2014
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Science map
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Chemical space
Navigation in chemical space
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Structure-based Drug Design
Ligand-based Drug Design
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Machine learning
Applied machine learning
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
~30 million chemicals and growing Data sourced from >500 different sources Crowdsourced curation and annotation Ongoing deposition of data from our journals and our collaborators A structure centric hub for web-searching
ChemSpider
Properties - experimental
Properties - ACDLabs
Properties – EPI Suite
Properties - ChemAxon
Literature references
Patents references
Books
Classification
Chemical vendors and datasources
Multimedia
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
ChemSpider Reactions
ChemSpider Spectra
ChemSpider Databases ChemSpider Compounds ChemSpider Reactions ChemSpider Spectra ChemSpider Crystals ChemSpider Materials ChemSpider Assays ChemSpider Algorithms
Research data inflow
Research data outflow
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
RSC Archive – since 1841
DERA - Digitally Enabling RSC Archive
Semantic mark-up of articles
It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? Competitors? IP?
Data quality issue and CVSP –Robochemistry –Proliferation of errors in public and private databases –Automated quality control system
DrugBank dataset (6516 records) J. Brechner, IUPAC Graphical Representation of stereochem. configurations Section: ST DB06287
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Research data management
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
Crowdsourcing
AltMetrics
RSC/Rewards and Recognition Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP. The First Step badge is awarded when a user submits (& has published) their 1 st CSSP article.
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Visualization and navigation Building Global Chemistry Network
Visualization
Visualization and navigation
Big Data Chemical Space Drug Discovery pipeline Machine learning Training sets RSC/ChemSpider platforms RSC/Archive Research data management Data quality, crowdsourcing and AltMetrics Building Global Chemistry Network
We are a part of a larger world
ChemSpider APIs
National Chemistry Database
Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses. Semantic web is one of the corner stones
OSDD
Thank you Slides: