ChemBank Building a Public Web Resource Using Daycart Erik Brauner Head of Chemical and Biological Computing Harvard Institute of Chemistry and Cell Biology.

Presentation transcript:

ChemBank Building a Public Web Resource Using Daycart Erik Brauner Head of Chemical and Biological Computing Harvard Institute of Chemistry and Cell Biology Eli and Edith L. Broad Institute June 10, 2004

The Institute of Chemistry and Cell Biology (ICCB)
The ICCB is an academic small-molecule screening facility located at Harvard Medical School, with the goals of:
– Enabling academic labs to perform high-throughput chemical screens
– Creating small-molecule libraries for screening
– Advancing the field of chemical genetics
– Creating a public database: ChemBank

The Dual Roles of ChemBank
To handle internal needs:
– Support tools for high-throughput screeners
– Support for chemists and library synthesis
As a public web resource:
– Freely available assay data and information on compounds relevant to chemical genetics
ChemBank had to satisfy the needs of both chemists and biologists.

ChemBank Status
– Publicly available at: chembank.med.harvard.edu
– >900,000 structures in the database
– >5,000 known bioactives with annotation in the database
– Selected assay data also available
– Supports similarity queries and substructure searching

Structure Sources
– Commercial libraries
– Outside databases
– Curated structures
– DOS libraries
– Virtual library files with undecoded structures (no plate mapping), decoding tag pattern, building blocks, reagents

SD File Handling
SD file → mol2tdt.sh (contrib code) → TDT file → custom Perl script → XML file
This becomes the official load record.
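The first step of such a pipeline is reading the SD file itself. The slides do not show the conversion scripts, so the following is only a minimal Python sketch of the general idea, not the ICCB code: it assumes the standard SD file layout, in which each record ends with a `$$$$` line and data items appear as `> <NAME>` headers followed by value lines and a blank line. The function names and the sample records are hypothetical.

```python
def split_sd_records(text):
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def read_data_fields(record_lines):
    """Collect '> <NAME>' data items from one record into a dict."""
    fields, name = {}, None
    for line in record_lines:
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            # Header line of a data item, e.g. "> <SUPPLIER>"
            name = line[line.index("<") + 1 : line.rindex(">")]
            fields[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line closes the current data item
            else:
                fields[name].append(line)
    return {key: "\n".join(value) for key, value in fields.items()}


# Hypothetical two-record sample (empty connection tables, made-up fields)
SAMPLE = """compound-1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SUPPLIER>
ExampleCo

> <PLATE>
1001

$$$$
compound-2
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SUPPLIER>
OtherCo

$$$$
"""

records = split_sd_records(SAMPLE)
for record in records:
    print(record[0], read_data_fields(record))
```

Keeping the data fields alongside each structure is what lets a later step write them into a load record; how the original pipeline mapped them into TDT and XML is not described in the slides.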

A Simple Structure Table

    create table COMPOUND (
        id                 number not null,
        smiles             varchar2(4000),
        molecular_weight   float,
        molecular_formula  varchar2(20),
        primary key (id)
    );

    -- needed for exact SMILES matching
    create index COMPOUND_idx1 on COMPOUND(smiles)
        indextype is c$dcischem.ddexact;

    -- needed for fingerprints to support similarity
    create index COMPOUND_idx2 on COMPOUND(smiles)
        indextype is c$dcischem.ddblob;

Basic Manipulations

Inserting compounds:

    insert into COMPOUND (id, smiles, molecular_weight, molecular_formula)
    values (1, smi2cansmi('CCC', 1), smi2amw('CCC'), smi2mf('CCC'));

Similarity search:

    select id, tanimoto(smiles, 'O=C=O') as similarity
    from COMPOUND
    where tanimoto(smiles, 'O=C=O') >= 0.8
    order by similarity desc;

Substructure search:

    select id from COMPOUND
    where contains(smiles, 'O=C=O') = 1;
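To make the similarity query concrete: the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The Python sketch below illustrates that formula and the thresholded, sorted search from the query above. It is not how Daycart computes it internally; the fingerprints are represented as plain sets of bit positions, and the bit sets, compound ids, and function names are all made up for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(query_bits, library, threshold=0.8):
    """Return (id, similarity) pairs at or above the threshold, best first."""
    hits = [(cid, tanimoto(query_bits, bits)) for cid, bits in library.items()]
    hits = [(cid, sim) for cid, sim in hits if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints keyed by compound id
LIBRARY = {
    1: {1, 2, 3, 4, 5},
    2: {1, 2, 3, 4},
    3: {7, 8},
}

for cid, sim in similarity_search({1, 2, 3, 4, 5}, LIBRARY):
    print(cid, sim)
```

Compound 3 shares no bits with the query and is filtered out by the 0.8 threshold, mirroring the `where ... >= 0.8` clause in the SQL.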

Structure Loading
Are these structures of the same compound? (The slide's structure drawings are not preserved in this transcript.)

Salt Stripping in Daycart
Daycart supports salt stripping via the function vcs_desalt(smiles, iso, class), which works in conjunction with a built-in salt table.
Ex:

    insert into salt values ('Sodium', '[Na+]', 0, NULL)
    VCS_DESALT('[Na+].c1ccccc1', 0, 0)
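The idea behind table-driven salt stripping can be sketched in a few lines of Python. This is a simplified illustration, not Daycart's implementation: it assumes each dot-separated SMILES component can be compared as a literal string against the salt table, whereas a real system would compare canonical forms. The table entries and the function name are hypothetical.

```python
# Hypothetical salt table: SMILES of components to remove
SALT_TABLE = {"[Na+]", "[K+]", "[Cl-]"}


def strip_salts(smiles, salts=SALT_TABLE):
    """Drop dot-separated components that exactly match a salt-table entry."""
    parts = smiles.split(".")
    kept = [part for part in parts if part not in salts]
    # If every component matched (e.g. plain NaCl), keep the input unchanged
    return ".".join(kept) if kept else smiles


print(strip_salts("[Na+].c1ccccc1"))
print(strip_salts("[Na+].[Cl-]"))
```

Returning the input untouched when every component is a salt is a design choice made for this sketch, so that a compound that is itself a simple salt does not become an empty structure.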

Normalization in Daycart
Nitro and azide normalization can be achieved easily using reaction SMIRKS.
Ex:

    [*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]

Normalization in Daycart
Daycart supports normalization via the function vcs_normalize(smiles, iso, class), which works in conjunction with a built-in transform table.
Ex:

    insert into transform values ('Nitros', '[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]', 'FORWARD', 0, NULL)
    VCS_NORMALIZE('CCN(=O)(=O)', 0, 0)

Acknowledgements
ICCB Informatics Group:
– Jeremy Muhlich
– Jason McIntosh
– Carol Chang
– Andrew Lach
– Justin Klekota
ICCB Chemistry Group:
– John Tallarico
– Jared Shaw
ICCB Screening Group:
– Caroline Shamu
– Nicky Tolliday
National Cancer Institute
Tudor Oprea
Daylight
The University of New Mexico School of Medicine