1
3. Chemical Data and Databases
2
Datasets and Databases
Many small datasets are available
Several commercial databases of compounds and reactions exist (e.g., CAS)
Large, though not comprehensive, public databases of compounds are just starting to become available
As of today, there is no large public database of reactions
3
Data: Small Datasets (examples)
Mutag (mutagenicity)
  – 188 compounds (125 mutagenic / 63 non-mutagenic), mutagenicity in Salmonella typhimurium
PTC (Predictive Toxicology Challenge)
  – A few hundred compounds, carcinogenicity in female mice (FM), male mice (MM), female rats (FR), and male rats (MR)
NCI (anti-cancer activity)
  – 70,000 compounds screened for their ability to inhibit growth of 60 human tumor cell lines
Alkanes (boiling points)
  – All 150 acyclic alkanes ($\mathrm{C}_n\mathrm{H}_{2n+2}$) with $n \le 10$, and their boiling points (range [-164, 174] °C)
Benzodiazepines (QSAR)
  – 79 1,4-benzodiazepin-2-ones, affinity towards the GABA$_\mathrm{A}$ receptor
Solubility (Delaney) and octanol/water partition coefficient (XLogP)
  – 1440 compounds (Delaney); 1991 compounds (XLogP)
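The figure of 150 acyclic alkanes with up to 10 carbons can be checked by enumeration: each alkane skeleton is an unlabeled tree whose vertices (carbons) have degree at most 4. The sketch below is a brute-force illustration, assuming hydrogens are implicit and stereoisomers are not counted; it grows every tree by one carbon and removes duplicates with a canonical string. The function names are hypothetical and not part of the dataset.

```python
from typing import Dict, FrozenSet, List, Tuple

Edges = FrozenSet[Tuple[int, int]]  # carbon-carbon bonds between atoms 0..n-1


def canonical(n: int, edges: Edges) -> str:
    """Isomorphism-invariant string for an unlabeled tree on n vertices."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def encode(v: int, parent: int) -> str:
        # AHU-style code of the subtree rooted at v: sorted child codes in parentheses
        return "(" + "".join(sorted(encode(c, v) for c in adj[v] if c != parent)) + ")"

    # Taking the minimum over all roots makes the string independent of atom numbering
    return min(encode(root, -1) for root in range(n))


def count_acyclic_alkanes(max_carbons: int) -> List[int]:
    """counts[i] = number of constitutional isomers of C_nH_{2n+2} with n = i + 1."""
    level: Dict[str, Edges] = {canonical(1, frozenset()): frozenset()}  # methane
    counts = [len(level)]
    for n in range(1, max_carbons):
        grown_level: Dict[str, Edges] = {}
        for edges in level.values():
            degree = [0] * n
            for a, b in edges:
                degree[a] += 1
                degree[b] += 1
            for v in range(n):
                if degree[v] < 4:  # carbon is tetravalent; hydrogens fill the rest
                    grown = edges | {(v, n)}  # attach new carbon n to carbon v
                    grown_level.setdefault(canonical(n + 1, grown), grown)
        level = grown_level
        counts.append(len(level))
    return counts


counts = count_acyclic_alkanes(10)
print(counts)       # [1, 1, 1, 2, 3, 5, 9, 18, 35, 75]
print(sum(counts))  # 150
```

Every tree with maximum degree 4 arises from a smaller such tree by adding one leaf, so growing level by level reaches all skeletons; the canonical string then collapses atom renumberings of the same skeleton.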
4
Large Databases
Private/commercial
  – Example: CAS Registry (Chemical Abstracts Service, American Chemical Society) [tens of millions of compounds]
  – Expensive and cannot be "mined"
  – Cambridge Structural Database (CSD) [crystallographic structures, ~350K]
More recent trends
  – Example: eMolecules (formerly Chmoogle)
  – Free search engine, but cannot be "mined"
5
CAS Chemical Registry
6
Growth of the CAS Chemical Registry System
7
Large "Public" Databases
ZINC (UCSF)
ChemBank (Harvard)
PubChem (NIH)
ChemDB (UCI): http://cdb.ics.uci.edu
  – J. Chen, S. J. Swamidass, Y. Dou, J. Bruand, and P. Baldi. ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources. Bioinformatics, 21, 4133–4139 (2005).
8
Example of Large Public DB: ChemDB
~5M unique compounds
Commercially available compounds
PostgreSQL/Oracle back end
Annotation (experimental, computational)
Searchable web interface
Similarity search, in silico reactions, …
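Similarity search over a compound collection is commonly done by comparing binary fingerprints with the Tanimoto coefficient. The sketch below assumes fingerprints are already computed and stored as sets of on-bit indices; the compound IDs, bit values, and the `k` and `threshold` parameters are hypothetical and do not reflect ChemDB's actual schema or interface.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints: score 0 by convention


def similarity_search(
    query: Fingerprint,
    db: Dict[str, Fingerprint],
    k: int = 10,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (compound_id, score) pairs with score >= threshold, best first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in db.items()]
    hits = [(cid, s) for cid, s in hits if s >= threshold]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # highest score first, ties broken by ID
    return hits[:k]


# Hypothetical toy collection; a real system would hold millions of precomputed fingerprints
db = {
    "mol_A": frozenset({1, 4, 7, 9}),
    "mol_B": frozenset({1, 4, 8}),
    "mol_C": frozenset({2, 3}),
}
print(similarity_search(frozenset({1, 4, 7}), db, k=2))  # [('mol_A', 0.75), ('mol_B', 0.5)]
```

This is an exhaustive linear scan. Because Tanimoto(A, B) can never exceed min(|A|, |B|) / max(|A|, |B|), a larger system could skip fingerprints whose bit counts make the threshold unreachable; the sketch does not attempt that.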
9
Example of Statistics
10
Molecular Weight/Solubility
16
[Diagram: ChemDB, RChemDB, NM, Experiments, Filters, RM]
17
Chemo/Bio Informatics
Two key ingredients:
  1. Data
  2. Similarity measures
Bioinformatics analogy and differences:
  – Data (GenBank, Swiss-Prot, PDB)
  – Similarity (BLAST)
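By analogy with sequence comparison, one common way to define molecular similarity is to break each molecular graph into short labeled paths (loosely comparable to the short words BLAST uses as seeds) and compare the resulting feature sets. The sketch below is an illustration under stated assumptions: hydrogen-suppressed graphs, element symbols as the only atom labels, bond orders ignored, and Tanimoto as the comparison. It is not presented as the specific measure used in these slides, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def path_features(atoms: List[str], bonds: List[Tuple[int, int]], max_bonds: int) -> Set[str]:
    """Labels of all simple paths with at most max_bonds bonds in a molecular graph."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    features: Set[str] = set()

    def walk(path: List[int]) -> None:
        labels = [atoms[i] for i in path]
        # A path read forwards or backwards is the same feature: keep the smaller string
        features.add(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) - 1 < max_bonds:
            for nb in adj[path[-1]]:
                if nb not in path:  # simple paths only: never revisit an atom
                    walk(path + [nb])

    for start in range(len(atoms)):
        walk([start])
    return features


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hydrogen-suppressed graphs: ethanol (C-C-O) and dimethyl ether (C-O-C)
ethanol = path_features(["C", "C", "O"], [(0, 1), (1, 2)], max_bonds=2)
ether = path_features(["C", "O", "C"], [(0, 1), (1, 2)], max_bonds=2)
print(sorted(ethanol))           # ['C', 'C-C', 'C-C-O', 'C-O', 'O']
print(tanimoto(ethanol, ether))  # 0.5
```

Ethanol and dimethyl ether have the same atoms but different connectivity; the shared single atoms and C-O path give a partial match, while C-C, C-C-O, and C-O-C separate them.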