InstantJChem: a flexible chemical database system
G. Marcou, D. Horvath
Laboratoire d’infochimie, Université de Strasbourg, 1, rue Blaise Pascal, Strasbourg
Introduction
The goal is to present InstantJChem for the storage and manipulation of chemical information.
1. General presentation
2. Database search
3. Creation of a database from scratch
What is a database?
A database stores data on a specific subject in an ordered form.
A relational database stores information in tables that reference one another.
A relational database management system (RDBMS) is software that manages relational databases.
InstantJChem is neither a database nor an RDBMS.
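To make the idea of tables that reference one another concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in RDBMS. The table names, column names and rows are hypothetical and chosen only for illustration; they are not InstantJChem's schema.

```python
import sqlite3

# In-memory database: SQLite plays the role of the RDBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two tables; measurement.compound_id references compound.id.
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE measurement ("
    " id INTEGER PRIMARY KEY,"
    " compound_id INTEGER NOT NULL REFERENCES compound(id),"
    " property TEXT NOT NULL,"
    " value REAL NOT NULL)"
)

# Illustrative rows (approximate literature boiling points in degrees Celsius).
conn.executemany(
    "INSERT INTO compound (id, name) VALUES (?, ?)",
    [(1, "benzene"), (2, "pyridine")],
)
conn.executemany(
    "INSERT INTO measurement (compound_id, property, value) VALUES (?, ?, ?)",
    [(1, "boiling_point_C", 80.1), (2, "boiling_point_C", 115.2)],
)

# The inter-reference lets a single query combine both tables.
rows = conn.execute(
    "SELECT c.name, m.property, m.value"
    " FROM compound c JOIN measurement m ON m.compound_id = c.id"
    " ORDER BY c.id"
).fetchall()
print(rows)
```

The RDBMS enforces the reference: inserting a measurement for a compound id that does not exist would be rejected.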
What is InstantJChem?
InstantJChem is a friendly interface between an RDBMS, chemical information and the user.
(Diagram: User, RDBMS, Chemical Information)
Key concepts of InstantJChem
Projects
Schema
Databases and Tables
Entities
Data Trees
Views
Exercise 1 Create a new project named IJCExercises…
Key concept: Project
A project contains resources and connections to one or more databases.
Exercise 1 … and import the file SC100.SDF into it…
Key concept: Schema
A schema (database) contains the connection to a database and special tables (JChemProperties).
Key concept: Database and Tables
The database and its tables are managed by the RDBMS. They are what actually store the information.
What can be stored
Type: Description
Standard table
  Integer: Long integer: 2^32 = …
  Text: User can specify text field widths as large as needed.
  Real: Double-precision real number.
  Date: Stores dates.
  Boolean: Value is True or False.
  List (Standard): Stores a list of database items.
JChem table
  Chemical terms: A list of functions evaluated on chemical structures: logD, pKa, tautomers, ...
  Structure: Chemical structure, created automatically with a JChem table.
Key concept: Entities
An entity is a representation of data. It is a single interface to conceptually different types of tables (Standard, Chemical, SQL, Extractions, etc.).
Key concept: Data Trees
A data tree is a collection of entities and views. It organizes information using a hierarchy (parent-child relationships between entities).
Exercise 1 … Customize a browser for it.
Key concept: Views
A view is an interface to data. For simple data, a spreadsheet view is appropriate. For complex relational data, a form is required.
Exercise 2 In the SC100 database, search for molecules containing fluorobenzene and for molecules containing pyridine. Use Substructure or Similarity search.
Exercise 2 (results)
Results for the two queries:
Substructure search: 20 hits; Similarity search: 0 hits
Substructure search: 14 hits; Similarity search: 0 hits
Similarity search uses the Chemical Hashed Fingerprints defined at database creation.
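One plausible reason the similarity searches return no hits while the substructure searches return 20 and 14 is that a small query sets far fewer fingerprint bits than a full molecule. The sketch below assumes similarity is scored with the Tanimoto coefficient over the set bits, a common choice for hashed fingerprints. The function names, the bit positions and the 0.7 threshold are illustrative and not taken from InstantJChem.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of set-bit positions."""
    common = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - common
    return common / total if total else 0.0


def passes_substructure_screen(query_fp, target_fp):
    """A target can contain the query only if it sets every bit the query sets."""
    return query_fp <= target_fp


# Hypothetical bit positions: a small ring query and a larger molecule containing it.
query_fp = {3, 17, 42}
molecule_fp = {3, 17, 42, 5, 8, 11, 23, 29, 31, 50, 57, 60}

score = tanimoto(query_fp, molecule_fp)
is_candidate = passes_substructure_screen(query_fp, molecule_fp)
is_similar = score >= 0.7  # illustrative threshold
print(score, is_candidate, is_similar)
```

Here the molecule passes the substructure screen, yet its similarity to the query is only 3 shared bits out of 12 distinct bits, so it falls well below the threshold.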
Chemical Hashed Fingerprints (CHF)
Pattern Length: number of bonds in a pattern
Fingerprint Length: total number of bits used to store the fingerprint
Bits per pattern: number of bits each pattern sets
Efficient annotation to accelerate structure search
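The three parameters above can be illustrated with a toy path-based hashed fingerprint. This is a simplified sketch, not ChemAxon's implementation: molecules are reduced to atom labels and an unordered bond list (bond orders, rings and aromaticity are ignored), patterns are linear paths only, and the hashing scheme and default parameter values are arbitrary assumptions.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Return canonical strings for all linear paths with up to max_bonds bonds."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    patterns = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        # A path read in either direction is the same pattern.
        patterns.add(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def hashed_fingerprint(atoms, bonds, pattern_length=3,
                       fingerprint_length=64, bits_per_pattern=2):
    """Return the set of bit positions switched on for a molecule."""
    bits = set()
    for pattern in enumerate_paths(atoms, bonds, pattern_length):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{pattern}#{k}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % fingerprint_length)
    return bits


# Ethanol heavy atoms (C-C-O) and a C-O fragment it contains.
ethanol_fp = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fragment_fp = hashed_fingerprint(["C", "O"], [(0, 1)])

# Every pattern of the fragment also occurs in ethanol, so its bits are a subset.
print(fragment_fp <= ethanol_fp)
```

This subset property is what makes the fingerprint a fast pre-filter: any database molecule missing a bit set by the query cannot contain it, so the costly atom-by-atom match is only run on the remaining candidates.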
Exercise 3 Combine molecules 25 and 89 into a pseudo-molecule to perform a superstructure query.
Exercise 4 Use compound 46 as a Full and Full fragment query to search the database. Repeat after removing the bromide from the query.
Structure Searches
Exercise 5 Search for benzene-containing compounds whose name contains “pyrimidin” and that are annotated as “Good” for aqueous solubility.
Exercise 6 Search for compounds with at least one aromatic ring containing at least one nitrogen atom.
Exercise 7 Search for compounds with MolWeight > 200 that do not contain a benzene ring.
Exercise 8 Search for compounds with MolWeight > 200, then for compounds without a benzene ring, and take the union of the two hit lists.
Exercise 9 Search for compounds possessing more than 4 microspecies at pH=4.0…
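Exercises 7 and 8 use the same two criteria but combine them differently: one query joins them with AND, the other takes the union of two separate hit lists. Treating hit lists as sets of compound IDs makes the difference explicit. The IDs below are hypothetical and are not results from SC100.

```python
# Hypothetical compound IDs, for illustration only.
all_ids = {1, 2, 3, 4, 5, 6}
heavy = {1, 2, 3, 4}        # MolWeight > 200
has_benzene = {2, 4, 5}     # contains a benzene ring

# "Not containing a benzene ring" is the complement within the database.
no_benzene = all_ids - has_benzene

# Exercise 7: both criteria in one query (logical AND = intersection).
exercise_7 = heavy & no_benzene

# Exercise 8: two separate hit lists, then their union (logical OR).
exercise_8 = heavy | no_benzene

print(sorted(exercise_7))
print(sorted(exercise_8))
```

The union is never smaller than the intersection, so Exercise 8 is expected to return at least as many compounds as Exercise 7.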
Exercise 9 … Export your hit list.
Exercise 10 Import the file ISICCRsm.RDF into your project…
Exercise 10 … Create a Browser for this database
Exercise 11 Search for reactions containing an imidazole ring in their reactants, then in their products.
Exercise 12 Add to your Schema a new data tree and structure entity named AlkanBoilingPoint…
Exercise 12 … and add a floating point value field named BoilingPoint.
Exercise 13 Add to the AlkanBoilingPoint entity the following data.
Exercise 14 Add to the AlkanBoilingPoint entity a new date field named Date and fill it.
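Exercises 12 to 14 are carried out through the InstantJChem interface, but at the RDBMS level they roughly correspond to creating a table and then adding columns to it. The sketch below mimics this with Python's sqlite3 module. It is an assumption-laden analogy: the SQL that InstantJChem actually issues is not shown in the source, a real JChem structure table carries additional columns, the structure is stored here as a plain SMILES string, and the rows and the date are illustrative rather than the data of Exercise 13.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

# Exercise 12 analogue: a table with a structure column and a floating point field.
conn.execute(
    "CREATE TABLE AlkanBoilingPoint ("
    " id INTEGER PRIMARY KEY,"
    " structure TEXT NOT NULL,"   # SMILES string standing in for a structure field
    " BoilingPoint REAL)"
)

# Exercise 13 analogue: illustrative rows (approximate boiling points in degrees Celsius).
conn.executemany(
    "INSERT INTO AlkanBoilingPoint (structure, BoilingPoint) VALUES (?, ?)",
    [("C", -161.5), ("CC", -88.6), ("CCC", -42.1)],
)

# Exercise 14 analogue: add a date field to the existing table and fill it.
conn.execute('ALTER TABLE AlkanBoilingPoint ADD COLUMN "Date" TEXT')
conn.execute(
    'UPDATE AlkanBoilingPoint SET "Date" = ?',
    (date(2020, 1, 1).isoformat(),),  # arbitrary placeholder date
)

rows = conn.execute(
    'SELECT structure, BoilingPoint, "Date" FROM AlkanBoilingPoint ORDER BY id'
).fetchall()
print(rows)
```

Existing rows keep their data when the column is added; the new field is simply empty until it is filled.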
Exercise 15 Add to the AlkanBoilingPoint entity a calculated LogP value using a Chemical terms field.
Summary
Create a project and schema
Import data
Search by substructure, superstructure, similarity, and exact match
Search by keyword
Combine queries and result lists
Export query results
Create a new database
Conclusion
InstantJChem is a chemoinformatics layer on top of a standard DBMS.
It provides many more chemoinformatics services (database overlap, QSPR modeling, plots, enumeration, scripting).