Prediction of NMR Chemical Shifts. A Chemometrical Approach. K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg. Advanced Chemistry Development (ACD)

Structure and its spectral data. [Figure: a structure and its spectra]

Sometimes the solution is not obvious. In many cases several structures are consistent with the spectral data, so we need a method to rank them. The most powerful method is to compare the experimental and predicted 13C NMR spectra.

13C NMR spectral data. [Figure: experimental and predicted spectra; values shown on the slide: 2.00 and 9.62]

How to find the best structure? In most cases the predicted spectrum of the “correct structure” gives the best fit to the experimental spectrum. In practice, the average deviation between the predicted and experimental spectra of the “correct structure” is 2-3 ppm.
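The ranking idea can be sketched in a few lines of Python. This is only an illustration: the function names, the candidate names, and all shift values are invented, and pairing peaks by sorted position is a simplifying assumption rather than the assignment procedure used in the presentation.

```python
def mean_abs_deviation(experimental, predicted):
    """Average absolute difference (ppm) between two equally long shift lists."""
    if len(experimental) != len(predicted):
        raise ValueError("spectra must contain the same number of shifts")
    # Assumption: peaks are paired by their sorted position in the spectrum.
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (name, deviation) pairs, best fit first."""
    scored = [(name, mean_abs_deviation(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical shifts in ppm, invented for illustration only.
experimental = [14.1, 22.7, 31.9, 128.4, 171.0]
candidates = {
    "candidate_A": [13.0, 24.0, 30.5, 130.0, 169.5],
    "candidate_B": [20.0, 35.0, 45.0, 110.0, 160.0],
}
ranking = rank_candidates(experimental, candidates)
for name, deviation in ranking:
    print(f"{name}: {deviation:.2f} ppm")
```

The candidate with the smallest average deviation is listed first and would be taken as the most probable structure.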

The role of spectrum prediction. Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR). It takes 20 min to generate all structures (> 12 000) and 24 hours to predict the 13C NMR spectra of all of them. The speed of spectrum prediction must be increased.
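To put these numbers in perspective, the throughput they imply can be estimated with simple arithmetic. The sketch below assumes exactly 12 000 structures (the slide gives this as a lower bound) and one predicted shift per carbon atom; the variable names are illustrative.

```python
n_structures = 12_000        # lower bound from the slide ("> 12 000")
carbons_per_structure = 29   # from the molecular formula C29H32N2O5
prediction_time_s = 24 * 3600

# One 13C shift is predicted per carbon atom of every candidate structure.
total_shifts = n_structures * carbons_per_structure
shifts_per_second = total_shifts / prediction_time_s
seconds_per_structure = prediction_time_s / n_structures

print(f"{total_shifts} shifts in total")
print(f"about {shifts_per_second:.1f} shifts per second")
print(f"about {seconds_per_structure:.1f} s per structure")
```

Under these assumptions the old prediction runs at only a few shifts per second, which is why prediction rather than structure generation dominates the total time.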

Methods of NMR spectrum prediction
- Quantum mechanics: extremely slow
- Database approach (HOSE codes, maximum common substructure): accurate but slow
- Rule-based (additive scheme, neural networks): fast but inaccurate
Our choice: improve the accuracy of a fast method.

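Additive scheme. $\sum_i a_i x_i = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$ [Figure: structure annotated with the values 153.71, -1.85, -4.49, -1.39, -2.79, 1.43, 0.52, -1.35 and the resulting shift 144.31] The main problem is finding the correct values of the atom increments.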
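The sum on this slide can be reproduced directly. In the sketch below each term is written as an (increment a_i, count x_i) pair; treating 153.71 as a base value with count 1 and folding the two 0.52 terms into one variable with count 2 are assumptions made for illustration.

```python
def additive_shift(terms):
    """Chemical shift as the sum of increment * count over all variables."""
    return sum(a * x for a, x in terms)


# (increment a_i in ppm, count x_i); values taken from the slide.
terms = [
    (153.71, 1),  # assumed base value
    (-1.85, 1),
    (-4.49, 1),
    (-1.39, 1),
    (-2.79, 1),
    (1.43, 1),
    (0.52, 2),    # assumed: the same increment occurring twice
    (-1.35, 1),
]
predicted = additive_shift(terms)
print(f"{predicted:.2f} ppm")
```

The result agrees with the 144.31 ppm shown on the slide, so the whole prediction reduces to a dot product once the increments are known.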

Available data. We have a database of 1.5 million 13C chemical shifts, so we can try to obtain the correct values!

How to encode the atom environment. Each input variable is the number of atoms of a given type in a given sphere around the atom.
1st sphere: CH2, CH, C
2nd sphere: CH2, CH3, O
[Table: atom type vs. number of atoms for each sphere]
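A minimal sketch of such an encoding is shown below. The function name, the vocabulary layout, and the example counts are assumptions chosen for illustration; only the idea of counting atom types per sphere comes from the slide.

```python
from collections import Counter


def encode_environment(spheres, vocabulary):
    """Turn per-sphere atom type lists into a fixed-length count vector.

    spheres[0] holds the atom types in the 1st sphere, spheres[1] in the 2nd.
    vocabulary fixes the order of the (sphere, atom type) variables.
    """
    counts = Counter()
    for sphere_index, atom_types in enumerate(spheres, start=1):
        for atom_type in atom_types:
            counts[(sphere_index, atom_type)] += 1
    # Variables that do not occur in this environment get a count of 0.
    return [counts[key] for key in vocabulary]


# Hypothetical vocabulary and environment, for illustration only.
vocabulary = [(1, "CH2"), (1, "CH"), (1, "C"),
              (2, "CH2"), (2, "CH3"), (2, "O")]
spheres = [["CH2", "CH", "C"], ["CH2", "CH2", "CH3", "O"]]
x = encode_environment(spheres, vocabulary)
print(x)
```

Each atom in the database becomes one such row, so all rows share the same columns and can be stacked into a matrix.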

Data for PLS regression. X: atom environment encoding (one row per sample). Y: chemical shifts.
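To make the X/Y setup concrete, here is a small pure-Python sketch of single-response PLS regression (PLS1, NIPALS-style deflation). It is not the authors' implementation: the function names and the toy data are invented, and the "true" increments used to generate the shifts (base 30, increments 9, 7 and -2 ppm) are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1; returns column means, y mean, and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one response by repeating the deflation on a new row."""
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat


# Toy data: rows are count vectors, shifts follow hypothetical increments.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [2, 0, 1], [0, 2, 1]]
y = [30 + 9 * a + 7 * b - 2 * c for a, b, c in X]
model = pls1_fit(X, y, n_components=3)
prediction = pls1_predict(model, [1, 1, 1])
print(f"{prediction:.2f} ppm")
```

With noise-free toy data and as many components as independent variables, the model reproduces the generating increments; real shift data would need far more variables and a validated number of components.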

Finding the best structure encoding. The best scheme for representing the structure is not evident in advance, so we should find the scheme that gives the best accuracy. We should optimize:
- the substituent coding scheme
- the number of “spheres” used

Data used. 210 K chemical shifts were used as the training set. 170 K chemical shifts from recent literature were used as an external validation set.

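How to describe the atom type. A “substituent” is described relative to the “central” atom by the following values (example values from the slide in brackets):
- Atom type (C, O, etc.) [7 (N)]
- Hybridization (sp3, sp2, etc.) [1 (sp3)]
- Valence [3]
- Number of neighbor H [2]
- Charge [0]
- Distance to the “central” atom, in bonds [3]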
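One way to hold such a description in code is a small immutable record. The class and field names below are hypothetical; the six fields and the example values (7 for N, 1 for sp3, 3, 2, 0, 3) follow the slide.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class SubstituentDescriptor:
    """Hypothetical container for the six values listed on the slide."""
    atom_type: int       # atomic number, e.g. 7 for N
    hybridization: int   # coded value, e.g. 1 for sp3
    valence: int
    n_hydrogens: int     # number of neighbor H
    charge: int
    distance: int        # bonds to the "central" atom

    def as_key(self):
        """Tuple form, usable as the name of an input variable."""
        return astuple(self)


example = SubstituentDescriptor(atom_type=7, hybridization=1, valence=3,
                                n_hydrogens=2, charge=0, distance=3)
print(example.as_key())
```

Because the record is frozen it is hashable, so identical substituents map to the same variable when counts are accumulated in a dictionary.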

Result for different atom encoding

Result for number of spheres

Is it the best possible accuracy? The best average deviation obtained with this encoding is 3.5 ppm. We need less than 3 ppm (2 ppm is preferable). Should we use additional variables? We should be very careful when adding variables.

Substituent interference (cross effect). [Figure: structures annotated with 13C shifts (141.48, 125.90, 138.30, 125.38, 122.90, 134.16, 145.42, 127.86, 136.64) and signed differences (+2.48, +1.34, -1.94, -3.94, +11.26)]

Enhanced structure encoding. In addition to atoms, the input variables now include pairs of atoms (crosses): each variable is the number of pairs of a given type, for example “CH2 and CH” (1) or “C and O” (1).
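Counting such pairs is straightforward to sketch. The function name and the substituent list below are invented for illustration, and the sketch ignores the distance between the atoms within a cross, which the presentation treats as a tunable parameter.

```python
from collections import Counter
from itertools import combinations


def count_pairs(substituents):
    """Count unordered pairs of substituent types (crosses)."""
    pairs = Counter()
    for a, b in combinations(substituents, 2):
        # Sort so that ("O", "C") and ("C", "O") are the same variable.
        pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Hypothetical substituents around one central atom.
substituents = ["CH2", "CH", "C", "O"]
pair_counts = count_pairs(substituents)
for pair, count in sorted(pair_counts.items()):
    print(pair, count)
```

The pair counts are appended to the single-atom counts, which lets the regression assign an increment to the joint presence of two substituents.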

Result for atom pairs (crosses). [Figure: mean error (ppm) as a function of the number of spheres and the distance between atoms within a cross]

More enhancements? The accuracy is now good enough (2.3 ppm), but it is still poor in some cases. Unfortunately, these cases are very important, so these “special” cases should be taken into account.

Stereo effects: double bonds. We use the “topological” distance, but equal topological distances sometimes correspond to different “real” distances. [Figure: shifts of 25.7 and 17.6 ppm; distances of 3.9 Å and 2.9 Å]
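The limitation of the topological distance is easy to see in code: it depends only on the bond list, which is identical for two stereoisomers. The sketch below uses the carbon skeleton of 2-butene as an assumed example (the slide does not name the molecule), and the function name is illustrative.

```python
from collections import deque


def topological_distance(bonds, start, goal):
    """Number of bonds on the shortest path between two atoms (BFS)."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for nxt in neighbors.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # atoms are not connected


# Carbon skeleton C1-C2=C3-C4; the bond list is the same for cis and trans.
bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "C4")]
d = topological_distance(bonds, "C1", "C4")
print(d)
```

Both isomers give the same topological distance between the terminal carbons, so an extra “stereo” variable is needed to tell them apart.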

Modified structure encoding. Variables: atoms, pairs of atoms (crosses), “stereo” effects.

Prediction of spectra by different methods (mean error, ppm)

Taken into account            All types of atoms   CH3    =C
Atoms only                    3.52                 1.55   8.03
+ pairs of atoms (crosses)    2.32                 1.50   3.22
+ “stereo” effects            2.27                 1.24   3.22
+ solvent                     2.25                 1.24   3.20
+ to be continued?

Size of the training set. We have 1.5 million chemical shifts and should try to use all available data. The only problem is matrix size: in many cases the matrix becomes larger than 2 GB.
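A rough estimate shows how quickly the limit is reached. The sketch assumes a dense matrix with one row per chemical shift, 8-byte values, and 2 GB taken as 2 * 1024**3 bytes; none of these storage details are stated in the presentation.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


n_shifts = 1_500_000      # one row per chemical shift in the database
limit = 2 * 1024**3       # assumed meaning of "2 GB"

# Largest number of columns (input variables) that still fits in the limit.
max_cols = limit // (n_shifts * 8)
print(max_cols)
print(dense_matrix_bytes(n_shifts, max_cols + 1) > limit)
```

Under these assumptions fewer than two hundred variables already fill 2 GB, which suggests why a sparse or blockwise treatment of the data would be needed to use the full database.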

Bigger dataset – smaller mean error!

The final results

Method                     Average deviation   Rate of calculation, shifts/sec
Old method (HOSE codes)    1.87                6
New additive scheme        1.83                5800

Faster by 3 orders of magnitude!

Prediction time: the past and present (C29H32N2O5)

Method             Average deviation   Time
HOSE codes         1.72                > 24 hours
Additive scheme    1.63                2 min

Conclusions. Combining a “new” method with an old, well-known algorithm can produce a very good (and unexpected) result.
