1
Two cases of chemometrics application in protein crystallography
Andrey Bogomolov
European Molecular Biology Laboratory (EMBL), Hamburg, Germany
2
Outline
- Protein crystallography: a brief introduction
- Case I: determination of protein secondary structure from the raw diffraction data using PLS-R
- Case II: modeling of crystal radiation damage
- Potential applications of chemometric techniques to crystallography (of biological macromolecules)
3
Protein crystallography: introduction
Protein (macromolecular) crystallography is a scientific discipline that studies…
- biological objects: proteins, DNA, RNA etc.
- … by physical means: X-ray diffraction, synchrotron radiation
- … on the chemical level: 3D structure, complexes, interactions
- … with the extensive use of mathematics: data analysis, modeling
The main objectives:
- solve the 3D structure of a molecule
- explain its biological function at the atomic level
Today’s hot topic:
- drug design
- part of the global “-omics” project (genomics/proteomics)
4
Protein crystallography workflow
Stages: expression & purification, crystallization, data collection, phasing, structure solution
expression & purification → protein (DNA, RNA) solution
5
Protein crystallography workflow
crystallization → protein crystal
6
Protein crystallography workflow
data collection → diffraction pattern
7
Protein crystallography workflow
phasing → electron density map
8
Protein crystallography workflow
structure solution → 3D structure
9
Protein Data Bank (PDB)
Global data collection (>30000 records), www.pdb.org
- 3D structures
- experimental data
- biological and chemical information
10
Crystallographic data collection: Wilson plot
[Figure: X-ray beam; experimental and theoretical Wilson plots; labels: control, optimization]
11
Case I: Determination of protein secondary structure
Problem: determine the contents (fractions of the polypeptide chain) of secondary structure elements in a protein molecule from the raw diffraction data (Wilson plot)
- well-established method for CD and IR spectra of protein solutions
- PLS regression: one of the best methods
- Wilson plot: only qualitative data on existing correlation for “theoretical” data
[Figure: α-helix, β-sheet]
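The slide names PLS regression as the calibration method but gives no implementation details. The following is a minimal sketch of single-response PLS (PLS1, NIPALS-style deflation) in NumPy, run on synthetic data. The function names `pls1_fit` and `pls1_predict`, the number of components, and the data are illustrative assumptions, not the author's actual setup.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns regression coefficients and centering terms."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered working copies
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                         # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                            # score vector
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])        # deflate X
        yr = yr - q[a] * t                    # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)    # coefficients in original X space
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new rows of X."""
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Synthetic stand-in for "Wilson plots in rows" (X) and a structure fraction (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.4 + rng.normal(scale=0.01, size=30)
coef, x_mean, y_mean = pls1_fit(X, y, n_components=4)
y_hat = pls1_predict(X, coef, x_mean, y_mean)
```

With as many components as variables the model coincides with ordinary least squares; in a real calibration the number of components would be chosen by validation, which this sketch does not cover.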
12
Secondary structure determination: data
Data preprocessing:
- averaging with an optimal bin size*
- special scaling (correction for anisotropic B-factor)*
- taking the natural logarithm
- conversion into the matrix (Wilson plots in rows)*
- auto-scaling
- outlier detection and removal*
*) experimental data only
[Figure: theoretical and experimental data]
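Part of the preprocessing chain listed above can be sketched as follows: bin averaging, natural logarithm, stacking the profiles as matrix rows, and auto-scaling. The anisotropic B-factor correction and outlier removal are omitted because the slide does not describe them. The bin size, function names, and synthetic profiles are assumptions for illustration only.

```python
import numpy as np

def bin_average(intensity, bin_size):
    """Average consecutive points in bins of `bin_size` (trailing remainder is dropped)."""
    values = np.asarray(intensity, dtype=float)
    n_bins = len(values) // bin_size
    return values[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)

def autoscale(X):
    """Center each column and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    return (X - mean) / std

def preprocess(profiles, bin_size=4):
    """Bin-average, log-transform, stack profiles in rows, then auto-scale."""
    rows = [np.log(bin_average(p, bin_size)) for p in profiles]
    return autoscale(np.vstack(rows))

# Synthetic positive intensity profiles of equal length (hypothetical data).
rng = np.random.default_rng(0)
profiles = [np.exp(-0.01 * np.arange(40)) * (1 + 0.05 * rng.random(40)) for _ in range(5)]
X = preprocess(profiles)
```

Each row of `X` then corresponds to one protein and each column to one resolution bin, which is the layout a regression method such as PLS expects.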
13
Secondary structure determination: data (2)
[Figure: theoretical and experimental data for 1d5t (α+β), 1at0 (β), 1hq3 (α)]
14
Secondary structure determination: calibration results
RMSEP and correlation coefficients (in parentheses) for different methods:

Method               α-helix        β-sheet
Theoretical          0.062 (0.96)   0.060 (0.92)
Experimental*        0.112 (0.84)   0.081 (0.84)
IR/PLS [1]           0.078 (0.93)   0.075 (0.93)
CD/PLS [2]           0.077 (0.94)   0.092 (0.89)
μ: α=0.31, β=0.24    0.21 (0.00)    0.22 (0.00)

*) Resolution (1/d) = 0.52 Å⁻¹ (~1.9 Å)
[Figure: α-helix (theoretical)]
References:
1. S. Navea, R. Tauler, A. de Juan, Elucidation of protein secondary structure, Anal. Biochem. 336 (2005) 231–242
2. K.A. Oberg, J.-M. Ruysschaert, E. Goormaghtigh, The optimization of protein secondary structure determination with infrared and circular dichroism spectra, Eur. J. Biochem. 271 (2004) 2937–2948
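For readers unfamiliar with the two figures of merit in the calibration results, the sketch below shows how RMSEP (root mean square error of prediction) and the Pearson correlation coefficient are usually computed from reference and predicted values. The numbers are made up for illustration and are not taken from the presentation.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical α-helix fractions: reference values vs. model predictions.
y_true = np.array([0.10, 0.25, 0.40, 0.55])
y_pred = np.array([0.12, 0.22, 0.43, 0.50])
error = rmsep(y_true, y_pred)
r = pearson_r(y_true, y_pred)
```

Because the structure fractions lie between 0 and 1, an RMSEP of 0.06 corresponds to an average error of about six percentage points of the polypeptide chain.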
15
Case II: Modeling radiation damage
A biological crystal exposed to X-rays undergoes radiation damage.
Modeling of radiation damage is important for:
- understanding of the effect on the protein
- optimization of data collection
Problem, present state:
- no comprehensive theory of RD
- specific effects are well known, but the main changes are non-specific
Suggestion by Gleb Bourenkov: radiation dose has a linear effect on atoms’ B-factors
Task: check for linearity, find the reason(s) for deviation
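One simple way to test a linearity hypothesis of this kind is to fit a straight line to B-factor versus dose and inspect the residuals for systematic structure. The sketch below does this on synthetic data; the dose range, slope, noise level, and the function name `fit_line` are assumptions, and this is not necessarily the model used in the presentation.

```python
import numpy as np

def fit_line(dose, b_factor):
    """Least-squares line B = slope * dose + intercept, plus residuals and correlation."""
    dose = np.asarray(dose, dtype=float)
    b_factor = np.asarray(b_factor, dtype=float)
    slope, intercept = np.polyfit(dose, b_factor, deg=1)
    residuals = b_factor - (slope * dose + intercept)
    r = float(np.corrcoef(dose, b_factor)[0, 1])
    return float(slope), float(intercept), residuals, r

# Synthetic example: B-factor grows linearly with dose, plus small noise.
rng = np.random.default_rng(1)
dose = np.linspace(0.0, 10.0, 11)                      # arbitrary dose units
b_factor = 15.0 + 0.8 * dose + rng.normal(0.0, 0.05, dose.size)
slope, intercept, residuals, r = fit_line(dose, b_factor)
```

A trend or curvature in `residuals` when plotted against dose would point to a deviation from linearity whose cause would then need to be investigated.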
16
Radiation damage modeling: data (trypsin)
17
Radiation damage modeling: results
r = 0.999, RMSEP = 9.4×10⁻³
18
Conclusions
- Multivariate data analysis has great potential for protein crystallography
- currently its application is episodic and rarely goes beyond PCA
- A method-centric approach would be beneficial: “I have a method, I am looking for problems”
19
X-files
Chemometric methods:
- PCA, Factor Analysis
- Multivariate Regression
- MSPC, Design of Experiment
- Curve Resolution
- Multivariate Image Analysis
- Target Factor Analysis
- PARAFAC, 3(multi)-way
- Wavelet Transform
- SIMCA, PLSD
Crystallography tasks:
- crystallization, HTPC
- crystal screening
- crystal auto-mounting
- data collection
- data reduction
- radiation damage
- phasing
- structure solution
- structure refinement
20
Challenge
Critical re-assessment of the entire protein crystallographic workflow with a multivariate approach in mind: an ambitious project for chemometricians?
21
Acknowledgements
Alexander Popov, Gleb Bourenkov, Victor Lamzin