Protein Structures from A Statistical Perspective Jinfeng Zhang Department of Statistics Florida State University.

1 Protein Structures from A Statistical Perspective Jinfeng Zhang Department of Statistics Florida State University

2 Introduction – Machines of Life. Proteins play crucial roles in virtually all biological processes. Examples from the Protein Data Bank (PDB): rhodopsin, myosin, hemoglobin, pepsin.

3 From Sequence to Structure to Function. The function of a protein is governed by its three-dimensional structure. –Structure is determined by sequence. DNA sequences → amino acid sequences → protein structures.

4 Protein Folding Problem

5 Evaluation of Energy Functions. Current approach: rank the energy of the single native structure among those of decoy structures. –This is neither necessary nor sufficient. [Figure: energy (E) of the X-ray structure and of near-native structures, panels A and B]

6 X-ray Structures

7 Proteins are Dynamic Molecules

8 YJ. Huang and GT. Montelione, Nature, 438, (2005), 36-37.

9 N Furnham, T Blundell, M DePristo, Nature Structural & Molecular Biology, (2006) 13:184-185. … A more suitable representation of a macro-molecular crystal structure would be an ensemble of models. The range of structures in the ensemble would be considered by any user of the structural information.

10 A New Criterion for Energy Function Evaluation Based on Probability of NNS. [Figure: all structures, near-native structures (NNS), native structure]

11 Off-lattice Discrete State Model. J Zhang, R Chen, J Liang, (2006), Proteins, 63:949-960.

12 Sampling Near-Native Structures (NNS) by Sequential Monte Carlo (SMC). Two constraints: –Conformational –Energetic. Partition functions: J. Zhang et al. (2007), Proteins, 66:61-68. [Figure: all structures, NNS, native structure, SMC]
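One way to read the probability-of-NNS criterion is as a ratio of two partition functions, one over the near-native set and one over all structures. The slides do not preserve the formula, so the Boltzmann form below, the function name `p_nns`, and the toy energies are all assumptions made for illustration only.

```python
import math

def p_nns(energies, is_near_native, temperature=1.0):
    """Boltzmann probability mass of the near-native subset.

    Assumes P_NNS = Z_NNS / Z_all with Z = sum(exp(-E / T)).
    This form is an assumption, not taken from the slides.
    """
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z_all = sum(weights)
    z_nns = sum(w for w, near in zip(weights, is_near_native) if near)
    return z_nns / z_all

# Hypothetical toy ensemble: two near-native states and three others.
energies = [-5.0, -4.5, -4.8, -1.0, 0.0]
near = [True, True, False, False, False]
print(round(p_nns(energies, near), 3))
```

Under this reading, an energy function is judged by how much probability it places on the whole near-native set rather than by the rank of a single structure.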

13 Comparison with Enumeration. At length 15 with a 5-state model, protein 1ail has 1.04×10^9 conformations. The estimated number is 1.039×10^9 with a sample size of 10,000.
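The comparison between an enumerated count and a sampled estimate can be illustrated on a much smaller problem. The sketch below is not the off-lattice discrete state model from the slides: it swaps in self-avoiding walks on a 2D square lattice and uses sequential importance sampling (the simplest form of SMC, without resampling). All function names are hypothetical.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def count_saws_exact(n_steps):
    """Count self-avoiding walks of n_steps on the square lattice by enumeration."""
    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total
    return extend((0, 0), {(0, 0)}, n_steps)

def count_saws_smc(n_steps, n_samples, rng):
    """Estimate the same count by growing walks one step at a time.

    Each step picks uniformly among the free neighbors, and the weight is
    multiplied by the number of free neighbors (1 / proposal probability).
    The average weight is an unbiased estimate of the number of walks.
    """
    total_weight = 0.0
    for _sample in range(n_samples):
        pos, visited, weight = (0, 0), {(0, 0)}, 1.0
        for _step in range(n_steps):
            free = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES
                    if (pos[0] + dx, pos[1] + dy) not in visited]
            if not free:  # dead end: this sample contributes zero
                weight = 0.0
                break
            weight *= len(free)
            pos = rng.choice(free)
            visited.add(pos)
        total_weight += weight
    return total_weight / n_samples

exact = count_saws_exact(6)
estimate = count_saws_smc(6, 10_000, random.Random(0))
print(exact, round(estimate, 1))
```

Enumeration cost grows exponentially with chain length, while the sampling cost grows with the sample size, which is why the estimate remains feasible where enumeration does not.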

14 Energy Functions and Their Performances. J Zhang et al., (2007), Proteins, 66:61-68. J Zhang, R Chen, J Liang, (2006), Proteins, 63:949-960. UP: Uniform Potential; CP: Contact Potential; CALSP: Contact And Local Sequence-structure Potential.

15 Characterizing Ensemble of Side Chain Conformations. Side chain conformations –Ensemble of structures with the same backbone but different side chain conformations.

16 Ensemble of Side Chain Conformations

17 Ensemble of Side Chain Conformations. Number of side chain conformations: N_sc. Side chain conformational entropy: S_sc = k_B ln(N_sc). Protein stability: ΔG = ΔH - TΔS. http://wishart.biology.ualberta.ca/moviemaker

18 Sequential Monte Carlo (SMC). S_n = (r_1, …, r_j, …, r_n), r_j ∈ R_j = {1, …, M_j}. SMC samples one side chain (r) at a time and keeps the residues that are already sampled fixed. Each sample i has an associated weight, w^(i). At step t, a residue, r_t, is picked, and a rotamer, k, is sampled from a given distribution with probability p_k. The weight is then updated.
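The sequential sampling scheme can be sketched on a toy rotamer problem. The weight update formula is not preserved in this transcript, so the sketch assumes the standard importance-sampling update w ← w / p_k with a uniform proposal over non-clashing rotamers. The clash table `forbidden`, the rotamer counts, and all function names are hypothetical.

```python
import itertools
import math
import random

def clashes(placed, residue, rotamer, forbidden):
    """True if (residue, rotamer) clashes with any already placed rotamer."""
    return any(frozenset({(j, k), (residue, rotamer)}) in forbidden
               for j, k in placed.items())

def estimate_n_sc(n_rotamers, forbidden, n_samples, rng):
    """Estimate the number of clash-free rotamer assignments, N_sc."""
    total = 0.0
    for _sample in range(n_samples):
        placed, weight = {}, 1.0
        for t, m_t in enumerate(n_rotamers):
            allowed = [k for k in range(m_t)
                       if not clashes(placed, t, k, forbidden)]
            if not allowed:  # dead end: sample gets zero weight
                weight = 0.0
                break
            p_k = 1.0 / len(allowed)      # uniform proposal (assumption)
            placed[t] = rng.choice(allowed)
            weight /= p_k                 # assumed update: w <- w / p_k
        total += weight
    return total / n_samples

def enumerate_n_sc(n_rotamers, forbidden):
    """Exact N_sc by brute force, feasible only for tiny examples."""
    count = 0
    for combo in itertools.product(*(range(m) for m in n_rotamers)):
        pairs = itertools.combinations(enumerate(combo), 2)
        if not any(frozenset(pair) in forbidden for pair in pairs):
            count += 1
    return count

# Hypothetical toy system: 4 residues and 4 forbidden rotamer pairs.
n_rotamers = [3, 4, 2, 3]
forbidden = {
    frozenset({(0, 0), (1, 0)}),
    frozenset({(1, 1), (2, 0)}),
    frozenset({(0, 2), (3, 1)}),
    frozenset({(2, 1), (3, 2)}),
}
estimate = estimate_n_sc(n_rotamers, forbidden, 5000, random.Random(1))
exact = enumerate_n_sc(n_rotamers, forbidden)
print(exact, round(estimate, 2), round(math.log(estimate), 3))  # last value: S_sc in units of k_B
```

With a uniform proposal the weight of a completed sample is the product of the numbers of allowed rotamers along its path, so the mean weight estimates N_sc and its logarithm gives the side-chain entropy in units of k_B.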

19 Sequential Sampling of Side Chains

20 Performance. The total number of self-avoiding side-chain conformations for the fragment of 3ebx, residues 1-17, is 396,325,923,840 ≈ 3.96×10^11; the SMC estimate is 4.01×10^11 with a sample size of 1000.

21 Incorporating SCE in Energy Function. ΔG = ΔH –ΔH: residue contact potential. ΔG = ΔH - TΔS_sc –ΔH: residue contact potential. –ΔS_sc: side-chain entropy. –T = 1.
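The effect of adding the entropy term can be illustrated by ranking a native structure among decoys under both scores. This is a minimal sketch with made-up numbers: the contact energies, entropies, and the helper names `native_rank` and `free_energy` are hypothetical and do not come from the slides.

```python
def native_rank(native_score, decoy_scores):
    """Rank of the native structure among decoys (1 = lowest score)."""
    return 1 + sum(1 for s in decoy_scores if s < native_score)

def free_energy(delta_h, s_sc, temperature=1.0):
    """Score with side-chain entropy: dG = dH - T * S_sc (T = 1 as on the slide)."""
    return delta_h - temperature * s_sc

# Hypothetical (contact energy, side-chain entropy in k_B) pairs.
native = (-10.0, 25.0)
decoys = [(-12.0, 18.0), (-11.0, 20.0), (-9.0, 24.0), (-8.0, 15.0)]

rank_h = native_rank(native[0], [h for h, _ in decoys])
rank_g = native_rank(free_energy(*native),
                     [free_energy(h, s) for h, s in decoys])
print(rank_h, rank_g)  # 3 1
```

In this toy case two decoys have lower contact energy than the native structure, but the native structure has the highest side-chain entropy, so it ranks first once the entropy term is included.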

22 ΔH vs. ΔH - ΔS_sc
Each row gives the protein ID (decoy set*, number as shown on the slide), then the digits of the ΔH and ΔH - ΔS_sc columns, which are run together in the transcript.
1ctf (A, 630): 61
1r69 (A, 675): 245
1sn3 (A, 660): 8610
2cro (A, 674): 635
3icb (A, 653): 1925
4pti (A, 687): 14383
4rxn (A, 677): 147
1fc2 (B, 500): 75
1hdd-C (B, 500): 105
2cro (B, 500): 4717
4icb (B, 500): 11
1bl0 (C, 971): 8514
1eh2 (C, 2413): 9953
1jwe (C, 1407): 2881
smd3 (C, 1200): 2661
1beo (D, 2000): 672
1ctf (D, 2000): 101
1dkt-A (D, 2000): 5885
1fca (D, 2000): 13610
1nkl (D, 2000): 2173
1pgb (D, 1572): 121
1b0n-B (E, 497): 114104
1ctf (E, 497): 134
1dtk (E, 215): 11
1fc2 (E, 500): 323
1igd (E, 500): 1596
1shf-A (E, 437): 2 2
2cro (E, 500): 11
2ovo (E, 347): 192
4pti (E, 343): 11
* A: 4state_reduced, B: fisa, C: fisa_casp3, D: lattice_ssfit, E: lmds.

23 Summary. Proteins are better modeled as ensembles of conformations. –Entropy and free energy can be estimated by SMC. P_NNS is a better criterion for evaluating energy functions. SCE is important for protein folding and structure modeling.

24 Acknowledgement. Prof. Jun Liu, Department of Statistics, Harvard University. Prof. Jie Liang, Bioengineering Department, University of Illinois at Chicago. Prof. Rong Chen, Department of Information and Decision Science, University of Illinois at Chicago. Dr. Ming Lin, Department of Information and Decision Science, University of Illinois at Chicago. NIH, NSF for financial support!

25 Protein Interactions. http://wishart.biology.ualberta.ca/moviemaker

26 Native & Decoy Structure of Protein Complexes. [Figure labels: 1spb, 1brc, native]

27 Native & Decoy Structures. The S_sc of native and decoy structures can differ by more than 20 in units of k_B, which corresponds to -11.9 kcal/mol of free energy at 300 K. The stability of a protein is around -5 to -20 kcal/mol. [Figure label: 1ctf]
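The conversion from an entropy difference in units of k_B to a free energy in kcal/mol follows from -TΔS. A short check of the arithmetic, using the standard value of the Boltzmann constant per mole (the function name is hypothetical):

```python
K_B_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def entropy_to_free_energy(delta_s_in_kb, temperature_k):
    """Return -T * dS in kcal/mol for an entropy difference in units of k_B."""
    return -temperature_k * delta_s_in_kb * K_B_KCAL_PER_MOL_K

print(round(entropy_to_free_energy(20, 300), 1))  # -11.9
```

An entropy gap of 20 k_B is therefore comparable in size to the total stability of a typical protein quoted on the slide.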

28 Side-chain Modeling. All heavy atoms are explicitly modeled. Side-chain flexibility –Rotamer library by D. Richardson. Excluded volume effect –Whether a pair of atoms i and j is considered a hard clash depends on r_ij, the distance between them; r_0(i) and r_0(j), the van der Waals radii of the two atoms; and a, a scaling coefficient.
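The clash condition itself is not preserved in this transcript. A common form of such a test, assumed here purely for illustration, declares a hard clash when r_ij < a · (r_0(i) + r_0(j)). The function name, the scaling value 0.6, and the coordinates below are hypothetical; 1.7 Å is the commonly quoted van der Waals radius of carbon.

```python
import math

def is_hard_clash(pos_i, pos_j, radius_i, radius_j, scale):
    """Assumed test: clash if distance < scale * (sum of van der Waals radii)."""
    r_ij = math.dist(pos_i, pos_j)  # Euclidean distance between atom centers
    return r_ij < scale * (radius_i + radius_j)

CARBON_RADIUS = 1.7  # angstroms
SCALE = 0.6          # hypothetical scaling coefficient a

# Threshold is 0.6 * 3.4 = 2.04 angstroms for two carbon atoms.
print(is_hard_clash((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), CARBON_RADIUS, CARBON_RADIUS, SCALE))  # True
print(is_hard_clash((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), CARBON_RADIUS, CARBON_RADIUS, SCALE))  # False
```

A scaling coefficient below 1 lets atoms approach somewhat closer than the sum of their radii before a conformation is rejected.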

29 X-ray & NMR Structures. Protein in crystal; protein in solution.

30 SCE vs. R_g of X-ray and NMR Structures. 23 proteins with both X-ray and NMR structures.

