A Simple Model for Protein Structure
施奇廷 (Department of Physics, Tunghai University)


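The Models
HP Model: $p_i$ is H or P, and the contact function equals 1 for pairs of residues in contact. $E_{HH} = -2.3$, $E_{HP} = -1$, and $E_{PP} = 0$ (Li et al., Science 273, 666).
For the "additive" case, $E_{HH} = -2$, $E_{HP} = -1$, and $E_{PP} = 0$, so that $E_{p_i p_j} = -(p_i + p_j)$, where $p_i = 1$ (0) for H (P) residues.
HP Model (2nd type): the energy depends only on whether each residue sits on a core site or a surface site.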

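Finding the lowest-energy state
For each amino-acid sequence, place it in every possible conformation, compute its energy, and take the conformation with the lowest energy as its ground state. Note that the ground state must be non-degenerate; otherwise the conformation is unstable and would be eliminated by evolution. For example, on a 4×4 lattice, one sequence is: HHPHHPHPPPPHHPHH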
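The search described above can be sketched in Python for the 4×4 example. This is a minimal brute-force sketch, not the author's code. The helper names (`compact_paths`, `canonical`, `contact_energy`, `ground_states`) are hypothetical. It assumes that conformations are the self-avoiding paths filling the whole lattice, that contacts are lattice neighbours not consecutive along the chain, that the energies are the Li et al. values quoted for the HP model, and that paths related by a rotation or reflection of the square count as one shape.

```python
from itertools import product

# Contact energies quoted for the HP model (Li et al.).
ENERGY = {("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0, ("P", "P"): 0.0}
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def compact_paths(n):
    """Yield every directed self-avoiding path that fills an n x n lattice."""
    def extend(path, visited):
        if len(path) == n * n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            site = (x + dx, y + dy)
            if 0 <= site[0] < n and 0 <= site[1] < n and site not in visited:
                path.append(site)
                visited.add(site)
                yield from extend(path, visited)
                visited.remove(site)
                path.pop()
    for start in product(range(n), repeat=2):
        yield from extend([start], {start})

def canonical(path, n):
    """Pick one representative among the 8 rotations/reflections of a path."""
    m = n - 1
    maps = (
        lambda x, y: (x, y), lambda x, y: (m - x, y),
        lambda x, y: (x, m - y), lambda x, y: (m - x, m - y),
        lambda x, y: (y, x), lambda x, y: (m - y, x),
        lambda x, y: (y, m - x), lambda x, y: (m - y, m - x),
    )
    return min(tuple(f(x, y) for x, y in path) for f in maps)

def contact_energy(seq, path):
    """Sum E over lattice neighbours that are not adjacent along the chain."""
    index = {site: i for i, site in enumerate(path)}
    total = 0.0
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):  # count each pair once
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                total += ENERGY[(seq[i], seq[j])]
    return total

def ground_states(seq, n):
    """Return the lowest energy and the distinct shapes that reach it."""
    energies = {}
    for path in compact_paths(n):
        shape = canonical(path, n)
        if shape not in energies:
            energies[shape] = round(contact_energy(seq, shape), 6)
    lowest = min(energies.values())
    return lowest, [s for s, e in energies.items() if e == lowest]

if __name__ == "__main__":
    energy, shapes = ground_states("HHPHHPHPPPPHHPHH", 4)
    # Exactly one shape in the list means a non-degenerate ground state.
    print(energy, len(shapes))
```

Energies are rounded before comparison so that floating-point noise in sums of -2.3 does not hide a genuine degeneracy.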

HP Model (1st Type)

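Second Model: A Mean-Field Approximation
The second model can be viewed as a "mean-field approximation" of the HP model. Lattice sites are divided into two classes, surface (S) and core (C). If a hydrophobic residue sits on a core site (not in contact with water), the energy is lowered by one unit. In this approximation a shape can be represented by an N-dimensional vector ($\mathbf{s}$), with 0 for S and 1 for C, and an amino-acid sequence likewise ($\mathbf{p}$), with 0 for P and 1 for H.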
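As a small illustration of this approximation, the sketch below builds the 0/1 structure vector of a path on an n×n lattice and evaluates the energy as minus the number of H residues on core sites. The function names and the symbols `s` (structure) and `p` (sequence) are assumptions for illustration. The formula E = -Σ p_i s_i is read off from the rule of one energy unit per buried H residue; it is not quoted from the slides.

```python
def structure_vector(path, n):
    """Map a path on an n x n lattice to 0 (surface site) / 1 (core site)."""
    return [1 if 0 < x < n - 1 and 0 < y < n - 1 else 0 for x, y in path]

def sequence_vector(seq):
    """Map an H/P string to 1 (H) / 0 (P)."""
    return [1 if residue == "H" else 0 for residue in seq]

def mean_field_energy(seq, path, n):
    """Assumed energy: -1 for every H residue sitting on a core site."""
    p = sequence_vector(seq)
    s = structure_vector(path, n)
    return -sum(pi * si for pi, si in zip(p, s))

# A 3 x 3 spiral that ends on the single core site (1, 1).
spiral = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (1, 1)]
print(structure_vector(spiral, 3))                 # [0, 0, 0, 0, 0, 0, 0, 0, 1]
print(mean_field_energy("PPPPPPPPH", spiral, 3))   # -1
```

Because the energy depends on the path only through its 0/1 vector, two paths with the same vector are degenerate for every sequence.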

HP Model (2nd Type)

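Designability
There are $2^N$ sequences of length N. For each sequence, find its ground-state conformation (sequences with a degenerate ground state are excluded), then count how many times each conformation is chosen as a ground state. That count is the designability of the conformation.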
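The counting procedure can be sketched as follows, assuming the mean-field energy (minus the number of H residues on core sites) and structures supplied as bitmasks. The function name `designability` and the two toy structures are made up for illustration; they are not lattice results.

```python
def designability(structures, n):
    """Count, per structure, the sequences choosing it as unique ground state.

    structures: distinct n-bit integers, bit = 1 marking a core site.
    Sequences are all n-bit integers, bit = 1 marking an H residue.
    """
    counts = {s: 0 for s in structures}
    for p in range(2 ** n):
        # Assumed mean-field energy: minus the number of H residues on core sites.
        energies = [-bin(p & s).count("1") for s in structures]
        lowest = min(energies)
        if energies.count(lowest) == 1:      # skip degenerate ground states
            counts[structures[energies.index(lowest)]] += 1
    return counts

toy = [0b110, 0b011]           # two made-up 3-site structures
print(designability(toy, 3))   # {6: 2, 3: 2}
```

The loop costs 2^N times the number of structures, so this direct form is only practical for small lattices.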

Designability of a given structure: the number of peptide sequences that choose a particular geometric structure as their non-degenerate ground state.

Geometrical understanding of the HP model (2nd type)

LS Model (C. Micheletti et al., PRL 80, 4987): $\sigma_i$ = L (0, large) or S (1, small). $z(\sigma_i) = 1$ (2) for L (S) residues inside the chain and $z(\sigma_i) = 2$ (3) for L (S) residues at the ends of the chain. $z_i$ is the number of contacts at site $i$ of the structure. $A(x) = 1$ for $x \ge 0$ and $-a$ otherwise ($a > 0$; $a = \infty$ in the reference).

In the N×N square lattices:
Notation: $n_\sigma^z$ is the number of $\sigma$ (L or S) residues on the z-type sites, with z = o (s, c) for corner (side, core) sites, and $n_\sigma = \sum_z n_\sigma^z$.
Two regimes are considered: $a \gg 1$ but finite, and $a = \infty$, in which L is prohibited from occupying the core sites, so $n_L^c = 0$.

The most encodable compact structures for the LS model on the 6×6 lattice. The shape of the structure with the highest score is identical to the one found in the HP model.

Geometrical Properties of the 2D Square Lattices
$n_{00}$ ($n_{10}$, $n_{11}$): number of peptide bonds connecting 00 (10, 11) residues. The 1-0 bonds partition the sequence into $n_{10} + 1$ segments of contiguous 1's or 0's.
Constraints for $N > 4$:
1. An isolated single 1 may only occur at an end of a path.
2. An isolated single 0 may only occur at an end of a path or one 1-segment away from an end.
3. Each of the four corners of the lattice belongs to a 0-segment with at least 4 sites, except when the corner is an end of a path.
4. For a path (1…1): $2n_{00} + n_{10} = 8N - 8$ and $2 \le n_{10} \le 4N - 12$.
5. (0010011…1): $2n_{00} + n_{10} = 8N - 9$ and $5 \le n_{10} \le 4N - 11$.
6. (0010011…1100100): $2n_{00} + n_{10} = 8N - 10$, with $10 \le n_{10} \le 4N - 10$ for $N > 6$ and $8 \le n_{10} \le 4N - 10$ for $N \le 6$.
7. (0010011…0) but not 6: $2n_{00} + n_{10} = 8N - 10$ and $4 \le n_{10} \le 4N - 12$.
8. (0…0) but not 6 or 7: $2n_{00} + n_{10} = 8N - 10$ and $4 \le n_{10} \le 4N - 12$.
9. (0…1) but not 5: $2n_{00} + n_{10} = 8N - 9$ and $1 \le n_{10} \le 4N - 13$.
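The bond counts and one of the sum rules above can be checked on a concrete path. The sketch below uses a row-by-row "snake" path, chosen only because it is easy to generate; it starts and ends on surface sites, so it is a (0…0)-type path and should satisfy 2n00 + n10 = 8N - 10. All function names are hypothetical.

```python
def snake_path(n):
    """Row-by-row back-and-forth path covering an n x n lattice."""
    path = []
    for y in range(n):
        xs = range(n) if y % 2 == 0 else range(n - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path

def structure_string(path, n):
    """0 for surface sites, 1 for core sites, in path order."""
    return [1 if 0 < x < n - 1 and 0 < y < n - 1 else 0 for x, y in path]

def bond_counts(bits):
    """Return (n00, n10, n11) for the bonds between consecutive sites."""
    n00 = n10 = n11 = 0
    for a, b in zip(bits, bits[1:]):
        if a != b:
            n10 += 1
        elif a == 0:
            n00 += 1
        else:
            n11 += 1
    return n00, n10, n11

for n in (5, 6):
    n00, n10, n11 = bond_counts(structure_string(snake_path(n), n))
    # Last column reports whether the (0...0) sum rule holds for this path.
    print(n, n00, n10, n11, 2 * n00 + n10 == 8 * n - 10)
```

The rule follows from counting bond ends on the 4N - 4 surface sites: each has two bonds along the path, except a path end, which has one.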

Example for Constraint 4, (1…1)-type paths. Left: maximum $n_{10} = 12$. Right: minimum $n_{10} = 2$.

Distribution of the Allowed Structures in the Hyperspace
From the combinatorial point of view, more of the possible binary sequences with larger $n_{10}$ are disallowed as a structure $s$ than of those with smaller $n_{10}$.

The minimal Hamming distance $d_H(s_1, s_2)$ between two paths $s_1$ and $s_2$ is approximately $4k$ ($2k$ for triangular lattices) if $\Delta n_{10} = 4k$ or $4k - 2$:
1. (…01111110…10000001…) → (01111000…10011001…)
2. (…01111110…10000001…) → (01100110…10011001…)
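The two examples can be checked by direct counting. The sketch below treats each 8-site excerpt separately and ignores whatever the ellipses hide, so both the distance and the change in n10 are computed from the visible segments only. The helper names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def n10(bits):
    """Number of 1-0 (or 0-1) bonds between consecutive sites."""
    return sum(x != y for x, y in zip(bits, bits[1:]))

before = ("01111110", "10000001")
examples = {
    1: ("01111000", "10011001"),
    2: ("01100110", "10011001"),
}
for label, after in examples.items():
    distance = sum(hamming(a, b) for a, b in zip(before, after))
    change = sum(n10(b) - n10(a) for a, b in zip(before, after))
    print(label, distance, change)
# prints:
# 1 4 2
# 2 4 4
```

With k = 1, example 1 changes n10 by 4k - 2 = 2 and example 2 by 4k = 4, and both sit at Hamming distance 4k = 4.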

On average, structures $s$ with larger $n_{10}$ have higher designability. The same result will also hold for 2D lattices of other shapes.

Comparison with Protein Data Bank
Metric representation of a sequence $p$ with length $l = 2k$.
For a set of sequences collected by the models, calculate the frequency distribution of the subsequences of length $2k$ and plot it in a unit square. Then calculate the correlation of the distribution functions, where $F_i^{(l)}(m)$ is the normalized frequency of the $m$th subsequence of length $l$ in set $i$.
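A rough sketch of the frequency step follows. The correlation formula itself is not preserved in this transcript, so the Pearson correlation used here is an assumption, as are the function names and the choice to index a subsequence by its value as a binary number. Sequences are taken to be strings of "0" and "1".

```python
from collections import Counter
from math import sqrt

def subsequence_frequencies(sequences, l):
    """Normalized frequency of every length-l window, indexed by binary value."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - l + 1):
            counts[seq[i:i + l]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no subsequence of the requested length")
    freqs = [0.0] * (2 ** l)
    for sub, count in counts.items():
        freqs[int(sub, 2)] = count / total
    return freqs

def correlation(f, g):
    """Pearson correlation of two frequency lists (an assumed choice)."""
    n = len(f)
    mean_f, mean_g = sum(f) / n, sum(g) / n
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_g = sum((b - mean_g) ** 2 for b in g)
    return cov / sqrt(var_f * var_g)

set_a = ["0101", "1100"]   # made-up binary sequences
set_b = ["0110", "1001"]
f_a = subsequence_frequencies(set_a, 2)
f_b = subsequence_frequencies(set_b, 2)
print(f_a, correlation(f_a, f_b))
```

`correlation` divides by zero when either distribution is perfectly flat, so a real comparison would need to handle that case.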

Results and Discussion
Average designabilities of the paths vs. $n_{10}$ for the (c) 4×7 and (d) 6×6 lattices, respectively.

The frequencies of all the subsequences of length 12 observed in (a) all proteins in the PDB, (b) the alpha-helix parts of (a), (c) the sequences belonging to the highly designable structures, and (d) the sequences belonging to the low-designability structures of the HP model.

The frequencies of all the subsequences of length 12 observed in (a) all proteins in the PDB and (b) the sequences belonging to highly designable structures of the LS model; (c) normalized frequencies of (a); (d) normalized frequencies of (b).

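Summary
The HP model is the simplest model for studying protein structure; it considers only hydrophobic and hydrophilic interactions.
The study of designability can explain why many different proteins fold into similar shapes.
Highly designable structures have "wrinkled" surfaces, which naturally gives rise to the α-helix secondary structure at the surface, in agreement with experimental results.
The LS model is mathematically equivalent to the HP model, but its physical meaning is different.
By comparing with real protein sequences and structures, we can judge the relative merits of the different simplified models.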
