Protein Structure Prediction & Alignment 2003.10.28.


1 Protein Structure Prediction & Alignment ZHANGroup@bioinfoamss.org 2003.10.28

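2 Goals of this report: 1. The scientific problems we study 2. The range of mathematical methods used 3. The current state of the group's research 4. Directions for future research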

3 Bioinformatics: Human Genome Project; large-molecule data in biology, such as DNA and protein; draws on knowledge of mathematics, computer science, information science, physics, system science, and management science as well as biology. Genomics: DNA sequencing; gene prediction; sequence alignment

4 DNA Sequencing ACGTGATCGATCGAGTACGAGAGTCTA

5 DNA Sequencing DNA array (DNA chip) AAATGCG

6 Sequencing by Hybridization: DNA fragment …… ATACGAAGA …… → Spectrum. Errors: positive (misread) / negative (missing, repetition). Ideal case: ATA TAC ACG CGA GAA AAG AGA. With errors: ATA TAC AGG CGA GAA AAG AGA
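The spectrum on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming probes of length 3 as in the slide's example; the function names `spectrum` and `classify_errors` are illustrative and do not come from the presentation. Because the comparison is set-based, it detects misread and missing probes but not repetitions, which a real SBH experiment also cannot distinguish.

```python
def spectrum(sequence: str, k: int = 3) -> list[str]:
    """Return every length-k substring (probe) of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def classify_errors(ideal: list[str], observed: list[str]) -> tuple[list[str], list[str]]:
    """Split the difference between two spectra into positive and negative errors.

    Positive errors are probes that were observed but are not in the sequence;
    negative errors are probes of the sequence that were not observed.
    """
    ideal_set, observed_set = set(ideal), set(observed)
    positive = sorted(observed_set - ideal_set)
    negative = sorted(ideal_set - observed_set)
    return positive, negative


ideal = spectrum("ATACGAAGA")
observed = ["ATA", "TAC", "AGG", "CGA", "GAA", "AAG", "AGA"]  # slide's noisy spectrum
print(ideal)
print(classify_errors(ideal, observed))
```

In the slide's noisy spectrum, AGG is a positive error and ACG is a negative error.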

7

8 SBH Reconstruction Problem: the ideal case (without repetitions and errors) can be solved in polynomial time; the general case is NP-hard; goal: design efficient heuristic algorithms. References: Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003. Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese) Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.
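To illustrate why the ideal case is tractable, the sketch below uses the standard Eulerian-path formulation: each probe becomes an edge from its prefix to its suffix, and a path that uses every edge once spells out the sequence. This is not the heuristic from the papers cited on the slide; the function name `reconstruct` is an assumption made for this example, and it assumes an error-free spectrum in which every probe occurs once.

```python
from collections import defaultdict


def reconstruct(probes: list[str]) -> str:
    """Rebuild a sequence from an ideal spectrum via an Eulerian path."""
    if not probes:
        raise ValueError("empty spectrum")
    graph = defaultdict(list)    # prefix (k-1)-mer -> list of suffix (k-1)-mers
    balance = defaultdict(int)   # out-degree minus in-degree per node
    for probe in probes:
        prefix, suffix = probe[:-1], probe[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # An Eulerian path starts at the node with one more outgoing edge;
    # if no such node exists the walk is a cycle and any node will do.
    starts = [node for node, b in balance.items() if b == 1]
    start = starts[0] if starts else probes[0][:-1]

    # Hierholzer's algorithm, iterative version.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(probes) + 1:
        raise ValueError("spectrum does not admit a single Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])


shuffled = ["GAA", "ATA", "AGA", "TAC", "CGA", "AAG", "ACG"]
print(reconstruct(shuffled))
```

With errors or repeated probes the graph no longer has a clean Eulerian path, which is where the heuristic methods listed on the slide come in.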

9 Protein Structure Prediction: predict protein 3D structure from (amino acid) sequence. Sequence → secondary structure → 3D structure → function

10 Protein Secondary Structure: α-helix (30-35%); β-sheet / β-strand (20-25%); coil (40-50%); loop; β-turn

11 3D Structure of Protein (figure labels: alpha-helix, beta-sheet, loop and turn, turn or coil)

12 Protein 3D Structure Detection: X-ray diffraction; expensive; slow

13 Protein Structure: protein 3D structure → biological function; lock & key model of enzyme function (docking); folding problem: protein sequence → 3D structure; structure prediction and alignment; protein design, drug design, etc.; the “holy grail” of bioinformatics

14 Protein Structure Prediction. Prediction is possible because: sequence information uniquely determines 3D structure; sequence similarity (>50%) tends to imply structural similarity. Prediction is necessary because: DNA sequence data ≫ protein sequence data ≫ structure data.
Database                 1994      1997      2002.10
Sequence (Swiss-Prot)    40,000    68,000    114,033
Structure (PDB)          4,045     7,000     18,838

15 Predicting Protein Structure. Goal: find the best fit of sequence to 3D structure. Comparative (homology) modeling: construct a 3D model from alignment to protein sequences with known structure. Threading (fold recognition): pick the best fit to sequences of known 2D / 3D structures (folds). Ab initio / de novo methods: attempt to calculate 3D structure “from scratch” using molecular dynamics, energy minimization, or lattice models

16

17

18 Modeling protein folding Simple exact model + approximate algorithm Lattice Models

19 HP Model: the twenty amino acids can be divided into two classes: hydrophobic / non-polar (H) and hydrophilic / polar (P). Contacts between H residues are favorable. Goal: maximize the number of H-H contacts. Figure legend: hydrophobic amino acid, hydrophilic amino acid, covalent bond, H-H contact

20 Reduce computation by limiting degrees of freedom: limit α-carbon (Cα) atoms to positions on a 2D or 3D lattice; protein sequence → represented as a path through lattice points; emphasis on forming a hydrophobic core
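The objective of the lattice HP model can be made concrete with a short scoring function. The sketch below assumes a 2D square lattice and encodes a conformation as a string of moves (U, D, L, R); the names `fold` and `hh_contacts` and the move encoding are choices made for this illustration, not notation from the presentation. An H-H contact is counted when two H residues occupy adjacent lattice points but are not consecutive in the chain.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(directions: str) -> list[tuple[int, int]]:
    """Turn a move string into lattice coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in directions:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        raise ValueError("path is not self-avoiding")
    return coords


def hh_contacts(hp_sequence: str, coords: list[tuple[int, int]]) -> int:
    """Count non-bonded H-H pairs that sit on adjacent lattice points."""
    if len(hp_sequence) != len(coords):
        raise ValueError("sequence and conformation lengths differ")
    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if hp_sequence[i] != "H":
            continue
        # Look only right and up so that each lattice edge is checked once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and hp_sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


print(hh_contacts("HPPH", fold("RUL")))  # chain bent into a square
print(hh_contacts("HPPH", fold("RRR")))  # fully extended chain
```

Bending the four-residue chain HPPH into a square brings its two H ends together for one contact, while the extended chain has none.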

21 Complexity: a combinatorial optimization problem; NP-hard; long-range interactions + global optimization; GA (genetic algorithms), MC (Monte Carlo), and SA (simulated annealing) are time-consuming. Figure: How Bad are NP-Complete Problems? (Length = 20)
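A brute-force search shows where the cost comes from. The sketch below enumerates every self-avoiding walk on the 2D square lattice for a short HP sequence and keeps the one with the most H-H contacts; the function name `best_fold` is hypothetical, and this exhaustive search is only an illustration of the search space, not one of the methods discussed in the talk. Symmetric copies of the same fold (rotations and reflections) are counted separately.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def best_fold(hp_sequence: str):
    """Exhaustively search all conformations of a short HP sequence.

    Returns (max H-H contacts, one optimal list of coordinates,
    number of self-avoiding walks examined).
    """
    n = len(hp_sequence)
    if n == 0:
        raise ValueError("empty sequence")
    best_contacts, best_coords, walks = -1, None, 0
    coords = [(0, 0)]
    occupied = {(0, 0): 0}  # lattice point -> residue index

    def extend(contacts: int) -> None:
        nonlocal best_contacts, best_coords, walks
        i = len(coords)  # index of the residue being placed
        if i == n:
            walks += 1
            if contacts > best_contacts:
                best_contacts, best_coords = contacts, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in STEPS:
            point = (x + dx, y + dy)
            if point in occupied:
                continue  # keep the walk self-avoiding
            gained = 0
            if hp_sequence[i] == "H":
                for ex, ey in STEPS:
                    j = occupied.get((point[0] + ex, point[1] + ey))
                    # j < i - 1 excludes the covalently bonded predecessor.
                    if j is not None and j < i - 1 and hp_sequence[j] == "H":
                        gained += 1
            coords.append(point)
            occupied[point] = i
            extend(contacts + gained)
            coords.pop()
            del occupied[point]

    extend(0)
    return best_contacts, best_coords, walks


for seq in ("HPPH", "HHPPHH", "HPHPPHHPHH"):
    contacts, _, walks = best_fold(seq)
    print(seq, contacts, walks)
```

The number of walks grows exponentially with chain length, so this approach becomes impractical well before the length of real proteins, which motivates approximate methods.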

22 SOM Approach: existing SOM solution; motivated by SOM for TSP; incorporation of HP information; compact lattice

23 New SOM Approach, Motivation: consider a big lattice (multiple maps of SOM; feasibility of solutions); equivalent to PCTSP; properly define the lattice distance; TSP force + H-H force

24 New SOM Approach, Approaches: initialization; learning sample set partition strategy; learning sample set reduction strategy; local search procedure

25 Numerical Results 1. Constructed HP sequences 2. HP benchmark (up to 36 amino acids)

26 Conclusions: finds configurations with the globally maximal number of H-H contacts in all the tests; finds more optimal conformations; fast: running time is linear in the sequence length

27 SOM Approach for 2D HP-Model. Xiang-Sun Zhang, Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Luonan Chen. A New SOM Approach for 2D HP-Model of Proteins' Structure Prediction. Submitted to RECOMB04. Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Xiang-Sun Zhang. Improved Self-Organizing Map Algorithm for Protein Folding and its Realization. Submitted to J. of Systems Science and Mathematical Sciences. (in Chinese)

28 Unique Optimal Folding Problem: What proteins in the two-dimensional HP model have a unique optimal (minimum energy) folding? (Brian Hayes, 1998) Oswin Aichholzer proved that on the square lattice: there are closed chains of monomers with this property for all even lengths; there are open monomer chains with this property for all lengths divisible by four.

29 Square Lattice and Triangular Lattice

30 Our Results: For any n = 18k (k is a positive integer), there exists an n-node (open or closed) chain with at least 3^O(n) optimal foldings, all with isomorphic contact graphs of size n/2. On the 2D triangular lattice, for any integer n > 19, there exist both closed and open chains of n nodes with a unique optimal folding.

31 Proteins With Unique Optimal Foldings Zhen-Ping Li, Xiang-Sun Zhang, Luo-Nan Chen, Protein with Unique Optimal Foldings on a Triangular Lattice in the HP Model, Submitted to Journal of Computational Biology.

32 Examples of Optimal Foldings

33 3D Protein Structure Alignment Motivation Group proteins by structural similarity Determine impact of individual residues on protein structure Identify distant homologues of protein families Predict function of proteins with low sequence similarity Identify new folds / targets for x-ray crystallography

34 3D Protein Structure Alignment. Correspondence between atoms: pairwise sequence alignment. Locations of atoms: Protein Data Bank (in PDB file): bond angles / lengths; X, Y, Z atom coordinates. Evaluation metric: 6 degrees of freedom: 3 degrees of translation (A), 3 degrees of rotation (R). Root Mean Square Deviation: $\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}$, where n = number of atoms and d_i = distance between corresponding atoms i
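The evaluation metric can be sketched in plain Python as follows. The example assumes the atom correspondence is already known and that a rotation matrix and translation vector are given; finding the optimal superposition is a separate step that is not shown here. The function names `rmsd` and `rigid_transform` are illustrative only.

```python
import math

Point = tuple[float, float, float]


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """Root mean square deviation between corresponding atoms."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2  # d_i squared
    return math.sqrt(total / len(coords_a))


def rigid_transform(coords: list[Point], rotation, translation) -> list[Point]:
    """Apply x -> R x + t to every point (3 rotational + 3 translational DOF)."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in coords
    ]


structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
quarter_turn_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degree rotation about z
moved = rigid_transform(structure, quarter_turn_z, (1.0, 2.0, 3.0))

print(rmsd(structure, moved))  # large: same shape, different pose
print(rmsd(moved, rigid_transform(structure, quarter_turn_z, (1.0, 2.0, 3.0))))
```

The two structures above have identical shape, so their RMSD drops to zero once the correct rotation and translation are applied, which is why alignment methods search over those six degrees of freedom before scoring.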

35 Structure Alignment Problem

36

37

38 Luo-Nan Chen, Tian-Shou Zhou, Yun Tang, Xiang-Sun Zhang. Structure of Alignment of Protein by Mean Field Annealing. Submitted to ICSB2003.

39 Future Research. Protein structure prediction: algorithms for HP model; threading methods. Protein structure alignment: novel model for structure alignment. SBH reconstruction: algorithms for new pattern SBH methods. SNP and haplotype analysis

40 Summary: protein sequence determines structure; structure prediction (secondary, 3D) is still evolving: ab initio, comparative modeling, threading; 3D structure alignment is useful for protein comparison, drug design, etc. Proteomics: next stage of bioinformatics; challenge for the new century

