Published by Stuart Lawrence. Modified over 8 years ago.
1
Protein Structure Prediction & Alignment ZHANGroup@bioinfoamss.org 2003.10.28
2
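Goals of This Report
1. The scientific problems we study
2. The range of mathematical methods used
3. The current state of the group's research
4. Our future research directions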
3
Bioinformatics
Human Genome Project
Large-molecule data in biology, such as DNA and protein
Knowledge of mathematics, computer science, information science, physics, system science, management science, as well as biology
Genomics: DNA sequencing, gene prediction, sequence alignment
4
DNA Sequencing ACGTGATCGATCGAGTACGAGAGTCTA
5
DNA Sequencing DNA array (DNA chip) AAATGCG
6
Sequencing by Hybridization
DNA fragment: …… ATACGAAGA ……
Spectrum (ideal case): ATA TAC ACG CGA GAA AAG AGA
Spectrum (with errors): ATA TAC AGG CGA GAA AAG AGA
Errors: positive (misread) / negative (missing, repetition)
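The spectrum on this slide can be reproduced with a short sketch. It assumes the probes are all 3-mers of the fragment, which matches the example. The helper names `spectrum` and `classify_errors` are illustrative and do not come from the slides.

```python
def spectrum(fragment, k):
    """Return every length-k substring (k-mer) of the fragment, left to right."""
    return [fragment[i:i + k] for i in range(len(fragment) - k + 1)]


def classify_errors(ideal, observed):
    """Compare an observed spectrum with the ideal one.

    Positive errors are probes that were read but do not occur in the fragment;
    negative errors are probes that occur in the fragment but were not read.
    """
    ideal_set, observed_set = set(ideal), set(observed)
    positive = sorted(observed_set - ideal_set)
    negative = sorted(ideal_set - observed_set)
    return positive, negative


# The fragment and the erroneous spectrum shown on the slide.
ideal = spectrum("ATACGAAGA", 3)
observed = ["ATA", "TAC", "AGG", "CGA", "GAA", "AAG", "AGA"]
positive, negative = classify_errors(ideal, observed)
print(ideal)
print(positive, negative)
```

In this example the misread probe AGG appears as a positive error, and the probe it displaced, ACG, appears as a negative error.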
8
SBH Reconstruction Problem
The ideal case (without repetitions and errors) can be solved in polynomial time.
The general case is an NP-hard problem.
Design efficient heuristic algorithms.
Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003.
Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese)
Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.
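One standard way to see why the ideal case is polynomial is the Eulerian path formulation: each k-mer becomes an edge from its (k-1)-letter prefix to its (k-1)-letter suffix, and a sequence corresponds to a path that uses every edge exactly once. The sketch below illustrates that textbook formulation; it is not the heuristic method of the papers cited on this slide, and the function name `reconstruct` is hypothetical.

```python
from collections import defaultdict


def reconstruct(kmers):
    """Rebuild a sequence from an error-free spectrum in which every k-mer occurs once.

    Returns the sequence, or None if no single path uses all k-mers.
    """
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        outdeg[prefix] += 1
        indeg[suffix] += 1

    # An Eulerian path starts at a node with one more outgoing than incoming edge;
    # if there is none (the path is a cycle), any node with an edge will do.
    nodes = sorted(set(indeg) | set(outdeg))
    start = next((v for v in nodes if outdeg[v] - indeg[v] == 1), kmers[0][:-1])

    # Hierholzer's algorithm: walk until stuck, then backtrack and splice.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(kmers) + 1:
        return None  # some k-mers were unreachable, so the spectrum is inconsistent
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct(["ATA", "TAC", "ACG", "CGA", "GAA", "AAG", "AGA"]))
```

Each edge is visited once, so the running time is linear in the size of the spectrum. Once positive or negative errors appear, the graph no longer encodes the fragment exactly, which is where heuristic algorithms come in.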
9
Protein Structure Prediction
Predict protein 3D structure from (amino acid) sequence
Sequence → secondary structure → 3D structure → function
10
Protein Secondary Structure
α-helix (30-35%)
β-sheet / β-strand (20-25%)
Coil (40-50%), random coil
Loop
β-turn
11
3D Structure of Protein (figure labels: alpha-helix, beta-sheet, loop and turn, turn or coil)
12
Protein 3D Structure Detection
X-ray diffraction
Expensive
Slow
13
Protein Structure
Protein 3D structure → biological function
Lock & key model of enzyme function (docking)
Folding problem: protein sequence → 3D structure
Structure prediction and alignment
Protein design, drug design, etc.
The “holy grail” of bioinformatics
14
Protein Structure Prediction
Prediction is possible because
  Sequence information uniquely determines 3D structure
  Sequence similarity (>50%) tends to imply structural similarity
Prediction is necessary because
  DNA sequence data » protein sequence data » structure data

                         1994      1997      2002.10
Sequence (Swiss-Prot)    40,000    68,000    114,033
Structure (PDB)          4,045     7,000     18,838
15
Predicting Protein Structure
Goal: find the best fit of sequence to 3D structure
Comparative (homology) modeling: construct a 3D model from alignment to protein sequences with known structure
Threading (fold recognition): pick the best fit to sequences of known 2D / 3D structures (folds)
Ab initio / de novo methods: attempt to calculate 3D structure “from scratch”
  Molecular dynamics
  Energy minimization
  Lattice models
18
Modeling protein folding Simple exact model + approximate algorithm Lattice Models
19
HP Model
The twenty amino acids can be divided into two classes:
  Hydrophobic / non-polar (H)
  Hydrophilic / polar (P)
Contacts between H points are favorable.
Figure legend: hydrophobic amino acid, hydrophilic amino acid, covalent bond, H-H contact
Goal: maximize the number of H-H contacts
20
Reduce computation by limiting degrees of freedom
Limit α-carbon (Cα) atoms to positions on a 2D or 3D lattice
Protein sequence represented as a path through lattice points
Emphasis on forming a hydrophobic core
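As a minimal sketch of this representation, a fold on the 2D square lattice can be stored as a list of (x, y) points, one per residue, and scored by counting H-H contacts between residues that are lattice neighbours but not chain neighbours. The function names and the four-residue example are illustrative assumptions, not taken from the slides.

```python
def is_valid_fold(coords):
    """A fold is valid if it is self-avoiding and consecutive residues are one lattice step apart."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def hh_contacts(sequence, coords):
    """Count H-H contacts of an HP sequence placed on the 2D square lattice."""
    if len(sequence) != len(coords) or not is_valid_fold(coords):
        raise ValueError("coords must be a valid fold of the same length as sequence")
    position = {point: index for index, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up visits every neighbouring pair exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# HPPH bent into a unit square brings its two H residues into contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hh_contacts("HPPH", square), hh_contacts("HPPH", straight))
```

Maximizing this count over all valid folds is the optimization problem that the HP model poses.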
21
Complexity
A combinatorial optimization problem
NP-hard problem
Long-range interaction + global optimization
GA (genetic algorithm), MC (Monte Carlo), SA (simulated annealing): time-consuming
How Bad are NP-Complete Problems? Length=20
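To give a feel for the combinatorial growth, the following sketch solves the 2D HP model exactly by enumerating every self-avoiding walk from a fixed start point. It is a brute-force baseline under the assumptions stated in the comments, not any of the methods named in the slides, and `best_fold` is a hypothetical name.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def best_fold(sequence):
    """Exhaustively search 2D square-lattice folds of an HP sequence.

    Returns (maximum number of H-H contacts, number of conformations examined).
    Rotations and reflections of the same fold are counted separately.
    """
    n = len(sequence)
    if n == 0:
        return 0, 0
    best = 0
    examined = 0
    coords = [(0, 0)]
    occupied = {(0, 0): 0}  # lattice point -> index of the residue placed there

    def contacts_gained(i, pos):
        """H-H contacts created by placing residue i at pos (chain neighbour excluded)."""
        if sequence[i] != "H":
            return 0
        x, y = pos
        gained = 0
        for dx, dy in MOVES:
            j = occupied.get((x + dx, y + dy))
            if j is not None and j < i - 1 and sequence[j] == "H":
                gained += 1
        return gained

    def extend(i, score):
        nonlocal best, examined
        if i == n:
            examined += 1
            best = max(best, score)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            pos = (x + dx, y + dy)
            if pos in occupied:
                continue  # keep the walk self-avoiding
            gained = contacts_gained(i, pos)
            occupied[pos] = i
            coords.append(pos)
            extend(i + 1, score + gained)
            coords.pop()
            del occupied[pos]

    extend(1, 0)
    return best, examined


for length in range(2, 9):
    print(length, best_fold("H" * length))
```

The number of conformations grows exponentially with the chain length, so this approach is only practical for short sequences, which is why approximate methods are used instead.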
22
SOM (Self-Organizing Map) Approach
Existing SOM solution
Motivated by SOM for TSP
Incorporation of HP information
Compact lattice
23
New SOM Approach Motivation Consider a big lattice Multiple map of SOM Feasibility of solutions Equivalent to PCTSP Properly define the lattice distance TSP force + H-H force
24
New SOM Approach: Approaches
Initialization
Learning sample set partition strategy
Learning sample set reduction strategy
Local search procedure
25
Numerical Results 1. Constructed HP sequences 2. HP benchmark (up to 36 amino acids)
26
Conclusions
Finds configurations with the global maximum number of H-H contacts in all the tests
Finds more optimal conformations
Fast: running time is linear in the sequence length
27
SOM Approach for 2D HP-Model
Xiang-Sun Zhang, Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Luonan Chen. A New SOM Approach for 2D HP-Model of Proteins' Structure Prediction. Submitted to RECOMB04.
Yong Wang, Zhong-Wei Zhan, Ling-Yun Wu, Xiang-Sun Zhang. Improved Self-Organizing Map Algorithm for Protein Folding and its Realization. Submitted to J. of Systems Science and Mathematical Sciences. (in Chinese)
28
Unique Optimal Folding Problem
Which proteins in the two-dimensional HP model have a unique optimal (minimum-energy) folding? (Brian Hayes, 1998)
Oswin Aichholzer proved that on the square lattice:
  There are closed chains of monomers with this property for all even lengths.
  There are open monomer chains with this property for all lengths divisible by four.
29
Square Lattice and Triangular Lattice
30
Our Results
For any n = 18k (k a positive integer), there exists an n-node (open or closed) chain with at least 3^O(n) optimal foldings, all with isomorphic contact graphs of size n/2.
On the 2D triangular lattice, for any integer n > 19, there exist both closed and open chains of n nodes with a unique optimal folding.
31
Proteins With Unique Optimal Foldings Zhen-Ping Li, Xiang-Sun Zhang, Luo-Nan Chen, Protein with Unique Optimal Foldings on a Triangular Lattice in the HP Model, Submitted to Journal of Computational Biology.
32
Examples of Optimal Foldings
33
3D Protein Structure Alignment: Motivation
Group proteins by structural similarity
Determine impact of individual residues on protein structure
Identify distant homologues of protein families
Predict function of proteins with low sequence similarity
Identify new folds / targets for X-ray crystallography
34
3D Protein Structure Alignment
Correspondence between atoms
  Pairwise sequence alignment
Locations of atoms
  Protein Data Bank (in PDB file)
  Bond angles / lengths
  X, Y, Z atom coordinates
Evaluation metric
  6 degrees of freedom: 3 degrees of translation (A), 3 degrees of rotation (R)
  Root Mean Square Deviation (RMSD): $\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}$, where n = number of atoms and d_i = distance between corresponding atoms i
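The RMSD metric can be sketched directly from its definition: the square root of the mean squared distance between corresponding atoms. The example assumes the atom correspondence is already given and that one structure has already been translated and rotated onto the other; finding the best translation A and rotation R is not shown. The function name `rmsd` and the coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Atom i of coords_a is assumed to correspond to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("both structures need the same non-zero number of atoms")
    squared_sum = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


# Two-atom toy structures; the second is the first shifted by one unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```

Because every atom in the toy example is displaced by exactly one unit, the RMSD equals that displacement. A structure alignment method searches over correspondences and rigid motions to make this value as small as possible.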
35
Structure Alignment Problem
38
Luo-Nan Chen, Tian-Shou Zhou, Yun Tang, Xiang-Sun Zhang. Structure of Alignment of Protein by Mean Field Annealing. Submitted to ICSB2003.
39
Future Research
Protein structure prediction
  Algorithms for HP model
  Threading methods
Protein structure alignment
  Novel model for structure alignment
SBH reconstruction
  Algorithms for new pattern SBH methods
SNP and haplotype analysis
40
Summary
Protein sequence determines structure
Structure prediction (secondary, 3D) is still evolving
  Ab initio
  Comparative modeling
  Threading
3D structure alignment is useful for protein comparison, drug design, etc.
Proteomics
  Next stage of bioinformatics
  Challenge for the new century