CLePAPS : Fast Pair Alignment of Protein Structures Based on Conformational Letters BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics, Dec 10-14,


1 CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters. BioInfoSummer07, ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia. Sheng WANG, Wei-Mou ZHENG*, Institute of Theoretical Physics, CAS. zheng@itp.ac.cn. *To whom correspondence should be addressed.

2 Outline [1] Introduction [2] The flow chart of the CLePAPS algorithm [2-1] Find SFPs by CLeSUM [2-2] Construct ‘Star-Tree’ [2-3] The ‘Zoom-In’ strategy [3] Result & Discussion

3 Structure alignment: a self-consistent problem. The correspondence determines the rigid transformation, and the rigid transformation determines the correspondence. However, when aligning two protein structures, at the beginning we know neither the transformation nor the correspondence. Existing methods: DALI, CE, VAST, STRUCTAL, ProSup. CLePAPS (Conformational Letters based Pairwise Alignment of Protein Structures): initialization + iteration; Similar Fragment Pairs (SFPs); anchor-based; alignment = as many consistent SFPs as possible. Chapter[1] : Introduction

4 Anchor-based superposition. [Figure: SFPs superimposed using the anchor SFP, each marked as consistent or inconsistent.] Alignment = collect as many consistent SFPs as possible.

5 Structure alignment => a self-consistent problem. Flow chart: Protein A and Protein B -> initial correspondence (anchor SFP) -> optimal transformation for the correspondence -> correspondence update (adding consistent SFPs) -> convergence? If no, return to the transformation step; if yes, end.

6 Four main problems: [1] How can we find SFPs as fast as possible? [2] How can we balance the specificity and sensitivity of the SFPs found? [3] How can we avoid a Local Trap start? [4] How can we hasten convergence without falling into a Local Trap?

7 An example of LOCAL TRAP

8 The flow chart of the CLePAPS algorithm. Chapter[2] : The flow chart of CLePAPS Algorithm
Part_I: SFP. Find SFPs by CLeSUM: SFP List (width 20) for specificity, SFP List (width 8) for sensitivity.
Part_II: ‘Star-Tree’. Initial correspondence: Star-Tree construction (Top K for anchor, Top J for neighbor) selects an Optimal Anchor SFP.
Part_III: ‘Zoom-In’. Correspondence update (adding consistent SFPs while avoiding Local Trap and hastening convergence): First Update (d1 blank-filling), Second Update (d2 blank-filling), Third Update (d3 blank-filling), Final Alignment.

9 Part_I: Find SFPs by CLeSUM. SFP = Similar Fragment Pair; CLeSUM = Conformational Letter SUbstitution Matrix. Chapter[2-1] : Find SFPs by CLeSUM

10 The main difference between CLePAPS and other existing algorithms for structure alignment is the use of conformational letters. Conformational letters = discretized states of 3D segmental conformations. A letter = a cluster of combinations of the three angles formed by the Cα pseudobonds of four contiguous residues (obtained by clustering according to the probability distribution). Fig. 1: Centers of the 17 conformational letters.
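The three angles behind a conformational letter can be computed from four consecutive Cα positions. The following is a minimal sketch, not the authors' code: the function names are hypothetical, coordinates are assumed to be (x, y, z) tuples in Å, and the torsion sign follows the usual dihedral convention, which the slides do not specify.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bend_angle(p, q, r):
    """Angle at q between the pseudobonds q->p and q->r, in degrees."""
    u, v = _sub(p, q), _sub(r, q)
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise

def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral about the p1->p2 pseudobond, in degrees."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / _norm(b1)
    return math.degrees(math.atan2(y, x))

def conformation_angles(ca0, ca1, ca2, ca3):
    """Two bending angles and one torsion angle for four contiguous C-alpha atoms."""
    return (bend_angle(ca0, ca1, ca2),
            bend_angle(ca1, ca2, ca3),
            torsion_angle(ca0, ca1, ca2, ca3))
```

Sliding this over every window of four residues gives one angle triplet per window, which is the input a letter assignment step would need.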

11 Similarity between conformational letters. CLeSUM: Conformational Letter SUbstitution Matrix, $M_{ij} = 20 \log_2 \left( P_{ij} / (P_i P_j) \right)$, comparable to BLOSUM83 (H ~ 1.05), constructed using FSSP representatives. [Figure labels: typical helix, typical sheet.] evolutionary + geometric
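The log-odds formula on this slide can be sketched as follows. This is an illustrative assumption about how such a matrix could be tabulated from aligned letter pairs, not the authors' procedure: `log_odds_matrix` and its input format are hypothetical, and pairs that never occur simply get no entry.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs, scale=20.0):
    """Return {(i, j): scale * log2(P_ij / (P_i * P_j))} from aligned letter pairs."""
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so that the matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal letter frequencies P_i, obtained by summing P_ij over j.
    letter_freq = Counter()
    for (a, _), n in pair_counts.items():
        letter_freq[a] += n / total

    matrix = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        matrix[(a, b)] = scale * math.log2(p_ab / (letter_freq[a] * letter_freq[b]))
    return matrix
```

A positive entry means the two letters are aligned more often than chance would predict; a negative entry means less often.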

12 SFP => highly scored string pair. Fast search for SFPs by string comparison. The CLeSUM similarity score ranks the importance of SFPs: guided by CLeSUM scores, only the top few SFPs need to be examined to determine the superposition for alignment, and hence a reliable greedy strategy becomes possible. [Figure: Protein A and Protein B (smaller), with a similar seed marked.]

13 An example of finding SFPs
>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
>1cewI
RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER
Similar Fragment Pairs (SFPs), 1molA fragment / 1cewI fragment:
HHHHHHHH / AJGKHHII
FEDECCGA / OLCEEEEE
FEDPLDEQ / EEDPLEEE
PLDDEEED / PLEEFEDP
CEDEEEEE / EEDEEEEE
Score rank: 5 1 2 3 4
To find SFPs, we take the shorter sequence as the template and record every position pair whose fragment score, for fragments of a given length, is higher than the threshold.
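The search described on this slide can be sketched as an ungapped window comparison of two conformational-letter strings. This is a minimal sketch under stated assumptions: `find_sfps` is a hypothetical name, and `toy_score` is a made-up stand-in because the real CLeSUM values are not given in the slides.

```python
def find_sfps(seq_a, seq_b, width, threshold, score):
    """Return (total_score, i, j) for every window pair scoring above threshold.

    i and j are the 0-based start positions of the fragments in seq_a and seq_b.
    The result is sorted by descending score.
    """
    hits = []
    for i in range(len(seq_a) - width + 1):
        for j in range(len(seq_b) - width + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(width))
            if total > threshold:
                hits.append((total, i, j))
    hits.sort(key=lambda hit: -hit[0])
    return hits

def toy_score(x, y):
    # HYPOTHETICAL substitution score, NOT the CLeSUM matrix.
    return 2 if x == y else -1
```

With a real substitution matrix, `score` would be a lookup into that matrix, and the two strings from the slide could be passed in with `width=8` or `width=20`.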

14 Part_II: Construct ‘Star-Tree’. SFP List (width 20) => we create a list of SFPs of length 20 and sort them by CLeSUM score. Top_K & Top_J (J > K) => we take only the Top_K SFPs of the list as candidate anchor SFPs and check their consistency against the Top_J SFPs, which serve as neighbors. Chapter[2-2] : Construct ‘Star-Tree’

15 Selection of the Optimal Anchor SFP. Example: Top K, K = 2; Top J, J = 5. With the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1. The Top_1 SFP is globally supported by three other SFPs, while the Top_2 SFP is supported only by itself.
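The anchor selection in this example can be sketched as a vote count: each of the Top_K candidates is scored by how many of the Top_J SFPs are consistent with it. This is a sketch, not the authors' implementation. The names are hypothetical, and `same_diagonal` is a crude stand-in for the geometric test, which in CLePAPS involves superimposing the structures with the anchor SFP.

```python
def select_anchor(sfps, top_k, top_j, is_consistent):
    """Pick the anchor with the most consistent neighbors.

    sfps must already be sorted by descending score. The count includes
    the anchor itself, as in the slide's example (4 versus 1).
    """
    neighbors = sfps[:top_j]
    best, best_support = None, -1
    for anchor in sfps[:top_k]:
        support = sum(1 for other in neighbors if is_consistent(anchor, other))
        if support > best_support:
            best, best_support = anchor, support
    return best, best_support

def same_diagonal(anchor, other, tolerance=3):
    # ASSUMPTION: SFPs are (score, i, j) tuples, and two SFPs count as
    # consistent when their position offsets i - j nearly agree.
    return abs((anchor[1] - anchor[2]) - (other[1] - other[2])) <= tolerance
```

Ties go to the higher-ranked candidate because a later candidate must strictly exceed the current best.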

16 An example of ‘Star-Tree’ construction (1cewI and 1molA), ‘Star-Tree’ view. With the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1.

17 The flow chart of the CLePAPS algorithm, repeated from slide 8 with the Star-Tree result filled in: Top 1 ( 4 ), Top 2 ( 1 ). Next is Part_III: ‘Zoom-In’, the correspondence update (adding consistent SFPs while avoiding Local Trap and hastening convergence).

18 Part_III: The ‘Zoom-In’ strategy. SFP List (width 8) => we create a list of SFPs of length 8 and sort them by CLeSUM score (descending order). Blank-filling => we add consistent SFPs one by one from the SFP List (width 8) to update the correspondence. Chapter[2-3] : The ‘Zoom-In’ Strategy

19 ‘Zoom-In’ cutoffs: d1 > d2 > d3 (8 Å, 6 Å, 5 Å, ...). [1] The first transformation is determined by the Optimal Anchor SFP alone, so we use a large cutoff d1 to avoid LOCAL TRAP. [2] The later transformations are determined by a set of globally consistent SFPs, so we use a lower cutoff to add new consistent SFPs.
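The shrinking-cutoff loop can be sketched as below. This is a toy illustration only: each SFP is reduced to a pair of fragment centers, and `fit_translation` replaces the rigid superposition (rotation plus translation) that a real alignment needs. All names are hypothetical, and keeping the anchor in every round is an assumption; only the cutoffs 8, 6 and 5 Å come from the slide.

```python
import math

def fit_translation(sfps):
    """Mean shift that moves the B-side centers onto the A-side centers."""
    n = len(sfps)
    return tuple(sum(a[k] - b[k] for a, b in sfps) / n for k in range(3))

def deviation(shift, sfp):
    """Distance between the A-side center and the shifted B-side center."""
    a, b = sfp
    return math.dist(a, tuple(b[k] + shift[k] for k in range(3)))

def zoom_in(anchor, candidates, cutoffs=(8.0, 6.0, 5.0)):
    """Refit, then re-collect consistent SFPs under a tighter cutoff each round.

    candidates must not contain the anchor, which is always kept.
    """
    accepted = [anchor]
    for cutoff in cutoffs:
        shift = fit_translation(accepted)
        accepted = [anchor] + [s for s in candidates
                               if deviation(shift, s) <= cutoff]
    return accepted, fit_translation(accepted)
```

The first round is generous because its transformation rests on a single SFP; later rounds can afford a stricter cutoff because the fit is backed by more SFPs.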

20 An example of the ‘Zoom-In’ strategy, d1 > d2 > d3 (8 Å, 6 Å, 5 Å, ...): First Update, Second Update, Third Update, Final Alignment. [Figure labels: Elongation, Shrink.]

21 The flow chart of the CLePAPS algorithm, repeated from slide 8 with the Star-Tree result filled in: Top 1 ( 4 ), Top 2 ( 1 ).

22 Four main problems and CLePAPS's solutions. [1] How can we find SFPs as fast as possible? Fast search for SFPs by string comparison alone. [2] How can we balance the specificity and sensitivity of the SFPs found? Width 20 for specificity and width 8 for sensitivity, both sorted by CLeSUM score. [3] How can we avoid a Local Trap start? Optimal Anchor SFP selected through the ‘Star-Tree’. [4] How can we hasten convergence without falling into a Local Trap? Fast ‘Zoom-In’ strategy that converges within only three updates. Chapter[3] : Result & Conclusion

23 Results: the Fischer benchmark test; database search with CLePAPS; multi-solution alignments (symmetry, domain move, repeats); non-topological alignment and domain shuffling ([pdb:1ihwA] vs. [pdb:1ssoA]).

24 Multi-Solution[1] : Symmetry. [pdb:4fgf] vs. [pdb:8i1b], red structure fixed. Solution [A], Solution [B], Solution [C]. [pdb:4fgf] [OGCCFEFAHO GEED] [OGDCEDFAIOG EED] [KGFCEDDAJO GCCC]

25 Multi-Solution[2] : Domain Move. [pdb:2gbp] vs. [pdb:2liv], blue structure fixed. Solution [A], Solution [B]. Domain_1, Domain_2.

26 Multi-Solution[3] : Repeats. [pdb:4cpv] vs. [pdb:1osa], blue structure fixed. Solution [A], Solution [B]. Repeat_1, Repeat_2.

27 Conclusion. CLePAPS distinguishes itself from other existing algorithms for pairwise structure alignment in its use of conformational letters. Conformational letters aptly balance precision with simplicity. CLeSUM is a proper measure of similarity between states. CLeSUM, extracted from the FSSP database, contains information about structure database statistics, which reduces the chance of accidental matching of two irrelevant helices: evolutionary + geometric = specificity gain. For example, two frequent helices are geometrically very similar, but their score is relatively low. The CLeSUM similarity score can be used to rank SFPs by importance for a greedy algorithm; only the top few SFPs need to be examined.

28 1. Fast search for SFPs by string comparison alone. 2. Width 20 for specificity + width 8 for sensitivity. 3. Optimal Anchor SFP selected by checking consistency. 4. Avoid Local Trap by ‘Zoom-In’. The running time for the 68 pairs of the Fischer benchmark is less than 2% of that of the downloaded local version of CE. Next steps: 1. BLOMAPS: fast multiple structure alignment; SFPs → Highly Similar Fragment Blocks (HSFBs). 2. Include biochemical information in CLeSUM by amino acid clustering. Entropic clustering: AVCFIWLMY (h) + DEGHKNPQRST (p).
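The two amino acid clusters named on this slide amount to a two-letter reduction of a sequence. The snippet below only shows that mapping; how the reduced alphabet would be combined with CLeSUM is not described in the slides. The function name and the "x" placeholder for residues outside the 20 standard ones are assumptions.

```python
HYDROPHOBIC = set("AVCFIWLMY")   # cluster labeled (h) on the slide
POLAR = set("DEGHKNPQRST")       # cluster labeled (p) on the slide

def to_hp(sequence):
    """Map a one-letter amino acid sequence to the h/p alphabet."""
    reduced = []
    for residue in sequence.upper():
        if residue in HYDROPHOBIC:
            reduced.append("h")
        elif residue in POLAR:
            reduced.append("p")
        else:
            reduced.append("x")  # ASSUMPTION: placeholder for anything else
    return "".join(reduced)
```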

29 Thank you

30 >1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
(N-terminal to C-terminal)
Step 1: get four contiguous Cα atoms.
Step 2: get the two bending angles θ and θ’ and the torsion angle τ.
Step 3: select the most similar of the 17 states.
Step 4: assign the code.
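Steps 3 and 4 can be sketched as a nearest-state lookup over precomputed angle triplets. This is a simplification under stated assumptions: the slides say the letters come from clustering according to the probability distribution, so plain squared distance in angle space is a swapped-in similarity measure, and `TOY_CENTERS` is invented for illustration and is not the real set of 17 centers.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (period 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def assign_letter(angles, centers):
    """Return the name of the state whose center is closest to (theta, theta', tau)."""
    theta, theta2, tau = angles

    def squared_distance(center):
        return ((theta - center[0]) ** 2
                + (theta2 - center[1]) ** 2
                + angular_diff(tau, center[2]) ** 2)

    return min(centers, key=lambda name: squared_distance(centers[name]))

def encode(angle_triplets, centers):
    """Turn one angle triplet per four-residue window into a letter string."""
    return "".join(assign_letter(a, centers) for a in angle_triplets)

# HYPOTHETICAL two-state example, not the 17 centers from the talk.
TOY_CENTERS = {"H": (90.0, 90.0, 50.0), "E": (120.0, 120.0, -170.0)}
```

The torsion term uses a periodic difference so that 175° and -170° count as 15° apart rather than 345°.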

31 [Figure: the bending angles θ and θ’ and the torsion angle τ.]

