A computational study of protein folding pathways: reducing the computational complexity of the folding process using the building block folding model. Nurit Haspel, Chung-Jung Tsai, Haim Wolfson and Ruth Nussinov
The building blocks model (Chung-Jung Tsai) Protein folding is a hierarchical process. A protein is constructed from hydrophobic folding units (HFUs). HFU: the result of a combinatorial assembly of building blocks. Building block: a contiguous, highly populated fragment. The building block model makes it possible to illustrate the protein folding pathway.
An outline of the building blocks algorithm Scoring function: measures the relative stability of a candidate building block. Three ingredients: –Compactness –Degree of isolation –Hydrophobicity The result: an “anatomy tree” that illustrates the most probable folding route.
The Scoring Function Z - Compactness H - hydrophobicity I - Isolation
Compactness, Hydrophobicity and Isolation definitions Compactness - Hydrophobicity - Isolation -
The Cutting Procedure Locating a basket of candidate building blocks (relatively stable contiguous fragments): –Assign a stability score to all the candidate fragments. –Collect the local minima in the “fragment map” (best score in a given radius). Recursively splitting the protein top-down: –Search the “basket” for a set of fragments that together constitute the whole fragment, allowing a short overlap (7 residues) and a gap of up to 15 residues. –Minimum building block size. –No node can have only one child (except for the root). –Stop when the node cannot be split any further. –In this work, building blocks up to level 6.
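The two steps of the cutting procedure can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the `Fragment` representation, the function names `local_minima` and `can_follow`, and the convention that a lower score means a more stable fragment are all assumptions made for the example. Only the overlap limit (7 residues) and gap limit (15 residues) come from the slide.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, int]  # (start, end) residue indices, inclusive (assumed representation)


def local_minima(scores: Dict[Fragment, float], radius: int) -> List[Fragment]:
    """Collect fragments whose score is the lowest among all fragments whose
    start and end both lie within `radius` residues (assumes lower = more stable).
    Ties are kept."""
    basket = []
    for (s, e), score in scores.items():
        is_best = all(
            score <= other
            for (s2, e2), other in scores.items()
            if abs(s2 - s) <= radius and abs(e2 - e) <= radius
        )
        if is_best:
            basket.append((s, e))
    return sorted(basket)


def can_follow(a: Fragment, b: Fragment, max_overlap: int = 7, max_gap: int = 15) -> bool:
    """True if fragment b may directly follow fragment a when splitting a parent node."""
    if b[0] <= a[0] or b[1] <= a[1]:
        return False
    gap = b[0] - a[1] - 1  # positive: residues skipped; negative: residues shared
    return -max_overlap <= gap <= max_gap
```

A recursive splitter would then search the basket for chains of fragments in which each consecutive pair satisfies `can_follow` and which span the parent fragment.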
Example - Annexin III
Example (cont.)
Usefulness of the anatomy tree It is possible to see whether a protein folds through a single route or multiple routes. –These routes can be observed by inspecting the fragment map (there can be more than one way to construct a tree). Sequential versus non-sequential folding. –Sequential: contacts are made only between consecutive building blocks. –A binary anatomy tree indicates a sequential folder. Fast versus slow folding. –Sequential folding proteins usually fold faster. Climbing up the tree allows us to illustrate the folding process.
Critical building blocks (Sandeep Kumar) Some building blocks may be considered critical for correct folding. A critical building block is in contact with other building blocks in the protein. It is likely to be inserted between sequentially connected building blocks. Without it, the other building blocks are likely to mis-associate. The structure and sequence of a critical building block are more likely to be conserved.
Critical building block algorithm For each building block: –Compute its diff. contacting surface area. –Compute its critical building block index (CIndex): –Compute its Z-score:
Critical building blocks (cont.) A building block is critical if: –It is found at most levels below the hydrophobic folding unit level. –It has a consistently high CIndex at different levels. –Its CIndex is significant by at least 2 standard deviations in at least one level of the protein anatomy.
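The "2 standard deviations" criterion can be illustrated with a small sketch. The CIndex formula itself is not reproduced in this text, so the sketch takes precomputed CIndex values as input. The function names, the use of the population standard deviation, and the choice to compare each building block against the others at the same anatomy level are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cindex_z_scores(cindex: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each building block's CIndex relative to all blocks at one level."""
    values = list(cindex.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return {name: 0.0 for name in cindex}
    return {name: (value - mu) / sigma for name, value in cindex.items()}


def significant_blocks(cindex: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Names of blocks whose CIndex lies at least `threshold` standard deviations above the mean."""
    return [name for name, z in cindex_z_scores(cindex).items() if z >= threshold]
```

For example, with six blocks at CIndex 1 and one block at CIndex 8 (made-up values), only the last block passes the threshold.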
The goals of my research Clustering the building blocks according to their 3-D structures, using a rigid matching algorithm. Analyzing the building blocks: Sequence, stability distribution, size. Analyzing the clusters: Size, stability score distribution, sequence conservation, criticalness conservation.
The goals of my research (cont.) Analyzing the critical building blocks: position within the protein, relative stability, sequence and structure conservation. Developing an algorithm that assigns a set of building blocks to a protein sequence, using sequence similarity, relative stability and other information.
Clustering the building blocks Each cluster has one or more representative members. For each building block structure: –Go over the clusters. –Match with the cluster representative(s). –If it matches (1.5 Å RMSD, 70% size), join the building block to the cluster. –If no match is found, open a new cluster with this building block as its representative. Problem: O(n²) comparisons, where n is the number of clusters.
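The clustering loop described on this slide can be sketched as a greedy, representative-based procedure. This is a simplified illustration under stated assumptions: each cluster keeps a single representative (its first member), and the structural comparison is passed in as a hypothetical `matches` callable. The real criterion (rigid matching at 1.5 Å RMSD over 70% of the size) is not implemented here; `toy_match` is only a numeric stand-in so the example runs.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def greedy_cluster(items: List[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Assign each item to the first cluster whose representative it matches;
    otherwise open a new cluster with the item as its representative."""
    clusters: List[List[T]] = []  # clusters[k][0] is the representative of cluster k
    for item in items:
        for cluster in clusters:
            if matches(item, cluster[0]):
                cluster.append(item)
                break
        else:
            # No existing representative matched: start a new cluster.
            clusters.append([item])
    return clusters


def toy_match(a: float, b: float) -> bool:
    """Stand-in for a structural comparison: numbers 'match' if they differ by at most 1.5."""
    return abs(a - b) <= 1.5


print(greedy_cluster([0.0, 1.0, 5.0, 5.5, 2.0], toy_match))
```

Each new building block is compared against every existing representative in the worst case, which is where the quadratic number of comparisons comes from.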
Clustering of the building blocks (figure: a building block compared against Cluster 1, Cluster 2, …, Cluster n)
Making clustering more efficient Divide the building blocks into SCOP families (proteins from the same family usually produce the same building blocks). Cluster each family separately and then merge all the clusters; this reduces the number of clusters handled at each instance.
Building block and cluster data
Distribution of number of clusters
An example of a cluster
Sequence analysis of the clusters Sequence clustering of each structural cluster (using BLAST). Creating a non-redundant sequence dataset. Goal: finding a connection between (short) sequences and structures.
Statistical analysis of the clusters and of the critical building blocks Stability score distribution among cluster members. Criticalness score distribution among cluster members. Position distribution of the critical building blocks. Stability score as a function of criticalness score.
An example of stability distribution
Criticalness score distribution within a cluster
An N-terminus critical building block example
A C-terminus critical building block example
A mid-sequence critical building block example
Distribution of the position inside the protein - all-alpha, level 3
Stability vs. Criticalness score example
Stability score of critical and non-critical building blocks (histogram; series: Non-critical, Critical)
Final goal Given a sequence and using the information accumulated so far - is there a way of matching a set of building blocks to it?
The building block assignment algorithm Perform sequence alignment of the protein sequence against the building block sequence database. Construct a directed acyclic graph. –Each matching building block is a graph vertex and is assigned a score depending on the sequence alignment score, building block stability and other parameters. –Directed edges connect fragments that match consecutive regions of the protein sequence, allowing short overlaps and small gaps. –Edge score: the average score of the connected vertices.
The building block assignment algorithm (cont.) Add fictitious “start” and “target” vertices. Connect start to all starting vertices. Connect all ending vertices to target. Find the shortest path from start to target using the single-source shortest path algorithm. The path is an “optimal” building block assignment covering the protein sequence.
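The graph construction and shortest-path search can be sketched as below. This is a rough illustration rather than the original code, and several details are assumptions: the `Match` type, the convention that a lower `cost` is better, the default `max_overlap` and `max_gap` values (the slides give no numbers for this step), treating the fictitious start and target vertices as having cost 0, and defining starting and ending vertices as those within `max_gap` residues of the sequence ends. Because the graph is acyclic, processing vertices in order of their start position gives the single-source shortest path directly.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Match(NamedTuple):
    name: str
    start: int   # first residue covered (0-based, inclusive)
    end: int     # last residue covered (inclusive)
    cost: float  # combined vertex score; lower is better (assumed convention)


def assign_building_blocks(matches: List[Match], seq_len: int,
                           max_overlap: int = 3, max_gap: int = 5) -> Optional[List[Match]]:
    """Cheapest start-to-target path in the DAG of consecutive matches, or None."""
    # Sorting by (start, end) is a valid topological order for the edges defined below.
    order = sorted(range(len(matches)), key=lambda i: (matches[i].start, matches[i].end))
    best: Dict[int, Tuple[float, Optional[int]]] = {}  # vertex -> (path cost, predecessor)
    for pos, j in enumerate(order):
        m = matches[j]
        if m.start <= max_gap:
            best[j] = (m.cost / 2, None)  # edge from "start" (cost 0): average of 0 and m.cost
        for i in order[:pos]:
            if i not in best:
                continue
            p = matches[i]
            gap = m.start - p.end - 1  # negative values are overlaps
            if p.start < m.start and p.end < m.end and -max_overlap <= gap <= max_gap:
                candidate = best[i][0] + (p.cost + m.cost) / 2  # edge = average of endpoints
                if j not in best or candidate < best[j][0]:
                    best[j] = (candidate, i)
    # Edges into "target" from every reachable vertex that ends near the sequence end.
    finals = [(best[j][0] + matches[j].cost / 2, j)
              for j in best if seq_len - 1 - matches[j].end <= max_gap]
    if not finals:
        return None
    _, j = min(finals)
    path: List[Match] = []
    while j is not None:
        path.append(matches[j])
        j = best[j][1]
    return path[::-1]
```

With these edge weights, the total path cost equals the sum of the vertex costs along the path, since each vertex contributes half of its cost to each of its two incident edges.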
Illustration of the algorithm
Example – ROP protein from E. coli (1rpo)
Example – Myoglobin from sea hare (1mba)
Suggestions for future work Improving the algorithm and adding new parameters to it (secondary structure alignment, trying other building blocks from the same cluster as the matching building blocks, etc.). Combinatorial assembly: Yuval’s work. Further cluster analysis: inquiring into sequence conservation. Conformation stability measurements (molecular dynamics…).
Conclusions Using the hierarchical folding model, it may be possible to reduce the folding complexity by first assigning local substructures and then assembling them.