A computational study of protein folding pathways Reducing the computational complexity of the folding process using the building block folding model.

A computational study of protein folding pathways
Reducing the computational complexity of the folding process using the building block folding model.
Nurit Haspel, Chung-Jung Tsai, Haim Wolfson and Ruth Nussinov

The building blocks model (Chung-Jung Tsai)
• Protein folding is a hierarchical process.
• A protein is constructed from hydrophobic folding units (HFUs).
• HFU: the result of a combinatorial assembly of building blocks.
• Building block: a contiguous, highly populated fragment.
• The building block model makes it possible to illustrate the protein folding pathway.

An outline of the building blocks algorithm
• Scoring function: measures the relative stability of a candidate building block.
• Three ingredients:
  – Compactness
  – Degree of isolation
  – Hydrophobicity
• The result: an “anatomy tree” that illustrates the most probable folding route.

The Scoring Function
Z - Compactness
H - Hydrophobicity
I - Isolation

Compactness, Hydrophobicity and Isolation definitions
Compactness -
Hydrophobicity -
Isolation -

The Cutting Procedure
• Locating a basket of candidate building blocks (relatively stable contiguous fragments):
  – Assign a stability score to all the candidate fragments.
  – Collect the local minima in the “fragment map” (best score within a given radius).
• Recursively splitting the protein top-down:
  – Search the “basket” for a set of fragments that together make up the whole fragment, allowing a short overlap (7 residues) and a gap of up to 15 residues.
  – Minimum building block size
  – No node can have only one child (except for the root).
  – Stop when the node cannot be split any further.
  – In this work, building blocks are taken up to level 6.
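The first step of the cutting procedure, collecting local minima of the fragment map, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the map layout (a dict keyed by hypothetical `(start, end)` residue indices), the square neighbourhood used for "radius", and the convention that a lower score means a more stable fragment are all assumptions.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, int]  # (start, end) residue indices; hypothetical layout


def local_minima(fragment_map: Dict[Fragment, float], radius: int) -> List[Fragment]:
    """Return fragments whose score is the lowest within `radius` of their
    start and end positions (ASSUMPTION: lower score = more stable)."""
    minima = []
    for (start, end), score in fragment_map.items():
        # Compare against every fragment whose start and end both lie
        # within `radius` residues of this one (the fragment itself included).
        is_minimum = all(
            score <= other_score
            for (s2, e2), other_score in fragment_map.items()
            if abs(s2 - start) <= radius and abs(e2 - end) <= radius
        )
        if is_minimum:
            minima.append((start, end))
    return sorted(minima)


# Toy fragment map with made-up scores.
toy_map = {
    (0, 10): -3.0,
    (1, 10): -2.0,
    (0, 11): -1.0,
    (20, 30): -4.0,
    (21, 31): -3.5,
}
basket = local_minima(toy_map, radius=2)
```

The resulting "basket" is what the recursive top-down splitting step would then search for fragment sets covering each node.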

Example - Annexin III

Example (cont.)

Usefulness of the anatomy tree
• It is possible to see whether a protein folds through a single route or multiple routes.
  – These routes can be observed by inspecting the fragment map (there can be more than one way to construct a tree).
• Sequential versus non-sequential folding.
  – Sequential: contacts are made only between consecutive building blocks.
  – Binary anatomy tree: sequential folder.
• Fast versus slow folding.
  – Sequential folding proteins usually fold faster.
• Climbing up the tree allows us to illustrate the folding process.

Critical building blocks (Sandeep Kumar)
• Some building blocks may be considered critical for correct folding.
• A critical building block is in contact with other building blocks in the protein.
• It is likely to be inserted between sequentially connected building blocks.
• Without it, the other building blocks are likely to mis-associate.
• The structure and sequence of a critical building block (BB) are more likely to be conserved.

Critical building block algorithm
• For each building block:
  – Compute its diff. contacting surface area.
  – Compute its critical building block index (CIndex):
  – Compute its Z-score:

Critical building blocks (cont.)
A building block is critical if:
• It is found at most levels below the hydrophobic folding unit level.
• It has a consistently high CIndex at different levels.
• Its CIndex is significant by at least 2 standard deviations in at least one level of the protein anatomy.
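The third criterion can be illustrated with a short sketch. The CIndex and Z-score formulas are not given in the transcript, so the code below takes CIndex values as input and assumes the standard Z-score (deviation from the mean in units of standard deviation) computed over all building blocks of one anatomy level. The function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def z_scores(cindex: Dict[str, float]) -> Dict[str, float]:
    """Standard Z-score of each block's CIndex within one anatomy level.
    ASSUMPTION: the population is all building blocks at that level."""
    values = list(cindex.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {name: 0.0 for name in cindex}
    return {name: (value - mu) / sigma for name, value in cindex.items()}


def significant_blocks(levels: List[Dict[str, float]], threshold: float = 2.0) -> Set[str]:
    """Blocks whose CIndex is at least `threshold` standard deviations
    above the mean in at least one level."""
    found = set()
    for level in levels:
        for name, z in z_scores(level).items():
            if z >= threshold:
                found.add(name)
    return found


# Toy level: nine blocks with CIndex 1.0 and one outlier (made-up values).
toy_level = {f"bb{i}": 1.0 for i in range(9)}
toy_level["bb9"] = 10.0
candidates = significant_blocks([toy_level])
```

This covers only the significance test; the other two criteria (presence at most levels and a consistently high CIndex) would need per-level bookkeeping that the slides do not specify.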

The goals of my research
• Clustering the building blocks according to their 3-D structures, using a rigid matching algorithm.
• Analyzing the building blocks: sequence, stability distribution, size.
• Analyzing the clusters: size, stability score distribution, sequence conservation, criticalness conservation.

The goals of my research (cont.)
• Analyzing the critical building blocks: position within the protein, relative stability, sequence and structure conservation.
• Developing an algorithm that assigns a set of building blocks to a protein sequence, using sequence similarity, relative stability and other information.

Clustering the building blocks
• Each cluster has one or more representative members.
• For each building block structure:
  – Go over the clusters.
  – Match with the cluster representative(s).
  – If it matches (1.5 Å RMSD, 70% size), join the building block to the cluster.
• If no match is found, open a new cluster with this building block as a representative.
Problem: O(n²) comparisons, where n is the number of clusters.
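The clustering loop above can be sketched as a greedy, representative-based pass. This is an illustration under stated assumptions: each cluster keeps a single representative (its first member), and the structural comparison is abstracted into a caller-supplied `matches` predicate. The rigid matching itself (1.5 Å RMSD over 70% of the size) is not implemented here; the toy predicate on plain numbers only stands in for it.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cluster_blocks(blocks: List[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy clustering: join the first cluster whose representative
    matches, otherwise open a new cluster."""
    clusters: List[List[T]] = []
    for block in blocks:
        for cluster in clusters:
            representative = cluster[0]  # ASSUMPTION: one representative per cluster
            if matches(block, representative):
                cluster.append(block)
                break
        else:
            # No representative matched: this block starts a new cluster.
            clusters.append([block])
    return clusters


def toy_match(a: float, b: float) -> bool:
    """Hypothetical stand-in for the structural match test."""
    return abs(a - b) <= 1.5


toy_clusters = cluster_blocks([0.0, 1.0, 5.0, 5.5, 9.0, 0.5], toy_match)
```

When few blocks match, almost every block opens its own cluster and each new block is compared against all existing representatives, which is where the quadratic cost comes from.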

Clustering of the building blocks [figure: Cluster 1, Cluster 2, …, Cluster n]

Making clustering more efficient
• Dividing the building blocks into SCOP families (proteins from the same family usually produce the same building blocks).
• Clustering each family separately and then merging all the clusters reduces the number of clusters that must be compared at any one time.
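A two-stage version of the idea might look like the sketch below: cluster within each family first, then merge family-level clusters whose representatives match. The merge rule (compare representatives only), the dict-of-families input and the numeric toy predicate are assumptions made for illustration.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def greedy_clusters(items: List[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Representative-based greedy clustering of one family."""
    clusters: List[List[T]] = []
    for item in items:
        for cluster in clusters:
            if matches(item, cluster[0]):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def cluster_by_family(families: Dict[str, List[T]],
                      matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Cluster each family on its own, then merge clusters across families
    by comparing their representatives (ASSUMED merge rule)."""
    merged: List[List[T]] = []
    for blocks in families.values():
        for family_cluster in greedy_clusters(blocks, matches):
            for target in merged:
                if matches(family_cluster[0], target[0]):
                    target.extend(family_cluster)
                    break
            else:
                merged.append(list(family_cluster))
    return merged


def close(a: float, b: float) -> bool:
    """Hypothetical stand-in for the structural match test."""
    return abs(a - b) <= 1.5


toy_families = {"family_a": [0.0, 1.0, 5.0], "family_b": [0.5, 9.0]}
merged_clusters = cluster_by_family(toy_families, close)
```

Within a family most blocks fall into a few clusters, so the merge stage compares far fewer representatives than a single pass over all blocks would.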

Building block and cluster data

Distribution of number of clusters

An example of a cluster

Sequence analysis of the clusters
• Sequence clustering of each structural cluster (using BLAST).
• Creating a non-redundant sequence dataset.
• Goal: finding a connection between (short) sequences and structures.

Statistical analysis of the clusters and of the critical building blocks
• Stability score distribution among cluster members.
• Criticalness score distribution among cluster members.
• Position distribution of the critical building blocks.
• Stability score as a function of criticalness score.

An example of stability distribution

Criticalness score distribution within a cluster

An N-terminus critical building block example

A C-terminus critical building block example

A mid-sequence critical building block example

Distribution of the position inside the protein - all-alpha, level 3

Stability vs. Criticalness score example

Stability score of critical and non-critical building blocks (histogram) [legend: Non-critical, Critical]

Final goal
Given a sequence and the information accumulated so far, is there a way of matching a set of building blocks to it?

The building block assignment algorithm
• Perform sequence alignment of the protein sequence against the building block sequence database.
• Construct a directed acyclic graph.
  – Each matching building block is a graph vertex and is assigned a score depending on the sequence alignment score, building block stability and other parameters.
  – Directed edges connect the fragments that match consecutive areas in the protein sequence, allowing short overlaps and small gaps.
  – Edge score: the average score of the connected vertices.

The building block assignment algorithm (cont.)
• Add fictitious “start” and “target” vertices.
• Connect start to all starting vertices.
• Connect all ending vertices to target.
• Find the shortest path from start to target using a single-source shortest path algorithm.
• The path is an “optimal” building block assignment covering the protein sequence.
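The graph construction and shortest-path search can be sketched as below. Because the graph is acyclic, the single-source shortest path is computed here by relaxing vertices in topological order (hits sorted by start position). Several details are assumptions rather than the authors' choices: lower scores are treated as better, the `max_overlap` and `max_gap` defaults are placeholders, the edge from "start" carries the score of the vertex it enters, and the edge to "target" costs nothing. The `Hit` record and the toy hits are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One building block matched to the sequence (hypothetical record)."""
    name: str
    start: int    # first residue covered, 0-based, inclusive
    end: int      # last residue covered, inclusive
    score: float  # vertex score; ASSUMPTION: lower means better


def assign_blocks(hits: List[Hit], seq_len: int,
                  max_overlap: int = 3, max_gap: int = 2) -> Optional[List[Hit]]:
    """Shortest start-to-target path through the DAG of hits, or None."""
    order = sorted(hits, key=lambda h: (h.start, h.end))  # topological order
    inf = float("inf")
    best = [inf] * len(order)                  # best path cost ending at each hit
    prev: List[Optional[int]] = [None] * len(order)

    for j, b in enumerate(order):
        if b.start <= max_gap:                 # edge from the fictitious start
            best[j] = b.score
        for i in range(j):
            a = order[i]
            gap = b.start - a.end - 1          # negative value = overlap
            if (best[i] < inf and a.start < b.start and a.end < b.end
                    and -max_overlap <= gap <= max_gap):
                cost = best[i] + (a.score + b.score) / 2  # edge = average of vertices
                if cost < best[j]:
                    best[j], prev[j] = cost, i

    # Hits close enough to the C-terminus connect to the fictitious target.
    ends = [j for j, h in enumerate(order)
            if best[j] < inf and seq_len - 1 - h.end <= max_gap]
    if not ends:
        return None
    j = min(ends, key=lambda k: best[k])
    path = []
    while j is not None:
        path.append(order[j])
        j = prev[j]
    return path[::-1]


toy_hits = [
    Hit("A", 0, 9, -5.0), Hit("B", 10, 19, -4.0), Hit("C", 8, 21, -1.0),
    Hit("D", 20, 29, -6.0), Hit("E", 12, 29, -2.0),
]
assignment = assign_blocks(toy_hits, seq_len=30)
```

Relaxing in topological order handles negative scores without modification, which a Dijkstra-style search would not.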

Illustration of the algorithm

Example – ROP protein from E. coli (1rpo)

Example – Myoglobin from sea hare (1mba)

Suggestions for future work
• Improving the algorithm and adding new parameters to it (secondary structure alignment, trying other building blocks from the same cluster as the matching building blocks, etc.).
• Combinatorial assembly: Yuval’s work.
• Further cluster analysis: looking into sequence conservation.
• Conformation stability measurements (molecular dynamics…)

Conclusions
Using the hierarchical folding model, it may be possible to reduce the folding complexity by assigning local substructures and then assembling them.