Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.

Protein structure prediction: Knowing the structure of a protein is a prerequisite for a thorough understanding of the protein's function. Experimental methods for determining structure are highly labor intensive. Proteins are capable of folding into their unique functional 3D structures without any additional genetic mechanisms, so the amino acid sequence itself should carry the information needed to predict the structure.

Where can I learn more? Protein Structure Prediction Center, Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, CA: http://predictioncenter.org/

Why do we want to predict secondary structure? Prediction of secondary structure is a step towards 3D structure prediction. It can be used in threading methods to identify distantly related proteins. It may provide insights into function.

What is secondary structure? Three major types: alpha-helical regions; beta-sheet regions; coils, turns, extended (anything else).

Some prediction methods: ab initio methods, based on physical properties of amino acids and bonding patterns; statistics of amino acid distributions in known structures (Chou-Fasman); position of amino acid and distribution (Garnier-Osguthorpe-Robson, GOR); neural networks.

ab initio methods: A mixture of science and engineering. The challenges: devise a scoring function that can distinguish correct (native or native-like) structures from incorrect (non-native) ones, and a search method to explore the conformation space. The problems: finding a reliable and general scoring function, and a reliable and general search method that can sample the conformation space adequately.

Chou-Fasman: The first widely used procedure, developed by Chou & Fasman in 1974 and 1978. Based on frequencies of residues in alpha-helices, beta-sheets and turns. Accuracy ~50-60%. Output: helix, strand or turn.

Chou-Fasman Pij-values

How it works (Chou-Fasman):
1. Assign all of the residues the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(alpha-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues that have an average P(alpha-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(alpha-helix) > P(beta-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(beta-sheet) > 100. That region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues that have an average P(beta-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment located by this procedure is assigned as a beta-sheet if the average P(beta-sheet) > 105 and the average P(beta-sheet) > P(alpha-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(alpha-helix) > P(beta-sheet) for that region. It is a beta-sheet if the average P(beta-sheet) > P(alpha-helix) for that region.
6. To identify a bend at residue number j, calculate the value p(t) = f(j)f(j+1)f(j+2)f(j+3), where the f(j+1) value for the j+1 residue is used, the f(j+2) value for the j+2 residue is used and the f(j+3) value for the j+3 residue is used. If (1) p(t) > 0.000075; (2) the average value for P(turn) > 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey the inequality P(alpha-helix) < P(turn) > P(beta-sheet), then a beta-turn is predicted at that location.
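The helix part of the procedure (nucleation, extension, final check) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the propensity values are the Chou-Fasman parameters as they are commonly tabulated and should be checked against the Pij table, the function names are hypothetical, and the way overlapping segments are merged is one possible reading of the rules.

```python
# Chou-Fasman propensities (scaled by 100) as commonly tabulated.
# Assumption: verify against the Pij table before relying on them.
P_ALPHA = dict(A=142, R=98, N=67, D=101, C=70, Q=111, E=151, G=57, H=100, I=108,
               L=121, K=114, M=145, F=113, P=57, S=77, T=83, W=108, Y=69, V=106)
P_BETA = dict(A=83, R=93, N=89, D=54, C=119, Q=110, E=37, G=75, H=87, I=160,
              L=130, K=74, M=105, F=138, P=55, S=75, T=119, W=137, Y=147, V=170)


def mean(table, segment):
    """Average propensity of the residues in `segment`."""
    return sum(table[aa] for aa in segment) / len(segment)


def helix_nuclei(seq, window=6, needed=4):
    """Start indices of 6-residue windows with at least 4 residues of P(alpha) > 100."""
    return [i for i in range(len(seq) - window + 1)
            if sum(P_ALPHA[aa] > 100 for aa in seq[i:i + window]) >= needed]


def extend(seq, start, end):
    """Grow [start, end) outward until a 4-residue window averages below 100."""
    while start > 0 and mean(P_ALPHA, seq[start - 1:start + 3]) >= 100:
        start -= 1
    while end < len(seq) and mean(P_ALPHA, seq[end - 3:end + 1]) >= 100:
        end += 1
    return start, end


def predict_helices(seq):
    """Return (start, end) half-open index pairs of predicted helices."""
    segments = []
    for i in helix_nuclei(seq):
        start, end = extend(seq, i, i + 6)
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous segment: merge the two.
            prev_start, prev_end = segments[-1]
            segments[-1] = (min(prev_start, start), max(prev_end, end))
        else:
            segments.append((start, end))
    # Keep segments longer than 5 residues whose average P(alpha) beats P(beta).
    return [(s, e) for s, e in segments
            if e - s > 5 and mean(P_ALPHA, seq[s:e]) > mean(P_BETA, seq[s:e])]
```

For example, `predict_helices("EAMLKA")` reports the whole hexapeptide as one helix, while a glycine/proline repeat such as `"GPGPGPGP"` yields no helix at all. The beta-sheet scan would follow the same pattern with a 5-residue window and the thresholds from step 4.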

GOR (Garnier-Osguthorpe-Robson): Developed by Garnier, Osguthorpe & Robson. Uses a sliding window of 17 residues. Underpredicts beta-strand regions. The GOR III method has an accuracy of ~64%.

GOR (Garnier-Osguthorpe-Robson): Position-dependent propensities for helix, sheet or turn have been calculated for all residue types. For each position j in the sequence, eight residues on both sides of the actual position are considered. A helix propensity table contains information about the propensity for certain residues at 17 positions when the conformation of residue j is helical. The helix propensity tables have 20 x 17 entries. The predicted state of amino acid j is calculated as the sum of the position-dependent propensities of all residues around amino acid j.
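The window sum described above can be sketched in Python as follows. The real 20 x 17 propensity tables are not reproduced here; `toy_table` and `TOY_TABLES` are hypothetical placeholders with made-up values that only show the data layout, and picking the state with the largest sum is an assumption about how the sums are compared.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 8  # eight residues on each side of j gives a window of 17


def gor_predict(seq, tables):
    """Predict one state per residue.

    tables[state][aa][offset + 8] is the propensity contributed by residue
    `aa` found at `offset` from position j when j is in `state`.
    """
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + offset
                if 0 <= k < len(seq):  # positions past the termini add nothing
                    total += table[seq[k]][offset + HALF_WINDOW]
            scores[state] = total
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


def toy_table(favoured, value=1.0):
    """Hypothetical 20 x 17 table: `value` for favoured residues, 0 otherwise."""
    return {aa: [value if aa in favoured else 0.0] * (2 * HALF_WINDOW + 1)
            for aa in AMINO_ACIDS}


# Made-up tables for illustration only; these are not published GOR values.
TOY_TABLES = {
    "H": toy_table("A"),               # alanine pushes towards helix
    "E": toy_table("V"),               # valine pushes towards strand
    "C": toy_table(AMINO_ACIDS, 0.1),  # weak coil background for every residue
}
```

With these toy tables, a run of alanines is called helix and a run of valines is called strand, because every residue in the window adds to the sum for the matching state.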

PSI-BLAST Predict Secondary Structure (PSIPRED). Three stages: (1) generation of a sequence profile; (2) prediction of initial secondary structure; (3) filtering of the predicted structure.

PSIPRED: Uses multiple aligned sequences for prediction. Uses a training set of folds with known structure. Uses a two-stage neural network to predict structure based on position-specific scoring matrices generated by PSI-BLAST (Jones, 1999). The first network converts a window of 15 amino acids into a raw score of h (helix), e (sheet), c (coil) or terminus. The second network filters the first output; for example, an output of hhhhehhhh might be converted to hhhhhhhhh. Can obtain a Q3 value of 70-78% (may be the highest achievable).
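Q3 is the percentage of residues whose three-state assignment (helix, sheet or coil) matches the observed structure. A minimal Python sketch of that calculation, with a hypothetical function name, might look like this:

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

Applied to the filtering example above, `q3("hhhhehhhh", "hhhhhhhhh")` counts 8 matching positions out of 9.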

Neural networks: Computer neural networks are based on simulation of adaptive learning in networks of real neurons. Neurons connect to each other via synaptic junctions which are either stimulatory or inhibitory. Adaptive learning involves the formation or suppression of the right combinations of stimulatory and inhibitory synapses so that a set of inputs produces an appropriate output.

Neural Networks (cont. 1): The computer version of the neural network involves identification of a set of inputs (the amino acids in the sequence), which are transmitted through a network of connections. At each layer, inputs are numerically weighted and the combined result is passed to the next layer. Ultimately a final output is produced: a decision of helix, sheet or coil.
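The layer-by-layer weighting can be sketched in plain Python as follows. The sigmoid activation, the function names and the `TOY_LAYERS` weights are all assumptions chosen for illustration; they are not taken from any trained predictor.

```python
import math

STATES = "HEC"  # helix, sheet, coil


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Pass `inputs` through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        # Each unit weights every incoming value, sums them, adds its bias.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def decide(inputs, layers):
    """Return the state whose output unit is most active."""
    outputs = forward(inputs, layers)
    return STATES[outputs.index(max(outputs))]


# Hypothetical single layer: 2 inputs feeding 3 output units (H, E, C).
TOY_LAYERS = [([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])]
```

With the toy layer, `decide([5.0, -5.0], TOY_LAYERS)` gives helix because the first output unit receives the strongest weighted input.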

Neural Networks (cont. 2): 90% of the training set (known structures) was used for training. 10% was used to evaluate the performance of the neural network during the training session.

Neural Networks (cont. 3): During the training phase, selected sets of proteins of known structure are scanned, and if the decisions are incorrect, the input weightings are adjusted by the software to produce the desired result. Training runs are repeated until the success rate is maximized. Careful selection of the training set is an important aspect of this technique. The set must contain as wide a range of different fold types as possible without duplications of structural types that may bias the decisions.
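The idea of adjusting weights only when a decision is wrong can be illustrated with a simple multi-class perceptron rule, together with a 90/10 split of the known structures. This is a deliberately simpler stand-in: real predictors such as PSIPRED train multi-layer networks with different update rules, and every name below is hypothetical.

```python
import random


def split_90_10(examples, seed=0):
    """Shuffle the examples and return (training 90%, evaluation 10%)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


def predict(weights, x):
    """Index of the output unit with the largest weighted sum."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def update(weights, x, target, rate=1.0):
    """Adjust the weights only if the decision was wrong; return True if it was right."""
    guess = predict(weights, x)
    if guess != target:
        weights[target] = [w + rate * xi for w, xi in zip(weights[target], x)]
        weights[guess] = [w - rate * xi for w, xi in zip(weights[guess], x)]
    return guess == target


def train(weights, training, epochs=10):
    """Repeat training runs until an epoch has no mistakes or the epochs run out."""
    for _ in range(epochs):
        mistakes = sum(not update(weights, x, t) for x, t in training)
        if mistakes == 0:
            break
    return weights
```

The held-out 10% would be scored with `predict` after each run to monitor performance on structures the network has not been adjusted on.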

Neural Networks (cont. 4): An additional component of the PSIPRED procedure involves sequence alignment with similar proteins. The rationale is that some amino acid positions in a sequence contribute more to the final structure than others. (This has been demonstrated by systematic mutation experiments in which each consecutive position in a sequence is substituted by a spectrum of amino acids. Some positions are remarkably tolerant of substitution, while others have unique requirements.) To predict secondary structure accurately, one should place less weight on the tolerant positions, which clearly contribute little to the structure, and more weight on the intolerant positions.
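One simple way to see which positions are tolerant is to measure how conserved each alignment column is. The Python sketch below uses the fraction of sequences sharing the most common residue; this measure and the function name are illustrative assumptions, since PSIPRED itself works from PSI-BLAST position-specific scoring matrices rather than this score.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, the fraction of sequences that share the most common residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores
```

For the toy alignment `["ACD", "ACE", "ACF"]`, the first two columns score 1.0 (intolerant positions) and the third scores one third (a tolerant position).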

Network input: 15 groups of 21 units (1 unit for each amino acid plus one specifying the end). Each row specifies an amino acid position. The three outputs are helix, strand or coil. The filtering network provides information on tolerant or intolerant positions.
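The 15 x 21 input layout can be sketched in Python as follows. It assumes the profile is given as one row of 20 scores per residue; the function name and the choice of zeros for positions beyond a terminus are assumptions for illustration.

```python
WINDOW = 15
HALF = WINDOW // 2  # seven positions on each side of the central residue


def encode_window(profile, j):
    """Build the 15 rows of 21 input values centred on residue j.

    `profile` is a list of 20-value rows, one per residue. The 21st value
    of each row is the end flag, set to 1.0 when the window runs past a
    terminus of the sequence.
    """
    rows = []
    for k in range(j - HALF, j + HALF + 1):
        if 0 <= k < len(profile):
            rows.append(list(profile[k]) + [0.0])  # 20 scores, end flag off
        else:
            rows.append([0.0] * 20 + [1.0])        # past a terminus: end flag on
    return rows
```

Flattening the rows gives the 315 input values (15 x 21) that are fed to the first network for residue j.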

Example of Output from PSIPRED

Workshop http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
