Structure Prediction dmitra 11/18/2018.

Methods
Ab initio, heuristics, machine learning, homology modeling, threading.

RNA Structure Prediction: Ab-initio
The sequence is over the alphabet {A, C, G, U}. Complementary bases attract and form base pairs, which minimizes energy. We are not interested in the overall energy of the sequence, only in its minimization: the linear sequence with zero base pairs is taken to have energy = 0. The physics is embedded in the “free-energy” parameter (function). The objective is to minimize energy.

RNA Structure Prediction: Knot-free
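Knot-free assumption. A knot is formed by two base pairs (i, j) and (k, l) with i < k < j < l, that is, the two pairs cross. Forbidding knots makes the structure a planar graph and makes a DP algorithm feasible. Base pairs are then either disjoint (i < j < k < l) or nested inside each other (i < k < l < j).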
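
A minimal sketch of the knot-free check, taking a knot to mean two crossing base pairs (i < k < j < l). The function name `is_knot_free` and the representation of a structure as a list of index pairs are assumptions made for illustration, not part of the slides.

```python
from itertools import combinations

def is_knot_free(pairs):
    """Return True if no two base pairs cross (i < k < j < l)."""
    # Normalize each pair so that the smaller index comes first,
    # then sort so that i <= k for every pair of pairs we compare.
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        if i < k < j < l:  # the two pairs cross: a knot
            return False
    return True

# Nested and disjoint pairs are allowed, crossing pairs are not.
print(is_knot_free([(0, 5), (1, 4)]))  # nested
print(is_knot_free([(0, 2), (3, 5)]))  # disjoint
print(is_knot_free([(0, 3), (1, 5)]))  # crossing
```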

RNA Structure Prediction: Principle of optimality
Assumption 1: base pairs do not affect each other’s energy. One can then add up the energies of all base pairs in a configuration of the string and check which configuration produces the lowest energy. The number of configurations is exponential, so a further assumption is needed.

RNA Structure Prediction: DP Algorithm
Assume the energy of each component can be calculated independently. $a(r, k)$ is the free energy of the base pair (r, k), where r and k are drawn from {A, C, G, U}. a is zero for self-pairing (which is impossible).

RNA Structure Prediction: DP Algorithm
$$E(S_{i,j}) = \min \begin{cases} E(S_{i+1,j-1}) + a(r_i, r_j), & \text{when } i \text{ pairs with } j \\ \min_{i < k \le j} \{ E(S_{i,k-1}) + E(S_{k,j}) \}, & \text{when } j \text{ pairs with } k \end{cases}$$
Compute the n × n matrix over i and j bottom-up, for j - i = 0, j - i = 1, j - i = 2, ... Complexity: O(n^3).
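
The bottom-up table fill can be sketched as follows, with the split case written as E(S_{i,k-1}) + E(S_{k,j}). The energy values in `PAIR_ENERGY` are hypothetical (-1 for each Watson-Crick pair, 0 otherwise), the function names are illustrative, and adjacent bases are allowed to pair, so this is a toy model of the recurrence rather than a realistic energy function.

```python
# Hypothetical pairing energies: -1 per complementary pair, 0 otherwise.
PAIR_ENERGY = {("A", "U"): -1.0, ("U", "A"): -1.0,
               ("G", "C"): -1.0, ("C", "G"): -1.0}

def a(r, k):
    """Free energy of pairing bases r and k (assumed values)."""
    return PAIR_ENERGY.get((r, k), 0.0)

def min_energy(seq):
    """Minimum energy E(S_{0,n-1}) under the simple pairwise model."""
    n = len(seq)
    if n == 0:
        return 0.0
    E = [[0.0] * n for _ in range(n)]  # E[i][i] = 0: a single base

    def e(i, j):
        # Empty substrings (j < i) contribute zero energy.
        return E[i][j] if i <= j else 0.0

    for span in range(1, n):           # span = j - i, filled bottom-up
        for i in range(n - span):
            j = i + span
            best = e(i + 1, j - 1) + a(seq[i], seq[j])  # i pairs with j
            for k in range(i + 1, j + 1):               # split at k
                best = min(best, e(i, k - 1) + e(k, j))
            E[i][j] = best
    return E[0][n - 1]

print(min_energy("GGGAAACCC"))
```

The three nested loops over span, i, and k give the O(n^3) running time.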

RNA Structure Prediction: relax assumptions
Consider some special energy functions beyond the base-pairing energies $a(r, k)$ alone. This means allowing different “types” of base pairings and a more practical topology.

RNA Structure Prediction: Loops
Say there is a base pair at (i, j) and i < u < v < w < j. Base v is accessible from the base pair (i, j) if there is no base pair at (u, w). The loop is the set of bases accessible from the base pair (i, j). Note that there are still no knots. Some loops: p249.
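
As a small illustration of accessibility, the sketch below lists the bases accessible from a closing pair (i, j): a base v is excluded when some inner pair (u, w) with i < u < v < w < j encloses it. The function name `accessible_bases` and the list-of-pairs representation are assumptions made for this example.

```python
def accessible_bases(pairs, i, j):
    """Bases v with i < v < j not enclosed by any pair (u, w) inside (i, j)."""
    # Only pairs strictly inside the closing pair can hide a base.
    inner = [(u, w) for (u, w) in pairs if i < u < w < j]
    return [v for v in range(i + 1, j)
            if not any(u < v < w for (u, w) in inner)]

# Outer pair (0, 9) with an inner pair (2, 6):
structure = [(0, 9), (2, 6)]
print(accessible_bases(structure, 0, 9))  # loop closed by (0, 9)
print(accessible_bases(structure, 2, 6))  # hairpin closed by (2, 6)
```

Note that the endpoints of the inner pair (2 and 6) remain accessible from (0, 9); only the bases strictly between them are hidden.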

RNA Structure Prediction: Energy over loops
Say the base pair (i, j) closes a loop. Then $S_{i+1,j-1}$ may not be in its own minimum-energy configuration, because the energy of some configuration of $S_{i+1,j-1}$ plus the free energy $a(r_i, r_j)$ may be less than the energy of the minimum-energy configuration of the string from i+1 to j-1 taken without the base pair at (i, j). This interaction was ignored at the previous assumption level. Dynamic programming can still be done if we explicitly specify the energy parameters.

RNA Structure Prediction: Energy over loops
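$$E(S_{i,j}) = \min \begin{cases} E(S_{i+1,j}), & i \text{ is not paired} \\ E(S_{i,j-1}), & j \text{ is not paired} \\ \min_{i < k < j} \{ E(S_{i,k-1}) + E(S_{k,j}) \}, & \text{when } i \text{ or } j \text{ pairs with } k \\ E(L_{i,j}), & \text{when } (i, j) \text{ base pairs} \end{cases}$$
In the last case all special structures may appear within the loop; it embeds the first formula of the previous assumption.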

RNA Structure Prediction: More assumptions
Disregard free energies that do not belong to any loop. The final energy of the string is the sum of the energies of its components only: there is no interaction between components. Only the 4 types of loops on p249 are considered for $E(L_{i,j})$ (more can be added if their energy parameterization is known).

RNA Structure Prediction: free energies for 4 loops
Hairpin loop of size k: $\xi(k)$.
Additional stabilizing energy for two adjacent base pairs (in addition to $a(r, k)$): $\eta$, a constant.
Destabilizing energy for a bulge of size k: $\beta(k)$.
Destabilizing energy for an interior loop of size k: $\gamma(k)$.

RNA Structure Prediction: $E(L_{i,j})$
Hairpin: $a(r_i, r_j) + \xi(j-i-1)$
Stacked pair: $a(r_i, r_j) + \eta + E(S_{i+1,j-1})$
Bulge on i: $\min_{k \ge 1} \{ a(r_i, r_j) + \beta(k) + E(S_{i+k+1,j-1}) \}$
Bulge on j: $\min_{k \ge 1} \{ a(r_i, r_j) + \beta(k) + E(S_{i+1,j-k-1}) \}$
Interior loop: $\min_{k_1, k_2 \ge 1} \{ a(r_i, r_j) + \gamma(k_1 + k_2) + E(S_{i+k_1+1,j-k_2-1}) \}$

RNA Structure Prediction: complexity
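There are O(n^2) table entries. For each entry:
First 2 formulae: O(1) each, leading to O(n^2) overall.
Third formula: O(n), leading to O(n^3).
4.1 (E(L), hairpin): O(1), leading to O(n^2).
4.2 (stacked pair): O(1), leading to O(n^2).
4.3 (bulge on i): O(n), running over k, leading to O(n^3).
4.4 (bulge on j): O(n), running over k, leading to O(n^3).
4.5 (interior loop): O(n^2), running over k1 and k2, leading to O(n^4).
Final complexity, from 4.5: O(n^4).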
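
The loop-based recurrence and its O(n^4) cost can be sketched with two tables: `W[i][j]` for E(S_{i,j}) and `V[i][j]` for E(L_{i,j}). Everything numeric here is a hypothetical parameterization chosen for illustration (pair energy -3, hairpin penalty 2 with an assumed minimum hairpin size of 3, eta = -1, bulge and interior penalties equal to the loop size). The sketch also assumes that the inner pair closing a stacked pair, bulge, or interior loop is actually formed, so the inner terms use `V` rather than `W`; the formulas above leave that condition implicit.

```python
import math

INF = math.inf
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
ETA = -1.0                                    # stacking bonus (assumed)

def a(r, k):     return -3.0 if (r, k) in WATSON_CRICK else INF
def xi(k):       return 2.0 if k >= 3 else INF   # hairpin (assumed min size 3)
def beta(k):     return float(k)                 # bulge penalty (assumed)
def gamma(k):    return float(k)                 # interior-loop penalty (assumed)

def fold_energy(seq):
    """Minimum energy of seq under the four-loop model (toy parameters)."""
    n = len(seq)
    V = [[INF] * n for _ in range(n)]   # E(L_ij): i and j are paired
    W = [[0.0] * n for _ in range(n)]   # E(S_ij): best over all structures

    def v(i, j): return V[i][j] if 0 <= i < j < n else INF
    def w(i, j): return W[i][j] if i <= j else 0.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            pair = a(seq[i], seq[j])
            if pair < INF:
                best = xi(j - i - 1)                        # 4.1 hairpin
                best = min(best, ETA + v(i + 1, j - 1))     # 4.2 stacked pair
                for k in range(1, j - i - 2):               # 4.3, 4.4 bulges
                    best = min(best, beta(k) + v(i + k + 1, j - 1),
                                     beta(k) + v(i + 1, j - k - 1))
                for k1 in range(1, j - i - 2):              # 4.5 interior loop
                    for k2 in range(1, j - i - 2 - k1):
                        best = min(best, gamma(k1 + k2)
                                   + v(i + k1 + 1, j - k2 - 1))
                V[i][j] = pair + best
            best_w = min(w(i + 1, j), w(i, j - 1), V[i][j])
            for k in range(i + 1, j + 1):                   # split at k
                best_w = min(best_w, w(i, k - 1) + w(k, j))
            W[i][j] = best_w
    return W[0][n - 1] if n else 0.0

print(fold_energy("GGAAACC"))  # two stacked G-C pairs closing a hairpin
```

The doubly nested loop over k1 and k2 inside the O(n^2) table fill is what produces the O(n^4) bound.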

Protein Threading
Interactions in proteins are between 20 × 20 residue types, as opposed to at most 4 × 4 nucleic acid types in RNA. Residue interactions are quite non-local, causing much more structural complexity. Proteins have frequent loops (helices are loops). So prediction by ab initio methods is extremely difficult.

Protein Threading
The number of protein folds is small (~1,000 for 20,000+ proteins). Threading maps the target sequence onto a template fold. Threading is an alignment problem (Torda, Fig 1): find the fold to which the target “aligns” optimally (minimum “energy” function). It needs basic scoring functions, as in sequence alignment.

Protein Threading: number of folds
The more folds there are in the database, the more time it takes to find the correct template. But the scoring function for threading is quite imperfect, so more available templates are needed. These are contradictory requirements.

Protein Threading: Scoring functions
A full force field is not necessarily ideal: it involves dynamics between molecules, stretch, torsion, etc., which are unimportant for a static alignment.

The scoring function could be between residues from the same sequence, for coming close to each other in the alignment (Torda, Fig 5). Example scoring function (free energy) for a pair of residues A and B to be at distance r (Torda, p7):
$$G(AB) = -kT \ln\left( \frac{\rho_r^{AB}}{\rho_{0,r}^{AB}} \right)$$
where $\rho_r^{AB}$ is the probability of A and B being at distance r, $\rho_{0,r}^{AB}$ is the probability of that occurring at random, and k and T are the usual Boltzmann constant and temperature.
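
A minimal sketch of this kind of knowledge-based score, using the usual Boltzmann-inversion sign convention G = -kT ln(observed / random), so that distances seen more often than chance score as low energy. The function name `pair_energy`, the default temperature of 300 K, and the choice of kcal/mol units are assumptions for illustration; the probabilities would come from statistics over known structures.

```python
import math

K_BOLTZMANN = 0.0019872  # kcal/(mol*K)

def pair_energy(rho_observed, rho_random, temperature=300.0):
    """Free-energy-like score for a residue pair at a given distance."""
    if rho_observed <= 0 or rho_random <= 0:
        raise ValueError("probabilities must be positive")
    return -K_BOLTZMANN * temperature * math.log(rho_observed / rho_random)

# A pair seen twice as often as expected at random gets a negative score.
print(pair_energy(0.2, 0.1))
# A pair seen half as often as expected gets a positive score.
print(pair_energy(0.05, 0.1))
```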

The probabilities are collected from PDB proteins with known structure. Different threading schemes use different scoring functions, but most of them are derived from the PDB.

Example (Setubal-Meidanis, p257): $G_1(i, t_i)$ is the score for placing the i-th residue of the sequence at position $t_i$ in the fold. $G_2(i, j, t_i, t_j)$ is the score for the simultaneous placement of residues i and j, for i < j. Each placement is constrained to lie within a range, say $b_i < t_i < e_i$.
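
One way to combine these terms, sketched under the assumption that the total score is the sum of all singleton terms G1 plus all pairwise terms G2 over i < j, with placements outside the allowed range rejected. The function name `threading_score` and the toy scoring functions in the usage example are hypothetical.

```python
from itertools import combinations

def threading_score(placement, g1, g2, bounds):
    """Total score of a placement, where placement[i] = t_i.

    bounds[i] = (b_i, e_i); placements violating b_i < t_i < e_i
    are treated as infeasible and scored as infinity.
    """
    for i, t in enumerate(placement):
        b, e = bounds[i]
        if not (b < t < e):
            return float("inf")
    total = sum(g1(i, t) for i, t in enumerate(placement))
    total += sum(g2(i, j, placement[i], placement[j])
                 for i, j in combinations(range(len(placement)), 2))
    return total

# Toy scoring functions (assumed): residue i prefers position 2*i,
# and a pair is penalized when the residues are placed out of order.
g1 = lambda i, t: abs(t - 2 * i)
g2 = lambda i, j, ti, tj: 0.0 if tj > ti else 10.0
bounds = [(-1, 6)] * 3

print(threading_score([0, 2, 4], g1, g2, bounds))
print(threading_score([0, 2, 1], g1, g2, bounds))
```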

Protein Threading
Optimization is not only over placements but also over the multiple folds in the database. Accuracy is very sensitive to alignment errors.

Protein Threading: Dynamic programming
The advantage (and disadvantage) of DP is that it is deterministic. Problem: “adjacency” is hard to define in 3D.

Protein Threading: Dynamic programming
DP: try out different combinations of “adjacent” residues on different parts of a template (Torda, Fig 5c: adjacency comes from the template sequence). Start with a smaller number of elements and build up to the full sequence. Alternative approach: start by placing each residue at one of its “possible” positions and see where the next residue should go, continuing residue by residue.

Protein Threading: Probabilistic algorithm
Monte Carlo simulation: randomly throw residues at positions on the fold and check the aggregate scoring function. Simulated annealing: gradually move residues to optimize, making stochastic random shifts to avoid local optima. Both are time-consuming, and the result is non-deterministic.
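
A minimal simulated-annealing sketch for a placement problem of this kind. It uses a Metropolis acceptance rule with a geometric cooling schedule, both of which are assumptions (the slides do not specify a schedule), and a toy score in place of a real threading function. The name `anneal` and all parameter defaults are hypothetical. The seed is fixed only to make a run repeatable.

```python
import math
import random

def anneal(score, initial, n_positions, steps=2000,
           t_start=2.0, t_end=0.01, seed=0):
    """Return (best placement, best score) found by simulated annealing."""
    rng = random.Random(seed)
    current, cur_score = list(initial), score(initial)
    best, best_score = list(current), cur_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Random shift: move one residue to a random position.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.randrange(n_positions)
        cand_score = score(candidate)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) to escape local optima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_score = candidate, cand_score
            if cur_score < best_score:
                best, best_score = list(current), cur_score
    return best, best_score

# Toy score (assumed): residue i prefers position 2*i.
toy_score = lambda placement: sum(abs(t - 2 * i) for i, t in enumerate(placement))
placement, value = anneal(toy_score, [0, 0, 0], n_positions=6)
print(placement, value)
```

Because worse moves are sometimes accepted, the best placement seen so far is tracked separately from the current one.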

Protein Threading: Branch and bound
In the worst case, try all possible alignments, but prune non-useful branches of the search space using some bounding function.
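
A small branch-and-bound sketch on a simplified version of the problem: residues are placed at strictly increasing template positions, only a singleton cost `cost[i][t]` is used, and the bounding function is the partial cost plus the sum of each remaining residue's cheapest position. These simplifications, and the name `branch_and_bound`, are assumptions made to keep the bound easy to verify; a real threading bound would also have to account for pairwise terms.

```python
def branch_and_bound(cost, n_residues, n_positions):
    """Minimize sum of cost[i][t_i] subject to t_0 < t_1 < ... < t_{n-1}."""
    # suffix[i] = lower bound on the cost of placing residues i..n-1.
    suffix = [0.0] * (n_residues + 1)
    for i in range(n_residues - 1, -1, -1):
        suffix[i] = suffix[i + 1] + min(cost[i])
    best = {"score": float("inf"), "placement": None}

    def extend(i, last, partial, placement):
        # Prune: even the optimistic completion cannot beat the best so far.
        if partial + suffix[i] >= best["score"]:
            return
        if i == n_residues:
            best["score"], best["placement"] = partial, list(placement)
            return
        # Leave enough positions for the residues that still follow.
        for t in range(last + 1, n_positions - (n_residues - i - 1)):
            placement.append(t)
            extend(i + 1, t, partial + cost[i][t], placement)
            placement.pop()

    extend(0, -1, 0.0, [])
    return best["placement"], best["score"]

# Two residues, four template positions (toy costs).
toy_cost = [[3, 1, 4, 1],
            [5, 9, 2, 6]]
print(branch_and_bound(toy_cost, 2, 4))
```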

Protein Threading: Search on folds
Divide and conquer over the space of folds. Assumption: folds can be ordered by their “goodness” for the target protein. Example: Setubal-Meidanis, p258.

Protein Threading: Future
Threading is slow.
It could be subsumed by ab initio prediction from IBM Blue Gene™ type projects.
De novo technique using linear programming (Xu and Li, 2003).
Threading techniques are useful not only for structure prediction but also for the fold recognition problem: no alignment, just find the template (the fold suggests function).
