Download presentation
Presentation is loading. Please wait.
Published byMelvin Cole Modified over 9 years ago
1
Protein Design CS273: Final Project Charles Kou charlesk@stanford.edu Crystal structure of top7 – A novel protein structure created with RosettaDesign. http://rosettadesign.med.unc.edu/
2
What is Protein Design Opposite of structure prediction: determine low energy sequence that yield given structure. Computationally difficult: Search space of 20^n where n = sequence length (20 amino acids) Major algorithms: Dead-end elimination, genetic algorithms, Monte Carlo, Branch & Bound. http://www.stanford.edu/class/cs273/project/project.html
3
Major Algorithms Trade off between thoroughness and computational speed. Monte Carlo / Genetic Algorithm: Can sample space with infinite number of solutions Sidechain identity, side chain orientation and backbone structure can be varied continuously. No guarantee of reaching global energy minimum. Dead-End Elimination Allows only discrete conformations. Rejection criteria is used to prune the search space. Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol. 1998 Aug;8(4):471-5. PubMedPubMed
4
Review: Energy Landscape q1q1 qiqi q2q2 qjqj q N-1 qNqN defined over large dimensional conformation space JC Lantombe, Energy2.ppt
5
Review: Example Energy Function E = bonded terms + non-bonded terms + solvation terms E = (ES + EQ + ES-B + ETor) + (EvdW + Edipole) Bonded terms - Relatively few Non-bonded terms - Depend on distances between pairs of atoms - O(n 2 ) Expensive to compute Solvation terms - May require computing molecular surface JC Lantombe, Energy2.ppt
6
Random walk through conformation space At each cycle: –Perturb current conformation at random –Accept step with probability: (Metropolis acceptance criterion) The conformations generated by an arbitrarily long MCS are Boltzman distributed, i.e., #conformations in V ~ Review: Monte Carlo Simulation (MCS) JC Lantombe, Energy2.ppt
7
Monte Carlo Simulation Tend to waste time in local min. May consist of millions of steps. Energy must be evaluated frequently (computationally expensive). Use ChainTree to improve performance. Koehl, P and Levitt, M. De novo protein design. I. In search of stability and specificity. Journal of Molecular Biology, 293, 1161-1181 (1999). Lotan, I., Schwarzer, F., Halperin, D., Latombe, J.C.: Efficient maintenance and self-collision testing for kinematic chains. In: Symposium on Computational Geometry (2002) 43–52
8
Genetic Algorithm Starts with First generation pool. Iteratively apply genetic operators (selection, recombination, mutation). Evloves toward better solution (low energy function). S. M. Larson, J. England, J. DesJarlais, and V. S. Pande. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Science 11 2804-281 (2002). Protein ScienceProtein Science
9
Selection Selection function takes into account the value of fitness function. This gives priority to the “fit” organism but also gives chance for “less fit” organisms. http://en.wikipedia.org/wiki/Genetic_algorithm
10
Selection Method Roulette Method: probability of selection is proportional to the value of fitness function Tournament: picks k individuals (tournament size), and choose the individual with probability p. Iterate with probability p*(1-p), then p*(p*(1-p)) … Higher k = less chance for weaker individual. http://en.wikipedia.org/wiki/Tournament_selection http://en.wikipedia.org/wiki/Roulette_wheel_selection
11
Recombination, Mutation Recombination: different segment of the structure which is optimized in parallel can be recombined into the same model. Recombination occurs with a set probability. Otherwise, the population is propogated to the next generation. Mutation: avoids local minima by mutating the child with a set probability. Similar to MC: there is no guarantee to converge into global minimum. http://en.wikipedia.org/wiki/Genetic_operator
12
Genome@home Genome@Home uses distributed computing and genetic algorithm. It also incorporates backbone flexibility using Monte Carlo (random perturbation with RMSD<1.0a) which improves the result. http://www.stanford.edu/group/pandegroup/genome/
13
Dead-end Elimination Discrete conformational search. Functionally equivalent to exhustive search. It uses rejection criteria to prune the search space. The robustness depends on the discreteness and the rejection criteria used. Guaranteed convergence to global min. Initially used for sidechain placement. More difficult for protein design because of high degrees of freedom. Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001 Mar 16;307(1):429-45. PubMedPubMed
14
Energy of conformation Reformulation of sidechain placement problem: Amino acid identity is used instead of rotamer. The general DEE allows residue up to 300. Energy of conformation is defined as sum of interaction among side chains and sum of interaction of sidechain and the backbone. Rejection criteria is used and iterated until no more rotamers can be eliminated. Convergence occurs, or reduces the problem sufficiently for exhaustive serach.
15
DEE filter: Rejection Criteria Simple Criterion: If lowest energy struct that can be found using a given sidechain rotamer (low energy side chain conformation) is higer than the highest energy struct w/ different rotamer, the first rotamer is eliminated.
16
DEE filter: Rejection Criteria Goldstein Criteria: if energy struct containing one rotamer is always lowered by changing to a second one, the first one is eliminated.
17
DEE filter: Rejection Criteria Generalized Criterion: residues are added in group, eliminated clusters of rotamers in the groups maybe excluded from the minimum operator, in addition to those which form dead-end clusters with c.
18
Mean Field Theory Reduce search space. Self-consistency is sought by placing amino acids at pre-selected positions in a given structure. Energy function is minimized by mean field. Voigt CA, Mayo SL, Arnold FH, Wang ZG. Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A. 2001 Mar 27;98(7):3778-83. PNASPNAS
19
Review: Branch & Bound Set of solutions can be partitioned into subsets (branch) Upper limit on a subset’s solution can be computed fast (bound) Branch & Bound 1.Select subset with best possible bound 2.Subdivide it, and compute a bound for each subset S.Batzoglou, Threading2.ppt
20
Rosetta Design Initial backbone designed without regard to side-chain packing. Iterates between sequence design and backbone optimization using Monte Carlo. Perturbation in random change in the torsional angles of 1-5 random residue, or substitution of backbone torsonal angles of 1- 3 consecutive residues with torsional angles from a structure in the PDB. Sidechain optimization. Accept/reject using Metropolis criterion. 1.17-a backbone atom RMSD between model and structure. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003 Nov 21;302(5649):1364-8. PubMedPubMed Crystal structure of top7 – A novel protein structure created with RosettaDesign. http://rosettadesign.med.unc.edu/
21
Using Rosetta Design Red: PDB 1A1M: Mhc Class I Molecule B*5301 Complexed With Peptide Typdinqml From Gag Protein Of Hiv2 Blue: Rosetta Stone Designed Visualized with Deep View / Swiss- PdbViewer. http://www.rcsb.org/pdb/cgi/explore.cgi?pid=195321117535569&pdbId=1A1M http://us.expasy.org/spdbv/
22
b.e.a.n.s. A simple openGL based program was developed to test monte carol and genetic algorithms on designing “chain of jelly beans.” User is able to vary the initial structure of the “beans” and compare the efficiency of the algorithms via built-in timer.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.