A too simple model for protein folding Ethan Bolker Mathematics and Computer Science UMass Boston Clark University April 14, 2004.


1 A too simple model for protein folding Ethan Bolker Mathematics and Computer Science UMass Boston Clark University April 14, 2004

2 Preliminaries Problem source: biology teaching need. Analysis mixes biology, CS, and mathematics (= applied mathematics). Ongoing help from Bogdan Calota. See www.cs.umb.edu/~eb/folding

3 How life works DNA (gene) makes RNA RNA makes polypeptide Polypeptide folds into protein Proteins interact (biochemistry) Cells … organisms … communities … Natural selection makes gene mix evolve

4 Virtual teaching laboratories For Brian White (Biology, UMass Boston) Virtual Genetics Laboratory (VGL) –Mendelian genetics –http://intro.bio.umb.edu/VGL/index.htm –Science, April 16, 2004 GenExplorer –the central dogma –www.cs.umb.edu/genex/ Watch this space …

5 Polypeptide → protein Polypeptide: sequence of amino acids; chemical (biological) activity depends on three-dimensional configuration (folding) Protein: polypeptide folded into active shape Given the sequence, what’s the shape? –Wet lab: lots of chemistry, x-ray crystallography (newer tools) –Virtual lab: compute shape from chemical principles; need supercomputer or grid

6 folding@home www.stanford.edu/group/pandegroup/folding/

7 For beginning biologists Problem: give students hands-on experience showing how sequence determines shape Solution: very simple model –amino acid = disk in the plane; hydrophobic index $h_i$ expresses wish to avoid wet environment –fold polypeptide on hex grid to minimize energy: $\text{energy} = \sum_{\text{acids } i} (\#\,\text{exposed edges of acid } i) \times h_i$
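The energy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the applet's code: it assumes axial coordinates `(q, r)` for hexagon centers, and it assumes an edge counts as "exposed" whenever the neighboring hexagon holds no acid (so edges shared with chain neighbors are not exposed). The names `HEX_DIRS`, `exposed_edges`, and `energy` are hypothetical.

```python
# Six neighbor offsets of a hexagon in axial coordinates (q, r).
HEX_DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def exposed_edges(cell, occupied):
    """Count the edges of `cell` that border an empty hexagon (assumed meaning of 'exposed')."""
    q, r = cell
    return sum((q + dq, r + dr) not in occupied for dq, dr in HEX_DIRS)

def energy(positions, h):
    """Sum over acids of (# exposed edges) * hydrophobic index."""
    occupied = set(positions)
    return sum(exposed_edges(p, occupied) * hi for p, hi in zip(positions, h))

straight = [(0, 0), (1, 0), (2, 0)]   # three acids in a line
triangle = [(0, 0), (1, 0), (0, 1)]   # the same chain bent into a triangle
print(energy(straight, [1, 1, 1]), energy(triangle, [1, 1, 1]))
```

With all indices equal to 1, bending the chain hides two more edges, so the triangle has the lower energy under these assumptions.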

8 folding@umb 51882 possible configurations (5279 modulo dihedral group symmetry) minimum energy -131.17 minimum occurs once topology 0: [2, 7] 1: [ ] 2: [0, 7] 3: [ ] 4: [ ] 5: [ ] 6: [ ] 7: [0, 2]
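One possible reading of the topology listing on this slide is a contact map: for each acid, the indices of acids in neighboring hexagons that are not its neighbors along the chain. That reading is an assumption (slide 12 notes that several definitions of topology are possible). A minimal Python sketch under that assumption, with a hypothetical helper `contact_map` and axial hex coordinates:

```python
# Six neighbor offsets of a hexagon in axial coordinates (q, r).
HEX_DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def contact_map(positions):
    """For each acid index, list the non-consecutive acids in adjacent hexagons."""
    index_at = {cell: i for i, cell in enumerate(positions)}
    contacts = {}
    for i, (q, r) in enumerate(positions):
        touching = []
        for dq, dr in HEX_DIRS:
            j = index_at.get((q + dq, r + dr))
            if j is not None and abs(i - j) > 1:  # skip chain neighbors
                touching.append(j)
        contacts[i] = sorted(touching)
    return contacts

# A made-up four-acid folding, used only to show the output format.
bent = [(0, 0), (1, 0), (0, 1), (-1, 1)]
for i, js in contact_map(bent).items():
    print(f"{i}: {js}")
```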

9 folding@umb 51882 possible configurations (5279 modulo dihedral group symmetry) minimum energy -13.161 minimum occurs twice (second - obvious - answer has same topology)

10 Brute force search Try all nonintersecting walks of length n on plane grid of hexagons: 1, 6, 30, 138, 618, 2730, 11946, 51882, 224130, 964134, 4133166, … Sequence A001334 in the Online Encyclopedia of Integer Sequences www.research.att.com/~njas/sequences/ No closed form expression Growth rate obviously $O(5^n)$, actual $\approx 4.25^n$ To count foldings, divide by 12 (symmetry)
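The first few terms of the sequence on this slide can be reproduced with a short backtracking search. The sketch below assumes that a "walk of length n" means n steps between centers of adjacent hexagons, starting from a fixed origin; `count_walks` is a hypothetical name.

```python
# Six neighbor offsets of a hexagon in axial coordinates (q, r).
HEX_DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def count_walks(steps, pos=(0, 0), visited=None):
    """Count nonintersecting walks with `steps` steps starting at `pos`."""
    if visited is None:
        visited = {pos}
    if steps == 0:
        return 1
    total = 0
    for dq, dr in HEX_DIRS:
        nxt = (pos[0] + dq, pos[1] + dr)
        if nxt not in visited:
            visited.add(nxt)                      # extend the walk
            total += count_walks(steps - 1, nxt, visited)
            visited.remove(nxt)                   # backtrack
    return total

counts = [count_walks(n) for n in range(6)]
print(counts)  # [1, 6, 30, 138, 618, 2730]
```

Each step has at most five continuations after the first, which is where the $O(5^n)$ bound comes from; the search above visits every walk, so its running time grows at the same rate as the counts.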

11 A (random) chain of length 17 Five of the 11 minimum energy foldings All 11 show same 8 acid cool ring, hot core Essentially the same topology 12 hour computation

12 Open questions (statistical) How many minima? What is the energy distribution –for one polypeptide, over all foldings? –of minima, over all polypeptides of fixed length? Do all minima for a pp have same topology? (several possible definitions for topology) Do approximate minima have same topology? (several possible definitions for approximate)

13 Which amino acid universe? Random polypeptides, with acids chosen: –$h_i$ uniformly distributed in [-1,1] –$h_i$ = (1,-1) with probability (p, 1-p) –from (Ala, Arg, …, Tyr, Val) with measured hydrophobic indices and measured probabilities of occurrence: the natural universe
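The first two universes are easy to sample; a minimal Python sketch follows. The function names are hypothetical, and the natural universe is left out because the measured indices and frequencies are not given in these slides.

```python
import random

def uniform_chain(n, rng):
    """n hydrophobic indices drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def two_valued_chain(n, p, rng):
    """n indices, each +1 with probability p and -1 with probability 1 - p."""
    return [1.0 if rng.random() < p else -1.0 for _ in range(n)]

rng = random.Random(0)          # fixed seed so a run can be repeated
print(uniform_chain(5, rng))
print(two_valued_chain(5, 0.3, rng))
```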

14 Digression How do you interpolate visually between red and green? in RGB space, white is halfway in HSB space, yellow is halfway Application uses cubic interpolation to adjust contrast near the midpoint

15 Cubic interpolation
// Map a range of hydrophobic indices h to a continuum of
// colors between RED and GREEN in HSB space.
//
// First map h linearly to x between 0.0 and 1.0 so that we
// can form convex combinations. To get better visual effect
// replace x by
//   f(x) = ax^3 + bx^2 + cx
//   color(x) = f(x)*RED + (1-f(x))*GREEN
// f(0) = 0 means color(0) = GREEN. Then find a, b and c so that
// f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k (to be determined). Then
// color(1) = RED and color(1/2) = 1/2 (RED+GREEN) = YELLOW.
//
// When k = 1, f(x) = x is linear, not cubic (check the algebra).
// That works well for the natural table. But for the virtual table it
// provides too little contrast near the center. k = 1/2 flattens out the
// cubic at its inflection point there and seems to be just about right.
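Solving the three conditions f(1) = 1, f(1/2) = 1/2, f'(1/2) = k for the cubic gives a = 4(1 - k), b = -6(1 - k), c = 3 - 2k, which reduces to f(x) = x when k = 1. The Python sketch below applies those coefficients and interpolates the hue between green and red. It is an illustration only: the function names are hypothetical, and mapping the lowest index to green with full saturation and brightness is an assumption.

```python
import colorsys

def contrast_cubic(x, k):
    """f(x) = a x^3 + b x^2 + c x with f(0)=0, f(1)=1, f(1/2)=1/2, f'(1/2)=k."""
    a = 4.0 * (1.0 - k)
    b = -6.0 * (1.0 - k)
    c = 3.0 - 2.0 * k
    return ((a * x + b) * x + c) * x

def color_for(h, h_min, h_max, k=0.5):
    """Map index h in [h_min, h_max] to an RGB triple via hue interpolation."""
    x = (h - h_min) / (h_max - h_min)     # linear map to [0, 1]
    t = contrast_cubic(x, k)
    hue = (1.0 - t) * (1.0 / 3.0)         # hue 1/3 is green, hue 0 is red
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

for h in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(h, color_for(h, -1.0, 1.0))
```

The midpoint of the index range lands on hue 1/6, which is yellow, matching color(1/2) in the comment above.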

16 Open questions (biological) Nature isn’t random: naturally occurring polypeptides are not a random selection from the natural universe Which shapes can occur as the minimum energy configurations of polypeptides? –which are beautiful? (polypeptide tangrams) –which are interesting? (designer drugs) (I like cool rings, Brian White likes hot cores)

17 Folding algorithms Conjecture: brute force is NP-complete Look for an approximate algorithm –polynomial time –close to true minimum with high probability –not stochastic Conjecture: no local algorithm will do

18 Incremental Folding
int lookahead
int step ≤ lookahead
while there are acids to place
    explore all positions for the next lookahead acids that minimize the energy of the configuration so far
    place the first step of those lookahead acids
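The loop above can be sketched in Python as follows. This is one reading of the pseudocode, not the applet's implementation: it assumes the first acid is fixed at the origin, that partial configurations are scored with the exposed-edge energy of the acids placed so far, and that ties go to the first candidate found. All names are hypothetical.

```python
# Six neighbor offsets of a hexagon in axial coordinates (q, r).
HEX_DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def energy(positions, h):
    """Sum over placed acids of (# neighboring empty hexagons) * index."""
    occupied = set(positions)
    total = 0.0
    for (q, r), hi in zip(positions, h):
        exposed = sum((q + dq, r + dr) not in occupied for dq, dr in HEX_DIRS)
        total += exposed * hi
    return total

def extensions(placed, depth):
    """Yield every nonintersecting way to append `depth` more cells."""
    if depth == 0:
        yield placed
        return
    q, r = placed[-1]
    for dq, dr in HEX_DIRS:
        nxt = (q + dq, r + dr)
        if nxt not in placed:
            yield from extensions(placed + [nxt], depth - 1)

def incremental_fold(h, lookahead, step):
    """Place acids `step` at a time, looking `lookahead` acids ahead."""
    if not 1 <= step <= lookahead:
        raise ValueError("need 1 <= step <= lookahead")
    placed = [(0, 0)]
    while len(placed) < len(h):
        depth = min(lookahead, len(h) - len(placed))
        candidates = list(extensions(placed, depth))
        if not candidates:
            raise RuntimeError("chain is trapped; no extension exists")
        best = min(candidates, key=lambda cells: energy(cells, h))
        keep = min(step, depth)              # commit only the first `step` acids
        placed = best[:len(placed) + keep]
    return placed, energy(placed, h)

h = [1.0, -0.5, 0.8, -1.0, 0.3, 0.9]         # made-up indices for illustration
print(incremental_fold(h, 1, 1)[1])           # greedy
print(incremental_fold(h, len(h), len(h))[1]) # exhaustive for this short chain
```

Because the exhaustive setting examines every walk, its energy can never exceed the greedy result for the same chain.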

19 Incremental Folding lookahead = step = 1 is greedy lookahead = step = n is brute force time = $O\!\left(\frac{n}{\text{step}} \cdot 4.x^{\text{lookahead}}\right)$: linear in n, but exponential in lookahead
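The time estimate on this slide can be turned into a rough cost model. The sketch below is an assumption-laden illustration: it takes 4.25 (the empirical growth rate quoted on slide 10) as the base, fixes the first acid, and ignores the shorter searches near the end of the chain. `estimated_evaluations` is a hypothetical name.

```python
import math

def estimated_evaluations(n, lookahead, step, mu=4.25):
    """Rough count of configurations scored: rounds * mu**lookahead."""
    rounds = math.ceil((n - 1) / step)       # first acid is fixed at the origin
    return rounds * mu ** min(lookahead, n - 1)

for step in (1, 4, 5, 7):
    print(step, round(estimated_evaluations(50, 8, step)))
```

For n = 50 and lookahead 8, this model predicts that step 1 costs about 7 times as much as step 7; the timings reported on slides 20 and 23 (139 and 15 seconds) are of that order.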

20 50 acids, randomly chosen from natural universe seed 2255 minimum energy -352.38 lookahead 8, step 1 time 139 seconds

21 50 acids, randomly chosen from natural universe seed 2255 minimum energy -338.42 lookahead 8, step 4 time 29 seconds

22 50 acids, randomly chosen from natural universe seed 2255 minimum energy -351.54 lookahead 8, step 5 time 27 seconds

23 50 acids, randomly chosen from natural universe seed 2255 minimum energy -343.98 lookahead 8, step 7 time 15 seconds

24 brute force folding for one random chain of length 17

25 incremental: step sensitivity brute force

26 incremental: lookahead sensitivity [figure: foldings labeled by lookahead value, compared with brute force]

27 Incremental Folding Topology highly sensitive to step Energy not monotone with step or lookahead Can always be fooled May be realistic biologically Suffices for teaching goal

28 More geometry Square grid folding is faster: $O(2.x^{\text{lookahead}})$ instead of $O(4.x^{\text{lookahead}})$ But not nearly as pretty

29 Folding in space Cubic grid has same folding complexity as hex grid in plane since each cell has six neighbors 3D analogue of hex grid is spherical close packing –oranges at the market –layers of hexagonally close packed planes –cell is a rhombic dodecahedron –each sphere has 12 neighbors –folding complexity $O(10.x^n)$
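The neighbor counts on this slide can be checked directly. The Python sketch below assumes the close packing is modeled as the face-centered cubic lattice (the packing whose cell is a rhombic dodecahedron), with nearest neighbors at all permutations of (±1, ±1, 0); `count_walks` is a hypothetical helper that counts nonintersecting walks for any neighbor set.

```python
from itertools import permutations, product

# Six neighbors on the cubic grid.
CUBIC_DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

# Twelve neighbors in the face-centered cubic model of close packing.
FCC_DIRS = sorted({p for s1, s2 in product((1, -1), repeat=2)
                   for p in permutations((s1, s2, 0))})

def count_walks(dirs, steps, pos=(0, 0, 0), visited=None):
    """Count nonintersecting walks with `steps` steps using offsets `dirs`."""
    if visited is None:
        visited = {pos}
    if steps == 0:
        return 1
    total = 0
    for d in dirs:
        nxt = tuple(p + q for p, q in zip(pos, d))
        if nxt not in visited:
            visited.add(nxt)
            total += count_walks(dirs, steps - 1, nxt, visited)
            visited.remove(nxt)
    return total

print(len(CUBIC_DIRS), len(FCC_DIRS))
print([count_walks(FCC_DIRS, n) for n in range(4)])
```

With 12 neighbors, each step after the first has at most 11 continuations, so the walk counts grow far faster than on the hex grid.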

30 Packing spheres

31 H. Steinhaus Mathematical Snapshots

32 Foldings in space –energy 37.8, time 18 seconds, explored 752057 chains –energy 15.6, time 0 seconds, explored 8185 chains

33 Summary The customer is satisfied You can play with the applet The software needs work All the interesting questions are still open

