2. Introduction to Rosetta and structural modeling Approaches for structural modeling of proteins The Rosetta framework and its prediction modes Cartesian.


1 2. Introduction to Rosetta and structural modeling
- Approaches for structural modeling of proteins
- The Rosetta framework and its prediction modes
- Cartesian and polar coordinates
- Sampling (finding the structure) and scoring (selecting the structure)

2 Structural Modeling of Proteins - Approaches

3 Prediction of Structure from Sequence: Flowchart
1. Compare the query sequence to the nr database.
2. Similar to a sequence of known structure? Yes: homology modeling (comparative modeling). No: continue.
3. Fits a known fold? Yes: fold recognition (threading). No: ab initio prediction.
Protocols: ab initio, loops, side chains, active sites….

4 The Rosetta framework and its prediction modes

5 The Rosetta Strategy Observation: local sequence preferences bias, but do not uniquely define the local structure of a protein Goal: mimic interplay of local and global interactions that determine protein structure

6 The Rosetta Strategy
Local interactions: fragments
- Derived from known structures
- Sampled for similar sequences/secondary structure propensity
- Fragment library represents accessible local structures for short sequence

7 The Rosetta Strategy
Global (non-local) interactions: scoring function
- Buried hydrophobic residues, paired β strands, specific side chain interactions, etc.
- Derived from known structures (statistics on preferred conformations)
- Boltzmann’s principle relates frequency to energy
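The last bullet can be illustrated with a minimal sketch of Boltzmann inversion, assuming the usual form E = -kT ln(P_observed / P_expected). The function name, the counts, and the choice kT = 1 are hypothetical and only serve the illustration; this is not Rosetta code.

```python
import math

def boltzmann_score(observed_count, expected_count, kT=1.0):
    """Knowledge-based energy from frequencies: E = -kT * ln(observed / expected)."""
    if observed_count <= 0 or expected_count <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed_count / expected_count)

# Hypothetical statistics: a feature seen twice as often as expected by chance
favorable = boltzmann_score(200, 100)    # negative energy (preferred)
# ... and a feature seen half as often as expected
unfavorable = boltzmann_score(50, 100)   # positive energy (disfavored)
print(favorable, unfavorable)
```

Features that occur more often than expected in known structures get a negative (favorable) score, and rare features get a positive one.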

8 A short history of Rosetta
In the beginning: ab initio modeling of protein structure starting from sequence
- Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations
- Reliable fold identification for short proteins
- Recently improved to high-resolution models (within 2 Å RMSD)
[Figure: example input sequence ATCSFFGRKLL…..]

9 A short history of Rosetta
Success of the ab initio protocol led to extension to:
- Protein design
- Design of a new fold: TOP7
- Protein loop modeling; homology modeling
- Protein-protein docking; protein interface design
- Protein-ligand docking
- Protein-DNA interactions; RNA modeling
- Many more, e.g. solving the phase problem in X-ray crystallography

10 More recent additions: BOINC (Rosetta@home); FoldIt; RosettaScripts; RosettaDiagrams; PyRosetta

11 Scoring and Sampling

12 The basic assumption in structure prediction
Native structure located in global minimum (free) energy conformation (GMEC)
➜ A good energy function can select the correct model among decoys
➜ A good sampling technique can find the GMEC in the rugged landscape
[Figure: energy (E) plotted over conformation space, with the GMEC marked]

13 Two-Step Procedure
1. Low-resolution step locates potential minima (fast)
2. Cluster analysis identifies broadest basins in landscape
3. High-resolution step can identify lowest energy minimum in the basins (slow)
[Figure: energy (E) plotted over conformation space, with the GMEC marked]

14 Nature uses one scoring function…
Aim: one generic function for different applications
Optimization of parameters:
- Originally from small molecules (experiments & quantum mechanical calculations)
- Today: use of protein structures solved at high accuracy
How are scoring terms optimized? Benchmarks:
- Discriminate ground state from alternative conformations
- Identify correct side chain conformation
- Predict effect of point mutations on stability (ΔΔG)
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109

15 Low-Resolution Step (e.g. score4)
Fast
Structure representation:
- Equilibrium bonds and angles (Engh & Huber 1991)
- Centroid: average location of center of mass of side chain (Centroid | aa, φ, ψ)
- No modeling of side chains

16 Low-Resolution Scoring Function
Bayes' theorem: independent components prevent over-counting
P(str | seq) = P(str) * P(seq | str) / P(seq)
- P(str): structure-dependent features
- P(seq | str): sequence-dependent features
- P(seq): constant
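A short worked step may help connect Bayes' theorem to an additive score. It assumes, as is common for knowledge-based potentials but not stated on the slide itself, that each score term is a negative log-probability:

```latex
\begin{aligned}
-\ln P(\mathrm{str}\mid\mathrm{seq})
  &= -\ln P(\mathrm{str}) \;-\; \ln P(\mathrm{seq}\mid\mathrm{str}) \;+\; \ln P(\mathrm{seq}) \\
  &= \underbrace{-\ln P(\mathrm{str})}_{\text{structure-dependent terms}}
   \;+\; \underbrace{-\ln P(\mathrm{seq}\mid\mathrm{str})}_{\text{sequence-dependent terms}}
   \;+\; \text{const.}
\end{aligned}
```

Because P(seq) is the same for every candidate structure of a given sequence, it only shifts all scores by a constant and does not affect the ranking of models.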

17 Sequence-Dependent Components: P(seq | str)
Bayes' theorem: P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = S_env + S_pair + …
Neighbors: Cβ-Cβ < 10 Å
Rohl et al. (2004) Methods in Enzymology 383:66
Origin: Simons et al., JMB 1997; Simons et al., Proteins 1999

18 Structure-Dependent Components: P(str)
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + S_rg + S_cβ + S_vdw + …

19 Structure-Dependent Components: P(str)
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + S_rama + … + …

20 High-Resolution Step
Slow, exact step; locates global energy minimum
Structure representation:
- All-atom (including polar and non-polar hydrogens, but no water)
- Side chains as rotamers from backbone-dependent library (Dunbrack 1997)
- Side chain conformation adjusted frequently
e.g. score12; Talaris; …

21 High-Resolution Step: Rotamer Libraries
- Side chains have preferred conformations
- They are summarized in rotamer libraries
- Select one rotamer for each position
- Best conformation: lowest-energy combination of rotamers
Serine χ1 preferences: t = 180°, g− = −60°, g+ = +60°
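The "lowest-energy combination of rotamers" idea can be sketched with a brute-force search over a toy problem. Exhaustive enumeration is used here only because it is easy to read; it scales exponentially and is an illustrative stand-in, not the search a production packer would use. The function name, the energy tables, and all numbers are hypothetical.

```python
from itertools import product

def best_rotamer_combination(rotamers, self_energy, pair_energy):
    """Return (assignment, energy) for the lowest-energy choice of one rotamer per position.

    rotamers:    {position: [rotamer names]}
    self_energy: {(position, rotamer): energy}
    pair_energy: {(pos_a, rot_a, pos_b, rot_b): energy} with pos_a < pos_b;
                 pairs that are not listed contribute 0
    """
    positions = sorted(rotamers)
    best_combo, best_energy = None, float("inf")
    for combo in product(*(rotamers[p] for p in positions)):
        energy = sum(self_energy[(p, r)] for p, r in zip(positions, combo))
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                key = (positions[i], combo[i], positions[j], combo[j])
                energy += pair_energy.get(key, 0.0)
        if energy < best_energy:
            best_combo, best_energy = combo, energy
    return dict(zip(positions, best_combo)), best_energy

# Toy data (made up): two positions, chi1 rotamers named as on the slide
rotamers = {1: ["t", "g-", "g+"], 2: ["t", "g-"]}
self_energy = {(1, "t"): 0.0, (1, "g-"): 0.5, (1, "g+"): 1.0,
               (2, "t"): 0.0, (2, "g-"): 0.3}
pair_energy = {(1, "t", 2, "t"): 2.0}  # the two "t" rotamers clash

assignment, energy = best_rotamer_combination(rotamers, self_energy, pair_energy)
print(assignment, energy)
```

In this toy case each position would prefer "t" on its own, but the pairwise clash makes the combination of "t" at position 1 and "g-" at position 2 the best overall, which is why rotamers must be chosen jointly.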

22 High-Resolution Scoring Function Major contributions: – Burial of hydrophobic groups away from water – Void-free packing of buried groups and atoms – Buried polar atoms form intra-molecular hydrogen bonds

23 High-Resolution Scoring Function: Packing interactions
Score = S_LJ(atr + rep) + ….
- r_ij: distance between atoms i and j
- Linearized repulsive part
- e: well depth from CHARMm19
(new in score12’: starts from minimum)
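A minimal sketch of a Lennard-Jones term with a linearized repulsive part follows. It assumes a standard 12-6 form with its minimum of depth `eps` at distance `r_min`, and a hypothetical `switch` fraction below which the curve is continued as a straight line; none of these parameter names or values are taken from Rosetta.

```python
import math

def lj_energy(r, r_min, eps, switch=0.6):
    """12-6 Lennard-Jones energy, continued linearly for r < switch * r_min."""
    def lj(d):
        x = r_min / d
        return eps * (x ** 12 - 2.0 * x ** 6)

    def lj_slope(d):
        # Analytical derivative dE/dd of the 12-6 expression above
        x = r_min / d
        return eps * (-12.0 * x ** 12 + 12.0 * x ** 6) / d

    r_switch = switch * r_min
    if r >= r_switch:
        return lj(r)
    # Tangent line at the switch point: finite even when atoms overlap
    return lj(r_switch) + lj_slope(r_switch) * (r - r_switch)

at_minimum = lj_energy(4.0, r_min=4.0, eps=0.1)   # equals -eps
overlap = lj_energy(0.0, r_min=4.0, eps=0.1)      # large but finite
print(at_minimum, overlap)
```

The linear extension keeps the energy and its gradient bounded for clashing atoms, which matters when a sampled conformation starts out with overlaps.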

24 High-Resolution Scoring Function: Implicit solvation
Score = … + S_solvation + ….
Lazaridis & Karplus, Proteins 1999
$x_{ij} = (r_{ij} - R_i)/\lambda_i$
[Equation residue: pairwise terms in $x_{ij}^2$ and $x_{ji}^2$; labels: solvation free energy density of i; polar]

25 Solvation energy
Excluded volume implicit solvation model: penalizes buried polars
Solvation free energy density is assumed to be approximated by a Gaussian distribution:
$f_i(r)\,4\pi r^2 = \alpha_i \exp(-x_i^2)$
$x_i = (r - R_i)/\lambda_i$
$\lambda_i = 3.5$ Å (6.0 Å for de-ionized groups): correlation length (width of the first, or first two, solvation shells)
$\alpha_i = 2\,\Delta G_i^{free} / (\sqrt{\pi}\,\lambda_i)$: proportionality coefficient
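The Gaussian density can be checked numerically: with the proportionality coefficient defined as on the slide, integrating the density over all space outside the atom radius should recover the free solvation energy of the group. The sketch below uses made-up values for `R_i` and `dG_free`; only the functional form comes from the slide.

```python
import math

def solvation_density(r, R_i, lambda_i, dG_free):
    """f_i(r), defined by f_i(r) * 4*pi*r^2 = alpha_i * exp(-x_i^2)."""
    alpha_i = 2.0 * dG_free / (math.sqrt(math.pi) * lambda_i)
    x_i = (r - R_i) / lambda_i
    return alpha_i * math.exp(-x_i * x_i) / (4.0 * math.pi * r * r)

def integrated_solvation(R_i, lambda_i, dG_free, r_max=60.0, n=20000):
    """Midpoint-rule integral of f_i(r) * 4*pi*r^2 from R_i to r_max."""
    h = (r_max - R_i) / n
    total = 0.0
    for k in range(n):
        r = R_i + (k + 0.5) * h
        total += solvation_density(r, R_i, lambda_i, dG_free) * 4.0 * math.pi * r * r
    return total * h

# Hypothetical group: radius 2.0 A, correlation length 3.5 A, dG_free = -5.0 (arbitrary units)
recovered = integrated_solvation(R_i=2.0, lambda_i=3.5, dG_free=-5.0)
print(recovered)
```

This normalization is what lets neighboring atoms "remove" a well-defined share of a group's solvation energy when they occupy part of its surrounding volume.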

26 Hydrogen Bonding Energy
Based on statistics from high-resolution structures in the PDB (Kortemme, Morozov & Baker 2003 JMB)
Score = …. + S_hb(srbb+lrbb+sc) + ….
- srbb: short range, backbone HB
- lrbb: long range, backbone HB
- sc: HB with side chain atom
Slide from Jeff Gray

27 Rotamer preference Score = … + S dunbrack + …. Dunbrack, 1997 High-Resolution Scoring Function

28 Scoring Function: Summary
One long, generic function ….
Score = S_env + S_pair + S_rg + S_cβ + S_vdw + S_ss + S_sheet + S_hs + S_rama + S_hb(srbb + lrbb) + docking_score + S_disulf_cent + S_rσ + S_co + S_contact_prediction + S_dipolar + S_projection + S_pc + S_tether + S_(unreadable) + S_(unreadable) + S_symmetry + S_splicemsd + …..
docking_score = S_d_env + S_d_pair + S_d_contact + S_d_vdw + S_d_site_constr + S_d + S_fab_score
Score = S_LJ(atr + rep) + S_solvation + S_hb(srbb+lrbb+sc) + S_dunbrack + S_pair – S_ref + S_prob1b + S_intrares + S_gb_elec + S_gsolt + S_h2o(solv + hb) + S_plane

29 Scoring Function: Summary
One long, generic function …. A weighted sum of different terms:
Score12 = w1*S_LJatr + w2*S_LJrep + w3*S_solvation + w4*S_hb(srbb+lrbb+sc) + w5*S_dunbrack + w6*S_pair – S_ref
How can it be improved?
- Feature Analysis Tool: improve parameters
- OptE: optimize weights
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109
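The weighted-sum structure of the score can be sketched in a few lines. The term names mirror the Score12 formula on the slide, but every value and weight below is invented for illustration; the reference term is given weight -1 to reproduce the "– S_ref" in the formula.

```python
def total_score(terms, weights):
    """Weighted sum over score terms: sum of w_k * S_k."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical per-term values for one model
terms = {"lj_atr": -10.0, "lj_rep": 2.0, "solvation": 3.0,
         "hbond": -4.0, "dunbrack": 1.5, "pair": -0.5, "ref": 1.0}
# Hypothetical weights w1..w6, plus -1 for the reference term
weights = {"lj_atr": 0.8, "lj_rep": 0.4, "solvation": 0.6,
           "hbond": 1.0, "dunbrack": 0.5, "pair": 0.5, "ref": -1.0}

score = total_score(terms, weights)
print(score)
```

Keeping the terms and the weights in separate tables mirrors the two improvement routes named on the slide: one changes how a term is computed, the other only changes its weight.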

30 Feature Analysis: improve scoring term
Aim: similar distributions in crystal structures and models, e.g. HB distance H-O in Ser & Thr
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109

31 Feature Analysis: improve scoring term
Aim: similar distributions in crystal structures and models, e.g. HB distance H-O in Ser & Thr
After correction: distributions in native & model structures overlap
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109

32 OptE: optimize weights
Score12 = w1*S_LJatr + w2*S_LJrep + w3*S_solvation + w4*S_hb(srbb+lrbb+sc) + w5*S_dunbrack + w6*S_pair – S_ref
Maximum Likelihood Parameter Estimation
Benchmarks:
- Discriminate ground state from alternative conformations
- Identify correct side chain conformation
- Sequence recovery in design: choose correct amino acid residue
- Predict effect of point mutations on stability (ΔΔG)
- & more …
Aim: best score for correct prediction
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109

33 Representations of protein structure: Cartesian and polar coordinates
Polar coordinates:
Position  PHI    PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00   -60.00  -180.00  -60.00  0.00  0.00  0.00
2
3
…
Cartesian coordinates (PDB; x, y, z):
ATOM  490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
…

34 2 ways to represent the protein structure
Cartesian coordinates (x, y, z; PDB format)
+ Intuitive: look at molecules in space
+ Easy calculation of energy score (based on atom-atom distances)
– Difficult to change conformation of structure (while keeping bond length and bond angle unchanged)
Polar coordinates (dihedral angles; equilibrium angles and bond lengths)
+ Compact (3 values/residue)
+ Easy changes of protein structure (turn around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate energy score (calculation of neighboring matrix complicated)

35 A snake in the 2D world
Cartesian representation:
points: (0,0), (1,1), (1,2), (2,2), (3,3)
connections (predefined): 1-2, 2-3, 3-4, 4-5
[Figure: the five points, numbered 1 to 5, plotted in the x-y plane and joined by bonds 1-2, 2-3, 3-4, 4-5]

36 A snake in the 2D world
Internal coordinates:
bond lengths (predefined): √2, 1, 1, √2
angles: 45°, 90°, 0°, 45°
[Figure: the snake in the x-y plane with bond lengths and the 45° and 90° angles marked] From wikipedia

37 A snake wiggling in the 2D world
Constraint: keep bond length fixed
Move in Cartesian representation: (0,0), (1,1), (1,2), (2,2), (3,3) → (0,0), (1,1), (1,2), (2,2), (3,0)
Bond length changed! Bond 4-5 goes from √2 to √5.

38 A snake wiggling in the 2D world
Constraint: keep bond length fixed
Move in polar coordinates: 45°, 90°, 0°, 45° → 45°, 90°, 45°, 45°
Bond length unchanged! Large impact on structure

39 Polar → Cartesian coordinates
Convert r and θ to x and y
√2, 1, 1, √2 and 45°, 90°, 0°, 45° → (0,0), (1,1), (1,2), (2,2), (3,3)
From wikipedia

40 Cartesian → polar coordinates
Convert x and y to r and θ
(0,0), (1,1), (1,2), (2,2), (3,3) → √2, 1, 1, √2 and 45°, 90°, 0°, 45°
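Both conversions for the 2D snake can be sketched as follows, treating each angle as the direction of a bond relative to the x-axis, which is what the numbers on the slides correspond to. The function names are hypothetical.

```python
import math

def internal_to_cartesian(lengths, angles_deg, origin=(0.0, 0.0)):
    """Rebuild points by walking along each bond (length r, direction theta)."""
    points = [origin]
    for length, angle in zip(lengths, angles_deg):
        x, y = points[-1]
        theta = math.radians(angle)
        points.append((x + length * math.cos(theta), y + length * math.sin(theta)))
    return points

def cartesian_to_internal(points):
    """Extract bond lengths and bond directions (degrees) from consecutive points."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))
        angles.append(math.degrees(math.atan2(dy, dx)))
    return lengths, angles

snake = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
lengths, angles = cartesian_to_internal(snake)    # sqrt(2), 1, 1, sqrt(2) and 45, 90, 0, 45
rebuilt = internal_to_cartesian(lengths, angles)  # round trip back to the snake

# Move in internal coordinates: change the third angle from 0 to 45 degrees
moved = internal_to_cartesian(lengths, [45, 90, 45, 45])
moved_lengths, _ = cartesian_to_internal(moved)   # bond lengths are unchanged

# Move in Cartesian coordinates: drag the last point to (3, 0)
dragged_lengths, _ = cartesian_to_internal(snake[:-1] + [(3, 0)])  # last bond is now sqrt(5)
print(angles, moved_lengths, dragged_lengths)
```

Changing an angle leaves every bond length intact by construction, while dragging a single point in Cartesian space stretches the bond attached to it.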

41 Moving the snake to the 3D world
Cartesian representation:
points (additional z-axis): (0,0,0), (1,1,0), (1,2,0), (2,2,0), (3,3,0)
connections (predefined): 1-2, 2-3, 3-4, 4-5
Internal coordinates:
bond lengths (predefined): √2, 1, 1, √2
angles: 45°, 90°, 0°, 45°
dihedral angles: 180°, 180°
Proteins: bond lengths and angles fixed. Only dihedral angles are varied

42 Dihedral angles
Dihedral angle: defines geometry of 4 consecutive atoms (given bond lengths and angles)
Dihedral angles χ1-χ4 define the side chain
From wikipedia

43 What we learned from our snake
Cartesian representation: easy to look at, difficult to move
– Moves do not preserve bond length (and angles in 3D)
Internal coordinates: easy to move, difficult to see
– Calculation of distances between points not trivial
Proteins: bond lengths and angles fixed. Only dihedral angles are varied

44 Solution: toggle
MOVE STRUCTURE - Polar coordinates: introduce changes in structure by rotating around dihedral angle(s) (change dihedral values)
Transform: build positions in space according to dihedral angles
CALCULATE ENERGY - Cartesian coordinates: derive distance matrix (neighbor list) for energy score calculation
Transform: calculate dihedral angles from coordinates
[The polar coordinate table and PDB excerpt from slide 33 are shown again]
Snake analogy: (0,0), (1,1), (1,2), (2,2), (3,3) ⇄ 45°, 90°, 0°, 45°

45 Cartesian → polar coordinates
Polar coordinates:
Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
…
PDB (x, y, z):
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…
How to calculate polar from Cartesian coordinates, example φ: C’-N-Ca-C
– define plane perpendicular to N-Ca (b2) vector
– calculate projection of Ca-C (b3) and C’-N (b1) onto plane
– calculate angle between projections
Snake analogy: (0,0), (1,1), (1,2), (2,2), (3,3) → 45°, 90°, 0°, 45°
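The three steps for φ can be sketched directly in code: project the two outer bond vectors onto the plane perpendicular to the central bond and measure the signed angle between the projections. The helper names are hypothetical, the test points are toy coordinates rather than the PDB rows above, and the sign convention assumed here is the common one where cis is 0° and trans is 180°.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four consecutive atoms."""
    b1 = sub(p2, p1)   # e.g. C'-N
    b2 = sub(p3, p2)   # e.g. N-Ca, the rotation axis
    b3 = sub(p4, p3)   # e.g. Ca-C
    axis = scale(b2, 1.0 / math.sqrt(dot(b2, b2)))
    # Project onto the plane perpendicular to the axis; b1 is reversed so that
    # both projected vectors point away from the axis (cis then gives 0 degrees)
    back = scale(b1, -1.0)
    v = sub(back, scale(axis, dot(back, axis)))
    w = sub(b3, scale(axis, dot(b3, axis)))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))

# Toy geometry: first three atoms fixed, fourth atom placed at different rotations
p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral(p1, p2, p3, (1.0, 0.0, 1.0))
plus90 = dihedral(p1, p2, p3, (0.0, 1.0, 1.0))
trans = dihedral(p1, p2, p3, (-1.0, 0.0, 1.0))
print(cis, plus90, trans)
```

Using `atan2` on the two projections gives a signed angle over the full range from -180° to 180°, which a plain arccosine of the dot product would not.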

46 Polar → Cartesian coordinates
Polar coordinates:
Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
…
PDB (x, y, z):
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…
Find x, y, z coordinates of C, based on atom positions of C’, N and Ca, and a given φ value (φ: C’-N-Ca-C)
– create Ca-C vector: size Ca-C = 1.51 Å (equilibrium bond length); angle N-Ca-C = 111° (equilibrium value for N-Ca-C angle)
– rotate vector around N-Ca axis to obtain projections of Ca-C and N-C’ with wanted φ
Snake analogy: 45°, 90°, 0°, 45° → (0,0), (1,1), (1,2), (2,2), (3,3)
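Placing the next atom from a bond length, a bond angle, and a dihedral can be sketched with a local coordinate frame built on the three previous atoms. This is one common way to do it and is offered as an illustration under stated assumptions, not as Rosetta's implementation; the function names and the positions used for C’, N and Ca are hypothetical, while 1.51 Å, 111° and φ = -59° are the values quoted on the slide.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def place_atom(a, b, c, bond_length, bond_angle_deg, dihedral_deg):
    """Position of atom d given atoms a, b, c, the c-d bond length,
    the b-c-d bond angle and the a-b-c-d dihedral angle."""
    theta = math.radians(bond_angle_deg)
    phi = math.radians(dihedral_deg)
    bc = unit(sub(c, b))                  # rotation axis (e.g. N-Ca)
    n = unit(cross(sub(b, a), bc))        # normal to the a-b-c plane
    m = cross(n, bc)                      # completes the right-handed local frame
    local = (-bond_length * math.cos(theta),
             bond_length * math.sin(theta) * math.cos(phi),
             bond_length * math.sin(theta) * math.sin(phi))
    return tuple(c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))

# Hypothetical positions for C', N and Ca (toy geometry, not real coordinates)
c_prime, n_atom, ca = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.46)
new_c = place_atom(c_prime, n_atom, ca, 1.51, 111.0, -59.0)
print(new_c)
```

Repeating this step atom by atom is what "build coordinates of structure starting from first atom, according to dihedral angles" amounts to when bond lengths and angles are held at their equilibrium values.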

47 Representation of protein structure
Rosetta folding: 3 backbone dihedral angles per residue
Sampling and minimization in TORSIONAL space: change angle and rebuild, starting from changed angle
Build coordinates of structure starting from first atom, according to dihedral angles (and equilibrium bond length and angle)
[Figure: chain of residues numbered 1 to 8]
Based on slides by Chu Wang

48 Representation of protein structure
Rosetta folding: 3 backbone dihedral angles per residue; sampling and minimization in TORSIONAL space
Rosetta docking: 6 rigid-body DOFs (3 translational, 3 rotational); backbone dihedral angles fixed (rigid-body); sampling and minimization in RIGID-BODY space
How can those two types of degrees of freedom be combined?
[Figure: two chains of residues, numbered 1 to 8 and 1’ to 8’]

49 Fold tree representation
- “peptide” edge: 3 backbone dihedral angles
- “long-range” edge: 6 rigid-body DOFs
Example: fold-tree based docking
- Originally developed to improve sampling of strand registers in β-sheet proteins.
- Allows simultaneous optimization of rigid-body and backbone/sidechain torsional degrees of freedom.
- Construct fold-trees to treat a variety of protein folding and docking problems.
Fold tree: Bradley and Baker, Proteins (2006)
[Figure: two chains of residues, numbered 1 to 8 and 1’ to 8’, connected by a long-range edge]

50 Fold-trees for different modeling tasks: protein folding
Legend: N: N-terminal; C: C-terminal; X: chain break; O: root of the tree; flexible “peptide” edge; rigid “peptide” edge; rigid “jump”; flexible “jump”; Color – flexible bb; Gray – fixed bb

51 Fold-trees for different modeling tasks: loop modeling
Legend: N: N-terminal; C: C-terminal; X: chain break; O: root of the tree; flexible “peptide” edge; rigid “peptide” edge; rigid “jump”; flexible “jump”; Color – flexible bb; Gray – fixed bb

52 Fold-trees for different modeling tasks: fully flexible docking; docking w/ hinge motion; docking w/ loop modeling
Legend: N: N-terminal; C: C-terminal; X: chain break; O: root of the tree; flexible “peptide” edge; rigid “peptide” edge; rigid “jump”; flexible “jump”; Color – flexible bb; Gray – fixed bb

53 Fold-trees for different modeling tasks Color – flexible bb Gray – fixed bb Pale – symmetry operation

54 Fold-trees for different modeling tasks Color – flexible bb Gray – fixed bb Filled colored circles - flexible sc

55 Fold-trees for different modeling tasks Color – flexible bb Gray – fixed bb Filled colored circles - flexible sc o empty colored circles – flexible amino acid: design

56 Fold-trees for different modeling tasks Color – flexible bb Gray – fixed bb Filled colored circles - flexible sc o empty colored circles – flexible amino acid: design

57 Rosetta3: Object-oriented architecture Color – flexible bb Gray – fixed bb Description of object-oriented organization in Rosetta3: Leaver-Fay et al. Methods in Enzymology (2013)

58 The Rosetta sampling strategy: A general overview
- 9 residue fragments
- 3 residue fragments
- Gradual addition of parameters to scoring function
- Quick quenching
- Fragment sampling
- Strategies to keep fragment insertion/perturbation local
- Monte Carlo (MC) sampling
- MC sampling with minimization
- Local optimization
- Repacking and refinement
- Side chain rearrangement
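The Monte Carlo part of this strategy can be sketched with a generic Metropolis loop. Everything below is a toy stand-in: the energy function, the move (perturbing one torsion angle instead of inserting a fragment), and all parameter values are hypothetical and only illustrate the accept/reject logic.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def monte_carlo(energy, propose, state, steps, kT=1.0, seed=0):
    """Run Metropolis Monte Carlo and return the lowest-energy state seen."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = propose(current, rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

def toy_energy(angles, target=-60.0):
    """Made-up torsional energy with its minimum when every angle equals target."""
    return sum(1.0 - math.cos(math.radians(a - target)) for a in angles)

def perturb_one_angle(angles, rng, max_step=30.0):
    """Toy move: change a single randomly chosen angle by a small random amount."""
    trial = list(angles)
    i = rng.randrange(len(trial))
    trial[i] += rng.uniform(-max_step, max_step)
    return trial

start = [180.0, 60.0, 0.0, 120.0]
best, best_energy = monte_carlo(toy_energy, perturb_one_angle, start,
                                steps=2000, kT=0.5, seed=1)
print(best_energy)
```

Occasionally accepting uphill moves is what lets the search escape local minima in a rugged landscape; lowering `kT` makes the search greedier.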

