Modelling genome structure and function Ram Samudrala University of Washington.

Rationale for understanding protein structure and function
Protein sequence
- large numbers of sequences, including whole genomes
Protein structure
- three dimensional
- complicated
- mediates function
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
Diagram labels linking sequence, structure and function: ?, structure determination, structure prediction, homology, rational mutagenesis, biochemical analysis, model studies

Protein folding
DNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-… (each triplet codes for one amino acid)
protein sequence: …-L-K-E-G-V-S-K-D-…
unfolded protein: not unique, mobile, inactive, expanded, irregular
spontaneous self-organisation (~1 second)
native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
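The codon-to-residue mapping on this slide can be checked with a few lines of Python. This is a minimal sketch: `CODON_TABLE` holds only the handful of codons needed here (taken from the standard genetic code), and the `translate` helper is a hypothetical name introduced for illustration. Under the standard code the final residue D is encoded by GAU or GAC, while GUU encodes V.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "CUA": "L", "AAA": "K", "GAA": "E", "GGU": "G",
    "GUU": "V", "AGC": "S", "AAG": "K", "GAU": "D",
}

def translate(rna: str) -> str:
    """Translate hyphen-separated RNA codons into one-letter residues."""
    codons = rna.split("-")
    return "-".join(CODON_TABLE[codon] for codon in codons)

protein = translate("CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU")
print(protein)  # L-K-E-G-V-S-K-D
```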

Ab initio prediction of protein structure
sample conformational space such that native-like conformations are found
- astronomically large number of conformations
- 5 states per residue, 100 residues: 5^100 ≈ 10^70
select
- hard to design functions that are not fooled by non-native conformations (“decoys”)
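The size of the search space quoted on this slide is simple arithmetic, sketched below. The variable names are illustrative only; the figures (5 states per residue, 100 residues) are the ones on the slide.

```python
import math

states_per_residue = 5
n_residues = 100

# Exact count of conformations: 5^100 (Python integers do not overflow).
n_conformations = states_per_residue ** n_residues

# Same quantity expressed as a power of ten.
exponent = n_residues * math.log10(states_per_residue)

print(f"5^100 is about 10^{exponent:.1f}")
print(f"{len(str(n_conformations))} decimal digits")
```

The exponent comes out just under 70, which is why the count is usually rounded to 10^70.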

Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
generate: fragments from database, 14-state φ,ψ model
minimise: monte carlo with simulated annealing, conformational space annealing, GA
filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
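The "minimise" step can be illustrated with a toy Monte Carlo search with simulated annealing. This is only a sketch under loud assumptions: each residue takes one of 14 discrete states, the move set reassigns a single residue, the cooling schedule is geometric, and `toy_energy` is a made-up score that stands in for the real scoring functions named on the slide. None of the names or parameters below come from the talk.

```python
import math
import random

N_STATES = 14  # discrete states per residue, echoing the 14-state model

def toy_energy(conf, target):
    """Made-up score: number of residues whose state differs from a target."""
    return sum(1 for s, t in zip(conf, target) if s != t)

def anneal(n_residues=20, n_steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Return (starting energy, best energy seen) for one annealing run."""
    rng = random.Random(seed)
    target = [rng.randrange(N_STATES) for _ in range(n_residues)]
    conf = [rng.randrange(N_STATES) for _ in range(n_residues)]
    energy = toy_energy(conf, target)
    start_energy = energy
    best_energy = energy
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        # Move: reassign the state of one randomly chosen residue.
        i = rng.randrange(n_residues)
        old_state = conf[i]
        conf[i] = rng.randrange(N_STATES)
        new_energy = toy_energy(conf, target)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy
            best_energy = min(best_energy, energy)
        else:
            conf[i] = old_state  # reject the move and restore the old state
    return start_energy, best_energy

start_energy, best_energy = anneal()
print(start_energy, best_energy)
```

Accepting occasional uphill moves at high temperature is what lets the search escape local minima; as the temperature falls the walk becomes increasingly greedy.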

Historical perspective on ab initio prediction
Before CASP (BC): “solved” (biased results)
CASP1: worse than random
CASP2: worse than random with one exception
CASP3: consistently predicted correct topology - ~6.0 Å for 60+ residues
- *T56/dnab – 6.8 Å (60 residues; 67-126)
- **T59/smd3 – 6.8 Å (46 residues; 30-75)
- **T61/hdea – 7.4 Å (66 residues; 9-74)
- **T64/sinr – 4.8 Å (68 residues; 1-68)
- *T74/eps15 – 7.0 Å (60 residues; 154-213)
- **T75/ets1 – 7.7 Å (77 residues; 55-131)
CASP4: ?

Prediction for CASP4 target T110/rbfa: Cα RMSD of 4.0 Å for 80 residues (1-80)

Prediction for CASP4 target T97/er29: Cα RMSD of 6.2 Å for 80 residues (18-97)

Prediction for CASP4 target T106/sfrp3: Cα RMSD of 6.2 Å for 70 residues (6-75)

Prediction for CASP4 target T98/sp0a: Cα RMSD of 6.0 Å for 60 residues (37-105)

Prediction for CASP4 target T126/omp: Cα RMSD of 6.5 Å for 60 residues (87-146)

Prediction for CASP4 target T114/afp1: Cα RMSD of 6.5 Å for 45 residues (36-80)

Postdiction for CASP4 target T102/as48: Cα RMSD of 5.3 Å for 70 residues (1-70)
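The accuracy measure quoted on these slides, the root-mean-square deviation over Cα atoms, can be sketched as follows. The function assumes the model and the experimental structure have already been superimposed; the optimal-superposition step that normally precedes the calculation is not shown. The `ca_rmsd` name and the coordinates are made up for illustration.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha positions.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates in Å: every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

rmsd = ca_rmsd(model, native)
print(rmsd)  # 1.0
```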

Historical perspective on ab initio prediction
Before CASP (BC): “solved” (biased results)
CASP1: worse than random
CASP2: worse than random with one exception
CASP3: consistently predicted correct topology - ~6.0 Å for 60+ residues
CASP4: consistently predicted correct topology - ~4.0-6.0 Å for 60-80+ residues
- **T110/rbfa – 4.0 Å (80 residues; 1-80)
- *T114/afp1 – 6.5 Å (45 residues; 36-80)
- **T97/er29 – 6.0 Å (80 residues; 18-97)
- **T106/sfrp3 – 6.2 Å (70 residues; 6-75)
- *T98/sp0a – 6.0 Å (60 residues; 37-105)
- **T102/as48 – 5.3 Å (70 residues; 1-70)

Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
align
build initial model
construct non-conserved side chains and main chains
refine
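The asterisks in the alignment mark identical positions, and counting them gives the percentage sequence identity (PID) of the pair. The sketch below uses the two aligned sequences from this slide. The `percent_identity` helper is hypothetical, and dividing by the number of gap-free columns is one convention among several, chosen here as an assumption; the slides do not say how the PID values quoted for the CASP4 targets were normalised.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical, aligned, percent) for two pre-aligned sequences.

    Columns containing a gap in either sequence are excluded from the count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(1 for a, b in aligned if a == b)
    return identical, len(aligned), 100.0 * identical / len(aligned)

identical, aligned, pid = percent_identity(
    "KDHPFGFAVPTKNPDGTMNLMNWECAIP",
    "KDPPAGIGAPQDN----QNIMLWNAVIP",
)
print(identical, aligned, round(pid, 1))  # 11 24 45.8
```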

Historical perspective on comparative modelling
        alignment  side chain  short loops  longer loops
BC      excellent  ~80%        1.0 Å        2.0 Å
CASP1   poor       ~50%        ~3.0 Å       >5.0 Å

A graph theoretic representation of protein structure
- represent residues as nodes
- weigh nodes: -0.6 (V1), -1.0 (F), -0.7 (K), -0.5 (I), -0.9 (V2)
- construct graph: edge weights -0.3, -0.4, -0.2, -0.1, -0.2, -0.1
- find cliques: I, V2, F and K form a clique with W = -4.5
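The clique search can be sketched by brute force on a graph this small. The node weights below are the ones on the slide. Everything else is an assumption made for illustration: the figure does not survive in this transcript, so which nodes each edge joins is a guess (chosen so that the clique I, V2, F, K totals the slide's W = -4.5), V1 and V2 are treated as alternatives for the same residue and left unconnected, and a more negative total weight is taken to be better. This is not presented as the method used in the talk.

```python
from itertools import combinations

# Node weights as shown on the slide.
node_weight = {"V1": -0.6, "F": -1.0, "K": -0.7, "I": -0.5, "V2": -0.9}

# ASSUMPTION: edge endpoints and several weights are hypothetical; only the
# node weights and the target total of -4.5 come from the slide.
edge_weight = {
    frozenset(("I", "V2")): -0.3,
    frozenset(("I", "F")): -0.4,
    frozenset(("I", "K")): -0.2,
    frozenset(("V2", "F")): -0.1,
    frozenset(("V2", "K")): -0.2,
    frozenset(("F", "K")): -0.2,
    frozenset(("V1", "F")): -0.1,
    frozenset(("V1", "K")): -0.1,
    frozenset(("V1", "I")): -0.1,
}

def clique_weight(nodes):
    """Total node plus edge weight, or None if the nodes are not a clique."""
    pairs = [frozenset(pair) for pair in combinations(nodes, 2)]
    if any(pair not in edge_weight for pair in pairs):
        return None
    return sum(node_weight[n] for n in nodes) + sum(edge_weight[p] for p in pairs)

# Enumerate every subset of nodes and keep the clique with the lowest weight.
best_nodes, best_w = None, 0.0
for size in range(1, len(node_weight) + 1):
    for nodes in combinations(sorted(node_weight), size):
        w = clique_weight(nodes)
        if w is not None and w < best_w:
            best_nodes, best_w = nodes, w

print(best_nodes, round(best_w, 1))  # ('F', 'I', 'K', 'V2') -4.5
```

Exhaustive enumeration is exponential in the number of nodes, so it only serves to show the idea; a real protein graph would need a dedicated clique-finding algorithm.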

Prediction for CASP4 target T128/sodm: Cα RMSD of 1.0 Å for 198 residues (PID 50%)

Prediction for CASP4 target T111/eno: Cα RMSD of 1.7 Å for 430 residues (PID 51%)

Prediction for CASP4 target T122/trpa: Cα RMSD of 2.9 Å for 241 residues (PID 33%)

Prediction for CASP4 target T125/sp18: Cα RMSD of 4.4 Å for 137 residues (PID 24%)

Prediction for CASP4 target T112/dhso: Cα RMSD of 4.9 Å for 348 residues (PID 24%)

Prediction for CASP4 target T92/yeco: Cα RMSD of 5.6 Å for 104 residues (PID 12%)

Historical perspective on comparative modelling
        alignment  side chain  short loops  longer loops
BC      excellent  ~80%        1.0 Å        2.0 Å
CASP1   poor       ~50%        ~3.0 Å       >5.0 Å
CASP2   fair       ~75%        ~1.0 Å       ~3.0 Å
CASP3   fair       ~75%        ~1.0 Å       ~2.5 Å
CASP4   fair       ~75%        ~1.0 Å       ~2.0 Å
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
- **T112/dhso – 4.9 Å (348 residues; 24%)
- **T92/yeco – 5.6 Å (104 residues; 12%)
- **T128/sodm – 1.0 Å (198 residues; 50%)
- **T125/sp18 – 4.4 Å (137 residues; 24%)
- **T111/eno – 1.7 Å (430 residues; 51%)
- **T122/trpa – 2.9 Å (241 residues; 33%)

Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
G. assign function: assign function to entire protein space
structure based methods
- microenvironment analysis (zinc binding site?)
- structure comparison (homology; function?)
sequence based methods
- sequence comparison
- motif searches
- phylogenetic profiles
- domain fusion analyses
+ experimental data

Conclusions: structure
- Ab initio prediction can produce low resolution models that may aid gross functional studies
- Comparative modelling can produce high resolution models that can be used to study detailed function
- Large scale structure prediction will complement experimental structural genomics efforts

Conclusions: function
- Detailed analysis of structures can be used to predict protein function, complementing experimental and sequence based techniques
- Structure comparisons and microenvironment analyses can be used to predict function on a genome-wide scale
- Large scale function prediction will complement experimental functional genomics efforts

Take home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution
Acknowledgements
- Michael Levitt, Stanford University
- John Moult, CARB
- Patrice Koehl, Stanford University
- Yu Xia, Stanford University
- Levitt and Moult groups
