Modelling Genome Structure and Function Ram Samudrala University of Washington.


Rationale for understanding protein structure and function
Protein sequence
- large numbers of sequences, including whole genomes
Protein structure
- three dimensional
- complicated
- mediates function
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
Linking methods: structure determination, structure prediction, homology, rational mutagenesis, biochemical analysis, model studies

Protein folding
DNA (shown as mRNA codons): …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-… (one codon encodes one amino acid)
Protein sequence: …-L-K-E-G-V-S-K-D-…
Unfolded protein: not unique, mobile, inactive, expanded, irregular
Spontaneous self-organisation (~1 second)
Native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
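The step from codons to amino acids is a fixed lookup in the genetic code. The minimal Python sketch below assumes the standard code (NCBI translation table 1) and uses RNA-style codons written with U, as on the slide. `translate` is a hypothetical helper name. Translating the first seven codons shown above yields L-K-E-G-V-S-K.

```python
# Standard genetic code (NCBI table 1), indexed by codon with bases ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon; '*' marks a stop codon."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    if len(rna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[rna[n:n + 3]] for n in range(0, len(rna), 3))


print("-".join(translate("CUAAAAGAAGGUGUUAGCAAG")))  # L-K-E-G-V-S-K
```

A lookup table suffices here because translation is a fixed many-to-one mapping. Folding the resulting chain into its native state has no such table, which is why the following slides turn to search and scoring.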

Ab initio prediction of protein structure
Sample: search conformational space so that native-like conformations are found. The space is astronomically large: 5 states per residue for 100 residues gives 5^100 ≈ 10^70 conformations.
Select: it is hard to design scoring functions that are not fooled by non-native conformations ("decoys").
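The size estimate is simple arithmetic. Because log10(5^100) = 100 × log10 5 ≈ 69.9, the count is about 10^70. The short Python check below computes the count exactly with integer arithmetic. The 5-state, 100-residue figures come from the slide, and `conformation_count` is a hypothetical name.

```python
import math


def conformation_count(states_per_residue: int, residues: int) -> int:
    """Number of distinct conformations when each residue independently takes one of a fixed set of states."""
    return states_per_residue ** residues


count = conformation_count(5, 100)
exponent = math.log10(count)  # about 69.9, so the count is roughly 10**70
print(f"5^100 has {len(str(count))} digits; log10 = {exponent:.2f}")
```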

Semi-exhaustive segment-based folding
Sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
Generate: fragments from a database; 14-state φ,ψ model
Minimise: Monte Carlo with simulated annealing, conformational space annealing, genetic algorithms (GA)
Filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
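The "minimise" step can use Monte Carlo with simulated annealing. The sketch below is a toy illustration, not the method used for these predictions. Each residue takes one of 14 discrete states, standing in for the 14-state model. The energy comes from a hypothetical scoring function built from random per-state and neighbour terms (`make_toy_energy`). Each move changes one residue's state and is accepted by the Metropolis criterion at a slowly falling temperature.

```python
import math
import random

N_STATES = 14  # discrete conformational states per residue (illustrative)


def make_toy_energy(n_residues, seed=0):
    """Build a hypothetical energy function from random single-residue and neighbour terms."""
    rng = random.Random(seed)
    single = [[rng.uniform(-1, 1) for _ in range(N_STATES)] for _ in range(n_residues)]
    pair = [[rng.uniform(-0.5, 0.5) for _ in range(N_STATES)] for _ in range(N_STATES)]

    def energy(conf):
        e = sum(single[i][s] for i, s in enumerate(conf))
        e += sum(pair[a][b] for a, b in zip(conf, conf[1:]))
        return e

    return energy


def anneal(energy, n_residues, steps=20000, t_start=2.0, t_end=0.01, seed=1):
    """Metropolis Monte Carlo with a geometric cooling schedule; returns the best conformation seen."""
    rng = random.Random(seed)
    conf = [rng.randrange(N_STATES) for _ in range(n_residues)]
    e = energy(conf)
    best, best_e = conf[:], e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_residues)
        old = conf[i]
        conf[i] = rng.randrange(N_STATES)
        new_e = energy(conf)
        if new_e <= e or rng.random() < math.exp((e - new_e) / t):
            e = new_e  # accept the move
            if e < best_e:
                best, best_e = conf[:], e
        else:
            conf[i] = old  # reject: restore the previous state
    return best, best_e


energy = make_toy_energy(30)
best, best_e = anneal(energy, 30)
print(f"best toy energy: {best_e:.3f}")
```

High early temperatures let the search accept uphill moves and escape local minima. Late, low temperatures make it behave almost greedily.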

Ab initio prediction at CASP
Before CASP (BC): "solved" (biased results)
CASP1: worse than random
CASP2: worse than random, with one exception
CASP3: consistently predicted correct topology, ~6.0 Å for 60+ residues
CASP4: consistently predicted correct topology, ~4-6 Å for 60-80+ residues
- T110/rbfa: 4.0 Å (80 residues; 1-80)
- T114/afp1: 6.5 Å (45 residues; 36-80)
- T97/er29: 6.0 Å (80 residues; 18-97)
- T106/sfrp3: 6.2 Å (70 residues; 6-75)
- T98/sp0a: 6.0 Å (60 residues; 37-105)
- T102/as48: 5.3 Å (70 residues; 1-70)

Prediction for CASP4 target T110/rbfa: Cα RMSD of 4.0 Å for 80 residues (1-80)

Prediction for CASP4 target T97/er29: Cα RMSD of 6.2 Å for 80 residues (18-97)

Prediction for CASP4 target T106/sfrp3: Cα RMSD of 6.2 Å for 70 residues (6-75)

Prediction for CASP4 target T98/sp0a: Cα RMSD of 6.0 Å for 60 residues (37-105)

Prediction for CASP4 target T126/omp: Cα RMSD of 6.5 Å for 60 residues (87-146)

Prediction for CASP4 target T114/afp1: Cα RMSD of 6.5 Å for 45 residues (36-80)

Postdiction for CASP4 target T102/as48: Cα RMSD of 5.3 Å for 70 residues (1-70)
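The accuracy figures above are Cα RMSD values. Each is the root-mean-square distance between corresponding Cα atoms of the model and the experimental structure after optimal rigid-body superposition. The dependency-free Python sketch below assumes the two coordinate lists are already matched residue by residue. It uses Horn's quaternion formulation, in which the optimal RMSD follows from the largest eigenvalue of a 4×4 symmetric matrix, found here with a small Jacobi routine. This is one standard way to compute the quantity, not necessarily the code the CASP assessors used.

```python
import math


def _centre(coords):
    """Translate coordinates so their centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(matrix, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-12:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A P (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- P^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return max(a[i][i] for i in range(n))


def ca_rmsd(model, reference):
    """Cα RMSD after optimal superposition (Horn's quaternion method)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _centre(model), _centre(reference)
    # Correlation matrix S[i][j] = sum over atoms of a[i] * b[j].
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(x * x + y * y + z * z for x, y, z in a) + sum(x * x + y * y + z * z for x, y, z in b)
    return math.sqrt(max(0.0, (g - 2.0 * _largest_eigenvalue(k)) / len(a)))
```

A production pipeline would normally call a linear-algebra library, for example an SVD-based Kabsch superposition. The eigenvalue route is used here only to keep the sketch self-contained.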

Comparative modelling of protein structure
Example alignment (* = identical residue):
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
Steps: scan, align, build initial model, construct non-conserved side chains and main chains, refine
Methods: minimum perturbation, graph theory, semfold, de novo simulation, physical functions
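The asterisks under the alignment mark identical residues. The Python sketch below regenerates that marker line and computes a percent identity for the two aligned sequences shown. It assumes '-' denotes a gap. It also divides identities by the number of columns in which neither sequence has a gap, which is one common convention among several. `identity_line` and `percent_identity` are hypothetical helper names.

```python
def identity_line(seq_a: str, seq_b: str) -> str:
    """Return a marker string with '*' wherever the aligned residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Identities divided by columns where neither sequence is gapped, as a percentage."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


seq_1 = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_2 = "KDPPAGIGAPQDN----QNIMLWNAVIP"
print(seq_1)
print(seq_2)
print(identity_line(seq_1, seq_2))
print(f"{percent_identity(seq_1, seq_2):.1f}% identity")
```

Under this convention the example has 11 identities in 24 ungapped columns. Dividing by the full alignment length instead would give a lower figure, so reported sequence identities are only comparable when the convention is known.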

A graph theoretic representation of protein structure
[Figure panels: represent residues as nodes (I, V1, V2, F, K); construct graph (edges weighted -0.1 to -0.4); weigh nodes (I -0.5, V1 -0.6, V2 -0.9, F -1.0, K -0.7); find cliques (clique weight W = -4.5)]
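As we read the figure, each node is a candidate conformation for a residue, with V1 and V2 as two alternatives for the same valine. Edges join mutually compatible candidates, and the best combined model is the clique with the most favourable (lowest) total of node and edge weights. The brute-force Python sketch below performs that search. The node weights are taken from the slide, but the edge set and edge weights are hypothetical, so the result does not reproduce the slide's W = -4.5. Real implementations use proper clique-finding algorithms rather than enumerating every subset.

```python
from itertools import combinations

# Node weights from the slide: lower (more negative) is more favourable.
NODES = {"I": -0.5, "V1": -0.6, "V2": -0.9, "F": -1.0, "K": -0.7}

# Hypothetical edges between compatible candidates. V1 and V2 are alternatives
# for the same residue, so there is no edge between them.
EDGES = {
    frozenset(("I", "V2")): -0.3,
    frozenset(("V2", "F")): -0.4,
    frozenset(("F", "K")): -0.2,
    frozenset(("I", "K")): -0.1,
    frozenset(("I", "F")): -0.2,
    frozenset(("V2", "K")): -0.1,
    frozenset(("I", "V1")): -0.2,
    frozenset(("V1", "F")): -0.1,
}


def clique_weight(members):
    """Sum of node weights plus edge weights, or None if members do not form a clique."""
    total = sum(NODES[m] for m in members)
    for u, v in combinations(members, 2):
        w = EDGES.get(frozenset((u, v)))
        if w is None:
            return None  # a missing edge means these candidates are incompatible
        total += w
    return total


def best_clique():
    """Exhaustively score every subset of nodes and return the lowest-weight clique."""
    best, best_w = None, float("inf")
    names = list(NODES)
    for size in range(1, len(names) + 1):
        for members in combinations(names, size):
            w = clique_weight(members)
            if w is not None and w < best_w:
                best, best_w = set(members), w
    return best, best_w


members, weight = best_clique()
print(sorted(members), round(weight, 2))
```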

Comparative modelling at CASP
         alignment   side chains   short loops   longer loops
BC       excellent   ~80%          1.0 Å         2.0 Å
CASP1    poor        ~50%          ~3.0 Å        > 5.0 Å
CASP2    fair        ~75%          ~1.0 Å        ~3.0 Å
CASP3    fair        ~75%          ~1.0 Å        ~2.5 Å
CASP4    fair        ~75%          ~1.0 Å        ~2.0 Å
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50% to 10% sequence identity
- T128/sodm: 1.0 Å (198 residues; 50%)
- T111/eno: 1.7 Å (430 residues; 51%)
- T122/trpa: 2.9 Å (241 residues; 33%)
- T125/sp18: 4.4 Å (137 residues; 24%)
- T112/dhso: 4.9 Å (348 residues; 24%)
- T92/yeco: 5.6 Å (104 residues; 12%)

Prediction for CASP4 target T128/sodm: Cα RMSD of 1.0 Å for 198 residues (PID 50%)

Prediction for CASP4 target T111/eno: Cα RMSD of 1.7 Å for 430 residues (PID 51%)

Prediction for CASP4 target T122/trpa: Cα RMSD of 2.9 Å for 241 residues (PID 33%)

Prediction for CASP4 target T125/sp18: Cα RMSD of 4.4 Å for 137 residues (PID 24%)

Prediction for CASP4 target T112/dhso: Cα RMSD of 4.9 Å for 348 residues (PID 24%)

Prediction for CASP4 target T92/yeco: Cα RMSD of 5.6 Å for 104 residues (PID 12%)

Prediction for Invb

Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
Structure-based methods: microenvironment analysis (zinc binding site?), structure comparison (homology, function?)
Sequence-based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
+ experimental data
G. assign function: assign function to the entire protein space
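Among the sequence-based methods listed, phylogenetic profiles are easy to illustrate. Each protein is represented by a presence/absence vector over a set of genomes. Proteins with matching or near-matching profiles become candidates for a shared pathway or complex. The Python sketch below uses entirely hypothetical protein names and profiles, with Hamming distance as the similarity measure, which is one simple choice among several.

```python
from itertools import combinations

# Hypothetical profiles: 1 = a homolog is present in that genome, 0 = absent.
PROFILES = {
    "protA": (1, 1, 0, 1, 0, 1, 1, 0),
    "protB": (1, 1, 0, 1, 0, 1, 1, 0),
    "protC": (1, 1, 0, 1, 0, 1, 0, 0),
    "protD": (0, 0, 1, 0, 1, 0, 0, 1),
}


def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=1):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (a, b, hamming(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


for a, b, d in linked_pairs(PROFILES):
    print(f"{a} - {b}: profiles differ in {d} genome(s)")
```

Such links are functional hints to be combined with the other evidence above, not function assignments on their own.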

Modelling structure and function of entire genomes
- Homo sapiens (human): ~52,413 sequences
- Oryza sativa (rice): ~55,000 sequences
- Pseudomonas aeruginosa: ~6000 sequences
- Salmonella typhimurium: ~2000 sequences
- HIV: 9 genes + ~52,000 variants

Modelling structure and function of the Oryza sativa (rice) genome
- 6813 coding sequences
- 3149 without a product annotation
- 816 classified as hypothetical protein
- 1187 with a hypothetical function
- ~30% with known homologs in PDB
Most common functions (from PROSITE): ATP/GTP-binding site motif A (P loop); serine/threonine protein kinase active site; EF-hand (calcium binding); cytochrome c heme binding site
Most common functions (from annotations): reverse transcriptase; nucleotide binding site (NBS); serine/threonine protein kinase; chitinase
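The PROSITE-derived functions above come from matching sequence patterns. For example, PROSITE writes the ATP/GTP-binding site motif A (P loop) as [AG]-x(4)-G-K-[ST]. The minimal Python sketch below converts a subset of PROSITE pattern syntax into a Python regular expression. That subset covers x, [..], {..}, repeat counts, and the < and > terminal anchors. `prosite_to_regex` is a hypothetical helper, not an official PROSITE tool, and the test sequence is only an example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as '[AG]-x(4)-G-K-[ST]' to a regex string."""
    regex = ""
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    for element in pattern.split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        element = element.strip("<>")
        match = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."  # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # any amino acid except those listed
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        regex += ("^" if start_anchor else "") + residue + ("$" if end_anchor else "")
    return regex


p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
print(p_loop)  # [AG].{4}GK[ST]
hit = re.search(p_loop, "MTEYKLVVVGAGGVGKSALTIQ")
print(hit.group(0) if hit else "no match")  # GAGGVGKS
```

Full PROSITE syntax has details this sketch rejects, such as a '>' inside square brackets.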

Bioverse database and webserver http://bioverse.compbio.washington.edu

Bioverse webserver snapshot

Bioverse dataflow

Take home message: Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution.
Acknowledgements: Jason McDermott; group members; Levitt and Moult groups
