Modelling the rice proteome Ram Samudrala University of Washington
What is a “proteome”?
All proteins of a particular system (organelle, cell, organism)
What does it mean to “model a proteome”?
ANNOTATION: for any protein, we wish to
- figure out what it looks like (structure or form)
- understand what it does (function)
Repeat for all proteins in a system
EXPRESSION + INTERACTION: understand the relationships between all of them
Why should we model proteomes?
Intellectual challenge: because it’s there!
Pragmatic reasons:
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
De novo prediction of protein structure
sample conformational space such that native-like conformations are found
- astronomically large number of conformations: 5 states per residue over 100 residues gives 5^100 ≈ 10^70
select native-like conformations from those sampled
- hard to design functions that are not fooled by non-native conformations (“decoys”)
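The size estimate on this slide can be checked with a few lines of arithmetic. The sketch below takes the slide's own numbers (5 discrete states per residue, a 100-residue chain) and assumes each residue takes a state independently; the variable names are illustrative only.

```python
import math

STATES_PER_RESIDUE = 5   # from the slide
CHAIN_LENGTH = 100       # residues, from the slide

# Number of conformations if every residue independently takes one state.
n_conformations = STATES_PER_RESIDUE ** CHAIN_LENGTH

# Order of magnitude: log10(5^100) = 100 * log10(5).
log10_conformations = CHAIN_LENGTH * math.log10(STATES_PER_RESIDUE)

print(f"5^100 has {len(str(n_conformations))} decimal digits")
print(f"log10(5^100) = {log10_conformations:.2f}")
```

The exponent comes out just under 70, which is why the slide rounds the count to 10^70.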
Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- generate: fragments from database, 14-state φ,ψ model
- minimise: Monte Carlo with simulated annealing, conformational space annealing, GA
- filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
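To make the "minimise" step concrete, here is a minimal sketch of Monte Carlo with simulated annealing over a chain of discrete states. It is not the method used for the predictions in this talk. The energy function `toy_energy`, the single-residue move, the cooling schedule, and all parameter values are assumptions chosen only to keep the example short, and the 14 states are treated as bare indices rather than real φ,ψ angle pairs.

```python
import math
import random

N_STATES = 14  # number of discrete states per residue, as on the slide


def toy_energy(conformation):
    """Hypothetical scoring function: -1 for each pair of neighbours in the same state."""
    return -sum(1 for a, b in zip(conformation, conformation[1:]) if a == b)


def anneal(n_residues, n_steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Return (best_conformation, best_energy) found by simulated annealing."""
    rng = random.Random(seed)
    current = [rng.randrange(N_STATES) for _ in range(n_residues)]
    current_e = toy_energy(current)
    best, best_e = list(current), current_e
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        # Move: reassign the state of one randomly chosen residue.
        i = rng.randrange(n_residues)
        old_state = current[i]
        current[i] = rng.randrange(N_STATES)
        new_e = toy_energy(current)
        delta = new_e - current_e
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[i] = old_state  # reject the move
    return best, best_e


if __name__ == "__main__":
    conformation, energy = anneal(n_residues=20)
    print(conformation, energy)
```

A fragment-based sampler would replace the single-residue move with the insertion of a multi-residue fragment drawn from a database, and the toy energy with the filters listed on the slide.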
CASP5 prediction for T129 5.8 Å Cα RMSD for 68 residues
CASP5 prediction for T138 4.6 Å Cα RMSD for 84 residues
CASP5 prediction for T146 5.6 Å Cα RMSD for 67 residues
CASP5 prediction for T170 4.8 Å Cα RMSD for all 69 residues
CASP5 prediction for T172 5.9 Å Cα RMSD for 74 residues
CASP5 prediction for T187 5.1 Å Cα RMSD for 66 residues
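The accuracy figures on these CASP5 slides are Cα RMSD values: the root-mean-square distance between corresponding Cα atoms of the model and the experimental structure. The slides do not say how the structures were superimposed, so the sketch below assumes the two coordinate sets are already optimally superimposed and simply evaluates the formula. The function name `ca_rmsd` and the coordinates in the usage example are made up for illustration.

```python
import math


def ca_rmsd(model, native):
    """RMSD in the input units between two equal-length lists of (x, y, z) Calpha positions.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_n in zip(model, native)
        for a, b in zip(atom_m, atom_n)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Made-up coordinates: the model is the native shifted by 3 units along x.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(x + 3.0, y, z) for x, y, z in native]
    print(ca_rmsd(model, native))
```

In practice the RMSD is reported after a least-squares superposition of the two structures, which this sketch leaves out.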
Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
- scan
- align
- build initial model: minimum perturbation
- construct non-conserved side chains and main chains: graph theory, semfold
- refine: physical functions
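The asterisks under the alignment on this slide mark identical residues, and the "% id" values on the following slides are sequence identities of this kind. As a small sketch, the code below counts identities for the two aligned sequences shown on the slide. The function name `count_identities` is hypothetical, and dividing by the number of ungapped columns is only one of several common conventions for percent identity.

```python
def count_identities(aligned_a, aligned_b, gap="-"):
    """Return (identical_columns, ungapped_columns) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    ungapped = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        ungapped += 1
        if x == y:
            identical += 1
    return identical, ungapped


# The two aligned sequences from the slide.
seq_a = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_b = "KDPPAGIGAPQDN----QNIMLWNAVIP"

identical, ungapped = count_identities(seq_a, seq_b)
percent_identity = 100.0 * identical / ungapped
print(identical, ungapped, round(percent_identity, 1))
```

For this pair the count is 11 identical positions out of 24 ungapped columns, which agrees with the 11 asterisks on the slide.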
CASP5 prediction for T129 1.0 Å Cα RMSD for 133 residues (57% id)
CASP5 prediction for T182 1.0 Å Cα RMSD for 249 residues (41% id)
CASP5 prediction for T150 2.7 Å Cα RMSD for 99 residues (32% id)
CASP5 prediction for T185 6.0 Å Cα RMSD for 428 residues (24% id)
CASP5 prediction for T160 2.5 Å Cα RMSD for 125 residues (22% id)
CASP5 prediction for T133 6.0 Å Cα RMSD for 260 residues (14% id)
Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection
F. analysis
(Figure idea by Steve Brenner.)
Computational aspects of functional genomics
G. assign function: assign function to entire protein space
structure based methods:
- microenvironment analysis (zinc binding site?)
- structure comparison (homology; function?)
sequence based methods:
- sequence comparison
- motif searches
- phylogenetic profiles
- domain fusion analyses
experimental data:
- single molecule
- genomic/proteomic
Bioverse – explore relationships among molecules and systems http://bioverse.compbio.washington.edu Jason McDermott
Bioverse – explore relationships among molecules and systems Jason McDermott
Bioverse – prediction of protein interaction networks
Target proteome: protein A, protein B
Interacting protein database: protein α, protein β
- protein A matches protein α at 85%; protein B matches protein β at 90%
- experimentally determined interaction between α and β gives a predicted interaction between A and B
Assign confidence based on similarity and strength of interaction
Jason McDermott
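The transfer of interactions described on this slide can be sketched as follows. This is not the Bioverse implementation: the slide says only that confidence depends on similarity and on the strength of the known interaction, so the scoring rule here (product of the two similarities and the interaction strength) is an assumption, as are the function name `predict_interactions`, the data layout, and the strength value of 1.0.

```python
from itertools import combinations


def predict_interactions(similar_to, known):
    """Transfer known interactions to pairs of target proteins.

    similar_to: target protein -> list of (database protein, similarity in [0, 1])
    known:      frozenset of two database proteins -> interaction strength in [0, 1]
    Returns:    (target_1, target_2) -> confidence score (hypothetical scoring rule)
    """
    predictions = {}
    for a, b in combinations(sorted(similar_to), 2):
        for hom_a, sim_a in similar_to[a]:
            for hom_b, sim_b in similar_to[b]:
                strength = known.get(frozenset((hom_a, hom_b)))
                if strength is None:
                    continue  # the two database proteins are not known to interact
                # Assumed rule: multiply both similarities by the strength.
                score = sim_a * sim_b * strength
                predictions[(a, b)] = max(score, predictions.get((a, b), 0.0))
    return predictions


# The example from the slide: A resembles alpha (85%), B resembles beta (90%),
# and alpha-beta is an experimentally determined interaction.
similar_to = {"A": [("alpha", 0.85)], "B": [("beta", 0.90)]}
known = {frozenset(("alpha", "beta")): 1.0}  # strength 1.0 is an assumption

predictions = predict_interactions(similar_to, known)
print(predictions)
```

Taking the maximum over all supporting database pairs is one possible design choice; combining evidence from several pairs would be another.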
Bioverse – mapping pathways on the rice predicted network Defense-related proteins Jason McDermott
Bioverse – mapping pathways on the rice predicted network Tryptophan biosynthesis Jason McDermott
Bioverse – network-based annotation Jason McDermott
Bioverse – interactive network viewer Jason McDermott
Take home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution
Acknowledgements
Aaron Chang, Ashley Lam, Ekachai Jenwitheesuk, Gong Cheng, Jason McDermott, Kai Wang, Ling-Hong Hung, Lynne Townsend, Marissa LaMadrid, Mike Inouye, Stewart Moughon, Shing-Chung Ngan, Yi-Ling Cheng, Zach Frazier