Modelling Proteomes
Ram Samudrala, University of Washington
Rationale for understanding protein structure and function
Protein sequence: large numbers of sequences, including whole genomes
→ structure determination, structure prediction →
Protein structure: three-dimensional, complicated, mediates function
→ homology, rational mutagenesis, biochemical analysis, model studies →
Protein function: rational drug design and treatment of disease; protein and genetic engineering; building networks to model cellular pathways; studying organismal function and evolution
Protein folding
DNA → protein sequence → unfolded protein → native state
Codons: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-…
Protein sequence: …-L-K-E-G-V-S-K-D-… (each letter is one amino acid)
Unfolded protein: not unique, mobile, inactive, expanded, irregular
Native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
Going from the unfolded protein to the native state is spontaneous self-organisation (~1 second)
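The first step on this slide, codons to protein sequence, can be sketched as a table lookup under the standard genetic code. This is an illustrative sketch; `translate` is a hypothetical helper name. Read this way, the codons shown give L-K-E-G-V-S-K-V. The last codon, GUU, encodes valine. The D (aspartate) in the protein sequence would be encoded by GAU or GAC.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position
# (first base slowest, third base fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate a dash-separated string of RNA codons into one-letter amino acids."""
    codons = [c for c in mrna.split("-") if c]
    return "-".join(CODON_TABLE[c] for c in codons)

print(translate("CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU"))  # L-K-E-G-V-S-K-V
print(translate("GAU"), translate("GAC"))            # D D
```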
De novo prediction of protein structure
Sample: search conformational space such that native-like conformations are found. The space is astronomically large: with 5 states per residue, a 100-residue protein has $5^{100} \approx 10^{70}$ conformations.
Select: design scoring functions that are not fooled by non-native conformations (“decoys”); such functions are hard to design.
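The size of the search space quoted on this slide can be checked directly. The 5 states per residue and 100 residues are the slide's own illustrative numbers, and this short sketch only does the arithmetic.

```python
import math

states_per_residue = 5
residues = 100

n_conformations = states_per_residue ** residues       # exact integer (arbitrary precision)
exponent = residues * math.log10(states_per_residue)   # log10 of the count

print(f"{n_conformations:.2e}")  # 7.89e+69
print(round(exponent, 2))        # 69.9, i.e. roughly 10^70
```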
Semi-exhaustive segment-based folding
Sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
Generate: fragments from a database; 14-state φ,ψ model; …
Filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
Minimise: Monte Carlo with simulated annealing, conformational space annealing, GA (genetic algorithms), …
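The "minimise" step can be illustrated with Monte Carlo plus simulated annealing on a toy version of a 14-state φ,ψ model. In this minimal sketch, each residue takes one of 14 discrete states. The energy tables are random placeholders standing in for the all-atom function, and fragment generation and filtering are omitted. All names here (`single`, `pair`, `anneal`) are hypothetical.

```python
import math
import random

# Hypothetical toy model: each residue adopts one of N_STATES discrete (phi, psi) states.
# The energies are random placeholders, NOT a real scoring function.
N_STATES = 14
N_RESIDUES = 20

rng = random.Random(0)
single = [[rng.uniform(-1, 1) for _ in range(N_STATES)] for _ in range(N_RESIDUES)]
pair = [[rng.uniform(-0.5, 0.5) for _ in range(N_STATES)] for _ in range(N_STATES)]

def energy(conf):
    """Per-residue state energies plus a nearest-neighbour coupling term."""
    e = sum(single[i][s] for i, s in enumerate(conf))
    e += sum(pair[a][b] for a, b in zip(conf, conf[1:]))
    return e

def anneal(steps=20000, t_start=2.0, t_end=0.01, seed=1):
    """Metropolis Monte Carlo with geometric cooling; returns the best conformation seen."""
    rnd = random.Random(seed)
    conf = [rnd.randrange(N_STATES) for _ in range(N_RESIDUES)]
    e = energy(conf)
    best, best_e = conf[:], e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # temperature schedule
        i = rnd.randrange(N_RESIDUES)
        old = conf[i]
        conf[i] = rnd.randrange(N_STATES)                    # propose a new state for one residue
        new_e = energy(conf)
        if new_e <= e or rnd.random() < math.exp((e - new_e) / t):
            e = new_e                                        # accept the move
            if e < best_e:
                best, best_e = conf[:], e
        else:
            conf[i] = old                                    # reject: restore previous state
    return best, best_e

best_conf, best_e = anneal()
print(len(best_conf), round(best_e, 3))
```

Recomputing the full energy at every step keeps the sketch simple. A real implementation would update only the terms touched by the changed residue.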
Decomposition of the all-atom function using ICA (independent component analysis: blind separation of sources by maximising the statistical independence across various channels)
[Figure: four panels, each showing energy as a function of distance (Å) for atom type 1 and atom type 2: disulphide bridges; main-chain hydrogen bonding; salt bridges; side-chain to main-chain hydrogen bonding.]
Shing-Chung Ngan
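The idea behind ICA, recovering statistically independent sources from mixed channels, can be illustrated with a minimal FastICA sketch in NumPy. This is a generic textbook variant using deflation and a tanh contrast, applied to synthetic signals. It is not the decomposition used for the all-atom function, and the sources and mixing matrix are made up.

```python
import numpy as np

def fast_ica(x, n_components, n_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA with a tanh nonlinearity.

    x: array of shape (n_channels, n_samples). Returns estimated sources, one per row.
    """
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)              # centre each channel
    d, e = np.linalg.eigh(np.cov(x))
    z = e @ np.diag(1.0 / np.sqrt(d)) @ e.T @ x        # whiten: identity covariance
    w_all = np.zeros((n_components, z.shape[0]))
    for k in range(n_components):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(w @ z)
            w_new = (z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w   # fixed-point update
            w_new -= w_all[:k].T @ (w_all[:k] @ w_new)                 # remove found directions
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        w_all[k] = w
    return w_all @ z

t = np.linspace(0, 8, 2000)
s = np.vstack([np.sin(3 * t), np.sign(np.sin(5 * t))])   # two independent toy sources
a = np.array([[1.0, 0.6], [0.4, 1.0]])                   # hypothetical mixing matrix
recovered = fast_ica(a @ s, n_components=2)
corr = np.abs(np.corrcoef(np.vstack([s, recovered]))[:2, 2:])
print(corr.max(axis=1).round(3))   # each true source matched by one recovered component
```

ICA recovers sources only up to order, sign and scale, so the check uses absolute correlations.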
Ab initio prediction at CASP
Before CASP (BC): “solved” (biased results)
CASP1: worse than random
CASP2: worse than random, with one exception
CASP3: consistently predicted correct topology, ~6.0 Å for 60+ residues
CASP4: consistently predicted correct topology, ~4.0-6.0 Å for 60-80+ residues
**T97/er29: 6.0 Å (80 residues; 18-97)
*T98/sp0a: 6.0 Å (60 residues; 37-105)
**T102/as48: 5.3 Å (70 residues; 1-70)
**T106/sfrp3: 6.2 Å (70 residues; 6-75)
**T110/rbfa: 4.0 Å (80 residues; 1-80)
*T114/afp1: 6.5 Å (45 residues; 36-80)
Prediction for CASP4 target T110/rbfa: Cα RMSD of 4.0 Å for 80 residues (1-80)
Prediction for CASP4 target T97/er29: Cα RMSD of 6.2 Å for 80 residues (18-97)
Prediction for CASP4 target T106/sfrp3: Cα RMSD of 6.2 Å for 70 residues (6-75)
Prediction for CASP4 target T98/sp0a: Cα RMSD of 6.0 Å for 60 residues (37-105)
Prediction for CASP4 target T126/omp: Cα RMSD of 6.5 Å for 60 residues (87-146)
Prediction for CASP4 target T114/afp1: Cα RMSD of 6.5 Å for 45 residues (36-80)
Postdiction for CASP4 target T102/as48: Cα RMSD of 5.3 Å for 70 residues (1-70)
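The Cα RMSD values on these slides measure the average deviation of predicted Cα positions from the experimental structure after the two are optimally superimposed. This minimal sketch uses the Kabsch algorithm and assumes both structures are given as matched (N, 3) coordinate arrays over the same residue range. CASP's own evaluation tools may differ in detail, and the coordinates below are random placeholders.

```python
import numpy as np

def ca_rmsd(p, q):
    """RMSD between matched (N, 3) C-alpha coordinate arrays after optimal superposition (Kabsch)."""
    p = p - p.mean(axis=0)                     # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                                # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # optimal rotation taking p onto q
    diff = (r @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
model = rng.normal(size=(80, 3)) * 10.0        # hypothetical coordinates (Å)
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
native = model @ rot.T + np.array([5.0, -3.0, 2.0])   # same shape, rotated and shifted
print(round(ca_rmsd(model, native), 6))                 # 0.0: superposition removes rigid motion
```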
CASP5?
Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
…
Steps: scan; align; build initial model (minimum perturbation); construct non-conserved side chains and main chains (graph theory, semfold; de novo simulation); refine (physical functions)
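The asterisk line under the alignment marks identical aligned positions, and sequence identity (as used in the CASP4 results) is derived from such alignments. This is a minimal sketch using the two aligned sequences from the slide. Percent identity has several conventions; this one assumes identical positions divided by gap-free aligned columns. The helper names are hypothetical.

```python
def identity_line(a: str, b: str) -> str:
    """Mark identical aligned positions with '*' (gap characters never count as identical)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b))

def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

seq_a = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_b = "KDPPAGIGAPQDN----QNIMLWNAVIP"
print(seq_a)
print(seq_b)
print(identity_line(seq_a, seq_b))
print(round(percent_identity(seq_a, seq_b), 1))  # 45.8 (11 identical of 24 gap-free columns)
```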
A graph-theoretic representation of protein structure
Residues as nodes: I, V1, V2, F, K
Weigh nodes: I -0.5, V1 -0.6, V2 -0.9, F -1.0, K -0.7
Construct graph: edge weights shown include -0.3, -0.4, -0.2, -0.1
Find cliques: clique of I, V2, F and K, W = -4.5
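The clique step can be illustrated on a toy graph. The node weights come from the slide, but the full edge set is not recoverable, so the edge weights below are hypothetical. This sketch also assumes that V1 and V2 are alternative conformations of one residue: they share no edge and so cannot appear in the same clique. It finds the lowest-weight clique by exhaustive enumeration, which is feasible only for tiny graphs and is not the clique-finding algorithm of the original work.

```python
from itertools import combinations

# Node weights (energies) from the slide; more negative is better.
nodes = {"I": -0.5, "V1": -0.6, "V2": -0.9, "F": -1.0, "K": -0.7}

# Hypothetical edge weights. V1-V2 is deliberately absent (assumed mutually exclusive).
edges = {
    frozenset(("I", "F")): -0.3, frozenset(("I", "K")): -0.1,
    frozenset(("I", "V1")): -0.2, frozenset(("I", "V2")): -0.2,
    frozenset(("F", "K")): -0.4, frozenset(("F", "V1")): -0.1,
    frozenset(("F", "V2")): -0.2, frozenset(("K", "V1")): -0.1,
    frozenset(("K", "V2")): -0.1,
}

def is_clique(subset):
    """True if every pair of nodes in the subset is connected by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))

def weight(subset):
    """Clique weight W: node weights plus the weights of all edges inside the clique."""
    return (sum(nodes[n] for n in subset)
            + sum(edges[frozenset(p)] for p in combinations(subset, 2)))

def best_clique():
    """Exhaustively search all subsets for the clique with the lowest total weight."""
    candidates = (
        s for r in range(1, len(nodes) + 1)
        for s in combinations(sorted(nodes), r) if is_clique(s)
    )
    return min(candidates, key=weight)

clique = best_clique()
print(sorted(clique), round(weight(clique), 2))  # ['F', 'I', 'K', 'V2'] -4.4
```

The result differs from the slide's W = -4.5 only because the edge weights here are invented.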
Comparative modelling at CASP
Progress by round (alignment; side chains; short loops; longer loops):
BC: excellent; ~80%; 1.0 Å; 2.0 Å
CASP1: poor; ~50%; > 5.0 Å
CASP3 and CASP4: fair; ~75%; loop values shown are ~1.0, ~2.0, ~2.5 and ~3.0 Å
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50% to 10% sequence identity
**T128/sodm: 1.0 Å (198 residues; 50%)
**T111/eno: 1.7 Å (430 residues; 51%)
**T122/trpa: 2.9 Å (241 residues; 33%)
**T125/sp18: 4.4 Å (137 residues; 24%)
**T112/dhso: 4.9 Å (348 residues; 24%)
**T92/yeco: 5.6 Å (104 residues; 12%)
Prediction for CASP4 target T128/sodm: Cα RMSD of 1.0 Å for 198 residues (sequence identity 50%)
Prediction for CASP4 target T111/eno: Cα RMSD of 1.7 Å for 430 residues (sequence identity 51%)
Prediction for CASP4 target T122/trpa: Cα RMSD of 2.9 Å for 241 residues (sequence identity 33%)
Prediction for CASP4 target T125/sp18: Cα RMSD of 4.4 Å for 137 residues (sequence identity 24%)
Prediction for CASP4 target T112/dhso: Cα RMSD of 4.9 Å for 348 residues (sequence identity 24%)
Prediction for CASP4 target T92/yeco: Cα RMSD of 5.6 Å for 104 residues (sequence identity 12%)
CASP5?
Protein structure from combining theory and experiment Ling-Hong Hung
Prediction for InvB from Salmonella typhimurium
Prediction of HIV-1 protease-inhibitor binding energies with MD (molecular dynamics)
[Figure: correlation coefficient (0.5 to 1.0) with MD and without MD, plotted against MD simulation time (0 to 1.0 ps).]
Ekachai Jenwitheesuk
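The y-axis of this figure is a correlation coefficient. This sketch assumes it is the usual Pearson r between predicted and experimentally measured binding energies; the slide does not state which statistic or data were used. The numbers below are hypothetical, chosen only to show the computation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values for illustration only (not data from the slide).
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 4.0, 6.0, 8.0]
print(round(pearson_r(predicted, measured), 6))  # 1.0 (perfect linear agreement)
```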
Bioverse – explore relationships among molecules and systems http://bioverse.compbio.washington.edu Jason McDermott
Bioverse – human protein-protein interaction network (Jason McDermott, Zach Frazier)
Bioverse – Salmonella protein-protein interaction network (Jason McDermott, Zach Frazier)
Bioverse – human protein-protein similarity network (Jason McDermott, Zach Frazier)
Take home message
Prediction of protein structure and function can be used to model whole proteomes to understand organismal function and evolution.
Acknowledgements
Group members: Ekachai Jenwitheesuk, Jason McDermott, Ling-Hong Hung, Shing-Chung Ngan, Yi-Ling Chen, Zach Frazier
Levitt and Moult groups