Presentation is loading. Please wait.

Presentation is loading. Please wait.

How does the genome of an organism

Similar presentations


Presentation on theme: "How does the genome of an organism"— Presentation transcript:

1 How does the genome of an organism
Modelling proteomes Conjoint 548A Ram Samudrala How does the genome of an organism specify its behaviour and characteristics?

2 Proteome – all proteins of a particular system
~60,000 in human ~60,000 in rice ~4500 in bacteria like Salmonella and E. coli Several thousand distinct sequence families

3 Modelling proteomes – understand the structure of individual proteins
A few thousand distinct structural folds

4 Modelling proteomes – understand their individual functions
Thousands of possible functions

5 Modelling proteomes – understand their expression
Different expression patterns based on time and location

6 Modelling proteomes – understand their interactions
Interactions and expression patterns are interdependent with structure and function

7 Protein folding Gene Protein sequence Unfolded protein
…-CTA-AAA-GAA-GGT-GTT-AGC-AAG-GTT-… Protein sequence …-L-K-E-G-V-S-K-D-… one amino acid not unique mobile inactive expanded irregular Unfolded protein Native biologically relevant state spontaneous self-organisation (~1 second)

8 Protein folding Gene Protein sequence Unfolded protein
…-CTA-AAA-GAA-GGT-GTT-AGC-AAG-GTT-… Protein sequence …-L-K-E-G-V-S-K-D-… one amino acid not unique mobile inactive expanded irregular Unfolded protein Native biologically relevant state spontaneous self-organisation (~1 second) unique shape precisely ordered stable/functional globular/compact helices and sheets

9 Protein primary structure
twenty types of amino acids R H C OH O N two amino acids join by forming a peptide bond R H C O N OH R H C O N c f y each residue in the amino acid main chain has two degrees of freedom (f and y) the amino acid side chains can have up to four degrees of freedom (c1-4)

10 Protein secondary structure
b a L f 0 0 y +180 -180 many f,y combinations are not possible b sheet (anti-parallel) N C N C b sheet (parallel) a helix

11 Protein tertiary and quaternary structures
Ribonuclease inhibitor (2bnh) Haemoglobin (1hbh) Hemagglutinin (1hgd)

12 Methods for obtaining structure
Experimental X-ray crystallography NMR spectroscopy Theoretical De novo prediction Homology modelling

13 Computer representation of protein structure
Structures are stored in the protein data bank (PDB), a repository of mostly experimental models based on X-ray crystallographic and NMR studies < Atoms are defined by their Cartesian coordinates: ATOM N GLU ATOM CA GLU ATOM C GLU ATOM O GLU ATOM CB GLU ATOM CG GLU ATOM CD GLU ATOM OE1 GLU ATOM OE2 GLU ATOM N PHE ATOM CA PHE These structures provide the basis for most of theoretical work in protein folding and protein structure prediction

14 Comparison of protein structures
Need ways to determine if two protein structures are related and to compare predicted models to experimental structures Commonly used measure is the root mean square deviation (RMSD) of the Cartesian atoms between two structures after optimal superposition (McLachlan, 1979): Usually use Ca atoms 3.6 Å 2.9 Å NK-lysin (1nkl) Bacteriocin T102/as48 (1e68) T102 best model Other measures include contact maps and torsion angle RMSDs

15 De novo prediction of protein structure
sample conformational space such that native-like conformations are found select hard to design functions that are not fooled by non-native conformations (“decoys”) astronomically large number of conformations 5 states/100 residues = 5100 = 1070

16 Requirements for sampling methods and scoring functions
Sampling methods must produce good decoy sets that are comprehensive and include several native-like structures Scoring function scores must correlate well with RMSD of conformations (the better the score/energy, the lower the RMSD)

17 Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK minimise Find the most protein-like structures generate Make random moves to optimise what is observed in known structures filter all-atom pairwise interactions, bad contacts compactness, secondary structure, consensus of generated conformations

18 Critical Assessment of protein Structure Prediction methods (CASP)
Pre-CASP CASP Bias towards known structures Blind prediction

19 5.0 Å Cα RMSD for all 53 residues
CASP6 prediction (model1) for T0215 5.0 Å Cα RMSD for all 53 residues Ling-Hong Hung/Shing-Chung Ngan

20 4.3 Å Cα RMSD for all 70 residues
CASP6 prediction (model1) for T0281 4.3 Å Cα RMSD for all 70 residues Sheets –no copying- pairing and packing Ling-Hong Hung/Shing-Chung Ngan

21 Homologous proteins share similar structures
Gan et al, Biophysical Journal 83: , 2002

22 Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP KDPPAGIGAPQDN----QNIMLWNAVIP ** * * * * * * * ** scan align de novo simulation build initial model minimum perturbation construct non-conserved side chains and main chains graph theory, semfold refine physical functions

23 1.3 Å Cα RMSD for all 137 residues (80% ID)
CASP6 prediction (model1) for T0231 1.3 Å Cα RMSD for all 137 residues (80% ID) Tianyun Liu

24 2.4 Å Cα RMSD for all 142 residues (46% ID)
CASP6 prediction (model1) for T0271 2.4 Å Cα RMSD for all 142 residues (46% ID) Tianyun Liu

25 Protein structure from combining theory and experiment
Ling-Hong Hung

26 Similar global sequence or structure does not imply similar function
TIM barrel proteins 2246 with known structure hydrolase ligase lyase oxidoreductase transferase

27 Function prediction from structure
Kai Wang

28 Prediction of HIV-1 protease-inhibitor binding energies with MD
Can predict resistance/susceptibility to six FDA approved inhibitors with 95% accuracy in conjunction with knowledge-based methods Ekachai Jenwitheesuk

29 Prediction of protein inhibitors
Ekachai Jenwitheesuk

30 Prediction of protein interaction networks
Target proteome protein A 85% predicted interaction protein B 90% Interacting protein database protein a protein b experimentally determined interaction Assign confidence based on similarity and strength of interaction Key paradigm is the use of homology to transfer information across organisms; not limited to yeast, fly, and worm Consensus of interactions helps with confidence assignments Jason McDermott

31 E. coli predicted protein interaction network
Jason McDermott

32 M. tuberculosis predicted protein interaction network
Jason McDermott

33 C. elegans predicted protein interaction network
Jason McDermott

34 H. sapiens predicted protein interaction network
Jason McDermott

35 Bioverse – v2.0 Michal Guerquin/Zach Frazier

36 Network-based annotation for C. elegans
Jason McDermott

37 Identifying key proteins on the anthrax predicted network
Articulation point proteins Jason McDermott

38 Identification of virulence factors
Jason McDermott

39 Bioverse - Integrator Aaron Chang/Imran Rashid

40 + + Computational Functional Structural biology genomics genomics
Where is all this going? Computational biology + Structural genomics Functional genomics + Take home message Prediction of protein structure, function, and networks may be used to model whole genomes to understand organismal function and evolution

41 Current group members: Past group members:
Acknowledgements Andrew Nichols Baishali Chanda Chuck Mader David Nickle Duangdao Wichadakul Duncan Milburn Ersin Emre Oren Ekachai Jenwitheesuk Gong Cheng Jason McDermott Jeremy Horst Kai Wang Ling-Hong Hung Michal Guerquin Shing-Chung Ngan Somsak Phattarasukol Stewart Moughon Tianyun Liu Weerayuth Kittichotirat Zach Frazier Kristina Montgomery, Program Manager Current group members: Aaron Chang Marissa LaMadrid Mike Inouye Sarunya Suebtragoon Past group members: Funding agencies: National Institutes of Health National Science Foundation Searle Scholars Program Puget Sound Partners in Global Health UW Advanced Technology Initiative


Download ppt "How does the genome of an organism"

Similar presentations


Ads by Google