Published by Scarlett Carson. Modified over 9 years ago.
1
Protein Structure Prediction
Graham Wood, Charlotte Deane
2
The problem - in brief
[Figure: sequence MVLSEGEWQL VLHVWAKVEA DVAGHGQDIL … AKYKELCYOG + databases, algorithms and software = structure (image)]
3
Why is protein structure prediction needed?
Essential functioning of cells is mediated by proteins
It is protein structure that leads to protein function
3D structure determination (by X-ray crystallography or NMR) is expensive, slow and difficult
Structure prediction assists in the engineering of new proteins
4
Terminology
Target - the unknown structure you are trying to model
Parent - a known structure which provides a basis for modelling
5
The problem - more detail
[Figure: target sequence EKGPDLYLIPLT approached in two ways. Labels: Configuration space, Energy, Protein databases, Biologist, Physicist]
6
CASP: Critical Assessment of Structure Prediction
[Figure: timeline running Jan-Apr, May, Jun, Jul, Aug, Sept, Oct, Nov, Dec, with rows for Biologists, Caspers and Organisers]
Steps shown on the timeline:
Call for structures
Publish sequences on web
Give sequences to organisers
Structure determination
Give structures to organisers
Predict structure from sequence
Expert assessment
4 day meeting
7
Degree of evolutionary conservation
From less conserved (information poor) to more conserved (information rich):
DNA sequence, protein sequence, structure, function
Example DNA sequence: ACAGTTACAC CGGCTATGTA CTATACTTTG
Example protein sequence: HDSFKLPVMS KFDWEMFKPC GKFLDSGKLG
8
Three main approaches (in order of current success)
1. Comparative modelling
2. Fold recognition
3. De novo
9
Comparative modelling
[Figure: target sequence EKGPDLYLIPLT and its close homologues. Labels: Conserved backbone, Variable backbone, Side chains, Energy]
10
Comparative modelling (protein building)
1. Prepare the raw materials
2. Build the model (two methods)
3. Check the model
4. Accept or reject the model
11
C1: Preparing the raw materials
Given target AA sequence (EKGPDLYLIPLT)
Identify parents (homologues)
Structurally align parents
Align target to parents
12
Structurally conserved regions and structurally variable regions
SCR (structurally conserved region): secondary structure region
SVR (structurally variable region): loop region
13
C2: Building (choice of two methods)
Assemble fragments: determine SCRs and build associated backbone; determine SVRs and build rest of backbone
Use spatial restraints
Attach and orient side-chains
Refine model
15
C2: Building (choice of two methods)
Assemble fragments: determine SCRs and build associated backbone; determine SVRs and build rest of backbone
Use spatial restraints: optimally satisfy spatial restraints
Orient side-chains
Refine model
16
[Figure: residues D T N V A Y C N K D]
17
C3: Test model (C4: then accept or reject)
Examine the model in the light of all experimental data
Tools: PROCHECK, VERIFY3D, PROSA II, JOY, and visual inspection using 3D software
18
Problems in comparative modelling
Aligning the target to the parents
The packing of secondary structure elements in the core
The long insertions and deletions in the structurally variable regions
19
Fold recognition
[Figure: the target, marked with a question mark]
20
Fold recognition
[Figure: target sequence EKGPDLYLIPLT and structurally similar proteins. Label: Energy]
21
Fold recognition (protein finding)
1. Obtain library of non-duplicate folds
2. Perform sequence-structure alignment
3. Assess success of alignment
–Biologist: use substitution matrix
–Physicist: use potentials
4. Accept or reject the model
22
Sequence-structure alignment
1. Construct sequence profile
2. Use profile to score the sequence
[Figure labels: Target, Parent, BLASTP, OWL, MULTAL, Dynamic programming algorithm, Score]
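The dynamic programming step named on this slide can be illustrated with a minimal sketch. It is not the method used by any server in the talk: it swaps the sequence profile for a plain match/mismatch score and uses a linear gap penalty (Needleman-Wunsch global alignment). The function name and the scoring values are assumptions chosen for illustration.

```python
def global_alignment_score(target, parent, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for aligning two sequences end to end.

    Assumption: a flat match/mismatch score stands in for the
    profile-based score described on the slide.
    """
    rows, cols = len(target) + 1, len(parent) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align the two residues,
    # or put a gap in one sequence or the other.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if target[i - 1] == parent[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


# The target sequence from the slides aligned against itself.
print(global_alignment_score("EKGPDLYLIPLT", "EKGPDLYLIPLT"))
```

Replacing the `pair` term with a position-specific profile score gives the profile-to-sequence scoring the slide describes; the recurrence itself stays the same.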
23
Amino acid substitutions are constrained by local environments
Different substitution patterns
Environment-specific substitution tables
24
Definition of local environments
Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ)
Solvent accessibility (accessible and inaccessible)
Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)
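To see how many distinct local environments these three features define, the sketch below enumerates every combination. It assumes that each of the three hydrogen-bond types is simply present or absent, independently of the others; the slide does not state this, and the names used are illustrative.

```python
from itertools import product

MAIN_CHAIN = ["alpha-helix", "beta-strand", "coil", "positive phi"]
SOLVENT = ["accessible", "inaccessible"]
HBOND_TYPES = [
    "side-chain to main-chain NH",
    "side-chain to main-chain CO",
    "side-chain to side-chain",
]


def local_environments():
    """Return every (main chain, solvent, hydrogen-bond pattern) combination.

    Assumption: each hydrogen-bond type is an independent yes/no flag.
    """
    hbond_patterns = list(product([False, True], repeat=len(HBOND_TYPES)))
    return list(product(MAIN_CHAIN, SOLVENT, hbond_patterns))


environments = local_environments()
print(len(environments))  # 4 * 2 * 2**3 combinations
```

Under that assumption there are 4 × 2 × 8 = 64 environments, so an environment-specific substitution table needs one 20-residue row for each amino acid in each of them.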
25
Substitution scores
[Equations shown as images; their annotations follow]
Frequency of observing amino acid a in environment E replaced by b
Probability that amino acid a in environment E is replaced by amino acid b
Background probability of observing amino acid b (match occurring by chance)
Log odds score scaled to the nearest integer
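The quantities annotated on this slide fit together as a log-odds score. The sketch below is one plausible reading, not a reconstruction of the slide's equation: the log base (2), the `scale` factor, the dictionary layout and the toy counts are all assumptions.

```python
import math


def substitution_score(counts, background, a, env, b, scale=1.0):
    """Log-odds score for amino acid a in environment env replaced by b.

    counts[(a, env)][b] is the observed frequency of the replacement;
    background[b] is the chance probability of observing b.
    Assumption: counts are non-zero (real tables need pseudocounts).
    """
    row = counts[(a, env)]
    total = sum(row.values())
    p_replaced = row[b] / total                 # P(b | a, env)
    log_odds = math.log2(p_replaced / background[b])
    return round(scale * log_odds)              # scaled, nearest integer


# Hypothetical toy data: alanine in a helix, two possible replacements.
counts = {("A", "helix"): {"A": 8, "G": 2}}
background = {"A": 0.25, "G": 0.25}

print(substitution_score(counts, background, "A", "helix", "A"))
print(substitution_score(counts, background, "A", "helix", "G", scale=3.0))
```

A positive score means the replacement is seen more often in that environment than chance predicts; a negative score means it is seen less often.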
26
Scoring with potentials
Energy potential
Solvation potential
27
The Novel Fold Problem
[Figure: sequence asdghklprtwecvmnasety (shown twice) with a question mark]
28
De novo – new fold methods
[Figure: target sequence EKGPDLYLIPLT. Labels: Energy, Segment configurations, Sets of local configurations]
29
Defining a “New Fold”
CATH
–Somewhat objective
SCOP
–No objective definition
–Tends towards evolutionary relationships
Ask A. Murzin
30
New fold approach
All structure information is in the AA sequence (Anfinsen, Science, 1973)
Seek “lowest free energy conformation”
Tactic is to simplify the problem, for example
–Simplified model of protein (one atom per residue)
–Simple or knowledge based potential function
Assist in detecting distant homologues
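As an illustration of how far the problem can be simplified, the sketch below uses the HP lattice model, a standard toy model that is not mentioned in the talk. Each residue is a single point on a 2D grid, labelled hydrophobic (H) or polar (P), and the potential simply rewards H-H contacts. The function name and the example chain are assumptions.

```python
def hp_contact_energy(sequence, coords):
    """Energy of a chain on a 2D lattice under the HP toy model.

    sequence: string of 'H' (hydrophobic) and 'P' (polar), one per residue.
    coords:   list of (x, y) grid positions, one per residue.
    Each pair of H residues that are lattice neighbours but not
    neighbours along the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("chain must not overlap itself")

    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1
    return energy


# Hypothetical four-residue chain: extended versus folded into a square.
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_contact_energy("HPPH", extended), hp_contact_energy("HPPH", folded))
```

The folded square brings the two hydrophobic ends together and so has lower energy than the extended chain, which is the kind of signal a "lowest free energy conformation" search looks for.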
31
New fold recognition (structure discovery)
1. Set up domain and objective function
2. Perform optimisation
3. Check the model
4. Accept or reject the model
32
De Novo (biologist) ROSETTA (Baker et al.)
Domain of objective function
[Figure: windows of 9 residues along the sequence, each with a set of local structures consistent with the local sequence]
33
De Novo (biologist) ROSETTA
Objective function to be maximised
[Equation shown as an image; its annotations read “constant” and “Function of energy”]
34
De Novo (biologist) ROSETTA
Maximising the probability of the sequence
1. Choose each local conformation and start with a fully extended chain
2. Generate a neighbouring conformation
3. Accept in simulated annealing style, using P(structure|sequence)
4. Do this many times and cluster results; use centre of largest cluster as prediction
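Steps 2 and 3 can be sketched as a generic simulated annealing loop with a Metropolis acceptance rule. This is not ROSETTA's implementation: the cooling schedule, the toy scoring function standing in for log P(structure|sequence), and every name below are assumptions, and the clustering of step 4 is left out.

```python
import math
import random


def anneal(initial, neighbour, log_prob, steps=2000,
           t_start=2.0, t_end=0.05, seed=0):
    """Maximise log_prob by simulated annealing; return (best state, score)."""
    rng = random.Random(seed)
    current, current_lp = initial, log_prob(initial)
    best, best_lp = current, current_lp
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = neighbour(current, rng)
        candidate_lp = log_prob(candidate)
        delta = candidate_lp - current_lp
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_lp = candidate, candidate_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best, best_lp


# Toy stand-in: one of three local conformations per position, scored by
# agreement with a hypothetical "native" assignment.
NATIVE = (2, 0, 1, 1, 2, 0, 0, 1)


def toy_log_prob(state):
    return -sum(1 for s, n in zip(state, NATIVE) if s != n)


def toy_neighbour(state, rng):
    changed = list(state)
    changed[rng.randrange(len(state))] = rng.randrange(3)
    return tuple(changed)


start = (0,) * len(NATIVE)
best, best_lp = anneal(start, toy_neighbour, toy_log_prob)
print(best, best_lp)
```

Running the loop many times with different seeds and clustering the resulting `best` states would correspond to step 4.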
36
De Novo (physicist) ASTROFOLD (Floudas et al.)
1. Predict α-helices and β-strands
2. Predict β-sheets and disulphide bridges using ILP (integer linear programming)
3. Use deterministic global optimisation, with energy function and constraints, to predict tertiary structure
37
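Testing of prediction servers - LiveBench
Server | Type | Sensitivity (Easy) | Sensitivity (Hard) | Specificity (All) | Specificity (Hard) | Added Value (Easy) | Added Value (Hard)
Pcons2 | Consensus | 6 | 4 | 2 | 2 | 3 | 3
ShotGun on 5 | Consensus | 1 | 2 | 4 | 4 | 7 | 5
ShotGun on 3 | Consensus | 2 | 1 | 1 | 1 | 2 | 2
Shotgun-INBGU | Threading | 3 | 3 | 3 | 3 | 4 | 1
INBGU | Threading | 7 | 5 | 6 | 9 | 5 | 6
Fugue3 | Threading | 14 | 8 | 9 | 8 | 15 | 9
Fugue2 | Threading | 12 | 7 | 8 | 7 | 10 | 8
Fugue1 | Threading | 17 | 14 | | 11 | 16 | 15
mGenTHREADER | Threading | 8 | 11 | 16 | 13 | 6 | 11
GenTHREADER | Threading | 13 | 12 | 17 | 15 | 8 | 13
3D-PSSM | Threading | 5 | 10 | 12 | | | 10
ORFeus | Sequence | 4 | 6 | 7 | 6 | 1 | 4
FFAS | Sequence | 9 | 9 | 5 | 5 | 9 | 7
Sam-T99 | Sequence | 10 | 15 | 13 | 16 | 11 | 16
Superfamily | Sequence | 15 | 13 | 11 | 10 | 17 | 12
ORF-BLAST | BLAST | 11 | 16 | 10 | 14 | |
PDB-BLAST | BLAST | 16 | 17 | 15 | 17 | 13 | 17
BLAST | | 18 | | | | |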
38
Review - comparative modelling
[Figure: target sequence EKGPDLYLIPLT and its close homologues. Labels: Conserved backbone, Variable backbone, Side chains, Energy]
39
Review - fold recognition
[Figure: target sequence EKGPDLYLIPLT and structurally similar proteins. Label: Energy]
40
Review - new fold methods
[Figure: target sequence EKGPDLYLIPLT. Labels: Energy, Segment configurations, Sets of local configurations]
41
Summary: Prediction Methods
Comparative modelling
–There exists a protein with clear homology
–PSI-BLAST
Fold recognition
–There exists a protein of similar fold (analogy)
–DALI (CATH & SCOP)
Novel Fold methods
–The sequence has a new fold
Better methods needed yet for it all to be useful!