1
From Sequences to Structure
Illustrations from: C Branden and J Tooze, Introduction to Protein Structure, 2nd ed. Garland Pub. ISBN
2
Protein Functions
Mechanoenzymes: myosin, actin
Rhodopsin: allows vision
Globins: transport oxygen
Antibodies: immune system
Enzymes: pepsin, renin, carboxypeptidase A
Receptors: transmembrane signaling
Vitellogenin: molecular velcro
And hundreds of thousands more…
3
Proteins are Chains of Amino Acids
Polymer – a molecule composed of repeating units
4
The Peptide Bond
Dehydration synthesis
Repeating backbone: N–Cα–C–N–Cα–C
Convention: start at the amino terminus and proceed to the carboxy terminus
5
Peptidyl polymers
A few amino acids in a chain are called a polypeptide.
A protein is usually composed of 50 to 400+ amino acids.
Since part of the amino acid is lost during dehydration synthesis, we call the units of a protein amino acid residues.
Figure labels: amide nitrogen, carbonyl carbon
6
Side Chain Properties
Recall that the electronegativity of carbon is at about the middle of the scale for light elements.
Carbon does not make hydrogen bonds with water easily: hydrophobic
O and N are generally more likely than C to h-bond to water: hydrophilic
We group the amino acids into three general groups:
  Hydrophobic
  Charged (positive/basic and negative/acidic)
  Polar
7
The Hydrophobic Amino Acids
Proline severely limits allowable conformations!
8
The Charged Amino Acids
9
The Polar Amino Acids
10
More Polar Amino Acids And then there’s…
11
Planarity of the Peptide Bond
12
OCCBIO 2006 – Fundamental Bioinformatics
Phi and psi
φ = ψ = 180° is the extended conformation
φ: Cα to N–H
ψ: C=O to Cα
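A backbone dihedral such as phi or psi can be computed from four consecutive atom positions. The sketch below is a minimal illustration, not part of the original slides: the function name `dihedral` and the test coordinates are made up, and it assumes the usual atom choices (C of the previous residue, N, Cα, C for phi; N, Cα, C, N of the next residue for psi).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3-D points.

    The angle is measured about the p1 -> p2 axis, in the range (-180, 180].
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # points from p1 back to p0
    b1 = sub(p2, p1)          # rotation axis
    b2 = sub(p3, p2)          # points from p2 on to p3
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Remove the components along the axis, leaving the projections
    # onto the plane perpendicular to p1 -> p2.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a fully extended (trans) arrangement.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

With real coordinates, calling this once per residue for phi and once for psi gives the pair of angles plotted on a Ramachandran plot.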
13
The Ramachandran Plot
Figure panels: Observed (non-glycine), Observed (glycine), Calculated
G. N. Ramachandran: first calculations of sterically allowed regions of phi and psi
Note the structural importance of glycine
14
Primary and Secondary Structure
Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT…
Secondary structure
Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet
The location and direction of these periodic, repeating structures is known as the secondary structure of the protein
15
The alpha Helix 60°
16
Properties of the alpha helix
Hydrogen bonds between C=O of residue n and N–H of residue n+4
3.6 residues/turn
1.5 Å/residue rise
100°/residue turn
Figure label: 60°
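The numbers above are related by simple arithmetic: 360° divided by 3.6 residues per turn gives 100° per residue, and 3.6 residues times 1.5 Å gives a pitch of 5.4 Å per turn. A small sketch of that arithmetic follows; the function and key names are illustrative only.

```python
RESIDUES_PER_TURN = 3.6    # from the slide
RISE_PER_RESIDUE_A = 1.5   # angstroms per residue, from the slide

def helix_geometry(n_residues):
    """Idealized alpha-helix dimensions for a helix of n_residues."""
    return {
        "rotation_per_residue_deg": 360.0 / RESIDUES_PER_TURN,
        "pitch_A": RESIDUES_PER_TURN * RISE_PER_RESIDUE_A,
        "turns": n_residues / RESIDUES_PER_TURN,
        "length_A": n_residues * RISE_PER_RESIDUE_A,
    }

geometry = helix_geometry(18)  # an 18-residue helix spans 5 turns
```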
17
Properties of α-helices
4 to 40+ residues in length
Often amphipathic or “dual-natured”
  Half hydrophobic and half hydrophilic
  Mostly when surface-exposed
If we examine many α-helices, we find trends…
  Helix formers: Ala, Glu, Leu, Met
  Helix breakers: Pro, Gly, Tyr, Ser
18
The beta Strand (and Sheet)
135° +135°
19
Properties of beta sheets
Formed of stretches of 5-10 residues in extended conformation
Pleated: each Cα sits a bit above or below the previous one
Parallel/antiparallel, contiguous/non-contiguous
20
Parallel and anti-parallel β-sheets
Anti-parallel is slightly energetically favored
Figure panels: Anti-parallel, Parallel
21
Turns and Loops
Secondary structure elements are connected by regions of turns and loops
Turns: short regions of non-α, non-β conformation
Loops: larger stretches with no secondary structure
  Often disordered
  “Random coil”
  Sequences vary much more than secondary structure regions
22
Levels of Protein Structure
Secondary structure elements combine to form tertiary structure
Quaternary structure occurs in multienzyme complexes
  Many proteins are active only as homodimers, homotetramers, etc.
23
Disulfide Bonds
Two cysteines in close proximity can form a covalent bond
Called a disulfide bond, disulfide bridge, or dicysteine bond
Significantly stabilizes tertiary structure
24
Protein Structure Examples
25
Determining Protein Structure
There are ~100,000 distinct proteins in the human proteome.
3D structures have been determined for 14,000 proteins, from all organisms
  Includes duplicates with different ligands bound, etc.
Coordinates are determined by X-ray crystallography
26
X-Ray diffraction
Image is averaged over:
  Space (many copies)
  Time (of the diffraction experiment)
27
Electron Density Maps
Resolution is dependent on the quality/regularity of the crystal
R-factor is a measure of “leftover” electron density
Solvent fitting
Refinement
28
The Protein Data Bank http://www.rcsb.org/pdb/
ATOM N ALA E APR 213
ATOM CA ALA E APR 214
ATOM C ALA E APR 215
ATOM O ALA E APR 216
ATOM CB ALA E APR 217
ATOM N GLY E APR 218
ATOM CA GLY E APR 219
ATOM C GLY E APR 220
ATOM O GLY E APR 221
ATOM N VAL E APR 222
ATOM CA VAL E APR 223
ATOM C VAL E APR 224
ATOM O VAL E APR 225
ATOM CB VAL E APR 226
ATOM CG1 VAL E APR 227
ATOM CG2 VAL E APR 228
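ATOM records in a PDB file are fixed-width: each field sits in a set column range rather than being separated by whitespace. The sketch below reads the main fields by column slice, assuming the standard PDB column layout. The function names and the sample coordinates are hypothetical; the coordinate values in the listing above did not survive, so the sample line is built by a helper instead of being copied from it.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM record (identifier and coordinate columns only)."""
    # Atom names shorter than four characters conventionally start in column 14.
    atom_field = name if len(name) == 4 else f" {name:<3}"
    return (
        f"ATOM  {serial:>5} {atom_field} {res_name:>3} {chain}{res_seq:>4}"
        f"    {x:8.3f}{y:8.3f}{z:8.3f}"
    )

def parse_atom_line(line):
    """Extract fields from an ATOM/HETATM record by column position."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }

# Hypothetical values, for illustration only.
sample = format_atom_line(2, "CA", "ALA", "E", 1, 11.0, 22.5, -3.25)
atom = parse_atom_line(sample)
```

Slicing by column rather than calling `split()` matters because adjacent fields can touch with no space between them, for example large coordinates or four-character atom names.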
29
Views of a Protein Wireframe Ball and stick
30
Views of a Protein
Spacefill
Cartoon
CPK colors:
  Carbon = green, black
  Nitrogen = blue
  Oxygen = red
  Sulfur = yellow
  Hydrogen = white
31
The Protein Folding Problem
Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”
Input: AAVIKYGCAL…
Output: φ1ψ1, φ2ψ2, … = backbone conformation (no side chains yet)
32
Forces Driving Protein Folding
It is believed that hydrophobic collapse is a key driving force for protein folding
  Hydrophobic core
  Polar surface interacting with solvent
  Minimum volume (no cavities)
Disulfide bond formation stabilizes
Hydrogen bonds
Polar and electrostatic interactions
33
Folding Help
Proteins are, in fact, only marginally stable
  Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form
Many proteins help in folding
  Protein disulfide isomerase: catalyzes shuffling of disulfide bonds
  Chaperones: break up aggregates and (in theory) unfold misfolded proteins
34
The Hydrophobic Core
Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin.
The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell and do not carry oxygen efficiently.
Sickle cell anemia was the first identified molecular disease.
35
Sickle Cell Anemia Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.
36
Computational Problems in Protein Folding
Two key questions:
Evaluation: how can we tell a correctly folded protein from an incorrectly folded protein?
  H-bonds, electrostatics, hydrophobic effect, etc.
  Derive a function, see how well it does on “real” proteins
Optimization: once we get an evaluation function, can we optimize it?
  Simulated annealing/Monte Carlo
  EC (evolutionary computing)
  Heuristics
37
Fold Optimization
Simple lattice models (HP-models)
  Two types of residues: hydrophobic and polar
  2-D or 3-D lattice
  The only force is hydrophobic collapse
  Score = number of H–H contacts
38
Scoring Lattice Models
H/P model scoring: count noncovalent hydrophobic interactions. Sometimes: Penalize for buried polar or surface hydrophobic residues
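The basic H/P score can be sketched as follows for a 2-D square lattice: count pairs of H residues that sit on neighboring lattice sites but are not consecutive in the chain. The function name and the coordinate convention are assumptions for illustration, and the optional penalty terms mentioned above are left out.

```python
def hp_score(sequence, coords):
    """Count non-covalent H-H contacts for a chain placed on a 2-D lattice.

    sequence: string of 'H' and 'P'
    coords:   list of (x, y) lattice sites, one per residue, in chain order
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same site (bump)")

    index_at = {site: i for i, site in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up visits each neighboring pair exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            # Chain neighbors (|i - j| == 1) are covalently linked, so skip them.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score += 1
    return score

# A four-residue chain folded into a square: the two end H residues touch.
square = hp_score("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
```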
39
What can we do with lattice models?
For smaller polypeptides, exhaustive search can be used
  Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process
For larger chains, other optimization and search methods must be used
  Greedy, branch and bound
  Evolutionary computing, simulated annealing
  Graph theoretical methods
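For very short chains, exhaustive search can be sketched directly: enumerate every string of absolute moves, discard self-intersecting walks, and keep the fold with the most H–H contacts. This is a brute-force illustration on a 2-D square lattice with hypothetical helper names. It tries 4^(n-1) move strings, so it is only practical for about a dozen residues, which is why longer chains need the other methods listed above.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def walk(directions):
    """Lattice coordinates for a string of absolute moves, or None on a bump."""
    x, y = 0, 0
    coords = [(x, y)]
    seen = {(x, y)}
    for d in directions:
        dx, dy = MOVES[d]
        x, y = x + dx, y + dy
        if (x, y) in seen:
            return None
        seen.add((x, y))
        coords.append((x, y))
    return coords

def hh_contacts(sequence, coords):
    """Number of H-H lattice contacts between residues not adjacent in the chain."""
    index_at = {site: i for i, site in enumerate(coords)}
    count = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbor in ((x + 1, y), (x, y + 1)):  # each pair seen once
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                count += 1
    return count

def best_fold(sequence):
    """Try every move string; return (best score, one move string achieving it)."""
    best_score, best_dirs = -1, None
    for dirs in product("UDLR", repeat=max(len(sequence) - 1, 0)):
        coords = walk(dirs)
        if coords is None:
            continue
        score = hh_contacts(sequence, coords)
        if score > best_score:
            best_score, best_dirs = score, "".join(dirs)
    return best_score, best_dirs

score, fold = best_fold("HPPH")
```

Rotated and mirrored folds are counted separately here; fixing the first move would cut the search space without changing the best score.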
40
Learning from Lattice Models
The “hydrophobic zipper” effect: Ken Dill ~ 1997
41
Representing a lattice model
Absolute directions: UURRDLDRRU
Relative directions: LFRFRRLLFFL
  Advantage: an immediate reversal (UD or RL in absolute directions) cannot be encoded
  Only three directions: L, R, F
What about bumps? LFRRR
  Bad score
  Use a better representation
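A relative-direction string can be decoded by tracking a heading and turning it before each step. The sketch below assumes a 2-D square lattice, a start at the origin, and an initial heading of "up"; those conventions and the function names are illustrative choices. It shows that LFRRR does revisit an occupied site.

```python
def decode_relative(moves, heading=(0, 1)):
    """Convert a string of L/F/R moves into lattice coordinates."""
    x, y = 0, 0
    dx, dy = heading
    coords = [(x, y)]
    for move in moves:
        if move == "L":
            dx, dy = -dy, dx      # rotate heading 90 degrees counter-clockwise
        elif move == "R":
            dx, dy = dy, -dx      # rotate heading 90 degrees clockwise
        elif move != "F":
            raise ValueError(f"unknown move {move!r}")
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def first_bump(coords):
    """Indices (earlier, later) of the first repeated site, or None."""
    seen = {}
    for i, site in enumerate(coords):
        if site in seen:
            return seen[site], i
        seen[site] = i
    return None

bump = first_bump(decode_relative("LFRRR"))        # (1, 5): a collision
ok = first_bump(decode_relative("LFRFRRLLFFL"))    # None: self-avoiding
```

Any F followed by three R moves (or three L moves) closes a unit square, so the collision does not depend on the starting heading.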
42
Preference-order representation
Each position has two “preferences”
  If it can’t have either of the two, it will take the “least favorite” path if possible
Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}
Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
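One possible reading of the preference-order scheme can be sketched as a decoder: at each position try the first preference, then the second, then the remaining direction, and report a dead end only if all three target sites are occupied. The exact semantics, the starting heading of "up", and the function names are assumptions, not taken from the slides.

```python
def turn(heading, move):
    """Heading after applying a relative move (L, F or R) on a square lattice."""
    dx, dy = heading
    if move == "L":
        return (-dy, dx)
    if move == "R":
        return (dy, -dx)
    if move == "F":
        return (dx, dy)
    raise ValueError(f"unknown move {move!r}")

def decode_preferences(prefs, heading=(0, 1)):
    """Decode preference pairs into coordinates.

    prefs: iterable of two-character strings such as "LR" (first, second choice)
    Returns (coords, complete); complete is False if the walk hit a dead end.
    """
    pos = (0, 0)
    coords = [pos]
    occupied = {pos}
    for first, second in prefs:
        # The least favorite option is whatever direction was not listed.
        remaining = [m for m in "LFR" if m not in (first, second)]
        for move in (first, second, *remaining):
            new_heading = turn(heading, move)
            nxt = (pos[0] + new_heading[0], pos[1] + new_heading[1])
            if nxt not in occupied:
                heading, pos = new_heading, nxt
                occupied.add(pos)
                coords.append(pos)
                break
        else:
            return coords, False  # every neighboring choice is occupied
    return coords, True

# The first example from the slide, written as two-letter strings.
coords, complete = decode_preferences(["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"])
```

Under these assumptions the seventh position cannot take F or R and falls back to L, and the eighth cannot take R and falls back to F, so the fallbacks steer the chain around occupied sites.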
43
More Realistic Models
Higher resolution lattices (45° lattice, etc.)
Off-lattice models
  Local moves
Optimization/search methods and φ/ψ representations
  Greedy search
  Branch and bound
  EC, Monte Carlo, simulated annealing, etc.
44
The Other Half of the Picture
Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).
Theoretical force field:
  $G = G_{\text{van der Waals}} + G_{\text{H-bonds}} + G_{\text{solvent}} + G_{\text{Coulomb}}$
Empirical force fields
  Start with a database
  Look at neighboring residues: similar to known protein folds?
45
Threading: Fold recognition
Given:
  Sequence: IVACIVSTEYDVMKAAR…
  A database of molecular coordinates
Map the sequence onto each fold
Evaluate
  Objective 1: improve scoring function
  Objective 2: folding
46
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…
47
Secondary Structure Prediction
Easier than folding
Current algorithms can predict secondary structure with 70-80% accuracy
Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13,
  Based on frequencies of occurrence of residues in helices and sheets
PHD: neural network based
  Uses a multiple sequence alignment
  Rost & Sander, Proteins, 1994, 19, 55-72
48
Chou-Fasman Parameters
49
Chou-Fasman Algorithm
Identify α-helices
  Find 4 out of 6 contiguous amino acids that have P(a) > 100
  Extend the region until 4 amino acids with P(a) < 100 are found
  Compute P(a) and P(b); if the region is >5 residues and P(a) > P(b), identify it as a helix
Repeat for β-sheets [use P(b)]
If an α and a β region overlap, the overlapping region is predicted according to P(a) and P(b)
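The nucleation and extension steps for helices can be sketched as below. The parameter table from the slides was not preserved, so the `P_ALPHA` values are the commonly tabulated Chou-Fasman helix propensities, entered here as an assumption. The function names are hypothetical, and the final P(a) versus P(b) comparison is left out because it needs the second table.

```python
# Commonly tabulated Chou-Fasman helix propensities (assumed, not from the slides).
P_ALPHA = {
    "E": 151, "M": 145, "A": 142, "L": 121, "K": 114, "F": 113, "Q": 111,
    "W": 108, "I": 108, "V": 106, "D": 101, "H": 100, "R": 98, "T": 83,
    "S": 77, "C": 70, "Y": 69, "N": 67, "P": 57, "G": 57,
}

def find_nucleus(seq):
    """Index of the first 6-residue window with at least 4 residues of P(a) > 100."""
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        if sum(P_ALPHA[r] > 100 for r in window) >= 4:
            return i
    return None

def is_breaker(window):
    """True for four contiguous residues that all have P(a) < 100."""
    return len(window) == 4 and all(P_ALPHA[r] < 100 for r in window)

def extend(seq, start, end):
    """Grow [start, end) in both directions until a breaker window is adjacent."""
    while end < len(seq) and not is_breaker(seq[end:end + 4]):
        end += 1
    while start > 0 and not is_breaker(seq[max(0, start - 4):start]):
        start -= 1
    return start, end

def helix_candidate(seq):
    """Half-open (start, end) range of the first candidate helix, or None."""
    i = find_nucleus(seq)
    if i is None:
        return None
    return extend(seq, i, i + 6)

seq = "CAENKLDHVRGPTCILFMTWYNDGP"
start, end = helix_candidate(seq)
region = seq[start:end]
```

Near the ends of the sequence a window of fewer than four residues never counts as a breaker in this sketch, so a region can run to the terminus. That edge-case behavior is a design choice here, not something the slides specify.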
50
Chou-Fasman, cont’d
Identify hairpin turns:
  P(t) = f(i) of the residue × f(i+1) of the next residue × f(i+2) of the following residue × f(i+3) of the residue at position (i+3)
  Predict a hairpin turn starting at positions where:
    P(t) exceeds the cutoff value
    The average P(turn) for the four residues > 100
    P(a) < P(turn) > P(b) for the four residues
Accuracy 60-65%
51
Chou-Fasman Example
CAENKLDHVRGPTCILFMTWYNDGP
CAENKL: potential helix (!C and !N)
  Residues with P(a) < 100: RNCGPSTY
  Extend: when we reach RGPT, we must stop
  CAENKLDHV: P(a) = 972, P(b) = 843
  Declare alpha helix
Identifying a hairpin turn
  VRGP: P(t) =
  Average P(turn) =
  Avg P(a) = 79.5, Avg P(b) = 98.25
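The totals and averages in this example can be checked with a few lines of arithmetic. The propensity values below are the commonly tabulated Chou-Fasman numbers for just the residues involved, entered as an assumption since the slide's table was not preserved. They reproduce the figures quoted above. The names `total` and `average` are illustrative.

```python
# Assumed Chou-Fasman propensities for the residues used in this example only.
P_ALPHA = {"C": 70, "A": 142, "E": 151, "N": 67, "K": 114, "L": 121,
           "D": 101, "H": 100, "V": 106, "R": 98, "G": 57, "P": 57}
P_BETA = {"C": 119, "A": 83, "E": 37, "N": 89, "K": 74, "L": 130,
          "D": 54, "H": 87, "V": 170, "R": 93, "G": 75, "P": 55}

def total(region, table):
    """Sum of propensities over a region."""
    return sum(table[r] for r in region)

def average(region, table):
    """Mean propensity over a region."""
    return total(region, table) / len(region)

helix_pa = total("CAENKLDHV", P_ALPHA)   # 972
helix_pb = total("CAENKLDHV", P_BETA)    # 843
turn_pa = average("VRGP", P_ALPHA)       # 79.5
turn_pb = average("VRGP", P_BETA)        # 98.25
```

Since 972 > 843 and the region is longer than five residues, the helix call follows from the rule stated in the algorithm.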
52
Lots More to Come
Microarray analysis
Mass Spectrometry
Interactions/Knockouts
Synthetic Lethality
RPPA
.....