CAP5510 – Bioinformatics Protein Structures


CAP5510 – Bioinformatics: Protein Structures. Tamer Kahveci, CISE Department, University of Florida

What and Why? Proteins fold into a three-dimensional shape. Structure can reveal functional information that cannot be found from sequence alone. Misfolded proteins can cause diseases, e.g. sickle cell anemia (an E → V substitution in hemoglobin) and mad cow disease. Structures are also used in drug design, e.g. HIV protease inhibitors. [Figures: hemoglobin; normal vs. sickled blood cells; HIV protease inhibitor]

Goals: Understand protein structures (primary, secondary, tertiary). Learn how protein shapes are determined and predicted. Structure comparison (?)

A Protein Sequence >gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana] MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM

Amino Acid Composition. Basic amino acid structure: a central carbon bonded to an amino group, a carboxyl group, a hydrogen, and a side chain R. The side chain, R, varies for each of the 20 amino acids.

The Peptide Bond. Amino acids are joined by dehydration synthesis. Repeating backbone: N–Cα–C–N–Cα–C. Convention: start at the amino terminus and proceed to the carboxy terminus.

Peptidyl polymers. A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids. We call the units of a protein amino acid residues. [Figure labels: amide nitrogen, carbonyl carbon]

Side chain properties. Carbon does not readily form hydrogen bonds with water: hydrophobic. O and N are generally more likely than C to hydrogen-bond to water: hydrophilic. We group the amino acids into three general groups: hydrophobic, charged (positive/basic and negative/acidic), and polar.

The Hydrophobic Amino Acids

The Charged Amino Acids

The Polar Amino Acids

More Polar Amino Acids And then there’s…

Planarity of the Peptide Bond. Phi (φ): the angle of rotation about the N–Cα bond. Psi (ψ): the angle of rotation about the Cα–C bond. The planar bond angles and bond lengths are fixed.
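The two rotation angles can be computed from atom coordinates as dihedral angles. The sketch below is illustrative only: the function name `dihedral` and the tuple-based coordinates are assumptions, not part of the slides. For residue i, φ would use the atoms C(i-1), N(i), Cα(i), C(i), and ψ would use N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z).

    The angle is measured about the p1 -> p2 bond.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```

Calling `dihedral` on four consecutive backbone atoms of a real structure gives one point coordinate for a Ramachandran plot; reading the atoms from a structure file is left out of this sketch.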

Primary & Secondary Structure. Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT… Secondary structure: regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known, the α-helix and the β-sheet. The location and direction of these periodic, repeating structures is known as the secondary structure of the protein.

The Alpha Helix: φ ≈ ψ ≈ −60°

Properties of the Alpha Helix. φ ≈ ψ ≈ −60°. Hydrogen bonds between C=O of residue n and NH of residue n+4. 3.6 residues/turn. 1.5 Å/residue rise. 100°/residue turn.
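The helix parameters are mutually consistent, which a few lines of arithmetic make explicit. This is a small sketch; the constant and function names are illustrative and not from the slides.

```python
RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # in angstroms, from the slide

def helix_pitch():
    """Rise along the helix axis per full turn, in angstroms."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def rotation_per_residue():
    """Rotation about the helix axis contributed by each residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN

def helix_hbond_pairs(n_residues):
    """(n, n+4) residue pairs, 1-based: C=O of residue n bonds to N-H of n+4."""
    return [(n, n + 4) for n in range(1, n_residues - 3)]
```

With these numbers one turn rises 3.6 × 1.5 = 5.4 Å, and 360° / 3.6 gives the 100° per residue quoted on the slide.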

Properties of -helices 4 – 40+ residues in length Often amphipathic or “dual-natured” Half hydrophobic and half hydrophilic If we examine many -helices, we find trends… Helix formers: Ala, Glu, Leu, Met Helix breakers: Pro, Gly, Tyr, Ser

The beta strand (& sheet)    135°   +135°

Properties of beta sheets Formed of stretches of 5-10 residues in extended conformation Parallel/aniparallel, contiguous/non-contiguous

Anti-Parallel Beta Sheets

Parallel Beta Sheets

Mixed Beta Sheets

Turns and Loops. Secondary structure elements are connected by regions of turns and loops. Turns: short regions of non-α, non-β conformation. Loops: larger stretches with no secondary structure. Their sequences vary much more than those of secondary structure regions.

Ramachandran Plot

Levels of Protein Structure Secondary structure elements combine to form tertiary structure Quaternary structure occurs in multienzyme complexes

Protein Structure Example. PDB ID: 12as (2 chains). [Figure labels: beta sheet, helix, loop]

Views of a Protein Wireframe Ball and stick

Views of a protein Spacefill Cartoon CPK colors Carbon = green, black, or grey Nitrogen = blue Oxygen = red Sulfur = yellow Hydrogen = white

Common Protein Motifs

Mostly Helical Folding Motifs Four helical bundle: Globin domain:

/ Motifs / barrel:

Open Twisted Beta Sheets

Beta Barrels

Determining the Structure of a Protein: Experimental Methods. X-ray crystallography. NMR. As of August 2013, the structures of more than 85,000 proteins had been determined.

X-Ray Crystallography. Discovery of X-rays (Wilhelm Conrad Röntgen, 1895). Crystals diffract X-rays in regular patterns (Max von Laue, 1912). The first X-ray diffraction pattern from a protein crystal (Dorothy Hodgkin, 1934).

X-Ray Crystallography. Grow millions of protein crystals (takes months). Expose them to a radiation beam. Analyze the image by computer. Average over many copies of images. (PDB) Not all proteins can be crystallized!

NMR Nuclear Magnetic Resonance Nuclei of atoms vibrate when exposed to oscillating magnetic field Detect vibrations by external sensors Computes inter-atomic distances. Requires complex analysis. NMR can be used for short sequences (<200 residues) More than one model can be derived from NMR.

Determining the Structure of a Protein Computational Methods

The Protein Folding Problem. Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the secondary/tertiary/quaternary structure of the resulting protein be?” Input: AAVIKYGCAL… Output: φ1ψ1, φ2ψ2…

Structure vs. Sequence. Observation: a protein with the same sequence (under the same circumstances) yields the same shape. A protein folds into a shape that minimizes the energy needed to stay in that shape. Protein folds in ~10-15 seconds.

Secondary Structure Prediction

Chou-Fasman method. Uses statistically obtained Chou-Fasman parameters. Each amino acid has four parameters: P(a) for alpha helix, P(b) for beta sheet, P(t) for turn, and f(), an additional turn parameter.

Chou-Fasman Parameters

C.-F. Alpha Helix Prediction (1)
Residue  A    E    T    L    C    M    Q    S    Y    V
P(a)     142  151  83   121  70   145  111  77   69   106
P(b)     83   37   119  130  119  105  110  75   147  170
Find P(a) for all letters. Find 6 contiguous letters, at least 4 of which have P(a) > 100. Declare these regions as alpha helix.

C.-F. Alpha Helix Prediction (2). Extend in both directions until 4 consecutive letters with P(a) < 100 are found.

C.-F. Alpha Helix Prediction (3). Find the sum of P(a) (Sa) and the sum of P(b) (Sb) in the extended region. If the region is long enough (>= 5 letters) and Sa > Sb, then declare the extended region as alpha helix.
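The three helix-prediction steps can be sketched as follows. This is one possible reading of the slides, not a reference implementation: the function names are illustrative, the tables cover only the ten residues of the slide's example (values taken from the published Chou-Fasman table), and the handling of sequence ends during extension (stop when the remaining residues, even if fewer than four, all score below 100) is an assumption.

```python
# Chou-Fasman P(a) and P(b) for the ten residues in the slide's example only.
P_ALPHA = {"A": 142, "E": 151, "T": 83, "L": 121, "C": 70,
           "M": 145, "Q": 111, "S": 77, "Y": 69, "V": 106}
P_BETA = {"A": 83, "E": 37, "T": 119, "L": 130, "C": 119,
          "M": 105, "Q": 110, "S": 75, "Y": 147, "V": 170}

def find_nuclei(seq, table, window=6, min_hits=4, threshold=100):
    """Step 1: start indices of windows with >= min_hits residues above threshold."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if table[aa] > threshold)
        if hits >= min_hits:
            starts.append(i)
    return starts

def extend(seq, table, start, end, run=4, threshold=100):
    """Step 2: grow the half-open region [start, end) in both directions.

    Growth stops when the next `run` residues in that direction (or whatever
    remains before the sequence end; an assumption) all score below threshold.
    """
    def all_low(segment):
        return all(table[aa] < threshold for aa in segment)

    while start > 0 and not all_low(seq[max(0, start - run):start]):
        start -= 1
    while end < len(seq) and not all_low(seq[end:end + run]):
        end += 1
    return start, end

def is_helix(seq, start, end, p_a=P_ALPHA, p_b=P_BETA, min_len=5):
    """Step 3: accept the region if it is long enough and Sa > Sb."""
    region = seq[start:end]
    s_a = sum(p_a[aa] for aa in region)
    s_b = sum(p_b[aa] for aa in region)
    return len(region) >= min_len and s_a > s_b

def predict_helices(seq, p_a=P_ALPHA, p_b=P_BETA):
    """Run steps 1-3 and return the accepted (start, end) regions, sorted."""
    regions = set()
    for i in find_nuclei(seq, p_a):
        start, end = extend(seq, p_a, i, i + 6)
        if is_helix(seq, start, end, p_a, p_b):
            regions.add((start, end))
    return sorted(regions)
```

Under this reading, the slide's example sequence AETLCMQSYV nucleates at positions 0 and 1 but extends to the whole sequence, where Sb (1095) exceeds Sa (1075), so no helix is declared. Replacing `P_ALPHA` with `P_BETA` in steps 1 and 2 gives the analogous beta-sheet search.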

C.-F. Beta Sheet Prediction. Same as alpha helix, replacing P(a) with P(b). Resolving overlapping alpha helix and beta sheet predictions: compute the sum of P(a) (Sa) and the sum of P(b) (Sb) in the overlap. If Sa > Sb => alpha helix. If Sb > Sa => beta sheet.

C.-F. Turn Prediction
Residue  A    E    T    L    C    M    Q    S    Y    V
P(a)     142  151  83   121  70   145  111  77   69   106
P(b)     83   37   119  130  119  105  110  75   147  170
P(t)     66   74   96   59   119  60   98   143  114  50
An amino acid at position i is predicted as a turn if all of the following hold over the window i, i+1, i+2, i+3: f(i)*f(i+1)*f(i+2)*f(i+3) > 0.000075; Avg(P(t)) > 100; Sum(P(t)) > Sum(P(a)) and Sum(P(t)) > Sum(P(b)).
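The three turn conditions translate directly into a predicate over a four-residue window. In this sketch the function name `is_turn` is illustrative, the tables cover only the ten residues of the example, and the positional f() tables are passed in by the caller because their values do not appear in the transcript.

```python
# Chou-Fasman values for the ten residues in the slide's example only.
P_ALPHA = {"A": 142, "E": 151, "T": 83, "L": 121, "C": 70,
           "M": 145, "Q": 111, "S": 77, "Y": 69, "V": 106}
P_BETA = {"A": 83, "E": 37, "T": 119, "L": 130, "C": 119,
          "M": 105, "Q": 110, "S": 75, "Y": 147, "V": 170}
P_TURN = {"A": 66, "E": 74, "T": 96, "L": 59, "C": 119,
          "M": 60, "Q": 98, "S": 143, "Y": 114, "V": 50}

def is_turn(window, f, p_a=P_ALPHA, p_b=P_BETA, p_t=P_TURN):
    """Apply the three turn conditions to a window of four residues.

    `f` is a list of four dicts: f[k][aa] is the turn frequency of residue
    `aa` at window position i+k. These tables must be supplied by the caller.
    """
    if len(window) != 4:
        raise ValueError("a turn window has exactly four residues")
    product = 1.0
    for k, aa in enumerate(window):
        product *= f[k][aa]
    s_a = sum(p_a[aa] for aa in window)
    s_b = sum(p_b[aa] for aa in window)
    s_t = sum(p_t[aa] for aa in window)
    return product > 0.000075 and s_t / 4 > 100 and s_t > s_a and s_t > s_b
```

Sliding this predicate along a sequence, one window per start position i, marks the positions predicted as turns.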

Other Methods for SSE Prediction: similarity searching; Predator; Markov chain; neural networks; PHD. ~65% to 80% accuracy.

Tertiary Structure Prediction

Forces driving protein folding. It is believed that hydrophobic collapse is a key driving force for protein folding: a hydrophobic core, with a polar surface interacting with solvent. Minimum volume (no cavities). Disulfide bond formation stabilizes. Hydrogen bonds. Polar and electrostatic interactions.

Fold Optimization. Simple lattice models (HP models, or hydrophobic-polar models): two types of residues, hydrophobic and polar; a 2-D or 3-D lattice; the only force is hydrophobic collapse; score = number of H–H contacts.

Scoring Lattice Models. H/P model scoring: count noncovalent hydrophobic interactions. Sometimes: penalize buried polar or surface hydrophobic residues.
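A minimal sketch of the basic H/P score on a 2-D square lattice follows. The function name `hp_score` and the input format (an H/P string plus a list of lattice coordinates) are assumptions made for illustration; the optional penalties for buried polar or exposed hydrophobic residues are not included.

```python
def hp_score(hp, coords):
    """Count noncovalent H-H contacts for a chain placed on a 2-D square lattice.

    hp     -- string of 'H' and 'P', one letter per residue
    coords -- list of (x, y) lattice sites, one per residue, in chain order
    """
    if len(hp) != len(coords):
        raise ValueError("need one lattice site per residue")
    index = {site: i for i, site in enumerate(coords)}
    if len(index) != len(coords):
        raise ValueError("conformation has a bump (site used twice)")
    score = 0
    for i, (x, y) in enumerate(coords):
        if hp[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbor)
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                score += 1
    return score
```

For example, the chain HPPH folded into a unit square brings its two ends together and scores one contact, while the same chain laid out in a straight line scores zero.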

Can we use lattice models? For smaller polypeptides, exhaustive search can be used. Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process. For larger chains, other optimization and search methods must be used: greedy, branch and bound, evolutionary computing, simulated annealing.
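For short chains the exhaustive search can be written as a plain recursive enumeration of self-avoiding walks. This is a sketch under the basic H-H contact score; `best_fold` and `count_hh` are illustrative names, and the search makes no attempt to remove symmetric duplicates, so it is practical only for small chains.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def count_hh(hp, coords):
    """Number of noncovalent H-H contacts on a 2-D square lattice."""
    index = {site: i for i, site in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if hp[i] != "H":
            continue
        for neighbor in ((x + 1, y), (x, y + 1)):   # count each pair once
            j = index.get(neighbor)
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                score += 1
    return score

def best_fold(hp):
    """Try every self-avoiding walk; return (best score, absolute-direction path)."""
    best = (-1, None)

    def grow(coords, path):
        nonlocal best
        if len(coords) == len(hp):
            score = count_hh(hp, coords)
            if score > best[0]:
                best = (score, path)
            return
        x, y = coords[-1]
        for name, (dx, dy) in MOVES.items():
            site = (x + dx, y + dy)
            if site not in coords:          # skip moves that would cause a bump
                grow(coords + [site], path + name)

    grow([(0, 0)], "")
    return best
```

The number of walks grows roughly exponentially with chain length, which is why the slide turns to greedy, branch-and-bound, and stochastic methods for larger chains.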

The “hydrophobic zipper” effect Ken Dill ~ 1997

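Representing a lattice model. Absolute directions: UURRDLDRRU. Relative directions: LFRFRRLLFFL. Advantage of relative directions: the immediate reversals UD and RL, possible in the absolute encoding, cannot occur, and there are only three directions (L, R, F). What about bumps? LFRRR revisits a lattice site: give it a bad score, or use a better representation.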
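Both encodings can be decoded into lattice coordinates and checked for bumps. The sketch below assumes the chain starts at the origin and, for relative directions, that the initial heading is "up" and each symbol first turns and then takes one step; these conventions and the function names are illustrative choices, not taken from the slides.

```python
ABSOLUTE = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def absolute_to_coords(path):
    """Decode a string over U, D, L, R into lattice sites, starting at (0, 0)."""
    coords = [(0, 0)]
    for symbol in path:
        dx, dy = ABSOLUTE[symbol]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def relative_to_coords(path, heading=(0, 1)):
    """Decode a string over L, F, R: turn relative to the heading, then step."""
    coords = [(0, 0)]
    dx, dy = heading
    for symbol in path:
        if symbol == "L":
            dx, dy = -dy, dx      # rotate heading 90 degrees counterclockwise
        elif symbol == "R":
            dx, dy = dy, -dx      # rotate heading 90 degrees clockwise
        elif symbol != "F":
            raise ValueError("unknown symbol: " + symbol)
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def has_bump(coords):
    """True if any lattice site is visited more than once."""
    return len(set(coords)) != len(coords)
```

Under these conventions both example strings from the slide decode to self-avoiding walks, while LFRRR returns to a site it has already visited.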

Preference-order representation. Each position has two “preferences”. If it can’t take either of the two, it will take the “least favorite” path if possible. Example: {LR},{FL},{RL},{FR},{RL},{RL},{FL},{RF}. Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}.
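Decoding a preference-order string can be sketched as below. The conventions are assumptions: the chain starts at the origin heading "up", each preference pair is written as a two-letter string such as "LR", and the function returns `None` when all three directions are blocked. The name `decode_preferences` is illustrative.

```python
def _turn(heading, symbol):
    """New heading after turning left (L), right (R), or going forward (F)."""
    dx, dy = heading
    if symbol == "L":
        return (-dy, dx)
    if symbol == "R":
        return (dy, -dx)
    return heading

def decode_preferences(prefs):
    """Decode a list of two-letter preference pairs into lattice sites.

    For each position, try the first preference, then the second, then the
    remaining ("least favorite") direction. Return None if all are blocked.
    """
    position, heading = (0, 0), (0, 1)
    coords = [position]
    occupied = {position}
    for pair in prefs:
        least_favorite = [s for s in "LFR" if s not in pair]
        for symbol in list(pair) + least_favorite:
            new_heading = _turn(heading, symbol)
            site = (position[0] + new_heading[0], position[1] + new_heading[1])
            if site not in occupied:
                position, heading = site, new_heading
                coords.append(site)
                occupied.add(site)
                break
        else:
            return None   # every direction leads to an occupied site: a bump
    return coords
```

With these conventions the first example on the slide decodes to a valid nine-site walk, and the second example gets trapped at its last position, matching the slide's remark that bumps can still occur.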

More realistic models. Higher resolution lattices (45° lattice, etc.). Off-lattice models: local moves; optimization/search methods and φ/ψ representations. Greedy search. Branch and bound. EC, Monte Carlo, simulated annealing, etc.

How to Evaluate the Result? Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold). Theoretical force field: G = G_van der Waals + G_h-bonds + G_solvent + G_coulomb. Empirical force fields: start with a database; look at neighboring residues. Are they similar to known protein folds?

Comparative Modeling. Identify similar protein sequences from a database of known proteins (BLAST). Find conserved regions by aligning these proteins (CLUSTAL-W). Predict alpha helices and beta sheets from conserved regions (backbone). Predict loops. Predict side chain positions. Evaluate.

Threading: Fold recognition. Given: a sequence (IVACIVSTEYDVMKAAR…) and a database of molecular coordinates. Map the sequence onto each fold. Evaluate. Objective 1: improve the scoring function. Objective 2: folding.

Folding: still a hard problem. Levinthal’s paradox: consider a 100-residue protein. If each residue can take only 3 positions, there are 3^100 ≈ 5 × 10^47 possible conformations. If it takes 10^-13 s to convert from one structure to another, exhaustive search would take 1.6 × 10^27 years.
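The numbers in Levinthal's argument can be checked with a few lines of arithmetic. The function name and the 365.25-day year are illustrative choices.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_estimate(n_residues=100, states_per_residue=3, step_seconds=1e-13):
    """Return (number of conformations, years needed to visit them all)."""
    conformations = states_per_residue ** n_residues
    years = conformations * step_seconds / SECONDS_PER_YEAR
    return conformations, years
```

With the slide's inputs this gives about 5.2 × 10^47 conformations and about 1.6 × 10^27 years, in agreement with the figures quoted above.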

Protein Classification. Class: similar secondary structure properties (all alpha, all beta, alpha/beta, alpha+beta). Fold: major secondary structure similarity, e.g. globin-like (6 helices, folded leaf, partly opened). Superfamily: distant homologs, 25-30% sequence identity. Family: close homologs, evolved from the same ancestor, high identity.