Protein Structure, Structure Classification and Prediction. Bioinformatics X3, January 2005. P. Johansson, D. Madsen, Dept. of Cell & Molecular Biology, Uppsala.

Presentation transcript:

1 Protein Structure, Structure Classification and Prediction. Bioinformatics X3, January 2005. P. Johansson, D. Madsen, Dept. of Cell & Molecular Biology, Uppsala University

2 Overview: Introduction to proteins, structure & classification; Protein folding; Experimental techniques for structure determination; Structure prediction

3

4 Proteins: Proteins play a crucial role in virtually all biological processes and have a broad range of functions. The activity of an enzyme or the function of a protein is governed by its three-dimensional structure.

5 20 amino acids - the building blocks

6 The Amino Acids

7 Hydrophilic or hydrophobic? Virtually all soluble proteins feature a hydrophobic core surrounded by a hydrophilic surface. But the peptide backbone is inherently polar. Solution: neutralize potential H-bond donors & acceptors using ordered secondary structure.

8 Secondary Structure: α-helix

9 Secondary Structure: α-helix. 3.6 residues / turn; Axial dipole moment; Not proline & glycine; Protein surfaces

10 Secondary Structure: β-sheets

11 Secondary Structure: β-sheets. Parallel or antiparallel; Alternating side-chains; No mixing; Loops often have polar amino acids

12 Structural classification. Databases: SCOP, 'Structural Classification of Proteins', manual classification; CATH, 'Class Architecture Topology Homology', based on the SSAP algorithm; FSSP, 'Families of Structurally Similar Proteins', based on the DALI algorithm; PClass, 'Protein Classification', based on the LOCK and 3Dsearch algorithms

13 Structural classification, CATH. Class, four types: Mainly α; α/β structures; Mainly β; No secondary structure. Architecture (fold); Topology (superfamily); Homology (family)

14 Structural classification..

15 Structural classification.. Two types of algorithms. Inter-molecular, 3D, rigid body: structural alignment in a common coordinate system (hard), e.g. the VAST and LOCK algorithms. Intra-molecular, 2D, internal geometry: structural alignment using internal distances and angles, e.g. the DALI, STRUCTURAL and SSAP algorithms.

16 Structural classification, SSAP. SSAP, 'Sequential Structure Alignment Program'. Basic idea: the similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings. This similarity can be quantified as a score, S_ik. Based on this similarity score and a specified gap penalty, dynamic programming is used to find the optimal structural alignment.

17 Structural classification, SSAP. The structural neighborhood of residue i in A compared to that of residue k in B (figure)

18 Structural classification, SSAP.. Distance between residues i and j in molecule A: d^A_ij. Similarity for two pairs of residues, (i, j) in A and (k, l) in B: (formula not preserved in the transcript; a, b are constants). Similarity between residue i in A and residue k in B: S_ik (formula not preserved in the transcript). Idea: S_ik is large if the distances from residue i in A to its 2n nearest neighbours are similar to the corresponding distances around k in B.
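The idea on this slide can be sketched in a few lines of Python. Because the slide's formulas did not survive in the transcript, the functional form a / (|d_A - d_B| + b) is an assumption chosen to match the description (largest when the two distances agree, with a and b as constants). The function names, the default values of a, b and n, and the reading of "nearest neighbours" as neighbours along the sequence are likewise hypothetical.

```python
import math

def distance(coords, i, j):
    """Euclidean distance between residues i and j (e.g. C-alpha positions)."""
    return math.dist(coords[i], coords[j])

def pair_similarity(d_a, d_b, a=50.0, b=2.0):
    """Assumed form: largest (a / b) when the two distances are equal."""
    return a / (abs(d_a - d_b) + b)

def residue_similarity(coords_a, i, coords_b, k, n=2, a=50.0, b=2.0):
    """S_ik: sum of pair similarities over n sequence neighbours on each side."""
    total = 0.0
    for offset in range(-n, n + 1):
        if offset == 0:
            continue
        ia, kb = i + offset, k + offset
        # Skip neighbours that fall off the end of either chain.
        if 0 <= ia < len(coords_a) and 0 <= kb < len(coords_b):
            total += pair_similarity(
                distance(coords_a, i, ia), distance(coords_b, k, kb), a, b
            )
    return total

# Toy coordinates (illustrative only): a straight chain and a stretched copy.
chain_a = [(float(x), 0.0, 0.0) for x in range(5)]
chain_b = [(2.0 * x, 0.0, 0.0) for x in range(5)]
same = residue_similarity(chain_a, 2, chain_a, 2)
stretched = residue_similarity(chain_a, 2, chain_b, 2)
```

Identical surroundings give the maximum score, and the stretched copy scores lower because every compared distance differs.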

19 Structural classification, SSAP.. This works well for small structures and local structural alignments. However, insertions and deletions cause problems: unrelated distances get compared. Example: A: HSERAHVFIM.. B: GQ-VMAC-NW.. (i=5, k=4). The real algorithm uses dynamic programming on two levels: first to find which distances to compare (giving S_ik), then to align the structures using these scores.
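The final alignment step can be illustrated with an ordinary global dynamic-programming alignment over a matrix of residue scores. This is a single-level sketch only, not the two-level procedure the slide attributes to the real algorithm; the function name, the linear gap penalty and the toy score matrix are assumptions made for illustration.

```python
def align(score, gap=1.0):
    """Global alignment over a residue-by-residue score matrix.

    score[i][k] is the similarity of residue i in A and residue k in B;
    gap is a linear penalty per unaligned residue.
    Returns (best total score, list of aligned (i, k) index pairs).
    """
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],  # align i with k
                best[i - 1][k] - gap,                      # residue i unaligned
                best[i][k - 1] - gap,                      # residue k unaligned
            )
    # Trace back from the corner to recover which residues were paired.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return best[n][m], pairs

# Toy scores: A has 2 residues, B has 3; B's middle residue is an insertion.
toy_scores = [[5, 0, 0],
              [0, 0, 5]]
total, pairs = align(toy_scores, gap=1.0)
```

With these toy scores the best path pairs the first residues, skips the inserted residue in B at the cost of one gap, and pairs the last residues.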

20 Experimental techniques for structure determination: X-ray crystallography; Nuclear Magnetic Resonance spectroscopy (NMR); Electron microscopy/diffraction; Free electron lasers?

21 X-ray Crystallography

22 X-ray Crystallography.. From small molecules to viruses; Information about the positions of individual atoms; Limited information about dynamics; Requires crystals

23

24 NMR: Limited to molecules up to ~50 kDa (good quality up to 30 kDa); Distances between pairs of hydrogen atoms; Lots of information about dynamics; Requires soluble, non-aggregating material; Assignment problem

25 Electron Microscopy/Diffraction: Low to medium resolution; Limited information about dynamics; Can use very small crystals (nm range); Can be used for very large molecules and complexes

26

27 Structure Prediction GPSRYIV… ?

28 Protein Folding. Different sequence → different structure. The free energy difference is small due to the large entropy decrease, ΔG = ΔH - TΔS
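A short worked example of the relation ΔG = ΔH - TΔS shows how a large favourable enthalpy change can be almost cancelled by the entropy term. The numbers below are placeholders chosen for easy arithmetic, not measured values for any protein, and the function name is hypothetical.

```python
def gibbs_free_energy(delta_h, temperature, delta_s):
    """Return delta G = delta H - T * delta S (units must be consistent)."""
    return delta_h - temperature * delta_s

# Placeholder values: kJ/mol for delta H, K for T, kJ/(mol K) for delta S.
delta_h = -250.0   # favourable enthalpy change on folding (illustrative)
delta_s = -0.75    # entropy decrease on folding (illustrative)
temperature = 298.0

entropy_term = temperature * delta_s
delta_g = gibbs_free_energy(delta_h, temperature, delta_s)
```

Here the entropy term is -223.5 kJ/mol, so only -26.5 kJ/mol of the -250 kJ/mol enthalpy change remains as net stabilisation.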

29 Structure Prediction. Why are structure prediction, and especially ab initio calculations, hard? Many degrees of freedom per residue; Remote noncovalent interactions; Nature does not go through all conformations; Folding is assisted by enzymes & chaperones
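The point about degrees of freedom can be made concrete with a back-of-the-envelope count. The sketch below assumes a fixed number of backbone states per residue and a fixed sampling rate; both figures are illustrative assumptions rather than values from the slides, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformation_count(residues, states_per_residue=3):
    """Number of chain conformations if each residue has a few discrete states."""
    return states_per_residue ** residues

def exhaustive_search_years(residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to visit every conformation once at the assumed rate."""
    seconds = conformation_count(residues, states_per_residue) / samples_per_second
    return seconds / SECONDS_PER_YEAR

small = conformation_count(10)
years = exhaustive_search_years(100)
```

Even with only three states per residue, a 100-residue chain has on the order of 10^47 conformations, which is why neither nature nor a program can find the fold by exhaustive search.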

30 Ab initio calculations are used for smaller problems: Calculation of affinity; Enzymatic pathways; Molecular dynamics

31 Sequence Classification rev. Class: secondary structure content. Fold: major structural similarity. Superfamily: probable common evolutionary origin. Family: clear evolutionary relationship.

32 Structure Prediction. Search sequence data banks for homologs. Search methods: e.g. BLAST, PSI-BLAST, FASTA … Homologue in PDB? IVTY…PGGG HYW…QHG

33 Structure Prediction. Multiple sequence / structure alignment: Contains more information than a single sequence for applications like homology modeling and secondary structure prediction. Gives the location of conserved parts and of residues likely to be buried in the protein core or exposed to solvent.

34 Multiple alignment example: HFD fingerprint

35 Secondary Structure Prediction. Statistical analysis (old-fashioned): for each amino acid type, assign its 'propensity' to be in a helix, sheet, or coil. Limited accuracy, ~55-60%; random prediction gives ~38%. Example: MTLLALGINHKTAP... → CCEEEEEECCCCCC...
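Accuracy figures like those above are usually per-residue agreement between a predicted and an observed three-state string (H, E, C), often called Q3. A minimal sketch follows; the function name is hypothetical, and the "observed" string is invented purely to show the calculation.

```python
def three_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

predicted = "CCEEEEEECCCCCC"   # prediction string from the slide
observed = "CCCEEEEECCCCCC"    # made-up reference, for illustration only
accuracy = three_state_accuracy(predicted, observed)
```

In this toy case 13 of the 14 positions agree.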

36 The Chou & Fasman Method. Each residue is classified as: Hα/Hβ, strong helix / strand former; hα/hβ, weak helix / strand former; I, indifferent; bα/bβ, weak helix / strand breaker; Bα/Bβ, strong helix / strand breaker.

37 The Chou & Fasman Method.. Score each residue: Hα/hα = 1, Iα = 0 or ½, Bα/bα = -1; Hβ/hβ = 1, Iβ = 0 or ½, Bβ/bβ = -1. Helix nucleation: score > 4 in a "window" of 6 residues. Strand nucleation: score > 3 in a "window" of 5 residues. Propagate until score < 1 in a 4-residue "window".
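The scoring rules above can be sketched as a sliding-window calculation. The code takes a string of per-residue class letters (H, h, I, b, B) as input instead of reproducing the published class table. It fixes the indifferent score at ½, and its propagation rule (extend one residue at a time while the 4-residue window at the growing end scores at least 1) is one possible reading of the slide. All names are hypothetical.

```python
SCORES = {"H": 1.0, "h": 1.0, "I": 0.5, "b": -1.0, "B": -1.0}

def window_scores(classes, width):
    """Summed score of every window of the given width along the chain."""
    values = [SCORES[c] for c in classes]
    return [sum(values[s:s + width]) for s in range(len(values) - width + 1)]

def nucleation_sites(classes, width, threshold):
    """Start positions of windows whose score exceeds the threshold."""
    return [s for s, total in enumerate(window_scores(classes, width))
            if total > threshold]

def propagate(classes, start, width, stop_width=4, stop_below=1.0):
    """Extend a nucleus in both directions; returns a half-open (lo, hi) range."""
    values = [SCORES[c] for c in classes]
    lo, hi = start, start + width
    # Grow to the right while the window ending at the new residue scores >= 1.
    while hi < len(values) and sum(values[hi - stop_width + 1:hi + 1]) >= stop_below:
        hi += 1
    # Grow to the left while the window starting at the new residue scores >= 1.
    while lo > 0 and sum(values[lo - 1:lo - 1 + stop_width]) >= stop_below:
        lo -= 1
    return lo, hi

# Toy helix classes for a 12-residue stretch (illustrative, not real data).
helix_classes = "BbHHHHHHhIBB"
sites = nucleation_sites(helix_classes, width=6, threshold=4)   # helix rule
segment = propagate(helix_classes, start=sites[0], width=6)
```

For strands the same functions would be called with the strand class string, `width=5` and `threshold=3`, following the slide.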

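38 The Chou & Fasman Method.. Worked example for GPSRYIVTLANGK (figure). Helix: no nucleation … Strand: nucleation, then propagation. Result: per-residue annotation of GPSRYIVTLANGK (not preserved in the transcript)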

39 Modern methods. Neural networks (e.g. the PHD server): Input: a number of protein sequences + secondary structure. Output: a trained network that predicts secondary structure elements with ~70% accuracy. Use many different methods and compare (e.g. the JPred server)!
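Comparing several methods can be as simple as a per-residue majority vote over their predicted strings. The sketch below is a generic illustration of that idea and is not a description of how JPred or any other server combines predictions; the function name and the "?" marker for ties are assumptions.

```python
from collections import Counter

def consensus(predictions):
    """Majority vote per position; positions with a tied vote are marked '?'."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_votes = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_votes
        result.append("?" if tied else top_state)
    return "".join(result)

# Three made-up predictions for the same five residues.
combined = consensus(["HHHCC", "HHCCC", "HHHCE"])
```

Positions where the methods disagree without a clear majority are the ones worth inspecting by hand.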

40 Summary: The function of a protein is governed by its structure. Different sequence → different structure. PDB, Protein Data Bank. Secondary structure prediction is hard; tertiary structure prediction is even harder. Use homologs whenever possible, or different methods to assess quality.

41