
Proteins Structural Bioinformatics. Specific databases of protein sequences and structures: Swissprot, PIR, TREMBL (translated from DNA), PDB.



Presentation transcript:

1 Proteins Structural Bioinformatics

2 2

3 3 Specific databases of protein sequences and structures: Swissprot, PIR, TREMBL (translated from DNA), PDB (three-dimensional structures).

4 4 Myoglobin: the first high-resolution protein structure. “Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.” Solved in 1958 by John Kendrew at Cambridge University, who shared the 1962 Nobel Prize in Chemistry with Max Perutz.

5 5 Why Protein Structure? Proteins are fundamental components of all living cells, performing a variety of biological tasks. Each protein has a particular 3D structure that determines its function. Protein structure is more conserved than protein sequence, and more closely related to function.

6 6 There Are Four Levels of Protein Structure. Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain. Quaternary: the arrangement of several polypeptide chains.

7 7 Symbols for the 20 amino acids
A  ala  alanine          M  met  methionine
C  cys  cysteine         N  asn  asparagine
D  asp  aspartic acid    P  pro  proline
E  glu  glutamic acid    Q  gln  glutamine
F  phe  phenylalanine    R  arg  arginine
G  gly  glycine          S  ser  serine
H  his  histidine        T  thr  threonine
I  ile  isoleucine       V  val  valine
K  lys  lysine           W  trp  tryptophan
L  leu  leucine          Y  tyr  tyrosine
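The symbol table lends itself to a simple lookup. The following is a minimal sketch, assuming a sequence written in one-letter codes; the names `ONE_TO_THREE` and `to_three_letter` are illustrative and not part of the presentation.

```python
# One-letter to three-letter amino acid codes, as listed on the slide.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Convert a one-letter sequence to dash-separated three-letter codes.

    Raises KeyError for any symbol outside the 20 standard amino acids.
    """
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("MKT"))
```

Letters outside the 20 standard symbols deliberately raise an error rather than being skipped, so a malformed sequence is noticed early.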

8 8 Secondary Structure Secondary structure is usually divided into three categories: Alpha helix Beta strand (sheet) Anything else – turn/loop

9 9 Alpha Helix: Pauling (1951). A consecutive stretch of 5-40 amino acids (average 10). A right-handed spiral conformation with 3.6 amino acids per turn (figure labels: 3.6 residues, 5.4 Å per turn). Stabilized by H-bonds in the backbone between the C=O of residue n and the N-H of residue n+4. Side chains point outward.
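The n to n+4 hydrogen-bond pattern can be enumerated directly. This is a small sketch, assuming residues are numbered consecutively along an ideal helix; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # value stated on the slide


def helix_hbonds(first, last):
    """List backbone H-bond pairs (n, n+4) for a helix spanning first..last.

    Each pair means: C=O of residue n accepts from N-H of residue n+4.
    """
    return [(n, n + 4) for n in range(first, last - 3)]


def helix_turns(first, last):
    """Approximate number of turns in a helix spanning first..last."""
    return (last - first + 1) / RESIDUES_PER_TURN


if __name__ == "__main__":
    print(helix_hbonds(1, 10))
    print(helix_turns(1, 18))
```

A helix shorter than five residues yields no pairs, which is consistent with the minimum length of 5 quoted above.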

10 10 Beta Strand: Pauling and Corey (1951). Different stretches of polypeptide chain run alongside each other and are linked together by hydrogen bonds. Each stretch is called a β-strand and consists of 5-10 amino acids.

11 11 Beta Sheet. Strands lying adjacent to each other form a beta-sheet. (Figure: (a) antiparallel and (b) parallel sheets, with labelled distances 3.47 Å, 4.6 Å, 3.25 Å, 4.6 Å.)

12 12 Loops. Connect the secondary structure elements. Vary in length and shape. Located at the surface of the folded protein, and may therefore play an important role in biological recognition processes. Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures.

13 13 How is the 3D Structure Determined? 1. Experimental methods (best approach): X-ray crystallography, NMR, others. 2. In-silico methods (partial solutions, based on similarity): threading (needs a 3D structure, combinatorial complexity) and ab-initio structure prediction (not always successful).

14 14 X-ray crystallography. 1. Obtain an ordered protein crystal. 2. Check X-ray diffraction: the crystal is bombarded with X-ray beams, and the collision of the beams with the electrons creates a diffraction pattern.

15 15 X-ray crystallography. 3. Analyze the diffraction pattern and produce an electron density map. 4. Thread the known protein sequence into the density map.

16 16 X-ray crystallography The molecules must be very pure in order to produce perfect and stable crystals. The method is time-consuming and difficult.

17 17 NMR - Nuclear Magnetic Resonance (since 1945). A sample is immersed in a magnetic field and bombarded with radio waves. The nuclei of the molecule resonate (spin). This motion is measured and is specific to each molecule type.

18 18 Principles of NMR

19 19 NMR - Nuclear Magnetic Resonance. The NMR technique is very time-consuming and expensive. The sample has to be in a concentrated solution, and the method is limited to small, soluble molecules.

20 20 PDB: Protein Data Bank Holds 3D models of biological macromolecules (protein, RNA, DNA). All data are available to the public. Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%). Submitted by biologists and biochemists from around the world.

21 21 PDB – Protein Data Bank http://www.rcsb.org/pdb/

22 22 How Many Structures ? PDB Content Growth http://www.rcsb.org/pdb/holdings.html

23 23 Structure Prediction: Motivation. Hundreds of thousands of gene sequences have been translated to proteins (GenBank, Swissprot, PIR). Only about 28000 structures have been solved (PDB). Experimental methods are time-consuming and not always possible. Goal: predict protein structure based on sequence information.

24 24 Structure Prediction: Motivation. Understand protein function (locate binding sites). Broaden homology (detect similar function where sequence differs). Explain disease (see the effect of amino acid changes; design suitable compensatory drugs).

25 25 Prediction Approaches. Primary (sequence) to secondary structure: sequence characteristics. Secondary to tertiary structure: fold recognition; threading against known structures. Primary to tertiary structure: ab initio modelling.

26 26 Secondary structures have an amphiphilic nature: one face is polar and the other non-polar. (Figure: polar and non-polar faces of an α-helix and a β-sheet.) Can we predict the secondary structure from sequence?

27 27 Secondary Structure Prediction Methods. Chou-Fasman / GOR method: based on amino acid frequencies. Artificial Neural Network (ANN) methods: PHDsec and PSIpred. HMM (Hidden Markov Model). Best accuracy now ~80%.

28 28 Chou and Fasman (1974)
Name            P(a)  P(b)  P(turn)
Alanine          142    83     66
Arginine          98    93     95
Aspartic Acid    101    54    146
Asparagine        67    89    156
Cysteine          70   119    119
Glutamic Acid    151    37     74
Glutamine        111   110     98
Glycine           57    75    156
Histidine        100    87     95
Isoleucine       108   160     47
Leucine          121   130     59
Lysine           114    74    101
Methionine       145   105     60
Phenylalanine    113   138     60
Proline           57    55    152
Serine            77    75    143
Threonine         83   119     96
Tryptophan       108   137     96
Tyrosine          69   147    114
Valine           106   170     50
The table gives the propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity of being in an alpha helix or beta sheet, so it acts as a breaker). Success rate of 50%.
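As a rough illustration of how such propensities can be used, the sketch below averages the three values over a short segment and reports which structure scores highest. It is a simplification, not the full Chou-Fasman procedure; the function names and the one-letter keys are assumptions added for the example.

```python
# (P(a), P(b), P(turn)) per residue, copied from the table on this slide.
PROPENSITY = {
    "A": (142, 83, 66), "R": (98, 93, 95), "D": (101, 54, 146),
    "N": (67, 89, 156), "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156), "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143), "T": (83, 119, 96), "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
LABELS = ("helix", "strand", "turn")


def mean_propensity(segment):
    """Return the mean (helix, strand, turn) propensity over a segment."""
    rows = [PROPENSITY[residue] for residue in segment]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def favoured(segment):
    """Name the structure with the highest mean propensity."""
    means = mean_propensity(segment)
    return LABELS[means.index(max(means))]


if __name__ == "__main__":
    for segment in ("AEM", "VIY", "GNP"):
        print(segment, mean_propensity(segment), favoured(segment))
```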

29 29 Secondary Structure Method Improvements. ‘Sliding window’ approach: most alpha helices are ~12 residues long and most beta strands are ~6 residues long. Look at all windows of size 6/12. Calculate a score for each window. If the score exceeds a threshold, predict that the window is an alpha helix/beta sheet. Example sequence: TGTAGPOLKCHIQWMLPLKK
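The sliding-window idea can be sketched as follows. This assumes the window score is the mean helix propensity, uses the P(a) values from the Chou and Fasman slide, and picks a threshold of 100 purely for illustration; the slide does not specify the scoring function or the threshold. A window of 6 is used in the demo only to keep the toy sequence short.

```python
# Helix propensities P(a) from the Chou and Fasman (1974) slide.
P_HELIX = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def window_means(seq, propensity, size):
    """Mean propensity of every window of the given size along seq."""
    return [
        sum(propensity[residue] for residue in seq[i:i + size]) / size
        for i in range(len(seq) - size + 1)
    ]


def predict(seq, propensity, size, threshold=100.0, label="H"):
    """Mark every residue covered by a window scoring above threshold."""
    marks = ["-"] * len(seq)
    for start, mean in enumerate(window_means(seq, propensity, size)):
        if mean > threshold:
            for i in range(start, start + size):
                marks[i] = label
    return "".join(marks)


if __name__ == "__main__":
    toy = "AAAAAAGGGGGG"  # hypothetical sequence, not from the slides
    print(toy)
    print(predict(toy, P_HELIX, size=6))
```

Because every residue in a passing window is marked, the prediction can spill a couple of residues past the true boundary, which is one reason window methods need the refinements discussed on the following slides.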

30 30 Improvements in the 1980’s. Adding information from conservation in MSA (multiple sequence alignment). Smarter algorithms (e.g. HMM, neural networks). Success rate rises to ~80%.

31 31 PHDsec and PSIpred. PHDsec (Rost & Sander, 1993): based on sequence family alignments. PSIpred (Jones, 1999): based on a Position Specific Scoring Matrix generated by PSI-BLAST. Both consider long-range interactions.

32 32 HMM. An HMM enables us to calculate the probability of assigning a sequence of hidden states to the observation. Observation: TGTAGPOLKCHIQWML. Hidden state: HHHHHHHLLLLBBBBB. p = ?

33 33 (Table built according to a large database of known secondary structures. Figure callouts: the probability of beginning with an α-helix; the probability of an α-helix residue being followed by an α-helix residue; the probability of a residue belonging to an α-helix being followed by a residue belonging to a turn = 0.15; the probability of observing alanine as part of a β-sheet.)

34 34 HMM. The above table enables us to calculate the probability of assigning a secondary structure to a protein. Example: observation TGQ, hidden states HHH. p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000020995 (about 2.1 x 10^-5).
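The product in this example can be reproduced with a few lines of code. The sketch below reads the six factors as: start in helix (0.45), emit T (0.041), helix to helix (0.8), emit G (0.028), helix to helix (0.8), emit Q (0.0635). That reading is an assumption, since the table itself is not part of the transcript, and the dictionaries hold only the entries needed for this one example.

```python
# Only the entries used in the slide's example; a real model would hold
# a full start vector, transition matrix and emission matrix.
START = {"H": 0.45}                      # P(first state is helix)
TRANS = {("H", "H"): 0.8}                # P(helix followed by helix)
EMIT = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def joint_probability(observed, states, start=START, trans=TRANS, emit=EMIT):
    """P(observed, states) for an HMM: start, then alternating emit/transition."""
    if not observed or len(observed) != len(states):
        raise ValueError("observed and states must be non-empty and equal length")
    p = start[states[0]] * emit[(states[0], observed[0])]
    for i in range(1, len(observed)):
        p *= trans[(states[i - 1], states[i])]
        p *= emit[(states[i], observed[i])]
    return p


if __name__ == "__main__":
    p = joint_probability("TGQ", "HHH")
    print(f"{p:.4e}")  # approximately 2.0995e-05
```

For long sequences the product underflows quickly, so practical implementations usually sum log-probabilities instead of multiplying raw values.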

35 35 SS prediction using ANN. Inputs for one position: the amino acid at that position.

36 36 PHDsec Neural Net. Inputs for one position: the amino acid at that position. Hidden layer. Outputs: H = helix, E = strand, C = coil. Confidence: 0 = low, 9 = high.
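One common way to present "the amino acid at a position" to a neural network is a one-hot vector, concatenated over a window of neighbouring positions. The sketch below assumes that encoding, with 21 units per position (20 amino acids plus one padding unit for positions beyond the sequence ends) and an odd window width. These choices are illustrative and are not taken from the PHDsec slides.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
PAD = len(ALPHABET)                 # index of the padding unit
UNITS = len(ALPHABET) + 1           # input units per window position


def encode_window(seq, center, width=13):
    """One-hot encode the window of `width` residues centred on `center`.

    Positions falling outside the sequence activate the padding unit.
    `width` is assumed to be odd so the window is symmetric.
    """
    half = width // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        slot = [0] * UNITS
        if 0 <= pos < len(seq):
            slot[ALPHABET.index(seq[pos])] = 1
        else:
            slot[PAD] = 1
        vector.extend(slot)
    return vector


if __name__ == "__main__":
    v = encode_window("ACDEFGHIK", center=4, width=5)
    print(len(v), sum(v))
```

The resulting vector would feed the hidden layer; the three outputs (H, E, C) are then read off for the central residue.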

37 37 Secondary structure prediction
AGADIR - An algorithm to predict the helical content of peptides
APSSP - Advanced Protein Secondary Structure Prediction Server
GOR - Garnier et al, 1996
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
JUFO - Protein secondary structure prediction from sequence (neural network)
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel University
SOPMA - Geourjon and Deléage, 1995
SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
DLP - Domain linker prediction at RIKEN

