CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK (DTU), April 8, 2003, Claus Lundegaard: Protein Secondary Structures, Assignment and prediction.


1 Protein Secondary Structures: Assignment and prediction

2 Use of secondary structure
Definition of domain boundaries
Classification of protein structures
Definition of loops/core
Use in fold recognition methods
Improvements of alignments

3 Secondary Structure Elements

4 Classification of secondary structure
Defining features
– Dihedral angles
– Hydrogen bonds
– Geometry
Assigned manually by crystallographers, or automatically
– DSSP (Kabsch & Sander, 1983)
– STRIDE (Frishman & Argos, 1995)
– Continuum (Andersen et al.)

5 Dihedral Angles
phi - dihedral angle about the N-Calpha bond
psi - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond
From http://www.imb-jena.de
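The three angles above are all instances of one calculation: the signed angle between two planes defined by four consecutive backbone atoms (phi uses C(i-1), N, Calpha, C; psi uses N, Calpha, C, N(i+1); omega uses Calpha, C, N(i+1), Calpha(i+1)). A minimal sketch in plain Python follows. The function name `dihedral` and the toy coordinates are illustrative assumptions, not taken from the slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 means cis, +/-180 means trans; the sign is positive when the
    p0-p1 bond must be rotated clockwise (looking from p1 to p2)
    to eclipse the p2-p3 bond.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not real atoms): a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applied to real coordinates, the same function would give phi, psi or omega depending on which four atoms are passed in.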

6 Secondary Structure Types
*H = α-helix
*B = residue in isolated β-bridge
*E = extended strand, participates in β-ladder
*G = 3-helix (3/10 helix)
*I = 5-helix (π-helix)
*T = hydrogen bonded turn
*S = bend

7 Alpha helices
                            phi (deg)   psi (deg)   H-bond pattern
------------------------------------------------------------------
right-handed alpha-helix      -57.8       -47.0         i+4
pi-helix                      -57.1       -69.7         i+5
3-10 helix                    -74.0        -4.0         i+3
(omega is 180 deg in all cases)
------------------------------------------------------------------
From http://www.imb-jena.de

8 Beta Strands
                phi (deg)   psi (deg)   omega (deg)
------------------------------------------------------------------
beta strand       -120         120         180
------------------------------------------------------------------
Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet, containing three antiparallel strands and one parallel strand, is drawn schematically. Hydrogen bonds are indicated with red lines (antiparallel strands) and green lines (parallel strands) connecting the hydrogen and the acceptor oxygen.
From http://broccoli.mfn.ki.se/pps_course_96/

9 Automatic assignment programs
DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
  # RESIDUE AA STRUCTURE BP1 BP2   ACC   N-H-->O   O-->H-N   N-H-->O   O-->H-N     TCO  KAPPA  ALPHA    PHI    PSI  X-CA  Y-CA  Z-CA
  1     4 A  E             0   0   205    0, 0.0    2,-0.3    0, 0.0    0, 0.0   0.000  360.0  360.0  360.0  113.5   5.7  42.2  25.1
  2     5 A  H -           0   0   127    2, 0.0    2,-0.4   21, 0.0   21, 0.0  -0.987  360.0 -152.8 -149.1  154.0   9.4  41.3  24.7
  3     6 A  V -           0   0    66   -2,-0.3   21,-2.6    2, 0.0    2,-0.5  -0.995    4.6 -170.2 -134.3  126.3  11.5  38.4  23.5
  4     7 A  I E -A       23   0A  106   -2,-0.4    2,-0.4   19,-0.2   19,-0.2  -0.976   13.9 -170.8 -114.8  126.6  15.0  37.6  24.5
  5     8 A  I E -A       22   0A   74   17,-2.8   17,-2.8   -2,-0.5    2,-0.9  -0.972   20.8 -158.4 -125.4  129.1  16.6  34.9  22.4
  6     9 A  Q E -A       21   0A   86   -2,-0.4    2,-0.4   15,-0.2   15,-0.2  -0.910   29.5 -170.4  -98.9  106.4  19.9  33.0  23.0
  7    10 A  A E +A       20   0A   18   13,-2.5   13,-2.5   -2,-0.9    2,-0.3  -0.852   11.5  172.8 -108.1  141.7  20.7  31.8  19.5
  8    11 A  E E +A       19   0A   63   -2,-0.4    2,-0.3   11,-0.2   11,-0.2  -0.933    4.4  175.4 -139.1  156.9  23.4  29.4  18.4
  9    12 A  F E -A       18   0A   31    9,-1.5    9,-1.8   -2,-0.3    2,-0.4  -0.967   13.3 -160.9 -160.6  151.3  24.4  27.6  15.3
 10    13 A  Y E -A       17   0A   36   -2,-0.3    2,-0.4    7,-0.2    7,-0.2  -0.994   16.5 -156.0 -136.8  132.1  27.2  25.3  14.1
 11    14 A  L E >> -A    16   0A   24    5,-3.2    4,-1.7   -2,-0.4    5,-1.3  -0.929   11.7 -122.6 -120.0  133.5  28.0  24.8  10.4
 12    15 A  N T 45S+      0   0    54   -2,-0.4   -2, 0.0    2,-0.2    0, 0.0  -0.884   84.3    9.0 -113.8  150.9  29.7  22.0   8.6
 13    16 A  P T 45S+      0   0   114    0, 0.0   -1,-0.2    0, 0.0   -2, 0.0  -0.963  125.4   60.5  -86.5    8.5  32.0  21.6   6.8
 14    17 A  D T 45S-      0   0    66    2,-0.1   -2,-0.2    1,-0.1    3,-0.1   0.752   89.3 -146.2  -64.6  -23.0  33.0  25.2   7.6
 15    18 A  Q T <5 +      0   0   132   -4,-1.7    2,-0.3    1,-0.2   -3,-0.2   0.936   51.1  134.1   52.9   50.0  33.3  24.2  11.2
 16    19 A  S E < +A     11   0A   44   -5,-1.3   -5,-3.2    2, 0.0    2,-0.3  -0.877   28.9  174.9 -124.8  156.8  32.1  27.7  12.3
 17    20 A  G E -A       10   0A   28   -2,-0.3    2,-0.3   -7,-0.2   -7,-0.2  -0.893   15.9 -146.5 -151.0 -178.9  29.6  28.7  14.8
 18    21 A  E E -A        9   0A   14   -9,-1.8   -9,-1.5   -2,-0.3    2,-0.4  -0.979    5.0 -169.6 -158.6  146.0  28.0  31.5  16.7
 19    22 A  F E +A        8   0A    3   12,-0.4   12,-2.3   -2,-0.3    2,-0.3  -0.982   27.8  149.2 -139.1  120.3  26.5  32.2  20.1
 20    23 A  M E -AB       7  30A    0  -13,-2.5  -13,-2.5   -2,-0.4    2,-0.4  -0.983   39.7 -127.8 -152.1  161.6  24.5  35.4  20.6
 21    24 A  F E -AB       6  29A   45    8,-2.4    7,-2.9   -2,-0.3    8,-1.0  -0.934   23.9 -164.1 -112.5  137.7  21.7  37.0  22.6
 22    25 A  D E -AB       5  27A    6  -17,-2.8  -17,-2.8   -2,-0.4    2,-0.5  -0.948    6.9 -165.0 -123.7  138.3  18.9  38.9  20.8
 23    26 A  F E > S-AB    4  26A   76    3,-3.5    3,-2.1   -2,-0.4  -19,-0.2  -0.947   78.4  -27.2 -127.3  111.5  16.4  41.3  22.3
 24    27 A  D T 3 S-      0   0    74  -21,-2.6  -20,-0.1   -2,-0.5   -1,-0.1   0.904  128.9  -46.6   50.4   45.0  13.4  42.1  20.2
 25    28 A  G T 3 S+      0   0    20  -22,-0.3    2,-0.4    1,-0.2   -1,-0.3   0.291  118.8  109.3   84.7  -11.1  15.4  41.4  17.0
 26    29 A  D E < S-B    23   0A  114   -3,-2.1   -3,-3.5  109, 0.0    2,-0.3  -0.822   71.8 -114.7 -103.1  140.3  18.4  43.4  18.1
 27    30 A  E E -B       22   0A    8   -2,-0.4   -5,-0.3   -5,-0.2    3,-0.1  -0.525   24.9 -177.7  -74.1  127.5  21.8  41.8  19.1

10 Secondary Structure Prediction
What to predict?
– All 8 types or pool types into groups (H, E, C)
*H = α-helix
*B = residue in isolated β-bridge
*E = extended strand, participates in β-ladder
*G = 3-helix (3/10 helix)
*I = 5-helix (π-helix)
*T = hydrogen bonded turn
*S = bend
*C/. = random coil
[Figure labels: Straight HEC; CASP; Q3]
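Pooling the eight classes into three states and scoring a prediction with Q3 (the percentage of residues whose three-state class is predicted correctly) can be sketched as below. The two mappings are assumptions about what the slide labels mean: "straight" is taken to keep only H and E and send everything else to C, while the pooled scheme (the convention commonly used in CASP-style evaluation) also counts G and I as helix and B as strand. The example strings are made up.

```python
# Assumed reading of "Straight HEC": only H and E are kept, the rest is coil.
STRAIGHT = {"H": "H", "E": "E"}
# Assumed pooled scheme: H, G, I -> H and E, B -> E, the rest is coil.
POOLED = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_states(dssp_string, mapping):
    """Map an 8-state DSSP string to 3 states; unmapped symbols become C."""
    return "".join(mapping.get(state, "C") for state in dssp_string)

def q3(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

observed = reduce_states("HGIEBTS-", POOLED)   # hypothetical 8-state string
score = q3("HHHEECCE", observed)               # hypothetical prediction
```

The choice of mapping changes the reference string, so Q3 values are only comparable between methods evaluated with the same reduction.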

11 Secondary Structure Prediction
Simple alignments
Heuristic methods (e.g., Chou-Fasman, 1974)
Neural networks (different inputs)
– Raw sequence (late 80's)
– Scoring matrix (e.g., PHD, early 90's)
– Position specific alignment profiles (e.g., PSIPRED, late 90's)
– Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000)

12 Simple Alignments
Solved structures homologous to the query are needed
Homologous proteins have ~88% identical (3 state) secondary structure
If no homologue can be identified, alignment will give almost random results

13 Amino acid preferences in α-Helix

14 Amino acid preferences in β-Strand

15 Amino acid preferences in coil

16 Chou-Fasman
Name  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Ala   142    83     66     0.060  0.076   0.035   0.058
Arg    98    93     95     0.070  0.106   0.099   0.085
Asp   101    54    146     0.147  0.110   0.179   0.081
Asn    67    89    156     0.161  0.083   0.191   0.091
Cys    70   119    119     0.149  0.050   0.117   0.128
Glu   151    37     74     0.056  0.060   0.077   0.064
Gln   111   110     98     0.074  0.098   0.037   0.098
Gly    57    75    156     0.102  0.085   0.190   0.152
His   100    87     95     0.140  0.047   0.093   0.054
Ile   108   160     47     0.043  0.034   0.013   0.056
Leu   121   130     59     0.061  0.025   0.036   0.070
Lys   114    74    101     0.055  0.115   0.072   0.095
Met   145   105     60     0.068  0.082   0.014   0.055
Phe   113   138     60     0.059  0.041   0.065   0.065
Pro    57    55    152     0.102  0.301   0.034   0.068
Ser    77    75    143     0.120  0.139   0.125   0.106
Thr    83   119     96     0.086  0.108   0.065   0.079
Trp   108   137     96     0.077  0.013   0.064   0.167
Tyr    69   147    114     0.082  0.065   0.114   0.125
Val   106   170     50     0.062  0.048   0.028   0.053

17 Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of parameters.
2. Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(a-helix) > 100. That region is declared an alpha-helix. Extend the helix in both directions until a set of four contiguous residues with an average P(a-helix) < 100 is reached. That is declared the end of the helix. If the segment defined by this procedure is longer than 5 residues and the average P(a-helix) > P(b-sheet) for that segment, the segment can be assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of the residues have a value of P(b-sheet) > 100. That region is declared a beta-sheet. Extend the sheet in both directions until a set of four contiguous residues with an average P(b-sheet) < 100 is reached. That is declared the end of the beta-sheet. Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(b-sheet) > 105 and the average P(b-sheet) > P(a-helix) for that region.
5. Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(a-helix) > P(b-sheet) for that region. It is a beta-sheet if the average P(b-sheet) > P(a-helix) for that region.
6. To identify a bend at residue number j, calculate the following value: p(t) = f(j) f(j+1) f(j+2) f(j+3), where the f(i) value is used for residue j, the f(i+1) value for residue j+1, the f(i+2) value for residue j+2 and the f(i+3) value for residue j+3. If (1) p(t) > 0.000075; (2) the average value for P(turn) > 1.00 in the tetrapeptide (100 on the scale of the parameter table on the previous slide); and (3) the averages for the tetrapeptide obey the inequality P(a-helix) < P(turn) > P(b-sheet), then a beta-turn is predicted at that location.
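The helix nucleation rule in step 2 (4 out of 6 contiguous residues with P(a-helix) > 100) can be sketched as a sliding window. The P(a) values are copied from the parameter table on the previous slide; the function name `helix_nuclei` and its arguments are illustrative assumptions. Extension, the beta-sheet scan, overlap resolution and turn prediction are left out.

```python
# P(a-helix) values from the Chou-Fasman parameter table, keyed by
# one-letter amino acid code.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70,
    "E": 151, "Q": 111, "G": 57, "H": 100, "I": 108,
    "L": 121, "K": 114, "M": 145, "F": 113, "P": 57,
    "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}

def helix_nuclei(sequence, window=6, needed=4, threshold=100):
    """Return 0-based start positions of windows that nucleate a helix.

    A window qualifies when at least `needed` of its `window` residues
    have P(a-helix) strictly above `threshold`.
    """
    starts = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        formers = sum(P_ALPHA[aa] > threshold for aa in segment)
        if formers >= needed:
            starts.append(start)
    return starts

# The example sequence used on later slides.
nuclei = helix_nuclei("IKEEHVIIQAEFYLNPDQSGEF")
```

Overlapping nucleation windows would normally be merged into one region before the extension step is applied.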

18 Chou-Fasman
Generally applicable
Works for sequences with no solved homologs
Low accuracy

19 Neural Networks
Benefits
– Generally applicable
– Can capture higher order correlations
– Inputs other than sequence information
Drawbacks
– Needs a lot of data (different solved structures)
– Risk of overtraining

20 Architecture
[Figure: a window (I K E E H V I I Q A E) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF... and feeds the input layer; weights connect the input layer to the hidden layer and the hidden layer to the output layer, which has the units H, E and C.]
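The architecture in the figure (window, input layer, hidden layer, three output units) corresponds to a single forward pass like the one sketched below. This is only a shape illustration under stated assumptions: the weights are random rather than trained, bias terms are omitted, sigmoid units are assumed, and the sizes (an 11-residue window with 20 inputs per residue, 50 hidden units) are picked for the example.

```python
import math
import random

def forward(inputs, w_hidden, w_output):
    """Feed-forward pass: inputs -> sigmoid hidden layer -> sigmoid outputs."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_output]

rng = random.Random(0)                 # fixed seed, untrained toy weights
n_inputs, n_hidden = 11 * 20, 50       # assumed sizes for illustration
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
w_output = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
            for _ in range(3)]

activities = forward([0.0] * n_inputs, w_hidden, w_output)
# The predicted class for the central residue is the most active output.
prediction = "HEC"[activities.index(max(activities))]
```

In a real predictor the weights would be fitted by training on solved structures, and the window would be moved one residue at a time along the sequence.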

21 Sparse encoding
Inp Neuron  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
AAcid
A           1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R           0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N           0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D           0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C           0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q           0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E           0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0

22 Input Layer
[Figure: the window I K E E H V I I Q A E. Each residue is presented to the input layer as its sparse vector; the vector shown, 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0, is the encoding of E.]
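Sparse (one-hot) encoding of a sequence window can be sketched as follows. The neuron order A R N D C Q E for the first seven amino acids is taken from the sparse encoding slide; the order of the remaining thirteen is an assumption (it follows the column order of the BLOSUM 62 slide). Padding for windows that run off the ends of the sequence is not handled here.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"   # assumed neuron order, 20 letters

def sparse_encode(residue):
    """Return a 20-element vector with a single 1 at the residue's neuron."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1
    return vector

def encode_window(window):
    """Concatenate the sparse vectors of all residues in the window."""
    return [bit for residue in window for bit in sparse_encode(residue)]

inputs = encode_window("IKEEHVIIQAE")  # 11 residues -> 220 input values
```

With this encoding an 11-residue window gives 220 inputs, of which exactly 11 are set to 1.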

23 BLOSUM 62
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4

24 Input Layer
[Figure: the window I K E E H V I I Q A E. Each residue is presented to the input layer as scores from the BLOSUM 62 matrix instead of a sparse vector; values legible on the slide: 0 0 2 -4 2 5 -2 0 -3 1 -2 -3 0 -3 -2.]

25 Structure to Structure
[Figure: the same layout as the sequence-to-structure network (window, input layer, weights, hidden layer, output layer), but the input for each window position is an H, E, C prediction from the first network rather than the sequence IKEEHVIIQAEFYLNPDQSGEF... itself; the output units are again H, E and C.]

26 PHD method (Rost and Sander)
Combine neural networks with sequence profiles
– 6-8 percentage points increase in prediction accuracy over standard neural networks
Use second layer "Structure to structure" network to filter predictions
Jury of predictors
Set up as mail server

27 Position specific scoring matrix
QUERY       1 IKEEHVIIQAEFYLNPDQSGEF 22
Q30631     26 IKEEHVIIQAEFYLNPDQSGEF 47
AAA59783   26 IKEEHVIIQAEFYLNPDQSGEF 47
AAA59787   26 IKEEHVIIQAEFYLNPDQSGEF 47
NP_061984  26 IKEEHVIIQAEFYLNPDQSGEF 47
CAA25076    1 IKEEHVIIQAEFYLNPDQSGEF 22
1102205B   14 IKEEHVIIQAEFYLNPDQSGEF 35
1HXY_A      1 IKEEHVIIQAEFYLNPDQSGEF 22
1SEB_A      1 IKEEHVIIQAEFYLNPDQSGEF 22
1AQD_A      1 IKEEHVIIQAEFYLNPDQSGEF 22
1BX2_A      1  KEEHVIIQAEFYLNPDQSGEF 21
AAB65589   26 IKEEHVIIQAEFYLKPDSSGEF 47
AAB65587   26 IKEEHVIIQAEFYLKPDSSGEF 47
BAA23385   25 IKEDHVIIQAEFYLNPEQSAEF 46
AAA36283   26 IKEEHVIIQAEFYLNPDQSGEF 47
AAA42357   25 IKEEHSIIQAEFYLSPDQSGEF 46
AAA42366   26 IKEEHSIIQAEFYLSPDQSGEF 47
AAA42362   26 IKEEHSIIQAEFYLSPDQSGEF 47
AAA42355   26 IKEEHSIIQAEFYLSPDQSGEF 47
AAA42358   24 IKEEHSIIQAEFYLSPDQSGEF 45
1DLH_A      1   EEHVIIQAEFYLNPDQSGEF 20
S06316     26 IKEEHTIIQAEFYLSPDQNGEF 47
AAA99463   26 VKEEHVIIQAEFYLTPDPSGEF 47
S15684     25 IKEDHVIIQAEFYLNPEESAEF 46
XP_215331  26 IREEHTIIQAEFYLSPDQNGEF 47
CAD86939   26 IREEHTIIQAEFYLSPDQNGEF 47
CAA77679   25 IKEDHVIIQAEFYLNPEESAEF 46
A48381      1 IKEDHVIIQAEFYLNPEESAEF 22
1KLU_A      1    EHVIIQAEFYLNPDQSGEF 19
1KLG_A      1    EHVIIQAEFYLNPDQSGEF 19
AAP37560   24 IVENHVIIQAEFYLSPDKSGEF 45
AAP03010   24 IVENHVIIQAEFYLSPDKSGEF 45
CAC28142   19 IREEHTIIQAEFYLSPDQNGEF 40
A46505     24 IVENHVIIQAEFYLSPDKSGEF 45
AAO63763   24 IVENHVIIQAEFYLSPDKSGEF 45
AAP37552   24 IVENHVIIQAEFYLSPDKSGEF 45
AAP37553   24 IVENHVIIQAEFYLSPDKSGEF 45
AAA31074   36 IVENHVIIQAEFYLSPDKSGEF 57
1A6A_A      1     HVIIQAEFYLNPDQSGEF 18
P01904     26 IKEEHTIIQAEFYLLPDKRGEF 47

28 Position specific scoring matrix
       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 I  -1 -3 -3 -3 -1 -3 -3 -4 -4  5  1 -3  1  0 -3 -3 -1 -3 -2  3
 2 K  -1  2  0 -1 -3  1  1 -2 -1 -3 -3  5 -2 -3 -1  0 -1 -3 -2 -3
 3 E  -1  0  0  1 -4  2  5 -2  0 -4 -3  1 -2 -4 -1  0 -1 -3 -2 -3
 4 E  -1 -1  0  3 -4  1  5 -2  0 -4 -3  0 -2 -4 -1  0 -1 -3 -2 -3
 5 H  -2  0  0 -1 -3  0  0 -2  8 -4 -3 -1 -2 -1 -2 -1 -2 -3  2 -3
 6 V   0 -2 -2 -2 -1 -2 -2 -3 -3  2  0 -2  0 -1 -2  1  2 -3 -2  4
 7 I  -2 -3 -4 -3 -1 -3 -4 -4 -4  5  1 -3  1  0 -3 -3 -1 -3 -2  3
 8 I  -2 -3 -4 -3 -1 -3 -4 -4 -4  5  1 -3  1  0 -3 -3 -1 -3 -2  3
 9 Q  -1  1  0 -1 -3  6  2 -2  0 -3 -2  1 -1 -4 -2  0 -1 -2 -2 -2
10 A   5 -2 -2 -2 -1 -1 -1  0 -2 -2 -2 -1 -1 -3 -1  1  0 -3 -2  0
11 E  -1  0  0  1 -4  2  5 -2  0 -4 -3  1 -2 -4 -1  0 -1 -3 -2 -3
12 F  -3 -3 -3 -4 -3 -4 -4 -3 -1  0  0 -3  0  7 -4 -3 -2  1  3 -1
13 Y  -2 -2 -2 -3 -3 -2 -2 -3  2 -2 -1 -2 -1  3 -3 -2 -2  2  7 -1
14 L  -2 -2 -4 -4 -2 -2 -3 -4 -3  1  4 -3  2  0 -3 -3 -1 -2 -1  1
15 N   0  0  4  0 -2  0  0 -1 -1 -3 -3  1 -2 -3 -1  2  2 -3 -2 -2
16 P  -1 -2 -2 -2 -3 -2 -1 -2 -2 -3 -3 -1 -3 -4  8 -1 -1 -4 -3 -3
17 D  -2 -1  1  6 -4  0  3 -2 -1 -3 -4 -1 -3 -4 -2  0 -1 -4 -3 -3
18 Q  -1  0  0 -1 -3  5  1 -2  0 -3 -3  1 -1 -3  3  1  0 -3 -2 -2
19 S   1 -1  3  0 -1  0  0 -1 -1 -3 -3  0 -2 -3 -1  4  1 -3 -2 -2
20 G   2 -2 -1 -2 -2 -2 -2  5 -2 -3 -3 -2 -3 -3 -2  0 -1 -3 -3 -3
21 E  -1  0  0  1 -4  2  5 -2  0 -4 -3  1 -2 -4 -1  0 -1 -3 -2 -3
22 F  -3 -3 -3 -4 -3 -4 -4 -3 -1  0  0 -3  0  7 -4 -3 -2  1  3 -1
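The idea behind a position specific scoring matrix, one row of scores per alignment column, can be illustrated with a deliberately simplified log-odds calculation. This is not the PSI-BLAST procedure (which uses sequence weighting, substitution-matrix-based pseudocounts and real background frequencies); the uniform background, the flat pseudocount and the function name `simple_pssm` are all assumptions made for the sketch. The four sequences are rows from the alignment slide.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def simple_pssm(alignment, pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background.

    Returns one dict (amino acid -> score) per alignment column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

alignment = [
    "IKEEHVIIQAEFYLNPDQSGEF",   # QUERY
    "IKEEHSIIQAEFYLSPDQSGEF",   # AAA42357
    "IKEDHVIIQAEFYLNPEQSAEF",   # BAA23385
    "IVENHVIIQAEFYLSPDKSGEF",   # AAP37560
]
profile = simple_pssm(alignment)
```

Residues that are frequent in a column score above zero and residues never seen there score below zero, which is the qualitative pattern visible in the matrix on the slide.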

29 Alignment depicted as path in matrix
[Figure: alignment matrix with the axis labels T C G C A and T C A. Two paths through the matrix are shown, corresponding to the alignments]
TCGCA    TCGCA
TC-CA    T-CCA

30 Alignment depicted as path in matrix
[Figure: alignment matrix with the axis labels T C G C A and T C A; one point is labeled "x".]
Meaning of point in matrix: all residues up to this point have been aligned (but there are many different possible paths).
Position labeled "x": TC aligned with TC
--TC    -TC    TC
TC--    T-C    TC
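Finding the best path through such a matrix is a dynamic programming problem. The sketch below fills the matrix for a global alignment and traces one best path back. The scoring values (match +1, mismatch -1, gap -1) and the second sequence TCCA (read off the alignments shown on the slides) are assumptions for the example; where several paths tie, the traceback simply returns one of them.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (best score, aligned a, aligned b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Each cell holds the best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace one optimal path back from the bottom-right corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))

best, aligned_a, aligned_b = global_align("TCGCA", "TCCA")
```

Both alignments shown on the earlier slide (TC-CA and T-CCA) reach the same score under this scoring, which is why more than one path can be optimal.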

31 Local alignment: example

32 Ψ-BLAST
[Figure: flow of a Ψ-BLAST search. The query IKEEHVIIQAEFYLNPDQSGEF is searched against the database (SwissProt) using the Blosum62 matrix; the resulting alignment (the one shown on slide 27) is converted into a position specific scoring matrix (the one shown on slide 28).]

33 Position specific scoring matrices (Ψ-BLAST profiles)
       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 I  -2 -4 -5 -5 -2 -4 -4 -5 -5  6  0 -4  0 -2 -4 -4 -2 -4 -3  4
 2 K  -1 -1 -2 -2 -3 -1  3 -3 -2 -2 -3  4 -2 -4 -3  1  1 -4 -3  2
 3 E   5 -3 -3 -3 -3  3  1 -2 -3 -3 -3 -2 -2 -4 -3 -1 -2 -4 -3  1
 4 E  -4 -3  2  5 -6  1  5 -4 -3 -6 -6 -2 -5 -6 -4 -2 -3 -6 -5 -5
 5 H  -4  2  1  1 -5  1 -2 -4  9 -5 -2 -3 -4 -4 -5 -3 -4 -5  1 -5
 6 V  -3  0 -4 -5 -4 -4 -2 -3 -5  1 -2  1  0  1 -4 -3  3 -5 -3  5
 7 I   0 -2 -4  1 -4 -2 -4 -4 -5  1  0 -2  0  2 -5  1 -1 -5 -3  4
 8 I  -3  0 -5 -5 -4 -2 -5 -6  1  2  4 -4 -1  0 -5 -2  0 -3  5 -1
 9 Q  -2 -3 -2 -3 -5  4 -1  3  5 -5 -3 -3 -4 -2 -4  2 -1 -4  2 -2
10 A   2 -4 -4 -3  2 -3 -1 -4 -2  1 -1 -4 -3 -4  1  2  3 -5 -1  1
11 E  -1  3  1  1 -1  0  1 -4 -3 -1 -3  0  3 -5  4 -1 -3 -6 -3 -1
12 F  -3 -5 -5 -5 -4 -4 -4 -1 -1  1  1 -5  2  5 -1 -4 -4 -3  5  2
13 Y   3 -5 -5 -6  3 -4 -5 -2 -1  0 -4 -5 -3  3 -5 -2 -2 -2  7  1
14 L  -1 -3 -4 -2  1  5  1 -1 -1 -1  1 -3 -3  1 -5 -1 -1 -2  3 -2
15 N  -1 -4  4  1  5 -3 -4  2 -4 -4 -4 -3 -2 -4 -5  2  0 -5  0  0
16 P  -2  4 -4 -4 -5  0 -3  3  2 -5 -4  0 -4 -3  0  1 -2 -1  5 -3
17 D  -3 -2  1  5 -6 -2  2  2 -1 -2 -2 -3 -5 -4 -5 -1  2 -6 -3 -4

34 Profile development

35 PSI-Pred (Jones, DT)
Use alignments from iterative sequence searches (PSI-Blast) as input to a neural network
Better predictions due to better sequence profiles
Available as a stand-alone program and via the web

36 Improvement of accuracy
1974  Chou & Fasman      ~50-53%
1978  Garnier            63%
1987  Zvelebil           66%
1988  Qian & Sejnowski   64.3%
1993  Rost & Sander      70.8-72.0%
1997  Frishman & Argos   <75%
1999  Cuff & Barton      72.9%
1999  Jones              76.5%
2000  Petersen et al.    77.9%

37 Benchmarking secondary structure predictions
CASP
– Critical Assessment of Structure Predictions
– Sequences from about-to-be-solved structures are given to groups, who submit their predictions before the structure is published
EVA
– Newly solved structures are sent to prediction servers.

38 EVA results (Rost et al., 2001)
PROFphd     77.0%
PSIPRED     76.8%
SAM-T99sec  76.1%
SSpro       76.0%
Jpred2      75.5%
PHD         71.7%
– Cubic.columbia.edu/eva

39 Output expansion
[Figure: the sequence-to-structure network (window I K E E H V I I Q A E over the sequence IKEEHVIIQAEFYLNPDQSGEF..., input layer, weights, hidden layer, output layer) drawn with three sets of H, E, C output units instead of one.]

40 Several different architectures
Sequence-to-structure
– Window sizes: 15, 17, 19 and 21
– Hidden units: 50 and 75
– 10-fold cross validation => 80 predictions
Structure-to-structure
– Window size: 17
– Hidden units: 40
– 10-fold cross validation => 800 predictions
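The prediction counts on this slide follow from simple enumeration: 4 window sizes times 2 hidden layer sizes times 10 cross-validation folds gives 80 sequence-to-structure networks. The sketch below spells that out. Reading the 800 as each of those 80 predictions passed through 10 structure-to-structure networks (one per fold) is an assumption about how the numbers combine, not something the slide states.

```python
from itertools import product

window_sizes = (15, 17, 19, 21)
hidden_units = (50, 75)
folds = range(10)

# One sequence-to-structure network per (window, hidden units, fold).
seq_to_struct = list(product(window_sizes, hidden_units, folds))

# Assumed: every first-layer prediction is filtered by the ten
# structure-to-structure networks (window 17, 40 hidden units).
struct_to_struct = list(product(seq_to_struct, folds))

n_first, n_second = len(seq_to_struct), len(struct_to_struct)
```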

41 Balloting procedure
Confidence of a per residue prediction
– P(highest) - P(second highest)
– H: 0.80, E: 0.05, C: 0.15 => conf. = 0.65
Mean per chain confidence for all 800 predictions
– Calculate mean and standard deviation
– Averaging of per chain predictions with [two symbols missing in the source]
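The per residue confidence defined above, and its mean over a chain, can be sketched as follows. The first residue reproduces the example from the slide; the second residue and the function names are made up for illustration. The selection rule used in the actual balloting is not recoverable from the slide text, so it is not implemented here.

```python
import statistics

def confidence(probs):
    """Highest class probability minus the second highest."""
    first, second = sorted(probs.values(), reverse=True)[:2]
    return first - second

def mean_chain_confidence(chain):
    """Mean per residue confidence over one predicted chain."""
    return statistics.mean([confidence(p) for p in chain])

chain = [
    {"H": 0.80, "E": 0.05, "C": 0.15},   # example from the slide
    {"H": 0.40, "E": 0.35, "C": 0.25},   # hypothetical second residue
]
residue_conf = confidence(chain[0])
chain_conf = mean_chain_confidence(chain)
```

Computed for each of the 800 predictions of a chain, these per chain values are the numbers whose mean and standard deviation the slide refers to.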

42 Activities to probabilities
[Figure: coil conversion table. Helix activities and strand activities, each binned as 0.05, 0.1, 0.15, ... 1.0, index a table of coil probabilities; values legible on the slide: 0.99, 0.9, 0.83, 0.75.]

43 EVA (400 low homology proteins)
Ranking  Group name           Q3 performance
1        SBI-AT               77.2 %
2        PROFsec (B. Rost)    76.3 %
3        Psi-pred (D. Jones)  76.2 %
Sequence profiles as input
Neural network technology
Balloting of large number of neural network predictions (0.2%)
Output expansion (0.5%)
Probability transformation (1.2%)
Petersen et al., Proteins, 41: 17-20, 2000

44 Links to servers
Database of links
– http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
ProfPHD
– http://cubic.bioc.columbia.edu/
PSIPRED
– http://bioinf.cs.ucl.ac.uk/psipred/
JPred
– www.compbio.dundee.ac.uk/Software/JPred/jpred.html

45 Practical Conclusion
If you need a secondary structure prediction, use one of the newer methods, such as
– ProfPHD,
– PSIPRED, and
– JPred
and not one of the older ones, such as
– Chou-Fasman, and
– Garnier

46 Topology: Surface exposure

47 Automatic assignment programs
DSSP ( http://www.cmbi.kun.nl/gv/dssp/ )
STRIDE ( http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html )
[The DSSP output of slide 9 is shown again; its ACC column lists the solvent accessibility of each residue.]

48 Prediction Server PhdAcc

49 Topology: Transmembrane regions prediction

50 Transmembrane prediction servers
List (http://us.expasy.org/tools/)
– HMMTOP - Prediction of transmembrane helices and topology of proteins (Hungarian Academy of Sciences)
– TMHMM - Prediction of transmembrane helices in proteins (CBS; Denmark)
– TopPred 2 - Topology prediction of membrane proteins (Stockholm University)

