Protein Sequence Motifs

1 Protein Sequence Motifs
Protein Analysis Protein Sequence Motifs Aalt-Jan van Dijk Plant Research International, Wageningen UR Biometris, Wageningen UR © 2008, Jack A.M. Leunissen

2 Plant Bioinformatics Integrated analysis of omics datasets Genomics
27/01/2018 Integrated analysis of omics datasets
Genomics: next-generation sequencing; genome assembly & annotation; (comparative) genome analysis; SNP analysis, marker development
Transcriptomics: alternative splicing; EST analysis
Proteomics: data (pre-)processing, pipelining; protein interaction networks
Metabolomics: database development; metabolite and pathway identification
Systems biology: network modelling (bottom-up)
Technology: computational infrastructure; database development; web-based analysis tools; software development; workflow management systems; machine learning

3 My research Protein complex structures Protein-protein docking
My research: protein complex structures; protein-protein docking; correlated mutations; interaction site prediction/analysis; protein-protein interactions; protein-DNA interactions; motif search; enzyme active sites

4 Overview Protein Motif Searching
Overview: Protein Motif Searching; Hydrophobicity & Transmembrane Domains; Protein Interactions (sequence motifs to predict interaction sites); Secondary Structure Prediction

5 Protein Motif Searching

6 What is a motif?
A motif is a description of a particular element of a protein that contains a specific sequence pattern. Motifs are identified by: 3D structural alignment; multiple sequence alignment; pattern-searching programs.

8 Protein Motif Searching
Strict consensus pattern: use only strictly conserved residues.
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
C     C   P     C
CxxxxxCxxxPxxxxxC

10 Protein Motif Searching
Strict consensus pattern: use only strictly conserved residues. But what about variable residues? And gaps?
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
C     C   P     C
CxxxxxCxxxPxxxxxC

11 Protein Motif Searching
Strict consensus patterns contain: no alternative residues; no flexible regions; no mismatches; no gaps.
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
C     C   P     C
CxxxxxCxxxPxxxxxC
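The strict consensus on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the three aligned rows are read exactly as shown; the function name `strict_consensus` is illustrative and not part of the original material. A column is kept only when every sequence has the same residue and none has a gap.

```python
def strict_consensus(alignment):
    """Return one character per column: the residue if all rows agree, else 'x'."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        residues = set(column)
        # Conserved only if the column holds a single residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            consensus.append(column[0])
        else:
            consensus.append("x")
    return "".join(consensus)


# The three aligned sequences from the slide.
alignment = [
    "C--QASCDGIPLKMNDC",
    "C---VTCEGLPMRMDQC",
    "CERTLGCQPMPVH---C",
]
pattern = strict_consensus(alignment)
print(pattern)  # CxxxxxCxxxPxxxxxC
```

Because every non-conserved column collapses to a fixed-width `x`, the result cannot express alternative residues or variable-length gaps.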

12 Protein Motif Searching
Most motifs are defined as regular expressions. Motifs can contain alternative residues and flexible regions.
C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C
  CXXXCXGXPXXXXXC
  |   | | |     |
FGCAKLCAGFPLRRLPCFYG

13 The PROSITE Syntax
A-[BC]-X-D(2,5)-{EFG}-H
A: A
[BC]: B or C
X: anything
D(2,5): 2-5 D's
{EFG}: not E, F, or G
H: H
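Since PROSITE patterns map closely onto regular expressions, a small translator makes the syntax concrete. The following is a sketch under stated assumptions: it handles only the elements listed on this slide plus the `<` and `>` terminus anchors, and the helper name `prosite_to_regex` is illustrative, not an existing library function.

```python
import re

# One PROSITE element, e.g. "x(2,5)", "[GP]", "{EFG}" or "D(2,5)".
_ELEMENT = re.compile(
    r"(<?)"                              # '<' anchors the N-terminus
    r"([A-Z]|x|\[[A-Z]+\]|\{[A-Z]+\})"   # residue, wildcard, allowed set, excluded set
    r"(?:\((\d+)(?:,(\d+))?\))?"         # optional repeat: (n) or (n,m)
    r"(>?)"                              # '>' anchors the C-terminus
)


def prosite_to_regex(pattern):
    """Translate a (simple) PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, low, high, end = match.groups()
        if core in ("x", "X"):           # the slide writes the wildcard as X
            core = "."
        elif core.startswith("{"):       # {EFG} means "anything but E, F, G"
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C")
hit = re.search(regex, "FGCAKLCAGFPLRRLPCFYG")
print(regex)                      # C.{2,5}C.[GP].P.{2,5}C
print(hit.start() + 1, hit.group())  # 3 CAKLCAGFPLRRLPC
```

The second print reports the 1-based start position and the matched segment in the example sequence used earlier for the regular-expression motif.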

14 PROSITE entries
Mandatory motifs characterise a protein (super-)family.
ID   SUBTILASE_ASP; PATTERN.
DE   Serine proteases, subtilase family, aspartic acid active site.
PA   [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].
ID   SUBTILASE_HIS; PATTERN.
DE   Serine proteases, subtilase family, histidine active site.
PA   H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].
ID   SUBTILASE_SER; PATTERN.
DE   Serine proteases, subtilase family, serine active site.
PA   G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

15 Exercise
Find the three subtilase motifs in PROSITE (prosite.expasy.org). Compare the lists of proteins in which the motifs occur: what does this tell you? Similarly, compare the protein structures in which the motifs occur. Have a look at the "sequence logo".

16 Protein Motif Searching
Some motifs occur frequently in proteins, so a pattern match does not mean the feature is actually present. Post-translational modification sites are an example:
ID   ASN_GLYCOSYLATION; PATTERN.
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
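To see how often such a short pattern fires, the N-glycosylation pattern can be scanned for directly. This is a minimal sketch: the regular expression is a hand translation of N-{P}-[ST]-{P}, the example sequence is hypothetical, and `find_candidate_sites` is an illustrative name. A lookahead is used so that overlapping candidates are all reported.

```python
import re

# Regex form of the PROSITE pattern N-{P}-[ST]-{P}; wrapping it in a
# lookahead lets overlapping candidate sites be found.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_candidate_sites(sequence):
    """Return (1-based position, matched residues) for every pattern hit."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYC.finditer(sequence)]


example = "ANGSAANPTAANNSTA"  # hypothetical sequence, for illustration only
sites = find_candidate_sites(example)
print(sites)  # [(2, 'NGSA'), (12, 'NNST'), (13, 'NSTA')]
```

The N at position 7 is skipped because it is followed by P. Every hit is only a candidate: whether the asparagine is really glycosylated cannot be read from the pattern alone.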

17 Exercise
Use a glycosylation site predictor such as … Input: your favorite set of sequences. Do you observe that some N-{P}-[ST] sites are likely to be glycosylated and others not?

18 Profiles
Many motifs cannot be easily defined using simple patterns. Such motifs can be defined using profiles. A profile is constructed from a multiple sequence alignment. For each position, each amino acid is given a score depending on how likely it is to occur.

19 Calculating a Profile
For each alignment position: take the (weighted) average of the appropriate rows from the scoring matrix.
An (extremely simple) example:
seq_01 A A A A A A A A A A W
seq_02 A A A A A A A A A W W
seq_03 A A A A A A A A W W W
seq_04 A A A A A A A W W W W
seq_05 A A A A A A W W W W W
seq_06 A A A A A W W W W W W
seq_07 A A A A W W W W W W W
seq_08 A A A W W W W W W W W
seq_09 A A W W W W W W W W W
seq_10 A W W W W W W W W W W

20 prophecy (EMBOSS), using Henikoff profile type and BLOSUM62 matrix
[Figure: excerpt from the EBLOSUM62 matrix (rows A and W; columns A R N D C Q E G H I L K M F P S T W Y V) and the resulting profile rows (columns A C D E F G H I K L M N P Q R S T V W Y) for the alignment columns 10A, 5A+5W and 10W.]
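The averaging step for the columns 10A, 5A+5W and 10W can be sketched as follows. This is an unweighted illustration, not a reproduction of prophecy: it assumes equal sequence weights (no Henikoff weighting or scaling) and uses only the three BLOSUM62 entries needed here (A/A = 4, A/W = -3, W/W = 11), so the numbers will not necessarily match the EMBOSS output.

```python
from collections import Counter

# Excerpt of BLOSUM62 restricted to the two residues in the toy alignment.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "W"): -3,
    ("W", "A"): -3, ("W", "W"): 11,
}

# seq_01 .. seq_10 from the example: k trailing W's in sequence k.
alignment = ["A" * (11 - k) + "W" * k for k in range(1, 11)]


def profile_column(column, residues="AW"):
    """Average the scoring-matrix rows of the residues observed in one column."""
    counts = Counter(column)
    total = len(column)
    return {
        res: sum(n * BLOSUM62_EXCERPT[(obs, res)] for obs, n in counts.items()) / total
        for res in residues
    }


columns = list(zip(*alignment))
print(profile_column(columns[0]))   # 10A    -> {'A': 4.0, 'W': -3.0}
print(profile_column(columns[5]))   # 5A+5W  -> {'A': 0.5, 'W': 4.0}
print(profile_column(columns[10]))  # 10W    -> {'A': -3.0, 'W': 11.0}
```

In the mixed column, W scores higher than A even though both occur five times, because the W/W entry of the matrix is much larger than the A/A entry.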

21 Pattern Searching
Short linear motifs: e.g.
Profiles: MEME

22 Exercise
Use a number of sequences which contain the PROSITE subtilase motif and find motifs in those sequences with MEME.

23 Hydropathy Plot
Prediction of hydrophobic and hydrophilic regions in a protein.

24 Partition Coefficients
[Figure: partitioning between oil and water; hydrophobic vs. hydrophilic.]

25 Hydrophobicity/Hydrophilicity Values
Scales: Fauchere & Pliska; Kyte & Doolittle; Hopp & Woods; Eisenberg.
R K D Q N E H S T P Y C G A M W L V F I (ordered from hydrophilic to hydrophobic)

26 Hydrophobicity Plot
Sum the amino acid hydrophobicity values in a given window. Plot the value in the middle of the window. Shift the window one position.

27 Sliding Window Approach
Calculate the property for the first sub-sequence. Use the result (plot/print/store). Move to the next residue position, and repeat.
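The sliding-window procedure translates almost directly into code. The sketch below assumes the Kyte & Doolittle scale and a window of 7 (both choices are illustrative), and applies it to the example sequence MEZCALTASTESVERYNICE. The ambiguity code Z (Glu or Gln) is not part of the scale; since E and Q both score -3.5, Z is given that value here.

```python
# Kyte & Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumption: Z (Glu or Gln) takes the value shared by E and Q.
KYTE_DOOLITTLE["Z"] = -3.5


def hydrophobicity_profile(sequence, window=7):
    """Return (1-based centre position, summed hydrophobicity) per window."""
    if window % 2 == 0 or not 0 < window <= len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        total = sum(KYTE_DOOLITTLE[res] for res in segment)
        # Report the value at the middle residue of the window.
        profile.append((start + half + 1, total))
    return profile


profile = hydrophobicity_profile("MEZCALTASTESVERYNICE", window=7)
for position, value in profile:
    print(f"{position:3d} {value:6.1f}")
```

With a 20-residue sequence and a window of 7, the profile has 14 points, centred on residues 4 to 17; the first and last three residues get no value of their own.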

28 Hydrophobicity Plot
Example sequence: MEZCALTASTESVERYNICE

35 Transmembrane Regions
In an α-helix, the rotation is 100 degrees per amino acid residue and the climb is 1.5 Å per residue.

36 Transmembrane Regions
The membrane is about 30 Å thick, so we need approx. 30 / 1.5 = 20 amino acids to span it.
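The arithmetic behind this estimate, written out as a short sketch. The constants are the values quoted on the slides (30 Å membrane, 1.5 Å climb and 100 degrees rotation per residue); the variable names are illustrative.

```python
MEMBRANE_THICKNESS = 30.0    # Å, value used on the slide
CLIMB_PER_RESIDUE = 1.5      # Å per residue along the helix axis
ROTATION_PER_RESIDUE = 100   # degrees per residue

# Residues needed for a helix to cross the membrane once.
residues_needed = MEMBRANE_THICKNESS / CLIMB_PER_RESIDUE

# One full turn is 360 degrees, i.e. 3.6 residues.
residues_per_turn = 360 / ROTATION_PER_RESIDUE

# Number of helical turns inside the membrane.
turns = residues_needed / residues_per_turn

print(residues_needed)    # 20.0
print(residues_per_turn)  # 3.6
print(round(turns, 1))    # 5.6
```

A window of about this length (for example 19, as in the window-size comparison) is therefore a natural choice when looking for membrane-spanning segments in a hydrophobicity plot.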

38 Adapting the window size to the size of the membrane-spanning segment makes the picture easier to interpret.

39 [Figure: four plots, with window = 1, window = 9, window = 19 and window = 121.]

40 Protein Interactions

41 Protein Interactions
Obligatory: hemoglobin (explain homodimer, heterodimer)

42 Protein Interactions
Obligatory: hemoglobin. Transient: mitochondrial Cu transporters.

43 Experimental approaches (1)
Yeast two-hybrid (Y2H)
Old / small-scale: co-immunoprecipitation
Disadvantages: problems with homodimers, problems with large complexes

44 Experimental approaches (2)
Affinity purification + mass spectrometry (AP-MS). Also detects indirect interactions.

45 Interaction Databases
STRING

46 Interaction Databases
PPI + genetic / function association

47 Interaction Databases
STRING HPRD

48 Interaction Databases
Only human and only small-scale based

49 Interaction Databases
STRING HPRD InteroPorc Many others…. E.g. see

50 Yeast protein interaction network

51 Sequence-based Protein Binding Site Prediction

52 Binding site

53 Binding site

54 Predefined motifs

55 Predefined motifs

56 Predefined motifs

57 Predefined motifs

58 Motif search in groups of proteins
Group proteins that have the same interaction partner. Use motif search, e.g. find PWMs (position weight matrices). Neduva, PLoS Biol 2005

59 Motif search in groups of proteins
Group proteins that have the same interaction partner. Use motif search.

60 Correlated Motif Search

61 Correlated Motif Search
Interactors        Non-interactors
AARLL  PLTEQ       AARLL  MARLT
MARLT  DLTEP       VVRLM  MARLT
VVRLM  MMTER       PLTEQ  DLTEP
Correlated Motif Pair: (RL,TE)
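The idea of a correlated motif pair can be checked on the toy data of this slide. The sketch below assumes the sequences pair up as (AARLL, PLTEQ), (MARLT, DLTEP), (VVRLM, MMTER) for the interactors and (AARLL, MARLT), (VVRLM, MARLT), (PLTEQ, DLTEP) for the non-interactors, which is one reading of the slide layout. It only scores a given motif pair; it is not the search algorithm from the cited work, and the function names are illustrative.

```python
interactors = [("AARLL", "PLTEQ"), ("MARLT", "DLTEP"), ("VVRLM", "MMTER")]
non_interactors = [("AARLL", "MARLT"), ("VVRLM", "MARLT"), ("PLTEQ", "DLTEP")]


def pair_has_motifs(pair, motif_a, motif_b):
    """True if one protein contains motif_a and the other contains motif_b."""
    first, second = pair
    return (motif_a in first and motif_b in second) or (
        motif_b in first and motif_a in second
    )


def support(pairs, motif_a, motif_b):
    """Number of protein pairs covered by the motif pair."""
    return sum(pair_has_motifs(pair, motif_a, motif_b) for pair in pairs)


print(support(interactors, "RL", "TE"))      # 3
print(support(non_interactors, "RL", "TE"))  # 0
```

The pair (RL, TE) covers all three interacting pairs and none of the non-interacting ones, which is what makes it a good candidate for an interaction motif pair.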

62 Experimental validation
Van Dijk et al., PLoS Comput Biol 2010

63 New approach: SLIDER
Faster approach: genome-wide searching for interaction motifs. Improve the mining algorithm with a priori biological knowledge (conservation score, surface accessibility). Boyen et al., IEEE/ACM Trans Comput Biol Bioinform. 2011

64 THE END. Questions?

66 Secondary Structure Prediction

67 Secondary Structure Prediction
Traditional methods (statistical and/or rule-based), e.g. Garnier, Osguthorpe & Robson (GOR): statistical method, accuracy ~ 60%.

68 GOR Helix Parameters
[Figure: GOR helix parameters per amino acid (gly ala val leu ile ser thr asp glu asn gln lys his arg phe tyr trp cys met pro) for window positions i-8 to i+8.]

69 I S G A R N I E R H E L I X P R E D I C T
[Figure: the example sequence ISGARNIERHELIXPREDICT laid over the GOR helix parameter table (window positions i-8 to i+8; rows gly ala val leu ile ser thr asp glu asn gln lys his arg phe tyr trp cys met pro).]

70 GOR Prediction
[Figure: GOR prediction curves for beta sheet and helix.]

71 Secondary Structure Prediction
Recent methods: neural networks = flexible statistics; multiple alignments = variability; heuristics = common sense; or a combination of the above. Accuracy ~ 70%.

72 Heuristics
Conserved parts are structurally and/or functionally important. Segments with many gaps must be in loop regions.

73 Secondary Structure Prediction
Strategy: use as many methods as possible; use homologous sequences; combine the predictions into a consensus prediction.
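Combining several predictions into a consensus can be as simple as a per-residue majority vote. The sketch below assumes three-state predictions written as strings of H (helix), E (strand) and C (coil), all of the same length; the three input predictions are invented for illustration, and resolving ties as coil is an arbitrary choice, not a rule from the slides.

```python
from collections import Counter


def consensus_prediction(predictions, tie_state="C"):
    """Per-position majority vote over equally long H/E/C prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        # Assumption: when two states share the top count, fall back to coil.
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append(tie_state if tied else top_state)
    return "".join(consensus)


# Hypothetical output of three different prediction methods.
predictions = [
    "CCHHHHHCCEEEECC",
    "CHHHHHHCCCEEECC",
    "CCHHHHCCCEEEECC",
]
consensus = consensus_prediction(predictions)
print(consensus)  # CCHHHHHCCEEEECC
```

The methods disagree mainly at the ends of the helix and the strand, and the vote settles those positions by majority.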

74 Why can’t it be 100% correct?
All current 2D prediction schemes are based upon observation of the occurrence of 2D elements in 3D structures. Deduction of 2D elements from structures is ambiguous! DSSP, Stride, and the PDB (human) annotation do not always agree upon the assigned elements.

75 Do these residues still belong to the helix?

