1
Protein Sequence Motifs
Protein Analysis: Protein Sequence Motifs
Aalt-Jan van Dijk
Plant Research International, Wageningen UR
Biometris, Wageningen UR
© 2008, Jack A.M. Leunissen
2
Plant Bioinformatics
27/01/2018
Integrated analysis of omics datasets
Genomics: Next Generation Sequencing; genome assembly & annotation; (comparative) genome analysis; SNP analysis, marker development
Technology: computational infrastructure; database development; web-based analysis tools; software development; workflow management systems; machine learning
Transcriptomics: alternative splicing; EST analysis
Proteomics: data (pre-)processing pipelining; protein interaction networks
Metabolomics: database development; metabolite and pathway identification
Systems biology: network modelling (bottom-up)
3
My research
Protein complex structures
Protein-protein docking
Correlated mutations
Interaction site prediction/analysis
Protein-protein interactions
Protein-DNA interactions
Motif search
Enzyme active sites
4
Overview
Protein Motif Searching
Hydrophobicity & Transmembrane Domains
Protein Interactions
  Sequence motifs to predict interaction sites
Secondary Structure Prediction
5
Protein Motif Searching
6
What is a motif?
A motif is a description of a particular element of a protein that contains a specific sequence pattern
Motifs are identified by:
  3D structural alignment
  Multiple sequence alignment
  Pattern searching programs
8
Protein Motif Searching
Strict consensus pattern: use only strictly conserved residues
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
C     C   P     C
CxxxxxCxxxPxxxxxC
10
Protein Motif Searching
Strict consensus pattern: use only strictly conserved residues
But what about:
  variable residues?
  gaps?
11
Protein Motif Searching
Strict consensus patterns contain:
  no alternative residues
  no flexible regions
  no mismatches
  no gaps
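A minimal sketch of how the strict consensus for the three aligned sequences could be derived. The function name `strict_consensus` is an illustrative assumption, not part of any tool mentioned in the slides; a column is kept only when every sequence carries the same residue and none has a gap.

```python
def strict_consensus(alignment):
    """Return the strict consensus of equally long aligned sequences."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        residues = set(column)
        # Keep a residue only if every sequence has it; gap columns never count.
        if len(residues) == 1 and "-" not in residues:
            consensus.append(column[0])
        else:
            consensus.append("x")
    return "".join(consensus)


alignment = [
    "C--QASCDGIPLKMNDC",
    "C---VTCEGLPMRMDQC",
    "CERTLGCQPMPVH---C",
]
print(strict_consensus(alignment))  # CxxxxxCxxxPxxxxxC
```

Any column with a substitution or a gap collapses to `x`, which is exactly why such patterns cannot express alternative residues or flexible regions.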
12
Protein Motif Searching
Most motifs are defined as regular expressions
Motifs can contain:
  alternative residues
  flexible regions
C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C
Example match:
  CXXXCXGXPXXXXXC
  |   | | |     |
FGCAKLCAGFPLRRLPCFYG
13
The PROSITE Syntax
A-[BC]-X-D(2,5)-{EFG}-H
  A        A
  [BC]     B or C
  X        anything
  D(2,5)   2-5 D's
  {EFG}    not E, F, or G
  H        H
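The syntax elements above map almost one-to-one onto ordinary regular expressions. The sketch below is a hypothetical helper (`prosite_to_regex` is not a PROSITE or EMBOSS function) that covers only the elements shown on this slide; other PROSITE features such as the `<` and `>` anchors are deliberately left out.

```python
import re

# One PROSITE element: a residue, x, [..] or {..}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Za-z]+\]|\{[A-Za-z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                       # anything
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # not these residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


print(prosite_to_regex("A-[BC]-X-D(2,5)-{EFG}-H"))  # A[BC].D{2,5}[^EFG]H
regex = prosite_to_regex("C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C")
print(re.search(regex, "FGCAKLCAGFPLRRLPCFYG").group())  # CAKLCAGFPLRRLPC
```

The second call reproduces the match from the regular-expression slide: the flexible `x(2,5)` regions take three and five residues respectively.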
14
PROSITE entries
Mandatory motifs characterise a protein (super-)family

ID   SUBTILASE_ASP; PATTERN.
DE   Serine proteases, subtilase family, aspartic acid active site.
PA   [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

ID   SUBTILASE_HIS; PATTERN.
DE   Serine proteases, subtilase family, histidine active site.
PA   H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

ID   SUBTILASE_SER; PATTERN.
DE   Serine proteases, subtilase family, serine active site.
PA   G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].
15
Exercise
Find the three subtilase motifs in PROSITE (prosite.expasy.org)
Compare the lists of proteins in which the motifs occur: what does this tell you?
Similarly, compare protein structures in which the motifs occur
Have a look at the "sequence logo"
16
Protein Motif Searching
Some motifs occur frequently in proteins; a pattern match does not mean the site is actually functional
Examples are post-translational modification sites:

ID   ASN_GLYCOSYLATION; PATTERN.
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
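To see how often such a short pattern fires, a scan like the following sketch can be used. The regular expression is a direct transcription of N-{P}-[ST]-{P}; the function name and the toy sequence are made up for illustration, and every hit is only a candidate site, not evidence of actual glycosylation.

```python
import re

# N-{P}-[ST]-{P} as a regular expression; the lookahead also reports
# candidate sites that overlap each other.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence):
    """Return (1-based position of the Asn, matched residues) per candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]


# Hypothetical toy sequence, not a real protein.
print(find_sequons("MNGSANPTNNTS"))  # [(2, 'NGSA'), (9, 'NNTS')]
```

The Asn at position 6 is skipped because it is followed by Pro, and the Asn at position 10 lacks a fourth residue before the sequence ends.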
17
Exercise
Use a glycosylation site predictor
Input: your favorite set of sequences
Do you observe that some N-{P}-[ST] sites are likely to be glycosylated and others not?
18
Profiles
Many motifs cannot be easily defined using simple patterns
Such motifs can be defined using profiles
A profile is constructed from a multiple sequence alignment. For each position, each amino acid is given a score depending on how likely it is to occur
19
Calculating a Profile
For each alignment position: take the (weighted) average of the appropriate rows from the scoring matrix
An (extremely simple) example:
seq_01  A A A A A A A A A A W
seq_02  A A A A A A A A A W W
seq_03  A A A A A A A A W W W
seq_04  A A A A A A A W W W W
seq_05  A A A A A A W W W W W
seq_06  A A A A A W W W W W W
seq_07  A A A A W W W W W W W
seq_08  A A A W W W W W W W W
seq_09  A A W W W W W W W W W
seq_10  A W W W W W W W W W W
20
Excerpt from the EBLOSUM62 matrix
[Table: rows A and W of the EBLOSUM62 matrix, and the resulting profile rows for the alignment columns 10A, 5A+5W and 10W; numeric values not preserved]
Profile computed with prophecy (EMBOSS), using Henikoff profile type and the BLOSUM62 matrix
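As a rough illustration of the averaging step, the sketch below computes profile scores for the example alignment using a plain unweighted average. This is an assumption made for simplicity: prophecy with the Henikoff profile type applies sequence weighting, so its numbers need not match. Only the BLOSUM62 entries for A and W are included (A/A = 4, A/W = -3, W/W = 11).

```python
# BLOSUM62 scores restricted to the two residues of the example (A and W).
BLOSUM62_AW = {
    "A": {"A": 4, "W": -3},
    "W": {"A": -3, "W": 11},
}


def column_profile(column):
    """Average the matrix rows of all residues observed in one alignment column."""
    scores = {}
    for candidate in "AW":
        total = sum(BLOSUM62_AW[observed][candidate] for observed in column)
        scores[candidate] = total / len(column)
    return scores


# The example alignment: seq_01 has ten A's and one W, seq_10 one A and ten W's.
alignment = ["A" * (11 - i) + "W" * i for i in range(1, 11)]
columns = ["".join(col) for col in zip(*alignment)]

for label, index in (("10A", 0), ("5A+5W", 5), ("10W", 10)):
    print(label, column_profile(columns[index]))
```

In the mixed 5A+5W column the profile score for W (4.0) is much higher than for A (0.5), because the large W/W score dominates the average.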
21
Pattern Searching
Short linear motifs: e.g.
Profiles: MEME
22
Exercise
Use a number of sequences which contain the PROSITE subtilase motif and find motifs in those sequences with MEME
23
Hydropathy Plot
Prediction of hydrophobic and hydrophilic regions in a protein
24
Partition Coefficients
[Figure: partitioning between water (hydrophilic) and oil (hydrophobic)]
25
Hydrophobicity/Hydrophilicity Values
Scales: Fauchere & Pliska; Kyte & Doolittle; Hopp & Woods; Eisenberg
hydrophilic  R K D Q N E H S T P Y C G A M W L V F I  hydrophobic
26
Hydrophobicity Plot
Sum amino acid hydrophobicity values in a given window
Plot the value in the middle of the window
Shift the window one position
27
Sliding Window Approach
Calculate property for first sub-sequence
Use the result (plot/print/store)
Move to next residue position, and repeat
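The sliding window procedure can be sketched as follows, here with the Kyte & Doolittle scale. Two choices are assumptions of this sketch rather than something the slides prescribe: the window sum is divided by the window size so that different window sizes stay comparable, and the ambiguity code Z (Glu or Gln) is given -3.5, the value that E and Q share on this scale.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
    # Assumption: Z (Glu or Gln) gets the value shared by E and Q.
    "Z": -3.5,
}


def hydropathy_profile(sequence, window=9):
    """Return (1-based centre position, mean hydropathy) for every window."""
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        # Report the value at the middle residue of the window (odd windows).
        profile.append((start + half + 1, mean))
    return profile


for position, value in hydropathy_profile("MEZCALTASTESVERYNICE", window=9):
    print(position, round(value, 2))
```

With a 20-residue sequence and a window of 9 there are 12 windows, centred on residues 5 to 16; the ends of the sequence get no value.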
28
Hydrophobicity Plot
MEZCALTASTESVERYNICE
35
Transmembrane Regions
Rotation is 100 degrees per amino acid
Climb is 1.5 Angstrom per amino acid residue
36
Transmembrane Regions
The membrane is approx. 30 Angstrom thick
So we need approx. 30 / 1.5 = 20 amino acids to span the membrane
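The arithmetic behind this estimate, written out as a small sketch. The constants are the helix values given on the slides; the function names are illustrative only.

```python
import math

RISE_PER_RESIDUE = 1.5      # Angstrom climbed along the helix axis per residue
ROTATION_PER_RESIDUE = 100  # degrees of rotation per residue


def residues_to_span(thickness):
    """Number of helical residues needed to cross a layer of given thickness (Angstrom)."""
    return math.ceil(thickness / RISE_PER_RESIDUE)


def helix_turns(n_residues):
    """Number of full 360-degree turns made by n_residues in a helix."""
    return n_residues * ROTATION_PER_RESIDUE / 360


n = residues_to_span(30)
print(n, round(helix_turns(n), 1))  # 20 5.6
```

A 20-residue membrane-spanning helix therefore makes between five and six turns, which is why window sizes around 19 to 21 residues suit transmembrane predictions.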
38
Adapting the window size to the size of the membrane-spanning segment makes the picture easier to interpret
39
[Figure: hydrophobicity plots with window = 1, window = 9, window = 19 and window = 121]
40
Protein Interactions
41
Protein Interactions
Obligatory: hemoglobin
(Explain homodimer, heterodimer)
42
Protein Interactions
Obligatory: hemoglobin
Transient: mitochondrial Cu transporters
43
Experimental approaches (1)
Yeast two-hybrid (Y2H)
Old / small-scale: co-immunoprecipitation
Disadvantages: problem with homodimers, problem with large complexes
44
Experimental approaches (2)
Affinity purification + mass spectrometry (AP-MS)
Also detects indirect interactions
45
Interaction Databases
STRING
46
Interaction Databases
PPI + genetic / function association
47
Interaction Databases
STRING HPRD
48
Interaction Databases
Only human and only small-scale based
49
Interaction Databases
STRING HPRD InteroPorc Many others…. E.g. see
50
Yeast protein interaction network
51
Sequence-based Protein Binding Site Prediction
52
Binding site
53
Binding site
54
Predefined motifs
55
Predefined motifs
56
Predefined motifs
57
Predefined motifs
58
Motif search in groups of proteins
Group proteins which have the same interaction partner
Use motif search, e.g. find PWMs
Neduva et al., PLoS Biol 2005
59
Motif search in groups of proteins
Group proteins which have the same interaction partner
Use motif search
60
Correlated Motif Search
61
Correlated Motif Search
Interactors        Non-interactors
AARLL  PLTEQ       AARLL  MARLT
MARLT  DLTEP       VVRLM  MARLT
VVRLM  MMTER       PLTEQ  DLTEP
Correlated Motif Pair: (RL,TE)
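A minimal sketch of the counting idea behind a correlated motif pair. It assumes the slide's table reads as three interacting and three non-interacting protein pairs, and it is not the published mining algorithm: it only checks how often one motif occurs in one partner while the other motif occurs in the other partner.

```python
def pair_matches(pair, motif_pair):
    """True if one protein contains the first motif and its partner the second."""
    a, b = pair
    m1, m2 = motif_pair
    return (m1 in a and m2 in b) or (m2 in a and m1 in b)


def support(pairs, motif_pair):
    """Count the protein pairs that are covered by the motif pair."""
    return sum(pair_matches(pair, motif_pair) for pair in pairs)


# Pairs as read from the slide (assumed row-wise layout of the table).
interactors = [("AARLL", "PLTEQ"), ("MARLT", "DLTEP"), ("VVRLM", "MMTER")]
non_interactors = [("AARLL", "MARLT"), ("VVRLM", "MARLT"), ("PLTEQ", "DLTEP")]

motifs = ("RL", "TE")
print(support(interactors, motifs), support(non_interactors, motifs))  # 3 0
```

The pair (RL,TE) covers all interacting pairs and none of the non-interacting ones, which is what makes it a good candidate for an interaction-mediating motif pair.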
62
Experimental validation
Van Dijk et al., PLoS Comput Biol 2010
63
New approach: SLIDER
Faster approach: genome-wide searching for interaction motifs
Improve mining algorithm with a priori biological knowledge (conservation score, surface accessibility)
Boyen et al., IEEE/ACM Trans Comput Biol Bioinform. 2011
64
THE END. Questions?
66
Secondary Structure Prediction
67
Secondary Structure Prediction
Traditional methods (statistical and/or rule-based)
E.g. Garnier, Osguthorpe & Robson (GOR)
  Statistical method
  Accuracy ~ 60%
68
GOR Helix Parameters
[Table: helix parameters for each amino acid (Gly, Ala, Val, Leu, Ile, Ser, Thr, Asp, Glu, Asn, Gln, Lys, His, Arg, Phe, Tyr, Trp, Cys, Met, Pro) at window positions i-8 to i+8; numeric values not preserved]
69
Example sequence: ISGARNIERHELIXPREDICT
[Figure: the GOR helix parameter table (window positions i-8 to i+8) applied to the example sequence]
70
GOR Prediction
[Figure: GOR prediction curves for helix and beta sheet]
71
Secondary Structure Prediction
Recent methods
  Neural networks = flexible statistics
  Multiple alignments = variability
  Heuristics = common sense
  Or a combination of the above
Accuracy ~ 70%
72
Heuristics
Conserved parts are structurally and/or functionally important
Segments with many gaps must be in loop regions
73
Secondary Structure Prediction
Strategy
  Use as many methods as possible
  Use homologous sequences
  Combine predictions into consensus prediction
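One simple way to combine predictions is a per-residue majority vote, sketched below. The three prediction strings are invented for illustration (H = helix, E = strand, C = coil), and marking ties with `?` is an assumption of this sketch; real consensus servers may weight methods differently.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Combine equally long secondary structure strings by majority vote."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        (state, count), *others = Counter(column).most_common()
        # A tie for first place means the methods do not agree on this residue.
        tied = bool(others) and others[0][1] == count
        result.append("?" if tied else state)
    return "".join(result)


# Hypothetical output of three different prediction methods for one sequence.
predictions = [
    "CHHHHCCEEC",
    "CCHHHHCEEC",
    "CHHHCCCEEE",
]
print(consensus_prediction(predictions))  # CHHHHCCEEC
```

Residues where the methods disagree, typically at the ends of helices and strands, are exactly the positions where the consensus is least reliable.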
74
Why can’t it be 100% correct?
All current 2D prediction schemes are based upon observation of occurrence of 2D elements in 3D structures
Deduction of 2D elements from structures is ambiguous!
DSSP, Stride, and the PDB (human) annotation do not always agree upon the assigned elements
75
Do these residues still belong to the helix?