Protein Sequence Motifs

Protein Analysis: Protein Sequence Motifs
Aalt-Jan van Dijk
Plant Research International, Wageningen UR
Biometris, Wageningen UR
© 2008, Jack A.M. Leunissen

Plant Bioinformatics
Integrated analysis of omics datasets
Genomics: Next Generation Sequencing; Genome assembly & annotation; (Comparative) genome analysis; SNP analysis, marker development
Technology: Computational infrastructure; Database development; Web-based analysis tools; Software development; Workflow management systems; Machine learning
Transcriptomics: Alternative splicing; EST analysis
Proteomics: Data (pre-)processing pipelining; Protein interaction networks
Metabolomics: Database development; Metabolite and pathway identification
Systems biology: Network modelling (bottom-up)

My research
Protein complex structures
Protein-protein docking
Correlated mutations
Interaction site prediction/analysis
Protein-protein interactions
Protein-DNA interactions
Motif search
Enzyme active sites

Overview
Protein Motif Searching
Hydrophobicity & Transmembrane Domains
Protein Interactions
Sequence motifs to predict interaction sites
Secondary Structure Prediction

Protein Motif Searching

What is a motif?
A motif is a description of a particular element of a protein that contains a specific sequence pattern.
Motifs are identified by:
3D structural alignment
Multiple sequence alignment
Pattern searching programs

Protein Motif Searching
Strict consensus pattern: use only strictly conserved residues
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
CxxxxxCxxxPxxxxxC
But what about variable residues? And gaps?

Protein Motif Searching
Strict consensus patterns contain:
no alternative residues
no flexible regions
no mismatches
no gaps
C--QASCDGIPLKMNDC
C---VTCEGLPMRMDQC
CERTLGCQPMPVH---C
CxxxxxCxxxPxxxxxC
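The strict consensus on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `strict_consensus` is an illustrative choice, and the rule assumed here is that a column counts as conserved only if every sequence has the same residue and none has a gap.

```python
def strict_consensus(alignment, gap="-"):
    """Return a strict consensus: the residue where a column is fully
    conserved (and gap-free), 'x' everywhere else."""
    consensus = []
    for column in zip(*alignment):          # iterate over alignment columns
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            consensus.append(column[0])     # strictly conserved residue
        else:
            consensus.append("x")           # variable position or gap
    return "".join(consensus)


# The three aligned sequences from the slide.
alignment = [
    "C--QASCDGIPLKMNDC",
    "C---VTCEGLPMRMDQC",
    "CERTLGCQPMPVH---C",
]
print(strict_consensus(alignment))
```

Because a single deviating residue or gap turns a column into `x`, the result illustrates why strict consensus patterns cannot express alternatives or flexible spacing.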

Protein Motif Searching
Most motifs are defined as regular expressions
Motifs can contain:
alternative residues
flexible regions
C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C
  CXXXCXGXPXXXXXC
  |   | | |     |
FGCAKLCAGFPLRRLPCFYG

The PROSITE Syntax
A-[BC]-X-D(2,5)-{EFG}-H
A         A
[BC]      B or C
X         anything
D(2,5)    2-5 D's
{EFG}     not E, F, or G
H         H
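The syntax elements above map almost one-to-one onto ordinary regular expressions. The following sketch (not from the original slides) translates a PROSITE pattern into a Python `re` pattern; `prosite_to_regex` is a hypothetical helper name, and it handles only the elements shown on the slide plus the `<` and `>` terminus anchors and the trailing period.

```python
import re

# One PROSITE element: optional '<', a residue / [set] / {exclusion},
# an optional (n) or (n,m) repeat, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, low, high, end = match.groups()
        if core.startswith("{"):            # {EFG} -> anything except E, F, G
            core = "[^" + core[1:-1] + "]"
        elif core.lower() == "x":           # x -> any residue
            core = "."
        repeat = ""
        if low is not None:                 # (2) -> {2}, (2,5) -> {2,5}
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C")
hit = re.search(regex, "FGCAKLCAGFPLRRLPCFYG")
print(regex, hit.start() + 1, hit.group())
```

Applied to the earlier example, the pattern matches the sequence from its third residue onwards, which is the alignment drawn on the regular-expression slide.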

PROSITE entries
Mandatory motifs characterise a protein (super-)family
ID SUBTILASE_ASP; PATTERN.
DE Serine proteases, subtilase family, aspartic acid active site.
PA [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].
ID SUBTILASE_HIS; PATTERN.
DE Serine proteases, subtilase family, histidine active site.
PA H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].
ID SUBTILASE_SER; PATTERN.
DE Serine proteases, subtilase family, serine active site.
PA G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

Exercise
Find the three subtilase motifs in PROSITE (prosite.expasy.org)
Compare the lists of proteins in which the motifs occur. What does this tell you?
Similarly, compare protein structures in which the motifs occur
Have a look at the "sequence logo"

Protein Motif Searching
Some motifs occur frequently in proteins; a pattern match does not mean that the feature is actually present.
Example: post-translational modification sites
ID ASN_GLYCOSYLATION; PATTERN.
DE N-glycosylation site.
PA N-{P}-[ST]-{P}.
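To see how often such a short pattern fires, the N-glycosylation pattern can be written directly as a regular expression and run over a sequence. This is a sketch under stated assumptions: the function name and the test sequence are made up for illustration, and every hit is only a candidate site, not evidence of glycosylation.

```python
import re

# N-{P}-[ST]-{P} as a regex. The lookahead (?=...) makes the scan report
# overlapping candidates, which a plain finditer would skip.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def n_glycosylation_candidates(sequence):
    """Return (1-based position, matched residues) for every candidate site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(sequence)]


# Made-up sequence, for illustration only.
example = "MNGSANPTQNNTSWNKTP"
for position, site in n_glycosylation_candidates(example):
    print(position, site)
```

In the example, `NPT` and `NKTP` are rejected because of the proline exclusions, while the two overlapping candidates `NNTS` and `NTSW` are both reported.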

Exercise
Use a glycosylation site predictor such as …
Input: your favorite set of sequences
Do you observe that some N-{P}-[ST] sites are likely to be glycosylated and others not?

Profiles
Many motifs cannot be easily defined using simple patterns
Such motifs can be defined using profiles
A profile is constructed from a multiple sequence alignment. For each position, each amino acid is given a score depending on how likely it is to occur

Calculating a Profile
For each alignment position: take the (weighted) average of the appropriate rows from the scoring matrix
An (extremely simple) example:
seq_01 A A A A A A A A A A W
seq_02 A A A A A A A A A W W
seq_03 A A A A A A A A W W W
seq_04 A A A A A A A W W W W
seq_05 A A A A A A W W W W W
seq_06 A A A A A W W W W W W
seq_07 A A A A W W W W W W W
seq_08 A A A W W W W W W W W
seq_09 A A W W W W W W W W W
seq_10 A W W W W W W W W W W
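The averaging step can be sketched as follows. This is an illustration only: it uses equal sequence weights rather than the Henikoff weighting, so the numbers will not match the output of a real profile tool. The `A` and `W` rows are taken from the standard BLOSUM62 matrix; the names `BLOSUM62_ROWS` and `column_profile` are illustrative.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows A and W of BLOSUM62, in the column order given by AMINO_ACIDS.
BLOSUM62_ROWS = {
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
    "W": [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3],
}

# The toy alignment from the slide: seq_01 = 10 A + 1 W ... seq_10 = 1 A + 10 W.
alignment = ["A" * (11 - k) + "W" * k for k in range(1, 11)]


def column_profile(column, weights=None):
    """Weighted average of the scoring-matrix rows of the residues in one column."""
    if weights is None:
        weights = [1.0] * len(column)       # assumption: all sequences weigh the same
    total = sum(weights)
    return [
        sum(w * BLOSUM62_ROWS[res][j] for res, w in zip(column, weights)) / total
        for j in range(len(AMINO_ACIDS))
    ]


profile = [column_profile(column) for column in zip(*alignment)]
a, w = AMINO_ACIDS.index("A"), AMINO_ACIDS.index("W")
for label, position in (("10A", 0), ("5A+5W", 5), ("10W", 10)):
    print(label, "A:", profile[position][a], "W:", profile[position][w])
```

The first column (10A) simply reproduces the A row of the matrix, the last column (10W) the W row, and the middle column (5A+5W) lies halfway between the two.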

Excerpt from the EBLOSUM62 matrix: rows A and W, with columns A R N D C Q E G H I L K M F P S T W Y V
Resulting profile rows for the alignment columns 10A, 5A+5W and 10W, with columns A C D E F G H I K L M N P Q R S T V W Y
Profile calculated with prophecy (EMBOSS), using Henikoff profile type and BLOSUM62 matrix

Partition Coefficients
Figure labels: Hydrophilic, Hydrophobic; Oil, Water

Hydrophobicity/Hydrophilicity Values
Scales: Fauchere & Pliska; Kyte & Doolittle; Hopp & Woods; Eisenberg
hydrophilic  R K D Q N E H S T P Y C G A M W L V F I  hydrophobic

Sliding Window Approach
Calculate property for first sub-sequence
Use the result (plot/print/store)
Move to next residue position, and repeat
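The three steps above amount to a short loop. The sketch below, which is not part of the original slides, averages Kyte & Doolittle hydropathy values over each window; the function name `sliding_window` is illustrative, and the ambiguity code Z (Glu or Gln) is given -3.5 because both residues have that value on this scale.

```python
# Kyte & Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
    "Z": -3.5,  # Glx: E and Q share the same value
}


def sliding_window(sequence, window, scale=KYTE_DOOLITTLE):
    """Mean property value for every window of the given length."""
    values = []
    for start in range(len(sequence) - window + 1):
        sub_sequence = sequence[start:start + window]             # 1. take sub-sequence
        values.append(sum(scale[r] for r in sub_sequence) / window)  # 2. store result
    return values                                                 # 3. loop moves one residue on


SEQUENCE = "MEZCALTASTESVERYNICE"
for window in (1, 9):
    print(window, [round(v, 2) for v in sliding_window(SEQUENCE, window)])
```

A window of 1 returns the raw per-residue values; larger windows smooth the curve and return `len(sequence) - window + 1` points.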

Hydrophobicity Plot
Sequence: MEZCALTASTESVERYNICE

Transmembrane Regions
In an α-helix:
Rotation is 100 degrees per amino acid
Climb is 1.5 Angstrom per amino acid residue

Transmembrane Regions
The membrane is about 30 Angstrom thick
So we need approx. 30 / 1.5 = 20 amino acids to span the membrane
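The 20-residue estimate suggests scanning a sequence with a window of about that length and flagging stretches that are hydrophobic on average. The sketch below is not from the slides: the window of 19 and the cut-off of 1.6 are a commonly quoted rule of thumb for the Kyte & Doolittle scale and are assumptions here, as are the function names and the made-up test sequence.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

MEMBRANE_THICKNESS = 30.0   # Angstrom, from the slide
RISE_PER_RESIDUE = 1.5      # Angstrom per residue in a helix, from the slide
RESIDUES_TO_SPAN = MEMBRANE_THICKNESS / RISE_PER_RESIDUE


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Merge all windows whose mean hydropathy reaches the threshold.

    Returns half-open (start, end) index pairs, 0-based.
    """
    segments = []
    for start in range(len(sequence) - window + 1):
        mean = sum(KYTE_DOOLITTLE[r] for r in sequence[start:start + window]) / window
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:   # overlaps previous hit: extend it
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Made-up test sequence: charged flanks around a 20-residue hydrophobic core.
core = "LLIVALLVAILLGVALLIVA"
test_sequence = "DKSRE" * 4 + core + "KRDESQ" * 3
print(RESIDUES_TO_SPAN, candidate_tm_segments(test_sequence))
```

Because whole windows are merged, a reported segment extends a few residues into the flanks; the hydrophobic core itself sits at indices 20 to 39 of the test sequence.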

Hydrophobicity plots with window = 1, window = 9, window = 19 and window = 121

Protein Interactions
Obligatory: hemoglobin
Explain homodimer, heterodimer

Protein Interactions
Obligatory: hemoglobin
Transient: mitochondrial Cu transporters

Experimental approaches (1)
Yeast two-hybrid (Y2H)
Old / small-scale: co-immunoprecipitation
Disadvantages: problem with homodimers, problem with large complexes

Experimental approaches (2)
Affinity purification + mass spectrometry (AP-MS)
Also detects indirect interactions

Interaction Databases
STRING: PPI + genetic / function association
HPRD: only human and only small-scale based
InteroPorc
Many others… E.g. see

Yeast protein interaction network

Sequence-based Protein Binding Site Prediction

Binding site

Predefined motifs

Motif search in groups of proteins
Group proteins which have the same interaction partner
Use motif search, e.g. find PWMs
Neduva et al., PLoS Biol 2005

Correlated Motif Search
Interactors       Non-interactors
AARLL  PLTEQ      AARLL  MARLT
MARLT  DLTEP      VVRLM  MARLT
VVRLM  MMTER      PLTEQ  DLTEP
Correlated Motif Pair: (RL, TE)
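The idea behind the example can be illustrated with an exhaustive search over pairs of short motifs. This is a toy sketch only, not the mining algorithm of the cited papers: the scoring rule (interacting pairs covered minus non-interacting pairs covered), the motif length of 2 and all function names are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Peptide pairs from the slide.
interactors = [("AARLL", "PLTEQ"), ("MARLT", "DLTEP"), ("VVRLM", "MMTER")]
non_interactors = [("AARLL", "MARLT"), ("VVRLM", "MARLT"), ("PLTEQ", "DLTEP")]


def kmers(sequence, k):
    """All substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def covers(pair, motif1, motif2):
    """True if one protein contains motif1 and the other contains motif2."""
    a, b = pair
    return (motif1 in a and motif2 in b) or (motif2 in a and motif1 in b)


def best_motif_pair(positives, negatives, k=2):
    """Brute-force search for the motif pair that best separates the two sets."""
    sequences = {s for pair in positives + negatives for s in pair}
    motifs = sorted(set().union(*(kmers(s, k) for s in sequences)))
    best = None
    for m1, m2 in combinations_with_replacement(motifs, 2):
        score = (sum(covers(p, m1, m2) for p in positives)
                 - sum(covers(p, m1, m2) for p in negatives))
        if best is None or score > best[0]:
            best = (score, (m1, m2))
    return best


print(best_motif_pair(interactors, non_interactors))
```

The search space grows quadratically with the number of candidate motifs, which is why exhaustive enumeration like this is only feasible for toy data.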

Experimental validation
Van Dijk et al., PLoS Comput Biol 2010

New approach: SLIDER
Faster approach → genome-wide searching for interaction motifs
Improve mining algorithm with a priori biological knowledge (conservation score, surface accessibility)
Boyen et al., IEEE/ACM Trans Comput Biol Bioinform. 2011

GOR Helix Parameters
Window positions: i-8, i-6, i-4, i-2, i, i+2, i+4, i+6, i+8
Amino acids: Gly, Ala, Val, Leu, Ile, Ser, Thr, Asp, Glu, Asn, Gln, Lys, His, Arg, Phe, Tyr, Trp, Cys, Met, Pro

Example sequence: I S G A R N I E R H E L I X P R E D I C T
Window positions: i-8, i-6, i-4, i-2, i, i+2, i+4, i+6, i+8
Amino acids: Gly, Ala, Val, Leu, Ile, Ser, Thr, Asp, Glu, Asn, Gln, Lys, His, Arg, Phe, Tyr, Trp, Cys, Met, Pro

Secondary Structure Prediction
Recent methods:
Neural networks = flexible statistics
Multiple alignments = variability
Heuristics = common sense
Or a combination of the above
Accuracy ~ 70%

Why can't it be 100% correct?
All current 2D prediction schemes are based on the observed occurrence of 2D (secondary structure) elements in 3D structures
Deduction of 2D elements from structures is ambiguous!
DSSP, STRIDE, and the PDB (human) annotation do not always agree on the assigned elements

Do these residues still belong to the helix?
