Download presentation
Presentation is loading. Please wait.
1
Repeats and composition bias
Miguel Andrade Faculty of Biology, Johannes Gutenberg University Institute of Molecular Biology Mainz, Germany
2
Repeats
3
Frequency 14% proteins contains repeats (Marcotte et al, 1999)
1: Single amino acid repeats. 2: Longer imperfect tandem repeats. Assemble in structure.
4
Definition repeats Sequence, long, imperfect, tandem
MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASF GSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSPLSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN
5
Definition repeats Sequence, long, imperfect, tandem
MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASFGSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSPLSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN
6
Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE
KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN
7
Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE
KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN
8
Tandem repeats fold together
9
Tandem repeats fold together
10
Tandem repeats fold together
11
Tandem repeats fold together
12
Tandem repeats fold together
13
Tandem repeats fold together
14
Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE
KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN
15
(Vlassi et al, 2013) 15
16
A subunit PP2A structure
PDB:1b3u Groves et al. (1999) Cell
17
Ap1 Clathrin Adaptor Core
PDB:1w63 Heldwein et al. (2004) PNAS
18
Ap1 Clathrin Adaptor Core
PDB:1w63 Heldwein et al. (2004) PNAS
19
i-TASSER model of D. melanogaster thr protein
Based on PDB 4BUJ chain B
20
PDB 4BUJ Ski complex (yeast)
21
Andrade et al. (2001) J Struct Biol
22
Definition CBRs Perfect repeat: QQQQQQQQQQQ Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED Compositionally biased regions (CBRs) High frequency of one or two amino acids in a region. Particular case of low complexity region
23
Detection CBRs Sometimes straightforward. N-terminal human Huntingtin.
How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
24
Detection CBRs Sometimes straightforward. N-terminal human Huntingtin.
How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
25
Detection CBRs Sometimes straightforward. N-terminal human Huntingtin.
How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
26
Detection CBRs Sometimes straightforward. N-terminal human Huntingtin.
How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
27
Detection repeats Sometimes straightforward.
N-terminal human Huntingtin. How many repeats can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
28
Detection repeats Often NOT straightforward.
N-terminal human Huntingtin. How many repeats can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
29
Detection repeats Often NOT straightforward.
N-terminal human Huntingtin. How many repeats can you find? EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT
30
Detection repeats Often NOT straightforward.
N-terminal human Huntingtin. How many repeats can you find? EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT
31
Repeats
32
Frequency repeats Fraction of proteins annotated with the keyword REPEAT in SwissProt % Archaea 27/ Viruses 81/ Bacteria 299/ Fungi / Viridiplantae 153/ Metazoa / Rest of Eukaryota 92/ (Andrade et al 2001)
33
Detection of repeats Dotplots Comparing a sequence against itself
34
Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV
35
Detection of repeats Dotplots TLRSSVSSPANINNS | 1 match NMTSSVCSPANISV
36
Detection of repeats Dotplots TLRSSVSSPANINNS ||| ||||| NMTSSVCSPANISV
8 matches NMTSSVCSPANISV
37
Detection of repeats Dotplots TLRSSVSSPANINNS | | NMTSSVCSPANISV
| | 2 matches NMTSSVCSPANISV
38
Detection of repeats Dotplots TLRSSVSSPANINNS | 1 match NMTSSVCSPANISV
39
Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV 8
40
Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV 1821
41
Exercise 1
42
Exercise 1/4. Using Dotlet with the human mineralocorticoid receptor (MR)
Go to the Dotlet web page: Click on the input button and paste the sequence of the human mineralocorticoid receptor (UniProt id P08235) Click on the “compute” button Try to find combinations of parameters that show patterns in the dot plot (Hint: You can adjust this finely using the arrows) Find repetitions clicking in the diagonal patterns
43
Exercise 1/4. Using Dotlet with the human mineralocorticoid receptor (MR)
44
RepeatsDB
45
RepeatsDB
46
Exercise 3/4. Examining repeat structures in repeatsDB
Go to RepeatsDB: Go to Search, then select database RepeatsDB, and Search field, No.units Choose one example (you might look for an unreviewed one) and write the name in your card. Check the assignment of the units. Is it correct? What about insertions? Are there mistakes? Write the result in the card.
47
Composition bias
48
Definition 14% proteins contains repeats (Marcotte et al, 1999)
1: Single amino acid repeats. 2: Longer imperfect tandem repeats. Assemble in structure.
49
Definition CBRs Perfect repeat: QQQQQQQQQQQ Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED Compositionally biased regions (CBRs) High frequency of one or two amino acids in a region. Particular case of low complexity region
50
Function CBRs Conservation => Function
Length, amino acid type not necessarily conserved Frequency: 1 in 3 proteins contains a compositionally biased region (Wootton, 1994), ~11% conserved (Sim and Creamer, 2004)
51
Function CBRs Conservation => Function
Length, amino acid type not necessarily conserved Functions: Passive: linkers Active: binding, mediate protein interaction, structural integrity (Sim and Creamer, 2004)
52
Structure of CBRs Often variable or flexible: do not easily crystalize
53
1CJF: profilin bound to polyP
54
2IF8: Inositol Phosphate Multikinase Ipk2
55
2IF8: Inositol Phosphate Multikinase Ipk2
RVSETTTSGSL
56
2CX5: mitochondrial cytochrome c
B subunit N-terminal
57
2CX5: mitochondrial cytochrome c
B subunit N-terminal FFFFIFVFNF
58
Amino acid repeats Distribution is not random: Eukaryota:
Most common: poly-Q, poly-N, poly-A, poly-S, poly-G Prokaryota: Most common: poly-S, poly-G, poly-A, poly-P Relatively rare: poly-Q, poly-N Very rare or absent in both eukaryota and prokaryota: Poly-I, Poly-M, Poly-W, Poly-C, Poly-Y Toxicity of long stretches of hydrophobic residues. (Faux et al 2005)
59
AIR9 Microtubule localization of Δx-GFP (1708 aa) Ser rich + basic
conserved region LRR A9 repeats Δ1 Δ3 Δ2 Δ15 Δ9 Δ10 Δ6 Δ11 Δ12 Δ14 Δ16 Buschmann, et al (2006). Current Biology. Buschmann, et al (2007). Plant Signaling & Behavior Microtubule localization of Δx-GFP
60
Homorepeats are frequent but difficult to characterize
Pablo Mier e.g. polyQ: MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQP 10% of human proteins have homorepeats lack sequence conservation not possible to predict function by homology 30% residues are disordered 50% human proteins have one long disordered segment Homorepeats need to be studied in context
61
Context and evolution of homorepeats
dAtabase of PolyX Evolution Mier et al. (2016) Bioinformatics
62
Context and evolution of homorepeats
Left: ataxin1; right: PRR11 dAtabase of PolyX Evolution Mier et al. (2016) Bioinformatics
63
Exercise 4/4. Viewing homorepeats in an alignment with dAPE
Find a human protein starting with the letter in your card. Use UniProt advanced search: Organism “Human”, Entry Name [ID]: “a*” (for a). Write the Entry Name in your card. Copy the Entry ID (e.g. P10275). Go to the dAPE web page: Use the Entry ID in option A Hit the “Report the evolution of its polyX" button Which polyX has the human protein? Which other polyX not in the human protein were found in the orthologs? Write these in your card.
64
Exercise 4/4. Viewing homorepeats in an alignment with dAPE
Name here
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.