The FREAKS of PROTEIN SEQUENCE. Session 3.1: Repeats. Session 3.2: Biased regions. Miguel Andrade, Johannes-Gutenberg University of Mainz.


1 The FREAKS of PROTEIN SEQUENCE. Session 3.1: Repeats. Session 3.2: Biased regions. Miguel Andrade, Johannes-Gutenberg University of Mainz, Andrade@uni-mainz.de

2 Frequency
14% of proteins contain repeats (Marcotte et al, 1999)
1: Single amino acid repeats.
2: Longer imperfect tandem repeats. These assemble in structure.

3 Definition repeats
Sequence, long, imperfect, tandem
MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASF
GSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSP
LSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN

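5 Definition repeats
Sequence, long, imperfect, tandem
The same sequence split into repeat units (first column) and the residues between them (second column):
MRAVVKSPIM CHE
KSPSVCSPLN
MTSSVCSPAG INSVSSTTASF
GSFPVHSPIT Q
GTPLTCSPNV EN
RGSRSHSPAH ASN
VGSPLSSPLS S
MKSSISSPPS HCS
VKSPVSSPNN VT
LRSSVSSPAN INN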

7 Tandem repeats fold together

14 (Vlassi et al, 2013) http://weblogo.berkeley.edu

15 Andrade et al. (2001) J Struct Biol

16 Definition CBRs
Perfect repeat: QQQQQQQQQQQ
Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED
Compositionally biased regions (CBRs): high frequency of one or two amino acids in a region. A particular case of low-complexity regions.
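The definition above (one or two amino acids dominating a region) can be turned into a simple sliding-window test. The following is a minimal sketch, not a published method: the function name `biased_windows`, the window length, and the 0.8 cutoff are all illustrative assumptions.

```python
from collections import Counter

def biased_windows(seq, window=12, top_n=2, min_fraction=0.8):
    """Return (start, window_sequence, residues, fraction) for every window in
    which the top_n most common residues cover at least min_fraction of it.
    The defaults are arbitrary illustration values, not recommended settings."""
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        common = Counter(chunk).most_common(top_n)
        fraction = sum(count for _, count in common) / window
        if fraction >= min_fraction:
            residues = "".join(sorted(res for res, _ in common))
            hits.append((start, chunk, residues, fraction))
    return hits

if __name__ == "__main__":
    # Made-up flanks around the D/E example from the slide.
    example = "MKTAYIAK" + "DDDDDEEEDEDEED" + "LSGHTRW"
    for start, chunk, residues, fraction in biased_windows(example):
        print(start, chunk, residues, round(fraction, 2))
```

Overlapping hits would normally be merged into one region; that step is left out to keep the sketch short.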

17 Detection CBRs
Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find?
>sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP
LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE
FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP
RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG
NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV
PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL
TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI
VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA
SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV
PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD
EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE
NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG
AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
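For the "straightforward" case, runs of a single residue can be listed by machine rather than by eye. This is a minimal sketch under stated assumptions: `homopolymer_runs` and the minimum run length of 5 are illustrative choices, and only perfect runs are reported (an imperfect tract such as QQQQPQQQQQQ would be split).

```python
import re

def homopolymer_runs(seq, min_length=5):
    """Return (residue, start, length) for each run of one residue that is at
    least min_length long; start is 0-based. min_length=5 is an arbitrary cutoff."""
    runs = []
    for match in re.finditer(r"(.)\1*", seq):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.group(1), match.start(), length))
    return runs

# First two lines of the Huntingtin fragment shown on the slide.
n_terminus = (
    "MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP"
    "LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE"
)
for residue, start, length in homopolymer_runs(n_terminus):
    print(f"{residue} x {length} starting at position {start + 1}")
```

Running it on the full fragment only requires pasting the remaining lines into `n_terminus`.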

21 Detection repeats
Sometimes straightforward. N-terminal human Huntingtin (sp|P42858|HD_HUMAN, the same sequence as on slide 17). How many repeats can you find?

22 Detection repeats
Often NOT straightforward. N-terminal human Huntingtin (sp|P42858|HD_HUMAN, the same sequence as on slide 17). How many repeats can you find?

23 Detection repeats
Often NOT straightforward. N-terminal human Huntingtin. How many repeats can you find?
EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA
CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS
NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS
TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL
PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT
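One way to see why these repeats are hard to spot is to measure how similar the five aligned segments above are to each other. The sketch below computes pairwise percent identity; the helper name `percent_identity` and the choice to skip gap columns are assumptions made for illustration.

```python
from itertools import combinations

# The five aligned segments exactly as shown on the slide ('-' is a gap).
REPEATS = [
    "EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA",
    "CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS",
    "NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS",
    "TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL",
    "PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT",
]

def percent_identity(a, b):
    """Percent of aligned columns with identical residues, skipping any
    column where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

for (i, a), (j, b) in combinations(enumerate(REPEATS, start=1), 2):
    print(f"repeat {i} vs repeat {j}: {percent_identity(a, b):.0f}% identity")
```

Identity is a crude measure; a substitution matrix would also credit conservative replacements such as L/I/V, which matter for repeats like these.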

25 The FREAKS of PROTEIN SEQUENCE. Session 3.1: Repeats. Session 3.2: Biased regions. Miguel Andrade, Johannes-Gutenberg University of Mainz, Andrade@uni-mainz.de

26 Frequency repeats
Fraction of proteins annotated with the keyword REPEAT in SwissProt (Andrade et al 2001)
Taxon               REPEAT/total   %
Archaea             27/3428        0.79
Viruses             81/8048        1.00
Bacteria            299/28438      1.05
Fungi               232/8334       2.78
Viridiplantae       153/6963       2.20
Metazoa             1538/28948     5.31
Rest of Eukaryota   92/2434        3.78

27 Detection of repeats Dotplots Comparing a sequence against itself

28 Detection of repeats
Dotplots
TLRSSVSSPANINNS
NMTSSVCSPANISV

29 Detection of repeats
Dotplots
TLRSSVSSPANINNS
    |
 NMTSSVCSPANISV
1 match

30 Detection of repeats
Dotplots
TLRSSVSSPANINNS
   ||| |||||
NMTSSVCSPANISV
8 matches

31 Detection of repeats
Dotplots
 TLRSSVSSPANINNS
    |  |
NMTSSVCSPANISV
2 matches

32 Detection of repeats
Dotplots
  TLRSSVSSPANINNS
  |
NMTSSVCSPANISV
1 match

33 Detection of repeats
Dotplots
TLRSSVSSPANINNS
NMTSSVCSPANISV
Matches per diagonal: 8

34 Detection of repeats
Dotplots
TLRSSVSSPANINNS
NMTSSVCSPANISV
Matches per diagonal: 1 8 2 1
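The match counts on these slides can be reproduced by sliding one sequence along the other and counting identical residues at each offset, which is what one diagonal of a dotplot represents. A minimal sketch follows; the function name `diagonal_matches` and the sign convention for the offset are assumptions chosen for this example.

```python
def diagonal_matches(a, b, offset):
    """Count identical residues when b is shifted right by `offset` positions
    relative to a (a negative offset shifts b to the left)."""
    count = 0
    for i, residue in enumerate(b):
        j = i + offset
        if 0 <= j < len(a) and a[j] == residue:
            count += 1
    return count

a = "TLRSSVSSPANINNS"
b = "NMTSSVCSPANISV"

counts = [diagonal_matches(a, b, offset) for offset in (1, 0, -1, -2)]
print(counts)  # -> [1, 8, 2, 1]
```

The diagonal with 8 matches stands out against its neighbours, and that contrast is what appears as a line in the plot.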

35 Exercise 1

36 Exercise 1. Using Dotlet with the human mineralocorticoid receptor (MR)
Obtain the sequence from UniProt
Go to the Dotlet web page
Click on the input button and paste the sequence there
Try to find combinations of parameters that show patterns in the dot plot
Find repetitions by clicking on the diagonal patterns
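The effect of the parameters in this exercise can be explored offline with a windowed self-comparison. This is only a sketch, and it is not how Dotlet itself is implemented: it scores plain identity instead of a substitution matrix, and `windowed_dots`, `offset_histogram`, and the parameter values are illustrative assumptions. The demo uses the repeat-containing sequence from slide 3 because the MR sequence is not part of this transcript.

```python
from collections import Counter

def windowed_dots(seq, window=10, min_matches=6):
    """Return (i, j) pairs, i < j, where the windows starting at i and j
    share at least min_matches identical positions (identity scoring only)."""
    last_start = len(seq) - window
    dots = []
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= min_matches:
                dots.append((i, j))
    return dots

def offset_histogram(dots):
    """Count dots per diagonal (offset j - i); crowded diagonals hint at
    the spacing between repeat units."""
    return Counter(j - i for i, j in dots)

if __name__ == "__main__":
    seq = (
        "MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASF"
        "GSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSP"
        "LSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN"
    )
    # Raising the threshold removes noise but also removes weak repeats.
    for min_matches in (4, 5, 6):
        hist = offset_histogram(windowed_dots(seq, 10, min_matches))
        print(min_matches, hist.most_common(5))
```

The double loop is quadratic in sequence length, which is fine for a single protein but too slow for whole-database scans.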

37 Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches

40 Detection of repeats
Using a multiple sequence alignment helps
Conserved repeated patterns
JalView with Regular Expression searches
Regular Expressions: [LS]P.A matches L or S, followed by P, followed by any residue, followed by A

41 Detection of repeats
Using a multiple sequence alignment helps
Conserved repeated patterns
JalView with Regular Expression searches
Regular Expressions: [LS]P.A matches L or S, followed by P, followed by any residue, followed by A
Which one is not matched? LPTA, SPAA, LPPA, LPAP, SPLA
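The pattern from this slide can be checked with any regular-expression engine. The sketch below uses Python's `re` module; it assumes JalView interprets these basic constructs (character class, literal, dot) the same way, and the helper `find_motif` and the test string are made up for illustration.

```python
import re

PATTERN = re.compile(r"[LS]P.A")

candidates = ["LPTA", "SPAA", "LPPA", "LPAP", "SPLA"]
not_matched = [c for c in candidates if PATTERN.fullmatch(c) is None]
print(not_matched)  # -> ['LPAP']

def find_motif(seq, pattern=PATTERN):
    """Return (start, matched_text) for every non-overlapping match; start is 0-based."""
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]

print(find_motif("GGLPTAGGSPLA"))  # -> [(2, 'LPTA'), (8, 'SPLA')]
```

LPAP fails because its fourth residue is P while the pattern requires A in that position.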

43 Exercise 2. Using JalView with a MSA of the MR with orthologs
Run JalView using the JNLP file on the desktop (from http://www.jalview.org/Download)
Load this MSA in JalView
Use the "find" option with a regular expression and mark all matches
Try to find the expression that matches the most repeats
How many repeats do you see? How long are they?
Would you correct the alignment based on these findings?

44 [Figure: repeat units labeled #T1 to #T15 and #F1 to #F11] (Vlassi et al, 2013)

45 Andrade and Bork (1995) Nature Genetics

46 A subunit PP2A structure PDB:1b3u Groves et al. (1999) Cell

48 Ap1 Clathrin Adaptor Core PDB:1w63 Heldwein et al. (2004) PNAS

50 Neural Network! Secondary structure Transmembrane helices Residue exposure Andrade, Petosa, O'Donoghue, Müller and Bork (2001) J Mol Biol

51 Neural Network
Backpropagation neural network
…GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT…
Central position
Architecture: input layer n=39x20, hidden layer n=3, output neuron = 0.1 - 0.9
Palidwor et al. (2009) PLoS Comp Biol
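The layer sizes on this slide (39x20 inputs, 3 hidden units, 1 output) can be illustrated with a small forward pass. This is a sketch under several assumptions that the slide does not state: one-hot encoding of each residue, sigmoid activations, bias terms, and random untrained weights. It is not the published network, and its output has no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
WINDOW = 39   # residues per input window (from the slide)
HIDDEN = 3    # hidden units (from the slide)

def encode_window(window):
    """One-hot encode a 39-residue window into 39 * 20 = 780 inputs (assumed encoding)."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(window)}")
    inputs = []
    for residue in window:
        one_hot = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # non-standard residues stay all-zero
            one_hot[index] = 1.0
        inputs.extend(one_hot)
    return inputs

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> 3 sigmoid hidden units -> 1 sigmoid output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Random placeholder weights: a trained network would learn these by backpropagation.
rng = random.Random(0)
n_inputs = WINDOW * len(AMINO_ACIDS)
hidden_weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(HIDDEN)]
hidden_biases = [0.0] * HIDDEN
output_weights = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
output_bias = 0.0

segment = "GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT"  # from the slide
encoded = encode_window(segment[:WINDOW])
score = forward(encoded, hidden_weights, hidden_biases, output_weights, output_bias)
print(len(encoded), round(score, 3))
```

In use, the window would slide along the protein one residue at a time, with each output assigned to the residue at the central position of its window.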

52 Neural Network Architecture H1H2 L H1H2L Fournier et al. (2013) PLoS One

53 Fournier et al. (2013) PLoS One
Taxon             Total      Count   Count/Total
Virus/phages      1379599    16      1.16E-05
Archaea           362208     296     8.17E-04
Euryarchaeota     225118     247     1.10E-03
Crenarchaeota     100611     32      3.18E-04
Bacteria          14505441   3939    2.72E-04
Acidobacteria     40456      27      6.67E-04
Actinobacteria    1634898    462     2.83E-04
Bacteroidetes     724491     135     1.86E-04
Chlamydiae        103375     155     1.50E-03
Chloroflexi       57584      74      1.29E-03
Cyanobacteria     279184     549     1.97E-03
Firmicutes        3837822    627     1.63E-04
Planctomycetes    56777      129     2.27E-03
Proteobacteria    7220418    1599    2.21E-04
Spirochetes       135404     100     7.39E-04
Eukaryota         5710673    14659   2.57E-03
Apicomplexa       129180     237     1.83E-03
Ciliophora        66444      128     1.93E-03
Bacillariophyta   24672      74      3.00E-03
Viridiplantae     1577040    4086    2.59E-03
Chlorophyta       82919      321     3.87E-03
Streptophyta      930229     1463    1.57E-03
Kinetoplastida    119358     385     3.23E-03
Mycetozoa         34276      103     3.01E-03
Phaeophyceae      21376      63      2.95E-03
Fungi             1256144    4228    3.37E-03
Metazoa           2796864    6873    2.46E-03
Nematodes         223421     365     1.63E-03
Trematodes        39558      97      2.45E-03
Insecta           694809     1173    1.69E-03
Sarcopterygii     1145548    3909    3.41E-03

54 http://cbdm.mdc-berlin.de/~ard2/

55 Exercise 3. Detecting repeats in human huntingtin
Go to the ARD2 web page
Paste this sequence (1-780 fragment of human huntingtin) in the input window
Run ARD2 and interpret the output

56 Exercise 4. Viewing detected repeats in a protein structure
Go to the PDBPaint web page
In the "Query PDB" window type "2IE3", in the "web service" menu choose the "ARD2" option, and select a large window size (e.g. 800*800)
Hit the "Go!" button
Rotate the structure and examine the correspondence between the hits and the structure
