The FREAKS Session 3.1: Repeats Session 3.2: Biased regions Miguel Andrade Johannes-Gutenberg University of Mainz of PROTEIN SEQUENCE.

Slides:



Advertisements
Similar presentations
©CMBI 2005 Exploring Protein Sequences - Part 2 Part 1: Patterns and Motifs Profiles Hydropathy Plots Transmembrane helices Antigenic Prediction Signal.
Advertisements

Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
PROTEIN SECONDARY STRUCTURE PREDICTION WITH NEURAL NETWORKS.
Homology 3D modeling and effect of mutations. X-ray crystallography (70,714 in PDB) need crystals Nuclear Magnetic Resonance (NMR) (9,312) proteins in.
Secondary structure prediction. Amino acid sequence -> Secondary structure Alpha helix Beta strand Disordered/coil 70% accuracy 1991, 81% accuracy in.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Protein Modules An Introduction to Bioinformatics.
Pattern databases in protein analysis Arthur Gruber Instituto de Ciências Biomédicas Universidade de São Paulo AG-ICB-USP.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
PREDICTION OF PROTEIN FEATURES Beyond protein structure (TM, signal/target peptides, coiled coils, conservation…)
Protein and Function Databases
Introduction to Bioinformatics - Tutorial no. 8 Protein Prediction: - PROSITE - Pfam - SCOP - TOPITS - genThreader.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Protein Sequence Analysis - Overview Raja Mazumder Senior Protein Scientist, PIR Assistant Professor, Department of Biochemistry and Molecular Biology.
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Pattern databasesPattern databasesPattern databasesPattern databases Gopalan Vivek.
© Wiley Publishing All Rights Reserved.
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Protein domains. Protein domains are structural units (average 160 aa) that share: Function Folding Evolution Proteins normally are multidomain (average.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
The FREAKS Session 3.1: Repeats Session 3.2: Biased regions Miguel Andrade Johannes-Gutenberg University of Mainz of PROTEIN SEQUENCE.
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
Analysis of a Neural Language Model Eric Doi CS 152: Neural Networks Harvey Mudd College.
1 PyMOL Evolutionary Trace Viewer 1.1 Lichtarge Lab Sept. 13, 2010.
Repeats and composition bias. Repeats Frequency 14% proteins contains repeats (Marcotte et al, 1999) 1: Single amino acid repeats. 2: Longer imperfect.
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)
Copyright OpenHelix. No use or reproduction without express written consent1.
Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department.
Motif discovery and Protein Databases Tutorial 5.
Copyright OpenHelix. No use or reproduction without express written consent1.
PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.
Repeats and composition bias. Repeats Frequency 14% proteins contains repeats (Marcotte et al, 1999) 1: Single amino acid repeats. 2: Longer imperfect.
Tweaking BLAST Although you normally see BLAST as a web page with boxes to place data in and tick boxes, etc., it is actually a command line program that.
Bioinformatics in Vaccine Design
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Homology 3D modeling and effect of mutations. X-ray crystallography (70,714 in PDB) need crystals Nuclear Magnetic Resonance (NMR) (9,312) proteins in.
Protein motif /domain Structural unit Functional unit Signature of protein family How are they defined?
Graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike.
Improved Protein Secondary Structure Prediction. Secondary Structure Prediction Given a protein sequence a 1 a 2 …a N, secondary structure prediction.
Homology 3D modeling and effect of mutations Miguel Andrade Faculty of Biology, Johannes Gutenberg University Institute of Molecular Biology Mainz, Germany.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Repeats and composition bias
Homology 3D modeling Miguel Andrade Mainz, Germany Faculty of Biology,
Prediction of protein features. Beyond protein structure
Secondary structure prediction
polyQ and other homorepeats
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
a) SW PO TFS AHEC HAFS PSMS BSM RSM POCR Relative abundance
Repeats and composition bias
Secondary structure prediction
Protein 3D representation
Dot Plots, Path Matrices, Score Matrices
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Homology 3D modeling and effect of mutations
There are four levels of structure in proteins
Adva Yeheskel Bioinformatics Unit, Tel Aviv University 8/5/2018
Varodom Charoensawan, Derek Wilson, Sarah A. Teichmann 
Comparative Genomics.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Repeats and composition bias
Examining repeats with databases
S.N.U. EECS Jeong-Jin Lee Eui-Taik Na
Neural Networks for Protein Structure Prediction Dr. B Bhunia.
Presentation transcript:

The FREAKS Session 3.1: Repeats Session 3.2: Biased regions Miguel Andrade Johannes-Gutenberg University of Mainz of PROTEIN SEQUENCE

Frequency 14% proteins contains repeats (Marcotte et al, 1999) 1: Single amino acid repeats. 2: Longer imperfect tandem repeats. Assemble in structure.

Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASF GSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSP LSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN

Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASF GSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSP LSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN

Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN

Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN

Tandem repeats fold together

Definition repeats Sequence, long, imperfect, tandem MRAVVKSPIM CHE KSPSVCSPLN MTSSVCSPAG INSVSSTTASF GSFPVHSPIT Q GTPLTCSPNV EN RGSRSHSPAH ASN VGSPLSSPLS S MKSSISSPPS HCS VKSPVSSPNN VT LRSSVSSPAN INN

(Vlassi et al, 2013)

Andrade et al. (2001) J Struct Biol

Definition CBRs Perfect repeat: QQQQQQQQQQQ Imperfect: QQQQPQQQQQQ Amino acid type: DDDDDEEEDEDEED Compositionally biased regions (CBRs) High frequency of one or two amino acids in a region. Particular case of low complexity region

Detection CBRs Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection CBRs Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection CBRs Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection CBRs Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection repeats Sometimes straightforward. N-terminal human Huntingtin. How many repeats can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection repeats Often NOT straightforward. N-terminal human Huntingtin. How many repeats can you find? >sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL

Detection repeats Often NOT straightforward. N-terminal human Huntingtin. How many repeats can you find? EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT

Detection repeats Often NOT straightforward. N-terminal human Huntingtin. How many repeats can you find? EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT

The FREAKS Session 3.1: Repeats Session 3.2: Biased regions Miguel Andrade Johannes-Gutenberg University of Mainz of PROTEIN SEQUENCE

Frequency repeats Fraction of proteins annotated with the keyword REPEAT in SwissProt % Archaea 27/ Viruses 81/ Bacteria 299/ Fungi 232/ Viridiplantae 153/ Metazoa 1538/ Rest of Eukaryota 92/ (Andrade et al 2001)

Detection of repeats Dotplots Comparing a sequence against itself

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV | 1 match

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV ||| ||||| 8 matches

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV | | 2 matches

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV | 1 match

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV 8

Detection of repeats Dotplots TLRSSVSSPANINNS NMTSSVCSPANISV 1821

Exercise 1

Obtain the sequence from UniProtsequence from UniProt Go to the Dotlet web pageDotlet Click on the input button and paste the sequence there Try to find combinations of parameters that show patterns in the dot plot Find repetitions clicking in the diagonal patterns Exercise 1. Using Dotlet with the human mineralocorticoid receptor (MR)

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches Regular Expressions: [LS]P.A matches L or S, followed by P, followed by anything, followed by A

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches Regular Expressions: [LS]P.A matches L or S, followed by P, followed by anything, followed by A Which one is not matched? LPTA, SPAA, LPPA, LPAP, SPLA

Detection of repeats Using a multiple sequence alignment helps Conserved repeated patterns JalView with Regular Expression searches Regular Expressions: [LS]P.A matches L or S, followed by P, followed by anything, followed by A Which one is not matched? LPTA, SPAA, LPPA, LPAP, SPLA

Run JalView using the JNLP file in desktop (from Load this MSA in JalViewthis MSA Use the "find" option with a regular expression and mark all matches Try to find the expression that matches more repeats. How many repeats do you see? How long are they? Would you correct the alignment based on these findings? Exercise 2. Using JalView with a MSA of the MR with orthologs

#T1 #T13#T12#T11#T10#T9#T8 #T7 #T2#T3#T4#T5#T6 #F1 #F2#F3 #F4 #F5#F10#F9#F8#F7#F6 #T14 #T15 #F11 *** **** (Vlassi et al, 2013)

Andrade and Bork (1995) Nature Genetics

A subunit PP2A structure PDB:1b3u Groves et al. (1999) Cell

Ap1 Clathrin Adaptor Core PDB:1w63 Heldwein et al. (2004) PNAS

Ap1 Clathrin Adaptor Core PDB:1w63 Heldwein et al. (2004) PNAS

Neural Network! Secondary structure Transmembrane helices Residue exposure Andrade, Petosa, O'Donoghue, Müller and Bork (2001) J Mol Biol

Neural Network Backpropagation neural network …GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT… Central position Output neuron = Hidden layer n=3 Input layer n=39x20 Architecture Palidwor et al. (2009) PLoS Comp Biol

Neural Network Architecture H1H2 L H1H2L Fournier et al. (2013) PLoS One

Virus/phages E-05 Archaea E-04 Euryarchaeota E-03 Crenarchaeota E-04 Bacteria E-04 Acidobacteria E-04 Actinobacteria E-04 Bacteroidetes E-04 Chlamydiae E-03 Chloroflexi E-03 Cyanobacteria E-03 Firmicutes E-04 Planctomycetes E-03 Proteobacteria E-04 Spirochetes E-04 Eukaryota E-03 Apicomplexa E-03 Ciliophora E-03 Bacillariophyta E-03 Viridiplantae E-03 Chlorophyta E-03 Streptophyta E-03 Kinetoplastida E-03 Mycetozoa E-03 Phaeophyceae E-03 Fungi E-03 Metazoa E-03 Nematodes E-03 Trematodes E-03 Insecta E-03 Sarcopterygii E E-03 Fournier et al. (2013) PLoS One

Go to the ARD2 web pageARD2 Paste this sequence (1-780 fragment of human huntingtin) in the input windowthis sequence Run ARD2 and interpret the output Exercise 3. Detecting repeats in human huntingtin

Go to the PDBPaint web pagePDBPaint In the "Query PDB" window type "2IE3", in the "web service" menu choose the "ARD2" option and select a large window size (e.g. 800*800) Hit the "Go!" button Turn around the structure and examine the correspondence between the hits and the structure Exercise 4. Viewing detected repeats in a protein structure