Introduction to Bioinformatics, Tutorial no. 8
Protein Prediction:
- PROSITE
- Pfam
- SCOP
- TOPITS
- GenTHREADER
PROSITE http://www.expasy.org/prosite/
PROSITE is a database of protein domains that can be searched with either regular-expression-like patterns or sequence profiles.
Protein Sequence Motif Examples
- Ck2_Phospho_Site: [ST]-x(2)-[DE]
- Actins_1: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
- Zinc_Finger_C2H2: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Notation:
- x = any amino acid
- [ ] = acceptable amino acids for a given position, e.g. [ST] = S or T
- (n) = repetition of an element, e.g. [RKQ](2) = two consecutive positions, each R, K or Q
- x(2,4) = xx or xxx or xxxx
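Because this notation is close to ordinary regular-expression syntax, a pattern can also be scanned against a sequence locally. The following is a minimal Python sketch, not the PROSITE scanning service itself: the helper names `prosite_to_regex` and `find_motifs` are hypothetical, and only the elements listed above plus `{..}` (excluded residues) and the `<`/`>` terminal anchors are handled; rarer PROSITE syntax is rejected with a `ValueError`.

```python
import re

# One pattern element: optional N-terminal anchor '<', then a residue, 'x',
# an allowed set [..] or a forbidden set {..}, an optional repeat (n) or (n,m),
# and an optional C-terminal anchor '>'.
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as '[ST]-x(2)-[DE]' into a Python regex."""
    out = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            piece = "."                       # any amino acid
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"   # any amino acid except those listed
        else:
            piece = core                      # a single residue or an [allowed] set
        if lo is not None:
            piece += "{" + lo + ("," + hi if hi else "") + "}"
        out.append(("^" if start else "") + piece + ("$" if end else ""))
    return "".join(out)


def find_motifs(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every hit, overlapping hits included."""
    # Wrapping the regex in a lookahead lets finditer report overlapping matches;
    # at each start position the longest possible match is reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, `find_motifs("TSEDE", "[ST]-x(2)-[DE]")` returns `[(1, "TSED"), (2, "SEDE")]`, i.e. two overlapping Ck2_Phospho_Site matches.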
PROSITE Search
- Quick search by accession number (AC), entry name (ID) or documentation text
- Scan PROSITE with your sequence
Search Results
- Your sequence
- Found domains, with color code
Pfam http://www.sanger.ac.uk/Software/Pfam/index.shtml
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. Pfam families fall into four basic types:
- family: the default classification, stating that the members are related
- domain: a structural unit found in multiple protein contexts
- repeat: a unit that is not stable on its own, but forms a domain or structure when several copies occur in tandem
- motif: a shorter sequence unit found outside of domains
For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
Pfam can be accessed directly or from the PDB description.
Searching Pfam
Two fundamental ways of searching Pfam:
- By domain: on the website
- By sequence: on the website, or download the HMM libraries and run locally
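For the "run locally" route, the downloaded Pfam HMM library is searched with the HMMER package. The sketch below assumes HMMER 3 is installed, that the library (e.g. `Pfam-A.hmm`) has already been indexed with `hmmpress`, and that the query is in a FASTA file; the helper names, file paths and the E-value cutoff are hypothetical. It runs `hmmscan --tblout` and parses the resulting per-sequence table (for per-domain coordinates, `--domtblout` writes a per-domain table instead).

```python
import subprocess


def run_hmmscan(hmm_db: str, fasta: str, tblout: str) -> None:
    """Scan the sequences in `fasta` against the pressed HMM library `hmm_db`."""
    # Assumes HMMER 3 is on PATH and `hmmpress hmm_db` has already been run.
    subprocess.run(["hmmscan", "--tblout", tblout, hmm_db, fasta],
                   check=True, stdout=subprocess.DEVNULL)


def parse_tblout(text: str) -> list[dict]:
    """Parse hmmscan's per-sequence table: 18 space-separated columns + description."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split(maxsplit=18)
        hits.append({
            "family": fields[0],          # target name (the Pfam HMM)
            "accession": fields[1],       # Pfam accession, e.g. PFxxxxx.version
            "query": fields[2],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        })
    return hits


def significant(hits: list[dict], max_evalue: float = 1e-5) -> list[dict]:
    """Keep hits at or below a (hypothetical) E-value cutoff, best first."""
    return sorted((h for h in hits if h["evalue"] <= max_evalue),
                  key=lambda h: h["evalue"])


# Usage (paths are placeholders):
# run_hmmscan("Pfam-A.hmm", "query.fasta", "hits.tbl")
# with open("hits.tbl") as fh:
#     for h in significant(parse_tblout(fh.read())):
#         print(h["family"], h["accession"], h["evalue"])
```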
Searching Pfam
- Global or fragment search
- Search a sequence
- Search by UniProt ID
- Output format
- E-value
- Search!
- Domains above the threshold
- Domains from Pfam-B
- Distribution of the domains along the sequence
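The domain-distribution picture can be approximated in text form. This is a minimal sketch, assuming each hit is given as a hypothetical `(name, start, end, E-value)` tuple with 1-based inclusive coordinates; the single E-value cutoff is a simplification, since Pfam itself decides significance with curated per-family gathering thresholds.

```python
def domain_map(length: int, domains: list[tuple[str, int, int, float]],
               max_evalue: float = 1e-3, width: int = 60) -> str:
    """Draw each domain (name, start, end, E-value) as a bar along a scaled sequence.

    Hits whose E-value exceeds `max_evalue` (a hypothetical cutoff) are left out,
    mimicking a significance threshold.
    """
    rows = ["-" * width + f"  1..{length}"]            # the sequence itself
    for name, start, end, evalue in sorted(domains, key=lambda d: d[1]):
        if evalue > max_evalue:
            continue
        left = (start - 1) * width // length           # integer scaling to columns
        right = max(left + 1, end * width // length)   # always draw at least one '#'
        bar = (" " * left + "#" * (right - left)).ljust(width)
        rows.append(f"{bar}  {name} {start}-{end} (E={evalue:.1e})")
    return "\n".join(rows)
```

Drawing every domain on its own row keeps overlapping hits readable, at the cost of a taller picture than the single-track view on the website.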
Link to PDB
Browsing Pfam
SCOP: Structural Classification of Proteins
- Based on known protein structures
- Manually created by visual inspection
- Hierarchical database structure:
  - Class, fold, superfamily, family
  - Proteins/domains, species, instances
- Founded in 1995
- 800 folds, 1295 superfamilies, 2327 families
SCOP: Navigation
- Node name
- Node description
- Path from root to node
- Children of node
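SCOP labels nodes with concise classification strings (sccs) such as `a.1.1.1`, whose dot-separated fields name the class, fold, superfamily and family. The sketch below uses that convention to compute the "path from root to node" and the "children of node" shown in the browser; the family identifiers in any input list are treated as hypothetical, the helper names are illustrative, and the levels below family are not covered.

```python
LEVELS = ("class", "fold", "superfamily", "family")


def _key(node: str) -> list:
    """Sort 'a.10' after 'a.2' by comparing numeric fields as numbers."""
    return [int(p) if p.isdigit() else p for p in node.split(".")]


def path_from_root(sccs: str) -> list[tuple[str, str]]:
    """Ancestors of a node, root first: 'a.1.1.2' -> class 'a', fold 'a.1', ..."""
    parts = sccs.split(".")
    return [(LEVELS[i], ".".join(parts[: i + 1])) for i in range(len(parts))]


def children(node: str, families: list[str]) -> list[str]:
    """Direct children of `node` ('' = root) among the given family identifiers."""
    depth = 0 if node == "" else node.count(".") + 1
    kids = {
        ".".join(parts[: depth + 1])
        for parts in (f.split(".") for f in families)
        if len(parts) > depth and ".".join(parts[:depth]) == node
    }
    return sorted(kids, key=_key)
```

Because every ancestor is a prefix of its descendants' identifiers, both navigation views can be derived from the family strings alone, without storing an explicit tree.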
TOPITS
20% of the proteins in SwissProt are remote homologues of a protein in the PDB, i.e. their structures are homologous but their pairwise sequence identity is not significant. Threading techniques attempt to detect such remote homologues from sequence information, thereby extending the scope of homology modelling.
Principle: remote homologues (0-25% sequence identity) are detected by a prediction-based threading method. The main idea is to detect similar motifs of secondary structure and solvent accessibility between a sequence of unknown structure and a known fold.
TOPITS Strategy:
1. Project the known 3D structures onto 1D strings of secondary structure and relative solvent accessibility.
2. Predict secondary structure and solvent accessibility for the query sequence with neural network systems (PHD).
3. Align the predicted and observed 1D strings by dynamic programming.
4. Use the resulting alignment to detect remote 3D homologues.
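Step 3 is an ordinary dynamic-programming alignment, except that the symbols are per-residue structural states rather than amino acids. The sketch below is a generic Needleman-Wunsch global alignment where each position is a pair (secondary structure H/E/L, accessibility b = buried / e = exposed). The scores and the gap penalty are hypothetical and are not TOPITS's actual scoring scheme or gap model.

```python
def score_pair(q: tuple[str, str], t: tuple[str, str]) -> int:
    """Hypothetical state score: reward equal secondary structure and equal accessibility."""
    ss = 2 if q[0] == t[0] else -1
    acc = 1 if q[1] == t[1] else -1
    return ss + acc


def align_1d(q_ss: str, q_acc: str, t_ss: str, t_acc: str, gap: int = -2):
    """Global (Needleman-Wunsch) alignment of predicted vs observed 1D strings.

    Returns (score, aligned query SS string, aligned template SS string).
    """
    q, t = list(zip(q_ss, q_acc)), list(zip(t_ss, t_acc))
    n, m = len(q), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score_pair(q[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,    # gap in the template
                          F[i][j - 1] + gap)    # gap in the query
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score_pair(q[i - 1], t[j - 1]):
            a.append(q_ss[i - 1])
            b.append(t_ss[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(q_ss[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t_ss[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))
```

For example, `align_1d("HHEE", "bbee", "HHLEE", "bbeee")` returns `(10, "HH-EE", "HHLEE")`: the predicted strand pair is placed against the template's strand, and the template's loop residue is skipped with a gap. A local or end-gap-free variant may suit queries that match only part of a fold.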
TOPITS Accuracy: results should be taken with caution.
- On average, the first hit of the prediction-based threading is correct in 30% of cases.
- Hits with z-scores above 3.0 are more reliable (accuracy > 60%).
- In exceptional cases, the resulting alignments suffice for building correct homology-based models.
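A z-score states how many standard deviations a hit's alignment score lies above the mean score of all templates scanned, z = (s - mean) / sd. The sketch below assumes this plain normalisation over a hypothetical dictionary of template scores and keeps hits above the 3.0 cutoff quoted above; the reference distribution TOPITS actually uses is not specified here.

```python
from statistics import mean, stdev


def z_scores(scores: dict[str, float]) -> dict[str, float]:
    """z = (score - mean) / sample standard deviation, over all scored templates."""
    values = list(scores.values())
    mu, sd = mean(values), stdev(values)   # needs >= 2 templates and sd > 0
    return {name: (s - mu) / sd for name, s in scores.items()}


def reliable_hits(scores: dict[str, float], cutoff: float = 3.0) -> list[str]:
    """Templates whose z-score exceeds `cutoff`, best first."""
    z = z_scores(scores)
    return sorted((name for name in z if z[name] > cutoff), key=lambda name: -z[name])
```

A raw alignment score depends on the lengths involved, so ranking by z-score rather than by raw score makes hits against different folds more comparable.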
TOPITS Output (1)
- Alignment score
- Alignment length
- Length of indels
- Number of indels
- Length of sequence
- Alignment significance
- Matched sequence
- % sequence identity
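Several of these fields can be computed from a pairwise alignment written as two gapped strings of equal length. This sketch assumes two common conventions that TOPITS may define differently: identity is counted over columns where neither sequence has a gap, and one indel is one maximal run of `-` in either sequence. The function name is hypothetical.

```python
import re


def alignment_stats(a: str, b: str) -> dict[str, float]:
    """Summarise a pairwise alignment given as two equal-length gapped strings."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    # Columns where both sequences have a residue (no gap).
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in aligned)
    # One indel = one maximal run of '-' in either sequence.
    gaps = [m.group() for s in (a, b) for m in re.finditer(r"-+", s)]
    return {
        "alignment_length": len(aligned),
        "identity_pct": 100.0 * identical / len(aligned) if aligned else 0.0,
        "n_indels": len(gaps),
        "indel_length": sum(len(g) for g in gaps),
    }
```

For example, `alignment_stats("AC-GTT", "ACAG-A")` reports 4 aligned columns, 75.0% identity and 2 indels of total length 2.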
TOPITS Output (2)
- Query sequence
- Predicted structure
- Buried / outside
- Database sequence
- Amino acid matches
- Known secondary structure of the database protein
GenTHREADER Output
- Prediction confidence
- Expected errors
- Score from neural network
- Sequence alignment score and length
- Energy measurements
- Length of sequence
- Structure from PDB
Example 1. Domain searching.
>Mystic sequence
MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP
PDVVKTTPCARLECWIDPLSMGPQKALETIGANLQKQYENWQPRARYKQSLDPTVDEVKKLCTSLRRNAKEE
RVLFHYNGHGVPRPTVNGEVWVFNKNYTQYIPLSIYDLQTWMGSPSIFVYDCSNAGLIVKSFKQFALQREQE
LEVAAINPNHPLAQMPLPPSMKNCIQLAACEATELLPMIPDLPADLFTSCLTTPIKIALRWFCMQKCVSLVP
GVTLDLIEKIPGRLNDRRTPLGELNWIFTAITDTIAWNVLPRDLFQKLFRQDLLVASLFRNFLLAERIMRSY
NCTPVSSPRLPPTYMHAMWQAWDLAVDICLSQLPTIIEEGTAFRHSPFFAEQLTAFQVWLTMGVENRNPPEQ
LPIVLQVLLSQVHRLRALDLLGRFLDLGPWAVSLALSVGIFPYVLKLLQSSARELRPLLVFIWAKILAVDSS
CQADLVKDNGHKYFLSVLADPYMPAEHRTMTAFILAVIVNSYHTGQEACLQGNLIAICLEQLNDPHPLLRQW
VAICLGRIWQNFDSARWCGVRDSAHEKLYSLLSDPIPEVRCAAVFALGTFVGNSAERTDHSTTIDHNVAMML
AQLVSDGSPMVRKELVVALSHLVVQYESNFCTVALQFIEEEKNYALPSPATTEGGSLTPVRDSPCTPRLRSV
SSYGNIRAVATARSLNKSLQNLSLTEESGGAVAFSPGNLSTSSSASSTLGSPENEEHILSFETIDKMRRASS
YSSLNSLIGVSFNSVYTQIWRVLLHLAADPYPEVSDVAMKVLNSIAYKATVNARPQRVLDTSSLTQSAPASP
TNKGVHIHQAGGSPPASSTSSSSLTNDVAKQPVSRDLPSGRPGTTGPAGAQYTPHSHQFPRTRKMFDKGPEQ
TADDADDAAGHKSFISATVQTGFCDWSARYFAQPVMKIPEEHDLESQIRKEREWRFLRNSRVRRQAQQVIQK
GITRLDDQIFLNRNPGVPSVVKFHPFTPCIAVADKDSICFWDWEKGEKLDYFHNGNPRYTRVTAMEYLNGQD
CSLLLTATDDGAIRVWKNFADLEKNPEMVTAWQGLSDMLPTTRGAGMVVDWEQETGLLMSSGDVRIVRIWDT
DREMKVQDIPTGADSCVTSLSCDSHRSLIVAGLGDGSIRVYDRRMALSECRVMTYREHTAWVVKASLQKRPD
GHIVSVSVNGDVRIFDPRMPESVNVLQIVKGLTALDIHPQADLIACGSVNQFTAIYNSSGELINNIKYYDGF
MGQRVGAISCLAFHPHWPHLAVGSNDYYISVYSVEKRVR
Example
One of the key players in cancer development is p53. The p53 protein acts as a checkpoint in the cell cycle, either preventing or initiating programmed cell death (apoptosis). Since cancer is the unchecked proliferation of cells, disruption of p53 activation allows diseased cells to keep multiplying and can drive a tumor's progression.
Example
Example – TOPITS
Example – GenTHREADER