Introduction to Bioinformatics - Tutorial no. 8 Protein Prediction: - PROSITE - Pfam - SCOP - TOPITS - genThreader.

PROSITE http://www.expasy.org/prosite/
PROSITE is a database of protein domains that can be searched with either regular-expression patterns or sequence profiles.

Protein Sequence Motifs
Examples:
- Ck2_Phospho_Site: [ST]-x(2)-[DE]
- Actins_1: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
- Zinc_Finger_C2H2: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Notation:
- x = any amino acid
- [ST] = acceptable amino acids for a given position
- x(2,4) = xx or xxx or xxxx (repetition of an element)
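The pattern notation above maps almost directly onto ordinary regular expressions. The following is a minimal sketch of that translation, handling only the elements listed on this slide (x, bracketed alternatives, and repetition counts). The function names prosite_to_regex and find_motif are hypothetical helpers introduced for illustration; they are not part of PROSITE or ExPASy.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset shown on the slide) into a regex."""
    parts = []
    for element in pattern.split("-"):
        # Separate an optional repetition suffix such as (2) or (2,4) from the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # x = any amino acid
        # Bracketed alternatives like [ST] and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

def find_motif(pattern: str, sequence: str):
    """Return (1-based start position, matched residues) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(0)) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    print(prosite_to_regex("[ST]-x(2)-[DE]"))
    print(find_motif("[ST]-x(2)-[DE]", "AASAAEAA"))
```

Note that re.finditer reports non-overlapping matches only, so a scan that needs every overlapping occurrence would have to be adapted.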

Prosite Search
- Quick search by AC, ID or documentation text
- Scan PROSITE with your sequence

Search Results Your sequence Found domains with color code

Pfam http://www.sanger.ac.uk/Software/Pfam/index.shtml
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. Pfam families can be broken down into four basic types:
- family – default classification, stating that members are related
- domain – structural unit found in multiple protein contexts
- repeat – a unit that is not stable on its own, but forms a domain or structure when combined in multiple tandem copies
- motif – shorter sequence units found outside of domains

For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
Pfam can be accessed directly or from the PDB description.

Searching Pfam
Two fundamental ways of searching Pfam:
- By domain: website
- By sequence: website, or download HMM libraries and run locally

Searching Pfam Global or Fragment search Search a sequence Search by UniProt ID Output format E-value Search !

Domains above the threshold Domains from Pfam-B Domains distribution on the sequence

Link to PDB

Browsing Pfam

SCOP
Structural Classification of Proteins
- Based on known protein structures
- Manually created by visual inspection
Hierarchical database structure
- Class, fold, superfamily, family
- Proteins/domains, species instances
Founded in 1995
- 800 folds, 1295 superfamilies, 2327 families

SCOP: Navigation Node name Node description Path from root to node Children of node
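The navigation view (node name, path from root, children) reflects a plain tree. Below is a minimal sketch of such a tree, assuming each node sits exactly one level below its parent. The Node class and its methods are hypothetical and do not correspond to any SCOP file format or API; the globin entries are used only as an illustrative branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Levels of the hierarchy, ordered from the root downwards.
LEVELS = ["root", "class", "fold", "superfamily", "family", "protein", "species"]

@dataclass(eq=False)
class Node:
    name: str
    level: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Node":
        # A child always sits one level below its parent.
        child = Node(name, LEVELS[LEVELS.index(self.level) + 1], parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> List[str]:
        # Walk up through the parents, then reverse to get root-to-node order.
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

if __name__ == "__main__":
    root = Node("SCOP", "root")
    family = (root.add_child("All alpha proteins")   # class
                  .add_child("Globin-like")          # fold
                  .add_child("Globin-like")          # superfamily
                  .add_child("Globins"))             # family
    print(" > ".join(family.path_from_root()))
    print([child.name for child in root.children])
```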

TOPITS
- 20% of the proteins in SwissProt are remote homologues of a protein in the PDB database, i.e. the structures are homologous but pairwise sequence identity is not significant.
- Threading techniques attempt to predict such remote homologues from sequence information, and thus to increase the scope of homology modelling.
- Principle: Remote homologues (0-25% sequence identity) are detected by a prediction-based threading method. The principal idea is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold.

TOPITS
Strategy:
- Project known 3D structures onto 1D strings of secondary structure and relative solvent accessibility.
- Predict secondary structure and solvent accessibility for the query sequence with neural network systems (PHD).
- Align the predicted and observed 1D strings by dynamic programming.
- Use the resulting alignment to detect remote 3D homologues.
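The dynamic-programming step can be illustrated with a standard global (Needleman-Wunsch) alignment of two secondary-structure strings. This is only a sketch under simplifying assumptions: it aligns secondary structure alone (H = helix, E = strand, L = loop), ignores solvent accessibility, and uses arbitrary match, mismatch and gap scores. It is not the scoring scheme TOPITS itself uses, and the function name align_1d is hypothetical.

```python
def align_1d(predicted: str, observed: str, match=2, mismatch=-1, gap=-2):
    """Globally align two 1D structure strings; return (score, aligned_pred, aligned_obs)."""
    n, m = len(predicted), len(observed)

    def sub(i, j):
        # Substitution score for predicted[i-1] against observed[j-1].
        return match if predicted[i - 1] == observed[j - 1] else mismatch

    # score[i][j] = best score for aligning predicted[:i] with observed[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(predicted[i - 1]); b.append(observed[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(predicted[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(observed[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))

if __name__ == "__main__":
    s, p, o = align_1d("HHHLEEE", "HHHEEE")
    print(s)
    print(p)
    print(o)
```

In a threading setting the query's predicted string would be aligned against the observed string of every fold in the library, and the resulting scores ranked.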

TOPITS
Accuracy (results should be taken with caution):
- On average, the first hit of the prediction-based threading is correct in only about 30% of cases.
- Hits with z-scores above 3.0 are more reliable (accuracy > 60%).
- In exceptional cases the resulting alignments suffice for building correct homology-based models.
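To make the z-score cutoff concrete, here is a minimal sketch that assumes the z-score of a hit is its alignment score expressed in standard deviations above the mean score of all folds searched for that query. That definition, the function names, and the example data are assumptions for illustration; the slide does not state how TOPITS computes its z-scores.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standardise raw alignment scores: (score - mean) / standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no hit stands out from the background.
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]

def reliable_hits(hits, cutoff=3.0):
    """hits is a list of (fold_id, raw_score); keep those with z-score above the cutoff."""
    zs = z_scores([score for _, score in hits])
    return [(fold_id, z) for (fold_id, _), z in zip(hits, zs) if z > cutoff]

if __name__ == "__main__":
    # Hypothetical scores: 19 background folds and one fold that stands out.
    hits = [("fold_%d" % i, 10.0) for i in range(19)] + [("candidate", 100.0)]
    for fold_id, z in reliable_hits(hits):
        print(fold_id, round(z, 2))
```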

TOPITS Output (1) Alignment score Alignment length Length of indels Number of indels Length of sequence Alignment significance Matched sequence % sequence identity

TOPITS Output (2) Query sequence Predicted structure Buried / Outside Database sequence Amino acid matches Database known secondary structure

GenTHREADER Output Prediction confidence Expected errors Score from neural network Sequence alignment score and length Energy measurements Length of sequence Structure from PDB

Example 1. Domain searching.
>Mystic sequence
MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP
PDVVKTTPCARLECWIDPLSMGPQKALETIGANLQKQYENWQPRARYKQSLDPTVDEVKKLCTSLRRNAKEE
RVLFHYNGHGVPRPTVNGEVWVFNKNYTQYIPLSIYDLQTWMGSPSIFVYDCSNAGLIVKSFKQFALQREQE
LEVAAINPNHPLAQMPLPPSMKNCIQLAACEATELLPMIPDLPADLFTSCLTTPIKIALRWFCMQKCVSLVP
GVTLDLIEKIPGRLNDRRTPLGELNWIFTAITDTIAWNVLPRDLFQKLFRQDLLVASLFRNFLLAERIMRSY
NCTPVSSPRLPPTYMHAMWQAWDLAVDICLSQLPTIIEEGTAFRHSPFFAEQLTAFQVWLTMGVENRNPPEQ
LPIVLQVLLSQVHRLRALDLLGRFLDLGPWAVSLALSVGIFPYVLKLLQSSARELRPLLVFIWAKILAVDSS
CQADLVKDNGHKYFLSVLADPYMPAEHRTMTAFILAVIVNSYHTGQEACLQGNLIAICLEQLNDPHPLLRQW
VAICLGRIWQNFDSARWCGVRDSAHEKLYSLLSDPIPEVRCAAVFALGTFVGNSAERTDHSTTIDHNVAMML
AQLVSDGSPMVRKELVVALSHLVVQYESNFCTVALQFIEEEKNYALPSPATTEGGSLTPVRDSPCTPRLRSV
SSYGNIRAVATARSLNKSLQNLSLTEESGGAVAFSPGNLSTSSSASSTLGSPENEEHILSFETIDKMRRASS
YSSLNSLIGVSFNSVYTQIWRVLLHLAADPYPEVSDVAMKVLNSIAYKATVNARPQRVLDTSSLTQSAPASP
TNKGVHIHQAGGSPPASSTSSSSLTNDVAKQPVSRDLPSGRPGTTGPAGAQYTPHSHQFPRTRKMFDKGPEQ
TADDADDAAGHKSFISATVQTGFCDWSARYFAQPVMKIPEEHDLESQIRKEREWRFLRNSRVRRQAQQVIQK
GITRLDDQIFLNRNPGVPSVVKFHPFTPCIAVADKDSICFWDWEKGEKLDYFHNGNPRYTRVTAMEYLNGQD
CSLLLTATDDGAIRVWKNFADLEKNPEMVTAWQGLSDMLPTTRGAGMVVDWEQETGLLMSSGDVRIVRIWDT
DREMKVQDIPTGADSCVTSLSCDSHRSLIVAGLGDGSIRVYDRRMALSECRVMTYREHTAWVVKASLQKRPD
GHIVSVSVNGDVRIFDPRMPESVNVLQIVKGLTALDIHPQADLIACGSVNQFTAIYNSSGELINNIKYYDGF
MGQRVGAISCLAFHPHWPHLAVGSNDYYISVYSVEKRVR

Example

One of the key players in cancer development is p53. The p53 protein acts as a checkpoint in the cell cycle, either preventing or initiating programmed cell death (apoptosis). Because cancer is the unchecked proliferation of cells, disruption of p53 activation allows diseased cells to multiply and can drive a tumor's progression.

Example

Example – TOPITS

Example – genThreader
