Protein domain/family db Secondary databases are the fruit of analyses of the sequences found in the primary sequence db Either manually curated (i.e.


1

2 Protein domain/family db
Secondary databases are derived from analyses of the sequences found in the primary sequence databases.
They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO).
Each uses a different method to detect whether a protein belongs to a particular domain/family (patterns, profiles, HMMs).

3 Protein domain/family
Most proteins have a « modular » structure. Estimate: ~3 domains per protein.
Domains (conserved sequences or structures) are identified by multiple sequence alignments.
Domains can be defined by different methods:
–Pattern (regular expression); used for very conserved domains
–Profile (weighted matrix): a two-dimensional table of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains
–Hidden Markov Model (HMM): a probabilistic model; another method to generate profiles
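To make the "weighted matrix" idea concrete, here is a minimal sketch of a profile built from a small ungapped alignment and used to scan a sequence. It is a simplification under stated assumptions: only match scores are modelled (no gap or insertion scores), the background is a uniform 1/20 per residue, and the toy alignment, the pseudocount value and the function names are hypothetical, not taken from any of the databases discussed here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Build a position-specific log-odds matrix from an ungapped alignment.

    Assumptions (illustrative only): uniform background frequency and a
    flat pseudocount added to every residue in every column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile

def score_window(profile, window):
    """Sum the per-position scores of a window the same length as the profile."""
    return sum(column[aa] for column, aa in zip(profile, window))

def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    scored = [
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Hypothetical toy alignment of a short conserved region.
toy_alignment = ["GDSGG", "GDSGA", "GDAGG"]
toy_profile = build_profile(toy_alignment)
score, start = best_hit(toy_profile, "MKLGDSGGPLV")
print(f"best window starts at index {start} with score {score:.2f}")
```

Unlike a pattern, the profile still gives a graded score to windows that deviate from the consensus at some positions, which is why this kind of model suits less conserved domains.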

4 Some statistics
15 most common domains for H. sapiens (incomplete)
–Immunoglobulin and major histocompatibility complex domain
–Zinc finger, C2H2 type
–Eukaryotic protein kinase
–Rhodopsin-like GPCR superfamily
–Pleckstrin homology (PH) domain
–Zinc finger, RING type
–Src homology 3 (SH3) domain
–RNA-binding region RNP-1 (RNA recognition motif)
–EF-hand family
–Homeobox domain
–KRAB box
–PDZ domain (also known as DHR or GLGF)
–Fibronectin type III domain
–EGF-like domain
–Cadherin domain
…
http://www.ebi.ac.uk/proteome/HUMAN/interpro/top15d.html

5 Protein domain/family db
PROSITE     Patterns / Profiles
ProDom      Aligned motifs
PRINTS      Aligned motifs
Pfam        HMM (Hidden Markov Models)
SMART       HMM
BLOCKS      Aligned motifs
InterPro

6 PROSITE
Created in 1988 (SIB)
Contains fully annotated functional domains, based on two methods: patterns and profiles
Entries are deposited in PROSITE in two distinct files:
–Patterns/profiles with the list of all matches in SWISS-PROT
–Documentation
Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.
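Because a PROSITE pattern is essentially a regular expression in its own notation, it can be translated into an ordinary regex and run against a sequence. The sketch below handles only a simplified subset of the syntax (`x`, `[..]`, `{..}`, repeat counts, `<` and `>` anchors); the function names and the test sequence are hypothetical, and the example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re

# One pattern element: optional '<' anchor, a residue spec, optional repeat, optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: elements must be separated by '-', and only the
    constructs listed in _ELEMENT are supported.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # x(2) or x(2,4) style repeats
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(parts)

def find_matches(pattern, sequence):
    """Return (0-based start, matched text) for every match, overlaps included."""
    regex = prosite_to_regex(pattern)
    lookahead = re.compile("(?=(" + regex + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))   # -> N[^P][ST][^P]
print(find_matches("N-{P}-[ST]-{P}.", "MNGTAANPSA"))
```

In the hypothetical test sequence, the second asparagine is followed by a proline and is therefore rejected, which shows how the `{P}` exclusion works.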

7 Diagnostic performance List of matches

8 Prosite (profile): example

9 PFAM (HMMs): an entry

10 … …

11 PFAM (HMMs): query output

12 HMMs

13 Most protein families are characterized by several conserved motifs.
Fingerprint: set of motif(s) (simple or composite, such as multidomains) = signature of family membership
True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it.
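The full-versus-partial fingerprint distinction can be sketched as follows. This is an illustration only: each motif is written as a regular expression for brevity, whereas fingerprint databases describe motifs as aligned ungapped segments, and the motifs, sequences and function names below are hypothetical.

```python
import re

def match_fingerprint(sequence, motifs):
    """Check each motif in order along the sequence.

    Returns one boolean per motif. A motif only counts if it occurs after
    the previously matched motif, so the order of the fingerprint is kept.
    """
    hits = []
    position = 0
    for motif in motifs:
        m = re.compile(motif).search(sequence, position)
        if m:
            hits.append(True)
            position = m.end()
        else:
            hits.append(False)
    return hits

def classify(sequence, motifs):
    """Label a sequence by how much of the fingerprint it carries."""
    matched = sum(match_fingerprint(sequence, motifs))
    if matched == len(motifs):
        return "full fingerprint"
    if matched > 0:
        return "partial fingerprint"
    return "no match"

# Hypothetical three-motif fingerprint.
fingerprint = ["GD[ST]GG", "C..C", "W[KR]H"]

print(classify("AAGDSGGAAACAACAAWKHAA", fingerprint))  # all three motifs present
print(classify("AAGDSGGAAAAAAWKHAA", fingerprint))     # middle motif missing
```

A sequence with the full fingerprint would be a candidate true family member, while a partial match points to a possible subfamily member that needs closer inspection.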

14 ProDom consists of an automated compilation of homologous domain alignments.
August 2001: 390 ProDom families were generated automatically using PSI-BLAST.
Built from non-fragmentary sequences from SWISS-PROT 39 + TrEMBL (May 29th, 2000).

15 ProDom: query output example Your query

16 Protein domain/family: Composite databases
Example: InterPro
Unification of PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites;
Single set of documents linked to the various methods;
Will be used to improve the functional annotation of SWISS-PROT (classification of unknown proteins…)
This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.

17

18

19

