Published by Cody Leonard. Modified over 9 years ago.
2
Protein domain/family db
Secondary databases are the fruit of analyses of the sequences found in the primary sequence databases.
They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO).
Each uses a different method (patterns, profiles, HMMs) to detect whether a protein belongs to a particular domain/family.
3
Protein domain/family
Most proteins have «modular» structures.
Estimation: ~3 domains per protein.
Domains (conserved sequences or structures) are identified by multiple sequence alignments.
Domains can be defined by different methods:
–Pattern (regular expression): used for very conserved domains
–Profile (weighted matrix): a two-dimensional table of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains
–Hidden Markov Model (HMM): a probabilistic model; another method to generate profiles
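As a rough illustration of the profile idea above, the following Python sketch builds a position-specific scoring matrix from a tiny made-up alignment and uses it to score candidate segments. The toy alignment, the pseudocount and the uniform background frequency are all assumptions chosen for illustration; real profiles also carry gap and insertion scores, which are left out here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def score(pssm, segment):
    """Sum the per-position scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

def best_window(pssm, sequence):
    """Slide the profile along the sequence; return (best score, start)."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical ungapped alignment of a five-residue motif.
toy_alignment = ["GKSTL", "GKSSL", "GKTTL", "GRSTL"]
pssm = build_pssm(toy_alignment)

print(score(pssm, "GKSTL"))            # family-like segment scores high
print(score(pssm, "WWWWW"))            # unrelated segment scores low
print(best_window(pssm, "AAGKSTLAA"))  # best-scoring window and its start
```

Unlike a pattern, the matrix tolerates substitutions: a segment that differs from the family at one position still receives a graded score rather than a plain mismatch.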
4
Some statistics
15 most common domains for H. sapiens (incomplete):
–Immunoglobulin and major histocompatibility complex domain
–Zinc finger, C2H2 type
–Eukaryotic protein kinase
–Rhodopsin-like GPCR superfamily
–Pleckstrin homology (PH) domain
–Zinc finger, RING type
–Src homology 3 (SH3) domain
–RNA-binding region RNP-1 (RNA recognition motif)
–EF-hand family
–Homeobox domain
–KRAB box
–PDZ domain (also known as DHR or GLGF)
–Fibronectin type III domain
–EGF-like domain
–Cadherin domain
…
http://www.ebi.ac.uk/proteome/HUMAN/interpro/top15d.html
5
Protein domain/family db
PROSITE: patterns / profiles
ProDom: aligned motifs
PRINTS: aligned motifs
Pfam: HMM (Hidden Markov Models)
SMART: HMM
BLOCKS: aligned motifs
InterPro
6
Prosite
Created in 1988 (SIB).
Contains fully annotated functional domains, based on two methods: patterns and profiles.
Each entry is deposited in PROSITE as two distinct files:
–Pattern/profile, with the list of all matches in SWISS-PROT
–Documentation
Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.
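Because a PROSITE pattern is essentially a regular expression, it can be translated into one mechanically. The Python sketch below is a simplified translator, not the PROSITE tool itself: it handles `x`, `[..]`, `{..}`, repeat counts and the `<` / `>` anchors, and ignores rarer syntax. The function name `prosite_to_regex` and the test sequence are hypothetical; the example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site signature.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: covers x, [..], {..}, (n), (n,m), < and > only.
    """
    pattern = pattern.rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # x stands for any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)

# Hypothetical sequence with one valid site (NGSA) and one blocked by P (NPSA).
sequence = "MKNGSAANPSA"
matches = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(matches)
```

Note that `re.finditer` reports non-overlapping matches only, so a full scanner would need extra care where sites can overlap.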
7
Diagnostic performance List of matches
8
Prosite (profile): example
9
PFAM (HMMs): an entry
10
… …
11
PFAM (HMMs): query output
12
HMMs
13
Most protein families are characterized by several conserved motifs.
Fingerprint: a set of motifs (simple or composite, such as multidomains) that forms the signature of family membership.
True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it.
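The full-versus-partial distinction above can be sketched in a few lines of Python. The motifs here are invented regular expressions standing in for a fingerprint, and the names `fingerprint_hits` and `classify` are hypothetical; real fingerprint searches score aligned motifs rather than exact regex matches, and may also take motif order and spacing into account.

```python
import re

def fingerprint_hits(sequence, motifs):
    """Return the names of the motifs that occur anywhere in the sequence."""
    return [name for name, regex in motifs if re.search(regex, sequence)]

def classify(sequence, motifs):
    """Label a sequence by how much of the fingerprint it exhibits."""
    hits = fingerprint_hits(sequence, motifs)
    if len(hits) == len(motifs):
        return "full fingerprint"
    if hits:
        return "partial fingerprint"
    return "no match"

# Hypothetical three-motif fingerprint.
toy_fingerprint = [
    ("motif1", "GK[ST]"),
    ("motif2", "D.{2}G"),
    ("motif3", "NK.D"),
]

print(classify("AAGKSAADAAGAANKADAA", toy_fingerprint))  # all three motifs
print(classify("AAGKTAAAAAADLLGAA", toy_fingerprint))    # motif3 missing
print(classify("AAAAAA", toy_fingerprint))               # none present
```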
14
ProDom consists of an automated compilation of homologous domain alignments.
August 2001: 390 ProDom families were generated automatically using PSI-BLAST.
Built from non-fragmentary sequences from SWISS-PROT 39 + TREMBL (May 29th, 2000).
15
ProDom: query output example
16
Protein domain/family: Composite databases
Example: InterPro
Unification of PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites.
Single set of documents linked to the various methods.
Will be used to improve the functional annotation of SWISS-PROT (classification of unknown proteins…).
This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.