1
Protein Families, Motifs & Domains.
2
The Annotation Process
DNA SEQUENCE Useful Information ANALYSIS SOFTWARE Annotator
3
A Common Mistake! BLAST PROTEIN SEQUENCE Function Annotator
4
Protein Families, Motifs & Domains.
A word about BLAST and FASTA; Sequence alignment; Domains; Prosite; Pfam/HMMs; SignalP/TMHMM
5
BLAST: Local Alignment. A hit suggests the presence of a common domain between two proteins. However, common domains can be conserved between proteins with very different functions, e.g. ATP binding is common to many proteins.
6
BLAST/FASTA
FASTA is a global alignment tool; BLAST is local. Moving from BLAST to FASTA reduces sensitivity and increases specificity.
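To make the local/global distinction concrete, here is a minimal sketch using the textbook dynamic-programming algorithms (Needleman-Wunsch for global, Smith-Waterman for local). This is not what BLAST or FASTA actually run, since both are faster heuristics; the scoring values and the two toy sequences are assumptions chosen for illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of subsequences only."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can restart anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical proteins that share only a short "domain" (GKST).
seq_a = "WWWWGKSTPPPP"
seq_b = "AAAAGKSTCCCC"
print("local :", local_score(seq_a, seq_b))
print("global:", global_score(seq_a, seq_b))
```

The local score rewards the shared segment alone, while the global score is dragged down by the unrelated flanking residues. That is why a local hit can point to a common domain without implying that the whole proteins are related.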
7
Using FASTA Global Alignment
Annotation gained from homology hits is only as good as the annotation you are transferring. E.g. there are two different genes called ESAG2 in SWALL. Small changes in “your gene” might confer functional differences.
8
FASTA 10^-5: low-scoring hits can give good alignments
9
10^-8: high-scoring hits can give poor alignments
12
The big problem with searching public databases is…
There is a need to reduce the number of sequences we search and to prevent bad annotation from spreading.
13
Protein Families, Motifs & Domains.
Proteins with common functions have some common features. Domains and motifs are defined from conserved residues. Families can be grouped, and profiles and HMMs derived. There is more to life than BLAST.
14
Sequence Alignment: Sequence alignments allow us to see which residues are important to a family of proteins. This lets us make motifs/profiles/fingerprints/HMMs to define families.
15
Domains: A domain is a functional part of a protein.
It may contain amino acid sequence motifs that can be used to identify it. A set of more than one motif is known as a fingerprint.
16
DOMAINS [diagram labels: Domain; Alignment; Motifs (Prosite); Fingerprints; Blocks; Pfam (HMMs)]
17
Prosite http://us.expasy.org/prosite/
Maintained at the Swiss Institute of Bioinformatics. All motifs are checked for false positives and fine-tuned. Sometimes a family can be defined by more than one expression. Fingerprints and BLOCKS automatically scan proteins for a number of motifs.
18
Prosite (Bairoch et al (1997) NAR 25(1) 217-221)
The single most conserved motifs, referred to as regular expressions or patterns. E.g.
cydeggis
cyedggis
cyeeggit
cyhgdggs
cyrgdgnt
C-Y-x(2)-[DG]-G-x-[ST]
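Because a Prosite pattern is essentially a regular expression, the example above can be checked mechanically. The sketch below is a hypothetical helper, not part of any Prosite tool. It translates only the pattern elements used on this slide (literal residues, x for any residue, bracketed alternatives and repeat counts) into a Python regular expression.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports literal residues, x (any residue), [..] alternatives, and
    repeat counts written as x(2) or as a bare digit (x2).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\)|(\d+))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, paren_count, bare_count = m.groups()
        count = paren_count or bare_count
        regex = "." if token == "x" else token
        parts.append(regex + (f"{{{count}}}" if count else ""))
    return "".join(parts)


regex = prosite_to_regex("C-Y-x(2)-[DG]-G-x-[ST]")
sequences = ["cydeggis", "cyedggis", "cyeeggit", "cyhgdggs", "cyrgdgnt"]
for seq in sequences:
    # The slide shows residues in lower case; patterns use upper case.
    print(seq, bool(re.fullmatch(regex, seq.upper())))
```

All five aligned sequences from the slide match the derived pattern, which is what defines it as a valid description of the aligned block.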
19
Prosite
PROSITE: PS00002
ID GLYCOSAMINOGLYCAN; RULE.
AC PS00002;
DT APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).
DE Glycosaminoglycan attachment site.
PA S-G-x-G.
RU Additional rules:
RU There must be at least two acidic amino acids (Glu or Asp) from -2 to
RU -4 relative to the serine.
CC /TAXO-RANGE=??E??;
CC /SITE=1,glycosaminoglycan;
CC /SKIP-FLAG=TRUE;
DO PDOC00002;
//
20
Prosite documentation entry
*************************************
* Glycosaminoglycan attachment site *

Proteoglycans [1] are complex glycoconjugates containing a core protein to which a variable number of glycosaminoglycan chains (such as heparin sulfate, chondroitin sulfate, etc.) are covalently attached. The glycosaminoglycans are attached to the core proteins through a xyloside residue which is in turn linked to a serine residue of the protein. A consensus sequence for the attachment site seems to exist [2]. However, it must be noted that this consensus is only based on the sequence of three proteoglycan core proteins.

-Consensus pattern: S-G-x-G [S is the attachment site]
 Additional rule: There must be at least two acidic amino acids from -2 to -4 relative to the serine.
-Last update: June 1988 / First entry.

[ 1] Hassel J.R., Kimura J.H., Hascall V.C. Annu. Rev. Biochem. 55: (1986).
[ 2] Bourdon M.A., Krusius T., Campbell S., Schwarz N.B. Proc. Natl. Acad. Sci. U.S.A. 84: (1987).
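The entry combines a pattern (S-G-x-G) with an additional rule that the pattern alone cannot express. A minimal sketch of how the two could be applied together is given below. The function name and the test sequences are hypothetical, and "from -2 to -4" is read here as the three residues at positions -4, -3 and -2 relative to the serine.

```python
import re

ACIDIC = set("DE")  # Asp and Glu


def gag_sites(seq):
    """Return 0-based serine positions that match S-G-x-G and the extra rule."""
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping matches are all reported.
    for m in re.finditer(r"(?=SG.G)", seq):
        s = m.start()
        # Residues at -4, -3 and -2 relative to the serine (fewer near the N-terminus).
        upstream = seq[max(0, s - 4):max(0, s - 1)]
        if sum(residue in ACIDIC for residue in upstream) >= 2:
            sites.append(s)
    return sites


print(gag_sites("AEEASGAGK"))  # pattern present, two acidic residues upstream
print(gag_sites("AAAASGAGK"))  # pattern present, rule not satisfied
```

The second sequence contains S-G-x-G but is rejected by the rule, which illustrates why the documentation has to be read alongside the bare pattern.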
21
Prosite: A Prosite hit is a binary piece of information (True/False).
However, some motifs are very simple and so give many false positives. Some motifs should be found together. The documentation must always be read.
22
Hidden Markov Models: Probabilistic models of interconnected states. Profile HMMs represent linear chains of match, delete or insert states. Each position in an alignment is assigned M, I or D. There is a defined probability of moving from one state to the next.
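As a rough illustration of "a defined probability of moving from one state to the next", the sketch below scores state paths through a toy two-column profile HMM. Every probability here is invented for the example and the state names simply follow the M/I/D convention; real profile HMMs, such as those built by HMMER, are estimated from a seed alignment.

```python
import math

# Hypothetical transition probabilities for a toy two-column profile HMM.
TRANSITIONS = {
    ("begin", "M1"): 0.9, ("begin", "D1"): 0.1,
    ("M1", "M2"): 0.9, ("M1", "I1"): 0.1,
    ("I1", "I1"): 0.3, ("I1", "M2"): 0.7,
    ("D1", "M2"): 1.0,
    ("M2", "end"): 1.0,
}

# Hypothetical emission probabilities; delete states emit nothing.
EMISSIONS = {
    "M1": {"C": 0.9, "A": 0.1},
    "I1": {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"},
    "M2": {"Y": 0.8, "F": 0.2},
}


def path_log_prob(path, residues):
    """Log probability of a state path together with the residues it emits."""
    log_p = 0.0
    emitted = iter(residues)
    prev = "begin"
    for state in path + ["end"]:
        log_p += math.log(TRANSITIONS[(prev, state)])
        if state in EMISSIONS:  # match and insert states emit one residue each
            log_p += math.log(EMISSIONS[state][next(emitted)])
        prev = state
    return log_p


# "CY" through both match states, and "CGY" with one inserted residue.
print(math.exp(path_log_prob(["M1", "M2"], "CY")))
print(math.exp(path_log_prob(["M1", "I1", "M2"], "CGY")))
```

The path that uses the insert state is much less probable than the straight match path, which is how a profile HMM penalises departures from the family alignment.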
23
Hidden Markov Models [diagram: profile HMM with a begin state, match states M1-M4, insert states I0-I4 and delete states D1-D4]
24
Pfam Pfam 7.0 contains a total of 3360 families.
Pfam is a database of two parts: Pfam-A is curated; Pfam-B is automatically generated. All HMMs have a seed alignment, which is added to using the HMMER package.
25
Pfam
26
Pfam
27
Pfam
28
Pfam annotation
29
Pfam Scores
E value: the expect value, same as for BLAST; the number of hits expected by chance with a score at least as good.
Noise cutoff: the HMM score below which a hit is uninteresting.
Trusted cutoff: the HMM score above which there should be no false positives.
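One way to read the two cutoffs is as three bands for a hit's score. The sketch below is only an illustration of that reading. The function name, the band labels, the handling of scores exactly on a cutoff and the example numbers are all assumptions, not Pfam's own procedure.

```python
def classify_hit(score, noise_cutoff, trusted_cutoff):
    """Place an HMM score into one of three bands defined by the two cutoffs."""
    if noise_cutoff > trusted_cutoff:
        raise ValueError("noise cutoff should not exceed trusted cutoff")
    if score >= trusted_cutoff:
        return "trusted"      # false positives are not expected here
    if score > noise_cutoff:
        return "borderline"   # between the cutoffs: inspect by hand
    return "noise"            # at or below the noise cutoff


# Hypothetical cutoffs for an imaginary family.
for score in (35.2, 22.0, 8.5):
    print(score, classify_hit(score, noise_cutoff=18.0, trusted_cutoff=25.0))
```

Hits in the middle band are the ones where an annotator's judgement matters most.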
31
Searching for Specific Domains
Signal peptides: secreted/targeted proteins. Transmembrane domains: membrane-bound proteins.
32
Interpro curation
34
TMHMM: What is a transmembrane domain?
35
TMHMM http://www.cbs.dtu.dk/services/TMHMM/
36
SIGNALP: What is a signal peptide?
Any protein that has to be targeted to a specific part of the cell requires a signal peptide. The signal peptide ensures that the protein is translated at the ER, where it can enter the secretory pathway. I.e. the signal peptide suggests a cellular (or extracellular) location other than the cytoplasm.
37
SIGNALP
39
Using secondary databases for functional assignments
Better, more detailed, professional annotation. More powerful and sensitive search methods: HMMs/profiles/weight matrices. Not as good coverage.
40
The Gene Prediction Process
BLAST FASTA SignalP DNA SEQUENCE Functional Assignments ANALYSIS SOFTWARE TMHMM Pfam Prosite Annotator