Published by Erin Crawford. Modified over 9 years ago.
1
Bioinformatic Analysis of Protein Families
Daniil G. Naumoff
Laboratory of Bioinformatics, State Institute for Genetics and Selection of Industrial Microorganisms (Gos NII Genetika), Moscow, Russia
2
The International Nucleotide Sequence Database Collaboration (INSDC)
- GenBank at NCBI: http://www.ncbi.nlm.nih.gov/Genbank/
- EMBL Nucleotide Sequence Database: http://www.ebi.ac.uk/embl/
- DNA Data Bank of Japan (DDBJ): http://www.ddbj.nig.ac.jp/
Corresponding protein databases: GenPept, UniProtKB/TrEMBL, and DDBJ
Curated protein database Swiss-Prot: http://au.expasy.org/sprot/
Three-dimensional (3D) structures of proteins
- PDB: http://www.pdb.org/pdb/home/home.do (database)
- SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/ (classification)
4
http://www.ebi.ac.uk/embl/Services/DBStats/ http://www.genomesonline.org/gold_statistics.htm
5
http://www.pdb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100
7
Search for homologues
8
BLOSUM-62 matrix http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html
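As a rough illustration of where an entry of a BLOSUM-type matrix comes from, the sketch below computes a log-odds score in half-bit units, 2·log2(observed / expected), and rounds it to an integer. The function name and the frequencies in the usage note are illustrative assumptions, not values taken from the slide or from the NCBI page.

```python
import math


def log_odds_half_bits(pair_freq, freq_a, freq_b):
    """Return a rounded log-odds substitution score in half-bit units.

    pair_freq      -- observed frequency of residue a aligned with residue b
                      (treated here as an ordered pair; an assumption of this sketch)
    freq_a, freq_b -- background frequencies of the two residues
    """
    # Frequency expected if the two residues were paired purely by chance.
    expected = freq_a * freq_b
    # Positive score: the pair is seen more often than chance; negative: less often.
    return round(2 * math.log2(pair_freq / expected))
```

For example, `log_odds_half_bits(0.0215, 0.074, 0.074)` describes a residue that pairs with itself about four times more often than chance would predict, which is why identical pairs receive positive scores.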
11
Overprediction is annotation of sequences at a greater level of functional specificity than available evidence supports.
24
A Protein Family Analysis (http://zbio.net/bio/001/003.html)
- Select a protein
- Determine the domain structure of the selected protein
- Select a domain to be analyzed
- Has the protein domain family been annotated in a database?
- Updating of the family list or searching for homologous domains
- Check each "atypical" sequence (it will probably be edited or removed)
- Preliminary division into subfamilies
- Multiple sequence alignment (consensus?)
- Phylogenetic analysis
- Phylogenetic tree visualization
- Subfamily structure
- Interfamily relationship (superfamilies, clans, etc.)
- 2D and 3D analysis (prediction)
27
ADDA - Automatic Domain Decomposition Algorithm http://ekhidna.biocenter.helsinki.fi/sqgraph/pairsdb/form_browse 33,879 domain families (79,965 if redundant sequences were used) according to Heger A, Holm L. Exhaustive enumeration of protein domain families. J Mol Biol. 2003, 328(3):749-767.
35
Let’s use this protein as a query sequence for BLAST
36
BLAST results (Descriptions) E-value < 0.01 or 0.001
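The E-value cutoff above can be applied programmatically. The following is a minimal sketch that assumes the BLAST results were saved in the standard 12-column tabular format (subject ID in column 2, E-value in column 11); the function name and the sequence identifiers in the usage note are hypothetical.

```python
def filter_hits(tabular_text, max_evalue=0.001):
    """Return (subject_id, evalue) pairs whose E-value is below max_evalue.

    tabular_text is assumed to hold tab-separated BLAST output with the
    standard 12 columns; comment lines starting with '#' are ignored.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit line; skip rather than guess
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((subject_id, evalue))
    return hits
```

Calling `filter_hits(text, max_evalue=0.01)` gives the more permissive of the two thresholds; hits that pass only the looser cutoff are natural candidates for a closer manual look.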
37
BLAST results (Graphic overview)
Domain I, Domain II, Domain III
38
Domain structure of proteins of the GH27 family, according to Naumoff D.G. Phylogenetic analysis of α-galactosidases of the GH27 family. Molecular Biology (Engl Transl), 2004, 38(3):388-399. PDF: http://bioinform.genetika.ru/members/Naumoff/MB2004E.pdf
Domain architectures shown in the figure (N- to C-terminus):
- GH27N-GH27C
- GH27N-GH27C-CBM13
- GH27N-GH27C-CBM6
- GH27N-GH27C-CBM6-CBM13
- GH27N-CBM13-GH27C
- NEW1-GH27N-CBM13-GH27C
- NEW1-GH27N-GH27C
- NEW2-NEW1-GH27N-GH27C
- GH27N-GH27C-NEW3-NEW2
- GH27N-GH27C-NEW3
- GH27N-GH27C-Dockerin
- GH27N-GH27C-CBM1-CE1
Legend:
- GH27N: N-terminal domain of GH27 family
- GH27C: C-terminal domain of GH27 family
- CE1: CE1 domain of carbohydrate esterases
- CBM1, CBM6, CBM13: carbohydrate-binding modules CBM1, CBM6, and CBM13
- Dockerin: Dockerin I domain
- NEW1, NEW2, NEW3: uncharacterized domains (one of them is labelled NPCBM)
39
Universal Protein Domain Databases
Database | Address | Date | Number of families
ADDA | http://ekhidna.biocenter.helsinki.fi/sqgraph/pairsdb | - | 15333
InterPro 24.0 | http://www.ebi.ac.uk/interpro/ | 16 December 2009 | 11082
PUMA2 | http://compbio.mcs.anl.gov/puma2/cgi-bin/index.cgi | - | 8575
Pfam 24.0 | http://pfam.janelia.org/ | October 2009 | 11912
KOG | http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi | - | 4852
COG | http://www.ncbi.nlm.nih.gov/COG/grace/uni.html | - | 4872
SCOP 1.75 | http://scop.mrc-lmb.cam.ac.uk/scop/ | June 2009 | 3902
CATH 3.3 | http://www.cathdb.info/ | 4 Jan 2010 | 10019
HOMSTRAD | http://www-cryst.bioc.cam.ac.uk/homstrad/ | 4 Jan 2010 | 1032
40
Databases of individual protein families (http://www.oxfordjournals.org/nar/database/subcat/3/10)
41
Sequence-Based Classification of the Carbohydrate-Active Enzymes at the CAZy server (www.cazy.org/)
- Glycoside Hydrolases (including transglycosidases) => 118 GH families (14 clans)
- Glycosyltransferases => 92 GT families
- Polysaccharide Lyases => 21 PL families
- Carbohydrate Esterases => 16 CE families
- Carbohydrate-Binding Modules => 59 CBM families
42
Family GH72 of Glycoside Hydrolases (http://www.cazy.org/GH72.html)
43
Multiple Sequence Alignment:
– Automatic (ClustalW or ClustalX)
  >50% of sequence identity
  only one domain
  no protein fragments
– Manual (BioEdit) (take into account BLAST pairwise sequence alignment!)
  <30% of sequence identity
  long insertions / deletions
  facultative N-terminal part
Local dissimilarities of very similar sequences:
– Local frameshift
– Exon-intron structure
– Stop codon
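The identity thresholds above can be checked on a pairwise alignment before choosing a strategy. The sketch below assumes that identity is counted over columns where neither sequence has a gap (other definitions exist); both function names are hypothetical, and the "intermediate" label for the 30-50% range is an assumption, since the slide gives no rule for it.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences over gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggest_alignment_mode(identity):
    """Map percent identity onto the automatic/manual rule of thumb."""
    if identity > 50.0:
        return "automatic"
    if identity < 30.0:
        return "manual"
    return "intermediate"  # assumption: not covered by the slide
```

Identity alone does not settle the choice: multidomain proteins, fragments, and long insertions or deletions still call for manual inspection even when identity is high.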
44
BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html)
45
Phylip (http://evolution.gs.washington.edu/phylip.html)
- Maximum Parsimony (ProtPars)
- Distance program (Neighbor-Joining)
46
An infile for the Phylip package programs
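An infile of this kind can be produced from an existing alignment. The sketch below writes the classic sequential PHYLIP layout: a header line with the number of sequences and the alignment length, then one line per sequence with the name padded or truncated to 10 characters. The function name and the example sequences in the usage note are hypothetical; the interleaved layout, which the package also accepts, is not covered.

```python
def to_phylip(alignment):
    """Format an alignment (dict: name -> aligned sequence) as sequential PHYLIP text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Classic PHYLIP names occupy exactly 10 characters.
    short_names = [name[:10] for name in alignment]
    if len(set(short_names)) != len(short_names):
        raise ValueError("names are not unique after truncation to 10 characters")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for short, seq in zip(short_names, alignment.values()):
        lines.append(f"{short:<10}{seq}")
    return "\n".join(lines) + "\n"
```

For instance, `to_phylip({"97A1_BACTH": "MKLAVT", "97B1_BACTH": "MKL-VS"})` yields text that can be saved under the name `infile` in the directory from which the Phylip program is started.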
47
Maximum Parsimony (protpars.exe) from the Phylip package
48
Phylogenetic tree visualization: TreeView program (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
- Slanted cladogram
- Radial
- Rectangular cladogram
- Phylogram
49
Subfamily criteria (for glycosidases)
1. Pairwise sequence similarity (>30% of identity)
2. Order of sequence appearance during BLAST search (members of the same subfamily always appear at the top of BLAST results)
3. Monophyletic status
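The first criterion can be turned into a preliminary grouping. The sketch below links any two sequences whose pairwise identity exceeds 30% and reports the connected groups; this single-linkage approach is an assumption made for illustration, not the procedure used in the presentation, and it ignores criteria 2 and 3 (BLAST order and monophyly), which still have to be checked separately. All names are hypothetical.

```python
def group_by_identity(names, identity, threshold=30.0):
    """Group sequences connected by pairwise identity above the threshold.

    names    -- list of sequence names
    identity -- dict mapping (name_a, name_b) to percent identity
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pid in identity.items():
        if pid > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

Because linkage is transitive, two sequences below the threshold can still end up in one group through an intermediate, so the result is only a starting point for the phylogenetic analysis.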
50
The maximum parsimony phylogenetic tree of family GH97
[Figure: tree with numerical support values at the nodes; clades are labelled Subfamily 97a, Subfamily 97b, Subfamily 97c, Subfamily 97d, and Subfamily 97e; α-glucosidase activity [EC 3.2.1.20] is marked.]
51
The neighbor-joining phylogenetic tree of family GH97
[Figure: tree with numerical support values at the nodes; clades are labelled Subfamily 97a, Subfamily 97b, Subfamily 97c, Subfamily 97d, and Subfamily 97e; [EC 3.2.1.20] is marked.]
52
The neighbor-joining phylogenetic tree of the α-galactosidase superfamily
54
Clans of Glycoside Hydrolases
Clan | Families (GH) | Optical Configuration | Tertiary Structure
GH-A | 1, 2, 5, 10, 17, 26, 30, 35, 39, 42, 50, 51, 53, 59, 72, 79, 86, 113 | retention (equatorial orientation) | (β/α)8-barrel
GH-B | 7, 16 | retention (equatorial orientation) | β-jelly roll
GH-C | 11, 12 | retention (equatorial orientation) | β-jelly roll
GH-D | 27, 31, 36 | retention (axial orientation) | (β/α)8-barrel
GH-E | 33, 34, 83, 93 | retention (equatorial orientation) | 6-fold β-propeller
GH-F | 43, 62 | inversion (equatorial orientation) | 5-fold β-propeller
GH-G | 37, 63 | inversion (axial orientation) | (α/α)6
GH-H | 13, 70, 77 | retention (axial orientation) | (β/α)8-barrel
GH-I | 24, 46, 80 | inversion (equatorial orientation) | α+β
GH-J | 32, 68 | retention (β-furanoside) | 5-fold β-propeller
GH-K | 18, 20, 85 | retention (equatorial orientation) | (β/α)8-barrel
GH-L | 15, 65 | inversion (axial orientation) | (α/α)6
GH-M | 8, 48 | inversion (equatorial orientation) | (α/α)6
GH-N | 28, 49 | inversion (axial orientation) | (β)3-solenoid
55
Rigden DJ. Iterative database searches demonstrate that glycoside hydrolase families 27, 31, 36, and 66 share a common evolutionary origin with family 13. FEBS Lett. 2002, 523(1-3):17-22.
Clans: GH-D, GH-H
56
Nagano N, Porter CT, Thornton JM. The (β/α)8 glycosidases: sequence and structure analyses suggest distant evolutionary relationships. Protein Eng. 2001, 14(11):845-855.
Clans: GH-H, GH-A, GH-K?
57
Screenshot of PSI Protein Classifier
D.G. Naumoff and M. Carreras. PSI Protein Classifier: a new program automating PSI-BLAST search results. Molecular Biology (Engl Transl), 2009, 43(4):652-664.
58
A hierarchical classification of the (β/α)8-type glycosyl hydrolases
59
A hierarchical structure of the β-fructosidase (furanosidase) superfamily
[Figure: the furanosidase superfamily comprises clan GH-J (families GH32 and GH68), clan GH-F (families GH43 and GH62), and GHLP; subfamilies shown are GH32a, GH32b, GH32c, GH32d, GH68a, GH68b, GH43a, GH43b, GH43c, GH43d, GH43e, GH43f, and GH43g.]
60
The Secondary Structure Prediction
– 3D-PSSM (http://www.sbg.bio.ic.ac.uk/~3dpssm/)
– GOR IV (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html)
– nnpredict (http://www.cmpharm.ucsf.edu/~nomi/nnpredict-instrucs.html)
– PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html)
– Hydrophobic cluster analysis (HCA)
The Tertiary Structure Prediction
– The SWISS-MODEL modeling server (http://swissmodel.expasy.org/)
61
Phylogenetic Analysis of a Protein Family
– The first stage of a study
  Prediction of the 3D structure and domain structure of the protein
  Prediction of the active center and of residues for site-directed mutagenesis
  Prediction of the enzymatic activities
– The only part of a study (bioinformatics)
– The final stage of a study (interpretation of the experimental results)
Comparing the phylogenetic trees of each domain of a given protein makes it possible to reveal the evolutionary history of the protein, viz. the role of gene duplication, loss, fusion, and horizontal transfer.