Published by Wesley Allison. Modified over 9 years ago.
1
Exploiting Structural and Comparative Genomics to Reveal Protein Functions
Predicting domain structure families and their domain contexts
Exploring how structural divergence in domain families correlates with functional change
Predicting domain relatives likely to have significantly different structures and functions
CATH: domain families of known structure
Gene3D: protein families and domain annotations for completed genomes
2
Thanks to Amos, Rolf and the Swiss-Prot Team!!!! Congratulations Swiss-Prot - 20 Years!!
3
CATH hierarchy: Class (3), Architecture (36), Topology or Fold (1100), Homologous superfamily (2100). 86,000 domains. Orengo and Thornton (1994).
4
Gene3D: Domain annotations in genome sequences. More than 2 million protein sequences from 300 completed genomes and UniProt are scanned against a library of HMMs (~2100 CATH, ~8300 Pfam) to assign domains to CATH and Pfam superfamilies. Benchmarking by structural data shows that 76% of remote homologues can be identified using the HMMs.
5
Gene3D: Domain annotations in genome sequences. DomainFinder: structural domains from CATH take precedence. [Figure: a UniProt sequence (N to C terminus) with assigned domains labelled CATH-1, Pfam-1, Pfam-2 and NewFam.]
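The precedence rule on this slide can be illustrated with a minimal Python sketch. It assumes a simple greedy scheme in which any hit overlapping an already accepted CATH hit is discarded; the `Hit` type, the `resolve` function and all coordinates are hypothetical, and this is not the actual DomainFinder algorithm, whose details the slide does not give.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    source: str  # "CATH" or "Pfam" (labels assumed for this sketch)
    family: str
    start: int   # 1-based, inclusive residue position
    end: int     # inclusive residue position


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def resolve(hits: List[Hit]) -> List[Hit]:
    """Greedy resolution: consider CATH hits first, then the rest by start
    position, keeping a hit only if it overlaps nothing kept so far."""
    ordered = sorted(hits, key=lambda h: (h.source != "CATH", h.start))
    kept: List[Hit] = []
    for hit in ordered:
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


# Hypothetical hits on one sequence: Pfam-1 overlaps CATH-1 and is dropped.
hits = [
    Hit("Pfam", "Pfam-1", 10, 120),
    Hit("CATH", "CATH-1", 100, 200),
    Hit("Pfam", "Pfam-2", 210, 300),
]
print([h.family for h in resolve(hits)])  # ['CATH-1', 'Pfam-2']
```

A greedy pass is the simplest way to express "CATH takes precedence"; a production tool would likely tolerate small overlaps and score competing hits rather than dropping them outright.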
6
Domain families ranked by size (number of domain sequences). >90% of domain sequences in UniProt can be assigned to ~7000 domain families. [Plot: percentage of all domain family sequences in UniProt against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure and NewFam of unknown structure (>50,000 families).]
7
Domain families ranked by size (number of domain sequences). The 100 largest families of known structure account for 30% of domain sequences in UniProt. [Plot: percentage of all domain family sequences in UniProt against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure and NewFam of unknown structure (>50,000 families).]
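The rank-by-size coverage figures quoted on these two slides are cumulative percentages. A minimal Python sketch of that arithmetic follows; the function names and the toy family sizes are hypothetical and are not taken from Gene3D.

```python
def cumulative_coverage(family_sizes):
    """Cumulative percentage of all domain sequences covered as families
    are added in order of decreasing size."""
    sizes = sorted(family_sizes, reverse=True)
    total = sum(sizes)
    coverage, running = [], 0
    for size in sizes:
        running += size
        coverage.append(100.0 * running / total)
    return coverage


def families_needed(family_sizes, target_percent):
    """Smallest number of top-ranked families whose combined size reaches
    target_percent of all sequences, or None if it is never reached."""
    for rank, pct in enumerate(cumulative_coverage(family_sizes), start=1):
        if pct >= target_percent:
            return rank
    return None


# Toy data: five families holding 100 domain sequences in total.
toy_sizes = [5, 50, 10, 30, 5]
print(cumulative_coverage(toy_sizes))  # [50.0, 80.0, 90.0, 95.0, 100.0]
print(families_needed(toy_sizes, 90))  # 3
```

Applied to real family sizes, `families_needed(sizes, 90)` would give the kind of number the slide reports as ~7000.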
8
Correlation of sequence and structural variability of CATH families with the number of different functional groups. [Plot axes: population in genomes; structural diversity.]
9
Exploiting Structural and Comparative Genomics to Reveal Protein Functions
Predicting domain structure families and their domain contexts
Exploring how structural divergence in domain families correlates with functional change
Predicting domain relatives likely to have significantly different structures and functions
CATH: domain families of known structure
Gene3D: protein families and domain annotations for completed genomes
10
Some superfamilies show great structural diversity. Multiple structural alignment by CORA allows identification of consensus secondary structures and secondary structure embellishments (2DSEC algorithm). In 117 superfamilies, relatives expanded by 2-fold or more. Gabrielle Reeves, J. Mol. Biol. (2006).
11
Structural embellishments can modify the active site. Example: galectin binding superfamily.
12
Structural embellishments can modulate domain interactions. Examples: glucose 6-phosphate dehydrogenase (side orientation, face orientation) and dihydrodipicolinate reductase. Additional secondary structures shown at (a) are involved in subunit interactions.
13
Structural embellishments can modify function by modifying active site geometry and mediating new domain and subunit interactions. ATP Grasp superfamily: biotin carboxylase, D-alanine-D-alanine ligase, dimer of biotin carboxylase.
14
Secondary structure insertions are distributed along the chain but aggregate in 3D
17
For ~70% of domains analysed, 80% of the secondary structure embellishments are co-located in 3D with 3 or more other embellishments. In 80% of domains, 1 or more embellishments contact other domains or subunits. 85% of insertions comprise only 1 or 2 secondary structures. [Histogram: frequency (%) against size of insertion (number of secondary structures, 1 to 12); indel frequencies below 1% are labelled 0.85%, 0.38%, 0.23%, 0.11%, 0.06% and 0.02%.]
18
Many structurally diverse superfamilies adopt folds with these regular layered architectures: 2 Layer Beta Sandwich, 3 Layer Alpha/Beta Sandwich, 2 Layer Alpha/Beta, Alpha/Beta Barrel.
20
Exploiting Structural and Comparative Genomics to Reveal Protein Functions
Predicting domain structure families and their domain contexts
Exploring how structural divergence in domain families correlates with functional change
Predicting domain relatives likely to have significantly different structures and functions
CATH: domain families of known structure
Gene3D: protein families and domain annotations for completed genomes
21
GEMMA (GEne Model and Model Annotation): algorithm for predicting sequence homologues with similar structures and functions. Within a structural superfamily, each subfamily of close sequence relatives (>=60% sequence identity) is predicted to have similar functions. The largest 100 CATH families have more than 20,000 subfamilies.
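Grouping relatives into subfamilies at a sequence identity cutoff can be sketched as follows. This is a minimal Python illustration that assumes pre-aligned sequences and single-linkage clustering; the slide does not say which clustering method GEMMA uses, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster_subfamilies(aligned, threshold=60.0):
    """Single-linkage clustering: two sequences share a subfamily if a chain
    of pairwise identities >= threshold connects them (union-find)."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Walk to the root, compressing the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy aligned sequences: s1 and s2 are 90% identical, s3 is below the cutoff.
toy = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKV", "s3": "WWWWWGHIKV"}
print(cluster_subfamilies(toy))  # [['s1', 's2'], ['s3']]
```

The all-against-all comparison is quadratic in the number of sequences, so it only suits small examples; superfamilies with thousands of relatives would need a faster clustering tool.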
22
GEMMA: Predicting Functional Groups in CATH Superfamilies. Within a structural superfamily, build multiple sequence alignments for each subfamily of close relatives predicted to have similar function (>60% identity).
23
GEMMA: Predicting Functional Groups in CATH Superfamilies. Within a structural superfamily, cluster subfamilies predicted to have similar functions into functional groups. Each subfamily comprises close relatives predicted to have similar function (>60% identity).
24
ATP Grasp family: 192 subfamilies. Subfamilies compared: pyruvate phosphate dikinase (subfamily 1), pyruvate phosphate dikinase (subfamily 15) and succinyl-CoA synthetase (subfamily 22). Pairwise scores shown: SSAP score = 68.69, PSS score = 0.375; SSAP score = 93.01, PSS score = 0.827; SSAP score = 68.32, PSS score = 0.333.
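Merging subfamilies into functional groups from pairwise similarity scores can be sketched as a greedy average-linkage clustering. The Python below is an illustration only: the function name, the 0.5 cutoff and the assignment of each PSS score to a particular pair of subfamilies are assumptions (the slide layout does not make the pairing explicit), and the slides do not describe GEMMA's actual merging procedure.

```python
def merge_functional_groups(scores, threshold):
    """Greedy average-linkage merging.

    scores maps frozenset({a, b}) to a similarity score; missing pairs
    count as 0.0. The best-scoring pair of groups is merged repeatedly
    until no pair reaches the threshold."""
    items = sorted({name for pair in scores for name in pair})
    groups = [[name] for name in items]

    def group_score(g1, g2):
        values = [scores.get(frozenset((a, b)), 0.0) for a in g1 for b in g2]
        return sum(values) / len(values)

    while len(groups) > 1:
        score, i, j = max(
            (group_score(g1, g2), i, j)
            for i, g1 in enumerate(groups)
            for j, g2 in enumerate(groups)
            if i < j
        )
        if score < threshold:
            break
        groups[i] = sorted(groups[i] + groups[j])
        del groups[j]
    return groups


# ASSUMED pairing of the slide's PSS scores to subfamily pairs.
pss = {
    frozenset(("subfamily 1", "subfamily 15")): 0.827,
    frozenset(("subfamily 1", "subfamily 22")): 0.375,
    frozenset(("subfamily 15", "subfamily 22")): 0.333,
}
# 0.5 is a hypothetical cutoff chosen only for this example.
print(merge_functional_groups(pss, threshold=0.5))
# [['subfamily 1', 'subfamily 15'], ['subfamily 22']]
```

Under these assumptions the two pyruvate phosphate dikinase subfamilies merge into one functional group while succinyl-CoA synthetase stays separate.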
25
Pyruvate phosphate dikinase subfamilies: equivalent functions. Subfamily profiles coloured by residue conservation (red = high, blue = low), scored with Scorecons (Valdar and Thornton, Profunc). Profiles aligned using profile-profile comparison (MAFFT). Many fully conserved positions: 6/7 positions are fully conserved.
26
Succinyl-CoA synthetase and pyruvate phosphate dikinase: different functions. Subfamily profiles coloured by residue conservation (red = high, blue = low), scored with Scorecons (Valdar and Thornton, Profunc). Profiles aligned using profile-profile comparison (MAFFT). Figure labels: "Fully conserved positions" and "No fully conserved positions".
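Counting fully conserved alignment positions, as in the two comparisons above, can be sketched in a few lines of Python. The per-column score here is a normalised Shannon entropy used purely as a stand-in; it is not the Scorecons method, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Conservation in [0, 1] for one alignment column, ignoring gaps:
    1.0 means a single residue type; lower means more diverse.
    Assumes a 20-letter amino acid alphabet for normalisation."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)


def fully_conserved_positions(alignment):
    """Zero-based indices of columns with no gaps and one residue type."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


# Toy alignment of three sequences; column 2 varies and contains a gap.
toy_alignment = ["ACDG", "ACEG", "AC-G"]
print(fully_conserved_positions(toy_alignment))  # [0, 1, 3]
```

Comparing the fully conserved positions of two aligned subfamily profiles gives a fraction like the "6/7" quoted above; how many shared positions indicate equivalent function is a judgment the slides leave to the benchmark.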
27
Performance in Merging Subfamilies into Functional Groups. 10 enzyme functions have been experimentally identified in this family. [Plot axes: number of functional groups predicted; error rate.]
28
GEMMA: Predicting Functional Groups in CATH Superfamilies. Within a structural superfamily, subfamilies of close relatives predicted to have similar function (>60% identity) are merged into functional groups. Benchmarked on 12 large enzyme families in CATH: 6-10 fold reduction in the number of functional subfamilies.
29
Summary
More than half the domains in UniProt can be assigned to families of known structure.
Analysis of some very large structural families revealed how secondary structure insertions can modulate functions.
Functional groups can be identified in diverse families by comparing multiple features (e.g. residue conservation, predicted secondary structure).
30
CATH and Gene3D: Lesley Greene, Ian Sillitoe, Tony Lewis, Ollie Redfern, Alison Cuff, Tim Dallman, Mark Dibley, Sarah Addou, Stathis Sidderis, Russell Marsden, Dave Lee, Juan Ranea, Ilhem Diboun, Adam Reid, Corin Yeats
MRC, Wellcome Trust, NIH, EU (Biosapiens, Embrace, Enfin), BBSRC
http://www.biochem.ucl.ac.uk/bsm/cath_new