Pfam: a resource for remote homology domain identification (et al. NAR 2014)
Building families (EMBO Workshop, Cape Town, 2014): Identify target -> Build SEED MSA of representative members -> Build profile-HMM -> Search UniProtKB -> Significance thresholds -> QCs and fix (or Abandon) -> Annotate
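The build and search steps of this workflow can be sketched with the standard HMMER3 command-line tools. This is a minimal sketch, not the Pfam curation pipeline itself: the function name, the output file names and the single bit-score threshold are assumptions made for illustration, and the commands are only assembled, not executed.

```python
from pathlib import Path


def family_build_commands(seed_msa, seq_db, bit_threshold, workdir="."):
    """Return HMMER3 command lines for one build-and-search round.

    Nothing is executed here. Each returned list can be passed to
    subprocess.run() once HMMER3 is installed and the inputs exist.
    """
    workdir = Path(workdir)
    hmm = workdir / "SEED.hmm"          # hypothetical output name
    hits = workdir / "hits.domtblout"   # hypothetical output name

    # Build a profile-HMM from the SEED alignment.
    build = ["hmmbuild", str(hmm), str(seed_msa)]

    # Search the sequence database, reporting only sequences (-T) and
    # domains (--domT) that score at or above the chosen bit threshold.
    search = [
        "hmmsearch",
        "--domtblout", str(hits),
        "-T", str(bit_threshold),
        "--domT", str(bit_threshold),
        str(hmm), str(seq_db),
    ]
    return [build, search]


if __name__ == "__main__":
    for cmd in family_build_commands("SEED.sto", "uniprot.fasta", 25.0):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact commands before a long database search is started.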
QC: family overlaps (Old Family vs New Family)
Example sequence: SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMI SN
A - Old and New family are evolutionarily related (evidence: nature of the overlaps, profile-profile comparison, functional residues, functional annotation, structure)
Solution 1: Merge
Solution 2: Create a clan or add to an existing clan
B - Old and New family are NOT evolutionarily related -> the overlaps might then be false positives
Solution 1: Separate (expunge sequences from SEED, trim ends, raise threshold)
Solution 2: Manually edit (no change to the family, but the sequence is removed)
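The overlap check itself can be illustrated with a short sketch. It assumes each family's matches are available as (sequence ID, start, end) tuples with 1-based inclusive coordinates; the function name and this data layout are hypothetical and do not describe Pfam's internal QC scripts.

```python
from collections import defaultdict


def find_overlaps(old_hits, new_hits):
    """Report regions where a new-family match overlaps an old-family match.

    Each hit is (sequence_id, start, end), 1-based and inclusive.
    Returns tuples of (sequence_id, old_region, new_region, shared_residues).
    """
    old_by_seq = defaultdict(list)
    for seq_id, start, end in old_hits:
        old_by_seq[seq_id].append((start, end))

    overlaps = []
    for seq_id, start, end in new_hits:
        for o_start, o_end in old_by_seq.get(seq_id, []):
            # Number of residues covered by both matches.
            shared = min(end, o_end) - max(start, o_start) + 1
            if shared > 0:
                overlaps.append((seq_id, (o_start, o_end), (start, end), shared))
    return overlaps


if __name__ == "__main__":
    old = [("P1", 10, 100), ("P2", 5, 50)]
    new = [("P1", 90, 150), ("P2", 60, 120), ("P3", 1, 20)]
    for row in find_overlaps(old, new):
        print(row)
```

Each reported overlap would then be resolved by one of the options above: merge, build a clan, separate the families, or edit manually.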
False positive detection: overlaps; hit scores vs taxonomic distribution; known annotation (e.g. functional/structural residues); known structures; …
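One way to combine score and taxonomic distribution is to flag hits that sit just above the threshold and come from a different taxon than most other hits. The sketch below is only a heuristic under assumed inputs: the dictionary keys, the `margin` parameter and the majority-taxon rule are illustrative choices, not a Pfam procedure.

```python
from collections import Counter


def flag_suspect_hits(hits, threshold, margin=5.0):
    """Return IDs of hits that deserve a manual look as possible false positives.

    hits: list of dicts with keys 'id', 'score' (bits) and 'taxon'
    (for example a kingdom name). A hit is flagged when it passes the
    threshold by less than `margin` bits AND its taxon differs from the
    taxon of the majority of passing hits.
    """
    kept = [h for h in hits if h["score"] >= threshold]
    if not kept:
        return []
    majority_taxon, _ = Counter(h["taxon"] for h in kept).most_common(1)[0]
    return [
        h["id"]
        for h in kept
        if h["taxon"] != majority_taxon and h["score"] < threshold + margin
    ]


if __name__ == "__main__":
    example = [
        {"id": "a", "score": 80, "taxon": "Bacteria"},
        {"id": "b", "score": 60, "taxon": "Bacteria"},
        {"id": "c", "score": 26, "taxon": "Eukaryota"},
        {"id": "d", "score": 90, "taxon": "Eukaryota"},
        {"id": "e", "score": 70, "taxon": "Bacteria"},
    ]
    print(flag_suspect_hits(example, threshold=25))
```

A flagged hit is not necessarily wrong. It is a candidate to check against known annotation and known structures, as listed above.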
Are all Pfam families structural domains?
Pfam families with/without PDB structure: PDB (43%), No PDB (57%)
Pfam types: Family, Domain, Repeat, Motif
Domain and repeats: A - Domain; B - Metal stabilised domain; C - 7 repeats form domain; D - 9 repeats form domain, could be unlimited number
Motifs. Example: Lipoprotein attachment site, LPAM_1 (alignment coloured by residue type)
Pfam types: Family, Domain, Repeat, Disordered Family?
PDBid: 2JGC
The Pfam website
Pfam families’ interactions: iPfam (Finn et al. NAR 2013) http://
Some caveats (TUM, January 2013): identifying repeats is challenging, especially with HMMER3 (-> local alignment); functional diversity within families and clans; Domains of Unknown Function; family boundaries if no structure is available
Comparison of Enolase clan/superfamily in Pfam and SFLD (SFLD: Akiva et al. NAR 2013). Picture courtesy of Patsy Babbitt (UCSF)
How far from covering the sequence space: H. sapiens (from the Pfam blog)
Building a Pfam family
Pick a target region (2KX7): 1. Open Chimera. 2. File -> Open “2KX7.pdb”
Pick a target region (2KX7 model 1): Select “2KX7.pdb (#0.1) chain A”; Actions -> Ribbon -> hide; Actions -> Ribbon -> show
2KX7 (Schmöe et al. Structure 2011): Rcs signaling system, a bacterial two-component system (sensor kinase + response regulator)
Pick a target region: look up UniProtKB ID P39838 on the Pfam website (
Pick a target region (2KX7; Schmöe et al. Structure 2011). Diagram labels: HK, S, S, ABL, HPt
Pick a target region
Look for homologs: HMMER website (Finn et al. NAR 2011); click Start
Look for homologs: choose “Marco-Data/Other/2KX7.fasta”
Select your dataset: select rp75 in Sequence Database
Parse hits
Parse hits: Click
Check conservation and coverage
Check low scores: scroll down
Check taxonomic distribution: click Taxonomy
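Outside the website view, the same check can be approximated by tallying the lineage of each hit. This is a minimal sketch assuming each hit's lineage is already available as a list of taxon names from most general to most specific. The function name and input layout are hypothetical.

```python
from collections import Counter


def taxonomic_distribution(lineages, rank=0):
    """Return the fraction of hits per taxon at the given lineage depth.

    lineages: list of lists such as ["Bacteria", "Proteobacteria", ...].
    rank: index into each lineage (0 = most general level).
    Lineages shorter than `rank + 1` are ignored.
    """
    counts = Counter(lin[rank] for lin in lineages if len(lin) > rank)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    example = [
        ["Bacteria", "Proteobacteria"],
        ["Bacteria", "Firmicutes"],
        ["Bacteria", "Proteobacteria"],
        ["Eukaryota", "Metazoa"],
    ]
    print(taxonomic_distribution(example))
    print(taxonomic_distribution(example, rank=1))
```

A family expected to be bacterial that shows a small, low-scoring fraction of hits elsewhere is a prompt to inspect those hits individually.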
Check domain architectures/overlaps: click Domain
Download aligned hits: 1. Click on Download and then on Aligned FASTA. 2. Save as “RcsD-ABL-hmmer-ali.fasta”
Manipulate alignment: 1. Open Jalview. 2. File -> Input Alignment -> From File “RcsD-ABL-hmmer-ali.fasta”
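The kind of manipulation done by hand in Jalview, such as trimming poorly covered alignment ends, can also be sketched in code. The example below is an illustration under assumptions: it treats `-` and `.` as gap characters, uses a hypothetical 50% coverage cut-off, and is not a substitute for inspecting the alignment visually.

```python
GAPS = set("-.")


def read_aligned_fasta(text):
    """Parse aligned FASTA text into {header: aligned_sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def column_coverage(records):
    """Fraction of sequences with a residue (not a gap) in each column."""
    seqs = list(records.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not all the same aligned length")
    return [sum(s[i] not in GAPS for s in seqs) / len(seqs) for i in range(length)]


def trim_ends(records, min_coverage=0.5):
    """Drop leading and trailing columns whose coverage is below the cut-off."""
    cov = column_coverage(records)
    keep = [i for i, c in enumerate(cov) if c >= min_coverage]
    if not keep:
        return {name: "" for name in records}
    first, last = keep[0], keep[-1]
    return {name: seq[first:last + 1] for name, seq in records.items()}


if __name__ == "__main__":
    demo = ">s1\n--ACDE-\n>s2\n-GACDEF\n>s3\n--AC-E-\n"
    for name, seq in trim_ends(read_aligned_fasta(demo)).items():
        print(name, seq)
```

To apply it to the downloaded file, read “RcsD-ABL-hmmer-ali.fasta” into a string and pass it to `read_aligned_fasta`. Only the ends are trimmed, so internal gap columns are left for manual review.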