Outline: Homology; Profile-HMMs; Domains; Protein-family Databases; How to build a new (Pfam) protein family; Function annotation transfer; Pfam database
EMBO Workshop, Cape Town, 2014
Homology
Definition: Two proteins are homologous if they share a common ancestor, i.e. they are evolutionarily related
Symmetric: if A is homologous to B, then B is homologous to A
Transitive: if A is homologous to B AND B is homologous to C, then A is homologous to C
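Because homology is symmetric and transitive, pairwise homology calls can be merged into groups of mutually homologous proteins. The following is a minimal sketch of that idea using a union-find structure; the function name `group_homologs` and the toy protein labels are illustrative assumptions, not part of the workshop material.

```python
def group_homologs(pairs):
    """Group proteins into sets of mutual homologs from pairwise calls.

    `pairs` is an iterable of (protein_a, protein_b) tuples, each meaning
    "a is homologous to b". Symmetry and transitivity are applied implicitly.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for protein in parent:
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in groups.values())


# Toy input: A~B and B~C imply A~C; D~E forms a separate group.
print(group_homologs([("A", "B"), ("B", "C"), ("D", "E")]))
```

In practice the pairwise calls themselves carry uncertainty, so a single false positive can merge two unrelated groups; this sketch assumes every input pair is a true homology call.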
Detecting homology
Sequence similarity
Human:   1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60
           MGLSDGEWQLVLNVWGKVEAD  GHGQEVLI LFK HPETL KFDKFK LKSE  MK SE
Mouse:   1 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE 60

H:  61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120
           DLKKHG TVLTALG ILKKKG H AEI PLAQSHATKHKIPVKYLEFISE II VL   H
Mouse:  61 DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH 120

H: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154
            GDFGADAQGAM KALELFR D A  YKELGFQG
Mouse: 121 SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG 154
By excess similarity (see Pearson, Curr Protoc Bioinformatics 2013)
Statistical significance (e.g. E-values)
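As a rough illustration of what "similarity" means for a pairwise alignment, the sketch below computes percent identity over the aligned (non-gap) columns. The function name `percent_identity` is an assumption made for this example, and percent identity is only a crude proxy: the slide's point is that homology is inferred from statistically significant excess similarity (e.g. E-values), which this snippet does not compute.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if x == y:
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns


# First 60 columns of the Human/Mouse alignment from the slide.
human = "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE"
mouse = "MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE"
print(f"{percent_identity(human, mouse):.1f}% identity")
```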
Structural similarity
2G2X:  1 MAYWLMKSEPDELSIEALARLGEARWDGVRNYQARNFLRAMSVGDEFFFYH-----SSCP 55
         MAYWL     D              W     Y   N      VGD    Y
2P5D:  4 MAYWLCITNEDNWKVIKEKKI----WGVAERY--KNTINKVKVGDKLIIYEIQRSGKDYK 57

2G2X: 56 QPGIAGIARITRAAYPD------PTALDPESHY 82
          P I G        Y D      PT   P
2P5D: 58 PPYIRGVYEVVSEVYKDSSKIFKPTPRNPNEKF 90
Excess sequence similarity?
Structural similarity: 2G2X vs. 2P5D
DALI: Z-score = 12.2, RMSD = 2.9, Lali = 122, %id = 20
Genomic context (see e.g. Jun et al. BMC Genomics 2009)
Genomic context: Homology
Genomic context: Homology?
Genomic context: mostly used for distinguishing orthology from paralogy
Origins of homology in proteins
Origin of homology in proteins: Speciation (orthology); Gene duplication (paralogy); Horizontal gene transfer (xenology); Whole genome duplication (ohnology); Gametology
Orthology
Myoglobin: serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles.
Origin of protein homology: Speciation (orthology); Gene duplication (paralogy); Horizontal gene transfer (xenology); Whole genome duplication (ohnology); Gametology
Paralogy
Myoglobin: serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles.
Hemoglobin: oxygen-transport protein in the red blood cells of vertebrates.
[Figure: tree from an ancestral globin to the Myo and Hemo genes in species A, B and C]
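The orthology/paralogy distinction in the globin example can be sketched in code: two genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication event. The tree below is a hypothetical toy version of the myoglobin/hemoglobin figure with only two species; the names `Node`, `relationship`, and the leaf labels are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    event: str = "leaf"  # "speciation", "duplication", or "leaf"
    children: list = field(default_factory=list)


def leaves(node):
    """Return the set of leaf labels below (or at) this node."""
    if not node.children:
        return {node.label}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def last_common_ancestor(node, a, b):
    """Descend while a single child still contains both genes."""
    for child in node.children:
        names = leaves(child)
        if a in names and b in names:
            return last_common_ancestor(child, a, b)
    return node


def relationship(root, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    names = leaves(root)
    if a not in names or b not in names or a == b:
        raise ValueError("need two distinct genes present in the tree")
    event = last_common_ancestor(root, a, b).event
    return {"speciation": "orthologs", "duplication": "paralogs"}[event]


# Hypothetical toy tree: one duplication, then a speciation in each lineage.
tree = Node("ancestral globin", "duplication", [
    Node("myo", "speciation", [Node("Myo_A"), Node("Myo_B")]),
    Node("hemo", "speciation", [Node("Hemo_A"), Node("Hemo_B")]),
])
print(relationship(tree, "Myo_A", "Myo_B"))
print(relationship(tree, "Myo_A", "Hemo_B"))
```

Note that under this definition Myo in species A and Hemo in species B are still paralogs, even though they sit in different species, because their common ancestor is the duplication.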
Origin of protein homology: Speciation (orthology); Gene duplication (paralogy); Horizontal gene transfer (xenology); Whole genome duplication (ohnology); Gametology, Synology
Mindell and Meyer Trends in Ecology and Evolution 2001
Homology: why bother? Slide courtesy of Alex Mitchell (EMBL-EBI)
Homology: why bother? Structure (homology modeling); Function?
Protein function(s)
Schubert et al. Nat. Struct. Biol. 5 (1998)
The Gene Ontology (GO)
A way to capture biological knowledge in a written and computable form
A set of concepts and their relationships to each other
Slide courtesy of Alex Mitchell (EMBL-EBI)
GO: 3 ontologies in 1
1. Molecular Function: an elemental activity or task or job (protein kinase activity, insulin receptor activity)
2. Biological Process: a commonly recognised series of events (cell division)
3. Cellular Component: where a gene product is located (mitochondrion, mitochondrial matrix, mitochondrial inner membrane)
Slide courtesy of Alex Mitchell (EMBL-EBI)
Protein Families
Globins in Human
Definition: We call ‘family’ a group of evolutionarily related proteins or protein regions
Why protein families?
[Figure: proteins P and A]
Why protein families?
[Figure: the Human/Mouse pairwise alignment shown under "Sequence similarity"]
[Figure: proteins P, A, B, C, D, E, F, G and H]
We can detect functionally important residues
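One simple way to see why a family alignment helps here: columns that stay identical across many family members are candidates for functionally important residues. The sketch below scores each column by the frequency of its most common residue. The scoring scheme, the function names, and the toy alignment are assumptions for illustration; conservation alone does not prove function, and real tools use substitution-aware scores.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences carrying the most common residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # all-gap column carries no signal
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of sequences so gaps lower the score.
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """1-based alignment columns whose conservation meets the threshold."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores, start=1) if score >= threshold]


# Hypothetical toy alignment of three family members.
toy_alignment = ["HGKV", "HGRV", "HAKV"]
print(conserved_positions(toy_alignment))
```

With only two sequences, as in a pairwise alignment, most columns look conserved; the signal sharpens as more diverse family members are added.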
We have a window open on evolutionary diversity
[Figure: the Human/Mouse pairwise alignment shown under "Sequence similarity"]
Example (using homology for protein annotation)
H. influenzae protein (3M71), 1.20 Å. Chen et al. Nature 467 (2010)
New York Consortium on Membrane Protein Structure (NYCOMPS)
TUM, January 2013
Thomine and Barbier-Brygoo Nature 467: (2010)
Chen et al. Nature 467 (2010)
Open Jalview: File -> Input Alignment -> From File “PF03595_seed.txt”
Colour -> BLOSUM62 1.
OPEN Chimera 1. File -> Open “3M71.pdb” 2.
Actions -> Atoms/Bonds -> wire 1. Actions -> Atoms/Bonds -> show 2.