Homology Profile-HMMs Domains Protein-family Databases How to build a new (Pfam) protein family EMBO Workshop, Cape Town, 2014 Function annotation transfer.

Slides:



Advertisements
Similar presentations
Homology.
Advertisements

1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
Phylogenetics workshop: Protein sequence phylogeny week 2 Darren Soanes.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Homology Review Human arm Lobed-fin fish fin Bat wing Bird wing Insect wing Homologous forelimbs not homologous as forelimbs or wings Definition: Structures.
Alignments Why do Alignments?. Detecting Selection Evolution of Drug Resistance in HIV.
Pfam a resource for remote homology domain identification et al NAR 2014.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Ontology annotation: mapping genomic regions biological function Paul D Thomas, Huaiyu Mi and Suzanna Lewis.
Basics of Comparative Genomics Dr G. P. S. Raghava.
Types of homology BLAST
Comparative genomics Joachim Bargsten February 2012.
Gene Ontology John Pinney
Xenolog: Homologs resulting from horizontal gene transfer.
COG and GO tutorial.
Bioinformatics and Phylogenetic Analysis
"Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky.
"Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky.
Bas E. Dutilh Phylogenomics Using complete genomes to determine the phylogeny of species.
Protein Modules An Introduction to Bioinformatics.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Finding Orthologous Groups René van der Heijden. What is this lecture about? What is ‘orthology’? Why do we study gene-ancestry/gene-trees (phylogenies)?
Sequence comparisons June 23, 2009 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity.
BLOSUM Information Resources Algorithms in Computational Biology Spring 2006 Created by Itai Sharon.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
The diversity of genomes and the tree of life
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Pairwise & Multiple sequence alignments
Protein Bioinformatics Course
HOGENOM a phylogenomic database
GO and OBO: an introduction. Jane Lomax EMBL-EBI What is the Gene Ontology? What is OBO? OBO-Edit demo & practical What is the Gene Ontology? What is.
Chapter 26: Phylogeny and the Tree of Life Objectives 1.Identify how phylogenies show evolutionary relationships. 2.Phylogenies are inferred based homologies.
Ontology and Phylogeny: Ontologies as research tools linking phylogenies, systematics, phenotypes, and genomics Brent D. Mishler University of California,
20.1 Structural Genomics Determines the DNA Sequences of Entire Genomes The ultimate goal of genomic research: determining the ordered nucleotide sequences.
COURSE OF BIOINFORMATICS Exam_31/01/2014 A.
Endogenous Retroviral promoter of the Human gene Kim Tae Hyung Oct 02,2004 MPL.
Integrating the Cell Cycle Ontology with the Mouse Genome Database David R. Smith Mary Dolan Dr. Judith Blake.
DAVID R. SMITH DR. MARY DOLAN DR. JUDITH BLAKE Integrating the Cell Cycle Ontology with the Mouse Genome Database.
Complementarity of network and sequence information in homologous proteins March, Department of Computing, Imperial College London, London, UK 2.
Protein and RNA Families
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Genome Analysis II Comparative Genomics Jiangbo Miao Apr. 25, 2002 CISC889-02S: Bioinformatics.
Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. [many slides borrowed from various sources]
Gene Ontology Consortium
Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. WTCCB Bioinformatics Core [many slides borrowed from various sources]
1 The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Organon Tim Hulsen, Oss, November 11, 2003.
Globins. Globin diversity Hemoglobins ( , etc) Myoglobins (muscle) Neuroglobins (in CNS) Invertebrate globins Leghemoglobins flavohemoglobins.
March 28, 2002 NIH Proteomics Workshop Bethesda, MD Lai-Su Yeh, Ph.D. Protein Scientist, National Biomedical Research Foundation Demo: Protein Information.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
S. pombe Unicellular archiascomycete Diverged from S. cerevisiae Ma Size ~14 Mb, 3 chromosomes No synteny Data stored in GeneDB.
Copyright OpenHelix. No use or reproduction without express written consent1.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
Gene Ontology TM (GO) Consortium
Phylogeny and the Tree of Life
Sequence similarity, BLAST alignments & multiple sequence alignments
GO : the Gene Ontology & Functional enrichment analysis
BLAST program selection guide
Basics of Comparative Genomics
Protein Sequence Alignments
Genomes and their evolution
Genomes and their evolution
Genome Annotation Continued
Protein Bioinformatics Course
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
What is an Ontology An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
PANTHER (Protein Analysis Through Evolutionary Relationships): Trees, Hidden Markov Models, Biological Annotations Paul Thomas, Ph.D. Division of Bioinformatics.
Homoeologs: What Are They and How Do We Infer Them?
Evolutionary genetics
Basics of Comparative Genomics
Presentation transcript:

Homology Profile-HMMs Domains Protein-family Databases How to build a new (Pfam) protein family EMBO Workshop, Cape Town, 2014 Function annotation transfer Outline Pfam database

Homology EMBO Workshop, Cape Town, 2014

Definition: Two proteins are homologous if they share a common ancestor, i.e. they are evolutionary related EMBO Workshop, Cape Town, 2014

Symmetric A A B B homologous Transitive B B A A homologous A A B B AND B B C C homologous A A C C

Detecting homology EMBO Workshop, Cape Town, 2014

Human: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60 MGLSDGEWQLVLNVWGKVEAD GHGQEVLI LFK HPETL KFDKFK LKSE MK SE Mouse: 1 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE 60 Human: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120 DLKKHG TVLTALG ILKKKG H AEI PLAQSHATKHKIPVKYLEFISE II VL H Mouse: 61 DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH 120 Human: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154 GDFGADAQGAM KALELFR D A YKELGFQG Mouse: 121 SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG 154 By excess similarity (see Pearson Curr Protoc Bioinformatics 2013 ) Statistical significance (e.g. E-values) Sequence similarity EMBO Workshop, Cape Town, 2014

2G2X: 1 MAYWLMKSEPDELSIEALARLGEARWDGVRNYQARNFLRAMSVGDEFFFYH-----SSCP 55 MAYWL D W Y N VGD Y 2P5D: 4 MAYWLCITNEDNWKVIKEKKI----WGVAERY--KNTINKVKVGDKLIIYEIQRSGKDYK 57 2G2X: 56 QPGIAGIARITRAAYPD------PTALDPESHY 82 P I G Y D PT P 2P5D: 58 PPYIRGVYEVVSEVYKDSSKIFKPTPRNPNEKF 90 Excess sequence similarity? Structural similarity EMBO Workshop, Cape Town, 2014

2G2X 2P5D Structural similarity EMBO Workshop, Cape Town, 2014

Structural similarity 2G2X 2P5D

Structural similarity 2G2X 2P5D Z-score = 12.2 RMSD = 2.9 Lali = 122 %id =20 DALI:

EMBO Workshop, Cape Town, 2014 Genomic context See e.g. Jun et al. BMC Genomics 2009

EMBO Workshop, Cape Town, 2014 Genomic context Homology See e.g. Jun et al. BMC Genomics 2009

EMBO Workshop, Cape Town, 2014 Genomic context See e.g. Jun et al. BMC Genomics 2009 Homology?

EMBO Workshop, Cape Town, 2014 Genomic context Mostly used for distinguishing orthology from paralogy

Origins of homology in proteins EMBO Workshop, Cape Town, 2014

Origin of homology in proteins Speciation (orthology) Gene duplication (paralogy) Horizontal gene transfer (xenology) Whole genome duplication (ohnology) Gametology EMBO Workshop, Cape Town, 2014

Myoglobin: Serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles. Orthology EMBO Workshop, Cape Town, 2014

Speciation (orthology) Gene duplication (paralogy) Horizontal gene transfer (xenology) Whole genome duplication (ohnology) Gametology Origin of protein homology EMBO Workshop, Cape Town, 2014

Myoglobin: Serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles. Hemoglobin: Oxygen-transport protein in red-blood cells of vertebrates Paralogy

EMBO Workshop, Cape Town, 2014

Ancestral Globin B C Myo A Hemo EMBO Workshop, Cape Town, 2014

Ancestral Globin B C Myo A Hemo EMBO Workshop, Cape Town, 2014

Ancestral Globin B C Myo A Hemo Myo Hemo Myo Hemo EMBO Workshop, Cape Town, 2014

Origin of protein homology EMBO Workshop, Cape Town, 2014 Speciation (orthology) Gene duplication (paralogy) Horizontal gene transfer (xenology) Whole genome duplication (ohnology) Gametology, Synology

Mindell and Meyer Trends in Ecology and Evolution 2001

EMBO Workshop, Cape Town, 2014 Homology: why bother? Slide courtesy of Alex Mitchell (EMBL-EBI)

Homology Function? Structure (homology modeling) EMBO Workshop, Cape Town, 2014 Homology: why bother?

Schubert et al. Nat. Struct. Biol. 5 (1998) Protein function(s) EMBO Workshop, Cape Town, 2014

A way to capture biological knowledge in a written and computable form A set of concepts and their relationships to each other EMBO Workshop, Cape Town, 2014 Slide courtesy of Alex Mitchell (EMBL-EBI) The Gene Ontology (GO)

1. Molecular Function 2. Biological Process 3. Cellular Component An elemental activity or task or job protein kinase activity insulin receptor activity A commonly recognised series of events cell division Where a gene product is located mitochondrion mitochondrial matrix mitochondrial inner membrane EMBO Workshop, Cape Town, 2014 Slide courtesy of Alex Mitchell (EMBL-EBI) GO: 3 ontologies in 1

Protein Families EMBO Workshop, Cape Town, 2014

Globins in Human

Definition: We call ‘family’ a group of evolutionary related proteins or protein regions EMBO Workshop, Cape Town, 2014

P P A A Why protein families?

Human: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60 MGLSDGEWQLVLNVWGKVEAD GHGQEVLI LFK HPETL KFDKFK LKSE MK SE Mouse: 1 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE 60 Human: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120 DLKKHG TVLTALG ILKKKG H AEI PLAQSHATKHKIPVKYLEFISE II VL H Mouse: 61 DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH 120 Human: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154 GDFGADAQGAM KALELFR D A YKELGFQG Mouse: 121 SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG 154 Why protein families? EMBO Workshop, Cape Town, 2014

Human: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60 MGLSDGEWQLVLNVWGKVEAD GHGQEVLI LFK HPETL KFDKFK LKSE MK SE Mouse: 1 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE 60 Human: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120 DLKKHG TVLTALG ILKKKG H AEI PLAQSHATKHKIPVKYLEFISE II VL H Mouse: 61 DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH 120 Human: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154 GDFGADAQGAM KALELFR D A YKELGFQG Mouse: 121 SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG 154 Why protein families? EMBO Workshop, Cape Town, 2014

P P A A B B H H G G E E C C D D F F

We can detect functionally important residues EMBO Workshop, Cape Town, 2014

We can detect functionally important residues EMBO Workshop, Cape Town, 2014

We have a window open on evolutionary diversity Human: 1 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE 60 MGLSDGEWQLVLNVWGKVEAD GHGQEVLI LFK HPETL KFDKFK LKSE MK SE Mouse: 1 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE 60 Human: 61 DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH 120 DLKKHG TVLTALG ILKKKG H AEI PLAQSHATKHKIPVKYLEFISE II VL H Mouse: 61 DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH 120 Human: 121 PGDFGADAQGAMNKALELFRKDMASNYKELGFQG 154 GDFGADAQGAM KALELFR D A YKELGFQG Mouse: 121 SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG 154 EMBO Workshop, Cape Town, 2014

We have a window open on evolutionary diversity

Example (using homology for protein annotation) EMBO Workshop, Cape Town, 2014

H. influenzae protein (3M71) 1.20 Å Chen et al. Nature 467 (2010) TUM, January 2013 EMBO Workshop, Cape Town, 2014 New York Consortium on Membrane Protein Structure (NYCOMPS)

TUM, January 2013

Thomine and Barbier-Brygoo Nature 467: (2010) EMBO Workshop, Cape Town, 2014

Thomine and Barbier-Brygoo Nature 467: (2010) EMBO Workshop, Cape Town, 2014

Chen et al. Nature 467 (2010)

EMBO Workshop, Cape Town, 2014 Chen et al. Nature 467 (2010)

EMBO Workshop, Cape Town, 2014 Chen et al. Nature 467 (2010)

TUM, January 2013

EMBO Workshop, Cape Town, 2014 OPEN Jalview File -> Input Alignment -> From File “PF03595_seed.txt”

EMBO Workshop, Cape Town, 2014 Colour -> BLOSUM62 1.

EMBO Workshop, Cape Town, 2014 OPEN Chimera 1. File -> Open “3M71.pdb” 2.

EMBO Workshop, Cape Town, 2014

out

EMBO Workshop, Cape Town, 2014 Actions -> Atoms/Bonds -> wire 1. Actions -> Atoms/Bonds -> show 2.

out EMBO Workshop, Cape Town, 2014 Actions -> Atoms/Bonds -> wire 1. Actions -> Atoms/Bonds -> show 2.