Functional Linkages between Proteins


1 Functional Linkages between Proteins

2 Introduction: Piles of Information, Flakes of Knowledge. [Slide graphic: a block of raw genomic sequence, beginning AGCATCCGACTAGCATCAGCTAGCAGCAGA CTCACGATGTGACTGCATGCGTCATTATCTA] Organisms: E. coli, S. cerevisiae, Drosophila

3 Data Analysis. Traditional Methods (Experiments & Sequence Homology) → The function of a protein. New Computational Methods → Functional linkages between proteins

4 What does Functional Linkage mean? 1) A common structural complex 2) A common metabolic pathway 3) A common biological process 4) All answers are correct

5 New Computational Methods: Phylogenetic Profile Method, Rosetta Stone Method, Chromosomal Proximity Method, COG Database

6 Phylogenetic Profile Method

7 Why Should it Work? Biologically: Similar profile → likelihood of a common pathway or complex. Mathematically: N genomes → 2^N possible profiles → a unique characterization
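The profile idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the genome names, protein names, and the helpers `build_profiles` and `group_by_profile` are hypothetical, and it only links proteins whose profiles are identical (a real analysis would also tolerate near matches).

```python
from collections import defaultdict

def build_profiles(presence, genomes):
    """Return {protein: tuple of 0/1}, one bit per genome in a fixed order."""
    return {
        protein: tuple(1 if g in found_in else 0 for g in genomes)
        for protein, found_in in presence.items()
    }

def group_by_profile(profiles):
    """Group proteins that share exactly the same presence/absence profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Keep only profiles shared by at least two proteins (candidate linkages).
    return {p: sorted(members) for p, members in groups.items() if len(members) > 1}

# Hypothetical toy data: which genomes contain a homolog of each protein.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
presence = {
    "P1": {"genome_A", "genome_C"},
    "P2": {"genome_A", "genome_C"},
    "P3": {"genome_B"},
    "P4": {"genome_A", "genome_B", "genome_C", "genome_D"},
}

profiles = build_profiles(presence, genomes)
linked = group_by_profile(profiles)

# With N genomes there are 2**N possible profiles, so a shared profile
# becomes less likely to arise by chance as N grows.
n_possible = 2 ** len(genomes)
```

Here P1 and P2 share a profile and would be proposed as functionally linked, while P3 and P4 each have a profile of their own.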

8 Rosetta Stone Method

9 Rosetta Stone Method (= Domain Fusion Analysis) Interacting proteins have homologs in another organism fused into a single protein chain
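One way to picture the search is the sketch below. It assumes the similarity hits have already been computed by some alignment tool and are given as hypothetical tuples `(query, candidate, start, end)`, where `start` and `end` are the aligned region on the candidate fused protein. The function name `rosetta_stone_pairs` and all protein names are made up for illustration, and the sketch leaves out the check that the two query proteins are not homologous to each other.

```python
from itertools import combinations

def overlap(a, b):
    """Number of positions shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def rosetta_stone_pairs(hits, max_overlap=0):
    """Find query pairs that align to different regions of one candidate protein.

    hits: iterable of (query, candidate, start, end), with start/end being
    the aligned region on the candidate (hypothetical input format).
    Returns a sorted list of (query1, query2, candidate).
    """
    by_candidate = {}
    for query, candidate, start, end in hits:
        by_candidate.setdefault(candidate, []).append((query, (start, end)))

    found = set()
    for candidate, aligned in by_candidate.items():
        for (q1, r1), (q2, r2) in combinations(aligned, 2):
            # Two distinct queries covering separate parts of the candidate
            # suggest the candidate is a fusion of their homologs.
            if q1 != q2 and overlap(r1, r2) <= max_overlap:
                first, second = sorted((q1, q2))
                found.add((first, second, candidate))
    return sorted(found)

# Hypothetical toy hits.
hits = [
    ("geneA", "fusedX", 1, 200),
    ("geneB", "fusedX", 230, 480),
    ("geneC", "fusedX", 150, 300),
    ("geneD", "fusedY", 1, 100),
]
pairs = rosetta_stone_pairs(hits)
```

Only geneA and geneB cover non-overlapping regions of fusedX, so they are the single predicted pair in this toy input.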

10 Rosetta Stone Method

11 Why Should it Work? Experimentally: E. coli has ~4300 proteins; ~6800 pairs of them are similar to a single protein. Biologically:

12 Rosetta Stone Method: Validation Tests (E. coli): 1) Annotation of proteins from the SWISS-PROT database (68% vs. 15%) 2) Database of Interacting Proteins (6.4%) 3) Phylogenetic Profile Method (5% vs. 0.6%)

13 Models’ Success & Failure
             Predicted +        Predicted -
Found +      True positive      False negative
Found -      False positive     True negative
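The four cells of this table can be counted directly when predictions and experimentally found linkages are held as sets of protein pairs. The sketch below is only an illustration: the pairs are invented, and `confusion_counts` is a hypothetical helper, not part of any of the methods presented.

```python
from itertools import combinations

def confusion_counts(predicted, found, all_pairs):
    """Count true/false positives and negatives over a universe of pairs."""
    tp = len(predicted & found)              # predicted and found
    fp = len(predicted - found)              # predicted but not found
    fn = len(found - predicted)              # found but not predicted
    tn = len(all_pairs - predicted - found)  # neither predicted nor found
    return tp, fp, fn, tn

# Hypothetical toy data: every unordered pair among four proteins.
all_pairs = set(combinations("ABCD", 2))
predicted = {("A", "B"), ("A", "C"), ("B", "D")}
found = {("A", "B"), ("B", "D"), ("C", "D")}

tp, fp, fn, tn = confusion_counts(predicted, found, all_pairs)
precision = tp / (tp + fp)  # fraction of predictions that were found
recall = tp / (tp + fn)     # fraction of found linkages that were predicted
```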

14 Rosetta Stone Method: False Negatives 1) Interactions that have evolved through other mechanisms, i.e. there never was a fusion 2) The fused protein has disappeared during evolution

15 Rosetta Stone Method: False Positives 1) Proteins have been fused to regulate co-expression 2) Can’t distinguish between binding and non-binding homologs 3) Functional interaction rather than a physical interaction

16 Rosetta Stone Method: Reducing Errors

17 Rosetta Stone Method: Reconstruction of metabolic pathways

18 Functional Protein Networks

19 Orthologs vs. Paralogs. Orthologs: genes in different species that evolved from a common ancestral gene by speciation. Paralogs: genes related by duplication within a genome

20 Chromosomal Proximity. Proximate genes: on the same strand; within 300 bp, or respective paralogs within 300 bp. Inferred link: genes whose orthologs are close in at least three phylogenetic groups

21 Chromosomal Proximity. Direct link: two proximate genes that are also proximate in at least two other phylogenetic groups. Indirect (inferred) link: genes whose orthologs are close in at least three other phylogenetic groups
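The two link definitions can be sketched as follows. This is a simplified illustration under several assumptions: each gene is a hypothetical tuple `(ortholog_family, strand, start, end)`, each genome is assigned to one phylogenetic group, "close" is read as the same 300 bp same-strand rule, and the paralog clause from the previous slide is omitted. The function names and all data are made up.

```python
from collections import defaultdict
from itertools import combinations

MAX_GAP_BP = 300  # distance threshold taken from the slide

def proximate_family_pairs(genes, max_gap=MAX_GAP_BP):
    """Family pairs on the same strand whose genes lie within max_gap bp."""
    pairs = set()
    for (f1, s1, b1, e1), (f2, s2, b2, e2) in combinations(genes, 2):
        if f1 == f2 or s1 != s2:
            continue
        gap = max(b1, b2) - min(e1, e2)  # negative when the genes overlap
        if gap <= max_gap:
            pairs.add(frozenset((f1, f2)))
    return pairs

def classify_links(genomes, group_of, target):
    """Label family pairs as 'direct' or 'inferred' for the target genome."""
    groups_with_pair = defaultdict(set)
    for name, genes in genomes.items():
        for pair in proximate_family_pairs(genes):
            groups_with_pair[pair].add(group_of[name])

    in_target = proximate_family_pairs(genomes[target])
    families_in_target = {gene[0] for gene in genomes[target]}
    links = {}
    for pair, groups in groups_with_pair.items():
        other_groups = len(groups - {group_of[target]})
        if pair in in_target and other_groups >= 2:
            links[pair] = "direct"
        elif (pair not in in_target and pair <= families_in_target
              and other_groups >= 3):
            links[pair] = "inferred"
    return links

# Hypothetical toy genomes, one per phylogenetic group.
genomes = {
    "T": [("F1", "+", 0, 900), ("F2", "+", 1000, 1900),
          ("F3", "+", 50000, 50900), ("F4", "+", 90000, 90900)],
    "G1": [("F1", "+", 0, 900), ("F2", "+", 950, 1800),
           ("F3", "-", 5000, 5900), ("F4", "-", 6000, 6900)],
    "G2": [("F1", "+", 0, 900), ("F2", "+", 950, 1800),
           ("F3", "-", 5000, 5900), ("F4", "-", 6000, 6900)],
    "G3": [("F3", "+", 0, 900), ("F4", "+", 1100, 2000)],
}
group_of = {"T": "g0", "G1": "g1", "G2": "g2", "G3": "g3"}
links = classify_links(genomes, group_of, target="T")
```

In the toy data F1 and F2 are neighbours in the target genome and in two other groups (a direct link), while F3 and F4 are far apart in the target but neighbours in three other groups (an inferred link).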

22 Chromosomal Proximity

23 Why Should it Work? Biologically: Conservation of proximity across multiple genomes → linked function. Logically: How likely is it that two genes are randomly proximate?
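A back-of-the-envelope answer to the "logically" question can be worked out under strong simplifying assumptions that are not from the slides: a circular genome whose G genes are placed in uniformly random order, "proximate" read as "adjacent" (ignoring strand and the 300 bp cutoff), and k genomes treated as independent. The symbols G, k, and P are introduced here only for this estimate.

```latex
% Probability that two given genes are neighbours on a random circular
% genome with G genes: a gene has 2 neighbours among the other G-1 genes.
P_{\text{adj}} = \frac{2}{G - 1}

% If the gene orders of k genomes are treated as independent:
P_{\text{all } k} = \prod_{i=1}^{k} \frac{2}{G_i - 1}

% Example with G = 4300 (the E. coli protein count quoted earlier):
P_{\text{adj}} = \frac{2}{4299} \approx 4.7 \times 10^{-4},
\qquad
P_{\text{all } 3} \approx \left(4.7 \times 10^{-4}\right)^{3} \approx 1.0 \times 10^{-10}
```

Closely related genomes are not independent, which is presumably why the method counts phylogenetic groups rather than individual genomes; the numbers above are only meant to show how quickly chance agreement shrinks as more groups are required.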

24 Chromosomal Proximity Method’s Reliability:

25 Chromosomal Proximity: Validation. 1586 links were detected between ortholog families. KEGG: 80% in the same biological pathway. COG: 67% in the same functional category

26 Chromosomal Proximity. Total validated links per genome: 380 direct, 352 inferred

27 Chromosomal Proximity

28 The COG Database: Clusters of Orthologous Groups. COGs creation: each COG contains proteins that have evolved from an ancestral protein

29 The COG Database: Current Numbers (2004): 43 complete genomes; 30 phylogenetic groups; 2223 phylogenetic patterns; 17 functional categories; 3307 COGs; 74059 proteins, 71% of total

30 The COG Database

31 How can we use it? Direct information: annotation of proteins (group and individual), phylogenetic patterns, multiple alignment

32 The COG Database: How can we use it? Detecting missed genes: patterns that contain all but one; mostly small proteins
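The "all but one" search can be sketched as a simple scan over phylogenetic patterns. The sketch assumes that "all but one" means every genome in a chosen set except one, which the slide does not spell out; the COG identifiers, genome names, and the helper `missed_gene_candidates` are hypothetical.

```python
def missed_gene_candidates(cog_patterns, genomes):
    """Return {cog_id: genome} for COGs found in every genome except one.

    cog_patterns: {cog_id: set of genomes that contain a member of the COG}.
    The single absent genome is where a gene may have been missed.
    """
    all_genomes = set(genomes)
    candidates = {}
    for cog_id, present in cog_patterns.items():
        missing = all_genomes - present
        if len(missing) == 1:
            candidates[cog_id] = next(iter(missing))
    return candidates

# Hypothetical toy patterns.
genomes = ["gA", "gB", "gC", "gD"]
cog_patterns = {
    "COG_x": {"gA", "gB", "gC"},        # absent only from gD
    "COG_y": {"gA", "gB", "gC", "gD"},  # present everywhere
    "COG_z": {"gA"},                    # absent from three genomes
}
candidates = missed_gene_candidates(cog_patterns, genomes)
```

Each candidate would still need to be checked against the genome sequence itself, since a genuine gene loss gives the same pattern as a missed annotation.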

33 The COG Database: Growth in the number of groups. Are we approaching saturation?

34 COG on the WWW

35 Reliability of the Methods Major validation: Experimentally known linkages Validation by “keyword recovery” search

36 References
1) Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. Protein function in the post-genomic era. Nature. 2000 405:823-826. Review.
2) Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999 285:751-753.
3) Yanai I, Mellor JC, DeLisi C. Identifying functional links between genes using conserved chromosomal proximity. Trends Genet. 2002 18:176-179.
4) Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001 29:22-28.
5) Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997 278:631-637.
6) http://www.ncbi.nlm.nih.gov/COG

