1 The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Organon Tim Hulsen, Oss, November 11, 2003
2 Summary (1) An introduction to orthology and paralogy (2) Orthology determination within eukaryotes (3) Testing the advantages of our ortholog set (4) Using evolutionary conservation of co-expression for function prediction (5) Evolutionary conservation of chromosomal distance and orientation
3 (1) An introduction to orthology and paralogy. Homologous genes: genes that have a common ancestor. Orthologous genes: genes that evolved from a common ancestor through a speciation event (equivalents in different species). Paralogous genes: genes that evolved from a common ancestor through a duplication event.
4 Orthology and paralogy explained graphically (from
5 The importance of orthology and paralogy Orthology relationships especially important for function prediction: orthologous genes generally have the same function but in different species Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)
6 (2) Orthology determination within eukaryotes. Not much eukaryotic orthology available at this moment: euKaryotic Orthologous Groups (KOG, NCBI), Inparanoid, OrthoMCL. Existing databases are either too inclusive or too restrictive. Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle: it should be determined using phylogenetic trees!
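For readers unfamiliar with the best bidirectional hit criterion mentioned above, here is a minimal sketch in Python. It assumes each search result is a (query, subject, E-value) tuple; the gene names and E-values are made-up toy data, and the function names are hypothetical, not taken from any of the databases listed.

```python
def best_hits(hits):
    """Map each query to its subject with the lowest E-value.

    `hits` is an iterable of (query, subject, evalue) tuples (assumed format).
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: hypothetical gene names and E-values.
human_vs_mouse = [("hsA", "mmA", 1e-50), ("hsA", "mmB", 1e-20), ("hsB", "mmB", 1e-30)]
mouse_vs_human = [("mmA", "hsA", 1e-48), ("mmB", "hsA", 1e-25), ("mmB", "hsB", 1e-28)]
bbh_pairs = bidirectional_best_hits(human_vs_mouse, mouse_vs_human)
```

The criterion only looks at pairwise scores, which is the limitation the slide points out: it never consults the evolutionary history of the gene family.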
7 Our orthology determination within eukaryotes [Figure: pipeline diagram. Labels: GENOME; GENOMES; Hs; At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp; SELECTION OF HOMOLOGS; Z>20, RH>0.5*QL; 24,263 groups; ALIGNMENTS AND TREE; PHYLOME; TREE SCANNING; LIST; Hs-Mm: 85,848 pairs; Hs-Dm: 55,934 pairs; etc.]
8 Our orthology determination: using phylogenetic trees Example: BMP6 (Bone Morphogenetic Protein 6) 5 orthologous relations are defined, all Hs-Mm
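The slides do not spell out how the trees are scanned, so the following Python sketch is an illustration only. It assumes a commonly used species-overlap rule: an internal node is treated as a duplication if its two subtrees share a species, and as a speciation otherwise; two genes are called orthologs when the node joining them is a speciation. The tree encoding (nested tuples), the "Species_gene" naming scheme, and all gene names are hypothetical.

```python
from itertools import product


def leaves(tree):
    """Return the gene names at the tips of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def species(gene):
    """Species code is the part before the first underscore (assumed naming)."""
    return gene.split("_")[0]


def orthologs(tree):
    """Collect gene pairs whose joining node is a speciation (species-overlap rule)."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = orthologs(left) | orthologs(right)
    left_genes, right_genes = leaves(left), leaves(right)
    left_species = {species(g) for g in left_genes}
    right_species = {species(g) for g in right_genes}
    if not (left_species & right_species):
        # No shared species: treat this node as a speciation event.
        pairs |= {tuple(sorted(p)) for p in product(left_genes, right_genes)}
    return pairs


# Toy tree: a duplication before the Hs/Mm split, with a Dm outgroup.
toy_tree = ((("Hs_g1", "Mm_g1"), ("Hs_g2", "Mm_g2")), "Dm_g")
toy_orthologs = orthologs(toy_tree)
```

In this toy tree, Hs_g1 and Mm_g2 are joined by a duplication node, so they are reported as paralogs rather than orthologs, even though a score-based method might pair them.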
9 The ortholog database: Eukaryortho (only accessible from Organon, CMBI and SARA)
10 (3) Testing the advantages of our ortholog set Quality of orthology difficult to test Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark Gene Ontology (GO) database: hierarchical system of function and location descriptions Orthologs are in same functional category when they are in the same 4th level GO Molecular Function class
11 GO molecular function benchmark. Molecular function: one of the three ‘subroots’ (together with biological process and cellular location). ‘True’ orthologs should share a 4th level molecular function (here: GO ). Our Hs-Mm ortholog set: 67%. KOG Hs-Mm ortholog set: 51%.
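The benchmark above boils down to counting ortholog pairs that share a 4th level class. A minimal Python sketch follows, assuming the 4th level terms per gene have already been looked up and stored in a dictionary. The term labels ("T1", "T2", ...) and gene names are placeholders, and restricting the denominator to pairs where both genes are annotated is an assumption, not something the slides state.

```python
def shared_class_fraction(pairs, level4_terms):
    """Fraction of annotated pairs that share at least one 4th level term.

    `pairs` is a list of (gene_a, gene_b); `level4_terms` maps gene -> set of terms.
    Pairs with an unannotated member are skipped (assumption).
    """
    annotated = [
        (a, b) for a, b in pairs
        if level4_terms.get(a) and level4_terms.get(b)
    ]
    if not annotated:
        return 0.0
    sharing = sum(1 for a, b in annotated if level4_terms[a] & level4_terms[b])
    return sharing / len(annotated)


# Toy data with placeholder term labels.
toy_pairs = [("hs1", "mm1"), ("hs2", "mm2"), ("hs3", "mm3"), ("hs4", "mm4")]
toy_terms = {
    "hs1": {"T1"}, "mm1": {"T1", "T2"},
    "hs2": {"T3"}, "mm2": {"T4"},
    "hs3": {"T5"}, "mm3": {"T5"},
    "hs4": {"T6"},  # mm4 has no annotation, so this pair is skipped
}
toy_fraction = shared_class_fraction(toy_pairs, toy_terms)
```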
12 Co-expression benchmark Second method: comparing expression profiles of each orthologous gene pair Using GeneLogic Expressor data set: –Human chips: 3269 samples, fragments, 115 tissue categories, 15 SNOMED tissue categories –Mouse chips: 859 samples, fragments, 25 tissue categories, 12 SNOMED tissue categories
13 SNOMED tissue categories used for co-expression calculation
HUMAN                        MOUSE
1 Blood vessel               1 Blood vessel
2 Cardiovascular system      2 Cardiovascular system
3 Digestive organs           3 Digestive organs
4 Digestive system           4 Digestive system
5 Endocrine gland            -
6 Female genital system      5 Female genital system
7 Hematopoietic system       6 Hematopoietic system
8 Integumentary system       7 Integumentary system
9 Male genital system        8 Male genital system
10 Musculoskeletal system    9 Musculoskeletal system
11 Nervous system            10 Nervous system
12 Product of conception     -
13 Respiratory system        11 Respiratory system
14 Topographic region        -
15 Urinary tract             12 Urinary tract
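14 Calculating the correlation
$$ r = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}} $$
[Figure: expression per tissue category for two example gene pairs, one with high and one with low correlation. Human gene 1: _s_at, Mouse gene 1: _at; Human gene 2: _s_at, Mouse gene 2: 97166_at]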
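The correlation on this slide is the Pearson coefficient in its computational (sum-based) form. A minimal Python sketch, assuming x and y are the expression values of the two genes over the same ordered list of tissue categories; the function name and the error handling for constant profiles are choices made here, not part of the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation using the sum-based formula:

    r = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Toy profiles over four hypothetical tissue categories.
r_high = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_low = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

For long profiles with large values, a mean-centered formulation is numerically more stable than this sum-based form; the version above is kept because it mirrors the slide.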
15 Co-expression comparison of our ortholog set to the KOG set
16 (4) Using evolutionary conservation of co-expression for function prediction. Human: Gene A and Gene B, with co-expression Cab (-1 <= corr. <= 1). Human/Mouse: Gene A’ and Gene B’, with co-expression Ca’b’. If Ca’b’ >= Cab, this increases the probability that A and B are involved in the same process. (Co-expression calculated over 115 tissues in human, 25 in mouse.)
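The conservation criterion can be read as a filter over human gene pairs. The Python sketch below takes the condition Ca'b' >= Cab literally; the extra `min_corr` cutoff on Cab, the dictionary layout, and all names and values are assumptions made for illustration.

```python
def pair_key(a, b):
    """Order-independent key for a gene pair."""
    return frozenset((a, b))


def conserved_pairs(human_corr, homolog_corr, homolog_of, min_corr=0.0):
    """Return human pairs (A, B) with Cab >= min_corr and Ca'b' >= Cab.

    `human_corr` and `homolog_corr` map pair_key(...) -> correlation;
    `homolog_of` maps a human gene to its ortholog or paralog (A -> A').
    """
    kept = []
    for key, c_ab in human_corr.items():
        if c_ab < min_corr:
            continue
        a, b = sorted(key)
        if a not in homolog_of or b not in homolog_of:
            continue
        c_conserved = homolog_corr.get(pair_key(homolog_of[a], homolog_of[b]))
        if c_conserved is not None and c_conserved >= c_ab:
            kept.append((a, b))
    return sorted(kept)


# Toy data: hypothetical genes and correlations.
toy_human_corr = {pair_key("A", "B"): 0.6, pair_key("A", "C"): 0.7, pair_key("B", "D"): 0.2}
toy_homolog_of = {"A": "A'", "B": "B'", "C": "C'"}
toy_homolog_corr = {pair_key("A'", "B'"): 0.8, pair_key("A'", "C'"): 0.5}
toy_conserved = conserved_pairs(toy_human_corr, toy_homolog_corr, toy_homolog_of, min_corr=0.5)
```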
17 GO biological process benchmark Biological process: one of the three ‘subroots’ (together with cellular location and molecular function) Both orthologs and paralogs are often involved in the same process/pathway (=sharing a 4th level biological process, here: GO )
18 Conservation of co-expression used in function prediction
19 The importance of (conserved) co-expression for function prediction. Co-expression without conservation can already be used for function prediction. Paralogous conservation gives a 2x higher accuracy. Orthologous conservation gives a 3x or 4x higher accuracy. Alternative for GO Biological Process: the KEGG Pathway database gives similar results.
20 (5) Evolutionary conservation of chromosomal distance and orientation. Human: Gene A and Gene B, with distance Dab (# bp), orientation Oab ( , , ) and co-expression Cab (-1 <= corr. <= 1). Human/Mouse: Gene A’ and Gene B’. If Da’b’ <= Dab, Oa’b’ == Oab and Ca’b’ >= Cab, this increases the probability that A and B are involved in the same process. (Co-expression calculated over 115 tissues in human, 25 in mouse.)
21 Function prediction using co-expression and chromosomal distance (without conservation)
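The distance and orientation conditions can be sketched in Python as follows. The slides give the distance only as a number of base pairs and the three orientation symbols were lost, so two assumptions are made here: distance is the gap between the end of the upstream gene and the start of the downstream gene, and the three orientation classes are tandem (same strand), convergent, and divergent. The `Gene` record and all coordinates are hypothetical.

```python
from collections import namedtuple

# Hypothetical gene record: chromosome, start, end (bp), strand ("+" or "-").
Gene = namedtuple("Gene", "chrom start end strand")


def distance_bp(g1, g2):
    """Gap in bp between two genes, or None if on different chromosomes."""
    if g1.chrom != g2.chrom:
        return None
    first, second = sorted((g1, g2), key=lambda g: g.start)
    return max(0, second.start - first.end)


def orientation(g1, g2):
    """Classify a gene pair as tandem, convergent, or divergent (assumed classes)."""
    first, second = sorted((g1, g2), key=lambda g: g.start)
    if first.strand == second.strand:
        return "tandem"
    return "convergent" if first.strand == "+" else "divergent"


def distance_and_orientation_conserved(a, b, a_prime, b_prime):
    """True if Da'b' <= Dab and Oa'b' == Oab."""
    d_ab = distance_bp(a, b)
    d_conserved = distance_bp(a_prime, b_prime)
    if d_ab is None or d_conserved is None:
        return False
    return d_conserved <= d_ab and orientation(a_prime, b_prime) == orientation(a, b)


# Toy coordinates.
gene_a = Gene("chr1", 1000, 2000, "+")
gene_b = Gene("chr1", 5000, 6000, "-")
gene_a_prime = Gene("chr4", 100, 900, "+")
gene_b_prime = Gene("chr4", 2900, 3500, "-")
toy_result = distance_and_orientation_conserved(gene_a, gene_b, gene_a_prime, gene_b_prime)
```

These three orientation classes have the convenient property that they are unchanged when a whole region is inverted, so a pair does not lose its conserved status merely because the segment flipped strand in the other genome.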
22 Conservation of chromosomal distance used in function prediction
23 The importance of chromosomal distance and orientation for function prediction. Chromosomal distance in eukaryotes is less important than in prokaryotes (due to the absence of operons). Only genes with distance < 1 Mbp seem to be coregulated. Conservation of relative orientation seems to be important only for very close gene pairs. Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation.
24 Conclusions. Orthologous and paralogous relations can be used to improve function prediction. Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process. Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases. Future plans: find examples where the function of a protein can be predicted using these methods.
25 Credits Martijn Huynen Peter Groenen Others at Comics Others at Organon Bioinf.