Comparative genomics Haixu Tang School of Informatics.

1 Comparative genomics Haixu Tang School of Informatics

2 WGS of human genome 2001 Two assemblies of initial human genome sequences published –International Human Genome project –Celera Genomics: WGS approach

3 Model organisms: 1995 Haemophilus influenzae sequenced; 1997 E. coli sequenced; 1998 Complete sequence of the Caenorhabditis elegans genome; 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome

4 Why model organisms? Testing and improvements of genome sequencing technology and strategy

5 Model organisms: 1993 Whole genome shotgun sequencing proposed (J. C. Venter); 1995 Haemophilus influenzae sequenced (~1.5-2 Mbp); 1995 Automated fluorescent sequencing instruments and robotic operations (PerkinElmer, Inc.); 1996 Yeast sequenced; 1996 Double-barrelled sequencing; 1997 E. coli sequenced (~4 Mbp); 1998 Complete sequence of the Caenorhabditis elegans genome (~100 Mbp); 1998 Whole genome shotgun sequencing (Weber & Myers); 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (~180 Mbp)

6 Why model organisms? Testing and improvements of genome sequencing technology and strategy Model organisms have important biological implications themselves.

7 Model organisms: 1995 Haemophilus influenzae sequenced (infectious disease); 1996 Yeast sequenced (industry and biology); 1997 E. coli sequenced (industry and biotechnology); 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, development); 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (genetics, entomology)

8 Why model organisms? Testing and improvements of genome sequencing technology and strategy. Model organisms have important biological implications themselves. Genome sequences provide useful information to study genome function and evolution.

9 Model organisms: 1995 Haemophilus influenzae sequenced (bacterial); 1996 Yeast sequenced (uni-cellular); 1997 E. coli sequenced (bacterial); 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, nematode); 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (multi-cellular organism, insect)

10 Model mammalian and vertebrate genomes: 2001 Human genome; 2002 Mouse genome –Initial sequencing and comparative analysis of the mouse genome; 2003 Rat genome; 2004 Chicken genome (first bird); 2005 Chimpanzee genome

11 Comparative genomics Solving biological problems by comparing genomic sequences –Function of genes and genomes –Evolution of genes and genomes Data driven approaches –Computational methods are the core

12 Which genomes to sequence? Species having important biological applications For comparative genomics studies –Functional consideration Evolutionarily divergent genomes → conserved elements, e.g. human vs. mouse (~75% identical) Evolutionarily close genomes → divergent elements, e.g. human vs. chimpanzee (98.4% identical) –Evolutionary consideration Specific evolutionary puzzles → whole genome duplications in yeast
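
The "% identical" figures on this slide are statistics computed from sequence alignments. As a rough illustration of what such a number measures, here is a minimal Python sketch that assumes two already-aligned sequences of equal length with "-" marking gaps. The function name and the choice to skip gapped columns are assumptions made for this example; published genome-wide figures depend on the alignment method and on how gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

For example, `percent_identity("ACGT-ACGTA", "ACGTTACGAA")` compares 9 gap-free columns, 8 of which match.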

13 Ongoing eukaryotic genome projects http://igweb.integratedgenomics.com/ERGO_supplement/genomes_eukarya.html >20 yeasts, insects (12 Drosophila, 2 mosquitoes, silkworm), flea, sea urchin, frog, fish (zebrafish, Fugu), mammals (mouse, rat, dog, cow, pig, monkey, etc.), plants (Arabidopsis, rice (>2), maize, etc.)

14 Comparative genomics: case studies Gene function and evolution Gene-gene relationship Genome evolution

15 Homologue relationships of genes Orthologues: any pairwise gene relation where the last common ancestor node in the gene tree is a speciation event Paralogues: any pairwise gene relation where the last common ancestor node in the gene tree is a duplication event
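
The two definitions above amount to a simple rule: find the last common ancestor of two genes in the gene tree and look at the event type of that node. The Python sketch below illustrates the rule on a small hypothetical tree: one duplication followed by a human/mouse speciation in each copy. The `Node` class, the event labels and the gene names are assumptions made for the example, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaf genes
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> Set[str]:
    """Names of all genes (leaves) below a node."""
    if not node.children:
        return {node.name}
    found: Set[str] = set()
    for child in node.children:
        found |= leaves(child)
    return found


def last_common_ancestor(root: Node, gene_a: str, gene_b: str) -> Node:
    """Deepest node whose subtree contains both genes."""
    node = root
    while True:
        for child in node.children:
            if {gene_a, gene_b} <= leaves(child):
                node = child  # both genes are still together further down
                break
        else:
            return node  # no single child holds both genes


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    if not {gene_a, gene_b} <= leaves(root):
        raise ValueError("both genes must be leaves of the tree")
    lca = last_common_ancestor(root, gene_a, gene_b)
    return "orthologues" if lca.event == "speciation" else "paralogues"


# Hypothetical gene tree: an ancient duplication, then a speciation in each copy.
tree = Node("dup", "duplication", [
    Node("spec1", "speciation", [Node("H1"), Node("M1")]),
    Node("spec2", "speciation", [Node("H2"), Node("M2")]),
])
```

With this tree, `relationship(tree, "H1", "M1")` gives "orthologues", while `relationship(tree, "H1", "H2")` and `relationship(tree, "H1", "M2")` both give "paralogues", because the genes in each pair are separated by the duplication node.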

16 Homologue Relationships [Figure: gene tree drawn against time, with duplication and speciation events leading to genes A1, A2, M1, M2, M2’, H1 and H2; gene pairs are labelled as inparalogues, outparalogues or orthologues]

17 Functional implications Orthologous genes → same function in different species Paralogous genes → different functions

18 Yeast species [Figure: phylogeny of cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; branch labels ~5M and ~20M] 5-20 million years: sufficient conservation to align, sufficient divergence to identify conserved functional elements

19 Large scale genome evolution Most genes have a clear match Clear blocks of synteny

20

21 Human–chimpanzee comparisons Positive selection: a sequence change in a species that results in increased fitness is subject to positive selection. As a consequence, the change normally becomes fixed, leading to adaptive evolution of that species.

22 Genome vs. Genes The whole genome sequence can tell not only what genes exist in a genome, but also what genes do not exist (deleted) in a genome.

23 Phylogenetic profile analysis A non-homologous approach to gene function prediction The phylogenetic profile of a gene is a string encoding the presence or absence of the gene in every sequenced genome The phylogenetic profiles of genes involved in the same biological process are often "similar", since they may co-evolve.

24 Phylogenetic profile analysis Phylogenetic profile (against N genomes) –For each gene X in a target genome (e.g., E. coli), build a phylogenetic profile as follows –If gene X has a homolog in genome #i, the ith bit of X’s phylogenetic profile is “1”; otherwise it is “0”
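
The construction rule on this slide can be sketched in a few lines of Python. The sketch assumes that homolog detection has already been done (for example by a sequence-similarity search) and is summarised as a mapping from each gene to the set of genomes containing a homolog. The function name, gene names and genome names are hypothetical.

```python
from typing import Dict, List, Set


def build_profiles(homologs: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, str]:
    """Turn per-gene homolog sets into bit strings, one bit per reference genome.

    homologs: gene name -> names of genomes in which a homolog was found
    genomes:  ordered list of the N reference genomes; bit i refers to genomes[i]
    """
    return {
        gene: "".join("1" if genome in present else "0" for genome in genomes)
        for gene, present in homologs.items()
    }


# Hypothetical input: four reference genomes and two genes of the target genome.
example_genomes = ["genome1", "genome2", "genome3", "genome4"]
example_homologs = {
    "geneX": {"genome1", "genome3"},
    "geneY": set(),  # no homolog found anywhere
}
example_profiles = build_profiles(example_homologs, example_genomes)
```

Keeping `genomes` as an explicit ordered list matters: two profiles are only comparable if bit i refers to the same genome in both.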

25 Phylogenetic profile analysis Example – phylogenetic profiles based on 89 genomes
orf1034:1110110110010111110100010100000000111100011111110110111010101
orf1036:1011110001000001010000010010000000010111101110011011010000101
orf1037:1101100110000001110010000111111001101111101011101111000010100
orf1038:1110100110010010110010011100000101110101101111111111110000101
orf1039:1111111111111111111111111111111111111111101111111111111111101
orf104: 1000101000000000000000101000000000110000000000000100101000100
orf1040:1110111111111101111101111100000111111100111111110110111111101
orf1041:1111111111111111110111111111111101111111101111111111111111101
orf1042:1110100101010010010110000100001001111110111110101101100010101
orf1043:1110100110010000010100111100100001111110101111011101000010101
orf1044:1111100111110010010111010111111001111111111111101101100010101
orf1045:1111110110110011111111111111111101111111101111111111110010101
orf1046:0101100000010001011000000111110000010100000001010010100000000
orf1047:0000000000000001000010000001000100000000000000010000000000000
orf105: 0110110110100010111101101010111001101100101111100010000010001
orf1054:0100100110000001100001000100000000100100100001000100100000000
Genes with similar phylogenetic profiles have related functions or are functionally linked (D. Eisenberg and colleagues, 1999)
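
One simple way to make "similar profiles" concrete is to count the positions at which two bit strings differ (the Hamming distance) and rank genes by that count. The Python sketch below does this; the choice of Hamming distance and the function names are assumptions made for illustration, and other similarity measures are used in practice. The three profiles are copied from the slide above.

```python
from typing import Dict, List, Tuple


def hamming(profile_a: str, profile_b: str) -> int:
    """Number of genomes in which exactly one of the two genes is present."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rank_by_similarity(query: str, profiles: Dict[str, str]) -> List[Tuple[str, int]]:
    """Other genes sorted from most to least similar profile to the query gene."""
    ranked = [
        (gene, hamming(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(ranked, key=lambda pair: pair[1])


# Three profiles taken from the slide.
SLIDE_PROFILES = {
    "orf1039": "1111111111111111111111111111111111111111101111111111111111101",
    "orf1041": "1111111111111111110111111111111101111111101111111111111111101",
    "orf1047": "0000000000000001000010000001000100000000000000010000000000000",
}
```

Calling `rank_by_similarity("orf1039", SLIDE_PROFILES)` lists the remaining genes with their distances, closest first. Note that genes present in nearly every genome will look similar to one another under this measure whether or not they are functionally linked, so near-uniform profiles carry little information.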

26 Genome evolution Genome rearrangement Whole genome duplication

27 Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different

28 Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information

29 Turnip vs Cabbage: Different mtDNA Gene Order [Figure: gene order comparison, before and after] Evolution is manifested as the divergence in gene order

30 Comparative Genomic Architecture of Human and Mouse Genomes To locate the human counterpart of a mouse gene, the relative architectures of the human and mouse genomes were analyzed.

31 Types of Rearrangements Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6 Translocation: 1 2 3 4 5 6 → 1 2 6 4 5 3 Fusion / Fission: 1 2 3 4 5 6 ↔ 1 2 3 4 5 6 (chromosome boundaries are shown only in the original figure)
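
The four operations can be sketched on signed gene orders, where each chromosome is a list of integers and a minus sign marks a gene on the opposite strand. The Python below is an illustrative sketch: the function names and index conventions are assumptions, and the chromosome boundaries used in the translocation, fusion and fission examples (1 2 3 | 4 5 6 and 1 2 3 4 | 5 6) are guesses, since the transcript does not preserve where the figure placed them.

```python
from typing import List, Tuple

Chromosome = List[int]


def reversal(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the segment chrom[i..j] (0-based, inclusive) and flip each sign."""
    if not 0 <= i <= j < len(chrom):
        raise ValueError("invalid segment")
    return chrom[:i] + [-gene for gene in reversed(chrom[i:j + 1])] + chrom[j + 1:]


def translocation(a: Chromosome, b: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the tail of a (from index i) with the tail of b (from index j)."""
    return a[:i] + b[j:], b[:j] + a[i:]


def fusion(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return a + b


def fission(chrom: Chromosome, i: int) -> Tuple[Chromosome, Chromosome]:
    """Split one chromosome into two at index i."""
    return chrom[:i], chrom[i:]
```

With these conventions, `reversal([1, 2, 3, 4, 5, 6], 2, 4)` returns `[1, 2, -5, -4, -3, 6]`, matching the reversal on the slide, and `translocation([1, 2, 3], [4, 5, 6], 2, 2)` returns `([1, 2, 6], [4, 5, 3])`.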

32 Comparative Genomic Architectures: Mouse vs Human Genome Humans and mice have similar genomes, but their genes are ordered differently ~245 rearrangements –Reversals –Fusions –Fissions –Translocation

33 Hypothesis (1997): Whole Genome Duplication [Figure: phylogeny of cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe, with the hypothesized duplication marked "?" at ~100M]

34 Hypothetical resolution of WGD A 1:2 mapping where –nearly every region in species Y would correspond to two sister regions in S. cerevisiae –the two sister regions in S. cerevisiae would contain ordered interleaving subsequences of the genes in the corresponding region of species Y –nearly every region of S. cerevisiae would correspond to one region of species Y, and thus be paired to a sister region in S. cerevisiae
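
The 1:2 mapping criteria above can be phrased as a small test on gene orders. The Python sketch below assumes that each S. cerevisiae gene in a sister region has already been labelled with the name of its match in species Y, so all three regions are lists over the same gene names. It then checks that both sister regions are ordered subsequences of the Y region and that together they account for most of its genes. The function names and the `min_coverage` threshold are hypothetical; a real analysis would also need to tolerate local rearrangements and missing matches.

```python
from typing import List, Sequence


def is_ordered_subsequence(sub: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the genes of sub appear in seq in the same relative order."""
    remaining = iter(seq)
    # "gene in remaining" consumes the iterator up to the match, enforcing order.
    return all(gene in remaining for gene in sub)


def consistent_with_wgd(region_y: List[str], sister_1: List[str],
                        sister_2: List[str], min_coverage: float = 0.9) -> bool:
    """Check one region of species Y against two candidate sister regions."""
    if not region_y:
        raise ValueError("region_y must contain at least one gene")
    ordered = (is_ordered_subsequence(sister_1, region_y)
               and is_ordered_subsequence(sister_2, region_y))
    covered = (set(sister_1) | set(sister_2)) & set(region_y)
    return ordered and len(covered) / len(region_y) >= min_coverage
```

For a toy Y region `list("abcdefgh")`, the sister regions `list("acdg")` and `list("bcefh")` pass the check: each keeps the Y order, gene c is retained in both copies, and every Y gene appears in at least one sister.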

35

36 Hypothesis (1997): Whole Genome Duplication [Figure: phylogeny of cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe, with the hypothesized duplication marked "?" at ~100M]

37 Aligning the S. cerevisiae and K. waltii genomes Most regions in K. waltii mapped to two regions in S. cerevisiae with each containing matches to only a subset of the K. waltii genes

38 Duplication covers the whole S. cerevisiae genome

39 What happens to genes post WGD? 12% (457) of paralogous gene pairs were retained 76 of the 457 gene pairs (17%) show accelerated protein evolution

