Published by Kory Maxwell. Modified over 9 years ago.
1
Chromosome - the most informative molecule of a cell - and the most variable?
2
Jan Sadowski
Department of Biotechnology, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University
Genome organisation and evolution
PAN-GENOME concept: bacteria and human models
3
Genome projects of massively parallel sequencing (re-sequencing)
- The 1000 Genomes Project (www.1000genomes.org)
- The 1001 Genomes Project: 1,001 lines of Arabidopsis thaliana (http://1001genomes.org)
- The Drosophila 1000 Genomes: 1,000 individuals/lines from Africa and Eurasia
- Genome 10K: 10,000 vertebrate species (www.genome10k.org)
- Numerous projects for prokaryotic organisms
4
PANGENOME definition
Tettelin et al. (2005) Proc Natl Acad Sci USA 102:13950-13955
(pan = whole): "the full complement of genes within a bacterial species"
5
Wide definition of GENOME (as PANGENOME): "complement of genes at a selected taxonomic level"
- species-based PANGENOME
- genus-based PANGENOME
- family-based PANGENOME
- total/universal PANGENOME
- lineage-specific gene set (= unique)
6
Beginning of "pangenome" studies
Comparative analysis of 8 bacterial strains of Streptococcus agalactiae (Tettelin et al. 2005)
The pangenome is composed of:
- a "core genome" containing genes present in all strains
- a "dispensable genome" composed of genes absent from one or more strains, together with genes specific to an individual strain, which form the "unique genome"
"Surprisingly, unique genes were still detected after eight genomes were sequenced, and mathematical extrapolation predicts that new genes will still be found after sequencing many more strains. Thus, the genomes of multiple, independent isolates are required to understand the global complexity of bacterial species."
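The core/dispensable/unique split can be written as set operations on gene content. A minimal sketch, assuming each strain has already been reduced to a set of gene-family IDs (the clustering of genes into families is not shown, and the strain and family names below are hypothetical):

```python
from collections import Counter


def partition_pangenome(strains: dict[str, set[str]]):
    """Split a pan-genome into core, dispensable and unique gene sets.

    `strains` maps a strain name to the set of gene-family IDs it carries.
    """
    # Pan-genome: every gene family seen in at least one strain.
    pangenome = set().union(*strains.values())
    # Core genome: gene families present in all strains.
    core = set.intersection(*strains.values())
    # Dispensable genome: gene families absent from at least one strain.
    dispensable = pangenome - core
    # Count how many strains carry each gene family.
    counts = Counter(g for genes in strains.values() for g in genes)
    # Unique genes: dispensable gene families carried by exactly one strain.
    unique = {
        name: {g for g in genes if counts[g] == 1 and g not in core}
        for name, genes in strains.items()
    }
    return pangenome, core, dispensable, unique


# Hypothetical strains and gene-family labels, for illustration only.
strains = {
    "strain_A": {"famA", "famB", "famX"},
    "strain_B": {"famA", "famB", "famY"},
    "strain_C": {"famA", "famB", "famY", "famZ"},
}
pan, core, disp, uniq = partition_pangenome(strains)
print(sorted(core))  # ['famA', 'famB']
print(sorted(disp))  # ['famX', 'famY', 'famZ']
print(sorted(g for genes in uniq.values() for g in genes))  # ['famX', 'famZ']
```

Here famY is dispensable but not unique, because two strains carry it.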
7
Wide pangenome of bacteria
Gene complement analysis: Lapierre and Gogarten (2009) Trends Genet.
8
Bacterial pangenome for 293 species
- Core genes: ~250 gene families (translation, replication and energy homeostasis)
- Character genes: ~7,900 gene families (colonization, survival in specific environmental niches/symbiosis, photosynthesis)
- Accessory genes: a gene set of indefinite size and unknown functions (these "serve" to distinguish lines and serotypes)
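These categories differ in how widely a gene family is distributed across genomes. The sketch below labels gene families by the fraction of genomes that carry them; the cut-offs `core_min=0.95` and `accessory_max=0.10` are hypothetical choices for illustration, not the thresholds or the procedure used by Lapierre and Gogarten (2009).

```python
from collections import Counter


def classify_by_frequency(genomes, core_min=0.95, accessory_max=0.10):
    """Label each gene family 'core', 'character' or 'accessory'.

    `genomes` maps a genome name to its set of gene-family IDs.
    `core_min` and `accessory_max` are hypothetical cut-offs on the
    fraction of genomes that carry a gene family.
    """
    n = len(genomes)
    counts = Counter(g for genes in genomes.values() for g in genes)
    labels = {}
    for gene, k in counts.items():
        freq = k / n
        if freq >= core_min:
            labels[gene] = "core"        # present in (nearly) all genomes
        elif freq <= accessory_max:
            labels[gene] = "accessory"   # rare, scattered at low frequency
        else:
            labels[gene] = "character"   # intermediate frequency
    return labels


# Ten hypothetical genomes: famA in all, famB in four, famC in one.
genomes = {f"genome{i}": {"famA"} for i in range(10)}
for i in range(4):
    genomes[f"genome{i}"].add("famB")
genomes["genome9"].add("famC")

print(sorted(classify_by_frequency(genomes).items()))
# [('famA', 'core'), ('famB', 'character'), ('famC', 'accessory')]
```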
9
Mechanisms and evolutionary constraints
Three categories of genes that compose each genome have been identified: the extended core, the character genes and the accessory gene pool. Genes in the extended core evolve under their own constraints and rules, under high selective pressure, and only minute changes at the sequence level are allowed. Although many instances of their transfer have been documented, they spread in populations mainly through vertical inheritance. Gene duplication and domain shuffling are the preferred modes of evolution of the character genes. This set of genes enables organisms to adapt quickly to changing conditions and to exploit new niches. Of the three sets, the character genes are the most likely to be transferred between organisms. The last category, the accessory gene pool, consists of poorly conserved genes scattered at low frequencies throughout the bacterial domain. This pool might in part represent genes that once had functions in their genomes but are no longer under selective pressure (now pseudogenes). These fast-evolving genes, perhaps residing in phage genomes most of the time, explore sequence space; occasionally, a new useful protein fold might arise from this pool and spread through populations.
10
Species | Tested strains | Relevant features
Helicobacter pylori | 15 | 56% of strain-specific genes are "ORFans"
Escherichia coli O157:H7 | 31 | Even within a single serotype, 1,751 ORFs were variable
Bartonella henselae | 11 | Genomic islands mediate genomic rearrangements
Streptococcus mutans | 9 | Accessory genome = 20%; half shows signs of HGT
Campylobacter jejuni | 11 | Largest fraction of accessory genes (19%) related to the cell envelope
Salmonella enterica | 25 | Core genome was only 54%
Bacillus anthracis | 19 | Variation among strains ranges from 8% to 34% of the reference genome
Vibrio cholerae | 9 | Core genome was 97%
S. agalactiae | 19 | Extensive variation recently confirmed by sequencing
E. coli and Shigella | 22 | E. coli backbone estimated at 2,800 ORFs
S. pneumoniae | 20 | Variability within strains < 2.1%; overall variability < 10%
Francisella tularensis | 27 | Regions specific to highly virulent strains were identified
11
Bacterial genomes: undefined and/or unlimited?
Plots with non-asymptotic curves were obtained.
Mira et al. (2010) Int. Microbiol. vol. 13
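A non-asymptotic curve means that each added genome keeps contributing new gene families. One commonly used way to quantify this, not described on the slide, is to fit a power law to the number of new gene families contributed by the N-th genome, new(N) = κN^(-α): if α ≤ 1 the sum of new(N) diverges, so the pan-genome grows without bound ("open"), whereas α > 1 implies a finite ("closed") pan-genome. The sketch below averages new(N) over random genome orderings and fits α by least squares on a log-log scale; the input data are synthetic and the function names are hypothetical.

```python
import math
import random


def new_genes_curve(genomes, n_perm=100, seed=0):
    """Mean number of new gene families added by the N-th genome
    (N = 1..len(genomes)), averaged over random genome orderings."""
    rng = random.Random(seed)
    order = list(genomes)  # copy, so the caller's list is not shuffled
    totals = [0.0] * len(order)
    for _ in range(n_perm):
        rng.shuffle(order)
        seen = set()
        for i, genes in enumerate(order):
            totals[i] += len(genes - seen)  # families not seen before
            seen |= genes
    return [t / n_perm for t in totals]


def fit_heaps_alpha(new_counts):
    """Fit new(N) = kappa * N**(-alpha) for N >= 2 by least squares
    on a log-log scale; returns (kappa, alpha)."""
    pts = [(math.log(n), math.log(c))
           for n, c in enumerate(new_counts[1:], start=2) if c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two genomes that add new genes")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return math.exp(my - slope * mx), -slope


# Synthetic data: 20 genomes sharing 100 core families, each with 5 unique ones.
core = {f"core{i}" for i in range(100)}
genomes = [core | {f"g{j}_u{k}" for k in range(5)} for j in range(20)]
kappa, alpha = fit_heaps_alpha(new_genes_curve(genomes))
print("open" if alpha <= 1 else "closed")  # open
```

Because every synthetic genome carries its own unique genes, each new genome adds exactly five families, α is about 0 and the curve never levels off.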
12
Gene complement-based PANGENOME in Eukaryota: HUMANS
13
PANGENOME of humans
Population-specific or individual-specific DNA sequences (genes) contributing to human genetic variation, that is, the nonredundant collection of all human DNA sequences (genes) present in the entire human population
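The definition above is essentially a set union. As a minimal sketch, assuming each genome has already been reduced to a set of sequence segment IDs with known lengths (the segment IDs, people and lengths below are hypothetical; real studies must first assemble and align the sequences), the pan-genome is the union of all segments, novel sequence is whatever is absent from the reference, and individual-specific sequence is novel sequence carried by only one person. Population-specific sequence would additionally need population labels and is not modelled here.

```python
def summarize_pangenome(reference, individuals, lengths):
    """Summarize a pan-genome built from sequence segments.

    reference:   set of segment IDs present in the reference genome
    individuals: dict mapping a person to the set of segment IDs they carry
    lengths:     dict mapping a segment ID to its length in bp
    """
    # Nonredundant collection: each segment counted once, however many carry it.
    pangenome = reference.union(*individuals.values())
    # Novel sequence: present in the population but absent from the reference.
    novel = pangenome - reference
    # Individual-specific: novel segments carried by exactly one person.
    individual_specific = {
        s for s in novel
        if sum(s in segs for segs in individuals.values()) == 1
    }
    novel_bp = sum(lengths[s] for s in novel)
    return pangenome, novel, individual_specific, novel_bp


# Hypothetical segments and people, for illustration only.
reference = {"ref1", "ref2", "ref3"}
individuals = {
    "person_A": {"ref1", "ref2", "ref3", "nov1"},
    "person_B": {"ref1", "ref3", "nov2", "nov3"},
    "person_C": {"ref1", "ref2", "nov2"},
}
lengths = {"ref1": 1000, "ref2": 800, "ref3": 1200,
           "nov1": 300, "nov2": 150, "nov3": 50}
pan, novel, specific, novel_bp = summarize_pangenome(reference, individuals, lengths)
print(sorted(novel), sorted(specific), novel_bp)
# ['nov1', 'nov2', 'nov3'] ['nov1', 'nov3'] 500
```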
14
Objects of the study
The complete Asian and African individual genome sequences were assembled de novo and compared to the NCBI reference human genome. The findings showed that human genomes contain a large amount of novel sequence that is both population- and individual-specific. Additional analyses made it possible to estimate the amount of sequence variation expected between any two individuals and to obtain information about potentially functional genetic elements within these novel sequences.
15
Genome re-sequencing project: humans
Li et al. (2010) Nature Biotechnology 28:57-62
16
Characterization of novel sequences
General
- The length of individual-specific sequence between a random pair of human individuals would range from 1.8 Mb to 4 Mb; including compositional differences from SNPs, it would range from 4.2 Mb to 8.0 Mb.
- Estimating the size of the pan-genome (calculated for 6 billion people), an additional 19-40 Mb of novel sequence over the reference genome should be included.
Number and functions of genes identified in novel sequences
- Asian (YH): 72 novel genes
- African (NA18507): 69 novel genes
- 30%: members of highly variable gene families (mucins 2, major histocompatibility complex HLA)
- 50-60%: unknown functions
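The pairwise figure above can be illustrated with a toy model. Assuming genomes are sets of hypothetical segments with known lengths, the individual-specific sequence between two people is the total length of segments carried by one but not the other (a set symmetric difference). SNP differences are not modelled, and this is not the estimation procedure of Li et al. (2010).

```python
from itertools import combinations


def pairwise_specific_bp(a, b, lengths):
    """Total length (bp) of segments carried by exactly one of two people."""
    return sum(lengths[s] for s in a ^ b)  # symmetric difference


def pairwise_range(individuals, lengths):
    """Min and max individual-specific length over all pairs of people."""
    values = [pairwise_specific_bp(individuals[p], individuals[q], lengths)
              for p, q in combinations(individuals, 2)]
    return min(values), max(values)


# Hypothetical segments and people, for illustration only.
individuals = {
    "person_A": {"ref1", "ref2", "ref3", "nov1"},
    "person_B": {"ref1", "ref3", "nov2", "nov3"},
    "person_C": {"ref1", "ref2", "nov2"},
}
lengths = {"ref1": 1000, "ref2": 800, "ref3": 1200,
           "nov1": 300, "nov2": 150, "nov3": 50}
print(pairwise_range(individuals, lengths))  # (1300, 2050)
```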
17
Over 100 human genomes will saturate their pan-genome
Wu et al. (2014) Human Genetics
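The slide reports the conclusion of Wu et al. (2014) without their method. To illustrate why a pan-genome can saturate, the sketch below uses a simple independence model (an assumption, not Wu et al.'s approach): if novel segment i occurs in a randomly chosen genome with frequency p_i, the expected number of distinct segments seen in N genomes is the sum over i of 1 - (1 - p_i)^N. The frequencies and function names are hypothetical.

```python
def expected_pangenome_size(freqs, n):
    """Expected number of distinct segments seen in n genomes, assuming
    segment i occurs independently in each genome with probability freqs[i]."""
    return sum(1 - (1 - p) ** n for p in freqs)


def genomes_to_saturate(freqs, fraction=0.99, n_max=10_000):
    """Smallest n whose expected pan-genome covers `fraction` of all segments."""
    target = fraction * len(freqs)
    for n in range(1, n_max + 1):
        if expected_pangenome_size(freqs, n) >= target:
            return n
    return None  # not saturated within n_max genomes


# Hypothetical frequencies: 50 common segments (p = 0.5), 50 rare ones (p = 0.05).
freqs = [0.5] * 50 + [0.05] * 50
print(genomes_to_saturate(freqs))  # 77
print(genomes_to_saturate([0.5] * 100))  # 7
```

In this toy model the rare segments dominate: the common half is essentially complete after about ten genomes, while reaching 99% of all segments takes 77.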
18
Concluding remarks
Full genomic sequences let us appreciate the many forces acting on genome evolution. The earlier view of genomes as highly stable sequence-storage structures has given way to a dynamic view in which genomes gain and lose genes over time. This constant invasion of genomes by exogenous genetic material, drawn from a cloud of frequently transferred genes, enhances the chance of survival of species by introducing variability into the population.