Chromosome - the most informative molecule of a cell


1 Chromosome - the most informative molecule of a cell
- and the most variable?

2 Genome organisation and evolution
PAN-GENOME concept: bacteria and human models
Jan Sadowski
Department of Biotechnology
Institute of Molecular Biology and Biotechnology
Faculty of Biology, Adam Mickiewicz University

3 Genome projects of massively parallel sequencing (re-sequencing)
The 1000 Genomes Project
The 1001 Genomes Project – 1001 lines of Arabidopsis thaliana
The Drosophila 1000 genomes – 1000 individuals/lines from Africa and Eurasia
Genome 10K – species of Vertebrata
Numerous projects for prokaryotic organisms

4 PAN-GENOME definition
Tettelin et al. (2005) Proc Natl Acad Sci USA 102:
(pan = whole) “the full complement of genes within a bacterial species”

5 Wide definition of GENOME (as PANGENOME)
“complement of genes at selected taxonomic level”
species-based PANGENOME
genus-based PANGENOME
family-based PANGENOME
total/universal PANGENOME
lineage-specific gene set (= unique)

6 Beginning of “pangenome” studies
Comparative analysis of 8 bacterial strains of Streptococcus agalactiae (Tettelin et al., 2005)
The pangenome is composed of:
“a core genome” containing genes present in all strains
“a dispensable genome” composed of genes absent from one or more strains, including the part called the “unique genome”, which is specific to an individual strain
“Surprisingly, unique genes were still detected after eight genomes were sequenced, and mathematical extrapolation predicts that new genes will still be found after sequencing many more strains. Thus, the genomes of multiple, independent isolates are required to understand the global complexity of bacterial species.”
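The core / dispensable / unique split described on this slide can be illustrated with a minimal Python sketch. It assumes each strain is already reduced to a set of gene-family identifiers; the strain names, gene names and the function `partition_pangenome` are hypothetical and are not taken from Tettelin et al.

```python
from collections import Counter


def partition_pangenome(strains):
    """Split genes into core, dispensable and unique sets.

    strains: dict mapping a strain name to a set of gene identifiers.
    Unique genes are treated as a subset of the dispensable genome.
    """
    if not strains:
        raise ValueError("at least one strain is required")
    n_strains = len(strains)
    # Count in how many strains each gene occurs.
    counts = Counter(gene for genes in strains.values() for gene in genes)
    pangenome = set(counts)
    core = {g for g, c in counts.items() if c == n_strains}
    dispensable = pangenome - core
    unique = {g for g in dispensable if counts[g] == 1}
    return {"pangenome": pangenome, "core": core,
            "dispensable": dispensable, "unique": unique}


# Hypothetical toy data: three strains, five gene families.
toy_strains = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
parts = partition_pangenome(toy_strains)
print(sorted(parts["core"]))         # genes present in every strain
print(sorted(parts["dispensable"]))  # genes absent from at least one strain
print(sorted(parts["unique"]))       # genes found in exactly one strain
```

In real analyses the hard part is deciding which genes count as "the same" across strains (orthologue clustering); the sketch starts after that step.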

7 Wide pangenome of bacteria Gene complement analysis
Lapierre and Gogarten (2009) Trends Genet.

8 Bacterial pangenome for 293 species
Core genes: ~250 gene families (translation, replication and energy homeostasis)
Character genes: ~7900 gene families (colonization, survival in special environmental niches / symbiosis, photosynthesis)
Accessory genes: a gene set of indefinite number and unknown functions (“serves” in distinguishing lines and serotypes)

9 Mechanisms and evolutionary constraints
Three categories of genes that compose each genome have been identified: the extended core, the character genes and the accessory pool of genes. These categories evolve under different constraints and rules.
Genes in the extended core are under high selective pressure, and only minute changes at the sequence level are allowed. Although many instances of gene transfer have been documented, these genes spread in populations mainly through vertical inheritance.
Gene duplication and domain shuffling are the preferred mode of evolution of the character genes. This set of genes enables organisms to adapt quickly to changing conditions and to exploit new niches. Of the three sets of genes, the character genes are the most likely to be transferred between organisms.
The last category, the accessory gene pool, consists of genes with low levels of conservation, which are scattered at low frequencies throughout the bacterial domain. This pool might represent in part genes that had previous functions in genomes but are now stripped of selective pressure (now pseudogenes). These fast-evolving genes, perhaps residing in phage genomes most of the time, explore sequence space; occasionally a new, useful protein fold might arise from this pool and spread through populations.

10 Species Tested Strains Relevant features
Helicobacter pylori: % of strain-specific genes are “ORFans”
Escherichia coli O157:H: Even within a single serotype, 1751 ORFs were variable
Bartonella henselae: Genomic islands mediate genomic rearrangements
Streptococcus mutans: Accessory genome = 20%; half shows signs of HGT
Campylobacter jejuni: Largest fraction of accessory genes (19%) related to cell envelope
Salmonella enterica: Core genome was only 54%
Bacillus anthracis: Variation in strains ranges 8-34% of reference genome
Vibrio cholerae: Core genome was 97%
S. agalactiae: Extensive variation recently confirmed by sequencing
E. coli - Shigella: E. coli backbone estimated at 2,800 ORFs
S. pneumoniae: Variability within strains < 2.1%. Overall variability < 10%
Francisella tularensis: Regions specific to highly virulent strains were identified

11 Bacterial genomes – undefined and/or unlimited?
Plots of non-asymptotic curves were obtained
Mira et al. (2010) Int. Microbiol. vol. 13
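What a non-asymptotic curve means can be shown with a small gene accumulation sketch in Python: genomes are added one at a time and the pangenome size, core size and number of newly seen genes are recorded at each step. This is only an illustration under assumed inputs (hypothetical strains and gene names, a single fixed order of addition), not the procedure used by Mira et al.; published curves are usually averaged over many orderings.

```python
def accumulation_curve(strains, order):
    """Return (pan_size, core_size, new_genes) after each genome is added.

    strains: dict mapping a strain name to a set of gene identifiers.
    order:   list of strain names giving the order of addition.
    """
    pan = set()
    core = None
    curve = []
    for name in order:
        genes = strains[name]
        new_genes = len(genes - pan)  # genes not seen in earlier genomes
        pan |= genes                  # the pangenome can only grow
        core = set(genes) if core is None else core & genes  # the core can only shrink
        curve.append((len(pan), len(core), new_genes))
    return curve


# Hypothetical toy data: three strains, five gene families.
toy_strains = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
curve = accumulation_curve(toy_strains, ["A", "B", "C"])
for step, (pan_size, core_size, new_genes) in enumerate(curve, start=1):
    print(step, pan_size, core_size, new_genes)
```

If the number of new genes per added genome keeps staying above zero, the pangenome curve does not level off, which corresponds to the "unlimited" (open) case in the slide title.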

12 Gene complement-based PANGENOME in Eukaryota HUMANS

13 PANGENOME of humans
Population-specific or individual-specific DNA sequences (genes) contributing to human genetic variation, that is, the nonredundant collection of all human DNA sequences (genes) present in the entire human population

14 Objects of the study
The Asian and African complete individual genome sequences were assembled de novo and compared to the NCBI reference human genome. Findings showed that human genomes contain a large amount of novel sequence that is both population- and individual-specific. Additional analyses made it possible to investigate the amount of sequence variation expected between any two individuals and to obtain information about the presence of potentially functional genetic elements within these novel sequences.

15 Genome re-sequencing project
Humans
Li et al. (2010) Nature Biotechnology 28:57-62

16 Characterization of novel sequences
General
The length of individual-specific sequences between a random pair of human individuals would range between 1.8 Mb and 4 Mb; with the inclusion of the composition differences from SNPs it would be in the range of 4.2 Mb to 8.0 Mb.
Estimating the size of the pan-genome (calculating for 6 billion people), we should include an additional Mb of novel sequences over the reference genome
Number and functions of genes identified in novel sequences
Asian (YH) – 72 novel genes
African (NA18507) – 69 novel genes
30% – members of highly variable gene families (mucins 2, major histocompatibility complex HLA)
50-60% – unknown functions

17 Over 100 human genomes will saturate their pan-genome
Wu et al. (2014) Human Genetics

18 Concluding remarks
Full genomic sequences let us appreciate the many forces acting on genome evolution. The earlier view of genomes as very stable, sequence-storing structures has given way to a dynamic view in which genomes gain and lose genes along the way. This constant invasion of genomes by exogenous genetic material, drawn from a cloud of frequently transferred genes, enhances the chance of survival of species by introducing variability into the population.

