Lecture-4 SEQUENCING THE LIVINGS Huseyin Tombuloglu, PhD GBE423 Genomics & Proteomics
World's Oldest Living People Have Their Genomes Sequenced | November 12, 2014
Scientists have sequenced the genomes of 17 of the world's oldest living people. Participants ranged in age from 110 to 116; all but one were female. The ultimate goal of the research is to figure out how these people are able to "slow down the aging clock." If researchers can figure that out, they might be able to create a drug or vitamin that does the same thing in non-superagers, so that people could extend their "middle age" for many years.
Unfortunately, the secret to a long life span remains a mystery for now: a first analysis of the genomes did not reveal any rare genetic mutations that might have been responsible for the participants' extraordinary ages.
https://gold.jgi.doe.gov/statistics
NCBI GENOME BROWSER http://www.ncbi.nlm.nih.gov/genome/browse/
Metagenomics
Metagenomics is the study of genetic material recovered directly from environmental samples: "the application of modern genomics techniques without the need for isolation and lab cultivation of individual species."
Examples of sampled habitats:
- Microbial community in the mouth
- Microbial community in the sea
- Microbial community in the bladder
- Microbial community in soil
- Microbial community in plants
- Microbial community in the intestine
- etc.
Metagenomics allows the study of microbial communities like those present in this stream receiving acid drainage from surface coal mining.
Environmental Shotgun Sequencing (ESS). Sampling from habitat; filtering particles, typically by size; Lysis and DNA extraction; cloning and library construction; sequencing the clones; sequence assembly into contigs and scaffolds.
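The last ESS step, assembling sequenced clones into contigs, can be illustrated with a toy greedy overlap merge. This is only a minimal sketch under stated assumptions: the reads, the `min_len` threshold, and the function names `overlap` and `greedy_assemble` are hypothetical, reads are error-free and on the same strand, and real assemblers handle repeats, sequencing errors, and scaffolding, none of which is attempted here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads sampled from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

With repetitive DNA the greedy choice can join reads that do not belong together, which is one reason repeat-rich genomes are hard to assemble.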
Genome organization of complex organisms
Genome sizes
- 54 Mbp: Cardamine amara
- 124 852 Mbp: Fritillaria
- 149 000 Mbp: Paris japonica, currently the largest (not only among plants)
Source: http://data.kew.org/cvalues/
Percent of DNA non-coding
Plant genome sizes
     10 Mb   Ostreococcus (single-celled alga)
     54 Mb   Cardamine amara
     64 Mb   Genlisea aurea
    125 Mb   Arabidopsis
    500 Mb   Oryza
  5 000 Mb   Hordeum
 17 000 Mb   Triticum
 84 000 Mb   Fritillaria (largest diploid)
143 000 Mb   Paris (octoploid)
- Angiosperms: size differences of up to almost 3 000 times
- Gymnosperms: genome sizes often around 10 000 Mb
- Gene number differences are much lower (approx. 20-200 fold)
(Figure: two globes whose volumes differ 3 000 times)
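The "almost 3 000 times" figure and the globe comparison can be checked with a few lines of arithmetic. The sketch below uses only the sizes listed on this slide; the function name `fold_difference` is a hypothetical helper, and the choice of Cardamine amara as the smallest angiosperm is an assumption based on this list alone.

```python
# Genome sizes in Mb, copied from the slide.
genome_sizes_mb = {
    "Ostreococcus": 10,
    "Cardamine amara": 54,
    "Genlisea aurea": 64,
    "Arabidopsis": 125,
    "Oryza": 500,
    "Hordeum": 5_000,
    "Triticum": 17_000,
    "Fritillaria": 84_000,
    "Paris": 143_000,
}


def fold_difference(size_a: float, size_b: float) -> float:
    """How many times larger the bigger genome is than the smaller one."""
    return max(size_a, size_b) / min(size_a, size_b)


# Largest vs. smallest angiosperm in the list (Ostreococcus is an alga).
angiosperm_fold = fold_difference(genome_sizes_mb["Paris"],
                                  genome_sizes_mb["Cardamine amara"])

# Two globes whose volumes differ 3 000 times differ in radius by the cube root.
radius_ratio = 3000 ** (1 / 3)

print(f"Paris / Cardamine amara: {angiosperm_fold:.0f}-fold")
print(f"Radius ratio of the two globes: {radius_ratio:.1f}")
```

A 3 000-fold difference in volume corresponds to only about a 14-fold difference in radius, which is why the globe picture looks less extreme than the number suggests.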
Aside - term definition: sequence complexity (~ the amount of information)
Repetitive:
  AAAAAAAAAAAAAAAAAAAAA   complexity 1 (21 x A)
  ATCATCATCATCATCATCATC   complexity 3 (7 x ATC)
  (What is the complexity if it is a coding sequence?)
Unique:
  ATCGTATCGCGATTTTAACGT   complexity 21 (1 x AT…)
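The three examples are consistent with reading complexity as the length of the shortest unit whose exact repetition rebuilds the whole sequence. The sketch below implements that reading; it is one possible interpretation of the slide, and the function name `sequence_complexity` is hypothetical. It handles only perfect repeats, not the diverged repeats found in real genomes.

```python
def sequence_complexity(seq: str) -> int:
    """Length of the shortest unit that, repeated exactly, gives `seq`."""
    n = len(seq)
    for unit in range(1, n + 1):
        # A unit can only tile the sequence if its length divides n.
        if n % unit == 0 and seq[:unit] * (n // unit) == seq:
            return unit
    return n  # only reached for the empty string (n == 0)


examples = {
    "A" * 21: "21 x A",
    "ATC" * 7: "7 x ATC",
    "ATCGTATCGCGATTTTAACGT": "unique",
}
for seq, label in examples.items():
    print(f"{seq}  complexity {sequence_complexity(seq)}  ({label})")
```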
Sequence complexity of plant genomes
(Figure: genome fractions by sequence complexity: highly repetitive, medium repetitive, unique)
Differences between small and large genome arrangements
Large genomes: genes are present in "gene-rich islands" separated by long regions of repetitive DNA
Transposable Elements (TEs)
- TEs make up 50-80% of plant genomes
- Discovered by Barbara McClintock while studying unstable corn kernel phenotypes
- Fragments of DNA that can insert into new chromosomal locations
- Often duplicate themselves in the process of moving around
- Class 1 TEs use RNA intermediates to move around and undergo duplicative transposition
- Class 2 TEs are excised during transposition and may undergo either "cut and paste" transposition with no duplication, or "gap repair", where the gap is filled with a copy of the transposon
- Autonomous elements contain the genes necessary for transposition
- Non-autonomous elements rely on the products of other elements for transposition
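The difference in outcome between duplicative transposition, "cut and paste", and "gap repair" can be shown with a toy model in which a chromosome is just an ordered list of labels. This is a deliberately simplified sketch: the function names, the list representation, and the example labels are all hypothetical, and no molecular detail (RNA intermediates, transposase, target site duplications) is modelled.

```python
def copy_and_paste(chromosome: list[str], te: str, target: int) -> list[str]:
    """Class 1-like outcome: the donor copy stays, a new copy lands at `target`."""
    return chromosome[:target] + [te] + chromosome[target:]


def cut_and_paste(chromosome: list[str], te: str, target: int,
                  gap_repair: bool = False) -> list[str]:
    """Class 2-like outcome: the element is excised and reinserted.

    `target` is an index in the chromosome after excision. With
    `gap_repair=True` the donor gap is refilled with a copy of the element.
    """
    src = chromosome.index(te)
    excised = chromosome[:src] + chromosome[src + 1:]
    moved = excised[:target] + [te] + excised[target:]
    if gap_repair:
        # The donor site shifts right by one if the insertion landed before it.
        donor = src if target > src else src + 1
        moved = moved[:donor] + [te] + moved[donor:]
    return moved


chromosome = ["geneA", "TE", "geneB", "geneC"]
print(copy_and_paste(chromosome, "TE", 3))                  # two copies
print(cut_and_paste(chromosome, "TE", 2))                   # one copy, new site
print(cut_and_paste(chromosome, "TE", 2, gap_repair=True))  # two copies
```

Only plain "cut and paste" leaves the copy number unchanged, which is why the other two routes contribute to the growth of the repetitive fraction.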
The majority of plant genes form gene families
(Figure: number of paralogues)
- Gene families are often in tandem arrangement, but are also spread across the genome
- Tandem repeats are composed of closely related, but also distantly related paralogues (recombination)
- Duplications of long chromosomal regions
Aside - term definitions:
- Homologous genes: genes with similar sequences derived from the same ancestral gene (quantified by sequence identity or similarity).
- Paralogous genes: genes with similar sequences derived from the same ancestral gene and present at different loci within the same genome.
- Orthologous genes: genes in different species that are similar to each other because they originated from a common ancestral gene in a common ancestor. (If several paralogues are present, the genes serving the same function are regarded as the orthologues.)
Orthologues vs. paralogues
- Orthologous genes: Gene A in the ancestral species gives Gene A' in one descendant species and Gene A" in the other
- Paralogous genes = genes duplicated within the species: Gene A" and Gene A'" both present in Species A
Mechanisms of gene duplication (increase in paralogue number)
- tandem duplication
- transposition
- segmental duplications
- whole-genome duplications
Arabidopsis is an ancient tetraploid (as are probably the majority of plants)
- Duplicated chromosomal regions make up about 60% of the genome (67.9 Mb)
- Polyploidization significantly increases genome (and organism) plasticity and has played a very important role in plant (genome) evolution
- About 30-80% of plant species are polyploid
Polyploidization in plant evolution
- 35% of species are neopolyploids
- Most species have been polyploidized repeatedly
- Viable aneuploid variants (frequently after allopolyploidization, e.g. hexaploid wheat): stable wheat lines with a missing chromosomal arm (of a homeologous chromosome)
(Figure legend: blue dots = duplications, asterisk = triplication, K-T; Fawcett et al. 2013)
Polyploidization: fusion of non-reduced gametes or endoreduplication
- Autopolyploidy: n = x = 4 crossed with n = x = 4, followed by spontaneous duplication (endoreduplication), gives 2n = 4x = 16
- Allopolyploidy: n = x = 4 crossed with n = x = 7, followed by spontaneous duplication (endoreduplication), gives 2n = 4x = 22
- Both occur with similar frequency among polyploid plant species
Chromosome doubling is necessary for meiosis in hybrids
- Species A x species B gives a sterile hybrid; genome duplication makes it fertile
- Preferential pairing of homologous chromosomes
- Related chromosomes from different species (homeologous) can also pair
Allopolyploid genomes in the genus Brassica
(Figure: triangle of Brassica nigra (BB, black mustard), Brassica rapa (AA), Brassica oleracea (CC), Brassica carinata (BBCC), Brassica juncea (AABB) and Brassica napus (AACC, canola))

Species          Karyotype       Genome
Brassica rapa    2n = 2x = 20    A
B. nigra         2n = 2x = 16    B
B. oleracea      2n = 2x = 18    C
B. juncea        2n = 4x = 36    AB
B. napus         2n = 4x = 38    AC
B. carinata      2n = 4x = 34    BC

Ancient interspecies hybrids
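The karyotypes in the table are internally consistent: each allotetraploid has exactly the sum of the chromosome numbers of its two diploid parents. A minimal sketch of that check, using only the values from the table (the dictionary layout and the function name `expected_2n` are hypothetical):

```python
# Diploid species: genome letter -> (species, 2n), from the table.
diploids = {
    "A": ("Brassica rapa", 20),
    "B": ("B. nigra", 16),
    "C": ("B. oleracea", 18),
}

# Allotetraploid species: genome formula -> (species, 2n), from the table.
allotetraploids = {
    "AB": ("B. juncea", 36),
    "AC": ("B. napus", 38),
    "BC": ("B. carinata", 34),
}


def expected_2n(genome_formula: str) -> int:
    """Sum the diploid chromosome numbers of the parental genomes."""
    return sum(diploids[letter][1] for letter in genome_formula)


for formula, (species, observed) in allotetraploids.items():
    parents = " + ".join(diploids[letter][0] for letter in formula)
    print(f"{species} ({formula}): {parents} -> "
          f"expected 2n = {expected_2n(formula)}, listed 2n = {observed}")
```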
Changes in a newly formed allopolyploid genome:
- DNA methylation changes
- Losses of parts of chromosomes or whole chromosomes (aneuploidy, decreased fertility)
- Frequent activation of TEs
- Expression of homeologous genes is usually not additive; the transcriptome is usually more reduced than the genome
- Different regulation of expression: often organ-specific expression of genes from each parent, new sites of expression, new regulation
- "Divergent resolution" leading to speciation (different gene loss in individuals, lethality in F2; absence of an essential gene = reproductive barrier)
... But genomes are still similar
Colinearity, synteny
Paterson et al., Plant Cell 12: 1523-1539, 2000
"Synteny" is usually misused to describe colinearity
- Synteny = orthologous loci in two species located on the same chromosome (figure: ancestral loci A B C remain on one chromosome in Species A and in Species B, but not necessarily in the same order)
- Colinearity = a group of loci in two species located on a chromosome in the same order (figure: A' B' C' in Species A and A" B" C" in Species B, matching the ancestral order A B C)
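The distinction between the two terms can be made concrete with a small check on toy gene orders. This is a sketch under stated assumptions: genomes are represented as a mapping from chromosome name to an ordered list of locus names, orthologous loci share a name, and the species dictionaries and function names are hypothetical. Gene orientation is ignored, and a fully reversed order is treated as not colinear.

```python
Genome = dict[str, list[str]]  # chromosome name -> loci in map order


def locate(genome: Genome, locus: str) -> tuple[str, int]:
    """Return (chromosome, position) of a locus."""
    for chrom, loci in genome.items():
        if locus in loci:
            return chrom, loci.index(locus)
    raise KeyError(locus)


def is_syntenic(loci: list[str], a: Genome, b: Genome) -> bool:
    """True if the loci lie on a single chromosome in each species."""
    return (len({locate(a, l)[0] for l in loci}) == 1
            and len({locate(b, l)[0] for l in loci}) == 1)


def is_colinear(loci: list[str], a: Genome, b: Genome) -> bool:
    """True if the loci are syntenic and also occur in the same order."""
    if not is_syntenic(loci, a, b):
        return False
    order_a = sorted(loci, key=lambda l: locate(a, l)[1])
    order_b = sorted(loci, key=lambda l: locate(b, l)[1])
    return order_a == order_b


# Hypothetical gene orders; "x" and "y" are unrelated intervening loci.
species_a = {"chr1": ["A", "C", "B"]}
species_b = {"chr3": ["A", "x", "B", "C"]}
species_c = {"chr2": ["A", "B", "y", "C"]}

loci = ["A", "B", "C"]
print(is_syntenic(loci, species_a, species_b), is_colinear(loci, species_a, species_b))
print(is_syntenic(loci, species_b, species_c), is_colinear(loci, species_b, species_c))
```

In this toy data, species_a and species_b are syntenic but not colinear for loci A, B and C, while species_b and species_c are both; intervening loci do not break colinearity as long as the relative order is preserved.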
Changes in colinearity caused by chromosomal arm inversion
Colinearity of Poaceae genomes
Colinear regions differ mainly in repetitive DNA
Summary:
- Current plant genomes result from repeated cycles of partial and complete duplications, followed by reduction and modification of the duplicated sequences.
- Plant genomes are still very dynamic.
- A large proportion of the genome consists of repetitive DNA.