Presentation is loading. Please wait.

Presentation is loading. Please wait.

The eukaryotic chromosome (Chapter 16) Monday, November 8, 2011 Wednesday, November 10, 2011 Genomics 260.605.01 J. Pevsner

Similar presentations


Presentation on theme: "The eukaryotic chromosome (Chapter 16) Monday, November 8, 2011 Wednesday, November 10, 2011 Genomics 260.605.01 J. Pevsner"— Presentation transcript:

1 The eukaryotic chromosome (Chapter 16) Monday, November 8, 2011 Wednesday, November 10, 2011 Genomics 260.605.01 J. Pevsner pevsner@jhmi.edu

2 Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by J Pevsner, © 2009 by Wiley-Blackwell. These images and materials may not be used without permission. Visit http://www.bioinfbook.org Copyright notice

3 Today: The eukaryotic chromosome (Chapter 16) Wednesday Nov. 10: The eukaryotic chromosome (continued) Friday November 12: The fungi (Chapter 17) The following week: lab; Sarah Wheelan; David Sullivan Schedule

4 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes C value paradox and genome sizes Organization of genomes into chromosomes Genome browsers The ENCODE project Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change

5 Introduction to the eukaryotes Eukaryotes are single-celled or multicellular organisms that are distinguished from bacteria and archaea by the presence of a membrane-bound nucleus, an extensive system of intracellular organelles, and a cytoskeleton. Later we will explore the eukaryotes using a phylogenetic tree by Baldauf et al. (Science, 2000). This tree was made by concatenating four protein sequences: elongation factor 1a, actin,  -tubulin, and  -tubulin. Page 640

6 Eukaryotes (after Baldauf et al., 2000) Page 730

7 Note that we will learn how to make phylogenetic trees in lab as soon as MEGA software is installed in the computer labs. You should visit: http://www.megasoftware.net/ Download it for Windows, MacOS or Linux

8 General features of the eukaryotes Some of the general features of eukaryotes that distinguish them from bacteria and archaea are: eukaryotes include many multicellular organisms, in addition to unicellular organisms eukaryotes have [1] a membrane-bound nucleus [2] intracellular organelles [3] a cytoskeleton Most eukaryotes undergo sexual reproduction Page 641

9 General features of the eukaryotes (cont.) Some of the general features of eukaryotes that distinguish them from bacteria and archaea are: The genome size of eukaryotes spans a wider range than that of most bacteria and archaea Eukaryotic genomes have a lower density of genes Bacteria are haploid; eukaryotes have varying ploidy Eukaryotic genomes tend to be organized into linear chromosomes with a centromere and telomeres. Page 641

10 Questions about eukaryotic chromosomes What are the sizes of eukaryotic genomes, and how are they organized into chromosomes? What are the types of repetitive DNA elements? What are their properties and amounts? What are the types of genes? How can they be identified? What is the mutation rate across the genome; what are the selective forces affecting genome evolution? What is the spectrum of variation between species (comparative genomics) and within species?

11 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes C value paradox and genome sizes Organization of genomes into chromosomes Genome browsers The ENCODE project

12 C value paradox: why eukaryotic genome sizes vary The haploid genome size of eukaryotes, called the C value, varies enormously. Encephalitozoon cuniculi ~3 Mb A variety of fungi 10-40 Mb Takifugu rubripes (pufferfish)365 Mb (same number of genes as other fish or as the human genome, but 1/8 th the size) Pinus resinosa (Canadian red pine)68 Gb Protopterus aethiopicus (Marbled lungfish)140 Gb Amoeba dubia (amoeba)690 Gb Page 643

13 C value paradox: why eukaryotic genome sizes vary The range in C values does not correlate well with the complexity of the organism. This phenomenon is called the C value paradox. The solution to this “paradox” is that genomes are filled with large tracts of noncoding, often repetitive DNA sequences. Page 643

14 Eukaryotic genomes are organized into chromosomes Genomic DNA is organized in chromosomes. The diploid number of chromosomes is constant in each species (e.g. 46 in human). Chromosomes are distinguished by a centromere and telomeres. The chromosomes are routinely visualized by karyotyping (imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids). Page 644

15 Fig. 16.1 Page 645

16 Plate II. First P.G. mitosis in polar view. Tradescantia virginiana, Commelinaceae, n = 9 (from aberrrant plant with 22 chromosomes). 2 BE - CV smears. x 1200. Printed on multigrade paper. Darlington.

17 Mitosis in Paris quadrifolia, Liliaceae, showing all stages from prophase to telophase. n = 10 (Darlington).

18 Root tip squashes showing anaphase separation. Fritillaria pudica, 3x = 39, spiral structure of chromatids revealed by pressure after cold treatment. Darlington.

19 Cleavage mitosis in the teleostean fish, Coregonus clupeoides, in the middle of anaphase. Spindle structure revealed by slow fixation. Darlington.

20 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes C value paradox and genome sizes Organization of genomes into chromosomes Genome browsers The ENCODE project

21 The eukaryotic chromosome: the centromere The centromere is a primary constriction where the chromosome attaches to the spindle fibers; here the boundary between sister chromatids is not clear. It may be in the middle (metacentric) or the end (acrocentric). If a chromosome has two centromeres spaced apart (dicentric) then at anaphase there is a 50% chance that a single chromatid would be pulled to opposite poles of the mitotic spindle. This would result in a bridge formation and chromosome breakage. Page 644

22 View a centromere on chromosome 11 Go to UCSC, human, hg18 assembly, chr11:51,000,001-55,000,000 Set tracks in Mapping and Sequencing (all dense): FISH clones GAP BAC End Pairs Fosmid End Pairs Gene Tracks RefSeq genes Variation and Repeats Segmental Dups RepMask 3.2.7 (1) Try zooming out 3-fold: are there more gaps? (2) Switch to the hg19 assembly: do the answers differ? (3) How many gaps are on chromosome 11? What sizes?

23 How many gaps are on chromosome 11? What are their sizes? [1] Go to Table Browser, human Assembly = hg19 Group = Mapping and Sequencing Tracks Track = Gap Region = chr11 (click lookup) Output = table in browser Output = BED, send to Galaxy [2] Galaxy result: there are 15 regions Left sidebar  Text Manipulation  Compute  c3-c2 (round the result) Left sidebar  Filter and Sort  Sort  on column c4  Execute [3] Note the Galaxy output (implicitly) includes the telomeres (10 kb, 50 kb) and the centromere (3 Mb).

24 The eukaryotic chromosome: the acrocentrics The short arm of the acrocentric autosomes has a secondary constriction usually containing a nucleolar organizer. This contains the genes for 18S and 28S ribosomal RNA. Page 644

25 The eukaryotic chromosome: the telomere The telomere is a region of highly repetitive DNA at either end of a linear chromosome. Telomeres include nucleoprotein complexes that function in the protection, replication, and stabilization of chromosome ends. Telomeres of many eukaryotes have tandemly repeated DNA sequences (discussed below). Page 644

26 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes C value paradox and genome sizes Organization of genomes into chromosomes Genome browsers The ENCODE project

27 Two main genome browsers There are two principal genome browsers for eukaryotes: (1) Ensembl (www.ensembl.org) offers browsers for dozens of genomes (2) UCSC (http://genome.ucsc.edu) offers genome and table browsers for dozens of organisms. We will focus on this browser. Page 645

28 Example #1 of a human genome web browser: human chromosome 21 at NCBI nucleolar organizing center centromere

29 nucleolar organizing center centromere Example #2 of a human genome web browser: human chromosome 21 at www.ensembl.org

30 centromere Example #3 of a human genome web browser: human chromosome 21 at UCSC

31 Example #4 of a human genome web browser: the beta globin gene cluster on human chromosome 11 at UCSC [1] Enter hbb (for beta globin) into the gene box, and this takes you to chr11:5,246,696-5,248,301 (for the hg19 assembly), a 1,606 base pair region. [2] Enter chr11:5,200,001-5,750,000 (a 550,000 bp region). Set the UCSC Genes and the RefSeq genes to “pack.” Note that the UCSC Genes track includes ENORMOUS models for HBE1 and HBG2. Why? [3] We will instead focus on the RefSeq region. Enter: chr11:5,245,001-5,295,000 (50 kb region)

32 Beta globin cluster on chr11:5,245,001-5,295,000 (50 kb region, hg19 build, default tracks shown) HBB, HBD, HBG1, HBG2, HBE1

33 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes C value paradox and genome sizes Organization of genomes into chromosomes Genome browsers The ENCODE project Page 647

34 The ENCODE project ►The ENCyclopedia Of DNA Elements (ENCODE) project was launched in 2003 ► Pilot phase (completed): devise and test high-throughput approaches to identify functional elements. 44 DNA targets: 1 percent of the human genome, ~30 million base pairs (Mb). ► Second phase (simultaneous): technology development. ► Third phase: production. Expand the ENCODE project to analyze the remaining 99 percent of the human genome. Key reference: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 447:799-816. PMID: 17571346 Page 647

35 The ENCODE project Goal of ENCODE: build a list of all sequence-based functional elements in human DNA. This includes: ► protein-coding genes ► non-protein-coding genes ► regulatory elements involved in the control of gene transcription ► DNA sequences that mediate chromosomal structure and dynamics. Page 647

36 http://genome.cse.ucsc.edu/ ENCODE/encode.hg17.html ENCODE data at the UCSC Genome Browser

37 ENCODE data at the UCSC Genome Browser: beta globin (switch from hg19 to hg18 assembly to access tracks!)

38 <> ENCODE tracks available at the UCSC Genome Browser (hg18 assembly)

39 ENCODE data at the UCSC Genome Browser: beta globin (switch from hg19 to hg18 assembly to access tracks!) The ENCODE EGASP competition shows that leading gene prediction software generates gene models that differ greatly in sensitivity and sensitivity. This often results in quite distinct predictions of gene structure. Jigsaw is one of the best.

40 ENCODE variation tracks available at UCSC (hg18 assembly; try chr11:5,200,001-5,250,000)

41 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Noncoding and repetitive sequences 1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats

42 Britten & Kohne’s analysis of repetitive DNA In the 1960s, Britten and Kohne defined the repetitive nature of genomic DNA in a variety of organisms. They isolated genomic DNA, sheared it, dissociated the DNA strands, and measured the rates of DNA reassociation. For dozens of eukaryotes—but not bacteria or viruses— large amount of DNA reassociates extremely rapidly. This represents repetitive DNA. Page 650

43 Fig. 16.5 Page 651 Britten and Kohne (1968) identified repetitive DNA classes

44 Software to detect repetitive DNA It is essential to identify repetitive DNA in eukaryotic genomes. RepBase Update is a database of known repeats and low-complexity regions. RepeatMasker is a program that searches DNA queries against RepBase. There are many RepeatMasker sites available on-line. We will use 50,000 base pairs from human chromosome 11 as an example. This region includes the beta globin gene cluster. Page 653

45 http://www.repeatmasker.org/current 11/11 Repeatmasker software screens DNA for repeats

46

47 RepeatMasker results for 50 kb of DNA in the human HBB locus

48

49 RepeatMasker masks repetitive DNA (FASTA format)

50 Five main classes of repetitive DNA Page 652 1.Interspersed repeats (transposon-derived repeats) constitute ~45% of the human genome. They involve RNA intermediates (retroelements) or DNA intermediates (DNA transposons). Long-terminal repeat transposons (RNA-mediated) Long interspersed elements (LINEs); these encode a reverse transcriptase Short interspersed elements (SINEs)(RNA-mediated); these include Alu repeats DNA transposons (3% of human genome)

51 UCSC displays interspersed repeats identified by RepeatMasker (hg18 assembly; chr11:5,200,001-5,250,000) Turn on the RepeatMasker track from the “Variation and Repeats” section; or the RepMask 3.2.7 track

52 Five main classes of repetitive DNA Table 16.6 Page 653 1.Interspersed repeats (transposon-derived repeats) Examples include retrotransposed genes that lack introns, such as: ADAM20 NM_00381414q (original gene on 8p) Cetn1NM_00406618p (original gene on Xq) Glud2NM_012084Xq (original gene on 10q) Pdha2NM_0053904q (original gene on Xp)

53 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Noncoding and repetitive sequences 1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats

54 Five main classes of repetitive DNA Page 653 2. Processed pseudogenes These genes have a stop codon or frameshift mutation and do not encode a functional protein. They commonly arise from retrotransposition, or following gene duplication and subsequent gene loss. For a superb on-line resource, visit Mark Gerstein’s website, http://www.pseudogene.org. Gerstein and colleagues (2006) suggest that there are ~19,000 pseudogenes in the human genome, slightly fewer than the number of functional protein-coding genes. (11,000 non-processed, 8,000 processed [lack introns].)

55 Pseudogenes in the UCSC genome browser Yale pseudogenes VEGA pseudogenes

56 Pseudogenes in the UCSC genome browser Note one retrogene prediction (duplicated HBG1 gene)

57 Pseudogenes in the HOX cluster ENCODE region

58 HOX genes From the Entrez Gene entry for human HOXA1: In vertebrates, the genes encoding the class of transcription factors called homeobox genes are found in clusters named A, B, C, and D on four separate chromosomes. Expression of these proteins is spatially and temporally regulated during embryonic development. This gene is part of the A cluster on chromosome 7 and encodes a DNA-binding transcription factor which may regulate gene expression, morphogenesis, and differentiation.

59 Vertebrate Genome Annotation (VEGA) database From the VEGA home page (http://vega.sanger.ac.uk): "The Vertebrate Genome Annotation (VEGA) database build 30 is designed to be a central repository for manual annotation of different vertebrate finished genome sequence. In collaboration with the genome sequencing centres Vega attempts to present consistent high-quality curation of the published chromosome sequences." "Finished genomic sequence is analysed on a clone by clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions (GENSCAN, Fgenes)." "In addition, comparative analysis using vertebrate datasets such as the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores (Evolutionary Conserved Regions) are used for novel gene discovery."

60 Vertebrate Genome Annotation (VEGA) database VEGA definition of pseudogenes (http://vega.sanger.ac.uk):

61 Yale pseudogene database http://www.pseudogene.org The human genome has as many pseudogenes as genes

62 Pseudogenes: example Mouse GULO, required for vitamin C biosynthesis, has become a pseudogene in the primate lineage (  GULO). Here is an output for GULO on the human genome:

63 Pseudogenes: example GULO pseudogene in Entrez nucleotide (NG_001136):

64 Pseudogenes: example Mouse GULO in Entrez protein:

65 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Noncoding and repetitive sequences 1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats

66 Five main classes of repetitive DNA Page 657 3. Simple sequence repeats Microsatellites: from one to a dozen base pairs Examples: (A) n, (CA) n, (CGG) n These may be formed by replication slippage. Minisatellites: a dozen to 500 base pairs Simple sequence repeats of a particular length and composition occur preferentially in different species. In humans, an expansion of triplet repeats such as CAG is associated with over a dozen disorders (including Huntington’s disease).

67 Example of simple sequence repeats (e.g. AC n ) from the globin locus Page 657 >hg18_rmskRM327_(TAAAA)n range=chr11:5206265-5206312 5'pad=0 3'pad=0 strand=+ repeatMasking=lower aaaataaaataaaataaaataaaataaaacaataaaatgaaa taaaat >hg18_rmskRM327_(CA)n range=chr11:5207526- 5207560 5'pad=0 3'pad=0 strand=+ repeatMasking=lower acacacacacacacacacacacacacacacacaca

68 Repetitive DNA in the UCSC genome browser Sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats identified by TRF. These tend to be highly polymorphic in the population.

69 Beta globin locus: tandem repeats, microsatellites, and RepeatMasker

70 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Noncoding and repetitive sequences 1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats

71 Five main classes of repetitive DNA Page 658 4. Segmental duplications These are blocks of about 1 kilobase to 300 kb that are copied intra- or interchromosomally. Evan Eichler and colleagues estimate that about 5% of the human genome consists of segmental duplications. Duplicated regions often share very high (99%) sequence identity.

72 Beta globin locus: segmental duplications of the HBG1 and HBG2 genes

73 Fig. 16.12 Page 659 Successive tandem gene duplications (after Lacazette et al., 2000) observed today

74 Successive tandem gene duplications (after Lacazette et al., 2000) Fig. 16.12 Page 659

75 Successive tandem gene duplications (after Lacazette et al., 2000) Fig. 16.12 Page 659

76 Successive tandem gene duplications (after Lacazette et al., 2000) Fig. 16.12 Page 659

77 Beta globin locus: segmental duplications involving HBG1 and HBG2 genes Page 660

78 Beta globin locus: segmental duplications (pairwise alignment of duplicated regions at UCSC)

79 Region of beta globin locus: segmental duplications (one region is duplicated at dozens of loci)

80 Region of beta globin locus: segmental duplications (one region is duplicated at dozens of loci) By definition, non- RepeatMasked

81 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Noncoding and repetitive sequences 1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats

82 Five main classes of repetitive DNA Page 660 5. Blocks of tandem repeats These include telomeric repeats (e.g. TTAGGG in humans) and centromeric repeats (e.g. a 171 base pair repeat of  satellite DNA in humans). Such repetitive DNA can span millions of base pairs, and it is often species-specific.

83 Fig. 16.14 Page 661 Example of telomeric repeats (obtained by blastn searching TTAGGG 4 )

84 Five main classes of repetitive DNA Page 660 5. Blocks of tandem repeats In two exceptional cases, chromosomes lack satellite DNA: Saccharomyces cerevisiae (very small centromeres) Neocentromeres (an ectopic centromere; 60 have been described in human, often associated with disease)

85 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change

86 Finding genes in eukaryotic DNA Two of the biggest challenges in understanding any eukaryotic genome are defining what a gene is, and identifying genes within genomic DNA Page 662

87 Finding genes in eukaryotic DNA Types of genes include protein-coding genes pseudogenes functional RNA genes --tRNAtransfer RNA --rRNAribosomal RNA --snoRNAsmall nucleolar RNA --snRNAsmall nuclear RNA --miRNAmicroRNA Page 662

88 Finding genes in eukaryotic DNA RNA genes have diverse and important functions. However, they can be difficult to identify in genomic DNA, because they can be very small, and lack open reading frames that are characteristic of protein-coding genes. tRNAscan-SE identifies 99 to 100% of tRNA molecules, with a rate of 1 false positive per 15 gigabases. Visit http://lowelab.ucsc.edu/tRNAscan-SE/ Page 662

89

90 Finding genes in eukaryotic DNA Protein-coding genes are relatively easy to find in prokaryotes, because the gene density is high (about one gene per kilobase). In eukaryotes, gene density is lower, and exons are interrupted by introns. There are several kinds of exons: -- noncoding -- initial coding exons -- internal exons -- terminal exons -- some single-exon genes are intronless Page 663

91 Fig. 16.16 Page 664 Eukaryotic gene prediction algorithms distinguish several kinds of exons

92 Finding genes in eukaryotic DNA Algorithms that find protein-coding genes are extrinsic or intrinsic (refer to Chapter 13). Page 664 Homology-based searches (“extrinsic”) Rely on previously identified genes Algorithm-based searches (“intrinsic”) Investigate nucleotide composition, open- reading frames, and other intrinsic properties of genomic DNA

93 DNA RNA Mature RNA protein intron Page 664

94 DNA RNA protein Extrinsic, homology-based searching: compare genomic DNA to expressed genes (ESTs) intron Page 664

95 DNA RNA Intrinsic, algorithm-based searching: Identify open reading frames (ORFs). Compare DNA in exons (unique codon usage) to DNA in introns (unique splices sites) and to noncoding DNA. Page 664

96 chimpanzee DNA Comparative genomics: Compare gene models between species. (For annotation of the chimpanzee genome reported in 2005, BLAT and BLASTZ searches were used to align the two genomes.) human DNA

97 Finding genes in eukaryotic DNA While ESTs are very helpful in finding genes, beware of several caveats. -- The quality of EST sequence is sometimes low -- Highly expressed genes are disproportionately represented in many cDNA libraries -- ESTs provide no information on genomic location Page 665

98 Finding genes in eukaryotic DNA Both intrinsic and extrinsic algorithms vary in their rates of false-positive and false-negative gene identification. Programs such as GENSCAN and Grail account for features such as the nucleotide composition of coding regions, and the presence of signals such as promoter elements. Try using the on-line genome annotation pipeline offered by Oak Ridge National Laboratory. Google ORNL pipeline, or visit http://compbio.ornl.gov/tools/pipeline/ Page 665

99 Oak Ridge National Laboratory (ORNL) offers an on-line annotation pipeline

100

101

102

103 Finding genes in eukaryotic DNA We used 100,000 base pairs of human DNA. The pipeline correctly identified several exons of RBP4, but failed to generate a complete gene model. As another example, initial annotation of the rice genome yielded over 75,000 gene predictions, only 53,000 of which were complete (having initial and terminal exons). Also, it is very difficult to accurately identify exon-intron boundaries. Estimates of gene content improve dramatically when finished (rather than draft) sequence is analyzed.

104 EGASP: the human ENCODE Genome Annotation Assessment Project EGASP goals: [1] Assess of the accuracy of computational methods to predict protein coding genes. 18 groups competed to make gene predictions, blind; these were evaluated relative to reference annotations generated by the GENCODE project. [2] Assess of the completeness of the current human genome annotations as represented in the ENCODE regions. Page 666

105 <> UCSC: tracks for Gencode and for various gene prediction algorithms (focus on 50 kb encompassing five globin genes) JIGSAW Gencode Page 667

106 EGASP: the human ENCODE Genome Annotation Assessment Project “RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.” Guigo R et al., Genome Biology (2006) 7 Suppl 1: S2.1-31 Page 667

107 Protein-coding genes in eukaryotic DNA: a new paradox The C value paradox is answered by the presence of noncoding DNA. Why are the number of protein-coding genes about the same for worms, flies, plants, and humans? This has been called the N-value paradox (number of genes) or the G value paradox (number of genes). Page 668

108 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change

109 Transcription factor databases In addition to identifying repetitive elements and genes, it is also of interest to predict the presence of genomic DNA features such as promoter elements and GC content. See Table 16.10 (p. 670) for a list of websites that predict transcription factor binding sites and related sequences. Page 669

110 http://www.sanger.ac.uk/Users/td2/eponine Eponine predicts transcription start sites in promoter regions. The algorithm uses a set of DNA weight matrices recognizing sequence motifs that are associated with a position distribution relative to the transcription start site. The model is as follows: The specificity is good (~70%), and the positional accuracy is excellent. The program identifies ~50% of TSSs—although it does not always know the direction of transcription.

111 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change

112 Comparison of eukaryotic DNA: PipMaker and VISTA In studying genomes, it is important to align large segments of DNA. PipMaker and VISTA are two tools for sequence alignment and visualization. They show conserved segments, including the order and orientation of conserved elements. They also display large-scale genomic changes (inversions, rearrangements, duplications). Try VISTA (http://www-gsd.lbl.gov/vista) or PipMaker (http://bio.cse.psu.edu/pipmaker) with genomic DNA from Hs10 and Mm19 (containing RBP4). Page 673

113 VISTA offers comparative analysis of genomes

114 Fig. ~16.22 Page 674 VISTA output for an alignment of human beta globin region with seven vertebrates

115 conserved noncoding coding exon VISTA analysis of HBB gene locus UTR

116 VISTA allows comparison of conserved transcription factor binding sites

117 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change

118 The spectrum of variation Category of variationSizetype Single base pair changes1 bpSNPs, point mutations Small insertions/deletions1 – 50 bp Short tandem repeats1 – 500 bpmicrosatellites Fine-scale structural var.50 bp – 5 kbdel, dup, inv tandem repeats Retroelement insertions0.3 – 10 kb SINEs, LINEs LTRs, ERVs Intermediate-scale struct.5 kb – 50 kbdel, dup, inv, tandem repeats Large-scale structural var.50 kb – 5 Mbdel, dup, inv, large tandem repeats Chromosomal variation>>5Mb aneuploidy Adapted from Sharp AJ et al. (2006) Annu Rev Genomics Hum Genet 7:407-42

119 Eukaryotic chromosomes can be dynamic Chromosomes can be highly dynamic, in several ways. Whole genome duplication (autopolyploidy) can occur, as in yeast (Chapter 15) and some plants. The genomes of two distinct species can merge, as in the mule (male donkey, 2n = 62 and female horse, 2n = 64) An individual can acquire an extra copy of a chromosome (e.g. Down syndrome, TS13, TS18) Chromosomes can fuse; e.g. human chromosome 2 derives from a fusion of two ancestral primate chromosomes Chromosomal regions can be inverted (hemophilia A) Portions of chromosomes can be deleted (e.g.  11q syndrome) Segmental and other duplications occur Chromatin diminution can occur (Ascaris) Page 675

120 Conservative nature of chromosome evolution Among placental mammals, the number of diploid chromosomes is: 84 in black rhinoceros 46 in Homo sapiens 17 in two rodent species The process of chromosome evolution tends to remain conservative. Heterozygous carriers of most types of chromosomal rearrangements are semisterile. Thus many chromosomal changes cannot be fixed. Ohno (1970) p. 41

121 Inversions in chromosome evolution Chromosomal inversions occur when a fragment of a chromosome breaks at two places, inverts, and is reinserted. This is a useful mechanism for producing a sterility barrier during speciation. An example is in deer mice; another example is in Anopheles gambiae. Ohno (1970) p. 42

122 The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics Translocations occur when chromosomal material is exchanged between two non-homologous chromosomes. Roberstonian fusion, which often accompanies speciation, is the creation of one metacentric chromosome by the centric fusion of two acrocentrics. Robertsonian fusions are often tolerated and may sometimes be considered selectively neutral. An example is the house mouse (Mus musculus, 2n = 40) and a small group of tobacco mice in Switzerland (Mus poschiavinus, 2n = 26). Mus poschiavinus is homozygous for seven Robertsonian fusions. Page 675; Ohno (1970) p. 43

123 The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics Ohno (1970) Plate II ordinary male house mouse (Mus musculus, 2n = 40) male tobacco mouse (Mus poschiavinus, 2n = 26) Male first meiotic metaphase from an interspecific F1-hbrid. Note seven trivalents (each from one poschiavinus metacentric and two musculus acrocentrics)

124 Diploidization of the tetraploid Ohno (1970) pp 98- 101 A species can become tetraploid. All loci are duplicated, and what was formerly the diploid chromosome complement is now the haploid set of the genome. Polyploid evolution occurs commonly in plants. For example, in the cereal plant Sorghum S. versicolor (diploid) 2n = 2 x 5; 10 chromosomes S. sudanense (tetraploid) 4n = 4 x 5; 20 chromosomes S. halepense (octoplooid) 8n = 8 x 5; 40 chromosomes In plants, the male sex organ (stamen) and female organ (pistil or carpel) is present in the same flower; they are hermaphroditic.

125 Diploidization of the tetraploid Ohno (1970) pp 98- 101 Polyploid evolution occurs rarely in vertebrates and other metazoans. For diploid organisms with XY/XX sex determination, in tetraploidy the male must maintain 4AXXYY and the female 4AXXXX. But during meiosis of the 4AXXYY male, the four sex elements may pair off as the XX bivalent and the YY bivalent such that every gamete is 2AXY. All offspring of the tetraploid male and tetraploid female would be 4AXXXY. If this were male, there would be no females. The 4AXXYY male cannot produce the necessary two classes of gametes, 2AXX and 2AYY. Mammals, birds, and reptiles are thus not polyploid.

126 Tetraploidy: Odontophyrynus americanus is a newly arisen bisexual autotetraploid vertebrate Ohno (1970) pp 98- 101 Polyploid evolution can occur in fish and amphibians, because of differences in sex determination (X and Y in males, Z and W in females). Autopolyploidy: in a diploid organism, two daughter cells at the end of mitotic telophase may fuse into one cell, forming a tetraploid cell. Two diploid gametes may produce a tetraploid zygote. Allopolyploidy: interspecies polyploidy.

127 Tetraploidy: Odontophyrynus americanus is a newly arisen bisexual autotetraploid vertebrate Ohno (1970) pp 98- 101 South American frogs species of the family Ceratophyrydidae may be autopolyploids. The diploid chromosome number has a wide range: O. cultripes 22 chromosomes11 bivalents in meiosis O. americanus 44 chromosomes11 quadrivalents in meiosis other species of this family 110 chromosomes

128 Ohno (1970) plate III; p. 100 Karyotype of the tetraploid frog Odontophrynus americanus (4n = 44) 44 chromosomes: 11 sets of four homologs sperm head two bivalents ten quadrivalents

129 Diploidization of the tetraploid Ohno (1970) p. 102 As an autotetraploid arises, it has four homologous chromosomes for each linkage group. These must change to a disomic state to allow functional diversification of the loci. If the four homologues form a quadrivalent, there cannot be functional diversification. Two distinct, separate bivalents must form, for example via a pericentric inversion. In fish of the suborder Salmonoidea, trout, salmon, whitefish and graylings are probably autotetraploid species.

130 Ohno (1970) plate IV; p. 102 Karyotypes of a species in the process of diploidization: rainbow trout Salmo irideus Liver cell 61 chromosomes 43 metacentrics 18 acrocentrics 104 chrom. arms Spleen cell 59 chromosomes 45 metacentrics 14 acrocentrics 104 chrom. arms 4 quadrivalents 1 quadrivalent

131 Trisomy and polysomy Ohno (1970) p. 107 Nondisjunction results in two chromatids of one chromosome moving to the same division pole. In diploid species, one daughter cell receives three homologous chromosomes (trisomy). If this occurs in germ cells, the progeny may be trisomic. In the Jimson weed (Datura stramonium) trisomy for each of the 12 chromosomes was observed by Blakeslee (1930). A mating between trisomic individuals may produce tetrasomic progeny having two homologous chromosomes (thus duplicating an entire chromosome).

132 Trisomy and polysomy Ohno (1970) p. 107 For vertebrates, this mechanism is too severe. Generally, only trisomy of chromosomes 13, 18, or 21 are compatible with postnatal survival in humans. In rainbow trout that have become tetraploid, trisomy (i.e. from four to five copies) and monosomy (i.e. from four to three copies) may be tolerated.

133 Outline: eukaryotic chromosomes General features of eukaryotic chromosomes Repetitive DNA content of eukaryotic genomes Gene content of eukaryotic chromosomes Regulatory regions of eukaryotic chromosomes Comparison of eukaryotic DNA Variation in chromosomal DNA Techniques to measure chromosomal change


Download ppt "The eukaryotic chromosome (Chapter 16) Monday, November 8, 2011 Wednesday, November 10, 2011 Genomics 260.605.01 J. Pevsner"

Similar presentations


Ads by Google