Eukaryotic Genomes: From Parasites to Primates (Part 1 of 2) Monday, November 3, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner

Copyright notice
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner (ISBN ). Copyright © 2003 by Wiley. These images and materials may not be used without permission from the publisher. Visit

Announcements
-- Today: Eukaryotic genomes
-- Wednesday Nov. 5: Human genome
-- Friday Nov. 7: Computer lab
-- Monday Nov. 10: Human disease
-- Wednesday Nov. 12: Final exam (in class)

Outline of today's lecture
1. General features of eukaryotic genomes
-- C value paradox and genome sizes
-- noncoding DNA; repetitive DNA
-- genes
-- organization of DNA in chromosomes
2. Individual eukaryotic genomes
-- protozoans (e.g. trypanosomes, Plasmodium)
-- plants (Arabidopsis, rice)
-- metazoans (nematodes, insects)
-- vertebrates (fish, mouse, primates)
Page 539

Introduction to the eukaryotes
Eukaryotes are single-celled or multicellular organisms that are distinguished from prokaryotes by the presence of a membrane-bound nucleus, an extensive system of intracellular organelles, and a cytoskeleton. We will explore the eukaryotes using a phylogenetic tree by Baldauf et al. (Science, 2000). This tree was made by concatenating four protein sequences: elongation factor 1α, actin, α-tubulin, and β-tubulin.
Page 541

Eukaryotes (after Baldauf et al., 2000)

General features of the eukaryotes
Some of the general features of eukaryotes that distinguish them from prokaryotes are:
-- Eukaryotes include many multicellular organisms, in addition to unicellular organisms.
-- Eukaryotes have [1] a membrane-bound nucleus, [2] intracellular organelles, and [3] a cytoskeleton.
-- Most eukaryotes undergo sexual reproduction.
-- The genome size of eukaryotes spans a wider range than that of most prokaryotes.
-- Eukaryotic genomes have a lower density of genes.
-- Prokaryotes are haploid; eukaryotes have varying ploidy.
-- Eukaryotic genomes tend to be organized into linear chromosomes with a centromere and telomeres.
Page 541

C value paradox: why eukaryotic genome sizes vary
The haploid genome size of eukaryotes, called the C value, varies enormously.
Small genomes include:
-- Encephalitozoon cuniculi (2.9 Mb)
-- A variety of fungi (10-40 Mb)
-- Takifugu rubripes (pufferfish) (365 Mb) (same number of genes as other fish or as the human genome, but 1/10th the size)
Large genomes include:
-- Pinus resinosa (Canadian red pine) (68 Gb)
-- Protopterus aethiopicus (marbled lungfish) (140 Gb)
-- Amoeba dubia (amoeba) (690 Gb)
Page 543

C value paradox: why eukaryotic genome sizes vary The range in C values does not correlate well with the complexity of the organism. This phenomenon is called the C value paradox. The solution to this “paradox” is that genomes are filled with large tracts of noncoding, often repetitive DNA sequences. Page 543
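To make the scale of the paradox concrete, the short sketch below tabulates the C values quoted on the slide above and computes the fold difference between the largest and smallest genomes. The dictionary and the function name `fold_range` are illustrative, not part of the lecture.

```python
# Haploid genome sizes (C values) in megabases, copied from the slide above.
C_VALUES_MB = {
    "Encephalitozoon cuniculi": 2.9,
    "Takifugu rubripes (pufferfish)": 365,
    "Pinus resinosa (red pine)": 68_000,
    "Protopterus aethiopicus (lungfish)": 140_000,
    "Amoeba dubia (amoeba)": 690_000,
}


def fold_range(c_values):
    """Return the ratio of the largest to the smallest genome size."""
    sizes = list(c_values.values())
    return max(sizes) / min(sizes)


if __name__ == "__main__":
    # Print the genomes from smallest to largest.
    for name, size in sorted(C_VALUES_MB.items(), key=lambda kv: kv[1]):
        print(f"{name:40s} {size:>12,.1f} Mb")
    print(f"Largest / smallest: about {fold_range(C_VALUES_MB):,.0f}-fold")
```

An amoeba and a lungfish top the list, which is the point of the paradox: genome size tracks the amount of noncoding and repetitive DNA rather than organismal complexity.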

Britten & Kohne's analysis of repetitive DNA
In the 1960s, Britten and Kohne defined the repetitive nature of genomic DNA in a variety of organisms. They isolated genomic DNA, sheared it, dissociated the DNA strands, and measured the rates of DNA reassociation. For dozens of eukaryotes (but not bacteria or viruses), a large amount of the DNA reassociates extremely rapidly. This rapidly reassociating fraction represents repetitive DNA.

Fig Page 545 Britten and Kohne (1968) identified repetitive DNA classes
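Reassociation experiments of this kind are commonly summarized with ideal second-order kinetics, in which the fraction of DNA remaining single-stranded is 1 / (1 + k * C0t), where C0t is the product of the starting DNA concentration and time. The sketch below assumes that textbook model; the rate constants are made-up values chosen only to contrast a repetitive fraction with a single-copy fraction, and are not data from Britten and Kohne.

```python
def fraction_single_stranded(cot, k):
    """Ideal second-order reassociation: fraction of DNA still single-stranded.

    cot -- product of initial DNA concentration and time (C0 * t)
    k   -- reassociation rate constant (larger for more repetitive DNA)
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """C0t value at which half of the DNA has reassociated."""
    return 1.0 / k


if __name__ == "__main__":
    # Hypothetical rate constants, for illustration only.
    components = {"highly repetitive": 1000.0, "single copy": 0.001}
    for name, k in components.items():
        print(f"{name:18s} C0t(1/2) = {cot_half(k):g}")
        for cot in (0.001, 1.0, 1000.0):
            remaining = fraction_single_stranded(cot, k)
            print(f"    C0t = {cot:<8g} single-stranded = {remaining:.3f}")
```

Under this model a sequence present in many copies has a large k and a small C0t(1/2), which is why repetitive DNA shows up as the fast-reassociating component.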

Five main classes of repetitive DNA
1. Interspersed repeats
2. Processed pseudogenes
3. Simple sequence repeats
4. Segmental duplications
5. Blocks of tandem repeats

Five main classes of repetitive DNA
Interspersed repeats (transposon-derived repeats) constitute ~45% of the human genome. They involve RNA intermediates (retroelements) or DNA intermediates (DNA transposons).
-- Long-terminal repeat transposons (RNA-mediated)
-- Long interspersed elements (LINEs); these encode a reverse transcriptase
-- Short interspersed elements (SINEs) (RNA-mediated); these include Alu repeats
-- DNA transposons (3% of human genome)

Five main classes of repetitive DNA
Interspersed repeats (transposon-derived repeats)
Examples include retrotransposed genes that lack introns, such as (Table 16.5):

Gene      Accession    Location   Original gene on
ADAM20    NM_          q          8p
Cetn1     NM_          p          Xq
Glud2     NM_012084    Xq         10q
Pdha2     NM_          q          Xp

Five main classes of repetitive DNA
Processed pseudogenes
These genes have a stop codon or frameshift mutation and do not encode a functional protein. They commonly arise from retrotransposition, or following gene duplication and subsequent gene loss. For a superb on-line resource, visit Mark Gerstein's website.

Five main classes of repetitive DNA
Simple sequence repeats
-- Microsatellites: repeat units of one to a dozen base pairs. Examples: (A)n, (CA)n, (CGG)n. These may be formed by replication slippage.
-- Minisatellites: repeat units of a dozen to 500 base pairs.
Simple sequence repeats of a particular length and composition occur preferentially in different species. In humans, an expansion of triplet repeats such as CAG is associated with at least 14 disorders (including Huntington's disease).

Example of a simple sequence repeat (CCCA or GGGT) in human genomic DNA
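A minimal sketch of how short tandem repeats such as (CA)n or (CCCA)n can be located in a DNA string, using a regular expression with a backreference. The function name and the default thresholds are illustrative choices, not those of any particular repeat-finding program, and the approach only finds perfect (uninterrupted) repeats.

```python
import re


def find_simple_repeats(seq, min_unit=1, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    The lazy quantifier tries the shortest repeat unit first, so
    'CACACACA' is reported as (CA)4 rather than (CACA)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    print(find_simple_repeats("GGTCACACACACATTG"))
    print(find_simple_repeats("CCCA" * 5))
```

Real repeats accumulate point mutations, so dedicated tools score approximate matches instead of requiring exact copies as this sketch does.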

Five main classes of repetitive DNA
Segmental duplications
These are blocks of about 1 kb to 300 kb that are copied intra- or interchromosomally. Evan Eichler and colleagues estimate that about 5% of the human genome consists of segmental duplications. Duplicated regions often share very high (99%) sequence identity. As an example, consider a group of lipocalin genes on human chromosome 9.

Fig Page 548 Successive tandem gene duplications (after Lacazette et al., 2000) observed today

Five main classes of repetitive DNA
Blocks of tandem repeats
These include telomeric repeats (e.g. TTAGGG in humans) and centromeric repeats (e.g. a 171 base pair repeat of α satellite DNA in humans). Such repetitive DNA can span millions of base pairs, and it is often species-specific.

Fig Page 549 Example of telomeric repeats (obtained by blastn searching with (TTAGGG)4)

Five main classes of repetitive DNA
Blocks of tandem repeats
In two exceptional cases, centromeres lack satellite DNA:
-- Saccharomyces cerevisiae (very small centromeres)
-- Neocentromeres (ectopic centromeres; 60 have been described in human, often associated with disease)

Software to detect repetitive DNA It is essential to identify repetitive DNA in eukaryotic genomes. RepBase Update is a database of known repeats and low-complexity regions. RepeatMasker is a program that searches DNA queries against RepBase. There are many RepeatMasker sites available on-line. We will use 100,000 base pairs from human chromosome 10 as an example. This region (from NT_008769) includes the retinol-binding protein 4 gene. Page 550

Fig Page 551

Fig Page 553

Fig Page 554 RepeatMasker identifies simple sequence repeats

Fig Page 554

Fig Page 554 RepeatMasker identifies Alu repeats

RepeatMasker masks repetitive DNA (FASTA format) Fig Page 555
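The idea of masking can be illustrated with a few lines of code. The sketch below is not RepeatMasker itself; it simply takes a sequence and a list of repeat coordinates (hypothetical here, 0-based and end-exclusive) and either replaces those bases with N (hard masking) or converts them to lowercase (soft masking).

```python
def mask_intervals(seq, intervals, soft=False):
    """Mask the given (start, end) intervals of seq.

    Coordinates are 0-based and end-exclusive. Hard masking writes 'N';
    soft masking lowercases the original bases instead.
    """
    chars = list(seq)
    for start, end in intervals:
        # Clamp each interval to the bounds of the sequence.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    dna = "ACGTACGTAC"
    repeats = [(2, 5)]  # hypothetical repeat annotation
    print(mask_intervals(dna, repeats))             # hard-masked
    print(mask_intervals(dna, repeats, soft=True))  # soft-masked
```

Masking before a database search keeps repetitive elements such as Alu sequences from producing large numbers of uninformative matches.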

Finding genes in eukaryotic DNA
Two of the biggest challenges in understanding any eukaryotic genome are defining what a gene is, and identifying genes within genomic DNA.
Page 551

Finding genes in eukaryotic DNA
Types of genes include:
-- protein-coding genes
-- pseudogenes
-- functional RNA genes
   -- tRNA: transfer RNA
   -- rRNA: ribosomal RNA
   -- snoRNA: small nucleolar RNA
   -- snRNA: small nuclear RNA
   -- miRNA: microRNA
Page 552

Finding genes in eukaryotic DNA RNA genes have diverse and important functions. However, they can be difficult to identify in genomic DNA, because they can be very small, and lack open reading frames that are characteristic of protein-coding genes. tRNAscan-SE identifies 99 to 100% of tRNA molecules, with a rate of 1 false positive per 15 gigabases. Visit Page 553

Finding genes in eukaryotic DNA
Protein-coding genes are relatively easy to find in prokaryotes, because the gene density is high (about one gene per kilobase). In eukaryotes, gene density is lower, and exons are interrupted by introns. There are several kinds of exons:
-- noncoding exons
-- initial coding exons
-- internal exons
-- terminal exons
-- single exons (some genes are intronless)
Page 553

Fig Page 558 Eukaryotic gene prediction algorithms distinguish several kinds of exons

Finding genes in eukaryotic DNA Algorithms that find protein-coding genes are extrinsic or intrinsic (refer to Chapter 12, Completed genomes, Figure 12.17). Page 555

Gene-finding algorithms
-- Homology-based searches ("extrinsic"): rely on previously identified genes.
-- Algorithm-based searches ("intrinsic"): investigate nucleotide composition, open reading frames, and other intrinsic properties of genomic DNA.
Page 556

[Figure: DNA -> RNA -> mature RNA -> protein; introns labeled] Page 556

Extrinsic, homology-based searching: compare genomic DNA to expressed genes (ESTs).
[Figure: DNA -> RNA -> protein; introns labeled] Page 556

Intrinsic, algorithm-based searching: identify open reading frames (ORFs). Compare DNA in exons (unique codon usage) to DNA in introns (unique splice sites) and to noncoding DNA.
[Figure: DNA -> RNA] Page 556
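The simplest intrinsic signal is the open reading frame. The sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is a deliberately naive illustration: the function name and the `min_codons` threshold are assumptions, the reverse strand is ignored, and no attempt is made to model introns, codon usage, or splice sites as real eukaryotic gene finders do.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts every codon in the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

Because eukaryotic coding sequence is split across exons, an ORF scan like this works far better on prokaryotic DNA or on spliced cDNA than on raw eukaryotic genomic DNA.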

Finding genes in eukaryotic DNA
While ESTs are very helpful in finding genes, beware of several caveats.
-- The quality of EST sequence is sometimes low.
-- Highly expressed genes are disproportionately represented in many cDNA libraries.
-- ESTs provide no information on genomic location.
Page 557

Finding genes in eukaryotic DNA Both intrinsic and extrinsic algorithms vary in their rates of false-positive and false-negative gene identification. Programs such as GENSCAN and Grail account for features such as the nucleotide composition of coding regions, and the presence of signals such as promoter elements. Try using the on-line genome annotation pipeline offered by Oak Ridge National Laboratory. Google ORNL pipeline, or visit Page 557

Fig Page 560 Oak Ridge National Laboratory (ORNL) offers an on-line annotation pipeline

Fig Page 561

Fig Page 561

Fig Page 562

Finding genes in eukaryotic DNA We used 100,000 base pairs of human DNA. The pipeline correctly identified several exons of RBP4, but failed to generate a complete gene model. As another example, initial annotation of the rice genome yielded over 75,000 gene predictions, only 53,000 of which were complete (having initial and terminal exons). Also, it is very difficult to accurately identify exon-intron boundaries. Estimates of gene content improve dramatically when finished (rather than draft) sequence is analyzed. Page 561

Protein-coding genes in eukaryotic DNA: a new paradox
The C value paradox is answered by the presence of noncoding DNA. Why is the number of protein-coding genes about the same for worms, flies, plants, and humans? This has been called the N value paradox (N for number of genes) or the G value paradox (G for genes).
Page 562

Transcription factor databases In addition to identifying repetitive elements and genes, it is also of interest to predict the presence of genomic DNA features such as promoter elements and GC content. See Table (p. 564) for a list of websites that predict transcription factor binding sites and related sequences. Page 563
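GC content is one of the simplest genomic features to compute. The sketch below calculates it for a whole sequence and for fixed-size windows along the sequence; the function names, window size, and step are illustrative choices, not those of any of the websites referred to above.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=100, step=100):
    """Return (start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    dna = "GGGGCCCCAATTAATTGCGCGCGC"
    print(f"Overall GC: {gc_content(dna):.2f}")
    for start, gc in windowed_gc(dna, window=8, step=8):
        print(f"window starting at {start:2d}: GC = {gc:.2f}")
```

Plotting the windowed values along a chromosome region is one way to spot GC-rich stretches, which are often examined when looking for promoter-associated sequence.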

Eukaryotic genomes are organized into chromosomes
Genomic DNA is organized in chromosomes. The number of chromosomes is constant in each species (e.g. a haploid number of 16 in S. cerevisiae, a diploid number of 46 in human). Chromosomes are distinguished by a centromere and telomeres. The chromosomes are routinely visualized by karyotyping (imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids).
Page 564

Fig Page 565

Eukaryotic chromosomes can be dynamic
Chromosomes can be highly dynamic, in several ways.
-- Whole genome duplication (autopolyploidy) can occur, as in yeast (Chapter 15) and some plants.
-- The genomes of two distinct species can merge, as in the mule (male donkey, 2n = 62, and female horse, 2n = 64).
-- An individual can acquire an extra copy of a chromosome (e.g. Down syndrome, trisomy 13, trisomy 18).
-- Chromosomes can fuse; e.g. human chromosome 2 derives from a fusion of two ancestral primate chromosomes.
-- Chromosomal regions can be inverted (e.g. hemophilia A).
-- Portions of chromosomes can be deleted (e.g. 11q deletion syndrome).
-- Segmental and other duplications occur.
-- Chromatin diminution can occur (Ascaris).
Page 565

Comparison of eukaryotic DNA: PipMaker and VISTA
We studied pairwise sequence alignment at the beginning of the course. In studying genomes, it is important to align large segments of DNA. PipMaker and VISTA are two tools for sequence alignment and visualization. They show conserved segments, including the order and orientation of conserved elements. They also display large-scale genomic changes (inversions, rearrangements, duplications). Try VISTA or PipMaker with genomic DNA from Hs10 and Mm19 (containing RBP4).
Page 566
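Both tools ultimately display how well two genomic sequences are conserved along their length. As a simplified stand-in (not the actual algorithm of either PipMaker or VISTA), the sketch below takes two rows of an existing pairwise alignment, with "-" marking gaps, and reports percent identity in consecutive windows. The function name and window size are assumptions made for illustration.

```python
def percent_identity_windows(aln_a, aln_b, window=50):
    """Return (start, percent_identity) for consecutive alignment windows.

    aln_a and aln_b are equal-length rows of a pairwise alignment in which
    '-' marks a gap. Gap columns count toward the window length but never
    count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aln_a), window):
        pairs = list(zip(aln_a[start:start + window], aln_b[start:start + window]))
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, 100.0 * matches / len(pairs)))
    return results


if __name__ == "__main__":
    human = "ACGTACGTACGTAAAA"
    mouse = "ACGTAC-TACGTTTTT"
    for start, identity in percent_identity_windows(human, mouse, window=4):
        print(f"columns {start:2d}-{start + 3:2d}: {identity:5.1f}% identity")
```

In a human-mouse comparison, windows with high identity tend to coincide with exons and conserved regulatory elements, while introns and intergenic DNA usually score lower.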

Fig Page 568 VISTA output for an alignment of human and mouse genomic DNA (including RBP4)

This lecture continues with a discussion of individual eukaryotic genomes (part 2)