Chapter 21 Lecture Outline Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. See separate PowerPoint slides.


Chapter 21: Genomes, Proteomes, and Bioinformatics
Key Concepts:
  • Bacterial and Archaeal Genomes
  • Eukaryotic Genomes
  • Proteomes
  • Bioinformatics

Bacterial and Archaeal Genomes
The unifying theme of biology is evolution
The genome of every living species is the product of over 3.5 billion years of evolution
All species evolved from an interrelated group of ancestors

Prokaryotic genomes
Important because
  • Bacteria cause disease
  • Knowledge gained can be applied to more complex organisms
  • Origin of the first eukaryotic cell probably involved a union between an archaeal and a bacterial cell
Entire genomes of many prokaryotes have been sequenced and analyzed
Prokaryotic genomes are less complex than those of eukaryotes
  • Lack centromeres and telomeres
  • Single origin of replication
  • Relatively little repetitive DNA

Prokaryotic chromosomes are usually several hundred thousand to a few million bp long
Most prokaryotes contain a single type of chromosome
  • Multiple copies may be found in a single cell
  • Some prokaryotes are known to have different chromosomes
Bacterial chromosomes are usually circular
  • Linear chromosomes in some prokaryotes
  • Some have both linear and circular chromosomes
Often have plasmids, which are typically small

Venter, Smith, and Colleagues Sequenced the First Genome in 1995
Haemophilus influenzae causes a variety of human diseases
Relatively small genome – 1.8 Mb
One strategy for sequencing large genomes requires extensive mapping first
The alternative is shotgun DNA sequencing
  • Randomly sequence fragments
  • Does not require extensive mapping, but time may be wasted sequencing the same DNA region more than once

GOAL: The goal is to obtain the entire genome sequence of Haemophilus influenzae. This information will reveal its genome size and also which genes the organism has.
KEY MATERIALS: A strain of H. influenzae.
Experimental level (with conceptual-level notes in parentheses):
  1. Purify DNA from a strain of H. influenzae. This involves breaking the cells open by adding phenol and chloroform. Most protein and lipid components go into the phenol-chloroform phase. DNA remains in the aqueous (water) phase. (H. influenzae chromosomal DNA in the aqueous phase; proteins and lipids in the phenol-chloroform phase.)
  2. Sonicate the DNA (sound waves) to break it into small fragments of about 2,000 bp in length. (DNA fragments in aqueous phase.)
  3. Clone the DNA fragments into vectors. The procedures for cloning are described in Chapter 20. This produces a DNA library. Refer back to Figures 20.2 and 20.3. (Each vector carries a piece of H. influenzae DNA.)
  4. Subject many clones to the procedure of dideoxy sequencing, also described in Chapter 20. A total of 10.8 Mb was sequenced. Refer back to Figure 20.9. (Produces a large number of sequences with overlapping regions.)
  5. Use tools of bioinformatics, described in the last section of this chapter, to identify various types of genes in the genome. (Explores the genome sequence and identifies and characterizes genes.)

  6. THE DATA
     [Figure: Circular map of the H. influenzae genome, 1,830,137 bp, ~1,743 genes, with position marks at 1, 400,000, 800,000, 1,200,000, and 1,600,000 bp.]
     [Figure: Functions of Proteins Encoded by Genes (% of genome). The pairing of categories with values is not preserved in this transcript.
      Categories: Other categories; Transcription; Translation; Transport and binding proteins; Replication; Regulatory functions; Metabolism of purines, pyrimidines, nucleosides, and nucleotides; Fatty acid/phospholipid metabolism; Energy metabolism; Central intermediary metabolism; Cellular processes; Cell envelope; Biosynthesis of cofactors, prosthetic groups, carriers; Amino acid biosynthesis.
      Values (%): 5.3, 6.8, 5.4, 5.3, 8.3, 3.0, 2.5, 10.4, 6.3, 8.6, 12.2, 14.0, 2.7, 9.2.]
  7. CONCLUSION: H. influenzae has a genome size of 1.83 Mb with approximately 1,743 genes. The functions of many of those genes could be inferred by comparing them to genes in other species.
  8. SOURCE: Fleischmann et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.
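The figures quoted in the experiment (10.8 Mb sequenced for a 1,830,137 bp genome) imply that each base was read several times on average, which is the redundancy cost of shotgun sequencing mentioned earlier. The sketch below works through that arithmetic. The function names are illustrative, and the Poisson estimate of the unsequenced fraction is a standard simplifying model that is not part of the slides; it assumes fragments land uniformly at random.

```python
import math

GENOME_SIZE_BP = 1_830_137       # genome size reported in the conclusion
TOTAL_SEQUENCED_BP = 10_800_000  # "A total of 10.8 Mb was sequenced"


def fold_coverage(total_sequenced_bp, genome_size_bp):
    """Average number of times each base of the genome was sequenced."""
    return total_sequenced_bp / genome_size_bp


def expected_uncovered_fraction(coverage):
    """Poisson approximation (an assumption, not from the slides):
    the chance that a given base is never sequenced is exp(-coverage)."""
    return math.exp(-coverage)


coverage = fold_coverage(TOTAL_SEQUENCED_BP, GENOME_SIZE_BP)
missed = expected_uncovered_fraction(coverage)
print(f"fold coverage: {coverage:.1f}x")
print(f"expected fraction of bases never sequenced: {missed:.4f}")
```

Under these assumptions the genome was covered roughly sixfold, which is why random fragments overlap enough to be assembled without a prior map.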

Eukaryotic Genomes
Nuclear genome usually found in sets of linear chromosomes
Extranuclear DNA found in mitochondria and chloroplasts
Entire nuclear genome has been sequenced for many species

Four motivators to sequence genomes
  1. Great benefit from identifying and characterizing genes in model organisms
  2. More information to identify and treat human diseases
  3. Improved strains of agricultural species
  4. Way to establish evolutionary relationships

Genome size is not the same as the number of genes
Relative size of nuclear genome varies dramatically
In general, increases in the amount of DNA are correlated with increasing cell size, cell complexity, and body complexity
However, major variations can be observed between organisms with similar form

[Figure: (a) Genome size, plotted as nucleotide base pairs per haploid genome (log scale, 10^6 to 10^12), for species groups: Fungi, Plants, Insects, Mollusks, Fish, Reptiles, Birds, Mammals, Amphibians. (b) Echinops bannaticus. (c) Echinops nanus. Credits: b: © The Picture Store/SPL/Photo Researchers, Inc.; c: © Photo by Michael Beckmann, Institute of Geobotany and Botanical Garden, Halle, Germany.]

Repetitive sequences
Eukaryotic genomes have repetitive sequences
  • Many copies of short DNA sequences
Moderately repetitive sequences
  • Found a few hundred to several thousand times
  • rRNA genes, multiple origins of replication, or sequences with a role in gene transcription and translation
Highly repetitive sequences
  • Found tens of thousands or millions of times
  • Most have no known function
Coding regions are only 2% of our genome

Noncoding vs. coding DNA
98% of genome is noncoding
  • Intron DNA – 24%
  • Unique noncoding DNA – 15%
  • Repetitive DNA – 59%
      Much derived from transposable elements
2% of genome is in coding regions
  • Exons of structural genes
  • Genes for rRNA and tRNA

[Figure: Percentage of each class of DNA sequence in the human genome. Regions of genes that encode proteins (exons) or give rise to rRNA or tRNA: 2%. Introns and other parts of genes such as enhancers: 24%. Unique noncoding DNA: 15%. Repetitive DNA: 59%.]

Transposable elements
Transposition – when a short segment of DNA moves from its original site to a new site
Transposable elements (TEs)
  • DNA segments that move
  • “Jumping genes”
  • Found in all species examined
First discovered by Barbara McClintock
  • 1983, awarded Nobel Prize

[Figure: (a) Barbara McClintock. (b) Speckled corn kernels caused by transposable elements. Credits: a: © Topham/The Image Works; b: © Jerome Wexler/Visuals Unlimited.]

DNA transposons
  • Both ends have inverted repeats – DNA sequences that are identical (or very similar) but run in opposite directions
  • TEs may contain a central region that encodes transposase, an enzyme that facilitates transposition
  • Cut-and-paste mechanism
      Transposase recognizes the inverted repeats and then removes the sequence from its original site
      The complex moves to a new location, where transposase inserts it into the chromosome
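A short sketch may make "identical but run in opposite directions" concrete: read along one strand, the repeat at the right end is the reverse complement of the repeat at the left end. The sequences, function names, and positions below are invented for illustration, and the cut-and-paste function ignores details such as the staggered cuts made in the target DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_inverted_repeats(element, repeat_len):
    """True if the last repeat_len bases are the reverse complement of the first."""
    return element[-repeat_len:] == reverse_complement(element[:repeat_len])


def cut_and_paste(chromosome, element, new_pos):
    """Excise the element from its original site and insert it at new_pos
    (a position in the chromosome after excision)."""
    start = chromosome.index(element)
    excised = chromosome[:start] + chromosome[start + len(element):]
    return excised[:new_pos] + element + excised[new_pos:]


# Hypothetical transposon: inverted repeat + central region + inverted repeat.
LEFT_IR = "GGCCAGTCAC"
transposon = LEFT_IR + "ATGAAACCC" + reverse_complement(LEFT_IR)
chromosome = "TTTT" + transposon + "AAAAAAAA"

print(has_inverted_repeats(transposon, len(LEFT_IR)))
moved = cut_and_paste(chromosome, transposon, new_pos=8)
print(moved)
```

After the move the chromosome still holds exactly one copy of the element, which is the defining feature of cut-and-paste transposition.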

[Figure: (a) Organization of a DNA transposon: a transposase gene flanked by inverted repeats (IR). A DNA transposon is a few hundred to several thousand base pairs in length. (b) Cut-and-paste mechanism of transposition:
  1. Transposase recognizes the inverted repeats.
  2. Transposase cleaves at both ends of the DNA transposon, releasing it from its original site.
  3. Transposase carries the transposon to a new site.
  4. Transposase cleaves the target DNA at the staggered sites and inserts the transposon at the new site.]

RNA intermediates
  • Common only in eukaryotes
  • Retroelements or retrotransposons
  • Retroelement contains genes for reverse transcriptase and integrase
      Reverse transcriptase uses RNA as a template to make a complementary copy of DNA
  • Retroelements may accumulate rapidly in a genome
      Alu elements are 10% of human genome

[Figure: (a) Organization of a retroelement (RE): a reverse transcriptase gene and an integrase gene flanked by terminal repeats. (b) Mechanism of movement of a retroelement:
  1. RNA polymerase transcribes the retroelement into RNA.
  2. Reverse transcriptase uses RNA as a template to synthesize a double-stranded DNA molecule.
  3. Integrase inserts this retroelement DNA into the chromosome.
  4. The chromosome now contains 2 copies of the retroelement.]
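Because the original retroelement stays in place while a new copy is inserted elsewhere, every round of movement adds a copy. The following is a deliberately simplified model of the steps in the figure; the sequences, positions, and function names are made up, and the RNA step is reduced to swapping T and U on one strand.

```python
def transcribe(dna):
    """RNA polymerase step, modelled as T -> U on the same strand."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """Reverse transcriptase step, modelled as U -> T."""
    return rna.replace("U", "T")


def copy_and_paste(chromosome, element, insert_at):
    """Insert a new DNA copy of the element; the original copy is untouched."""
    if element not in chromosome:
        raise ValueError("element is not present in the chromosome")
    new_copy = reverse_transcribe(transcribe(element))
    return chromosome[:insert_at] + new_copy + chromosome[insert_at:]


# Hypothetical retroelement and chromosome.
retroelement = "TGTACGTACA"
chromosome = "AAAA" + retroelement + "CCCCGGGG"

after_one_round = copy_and_paste(chromosome, retroelement, insert_at=len(chromosome) - 4)
print(chromosome.count(retroelement), "->", after_one_round.count(retroelement))
```

Contrast this with a cut-and-paste element, whose copy number stays the same after it moves.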

What is the role of transposable elements?
Not resolved
Selfish DNA hypothesis
  • TEs exist because they have characteristics that allow them to insert themselves and replicate
  • Resemble parasites, can do harm
Others argue TEs may benefit a species
  • Promote genetic variation

Gene duplication
Provides raw material for the addition of new genes into a species’ genome
Creates homologous genes
  • Two or more genes that are derived from the same ancestral gene
Over many generations, each version of the gene accumulates different mutations
  • Genes with similar but not identical DNA sequences

[Figure: (a) Gene duplication and the formation of homologous genes:
  1. An abnormal genetic event occurs (such as a misaligned crossover) that causes a gene duplication.
  2. Over the course of many generations, the 2 genes may gradually accumulate DNA mutations that make them somewhat different.]

Mechanism
Gene duplication caused by misaligned crossovers
Two homologous chromosomes have paired during meiosis but the homologs are misaligned
If a crossover occurs, one chromosome gets a duplication, one a deletion, and two are normal

[Figure: (b) Mechanism of gene duplication. A misaligned crossover between homologous chromosomes (genes A B C D) during meiosis. Following meiosis: one chromosome with a gene duplication (A B C C D), one with a gene deletion (A B D), and normal chromosomes (A B C D). Each of these chromosomes will be segregated into different haploid cells.]
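The outcome of a misaligned crossover can be reproduced by treating each chromosome as a list of genes and exchanging the segments that follow two unequal breakpoints. This is a minimal sketch; the function name and the breakpoint indices are illustrative choices that reproduce the A B C C D and A B D products described above.

```python
def misaligned_crossover(chrom1, chrom2, break1, break2):
    """Swap the segments that follow break1 on chrom1 and break2 on chrom2.

    When break1 != break2 the homologs are misaligned, so one product
    gains genes and the other loses them.
    """
    recombinant1 = chrom1[:break1] + chrom2[break2:]
    recombinant2 = chrom2[:break2] + chrom1[break1:]
    return recombinant1, recombinant2


genes = ["A", "B", "C", "D"]

# Break after gene C on one homolog but after gene B on the other.
duplication, deletion = misaligned_crossover(genes, genes, break1=3, break2=2)
print(" ".join(duplication))
print(" ".join(deletion))
```

With equal breakpoints (for example 2 and 2) both products come back as A B C D, which corresponds to a normally aligned crossover.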

Paralogs
  • Two or more homologous genes within a single species
Gene family
  • Two or more paralogous genes that carry out related functions

Example: Globin genes
Encode polypeptides that are subunits of proteins that function in oxygen binding
14 paralogs derived from a single ancestral globin gene
Duplications and rearrangements occurred
Mutations have created specialized globins
  • Hemoglobin, myoglobin, embryonic and fetal forms
  • Based on differences in oxygen transport needs

Pseudogenes
  • Genes that have been produced by gene duplication but have accumulated mutations that make them nonfunctional
  • Not transcribed into RNA
[Figure: Evolution of the globin gene family from an ancestral globin gene. Time axis: millions of years ago (mya), 1,000 to 0. Present-day genes: myoglobin (Mb) on chromosome 22; hemoglobin α-globins on chromosome 16 and β-globins on chromosome 11, including nonfunctional pseudogenes.]

Human Genome Project
Officially began October 1, 1990
Largely finished by end of 2003
Goals
  • Identify all human genes
  • Sequence entire human genome
  • Develop technology
  • Analyze genomes of model organisms
  • Develop legal, ethical, and social programs addressing the results

Proteomes
Proteome – the collection of proteins that a given cell or species makes
Protein abundance – can refer to
  • Number of genes for a type of protein in the genome
  • Amount of each protein made by a cell
Example: Liver cell vs. muscle cell
  • Both cells have the same genes
  • Cellular protein abundance very different

[Figure: Abundance in genome vs. abundance in cell for a liver cell and a skeletal muscle cell.]

Protein category      Abundance in genome   Abundance in liver cell   Abundance in skeletal muscle cell
Metabolic enzymes     25%                   > 50%                     < 10%
Structural proteins   5%                    < 10%                     20–30%
Motor proteins        < 2%                  < 5%                      25–40%

Proteomes are larger than genomes
Due to…
  • Alternative splicing
      A single pre-mRNA can be spliced into more than one version
      Often cell specific or related to environmental conditions
  • Post-translational covalent modification
      Permanent or reversible
      Involved in assembly and construction of protein
      Phosphorylation, methylation, acetylation

[Figure: (a) Alternative splicing. A pre-mRNA containing exons 1 to 6 separated by introns is alternatively spliced into 3 mature mRNAs that carry different combinations of exons. After translation, each of these 3 polypeptides has segments with different amino acid sequences.]
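Alternative splicing can be sketched as choosing which exons of one pre-mRNA to keep while preserving their 5′ to 3′ order. The exon sequences and the three exon combinations below are hypothetical; the combinations shown in the original figure are not recoverable from this transcript.

```python
# Hypothetical exon sequences (real exons are much longer).
EXONS = {
    1: "AUGGCU",
    2: "GGAUCC",
    3: "UUUAAA",
    4: "CCCGGG",
    5: "GAGGAG",
    6: "UGCUAA",
}

# Hypothetical exon choices for three splice variants of the same pre-mRNA.
ISOFORMS = {
    "isoform_a": {1, 2, 4, 5, 6},
    "isoform_b": {1, 2, 3, 5, 6},
    "isoform_c": {1, 3, 4, 6},
}


def splice(exons, keep):
    """Join the retained exons in their original order; introns are omitted."""
    return "".join(exons[number] for number in sorted(keep))


mature_mrnas = {name: splice(EXONS, keep) for name, keep in ISOFORMS.items()}
for name, mrna in mature_mrnas.items():
    print(name, mrna)
```

One gene yields three distinct mature mRNAs here, which is one reason a proteome can contain more proteins than the genome has genes.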

[Figure: (b) Post-translational covalent modification.
  Permanent modifications: proteolytic processing; disulfide bond formation (SH groups joined as S–S); attachment of prosthetic groups, sugars, or lipids (heme group, sugar, phospholipid).
  Reversible modifications: phosphorylation (phosphate group, PO4 2–); acetylation (acetyl group); methylation (methyl group, CH3).]

Bioinformatics
Use of computers, mathematical tools, and statistical techniques to record, store, and analyze biological information
More than just DNA sequences
Highly interdisciplinary – incorporates principles from mathematics, statistics, information science, chemistry, and physics

First step is to collect and store data
Then write programs to analyze sequences in particular ways
  • Translate DNA sequence into amino acid sequence – results for all 3 reading frames
  • May not know which strand is the coding strand, so look at results for both strands

[Figure: A DNA sequence file stored on a computer (5′ to 3′) is run through a computer program that translates this DNA sequence into an amino acid sequence in all 3 reading frames. Possible amino acid sequences: Frame 1, Frame 2, and Frame 3. Two of the translations are interrupted by many STOP codons; the translation beginning Val His Ala Val Leu Glu continues without a STOP codon.]
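The program described on this slide can be sketched in a few lines. The example below uses the standard genetic code and a 20-base fragment that appears in the figure's sequence file (GTGTCCACGCGGTCCTGGAA); the function names and the one-letter output format are choices made for this sketch, with `*` standing for a STOP codon.

```python
from itertools import product

# Standard genetic code, built in TCAG order (one-letter amino acids, * = STOP).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate complete codons starting at offset 0, 1, or 2."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translation(dna):
    """Translate all 3 reading frames on both strands."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset + 1)] = translate(seq, offset)
    return frames


fragment = "GTGTCCACGCGGTCCTGGAA"
for (strand, frame), protein in six_frame_translation(fragment).items():
    print(strand, frame, protein)
```

On the given strand the three frames read VSTRSW, CPRGPG, and VHAVLE, which match the first residues of the three translations listed in the figure (Val Ser Thr Arg Ser Trp, Cys Pro Arg Gly Pro Gly, and Val His Ala Val Leu Glu).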

Databases
Collect large numbers of files and store them in one place for rapid search and retrieval
Research community has collected genetic information from thousands of research labs and created several large databases

Databases
Nucleotide sequences
  • GenBank (a U.S. database)
  • EMBL (European Molecular Biology Laboratory)
  • DDBJ (DNA Data Bank of Japan)
Amino acid sequences
  • Swiss-Prot (Swiss protein database)
  • PIR (Protein Information Resource)
  • TrEMBL (translated sequences from the EMBL database)
  • Genpept (translated sequences from the GenBank database)

Identify homologous sequences
Software can identify evolutionarily related genes
Closely related organisms tend to have genes with similar DNA sequences
Orthologs – homologous genes in different species
Reveals evolutionary relationships

[Figure: (a) A comparison of one DNA strand of the mouse and rat β-globin genes. The two sequences, as extracted:
  GGGCAGGTTGGTATCCAGGTTACAAGGCAGCTCACAAGTAGAAGCTGGGTGCTTGGAGAC
  GGGCAGGTTGGTATCCAGGTTACAAGGTAGCTCCTAAGTAGAAGTTTGGTGCTTGGAGAC
(b) The formation of homologous β-globin genes during evolution of mice and rats. Evolutionary divergence produced many species of rodents. Over time, random mutations accumulated over many generations in the β-globin gene of the common ancestor to mice and rats, giving the genes of Mus musculus (mouse) and Rattus norvegicus (rat).]
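Similarity between two orthologs is often summarized as percent identity. The sketch below compares the two β-globin fragments from the figure position by position. It assumes that the spaces in the extracted sequences are layout artifacts and that the sequences are already aligned without gaps; the variable and function names are illustrative, and the sequences are split into pieces only to make the differing positions easy to see.

```python
# The two beta-globin fragments from the figure, with spacing removed.
SEQ_1 = ("GGGCAGGTTGGTATCCAGGTTACAAGG" + "C" + "AGCTC" + "AC"
         + "AAGTAGAAG" + "CTG" + "GGTGCTTGGAGAC")
SEQ_2 = ("GGGCAGGTTGGTATCCAGGTTACAAGG" + "T" + "AGCTC" + "CT"
         + "AAGTAGAAG" + "TTT" + "GGTGCTTGGAGAC")


def mismatch_positions(seq_a, seq_b):
    """1-based positions where two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a, seq_b):
    """Share of aligned positions that carry the same base."""
    mismatches = len(mismatch_positions(seq_a, seq_b))
    return 100.0 * (len(seq_a) - mismatches) / len(seq_a)


print("length:", len(SEQ_1))
print("differences at positions:", mismatch_positions(SEQ_1, SEQ_2))
print(f"identity: {percent_identity(SEQ_1, SEQ_2):.1f}%")
```

Sequences of different lengths would need a gapped alignment first, which is where the dynamic programming methods mentioned in this section come in.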

A matrix can be used to compare two sequences
Long DNA sequences require complex dynamic programming methods
[Figure: Matrix comparisons of words, with one word written along each axis. (a) Comparison of two identical words. (b) Comparison of two different words.]
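The matrix comparison can be sketched as a dot matrix: one word per axis, with a mark wherever the row letter equals the column letter. Shared stretches then show up as runs along a diagonal. The words BIOLOGY and ECOLOGY are used here as an assumption; they are consistent with the letters that survive from the figure, but the transcript does not state them. This is the simple matrix method only, not a dynamic programming alignment.

```python
def dot_matrix(word_a, word_b):
    """matrix[i][j] is True when word_a[i] == word_b[j]."""
    return [[a == b for b in word_b] for a in word_a]


def render(word_a, word_b):
    """Text picture of the matrix: word_b across the top, word_a down the side."""
    lines = ["  " + " ".join(word_b)]
    for letter, row in zip(word_a, dot_matrix(word_a, word_b)):
        lines.append(letter + " " + " ".join("x" if hit else "." for hit in row))
    return "\n".join(lines)


def main_diagonal_matches(word_a, word_b):
    """Number of positions where both words carry the same letter."""
    return sum(a == b for a, b in zip(word_a, word_b))


print(render("BIOLOGY", "BIOLOGY"))
print()
print(render("BIOLOGY", "ECOLOGY"))
```

For identical words the main diagonal is fully marked; for the two different words only the shared OLOGY stretch lines up on it, while the repeated letter O produces extra off-diagonal marks in both cases.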

BLAST
Homologous genes usually carry out similar or identical functions
First indication of function for a new sequence is through homology to known sequences
Basic Local Alignment Search Tool (BLAST)
  • Uses a particular genetic sequence to find homologous sequences in a large database
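The idea of querying a database with one sequence can be illustrated with a toy search that ranks database entries by how many short subsequences (k-mers) they share with the query. This is not the BLAST algorithm and does not use any BLAST software; it is a much cruder stand-in, and the database entries, names, and query below are all invented for the example.

```python
def kmers(seq, k):
    """Set of all subsequences of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query, database, k=4):
    """Sort database entries by the number of k-mers shared with the query."""
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(seq, k))
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical miniature "database" of named sequences.
DATABASE = {
    "gene_x": "ATGGTGCATCTG",
    "gene_y": "TTTTTTTTTTTT",
    "gene_z": "CCCGGGAAATTT",
}

query = "ATGGTGCACCTG"
ranking = rank_by_shared_kmers(query, DATABASE)
for name, score in ranking:
    print(name, score)
```

The entry that shares the most k-mers with the query comes out on top, which mirrors the way a database hit can give the first indication of a new sequence's function.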
