Genomics 20
Key Concepts
Once a genome has been completely sequenced, researchers use a variety of techniques to identify which sequences code for products and which act as regulatory sites. Bacterial and archaeal genomes are relatively small. Among species, there is a positive correlation between total gene number and metabolic capabilities. Gene transfer between species is also common.
Eukaryotic genomes are large and complex. They include many sequences that have little or no effect on the fitness of the organism, as well as many transcribed sequences whose function is not known. Data and techniques derived from genome sequencing projects are being used to analyze cancer cells.
Introduction
The complete DNA sequence of an organism is its genome. A draft sequence of the human genome was published in February 2001 as part of the Human Genome Project. Genomics is the scientific effort to sequence, interpret, and compare whole genomes. Genomics provides a list of the genes present in an organism; functional genomics looks at when those genes are expressed and how their products interact.
Whole-Genome Sequencing
Improved automation has increased the speed and reduced the cost of DNA sequencing. The primary international repositories for DNA sequence data now contain over 194 billion nucleotides. With about 3 billion nucleotides, humans have the largest haploid genome sequenced to date. The size of the database increases by about 30 percent every year.
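As a rough illustration of what 30 percent annual growth means, the sketch below computes the doubling time implied by compound growth and projects the database size a few years forward. It assumes the growth rate stays constant, which is a simplification, and the function names are hypothetical; the five-year projection is illustrative arithmetic, not a reported figure.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity growing by `annual_growth` per year (0.30 = 30%) to double."""
    return math.log(2) / math.log(1 + annual_growth)


def projected_size(start: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at a constant annual rate."""
    return start * (1 + annual_growth) ** years


print(f"Doubling time at 30% per year: {doubling_time(0.30):.2f} years")  # 2.64 years
size = projected_size(194e9, 0.30, 5)
print(f"194 billion nucleotides after 5 years: about {size / 1e9:.0f} billion")  # about 720 billion
```

At a constant 30 percent per year, the database would double roughly every 2.6 years.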
How Are Complete Genomes Sequenced?
Most genome sequencing projects use a whole-genome shotgun sequencing approach. In this process, the genome is broken up into a set of overlapping fragments that are sequenced, and these sequences are then put in order.
The Shotgun Sequencing Process
1. Sonication (the use of high-frequency sound waves) breaks a genome into pieces approximately 160 kilobases (kb) long.
2. Each piece is inserted into a plasmid called a bacterial artificial chromosome (BAC), and each BAC is introduced into a different Escherichia coli cell. Each cell grows into a colony, producing many copies of its BAC; the collection of these clones is a BAC library.
3. Each 160-kb DNA segment is then broken into fragments about 1 kb long.
Each 1-kb fragment is cloned into a plasmid, and the plasmids are inserted into E. coli cells and replicated, producing shotgun clones. The fragments from each clone are sequenced, and computer programs put the sequences in order by finding regions where they overlap, thus reconstructing each BAC. The ends of the reconstructed BACs are then analyzed in the same way. The goal is to arrange each 160-kb segment in its correct position along the chromosome, using the regions where neighboring BACs overlap.
In essence, the shotgun strategy consists of breaking a genome into tiny fragments, sequencing the fragments, and then putting the sequence data back into the correct order.
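The reassembly step can be sketched with a toy greedy overlap assembler: it repeatedly merges the two fragments whose end and start overlap the most. This is a simplified illustration, not the method used by real genome assemblers, which must cope with sequencing errors and repeated sequences; it also assumes that no fragment lies entirely inside another. The function names and the minimum overlap of 3 bases are arbitrary choices.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of fragments with the largest overlap."""
    contigs = list(fragments)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: return the contigs built so far
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical overlapping fragments of the made-up sequence ATGCGTACGTTAGC
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))  # ['ATGCGTACGTTAGC']
```

Merging the three overlapping fragments reconstructs the original 14-base sequence. Because each merge is chosen locally, repeated sequences longer than the fragments can lead an assembler like this to join pieces in the wrong order, which is one reason repeats make genome assembly difficult.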
The Role of Next-Generation Sequencing Strategies
Pyrosequencing is a cheaper and faster alternative to traditional sequencing. It starts from individual DNA fragments rather than from fragments that have been cloned in bacteria. However, the sequences it produces are too short to be pieced back together accurately into a complete genome. If the genome of the species is already known, pyrosequencing can instead be used to sequence an individual, whose sequences are then compared with the reference, or “master,” genome.
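Comparing an individual's short sequences with a reference genome can be sketched as follows: each read is slid along the reference, placed where it has the fewest mismatches (up to a limit), and positions where reads disagree with the reference are reported as candidate differences. This brute-force scan is only an illustration on short, made-up sequences; real read-mapping software indexes the reference and models sequencing errors. The function names and the one-mismatch limit are hypothetical.

```python
from typing import Optional


def map_read(read: str, reference: str, max_mismatches: int = 1) -> Optional[tuple[int, list[int]]]:
    """Return (start, mismatch positions in reference coordinates) for the best placement, or None."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [start + i for i, (r, g) in enumerate(zip(read, window)) if r != g]
        # Keep the first placement with the fewest mismatches within the allowed limit.
        if len(mismatches) <= max_mismatches and (best is None or len(mismatches) < len(best[1])):
            best = (start, mismatches)
    return best


def call_differences(reads: list[str], reference: str, max_mismatches: int = 1) -> dict[int, str]:
    """Map every read and record, by reference position, the base the individual carries."""
    differences = {}
    for read in reads:
        hit = map_read(read, reference, max_mismatches)
        if hit is None:
            continue  # read could not be placed; simply skipped in this sketch
        start, positions = hit
        for pos in positions:
            differences[pos] = read[pos - start]
    return differences


# Hypothetical reference ("master") sequence and reads from one individual
reference = "ACGTACGGATCCATGA"
reads = ["ACGTACGT", "GTATCC", "CCATGA"]
print(call_differences(reads, reference))  # {7: 'T'}
```

Two of the reads agree that this individual carries T where the reference has G at position 7; the third read matches the reference exactly.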
How Are Complete Genomes Sequenced?
Bioinformatics is the effort to manage, analyze, and interpret biological information, and is key to managing the vast quantity of data generated by genome sequencing.
Which Genomes Are Being Sequenced, and Why?
The first genome of a free-living organism to be sequenced was that of the bacterium Haemophilus influenzae in 1995; it consists of about 1.8 million base pairs. The first eukaryotic genome to be sequenced was that of the yeast Saccharomyces cerevisiae in 1996. To date, complete genomes have been sequenced from over 800 species. Most of the organisms that have been sequenced cause disease or have other interesting biological properties.
Which Sequences Are Genes?
The most basic task in annotating, or interpreting, a genome is to identify which bases constitute genes. Identifying genes is relatively straightforward in bacteria and archaea but much more difficult in eukaryotes, whose genomes contain many noncoding sequences.
Identifying Genes in Bacterial and Archaeal Genomes
Computer programs scan both strands of a genome sequence to identify open reading frames (ORFs). ORFs are possible genes: stretches of sequence that begin with a start codon and continue for a long distance, in the same reading frame, before reaching a stop codon. The programs also look for sequences typical of promoters, operators, and other regulatory sites. Researchers can confirm that an ORF is actually a gene by analyzing its product or by finding that it is homologous (similar due to common ancestry) to a known gene.
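A minimal ORF scan can be sketched in a few lines: it checks all three reading frames on each strand, starts an ORF at ATG, and ends it at the first in-frame stop codon (TAA, TAG, or TGA). The minimum-length cutoff `min_codons` is a hypothetical parameter; real gene-finding programs use more sophisticated statistics, and this sketch ignores alternative start codons and ORFs nested inside other ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, str]]:
    """Return (strand, frame, ORF sequence) for ATG...stop ORFs in all six reading frames."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs


# A made-up sequence with one short ORF; the cutoff is lowered so it qualifies.
print(find_orfs("CCATGAAATTTGGGTAAGG", min_codons=2))  # [('+', 2, 'ATGAAATTTGGGTAA')]
```

Scanning the reverse complement is what "both strands" means in practice, since a gene can lie on either strand of the double helix.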
Identifying Genes in Eukaryotic Genomes
In eukaryotes, genes contain introns, and most of the genome does not code for a product, so scanning for ORFs is not an effective way to find genes. The most effective strategy is to use reverse transcriptase to produce a cDNA copy of each mRNA and to sequence a portion of the resulting molecule, producing an expressed sequence tag, or EST. Because ESTs are derived from mRNAs, each one marks a gene that is actually transcribed; mapping ESTs back to the genome shows where protein-coding genes are located.
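The logic behind ESTs can be sketched in two steps. Reverse transcriptase makes DNA complementary to the mRNA; because the mRNA has already been spliced, the cDNA lacks introns, so aligning it to the genomic sequence reveals where the exons lie. The greedy matching below is a toy illustration on made-up sequences with hypothetical function names; it assumes each exon occurs only once downstream of the previous one, and real spliced-alignment tools are far more careful.

```python
def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA complementary to the mRNA, written 5' to 3'."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(mrna))


def locate_exons(sense_cdna: str, genome: str, min_exon: int = 4) -> list[tuple[int, int]]:
    """Greedily match successive pieces of an intron-free sequence to the genome, left to right.

    Returns (start, end) genome coordinates of each matched piece, i.e. each putative exon.
    """
    exons, done, search_from = [], 0, 0
    while done < len(sense_cdna):
        # Try the longest remaining piece first, shrinking it until it is found downstream.
        for length in range(len(sense_cdna) - done, min_exon - 1, -1):
            hit = genome.find(sense_cdna[done:done + length], search_from)
            if hit != -1:
                exons.append((hit, hit + length))
                done += length
                search_from = hit + length
                break
        else:
            raise ValueError("no match of at least min_exon bases")
    return exons


mrna = "AUGGCCAUUCGAGUUUAA"
genome = "ATGGCCATT" + "GTAAGTCCCTTTAG" + "CGAGTTTAA"  # hypothetical exon 1 + intron + exon 2
print(reverse_transcribe("AUGGCC"))                  # GGCCAT
print(locate_exons(mrna.replace("U", "T"), genome))  # [(0, 9), (23, 32)]
```

The cDNA is written here as its sense strand (the mRNA with U replaced by T), so it can be matched directly against the genomic sequence; the 14-base gap between the two matches is the intron.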
Bacterial and Archaeal Genomes
By sequencing the genomes of various strains of the same prokaryotic species, researchers can now compare the genomes of closely related organisms that have different ways of life.
The Natural History of Prokaryotic Genomes
In bacteria, there is a general correlation between genome size and the metabolic capabilities of the organism. The function of many bacterial genes is still unknown. There is tremendous genetic diversity among bacteria and archaea: about 15 percent of the genes in a prokaryotic genome are unique to its own species. Redundancy is also common; some genes are present in multiple copies within a single prokaryotic genome.
Multiple chromosomes and plasmids are more common than expected. In many bacterial and archaeal species, a significant portion of the genome appears to have been acquired from other, often distantly related, species.
Lateral Gene Transfer
The movement of DNA from one species to another is called lateral gene transfer. Recent evidence suggests that over 50 percent of archaeal species and 30 to 50 percent of bacterial species have at least one gene acquired by lateral gene transfer.
Evidence for Lateral Gene Transfer
Two general criteria support the hypothesis that a sequence in a bacterial or archaeal genome originated in another species:
1. The gene is much more similar to genes in distantly related species than it is to those in closely related species.
2. The proportion of G-C base pairs to A-T base pairs in a particular gene or series of genes is markedly different from the base composition of the rest of the genome.
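The second criterion can be sketched as a scan of base composition: compute the G-C fraction of the whole genome, then flag fixed-size windows whose G-C fraction departs from it by more than a chosen threshold. The window size, threshold, and example genome are hypothetical, and an atypical base composition is only a hint of foreign origin, to be weighed alongside the similarity criterion.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_windows(genome: str, window: int, threshold: float) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for non-overlapping windows whose GC fraction
    differs from the genome-wide value by more than `threshold`."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# A made-up genome: an A-T-rich background with one G-C-rich insert.
genome = "AT" * 20 + "GC" * 10 + "AT" * 20
print(gc_fraction(genome))                     # 0.2
print(flag_atypical_windows(genome, 20, 0.3))  # [(40, 1.0)]
```

Only the window covering the G-C-rich insert is flagged; the A-T-rich windows differ from the genome-wide value by just 0.2.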
How Does Lateral Gene Transfer Occur?
Lateral gene transfer often occurs because genes are carried on plasmids, which can be passed from one cell to another. Lateral gene transfer also occurs through transformation, the uptake of DNA fragments from the environment. Thus, mutation and genetic recombination within a species are not the only sources of genetic variation in bacteria and archaea.
Environmental Sequencing
Environmental sequencing, or metagenomics, is the practice of cataloging all of the genes present in a community of bacteria and archaea. The subject of these studies is genes, not organisms. Applied to samples from the Sargasso Sea, this method led to the discovery of nearly 150 new species of bacteria and over 1 million previously unknown genes.
Eukaryotic Genomes
Many eukaryotic genomes are dominated by repeated DNA sequences that occur between genes or inside introns and do not code for products used by the organism. Sequencing eukaryotic genomes therefore presents unique challenges: eukaryotic genomes are much larger than the genomes of bacteria and archaea, and they contain many noncoding repetitive sequences.
Parasitic and Repeated Sequences
Protein-coding sequences constitute a very small percentage of the human genome (less than 2 percent), whereas repetitive sequences make up more than 50 percent. In contrast, over 90 percent of a typical prokaryotic genome consists of genes. Repeated sequences in the human genome are often derived from transposable elements: segments of DNA that can move from one location in a genome to another.
Characteristics of Transposable Elements
Transposable elements are examples of selfish genes: parasitic DNA sequences that survive and reproduce but do not increase the fitness of the host. They are classified as parasitic because they can decrease their host’s fitness in two ways: it takes time and resources to copy them along with the rest of the genome, and they can disrupt gene function when they insert in a new location.
How Do Transposable Elements Work?
Long interspersed nuclear elements (LINEs) are one type of transposable element. An active LINE contains all the sequences required to make copies of itself and insert them into a new location in the genome. Although LINEs are abundant in the human genome, analyses have revealed that only a handful appear to be complete and potentially active. Transposable elements in general are widespread: virtually every prokaryotic and eukaryotic genome examined to date contains at least some.
Repeated Sequences
Eukaryotic genomes have several thousand loci called short tandem repeats (STRs), where a short sequence is repeated many times in a row. There are two types of STRs. Microsatellites, or simple sequence repeats, have repeating units of 1 to 5 bases. Minisatellites, or variable number tandem repeats (VNTRs), have repeating units of 6 to 500 bases. These repeat loci are hypervariable: the number of repeats at a locus varies among individuals much more than any other type of sequence does.
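Determining the allele at an STR locus amounts to counting the longest run of consecutive copies of the repeating unit. The sketch below does this for a two-base (CA) microsatellite; the flanking sequences, alleles, and function names are made up for illustration.

```python
def count_tandem_repeats(seq: str, unit: str, start: int) -> int:
    """Number of consecutive copies of `unit` beginning at position `start`."""
    count = 0
    while seq.startswith(unit, start + count * len(unit)):
        count += 1
    return count


def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Return (start, number of copies) for the longest run of `unit` in `seq`."""
    best_start, best_count = -1, 0
    for i in range(len(seq)):
        n = count_tandem_repeats(seq, unit, i)
        if n > best_count:
            best_start, best_count = i, n
    return best_start, best_count


# Two hypothetical alleles of the same (CA)n microsatellite locus
allele_1 = "GGTC" + "CA" * 12 + "TTAG"
allele_2 = "GGTC" + "CA" * 7 + "TTAG"
print(longest_tandem_run(allele_1, "CA"))  # (4, 12)
print(longest_tandem_run(allele_2, "CA"))  # (4, 7)
```

The two alleles share the same flanking sequence and differ only in the number of repeats, which is exactly the kind of variation that makes these loci hypervariable.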
One hypothesis for why microsatellites and minisatellites have so many different alleles is that these highly repetitive stretches may misalign when homologous chromosomes synapse during meiosis. This misalignment can then lead to unequal crossover, producing chromosomes that contain different numbers of repeats.
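The unequal-crossover hypothesis can be illustrated with a toy model: two homologous chromosomes carry the same repeat array, but the crossover happens at break points that are out of register. Exchanging the chromosome ends then yields one product with extra repeats and one with fewer. The flanking sequences, repeat unit, break points, and function names are all hypothetical.

```python
LEFT, UNIT, RIGHT = "GGTC", "CA", "TTAG"  # hypothetical flanks and repeat unit


def make_chromosome(n_repeats: int) -> str:
    """A flanked repeat array with `n_repeats` copies of UNIT."""
    return LEFT + UNIT * n_repeats + RIGHT


def repeats_in(chromosome: str) -> int:
    """Number of repeat units between the fixed flanks."""
    return (len(chromosome) - len(LEFT) - len(RIGHT)) // len(UNIT)


def crossover(chrom_a: str, chrom_b: str, cut_a: int, cut_b: int) -> tuple[str, str]:
    """Exchange chromosome ends at break point cut_a on one homolog and cut_b on the other."""
    return chrom_a[:cut_a] + chrom_b[cut_b:], chrom_b[:cut_b] + chrom_a[cut_a:]


a = b = make_chromosome(8)
# Misaligned pairing: the break falls after repeat 5 on one homolog but after repeat 3 on the other.
cut_a = len(LEFT) + 5 * len(UNIT)
cut_b = len(LEFT) + 3 * len(UNIT)
p1, p2 = crossover(a, b, cut_a, cut_b)
print(repeats_in(p1), repeats_in(p2))  # 10 6
```

When the break points are in register (cut_a equal to cut_b), both products keep 8 repeats; the out-of-register crossover turns two 8-repeat alleles into a 10-repeat allele and a 6-repeat allele.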
Repeated Sequences and DNA Fingerprinting
DNA fingerprinting refers to any technique for identifying individuals on the basis of unique features of their genomes. Because microsatellite and minisatellite loci vary so much among individuals, they are now the markers of choice for DNA fingerprinting.
DNA Fingerprinting Process
1. A sample of DNA is obtained from the individual.
2. PCR is performed using primers that flank a region containing an STR, amplifying that region.
3. The amplified region is analyzed to determine the number of repeats present, typically by measuring the length of the PCR product with electrophoresis.
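The final step often reduces to arithmetic: if the PCR product includes a known amount of flanking sequence, the remaining length divided by the repeat-unit length gives the number of repeats. Because each person carries two alleles per locus, one from each parent, a profile records two repeat counts per locus, and two samples match only if the counts agree at every locus tested. The locus names, flank length, repeat unit, and profiles below are hypothetical; forensic laboratories analyze many loci at once.

```python
def repeat_count(product_len: int, flank_len: int, unit_len: int) -> int:
    """Infer the number of repeats from the length of the PCR product."""
    repeat_bases = product_len - flank_len
    if repeat_bases < 0 or repeat_bases % unit_len != 0:
        raise ValueError("product length is inconsistent with the flank and unit lengths")
    return repeat_bases // unit_len


def normalize(profile: dict[str, tuple[int, int]]) -> dict[str, tuple[int, ...]]:
    """Sort the two allele counts at each locus so allele order does not matter."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def profiles_match(p1: dict[str, tuple[int, int]], p2: dict[str, tuple[int, int]]) -> bool:
    """True only if both profiles cover the same loci with the same pair of counts at each."""
    return normalize(p1) == normalize(p2)


# Hypothetical locus: 100 bases of flanking sequence and a 4-base repeat unit
print(repeat_count(148, flank_len=100, unit_len=4))  # 12

sample = {"STR_A": (12, 15), "STR_B": (8, 8)}
suspect_1 = {"STR_A": (15, 12), "STR_B": (8, 8)}
suspect_2 = {"STR_A": (12, 14), "STR_B": (8, 8)}
print(profiles_match(sample, suspect_1), profiles_match(sample, suspect_2))  # True False
```

A single differing repeat count at one locus is enough to exclude a match, which is why hypervariable loci are so informative.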
Gene Families
In eukaryotes, the major source of new genes is duplication of existing genes. Within a species, genes that are extremely similar to each other in structure and function are considered to be part of the same gene family. Genes that make up gene families are hypothesized to have arisen from a common ancestral sequence through gene duplication.
How Do Gene Families Arise?
When gene duplication occurs, an extra copy of a gene is added to the genome. The most common type of gene duplication results from unequal crossing over during meiosis. The redundancy of duplicated genes may allow one copy to mutate to create a new gene with different function or regulation, possibly leading to the evolution of novel traits.
New Genes—New Functions?
Gene duplication is important because the original gene remains functional and continues to produce a normal product, so the extra copy is free to change. The duplicated gene may:
1. Retain its original function and provide additional quantities of the same product.
2. Undergo mutations that produce a beneficial altered protein, thus creating an important new gene.
3. Become a nonfunctional pseudogene, a remnant of a functional gene copy that no longer produces a working product.
Insights from the Human Genome Project
Scientists do not know the function of more than half of the genes found in the human genome. Two recent discoveries are changing biologists’ thinking about the human genome:
1. Genes for miRNAs are much more common than previously thought.
2. A much larger proportion of the genome is transcribed than previously thought. Many of these sequences are referred to as Transcripts of Unknown Function (TUFs) because their role in the cell is unknown.
Why Do Humans Have So Few Genes?
A surprising observation about eukaryotic genomes is that organisms with complex morphology and behavior do not appear to have large numbers of genes. Before the human genome was sequenced, scientists expected that humans would have at least 100,000 genes. However, the actual sequence revealed that we have only about 20,000 genes. The alternative-splicing hypothesis proposes that certain multicellular eukaryotes do not need large numbers of genes because alternative splicing creates different proteins from the same gene.
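The arithmetic behind the alternative-splicing hypothesis can be sketched with a simple model in which some exons are always included and others (cassette exons) may be either included or skipped. With k optional exons, one gene can in principle yield 2^k different mRNAs, so 10 such exons allow 1,024 combinations. This model is a simplification: genes also use other kinds of alternative splicing, and not every possible combination is necessarily produced. The exon names and the function are hypothetical.

```python
from itertools import product


def enumerate_isoforms(exons: list[tuple[str, bool]]) -> list[tuple[str, ...]]:
    """List every mRNA that can be made when each optional exon is either included or skipped.

    `exons` holds (name, is_optional) pairs in the order the exons appear in the gene.
    """
    optional = [name for name, is_optional in exons if is_optional]
    isoforms = []
    for choices in product((False, True), repeat=len(optional)):
        included = {name for name, keep in zip(optional, choices) if keep}
        isoforms.append(tuple(name for name, is_optional in exons
                              if not is_optional or name in included))
    return isoforms


# Hypothetical gene: E2 and E4 are cassette exons; E1, E3, and E5 are always included.
gene = [("E1", False), ("E2", True), ("E3", False), ("E4", True), ("E5", False)]
for isoform in enumerate_isoforms(gene):
    print("-".join(isoform))
```

With two optional exons the loop prints four mRNAs: E1-E3-E5, E1-E3-E4-E5, E1-E2-E3-E5, and E1-E2-E3-E4-E5.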
Similarities between Human and Chimp Genomes
At the level of base sequence, the human and chimpanzee genomes are 98.8 percent identical. This raises the question of how humans and chimps can be so similar genetically but so different in morphology and behavior. One hypothesis proposes that even though many structural genes (those that code for products) in humans and chimps are identical, regulatory genes (those that code for regulatory transcription factors) of the two species might have important differences.
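Percent identity is the fraction of aligned positions at which two sequences carry the same base. Applied across a genome of roughly 3 billion bases, 98.8 percent identity still corresponds to about 36 million single-base differences. The sketch assumes sequences that are already aligned without gaps; the function names and the short example sequences are hypothetical.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent of positions with identical bases in two gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100 * matches / len(seq1)


def expected_differences(genome_size: int, identity_pct: float) -> int:
    """Number of differing positions implied by a percent identity over a whole genome."""
    return round(genome_size * (100 - identity_pct) / 100)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(expected_differences(3_000_000_000, 98.8))     # 36000000
```

A high percent identity can therefore still leave millions of differences, some of which may fall in regulatory genes.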
Functional Genomics and Proteomics
Whole-genome data can be used to answer fundamental questions about how organisms work. Large-scale analyses of gene expression are called functional genomics. One of the basic tools of functional genomics is a DNA microarray. Microarrays, used to study gene expression, consist of a large number of single-stranded DNAs that are permanently affixed to a glass slide.
How Are DNA Microarrays Used?
mRNAs produced in two contrasting types of cells are isolated, and cDNAs made from these mRNAs are labeled with different fluorescent dyes and used to probe the microarray. Researchers can thus identify differences in which genes are expressed in the two cell types. A microarray allows researchers to study the expression of thousands of genes at a time and to identify which sets of genes are expressed together under specific conditions.
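The comparison made on a two-color microarray can be sketched as a ratio calculation: for each gene's spot, the fluorescence from one cell type's labeled cDNA is divided by that from the other, and the log2 of the ratio shows which cell type expresses the gene more. The intensities, gene names, channel labels, and twofold cutoff below are hypothetical, and real analyses also normalize the two channels and use replicate arrays.

```python
import math


def classify_expression(channel_1: dict[str, float], channel_2: dict[str, float],
                        fold: float = 2.0) -> dict[str, str]:
    """Compare spot intensities from two labeled cDNA samples, gene by gene."""
    cutoff = math.log2(fold)
    result = {}
    for gene in channel_1:
        log_ratio = math.log2(channel_1[gene] / channel_2[gene])
        if log_ratio >= cutoff:
            result[gene] = "higher in cell type 1"
        elif log_ratio <= -cutoff:
            result[gene] = "higher in cell type 2"
        else:
            result[gene] = "similar"
    return result


# Hypothetical intensities for three spots
cell_type_1 = {"geneA": 800.0, "geneB": 100.0, "geneC": 300.0}
cell_type_2 = {"geneA": 200.0, "geneB": 400.0, "geneC": 250.0}
print(classify_expression(cell_type_1, cell_type_2))
# {'geneA': 'higher in cell type 1', 'geneB': 'higher in cell type 2', 'geneC': 'similar'}
```

Using the log of the ratio makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero.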
What Is Proteomics?
A transcriptome is the complete set of genes that are transcribed in a particular cell. A proteome is the complete set of proteins that a cell produces. Proteomics is the large-scale study of protein function. Instead of studying individual proteins or how two proteins might interact, proteomics is based on studying all of the proteins present at once.
Applied Genomics: Understanding Cancer
Researchers are using tools created by advances in genomics to deepen our understanding of cancer. Microarrays allow researchers to compare gene expression in normal versus cancerous cells. The Human Genome Project has helped reveal common sets of genes that are mutated in cancerous cells. Comparing the complete genome sequences of cancerous and noncancerous cells from the same person identified over 600 mutations in the cancerous cells.
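At its simplest, finding mutations specific to a tumor means comparing the tumor's sequence with the normal sequence from the same person, position by position; using the same person's normal cells filters out inherited variants. The sketch assumes the two sequences are already aligned and free of sequencing errors, which real cancer-genome studies cannot assume; the sequences and the function name are made up.

```python
def tumor_specific_changes(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """Return (position, normal base, tumor base) wherever two aligned sequences differ."""
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, n, t) for i, (n, t) in enumerate(zip(normal, tumor)) if n != t]


# Hypothetical aligned sequences from noncancerous and cancerous cells of one person
normal = "ATGCCGTAAGT"
tumor = "ATGACGTAAGC"
print(tumor_specific_changes(normal, tumor))  # [(3, 'C', 'A'), (10, 'T', 'C')]
```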