Genomics Chapter 18
Mapping Genomes Maps of genomes can be divided into 2 types: -Genetic maps -Abstract maps that place the relative location of genes on chromosomes based on recombination frequency. -Physical maps -Use landmarks within DNA sequences, ranging from restriction sites to the actual DNA sequence.
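As a rough illustration of how recombination frequency turns into a genetic map distance, the sketch below uses a hypothetical two-gene test cross. The function name and the offspring counts are invented for illustration. It assumes the common convention that 1% recombinant offspring corresponds to about 1 map unit (centimorgan), which is only a good approximation for closely linked genes.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Return the percentage of offspring that are recombinant."""
    if total <= 0:
        raise ValueError("total offspring must be positive")
    return 100.0 * recombinant / total


# Hypothetical test cross: 1000 offspring scored, 120 of them recombinant.
rf = recombination_frequency(120, 1000)
print(f"Recombination frequency: {rf:.1f}% (roughly {rf:.1f} map units)")
```

The result is a relative distance only: it says how often two genes are separated by crossing over, not how many base pairs lie between them.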
Physical Maps Distances between “landmarks” are measured in base-pairs. -1000 basepairs (bp) = 1 kilobase (kb) Knowledge of DNA sequence is not necessary. There are three main types of physical maps: -Restriction maps (constructed using restriction enzymes) -Cytological maps (chromosome-banding pattern) -Radiation hybrid maps (using radiation to fragment chromosomes)
Restriction maps -The first physical maps; -Based on distances between restriction sites; -Overlap between smaller segments can be used to assemble them into a contig -Continuous segment of the genome.
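The idea of measuring distances between restriction sites can be sketched on a toy sequence. The example below is a minimal illustration, not a lab protocol: the function name and the toy sequence are hypothetical, and the sequence is treated as linear. It uses EcoRI, which recognizes GAATTC and cuts after the G.

```python
def restriction_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the position of the cut within the recognition site.
    Returns the fragment lengths in base pairs, in order along the sequence.
    """
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence (made up) containing two EcoRI sites.
toy = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
print(restriction_fragments(toy, "GAATTC", 1))  # [4, 11, 7]
```

The fragment lengths are the raw data of a restriction map; comparing the patterns produced by overlapping clones is what allows them to be ordered into a contig.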
Physical Maps Cytological maps -Employ stains that generate reproducible patterns of bands on the chromosomes -Divide chromosomes into subregions -Provide a map of the whole genome, but at low resolution -Cloned DNA is correlated with map using fluorescent in situ hybridization (FISH)
Physical Maps Radiation hybrid maps -Use radiation to fragment chromosomes randomly; -Fragments are then recovered by fusing the irradiated cell with another cell -Usually a rodent cell -Fragments can be identified based on banding patterns or FISH.
Genetic Maps The most common markers are short repeat sequences called short tandem repeats, or STR loci: -Differ in repeat length between individuals; -13 form the basis of modern DNA fingerprinting developed by the FBI; -Cataloged in the CODIS database to identify criminal offenders
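Because STR alleles differ in the number of repeat copies, an STR can be scored by counting back-to-back copies of the repeat unit. The sketch below is a simplified illustration: the GATA repeat unit, the flanking bases, and the two "individuals" are all made up, and real genotyping works from fragment sizes or sequencing reads rather than exact string matching.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles at one STR locus: same flanks, different repeat counts.
person_a = "TTC" + "GATA" * 7 + "CCG"
person_b = "TTC" + "GATA" * 10 + "CCG"

print(longest_repeat_run(person_a, "GATA"))  # 7
print(longest_repeat_run(person_b, "GATA"))  # 10
```

Scoring several independent loci this way gives a profile of repeat counts, which is the kind of record a fingerprinting database stores.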
Genetic Maps Genetic and physical maps can be correlated: -Any cloned gene can be placed within the genome and can also be mapped genetically.
Genetic Maps All of these different kinds of maps are stored in databases: -The National Center for Biotechnology Information (NCBI) serves as the US repository for these data and more; -Similar databases exist in Europe and Japan
Whole Genome Sequencing The ultimate physical map is the base-pair sequence of the entire genome. -Requires use of high-throughput automated sequencing and computer analysis.
Whole Genome Sequencing Sequencers provide accurate sequences for DNA segments up to 800 bp long -To reduce errors, 5-10 copies of a genome are sequenced and compared Vectors used to clone large pieces of DNA: -Yeast artificial chromosomes (YACs) -Bacterial artificial chromosomes (BACs) -Human artificial chromosomes (HACs) -Are circular, at present
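To get a feel for what sequencing 5-10 copies of a genome means with 800 bp reads, the sketch below estimates the number of reads required. It assumes the usual relationship coverage = (number of reads x read length) / genome size and uses the 3.2 Gb human genome size quoted later in these slides. The function name is hypothetical, and the estimate ignores failed reads and uneven coverage.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Estimate the number of reads needed for a given average coverage."""
    return math.ceil(coverage * genome_bp / read_bp)


HUMAN_GENOME_BP = 3_200_000_000  # 3.2 Gb
READ_BP = 800                    # upper read length from the slide

for coverage in (5, 10):
    n = reads_needed(HUMAN_GENOME_BP, READ_BP, coverage)
    print(f"{coverage}x coverage: about {n:,} reads")
```

Under these assumptions, 5x coverage needs about 20 million reads and 10x about 40 million, which is why automation and large-insert cloning vectors matter.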
Whole Genome Sequencing Clone-by-clone sequencing -Overlapping regions between BAC clones are identified by restriction mapping or STS analysis. Shotgun sequencing -DNA is randomly cut into smaller fragments, cloned and then sequenced; -Computers put together the overlaps. -Sequence is not tied to other information.
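The "computers put together the overlaps" step of shotgun sequencing can be sketched with a tiny greedy merger. This is a toy illustration under strong assumptions (error-free reads, one strand, no repeats); the function names and reads are made up, and production assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


# Three made-up reads sampled from the toy sequence ATGGCGTACGTTAGC.
reads = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The output is just a sequence: nothing in it says where on a chromosome the contig belongs, which is the sense in which shotgun sequence is not tied to other map information.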
The Human Genome Project Launched in 1990 by the International Human Genome Sequencing Consortium; Craig Venter formed a private company and entered the “race” in May 1998; In 2001, both groups published a draft sequence. -Contained numerous gaps
The Human Genome Project In 2004, the “finished” sequence was published as the reference sequence (REF-SEQ) in databases: -3.2 gigabasepairs -1 Gb = 1 billion basepairs; -Contains a 400-fold reduction in gaps; -99% of euchromatic sequence; -Error rate = 1 per 100,000 bases
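The error rate and genome size on this slide can be combined into a quick back-of-the-envelope figure. The sketch below simply multiplies the two numbers from the slide; it assumes errors are spread evenly, which is a simplification.

```python
GENOME_BP = 3_200_000_000   # 3.2 Gb, from the slide
BASES_PER_ERROR = 100_000   # error rate of 1 per 100,000 bases

# Integer division keeps the arithmetic exact.
expected_errors = GENOME_BP // BASES_PER_ERROR
print(f"Expected residual errors: about {expected_errors:,}")
```

So even a "finished" sequence at this quality would be expected to carry on the order of 32,000 base errors across the whole genome.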
Characterizing Genomes The Human Genome Project found fewer genes than expected: -Initial estimate was 100,000 genes; -Number now appears to be about 25,000! In general, eukaryotic genomes are larger and have more genes than those of prokaryotes: -However, the complexity of an organism is not necessarily related to its gene number.
Finding Genes Genes are identified by open reading frames: -An ORF begins with a start codon and contains no stop codon for a distance long enough to encode a protein. Sequence annotation: -The addition of information, such as ORFs, to the basic sequence information.
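The ORF definition above (a start codon followed by a long enough run of codons with no stop codon) can be sketched as a simple scan. This is a minimal illustration, not a gene finder: it scans only the given strand, uses an arbitrary hypothetical minimum length, and ignores introns, so it is closer to how ORFs are found in prokaryotic sequence. The function name and the toy sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) indices of ORFs in the three forward reading frames.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive) and
    must contain at least `min_codons` codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Made-up sequence with one short ORF: ATG GCT GAA TTC TAA
toy = "CC" + "ATGGCTGAATTCTAA" + "GG"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

Recording the coordinates found this way alongside the raw sequence is a small example of sequence annotation.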
Finding Genes BLAST -A search algorithm used to search NCBI databases for homologous sequences; -Permits researchers to infer functions for isolated molecular clones Bioinformatics -Use of computer programs to search for genes, and to assemble and compare genomes.
Genome Organization Genomes consist of two main regions -Coding DNA -Contains genes that encode proteins -Noncoding DNA -Regions that do not encode proteins
Coding DNA in Eukaryotes Four different classes are found: -Single-copy genes: Includes most genes. -Segmental duplications: Blocks of genes copied from one chromosome to another. -Multigene families: Groups of related but distinctly different genes. -Tandem clusters: Identical copies of genes occurring together in clusters.
Noncoding DNA in Eukaryotes Each cell in our bodies has about 6 feet of DNA stuffed into it. -However, less than one inch is devoted to genes! Six major types of noncoding human DNA have been described.
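The "6 feet versus 1 inch" comparison can be turned into a percentage with a line of arithmetic. The sketch below only restates the slide's figures in different units.

```python
INCHES_PER_FOOT = 12

total_inches = 6 * INCHES_PER_FOOT   # about 6 feet of DNA per cell
gene_fraction = 1 / total_inches     # less than 1 inch devoted to genes

print(f"Upper bound on the gene fraction: {gene_fraction:.1%}")  # 1.4%
```

One inch out of 72 is about 1.4%, so "less than one inch" corresponds to only a percent or so of the DNA in each cell.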
Noncoding DNA in Eukaryotes Noncoding DNA within genes: -Protein-encoding exons (less than 1.5%) are embedded within much larger noncoding introns (about 24%). Structural DNA: -Called constitutive heterochromatin; -Localized to centromeres and telomeres. Simple sequence repeats (SSRs): -One- to six-nucleotide sequences repeated thousands of times; -Can arise from DNA replication errors; -Make up about 3% of the genome.
Noncoding DNA in Eukaryotes Segmental duplications: -Consist of 10,000 to 300,000 bp that have duplicated and moved either within a chromosome or to a nonhomologous chromosome. Pseudogenes: -Inactive genes that may have lost function because of mutation.
Noncoding DNA in Eukaryotes Transposable elements (transposons) -Mobile genetic elements - Able to move from one location on a chromosome to another. -Four types: -Long interspersed elements (LINEs) (21%) -Short interspersed elements (SINEs) (13%) -Long terminal repeats (LTRs) (8%) -Dead transposons (3%) TOTAL OF 45% OF THE GENOME!!!!
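The 45% total can be checked by summing the four classes, and converted into base pairs using the 3.2 Gb genome size given earlier in these slides. The sketch below uses only the slide's percentages; the variable names are hypothetical.

```python
# Percent of the human genome, as listed on the slide.
transposable_elements = {
    "LINEs": 21,
    "SINEs": 13,
    "LTRs": 8,
    "Dead transposons": 3,
}

total_percent = sum(transposable_elements.values())

GENOME_BP = 3_200_000_000  # 3.2 Gb
total_bp = GENOME_BP * total_percent // 100

print(f"Transposable elements: {total_percent}% of the genome")
print(f"That is about {total_bp:,} bp")
```

Under these figures, transposable elements account for roughly 1.44 billion base pairs.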
Genomics Comparative genomics, the study of whole genome maps of organisms, has revealed similarities among them: -Over half of Drosophila genes have human counterparts; -Humans and mice: only 300 genes have no counterpart in the other genome. Synteny refers to the conserved arrangements of DNA segments in related genomes; -Allows comparisons of unsequenced genomes.
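Synteny can be illustrated by looking for runs of genes that sit next to each other, in the same order, in two genomes. The sketch below is a deliberately simple version: the gene names and orders are invented, each gene is assumed to occur once per genome, and inverted blocks are not detected.

```python
def syntenic_blocks(order_a: list[str], order_b: list[str],
                    min_len: int = 2) -> list[list[str]]:
    """Find runs of genes that are adjacent and in the same order in both genomes."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks: list[list[str]] = []
    current: list[str] = []
    for gene in order_a:
        extends = (
            bool(current)
            and gene in pos_b
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
            continue
        # The running block ends here; keep it if it is long enough.
        if len(current) >= min_len:
            blocks.append(current)
        current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: the same two blocks, swapped between genomes.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "g1", "g2", "g3"]
print(syntenic_blocks(genome_a, genome_b))
```

Here the two genomes share two conserved blocks whose positions have been rearranged, which is the kind of pattern a genomic alignment figure displays.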
Genomic Alignment (Segment Rearrangement) [Figure: aligned genome segments of rice, sugarcane, corn, and wheat. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.]
Genomics Functional genomics is the study of the function of genes and their products; DNA microarrays (“gene chips”) enable the analysis of gene expression at the whole-genome level; -DNA fragments are deposited on a slide: -Probed with labeled mRNA from different sources; -Active/inactive genes are identified.
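One common way to read out a two-sample microarray comparison is the log2 ratio of the signal intensities for each spot. The sketch below assumes that approach; the function name, the intensity values, and the two-fold threshold are all hypothetical choices for illustration, and real analyses add normalization and statistics.

```python
import math


def classify_expression(sample: float, reference: float,
                        threshold: float = 1.0) -> str:
    """Classify a gene by the log2 ratio of sample to reference intensity.

    A threshold of 1.0 corresponds to a two-fold change in either direction.
    """
    log_ratio = math.log2(sample / reference)
    if log_ratio >= threshold:
        return "up"
    if log_ratio <= -threshold:
        return "down"
    return "unchanged"


# Made-up spot intensities: (sample, reference) for three genes.
spots = {"geneA": (800.0, 200.0), "geneB": (100.0, 400.0), "geneC": (210.0, 200.0)}
for gene, (sample, reference) in spots.items():
    print(gene, classify_expression(sample, reference))
```

Applied to every spot on the slide, a rule like this yields a genome-wide list of genes that are more or less active in one source than in the other.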
Proteomics Proteomics is the study of the proteome: -All the proteins encoded by the genome. - A single gene can code for multiple proteins using alternative splicing. Although all the DNA in a genome can be isolated from a single cell, only a portion of the proteome is expressed in a single cell or tissue. The transcriptome consists of all the RNA that is present in a cell or tissue.
Proteomics Proteins are much more difficult to study than DNA because of: -Post-translational modifications -Alternative splicing. However, databases containing the known protein structures exist: -These can be searched to predict the structure and function of gene sequences.
Applications of Genomics The genomics revolution will have a lasting effect on how we think about living systems; The immediate impact of genomics is being seen in diagnostics: -Identifying genetic abnormalities; -Identifying victims by their remains; -Distinguishing between naturally occurring and intentional outbreaks of infections.
Applications of Genomics Genomics has also helped in agriculture. -Improvement in the yield and nutritional quality of rice. -Doubling of world grain production in last 50 years, with only a 1% cropland increase.
Applications of Genomics Genome science is also a source of ethical challenges and dilemmas: -Gene patents -Should the sequence/use of genes be freely available or can it be patented? -Privacy concerns -Could one be discriminated against because their SNP profile indicates susceptibility to a disease?