A Lot More Advanced Biotechnology Tools: Sequencing (2007-2008)
DNA Sequencing: Sanger method. Determines the base sequence of DNA based on replication. Uses dideoxynucleotides (ddATP, ddGTP, ddTTP, ddCTP), which are missing the 3' O (the -OH group needed to bond the next nucleotide), so adding one terminates the growing chain.
DNA Sequencing: Sanger method. Synthesize a complementary DNA strand in vitro. In each of 4 tubes: template DNA, “normal” N-bases, one kind of dideoxy N-base (ddA, ddC, ddG, or ddT), DNA polymerase, primer, and buffers & salt.
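A minimal sketch of what happens inside one tube, assuming a hypothetical fixed chance (`p_terminate`) that the polymerase picks up the dideoxy base instead of the normal one each time it needs that base. The function name, parameters, and the 0.3 default are illustrative assumptions, not measured values; the point is that one tube yields a mix of fragments that all end in the same base.

```python
import random
from collections import Counter

def simulate_tube(new_strand, dd_base, p_terminate=0.3, copies=1000, seed=0):
    """Count fragment lengths produced in one Sanger tube (hypothetical model).

    new_strand: the complementary strand the polymerase builds, written 5'->3'.
    dd_base: the single dideoxy base added to this tube ('A', 'C', 'G' or 'T').
    p_terminate: assumed chance that a ddNTP (not the normal dNTP) is added
                 whenever dd_base is needed; in practice set by the dd:normal ratio.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    lengths = Counter()
    for _ in range(copies):
        for position, base in enumerate(new_strand, start=1):
            if base == dd_base and rng.random() < p_terminate:
                lengths[position] += 1  # chain stops: no 3'-OH for the next base
                break
        # copies that never pick up a ddNTP run to full length and are not counted
    return lengths

print(simulate_tube("ATGCCA", "C"))  # fragments of length 4 and 5, both ending in C
```

Every recorded fragment ends at a position where the new strand has the tube's base, which is why each tube (and each gel lane) reports the positions of only one base.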
Reading the sequence: Load a polyacrylamide gel with the fragments from the ddA, ddT, ddC, and ddG tubes in separate lanes. Shorter fragments migrate farther, so read the lanes manually & carefully from the bottom of the gel up.
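Reading the gel can be sketched as sorting bands by fragment length. This sketch assumes the template is written 3'->5' (so the new strand is its base-by-base complement, 5'->3'), ignores the primer, and assumes every possible termination shows up as a band; the names `sanger_lanes` and `read_gel` are hypothetical.

```python
# Base pairing used to build the new strand from the template.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_lanes(template_3to5):
    """Band positions (fragment lengths) in each ddN lane.

    Assumes the template is written 3'->5', so the new strand (5'->3') is its
    base-by-base complement, and that every possible termination occurs.
    The primer is ignored; lengths count only the newly added bases.
    """
    new_strand = "".join(PAIR[b] for b in template_3to5)
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)  # a fragment ending in ddN at this position
    return lanes

def read_gel(lanes):
    """Read bands from the bottom of the gel (shortest) to the top (longest)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = sanger_lanes("TACGGT")
print(lanes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3], 'T': [2]}
print(read_gel(lanes))  # ATGCCA, the new strand 5'->3'
```

Reading bottom-up gives the new strand; complementing it recovers the template.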
Fred Sanger, 1978 | 1980. His 1980 Nobel Prize was his 2nd! The 1st was in 1958, for the structure of insulin.
Advancements to sequencing: Fluorescent tagging. No more radioactivity; all 4 bases run in 1 lane, with each base tagged a different color. Automated reading.
Advancements to sequencing: Fluorescent tagging. Sequence data are read & analyzed by computer.
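With fluorescent tagging, all four reactions share one lane, so automated reading only needs the order in which fragments reach the detector and each fragment's color. A minimal sketch; the dye-to-base mapping below is a hypothetical assumption, since real dye sets and display colors depend on the chemistry and software.

```python
# Hypothetical dye-to-base mapping for illustration; real dye sets and display
# colours depend on the sequencing chemistry and software.
DYE_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}

def call_bases(peaks):
    """Turn detector peaks from ONE lane into a base sequence.

    peaks: (arrival_time, dye_colour) pairs. Shorter fragments move faster,
    so sorting by arrival time orders fragments from shortest to longest;
    each fragment's colour identifies the ddNTP that ended it.
    """
    return "".join(DYE_TO_BASE[colour] for _, colour in sorted(peaks))

peaks = [(3.1, "black"), (1.2, "green"), (2.0, "red"), (4.4, "blue")]
print(call_bases(peaks))  # ATGC
```

Compared with four separate lanes, the color replaces the lane as the label for the terminating base.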
Advancements to sequencing: Capillary tube electrophoresis. No more pouring gels; higher capacity & faster (384 lanes). Applied Biosystems, Inc. (ABI) built an industry on these machines.
Big labs! Economy of scale.
PUBLIC: Joint Genome Institute (DOE); MIT; Washington University in St. Louis; Baylor College of Medicine (Houston, TX); Sanger Center (UK)
PRIVATE: Celera Genomics (Rockville, MD & San Francisco, CA)
Automated sequencing machines: really BIG labs!
Human Genome Project
U.S. government project (DOE & NIH): begun in 1990 and estimated to be a 15-year project; initiated by Jim Watson, led by Francis Collins. Goal: sequence the entire human genome (3 billion base pairs).
Celera Genomics: private company. Craig Venter challenged the gov't, saying Celera would do it faster & cheaper.
1990-1995: build the technology groundwork: improve sequencing methods, build clones, build better data management systems (computer tools to find overlaps). Better, cheaper, faster!
1996-1998: painstaking sequencing work
1998: Celera Genomics challenge
2000: rough draft of human genome (90% of sequence, 99% accurate)
2001: 1st draft of human genome
2003: “finished” sequence of human genome, though telomeres & centromeres could not yet be sequenced
Different approaches
“Map-based method” (gov't method): 1. Cut DNA segment into fragments, arrange based on overlapping nucleotide sequences, and clone fragments. 2. Cut and clone into smaller fragments. 3. Assemble DNA sequence using overlapping sequences.
“Shotgun method” (Craig Venter's method): 1. Cut DNA of entire chromosome into small fragments and clone. 2. Sequence each segment & arrange based on overlapping nucleotide sequences.
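Arranging fragments "based on overlapping nucleotide sequences" can be sketched as a toy greedy assembler: repeatedly merge the two reads with the longest suffix-prefix overlap. This assumes error-free reads and no repeated regions, and `min_len` is a hypothetical threshold; it is a textbook simplification, not the assembler Celera or the HGP actually used.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the rest of a matches the start of b
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns the contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["GATTACA", "ACAGGCA", "GCATTC"]))  # ['GATTACAGGCATTC']
```

Repeats longer than the reads make greedy merging unreliable, which is one reason building better computer tools to find overlaps was part of the project's groundwork.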
Human Genome Project: On June 26, 2000, the HGP announced the “working draft” of the DNA sequence of the human genome, published in February 2001. Historic event! A blueprint of a human, with the potential to change science & medicine.
Sequence of 46 human chromosomes: 3 billion base pairs, about 3 GB of data
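The "3 GB of data" figure follows from simple arithmetic if each base is stored as one 8-bit text character (an assumption about the storage format). Because DNA has only 4 bases, a packed encoding needs just 2 bits per base. A minimal sketch; the A/C/G/T bit assignment is a hypothetical choice, and ambiguity letters such as the `Y` in the raw sample would need separate handling.

```python
# Assumption: plain text stores each base as one 8-bit character; a packed
# encoding needs only 2 bits because there are just 4 bases (A, C, G, T).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def storage_bytes(n_bases, bits_per_base):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    return (n_bases * bits_per_base + 7) // 8  # round up to whole bytes

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, 2 bits per base (4 bases per byte).

    The sequence length must be stored separately to unpack it later.
    """
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]  # raises KeyError for N, Y, etc.
    return value.to_bytes(storage_bytes(len(seq), 2), "big")

HUMAN_GENOME = 3_000_000_000  # base pairs, from the slide
print(storage_bytes(HUMAN_GENOME, 8))  # 3000000000 (~3 GB, one character per base)
print(storage_bytes(HUMAN_GENOME, 2))  # 750000000 (~0.75 GB, 2 bits per base)
```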
human genome 3.2 billion bases TACGCACATTTACGTACGCGGATGCCGCGACTATGATCACATAGACATGCTGTCAGCTCTAGTAGACTAGCTGACTCGACTAGCATGATCGATCAGCTACATGCTAGCACACYCGTACATCGATCCTGACATCGACCTGCTCGTACATGCTACTAGCTACTGACTCATGATCCAGATCACTGAAACCCTAGATCGGGTACCTATTACAGTACGATCATCCGATCAGATCATGCTAGTACATCGATCGATACTGCTACTGATCTAGCTCAATCAAACTCTTTTTGCATCATGATACTAGACTAGCTGACTGATCATGACTCTGATCCCGTAGATCGGGTACCTATTACAGTACGATCATCCGATCAGATCATGCTAGTACATCGATCGATACTGCTACTGATCTAGCTCAATCAAACTCTTTTTGCATCATGATACTAGACTAGCTGACTGATCATGACTCTGATCCCGTAGATCGGGTACCTATTACAGTACGATCATCCGATCAGATCATGCTAGTACATCGATCGATACT
Raw genome data
NCBI GenBank: database of genetic sequences gathered from research. Publicly available on the Web!
Organizing the data
Maps of human genes… Where the genes are… mapping genes & their mutant alleles
Defining a gene…
“Defining a gene is problematic because… one gene can code for several protein products, some genes code only for RNA, two genes can overlap, and there are many other complications.” – Elizabeth Pennisi, Science 2003
Estimated number of human genes: 1990s: thought humans had 100,000 genes; 2000: 40,000 was considered a good estimate; 2004: 30,000; 2006: 25,000 is our best estimate.
(Figure: a protein gene and an RNA gene; one gene coding for polypeptide 1, polypeptide 2, and polypeptide 3.)
How does the human genome stack up?
Organism                              Genome size (bases)   Estimated genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9
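One way to compare the rows is gene density: estimated genes per million bases. A minimal sketch using the rounded values from the table above; the mouse is skipped because no gene estimate is given for it, and the function name is hypothetical.

```python
# Values copied from the table; the mouse is skipped (no gene estimate given).
GENOMES = {
    "Human": (3_000_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}

def genes_per_million_bases(genome_size, genes):
    """Estimated genes divided by genome size in millions of bases."""
    return genes / (genome_size / 1_000_000)

# Print from least to most gene-dense.
for name, (size, genes) in sorted(GENOMES.items(), key=lambda kv: genes_per_million_bases(*kv[1])):
    print(f"{name:<13} {genes_per_million_bases(size, genes):8.1f} genes per million bases")
```

On these rounded estimates the human genome comes out least gene-dense (about 10 genes per million bases), while E. coli packs roughly 700: a bigger genome does not mean proportionally more genes.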