1
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
Chapter 4: Genome Sequencing
  Strategies and procedures for sequencing entire genomes
2
Contents
  The Human Genome Project
  Sequencing strategies
  Large-scale sequencing
  Accuracy and coverage
  EST sequencing
  Sequence annotation
3
Background
  Field of genomics began with the decision to sequence the human genome
  Size of the human genome is 3 billion base pairs, which necessitated new ways to do sequencing
  Approaches to sequencing the human genome
    Scale up existing techniques
    Develop new sequencing techniques
    Start with smaller genomes as a warm-up project
4
Goals of the Human Genome Project
  Sequence entire genome
    Not just transcribed genes
  Sequencing should be performed with a high level of accuracy
    One error in 10,000 bases
  Develop genomic resources that would be useful for all genes
    Example: collections of physical markers
  Develop economies of scale
5
Scale-up of existing technologies
  There has been remarkable improvement in sequencing efficiency since the invention of sequencing
  The amount of sequencing that one person can perform has increased dramatically
    1980: 0.1–1 kb per year
    1985: 2–10 kb per year
    1990: 25–50 kb per year
    1996: 100–200 kb per year
    2000: 500–1,000 kb per year
  Almost all large-scale sequencing is still based on Sanger chain-termination technology
6
New technologies
  A high-priority goal at the beginning of the Human Genome Project was to develop new mapping and sequencing technologies
  To date, no major breakthrough technology has been developed
    Possible exception: whole-genome shotgun sequencing applied to large genomes
7
Automated sequencers
  Perhaps the most important contribution to large-scale sequencing was the development of automated sequencers
  Most use the Sanger sequencing method
    Fluorescently labeled reaction products
    Capillary electrophoresis for separation
  Most commonly used automated sequencers are the following:
    ABI
    MegaBACE
8
Automated sequencers: ABI 3700
  Made by Applied Biosystems
  Most widely used automated sequencer
    96 capillaries
    Robotic loading from 384-well plates
    Two to three hours per run
    600–700 bases per run
  Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar
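As a rough back-of-the-envelope check on the ABI 3700 figures, the daily raw output of one instrument can be estimated as follows. This is only a sketch: the function name is hypothetical, the 600–700 bases figure is assumed to apply to each capillary, and the machine is assumed to run back to back around the clock with no failed lanes.

```python
def bases_per_day(capillaries, bases_per_read, hours_per_run):
    """Estimate raw bases per day for one instrument running continuously.

    Assumes every capillary yields one read of the given length per run
    and that there is no idle time between runs (both are simplifications).
    """
    runs_per_day = 24 / hours_per_run
    return capillaries * bases_per_read * runs_per_day

# Slide figures: 96 capillaries, 600-700 bases, 2-3 hours per run.
low = bases_per_day(96, 600, 3)    # slowest runs, shortest reads
high = bases_per_day(96, 700, 2)   # fastest runs, longest reads
print(f"{low:,.0f} to {high:,.0f} raw bases per day")
```

Under these assumptions a single machine produces roughly half a million to just under a million raw bases per day, which is comparable to the yearly per-person output quoted for 2000.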
9
Automated sequencers: MegaBACE
  Made by Amersham
  96 capillaries
  Robotic loading from 384-well plate
  Two to four hours per run
  Can read up to 800 bases
10
Automatic gel reading
  Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA
  Bottom image: computer image of sequence read by automated sequencer
11
Steps in genomic sequencing
  Library making
    Large-insert library from genome
  Production sequencing
    Generate fragments to be sequenced
    Perform sequencing reactions
    Determine sequence
  Finishing
    Assemble into continuous sequence
    Fill gaps
12
Library making
  Library of genomic fragments made in vector
    BAC, PAC, or YAC
  Usually have several-fold coverage
    Every DNA sequence on five to eight different clones
  Difficult and inefficient to sequence straight from large fragment
    Need to break into manageable pieces
  Random shearing
    By nebulization or sonication
13
Fragments for sequencing
  Generally use 2–10 kb pieces for sequencing
  Clone into sequencing vector
    Contains binding sites for sequencing primers
    Can be single stranded or double stranded
14
Sequence assembly
  Random sequences
    First assemble into overlapping sequences
    Then create one continuous sequence
  Program used for this operation is named PHRAP
    Analyzes each position to determine the following:
      Quality of sequence
      Consistency of the sequence of the same region acquired from different random fragments
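The idea of assembling random reads into overlapping and then continuous sequence can be illustrated with a toy greedy overlap merger. This is not how PHRAP works internally (PHRAP weighs per-base quality and tolerates sequencing errors); the sketch below assumes error-free reads, all on the same strand, and the function names and `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest exact overlap."""
    seqs = list(reads)
    while len(seqs) > 1:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = seqs[i] + seqs[j][n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]
    return seqs

reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGATTAG']
```

When no overlap of at least `min_len` bases remains, the function returns several contigs rather than one; those breaks correspond to the gaps that the finishing step has to fill.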
15
Sequence assembly readout
16
Finishing I
  Process of assembling raw sequence reads into accurate contiguous sequence
    Required to achieve 1/10,000 accuracy
  Manual process
    Look at sequence reads at positions where programs can’t tell which base is the correct one
    Fill gaps
    Ensure adequate coverage
  Figure labels: gap; single stranded
17
Finishing II
  To fill gaps in sequence, design primers and sequence from primer
  To ensure adequate coverage, find regions with insufficient coverage and use specific primers for those areas
  Figure labels: gap; primer
18
Verification
  Region verified for the following:
    Coverage
    Sequence quality
    Contiguity
  Determine restriction-enzyme cleavage sites
    Generate restriction map of sequenced region
    Must agree with fingerprint generated of clone during mapping step
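The restriction-map check can be sketched as an in-silico digest: find every recognition site in the assembled sequence, compute the predicted fragment sizes, and compare them with the fingerprint. The example is a minimal illustration, assuming a linear clone, a single enzyme (EcoRI, which cuts G^AATTC), and a made-up sequence and fingerprint; real comparisons allow a size tolerance because gel measurements are approximate.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical assembled sequence with two EcoRI sites.
assembled = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
predicted = digest(assembled, "GAATTC", 1)
print(predicted)  # [4, 11, 7]

# Hypothetical fingerprint (fragment sizes) recorded during mapping.
fingerprint = [4, 7, 11]
print(sorted(predicted) == sorted(fingerprint))  # True
```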
19
Map-based sequencing I
  Human Genome Project adopted a map-based strategy
    Start with well-defined physical map
    Produce shortest tiling path for large-insert clones
    Assemble the sequence for each clone
    Then assemble the entire sequence, based on the physical map
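Choosing a shortest tiling path can be viewed as an interval-cover problem: pick the fewest mapped clones that together span the region. The greedy sketch below is one standard way to solve that abstract problem, not a description of the software the project used; the clone names and coordinates are invented, and the sketch ignores the minimum overlap between neighbouring clones that a real project would require.

```python
def tiling_path(clones, region_start, region_end):
    """Greedy minimal set of clones covering [region_start, region_end).

    clones is a list of (name, start, end) positions on the physical map.
    At each step, take the clone that starts within the covered part and
    reaches furthest to the right.
    """
    path = []
    covered = region_start
    while covered < region_end:
        candidates = [c for c in clones if c[1] <= covered and c[2] > covered]
        if not candidates:
            raise ValueError(f"gap in clone map at position {covered}")
        best = max(candidates, key=lambda c: c[2])
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical clone map (positions in kb).
clones = [("A", 0, 150), ("B", 50, 200), ("C", 120, 300),
          ("D", 140, 260), ("E", 280, 400)]
print(tiling_path(clones, 0, 400))  # ['A', 'C', 'E']
```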
20
Map-based sequencing II
  Construct clone map and select mapped clones
  Generate several thousand sequence reads per clone
  Assemble
21
Whole-genome shotgun sequencing I
  Developed by Celera
    Subsidiary of Applied Biosystems, maker of automated sequencers
  No mapping
  Instead, the whole genome is sheared
    Randomly sequenced
22
Whole-genome shotgun sequencing II
  Generate tens of millions of sequence reads
  Assemble
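The read counts quoted for the two strategies ("several thousand" per clone, "tens of millions" for the whole genome) follow from simple arithmetic: reads = target size × coverage / read length. The sketch below assumes 10x coverage and 600-base reads (figures that appear elsewhere in these slides) and a 150 kb large-insert clone, which is an illustrative assumption rather than a number from the slides.

```python
import math

def reads_needed(target_bases, coverage, read_length):
    """Number of reads required to reach the given average coverage."""
    return math.ceil(target_bases * coverage / read_length)

per_clone = reads_needed(150_000, 10, 600)           # one assumed 150 kb clone
whole_genome = reads_needed(3_000_000_000, 10, 600)  # 3 billion bp human genome

print(f"{per_clone:,} reads per clone")      # 2,500 reads per clone
print(f"{whole_genome:,} reads per genome")  # 50,000,000 reads per genome
```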
23
Whole-genome shotgun sequencing III
  Major challenge: assembly
    Repetitive elements are the biggest problem
    Performed on very high-speed computers, using novel software
  Key to assembly is paired reads
    Sequence both ends of each clone
24
Controversy: Map-based sequencing vs. whole-genome shotgun sequencing
  Celera used publicly funded sequence to produce its published draft of the human genome
  Scientists who worked on the map-based effort claimed Celera couldn’t have produced a draft without access to the public sequence
  Celera scientists claimed that they could have produced an accurate draft even without the public sequence
25
Hybrid approach
  Combines aspects of both map-based and whole-genome shotgun approaches
    Map clones
    Sequence some of the mapped clones
    Do whole-genome sequencing
    Combine information from both methods
      Use sequence from mapped clones as scaffold to assemble whole-genome shotgun reads
  Used for sequencing the mouse genome
27
Completed genomes as of 2002

Organism                               Base pairs      Whole-genome shotgun  Map-based  Hybrid
> 40 bacteria                          0.8–6 million   +                     –          –
Yeast                                  15 million      –                     +          –
C. elegans (roundworm)                 100 million     –                     +          –
Drosophila (fruit fly)                 120 million     +                     –          –
Arabidopsis (thale cress)              130 million     –                     +          –
Rice                                   435 million     –                     +          –
Human                                  3 billion       +                     +          –
Mouse                                  2.5 billion     –                     –          +
Fugu (puffer fish)                     365 million     +                     –          –
Anopheles (malaria-carrying mosquito)  278 million     +                     –          –
28
Sizes of genomes and numbers of genes
29
Sequencing parameters
  Difficulty and cost of large-scale sequencing projects depend on the following parameters:
    Accuracy
      How many errors are tolerated
    Coverage
      How many times the same region is sequenced
  The two parameters are related
    More coverage usually means higher accuracy
    Accuracy is also dependent on the finishing effort
30
Sequence accuracy
  Highly accurate sequences are needed for the following:
    Diagnostics
      e.g., forensics, identifying disease alleles in a patient
    Protein coding prediction
      One insertion or deletion changes the reading frame
  Lower accuracy sufficient for homology searches
    Differences in sequence are tolerated by search programs
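Why a single insertion or deletion is so damaging to protein-coding prediction can be seen by splitting a sequence into codons before and after one base is lost. The short sequence below is invented for illustration, and the helper name is hypothetical.

```python
def codons(seq, frame=0):
    """Split a sequence into complete codons starting at the given offset."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

original = "ATGGCCAAGTTTGGA"
# Simulate a sequencing error that drops one base (the C at index 4).
mutated = original[:4] + original[5:]

print(codons(original))  # ['ATG', 'GCC', 'AAG', 'TTT', 'GGA']
print(codons(mutated))   # ['ATG', 'GCA', 'AGT', 'TTG']
```

Every codon downstream of the missing base changes, whereas a single substitution would alter at most one codon; this is why search programs tolerate substitutions far better than gene prediction tolerates indels.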
31
Sequence accuracy and sequencing cost
  Level of accuracy determines cost of project
    Increasing accuracy from one error in 100 to one error in 10,000 increases costs three- to fivefold
  Need to determine appropriate level of accuracy for each project
    If reference sequence already exists, then a lower level of accuracy should suffice
    Can find genes in genome, but not their position
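The two accuracy levels mentioned above can be expressed on the Phred quality scale, Q = -10 · log10(P), where P is the per-base error probability. The slides do not mention Phred scores; they are brought in here only as a common way to state accuracy. The sketch also estimates how many errors each level would leave in a 3 billion bp genome.

```python
import math

def phred(error_probability):
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(error_probability)

def expected_errors(genome_size, error_probability):
    """Expected number of wrong bases at a uniform error rate."""
    return genome_size * error_probability

for label, p in (("1 in 100", 1 / 100), ("1 in 10,000", 1 / 10_000)):
    q = round(phred(p))
    errors = round(expected_errors(3_000_000_000, p))
    print(f"{label}: about Q{q}, about {errors:,} errors in 3 billion bases")
```

One error in 100 corresponds to roughly Q20 and one error in 10,000 to roughly Q40, a hundredfold reduction in expected errors.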
32
Sequencing coverage
  Coverage is the number of times the same region is sequenced
    Ideally, one wants an equal number of sequences in each direction
  To obtain accuracy of one error in 10,000 bases, one needs the following:
    10x coverage
    Stringent finishing
  Complete sequence
    Base-perfect sequencing
33
Rough-draft and skimming sequence
  Rough-draft sequence refers to an average of 5x coverage
  Skimming is 1–3x coverage
    Obtains 67%–97% of the sequence
    On average, 99% accurate
  Of greatest use when the sequence can be compared to a reference sequence
    For example, chimpanzee genome compared with human genome
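The link between coverage and the fraction of the genome actually obtained can be approximated with the idealized Lander-Waterman (Poisson) model, in which the expected fraction covered at coverage c is 1 - e^(-c). The slides do not state which model lies behind the 67%–97% figure, so the sketch below is offered only as a plausibility check; it assumes reads are placed uniformly at random and ignores cloning bias and repeats.

```python
import math

def fraction_covered(coverage):
    """Expected fraction of the genome hit by at least one read (Poisson model)."""
    return 1 - math.exp(-coverage)

for c in (1, 3, 5, 10):
    print(f"{c:>2}x coverage: {fraction_covered(c):.3%} of the sequence")
```

The model gives about 63% at 1x and about 95% at 3x, in the same range as the figures quoted for skimming, and better than 99% at the 5x used for rough drafts.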
34
Industrialization of sequencing
  Most large-scale sequencing projects divide tasks among different teams
    Large-insert libraries
    Production sequencing
    Finishing
  Sequencing machines run 24/7
  Many tasks performed by robots
35
EST sequencing I
  Idea: sequence only “important” genes
    Those genes expressed in a particular tissue
  Sequence random cDNAs made from RNA extracted from tissue of interest
  Figure labels: muscle mRNA; cDNA libraries; “New” Biolims; robotized stations; DNA sequencers
36
EST sequencing II
  Make cDNA library
  Select clones at random
  Sequence in from one or both ends
    One-pass sequencing
  The resulting sequence = expressed sequence tag (EST)
  Figure labels: 5’; 3’; cDNA; partial sequence = EST
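Sequencing in from both ends of a cDNA clone can be mimicked in a few lines: the 5’ EST is the start of the cDNA, and the 3’ EST is read off the opposite strand, so it is the reverse complement of the cDNA's end. The cDNA and the very short read length below are invented for illustration (real one-pass reads are hundreds of bases), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def est_reads(cdna, read_length):
    """Simulate single-pass reads from the 5' and 3' ends of a cDNA insert."""
    five_prime = cdna[:read_length]
    three_prime = reverse_complement(cdna)[:read_length]
    return five_prime, three_prime

cdna = "ATGGCCAAGTTTGGATAA"
print(est_reads(cdna, 6))  # ('ATGGCC', 'TTATCC')
```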
37
EST sequencing: pros and cons
  Advantages
    Relatively inexpensive
    Certainty that sequence comes from transcribed gene
    Information about tissue and developmental stage
  Disadvantages
    No regulatory information
    Usually less than 60% of genes found in EST collections
    Location of sequence in genome unknown
38
Sequence annotation
  Annotation performed on completed sequence
  Computer programs used to find the following:
    Genes
    Exons and introns
    Regulatory sequences
    Repetitive elements
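The simplest kind of gene-finding program scans for open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon. The sketch below is a deliberately naive illustration with hypothetical names; it searches only the forward strand and knows nothing about introns, so it is closer to what works for bacterial genes than to real eukaryotic annotation software.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts the
    start and stop codons as well.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

# Hypothetical sequence: a short ORF flanked by non-coding bases.
seq = "CC" + "ATGGCCAAGTAA" + "GG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # 2 14 ATGGCCAAGTAA
```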
39
Summary I
  Human Genome Project
    Goals
  Automated sequencers
  Sequencing strategies
    Map-based
    Whole-genome shotgun
    Hybrid
40
Summary II
  Steps in large-scale sequencing
    Large-insert libraries
    Production sequencing
    Finishing
  Accuracy and coverage
  EST sequencing
  Sequence annotation