1 © 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458 Chapter 4 Genome Sequencing Strategies and procedures for sequencing entire genomes

2 Contents
- The Human Genome Project
- Sequencing strategies
- Large-scale sequencing
- Accuracy and coverage
- EST sequencing
- Sequence annotation

3 Background
- The field of genomics began with the decision to sequence the human genome
- The human genome is 3 billion base pairs in size, which necessitated new ways to do sequencing
- Approaches to sequencing the human genome
  - Scale up existing techniques
  - Develop new sequencing techniques
  - Start with smaller genomes as warm-up projects

4 Goals of the Human Genome Project
- Sequence entire genome
  - Not just transcribed genes
- Sequencing should be performed with a high level of accuracy
  - One error in 10,000 bases
- Develop genomic resources that would be useful for all genes
  - Example: collections of physical markers
- Develop economies of scale

5 Scale-up of existing technologies
- There has been remarkable improvement in sequencing efficiency since the invention of sequencing
- The amount of sequencing that one person can perform has increased dramatically
  - 1980: 0.1–1 kb per year
  - 1985: 2–10 kb per year
  - 1990: 25–50 kb per year
  - 1996: 100–200 kb per year
  - 2000: 500–1,000 kb per year
- Almost all large-scale sequencing is still based on Sanger chain-termination technology

6 New technologies
- A high-priority goal at the beginning of the Human Genome Project was to develop new mapping and sequencing technologies
- To date, no major breakthrough technology has been developed
  - Possible exception: whole-genome shotgun sequencing applied to large genomes

7 Automated sequencers
- Perhaps the most important contribution to large-scale sequencing was the development of automated sequencers
- Most use the Sanger sequencing method
  - Fluorescently labeled reaction products
  - Capillary electrophoresis for separation
- The most commonly used automated sequencers are the following:
  - ABI
  - MegaBACE

8 Automated sequencers: ABI 3700
- Made by Applied Biosystems
- Most widely used automated sequencer:
  - 96 capillaries
  - Robotic loading from 384-well plates
  - Two to three hours per run
  - 600–700 bases per run
[Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar]
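
As a rough illustration of what these run parameters imply for throughput, the sketch below computes raw bases per instrument per day. It assumes that the 600–700 bases are read from each of the 96 capillaries in a run and that the machine runs around the clock with no reload time; both are assumptions rather than figures from the slide, and `bases_per_day` is a hypothetical helper name.

```python
def bases_per_day(capillaries, bases_per_read, hours_per_run):
    """Raw bases produced per instrument per day under continuous operation."""
    runs_per_day = 24 / hours_per_run
    return capillaries * bases_per_read * runs_per_day


# Pessimistic case: longest run time, shortest read.
low = bases_per_day(capillaries=96, bases_per_read=600, hours_per_run=3)
# Optimistic case: shortest run time, longest read.
high = bases_per_day(capillaries=96, bases_per_read=700, hours_per_run=2)

print(f"{low:,.0f} to {high:,.0f} raw bases per day")
```

Under these assumptions a single instrument yields somewhat under one million raw bases per day, which is why large projects ran many machines in parallel.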

9 Automated sequencers: MegaBACE
- Made by Amersham
- 96 capillaries
- Robotic loading from 384-well plate
- Two to four hours per run
- Can read up to 800 bases

10 Automatic gel reading
- Top image: confocal detection by the MegaBACE sequencer of fluorescently labeled DNA
- Bottom image: computer image of sequence read by automated sequencer

11 Steps in genomic sequencing
- Library making
  - Large-insert library from genome
- Production sequencing
  - Generate fragments to be sequenced
  - Perform sequencing reactions
  - Determine sequence
- Finishing
  - Assemble into continuous sequence
  - Fill gaps

12 Library making
- Library of genomic fragments made in vector
  - BAC, PAC, or YAC
- Usually have several-fold coverage
  - Every DNA sequence on five to eight different clones
- Difficult and inefficient to sequence straight from large fragment
  - Need to break into manageable pieces
- Random shearing
  - By nebulization or sonication

13 Fragments for sequencing
- Generally use 2–10 kb pieces for sequencing
- Clone into sequencing vector
  - Contains binding sites for sequencing primers
  - Can be single stranded or double stranded

14 Sequence assembly
- Random sequences
  - First assemble into overlapping sequence
  - Then create one continuous sequence
- The program used for this operation is named PHRAP
  - Analyzes each position to determine the following:
    - Quality of sequence
    - Consistency of sequence of the same region acquired from different random fragments
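
The two-stage idea on this slide (find overlaps, then merge into one continuous sequence) can be sketched with a toy greedy assembler. This is not how PHRAP works internally: it ignores base quality, sequencing errors, and reverse-complement reads, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

Greedy merging of exact overlaps breaks down on repeats and on reads with errors, which is where the per-position quality and consistency checks listed on the slide come in.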

15 Sequence assembly readout

16 Finishing I
- Process of assembling raw sequence reads into accurate contiguous sequence
  - Required to achieve 1/10,000 accuracy
- Manual process
  - Look at sequence reads at positions where programs can't tell which base is the correct one
  - Fill gaps
  - Ensure adequate coverage
[Figure labels: Gap; Single stranded]

17 Finishing II
- To fill gaps in the sequence, design primers and sequence from each primer
- To ensure adequate coverage, find regions with insufficient coverage and use specific primers for those areas
[Figure labels: Gap; Primer]

18 Verification
- Region verified for the following:
  - Coverage
  - Sequence quality
  - Contiguity
- Determine restriction-enzyme cleavage sites
  - Generate restriction map of sequenced region
  - Must agree with the fingerprint of the clone generated during the mapping step
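
The restriction-map check can be sketched as an in-silico digest: predict fragment sizes from the finished sequence and compare them with the sizes measured for the clone. The example uses EcoRI (recognition site GAATTC, cut after the first G) on a made-up linear sequence; the function names, the 5% tolerance, and the test sequence are illustrative assumptions.

```python
def restriction_fragments(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def agrees_with_fingerprint(predicted, observed, tolerance=0.05):
    """True if both digests give the same number of fragments with similar sizes."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= tolerance * o for p, o in pairs)


# Made-up 26-base sequence with two EcoRI sites.
finished = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
predicted = restriction_fragments(finished, site="GAATTC", cut_offset=1)
print(predicted)
```

A mismatch in fragment count or size suggests a misassembly, such as a collapsed repeat or a missing segment.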

19 Map-based sequencing I
- Human Genome Project adopted a map-based strategy
  - Start with well-defined physical map
  - Produce shortest tiling path for large-insert clones
  - Assemble the sequence for each clone
  - Then assemble the entire sequence, based on the physical map
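
Choosing a tiling path can be viewed as an interval-cover problem: given clones with known start and end positions on the physical map, pick the fewest clones that cover the region. The sketch below uses the standard greedy rule (always take the clone that extends coverage furthest), reading "shortest" as "fewest clones", which is an assumption. The clone names and coordinates are invented.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Return names of a smallest set of clones covering [region_start, region_end).

    clones is a list of (name, start, end) map positions.
    """
    path = []
    covered_to = region_start
    while covered_to < region_end:
        # Clones that overlap the current frontier and extend beyond it.
        candidates = [c for c in clones if c[1] <= covered_to < c[2]]
        if not candidates:
            raise ValueError(f"gap in clone map at position {covered_to}")
        best = max(candidates, key=lambda c: c[2])
        path.append(best[0])
        covered_to = best[2]
    return path


clone_map = [
    ("A", 0, 100), ("B", 50, 150), ("C", 90, 220),
    ("D", 140, 260), ("E", 200, 300),
]
path = minimal_tiling_path(clone_map, region_start=0, region_end=300)
print(path)
```

Only the clones on the path need to be sequenced; the overlaps between neighbors are what later allow the per-clone sequences to be joined.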

20 Map-based sequencing II
- Construct clone map and select mapped clones
- Generate several thousand sequence reads per clone
- Assemble

21 Whole-genome shotgun sequencing I
- Developed by Celera
  - Subsidiary of Applied Biosystems, maker of automated sequencers
- No mapping
  - Instead, the whole genome is sheared
  - Randomly sequenced

22 Whole-genome shotgun sequencing II
- Generate tens of millions of sequence reads
- Assemble

23 Whole-genome shotgun sequencing III
- Major challenge: assembly
  - Repetitive elements are the biggest problem
  - Performed on very high-speed computers, using novel software
- Key to assembly is paired reads
  - Sequence both ends of each clone
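
Paired reads help with repeats because the two end reads of one clone must lie roughly one insert length apart. A minimal sketch of that constraint follows: a read that matches two repeat copies is placed using the position of its uniquely placed mate. The function name, the 20% tolerance, and the coordinates are illustrative assumptions; read orientation is ignored.

```python
def consistent_placements(candidate_positions, mate_position, insert_size, tolerance=0.2):
    """Keep candidate positions whose distance to the mate fits the clone insert size."""
    allowed = tolerance * insert_size
    return [
        pos for pos in candidate_positions
        if abs(abs(mate_position - pos) - insert_size) <= allowed
    ]


# A read matches two copies of a repeat; its mate maps uniquely at 3,100.
# The clone insert is about 2 kb, so only the nearby copy is plausible.
placements = consistent_placements(
    candidate_positions=[1200, 48000], mate_position=3100, insert_size=2000
)
print(placements)
```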

24 Controversy: Map-based sequencing vs. whole-genome shotgun sequencing
- Celera used publicly funded sequence to produce its published draft of the human genome
- Scientists who worked on the map-based effort claimed Celera couldn't have produced a draft without access to the public sequence
- Celera scientists claimed that they could have produced an accurate draft even without the public sequence

25 Hybrid approach
- Combines aspects of both map-based and whole-genome shotgun approaches
  - Map clones
  - Sequence some of the mapped clones
  - Do whole-genome sequencing
- Combine information from both methods
  - Use sequence from mapped clones as scaffold to assemble whole-genome shotgun reads
- Used for sequencing the mouse genome


27 Completed genomes as of 2002

Organism                                Base pairs      Whole-genome shotgun  Map-based  Hybrid
> 40 Bacteria                           0.8–6 million   +                     -          -
Yeast                                   15 million      -                     +          -
C. elegans (roundworm)                  100 million     -                     +          -
Drosophila (fruit fly)                  120 million     +                     -          -
Arabidopsis (thale cress)               130 million     -                     +          -
Rice                                    435 million     -                     +          -
Human                                   3 billion       +                     +          -
Mouse                                   2.5 billion     -                     -          +
Fugu (puffer fish)                      365 million     +                     -          -
Anopheles (malaria-carrying mosquito)   278 million     +                     -          -

28 Sizes of genomes and numbers of genes

29 Sequencing parameters
- Difficulty and cost of large-scale sequencing projects depend on the following parameters:
  - Accuracy
    - How many errors are tolerated
  - Coverage
    - How many times the same region is sequenced
- The two parameters are related
  - More coverage usually means higher accuracy
  - Accuracy is also dependent on the finishing effort

30 Sequence accuracy
- Highly accurate sequences are needed for the following:
  - Diagnostics
    - e.g., forensics, identifying disease alleles in a patient
  - Protein coding prediction
    - One insertion or deletion changes the reading frame
- Lower accuracy sufficient for homology searches
  - Differences in sequence are tolerated by search programs
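
The reading-frame point can be made concrete by splitting a coding sequence into codons and counting how many change. In the sketch below a single substituted base alters one codon, while a single deleted base alters every codon downstream and removes the stop codon from the frame. The 15-base sequence and the helper names are made up for illustration.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def changed_codons(reference, variant):
    """Count codons that differ, including codons gained or lost at the end."""
    ref, var = codons(reference), codons(variant)
    differing = sum(1 for r, v in zip(ref, var) if r != v)
    return differing + abs(len(ref) - len(var))


reference = "ATGGCCAAGTTGTAA"      # ATG GCC AAG TTG TAA
substitution = "ATGGTCAAGTTGTAA"   # one base changed (C -> T at index 4)
deletion = "ATGGCAAGTTGTAA"        # one base deleted (index 4)

print(codons(substitution), changed_codons(reference, substitution))
print(codons(deletion), changed_codons(reference, deletion))
```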

31 Sequence accuracy and sequencing cost
- Level of accuracy determines cost of project
  - Increasing accuracy from one error in 100 to one error in 10,000 increases costs three- to fivefold
- Need to determine appropriate level of accuracy for each project
  - If reference sequence already exists, then a lower level of accuracy should suffice
  - Can find genes in genome, but not their position
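
Error rates like these are commonly expressed on the Phred quality scale, Q = -10 log10(p), where p is the probability that a base is wrong. The slides do not use this scale, so it is introduced here only as a convenient way to compare the two accuracy levels; the genome size is the 3 billion base pairs quoted earlier, and the function names are hypothetical.

```python
import math


def phred_quality(error_rate):
    """Phred-scaled quality score for a per-base error probability."""
    return -10 * math.log10(error_rate)


def expected_errors(error_rate, length):
    """Expected number of wrong bases in a sequence of the given length."""
    return error_rate * length


HUMAN_GENOME_BP = 3_000_000_000

for label, rate in [("1 in 100", 1 / 100), ("1 in 10,000", 1 / 10_000)]:
    q = phred_quality(rate)
    errors = expected_errors(rate, HUMAN_GENOME_BP)
    print(f"{label}: Q{q:.0f}, about {errors:,.0f} expected errors genome-wide")
```

On this scale the two accuracy levels correspond to Q20 and Q40.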

32 Sequencing coverage
- Coverage is the number of times the same region is sequenced
  - Ideally, one wants an equal number of sequences in each direction
- To obtain accuracy of one error in 10,000 bases, one needs the following:
  - 10× coverage
  - Stringent finishing
- Complete sequence
  - Base-perfect sequencing

33 Rough-draft and skimming sequence
- Rough-draft sequence refers to an average of 5× coverage
- Skimming is 1–3× coverage
  - Obtains 67%–97% of the sequence
  - On average, 99% accurate
- Of greatest use when the sequence can be compared to a reference sequence
  - For example, chimpanzee genome compared with human genome
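
The link between coverage and the fraction of sequence obtained can be approximated with the idealized Lander-Waterman (Poisson) model, in which a base is missed with probability e^(-c) at coverage c. The model is not named in the slides and assumes uniformly random reads, so its values for skimming (about 63% at 1× and 95% at 3×) are close to, but not the same as, the 67%–97% quoted above. The 600-base read length in the usage example is also an assumption.

```python
import math


def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1 - math.exp(-coverage)


def reads_needed(genome_size, coverage, read_length):
    """Number of reads required to reach a given average coverage."""
    return coverage * genome_size / read_length


for c in (1, 3, 5, 10):
    print(f"{c}x coverage: {fraction_covered(c):.2%} of bases covered")

# Reads for 10x coverage of a 3-billion-base genome with 600-base reads.
print(f"{reads_needed(3e9, 10, 600):,.0f} reads")
```

The read count for 10× coverage of a human-sized genome comes out in the tens of millions, which matches the scale mentioned for whole-genome shotgun sequencing.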

34 Industrialization of sequencing
- Most large-scale sequencing projects divide tasks among different teams
  - Large-insert libraries
  - Production sequencing
  - Finishing
- Sequencing machines run 24/7
- Many tasks performed by robots

35 EST sequencing I
- Idea: sequence only "important" genes
  - Those genes expressed in a particular tissue
- Sequence random cDNAs made from RNA extracted from tissue of interest
[Figure labels: Muscle; mRNA; cDNA libraries; "New" Biolims; Robotized stations; DNA sequencers]

36 EST sequencing II
- Make cDNA library
- Select clones at random
- Sequence in from one or both ends
  - One-pass sequencing
- The resulting sequence = expressed sequence tag (EST)
[Figure labels: 5'; 3'; cDNA; Partial sequence = EST]
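
Reading in from both ends of a cDNA clone can be sketched as taking a prefix of the insert for the 5' read and a prefix of its reverse complement for the 3' read. The cDNA string and read length below are toy values, the function names are hypothetical, and real one-pass reads would also contain sequencing errors and vector sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def est_reads(cdna, read_length):
    """Return the (5' EST, 3' EST) pair from a single pass at each end of the clone."""
    five_prime = cdna[:read_length]
    three_prime = reverse_complement(cdna)[:read_length]
    return five_prime, three_prime


five, three = est_reads("ATGGCCAAGTTGTAAGC", read_length=5)
print(five, three)
```

When the cDNA is longer than two read lengths the two ESTs do not overlap, so the middle of the transcript remains unsequenced; this is why an EST is only a tag for the gene.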

37 EST sequencing: pros and cons
- Advantages
  - Relatively inexpensive
  - Certainty that sequence comes from transcribed gene
  - Information about tissue and developmental stage
- Disadvantages
  - No regulatory information
  - Usually less than 60% of genes found in EST collections
  - Location of sequence in genome unknown

38 Sequence annotation
- Annotation performed on completed sequence
- Computer programs used to find the following:
  - Genes
  - Exons and introns
  - Regulatory sequences
  - Repetitive elements

39 Summary I
- Human Genome Project
  - Goals
- Automated sequencers
- Sequencing strategies
  - Map-based
  - Whole-genome shotgun
  - Hybrid

40 Summary II
- Steps in large-scale sequencing
  - Large-insert libraries
  - Production sequencing
  - Finishing
- Accuracy and coverage
- EST sequencing
- Sequence annotation

