Sequence the 3 billion base pairs of human


1 Sequence the 3 billion base pairs of human DNA and identify the 100,000 genes contained in the human genome

2 Goals of the Human Genome Project
1. Sequence:
- Human x 10^9
- Mouse x 10^9
- Drosophila x 10^8
- Worm x 10^8
- Dictyostelium x 10^7
- Yeast x 10^7
- Bacteria x 10^6
BCM-HGSC

3 Goals of the Human Genome Project
2. Characterize all genes and enable studies of genetics, evolution and function.

4 BERMUDA 1996
- ‘Primary Genomic Sequence Should be in the Public Domain’
- ‘Primary Genomic Sequence Should be Rapidly Released’

5 Quality
- < 1 error / 10,000 bases (polymorphism rate is 1/1,000)
- No gaps or ‘mis-assemblies’
- Merit for high quality data only
- ‘Slippery Slope’ arguments
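
The accuracy target on this slide can be placed on a logarithmic quality scale. The sketch below assumes the Phred convention, Q = -10 log10(p), which the slides do not name; the function names are illustrative only.

```python
import math


def error_rate_to_phred(p_error: float) -> float:
    """Convert a per-base error probability to a Phred-style quality score."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error_rate(q: float) -> float:
    """Inverse conversion: quality score back to error probability."""
    return 10.0 ** (-q / 10.0)


# Figures taken from the slide.
finished_target = 1 / 10_000    # < 1 error per 10,000 bases
polymorphism_rate = 1 / 1_000   # about 1 difference per 1,000 bases

print(round(error_rate_to_phred(finished_target)))    # 40
print(round(error_rate_to_phred(polymorphism_rate)))  # 30
```

On this scale the finishing target sits ten quality units above the polymorphism rate, i.e. sequencing errors are meant to be ten times rarer than real differences between individuals.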

6 Technology
- ABD 4 color Fluorescence
- Mapped-Clone Approach
- Random Phase
- Directed Phase
- Modular, 96 well Automation

7 3.0 Gb by Oct 2005? (Feb ‘98)

8 May 98: P/E ‘Celera’ Scheme
- New Capillary Instrument
- >10 runs/day x 96 samples
- Total 230 Instruments
- $330M Private Funds
- Total 250,000 reads/day
- Whole Genome Shotgun
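
The throughput figures on this slide can be cross-checked with simple arithmetic. This is a sketch using only the slide's numbers; treating ">10 runs/day" as exactly 10 is an assumption that gives a lower bound.

```python
INSTRUMENTS = 230        # from the slide
RUNS_PER_DAY = 10        # slide says ">10", so this is a floor (assumption)
SAMPLES_PER_RUN = 96     # from the slide

# Lower bound on daily reads if every instrument does exactly 10 runs.
reads_per_day_floor = INSTRUMENTS * RUNS_PER_DAY * SAMPLES_PER_RUN


def runs_needed(target_reads_per_day: int,
                instruments: int = INSTRUMENTS,
                samples: int = SAMPLES_PER_RUN) -> float:
    """Runs per instrument per day required to hit a daily read target."""
    return target_reads_per_day / (instruments * samples)


print(reads_per_day_floor)               # 220800
print(round(runs_needed(250_000), 2))    # 11.32
```

The stated total of 250,000 reads/day is therefore consistent with a little over 11 runs per instrument per day.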

9 P/E ‘Celera’ Scheme: Release Policy
- ‘Public Release’, 3 months Delay
- Consensus sequence only
- All SNPs held
- Drosophila, Mouse

10 Regional mapping

11 Regional mapping

12 Regional mapping
Minimal tiling path selected for sequencing.
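
Selecting a minimal tiling path can be sketched as a greedy interval cover: repeatedly take, among the clones that start inside the region already covered, the one that reaches farthest. The clone names, coordinates and the function `minimal_tiling_path` below are hypothetical and are not taken from the slides; a real path would also require enough overlap between neighbours to verify the join.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in map units; hypothetical


def minimal_tiling_path(clones: List[Clone],
                        region_start: int,
                        region_end: int) -> List[str]:
    """Return names of a smallest set of clones covering the region."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Examine every clone that starts at or before the covered position
        # and remember the one that extends coverage the farthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical clone map, in arbitrary map units.
clones = [
    ("A", 0, 100), ("B", 40, 150), ("C", 90, 210), ("D", 140, 260),
    ("E", 200, 300), ("F", 250, 340), ("G", 280, 400),
]
print(minimal_tiling_path(clones, 0, 400))  # ['A', 'C', 'E', 'G']
```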

13 Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- Hind III digest
- 1% agarose
- Molecular weight marker every 5th lane
- Gel size labels: >20 kbp, ~300 bp

14 Contig assembly
- FPC*: overlap identification by restriction pattern similarities
- Facilitated contig assembly
- All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
- Figure: clones A, B, C, D, E, F, G; legend: insert fragments, vector fragments
*Sanger Centre: C. Soderlund, I. Longden and R. Mott
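
Overlap identification by restriction pattern similarity can be illustrated by counting fragments of matching size in two fingerprints. The sketch below is a simplified stand-in and not FPC's own scoring method; the tolerance, the fragment sizes and the function names are all assumptions made for illustration.

```python
from typing import List


def shared_fragments(fp_a: List[int], fp_b: List[int],
                     tolerance: int = 10) -> int:
    """Count one-to-one fragment pairs whose sizes differ by <= tolerance."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    # Walk both sorted size lists in parallel, pairing fragments greedily.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def overlap_score(fp_a: List[int], fp_b: List[int],
                  tolerance: int = 10) -> float:
    """Shared fragments as a fraction of the smaller fingerprint."""
    smaller = min(len(fp_a), len(fp_b))
    if smaller == 0:
        return 0.0
    return shared_fragments(fp_a, fp_b, tolerance) / smaller


# Hypothetical fragment sizes in bp for three clones.
clone_a = [350, 1200, 2400, 5100, 8800]
clone_b = [1205, 2395, 5100, 7000, 15000]
clone_c = [400, 900, 3000, 6100]

print(overlap_score(clone_a, clone_b))  # 0.6
print(overlap_score(clone_a, clone_c))  # 0.0
```

In this toy data clones A and B share three of five fragments and would be candidates for the same contig, while A and C share none.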

15 Shotgun Sequencing I: RANDOM PHASE
- Sheared DNA: kb
- BAC Clone: kb
- Random Reads
- Sequencing Templates:
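
The random phase can be pictured as reads landing at random positions along a clone. The simulation below is a toy sketch: the 100 kb clone length and 500 bp read length echo figures that appear later in the deck, while the read count (chosen to give 8-fold depth), the seed and the function name are assumptions.

```python
import random
from typing import List


def simulate_shotgun(clone_len: int, read_len: int,
                     n_reads: int, seed: int = 0) -> List[int]:
    """Place reads uniformly at random and return per-base read depth."""
    rng = random.Random(seed)
    depth = [0] * clone_len
    for _ in range(n_reads):
        # Start positions are chosen so every read lies fully inside the clone.
        start = rng.randint(0, clone_len - read_len)
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth


CLONE_LEN = 100_000   # ~100 kb mapped project
READ_LEN = 500        # 500 bp reads
N_READS = 1_600       # assumption: 1,600 x 500 bp = 8-fold depth

depth = simulate_shotgun(CLONE_LEN, READ_LEN, N_READS)
mean_depth = sum(depth) / len(depth)
uncovered = sum(1 for d in depth if d == 0)

print(mean_depth)   # 8.0
print(uncovered)    # bases with no read; varies with the seed
```

Even at a healthy average depth, random placement leaves some bases thinly covered or uncovered, which is what the directed finishing phase has to resolve.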

16 Shotgun Sequencing II: ASSEMBLY
Figure labels: Low Base Quality; Single Stranded Region; Mis-Assembly (Inverted); Sequence Gap; Consensus

17 Shotgun Sequencing III: FINISHING
Figure labels: Low Base Quality; Single Stranded Region; Mis-Assembly (Inverted); Sequence Gap; Consensus

18 Shotgun Sequencing III: FINISHING
Figure labels: Single Stranded Region; Mis-Assembly (Inverted); Sequence Gap; Consensus

19 Shotgun Sequencing III: FINISHING
Figure labels: Mis-Assembly (Inverted); Sequence Gap; Consensus

20 Shotgun Sequencing III: FINISHING
Figure labels: Mis-Assembly (Inverted); Consensus

21 Shotgun Sequencing III: FINISHING
High Accuracy Sequence: < 1 error / 10,000 bases

22 Whole Genome Shotgun Sequencing
- Sheared DNA: kb
- Whole Genome: 3,000 Mb
- Random Reads
- Sequencing Templates:

23 Whole Genome Shotgun Sequencing: Assembly
Figure labels: Low Base Quality; Single Stranded Region; Mis-Assembly (Inverted); Sequence Gap; Consensus

24 Whole Genome Shotgun Sequencing: Assembly
Figure labels: Sequence Gap; Low Base Quality; Consensus

25 P/E ‘Celera’ Scheme: 10 X coverage in three years
- Regions not covered
- Regions very densely covered
- Contigs 1.0-15 kb
- # Gaps? >100,000?
- Base Quality High or Low?
- Mis-Assemblies?
- Duplications?
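
The questions about uncovered regions and gap counts can be given a rough baseline with the idealized Lander-Waterman model of random sequencing. This sketch uses the 3,000 Mb genome, 500 bp reads and 10 X coverage quoted in the deck; the model assumes uniformly random reads with no repeats or cloning bias, so its numbers are likely optimistic, and the function names are illustrative.

```python
import math

GENOME_BP = 3_000_000_000   # 3,000 Mb whole genome
READ_BP = 500               # 500 bp reads
COVERAGE = 10               # 10 X coverage


def reads_for_coverage(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads needed to reach the requested average depth."""
    return math.ceil(genome_bp * coverage / read_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Probability that a given base is hit by no read (Poisson model)."""
    return math.exp(-coverage)


def expected_contigs(n_reads: int, coverage: float, read_bp: int,
                     min_overlap_bp: int = 0) -> float:
    """Lander-Waterman expected number of contigs, N * exp(-c * sigma)."""
    sigma = 1.0 - min_overlap_bp / read_bp
    return n_reads * math.exp(-coverage * sigma)


n_reads = reads_for_coverage(GENOME_BP, READ_BP, COVERAGE)
print(n_reads)                                              # 60000000
print(round(GENOME_BP * expected_uncovered_fraction(COVERAGE)))
print(round(expected_contigs(n_reads, COVERAGE, READ_BP)))
```

Under these idealized assumptions the model predicts a few thousand contigs, so a gap count above 100,000 would have to come from effects the model leaves out, such as repeats, uneven clone representation or a nonzero minimum overlap.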

26 ‘Complete an accurate, high quality sequence of the human genome by the end of 2003, … a working draft can be completed … within the next three years …’
‘That (draft) sequence will be of lower accuracy and contiguity … will be useful for finding genes … and other features …’

27 Human Genome Sequencing Project: Integrating Multiple Sources of Data
- Chromosome, Map location, Clone, Fingerprint: Project XYZ ???
- Celera: random sequences, 500 bp reads, consensus (3-5 kb)
- NHGRI: mapped projects (~100 kb), 5-20 contigs (10-20 kb)
- How to use Celera data in NHGRI assemblies?
Lichtarge Lab - HGSC, Baylor College of Medicine

