BioSci D145 Lecture #3 Bruce Blumberg


Presentation on theme: "BioSci D145 Lecture #3 Bruce Blumberg"— Presentation transcript:

1 Bruce Blumberg (blumberg@uci.edu)
BioSci D145 Lecture #3 Bruce Blumberg 4103 Nat Sci 2 - office hours Tu, Th 3:30-5:00 (or by appointment) phone TA – Riann Egusquiza 4351 Nat Sci 2– office hours M 1-3 Phone check and noteboard daily for announcements, etc.. Please use the course noteboard for discussions of the material Updated lectures will be posted on web pages after lecture Don’t forget to discuss term paper topics with me BioSci D145 lecture 1 page 1 ©copyright Bruce Blumberg All rights reserved

2 Bacteriophage library cloning systems
All are relatively similar to each other: lambda, cosmid, fosmid, P1. P1 cloning systems are derived from bacteriophage P1, one of the primary tools of E. coli geneticists for many years. Infect cells with packaged DNA, then recover as a plasmid. Useful, but size is limited to 95 kb by the “headful” packaging mechanism.

3 Cosmid/fosmid cloning
P1, cosmids and fosmids replicate as plasmids after infection. Cosmids have a ColE1 origin (25-50 copies/cell). Fosmids have an F factor origin (1 copy/cell).

4 Large insert vectors - YACs, BACs and PACs
Three complementary approaches, each with its own strengths and weaknesses YACs - Yeast artificial chromosomes requires two vector arms, one with an ARS one with a centromere both fragments have selective markers trp and ura are commonly used background reduction is by dephosphorylation ligation is transformed into spheroplasts colonies picked into microtiter dishes containing media with cryoprotectant BioSci D145 lecture 2 page 4 ©copyright Bruce Blumberg All rights reserved

5 YAC cloning (contd)
Advantages: can propagate extremely large fragments; may propagate sequences unclonable in E. coli. Disadvantages: tedious to purify away from yeast chromosomes by PFGE; grow slowly; insert instability; generally difficult to handle.

6 BAC cloning
BAC: bacterial artificial chromosome (based on the E. coli F’ plasmid). Partial digests are cloned into dephosphorylated vector; the ligation is transformed into E. coli by electroporation. Advantages: large plasmids, handled with the usual methods; stable, stringently controlled at 1 copy/cell; vectors are small (~7 kb), good for shotgun cloning strategies. Disadvantages: low yield; no selection against nonrecombinant clones (blue/white only); apparent size limitation.

7 PAC - P1 artificial chromosome
PAC cloning PAC - P1 artificial chromosome combines best features of P1 and BAC cloning size selected partial digests are ligated to dephosphorylated vector and electrotransformed into E. coli. Stored as colonies in microtiter plates Selection against non-recombinants via SacBII selection (nonrecombinant cells convert sucrose into a toxic product) inducible P1 lytic replicon allows amplification of plasmid copy number BioSci D145 lecture 2 page 7 ©copyright Bruce Blumberg All rights reserved

8 PAC cloning (contd)
PAC advantages: all the advantages of BACs (stability, replication as plasmids, stringent copy control); selection against nonrecombinant clones; inducible P1 lytic replicon (addition of IPTG causes loss of copy control and larger yields). Disadvantages: effective size limitation (~300 kb); vector is large, so shotgun cloning of PACs gives lots of vector fragments.

9 Comparison of cloning systems

10 Which type of library to make
Do I need to make a new library at all? Is the library I need available? PAC libraries are suitable for most purposes. BAC libraries are most widely available. If your organism only has YAC libraries available, you may wish to make PACs or BACs. Much easier to buy pools or gridded libraries for screening (doesn’t always work). What is the intended use? Will this library be used many times, e.g., for isolation of clones for knockouts? If so, it pays to do it right. Who should make the library? Going rate for a custom PAC or BAC library is $50K; most labs do not have these resources. If care is taken, construction is not so difficult.

11 The problem – genomes are large, workable fragments are small
Genome mapping The problem – genomes are large, workable fragments are small How to figure out where everything is? How to track mutations in families or lineages? analogy to roadmaps The most useful maps do not have too much detail but have major features and landmarks that everything can be related to Allows genetic markers to be related to physical markers What sorts of maps are useful for genomes? BioSci D145 lecture 2 page 11 ©copyright Bruce Blumberg All rights reserved

12 Genome mapping (contd)
How are maps made? What do we map these days? BioSci D145 lecture 2 page 12 ©copyright Bruce Blumberg All rights reserved

13 Genome mapping (contd)
Useful markers STS – sequence tagged sites Short randomly acquired sequences PCRing sequences, then prove by hybridization that only a single sequence is amplified/genome VERY tedious and slow validated ones mapped back to RH panels Orders sequences on large chunks of DNA STC – sequence tagged connectors Array BAC libraries to 15x coverage of genome Sequence BAC ends Combine with genomic maps and fingerprints to link clones Average about 1 tag/5 kb Most useful preparatory to sequencing BioSci D145 lecture 2 page 13 ©copyright Bruce Blumberg All rights reserved
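As a rough check on the STC figures above, the following sketch relates clone coverage to tag spacing. The 3 Gb genome size and the 150 kb average BAC insert are assumptions chosen for illustration (neither is stated on the slide); with them, 15x clone coverage and one read from each insert end give about one tag per 5 kb.

```python
def stc_tag_spacing(genome_bp, insert_bp, coverage):
    """Return (number of clones, average bp between end-sequence tags)."""
    # Clones needed so that the inserts add up to `coverage` genome equivalents
    n_clones = coverage * genome_bp / insert_bp
    # One end read from each end of every clone
    n_tags = 2 * n_clones
    return n_clones, genome_bp / n_tags


# Assumed values: 3 Gb genome, 150 kb inserts (illustrative only)
clones, spacing = stc_tag_spacing(genome_bp=3e9, insert_bp=150_000, coverage=15)
print(f"{clones:,.0f} clones, one tag per {spacing:,.0f} bp")
```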

14 Genome mapping (contd)
Useful markers (contd) ESTs – expressed sequence tags randomly acquired cDNA sequences Lots of value in ESTs Info about diversity of genes expressed Quick way to get expressed genes Better than STS because ESTs are expressed genes Can be mapped to chromosomes by FISH RH panels BAC contigs Polymorphic STS – STS with variable lengths Often due to microsatellite differences Useful for determining relationships Also widely used for forensic analysis OJ, Kobe, etc BioSci D145 lecture 2 page 14 ©copyright Bruce Blumberg All rights reserved

15 Genome mapping (contd)
Useful markers (contd) SNPs: single nucleotide polymorphisms. Extraordinarily useful, ~1/1000 bp in humans. Much of the difference among us is in SNPs. SNPs that change restriction sites cause RFLPs (restriction fragment length polymorphisms). Detected in various ways: hybridization to high density arrays (Affymetrix); sequencing; denaturing electrophoresis or HPLC; invasive cleavage. Tony Long in E&E Biology has a method for ligation-mediated SNP detection that they use for evolutionary analyses.

16 Genome mapping (contd)
Useful markers (contd) RAPDs – randomly amplified polymorphic DNA Amplify genomic DNA with short, arbitrary primers Some fraction will amplify fragments that differ among individuals These can be mapped like STS Issues with PCR amplification Benefit – no sequence information required for target AFLPs – amplified fragment length polymorphisms Cut with enzymes (6 and 4 cutter) that yield a variety of small fragments ( < 1 kb) Ligate sequences to ends and amplify by PCR Generates a fingerprint Controlled by how frequently enzymes cut Often correspond to unique regions of genome Can be mapped Benefit – no sequence required. BioSci D145 lecture 2 page 16 ©copyright Bruce Blumberg All rights reserved
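The AFLP fingerprint is "controlled by how frequently enzymes cut". A minimal sketch of that dependence, assuming random sequence with equal base frequencies (real genomes deviate from this): a recognition site of n bp is expected about once every 4^n bp.

```python
def expected_fragment_bp(site_length):
    """Average spacing between sites of the given length in random, equal-base-composition DNA."""
    return 4 ** site_length


for name, n in [("4-cutter", 4), ("6-cutter", 6)]:
    print(f"{name}: one site every ~{expected_fragment_bp(n)} bp")
```

Under this assumption a 4-cutter cuts roughly every 256 bp and a 6-cutter roughly every 4 kb, which is why the combination yields many fragments under 1 kb.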

17 Genome mapping (contd)
Fingerprinting: array and spot libraries; probe with short oligos (10-mers); repeat. Build up a “fingerprint” for each clone. Can tell which ones share sequences. Tedious.

18 Genome mapping (contd)
Mapping by walking/hybridization Start with a seed clone then walk along the chromosome Takes a LOOONNNNGGG time Benefit – can easily jump repetitive sequences BioSci D145 lecture 2 page 18 ©copyright Bruce Blumberg All rights reserved

19 Genome mapping (contd)
Mapping by hybridization Array library – pick a “seed clone” See where it hybridizes, pick new seed and repeat Product BioSci D145 lecture 2 page 19 ©copyright Bruce Blumberg All rights reserved

20 Genome mapping (contd) Restriction mapping of large insert clones
Mapping by restriction digest fingerprinting Order clones by comparing patterns from restriction enzyme digestion BioSci D145 lecture 2 page 20 ©copyright Bruce Blumberg All rights reserved
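One way to picture ordering clones by restriction patterns is to count fragments of matching size between pairs of clones; clones that share many bands probably overlap. The sketch below is a toy illustration, not a real contig builder: the clone names, fragment sizes and the 2% size tolerance are all made up for the example.

```python
def shared_bands(frags_a, frags_b, tolerance=0.02):
    """Count fragments in frags_a that match a not-yet-used fragment in frags_b
    within a relative size tolerance (gel sizing is never exact)."""
    remaining = sorted(frags_b)
    shared = 0
    for size in sorted(frags_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[i]  # each band can be matched only once
                break
    return shared


# Hypothetical digests (fragment sizes in bp)
clones = {
    "A": [1200, 3400, 5100, 800],
    "B": [3400, 5100, 800, 2600],
    "C": [2600, 950, 7000],
}
for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(x, y, shared_bands(clones[x], clones[y]))
```

Here A and B share three bands, B and C share one, and A and C share none, suggesting the order A, B, C.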

21 Genome mapping (contd)
FISH - Fluorescent in situ hybridization – can detect chromosomes or genes Can localize probes to chromosomes and Relationship of markers to each other Requires much knowledge of genome being mapped Chromosome painting marker detection BioSci D145 lecture 2 page 21 ©copyright Bruce Blumberg All rights reserved

22 Genome mapping (contd)
Radiation hybrid mapping. Old but very useful technique (Geisler paper). Lethally irradiate cells with X-rays, then fuse with cells of another species, e.g., blast human cells then fuse with hamster cells. Chunks of human DNA will remain in the hamster cells. Expand colonies of cells to get a collection of cell lines, each containing a single chunk of human DNA. Collection = RH panel. Now map markers onto these RH panels. Can identify which of any type of markers map together: STS, EST (very commonly used), etc. Can then map others by linkage to the ones you have mapped. Compare RH panel with other maps. Utility: great for closing gaps in other maps. HAPPY mapping: PCR-based method, see Riann’s presentation.
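The idea that markers "map together" on an RH panel can be sketched as a comparison of retention patterns: two markers that sit close together tend to be present or absent in the same hybrid cell lines. The marker names and the 10-line panel below are hypothetical, and real RH mapping uses statistical scores (e.g., LOD scores) rather than this simple concordance fraction.

```python
def concordance(pattern_a, pattern_b):
    """Fraction of hybrid cell lines in which two markers are both present or both absent."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same cell lines")
    same = sum(a == b for a, b in zip(pattern_a, pattern_b))
    return same / len(pattern_a)


# Hypothetical panel: one character per cell line, 1 = marker detected by PCR
panel = {
    "STS1": "1100101100",
    "EST7": "1100101000",
    "STS9": "0011010011",
}
print(concordance(panel["STS1"], panel["EST7"]))  # high: likely linked
print(concordance(panel["STS1"], panel["STS9"]))  # low: likely unlinked
```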

23 Genome mapping (contd)
How should maps be made with current knowledge? All methods have strengths and weaknesses – must integrate data for useful map e.g, RH panel, BAC maps, STS, ESTs Size and complexity of genome is important More complex genomes require more markers and time mapping Breakpoints and markers are mapped relative to each other Maps need to be defined by markers (cities, lakes, roads in analogy) Key part of making a finely detailed map is construction of genomic libraries and cell lines for common use Efforts by many groups increase resolution and utility of maps Current strategies BAC end sequencing Whole genome shotgun sequencing EST sequencing HAPPY mapping Mapping of above to RH panels Fancier techniques (Dovetail, Chicago reads, Hi-C assemblies) BioSci D145 lecture 3 page 23 ©copyright Bruce Blumberg All rights reserved

24 DNA sequencing = determining the nucleotide sequence of DNA
DNA sequence analysis DNA sequencing = determining the nucleotide sequence of DNA Two main methods shared Nobel prize in 1980 Chemical cleavage – Maxam and Gilbert Enzymatic sequencing (based on polymerization reaction) Nobel Prize in Chemistry 1980 Walter Gilbert (Harvard) & Frederick Sanger (MRC Labs) (Sanger also won Nobel in 1958 for protein sequencing) How many others have won 2 Nobel prizes? In the same field? BioSci D145 lecture 4 page 24 ©copyright Bruce Blumberg All rights reserved

25 Only people to have won 2 Nobel Prizes?

26 One of the first reasonable sequencing methods
DNA sequence analysis: Maxam and Gilbert. One of the first reasonable sequencing methods. Very popular in late 70s and early 80s. VERY TEDIOUS!! Totally superseded by dideoxy sequencing now.

27 DNA sequence analysis (contd)
Dideoxy sequencing (Sanger 1977). Virtually all sequencing is done this way now. Requires a modified nucleotide, a 2’,3’-dideoxy NTP (ddNTP). DNA polymerase incorporates the ddNTP and chain elongation terminates, because the ddNTP lacks the 3’-OH needed to add the next nucleotide. Original method used 4 separate elongation reactions. Products separated by denaturing PAGE and visualized by autoradiography.

28 DNA sequence analysis (contd)
Dideoxy sequencing (contd) – Sanger 1977 Dideoxy NTPs present at ~1% of [dNTP] Each reaction has identified end In principle, all possible chain lengths are represented varies by [dNTPs], [ddNTPs], [primer] and [template] and ratios BioSci D145 lecture 4 page 28 ©copyright Bruce Blumberg All rights reserved
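The logic of the four-reaction method can be sketched in a few lines: each reaction yields chains ending wherever its ddNTP was incorporated, and reading the bands from shortest to longest across the four lanes spells out the synthesized strand. The second function is a simplifying assumption, not a measured value: it treats the chance of incorporating a ddNTP at each matching position as equal to the ddNTP fraction (~1%), which real polymerases only approximate.

```python
def termination_lengths(synthesized, dd_base):
    """Chain lengths produced in the reaction containing the given ddNTP."""
    return [i + 1 for i, base in enumerate(synthesized) if base == dd_base]


def read_ladder(synthesized):
    """Run four reactions (one per ddNTP), then read bands from shortest to longest."""
    bands = {}
    for dd_base in "ACGT":
        for length in termination_lengths(synthesized, dd_base):
            bands[length] = dd_base  # lane in which a band of this length appears
    return "".join(bands[length] for length in sorted(bands))


def fraction_terminating_at(k, dd_fraction=0.01):
    """Fraction of chains stopping at the k-th occurrence of the base,
    assuming each occurrence terminates with probability dd_fraction."""
    return (1 - dd_fraction) ** (k - 1) * dd_fraction


print(termination_lengths("GATTACA", "A"))
print(read_ladder("GATTACA"))
```

The geometric fall-off in `fraction_terminating_at` is one way to see why the ddNTP:dNTP ratio controls how far a ladder can be read.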

29 DNA sequence analysis (contd)
[Figure: sequencing gel with lane sets labeled A, C, G, T]

30 Automated DNA sequence analysis
How to improve throughput of sequencing? Incorporate fluorescent ddNTPs, separate products by PAGE Base calling and lane calling issues Key advance was capillary sequencers Separate DNA in a thin capillary instead of gel Very accurate, no tracking errors, much more automation friendly Trace files (dye signals) are analyzed and bases called to create chromatograms. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data. BioSci D145 lecture 4 page 30 ©copyright Bruce Blumberg All rights reserved
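Reconciling reads from opposite strands amounts to reverse-complementing one read and comparing it with the other. The sketch below assumes the two reads are already trimmed to cover exactly the same region, and it flags disagreements with N; real base-calling and assembly software also weighs per-base quality scores, which are omitted here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def reconcile(forward_read, reverse_read):
    """Consensus of two reads from opposite strands; N marks a disagreement."""
    as_forward = reverse_complement(reverse_read)
    if len(as_forward) != len(forward_read):
        raise ValueError("reads must cover the same region")
    return "".join(f if f == r else "N" for f, r in zip(forward_read, as_forward))


# Hypothetical reads that disagree at one position
print(reconcile("ATGGCCTA", "TACGCCAT"))
```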

31 Automated DNA sequence analysis
Capillaries vs gels Capillaries much faster – higher field strength possible Fully automated = higher throughput BioSci D145 lecture 4 page 31 ©copyright Bruce Blumberg All rights reserved

32 Applied Biosystems PRISM 377 (Gel, 34-96 lanes)
(Capillary, 96 capillaries) BioSci D145 lecture 4 page 32 ©copyright Bruce Blumberg All rights reserved

33 PCR – polymerase chain reaction amplification of DNA
PCR is the most routinely used method to amplify DNA. Exponential amplification of DNA by polymerases (Saiki et al, 1985). 2^n-fold amplification, n = # cycles; 35 cycles = 2^35 = 3.4 x 10^10 fold. Originally used DNA polymerase I. Needed to add fresh enzyme at every cycle because heat denaturation of template killed the enzyme. Not widely used, too painful to do manually. Thermostable Taq DNA polymerase, which survives the denaturation step, made PCR practical. Nobel Prize in Chemistry to Kary Mullis in 1993 for the invention of PCR. He was middle author on paper!
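The 35-cycle figure above can be reproduced directly. The `efficiency` parameter is an added assumption for illustration: 1.0 means perfect doubling every cycle, which real reactions do not sustain.

```python
def pcr_fold(cycles, efficiency=1.0):
    """Fold amplification after the given number of cycles.
    efficiency=1.0 corresponds to perfect doubling (2**cycles)."""
    return (1 + efficiency) ** cycles


print(f"{pcr_fold(35):.1e}")        # ideal doubling
print(f"{pcr_fold(35, 0.9):.1e}")   # hypothetical 90% per-cycle efficiency
```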

34 PCR – polymerase chain reaction amplification of DNA (contd)
Hot water bacteria: Thermus aquaticus Taq DNA polymerase Life at High Temperatures by Thomas D. Brock Biotechnology in Yellowstone © 1994 Yellowstone Association for Natural Science BioSci D145 lecture 4 page 34 ©copyright Bruce Blumberg All rights reserved

35 Cycle sequencing – fusion of PCR and fluorescent ddNTP sequencing
Combine PCR amplification with dideoxy sequencing – cycle sequencing Linear amplification of template in the presence of fluorescent ddNTPs When nucleotides are used up reaction is over Separate on capillary electrophoresis instrument Advantages Fast, single tube reaction Works with small amounts of starting material Disadvantages Still need to prepare high quality template to sequence Cost and time Many sequencing centers spend time, $$ on template prep Automation requirements BioSci D145 lecture 4 page 35 ©copyright Bruce Blumberg All rights reserved

36 Isothermal amplification – the solution to template preparation
How to make template preparation faster, easier and more reliable? Eliminate automation requirement, amplify starting material in some other way Φ29 DNA polymerase (aka TempliPhi) Enzyme has high processivity and strand displacement activity Isothermal reaction produces huge quantities of DNA from tiny amount of input More efficient than PCR (no temp change, no machine, no cleanup) BioSci D145 lecture 4 page 36 ©copyright Bruce Blumberg All rights reserved

37 Modern DNA sequence analysis
Cycle sequencing. Virtually all DNA sequencing today is done by cycle sequencing with fluorescent ddNTPs (ABI Big Dye chemistry). Template preparation still tedious for small scale. TempliPhi used in genome centers (obviated need for most automation). Capillary sequencers are the predominant form of technology in use. But next generation sequencing is already coming online and will rapidly displace old technology in genome centers: 454 sequencing (Roche), Solexa (Illumina), SOLiD (Applied Biosystems). 3rd generation sequencing (individual DNA molecule) now available, e.g., Pacific Biosciences (sequence reads of 1,000-10K bases).

38 Landmarks in DNA sequencing
DNA sequence analysis Landmarks in DNA sequencing Sanger, Nicklen and Coulson. Sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. 74, (1977). Sanger, F. et al. The nucleotide sequence of bacteriophage ΦX174. J Mol Biol 125, (1978). Sutcliffe, J. G. Complete nucleotide sequence of the Escherichia coli plasmid pBR322. Cold Spring Harb Symp Quant Biol 43, (1979). Sanger et al., Nucleotide sequence of bacteriophage lambda DNA. J Mol Biol 162, (1982). Messing, J., Crea, R. & Seeburg, P. H. A system for shotgun DNA sequencing. Nucl.Acids Res 9, (1981). Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, (1981). Deininger, P. L. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal Biochem 129, (1983). Baer et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, (1984). (189 kb) Innis et al. DNA sequencing with Taq DNA polymerase and direct sequencing of PCR-amplified DNA Proc. Natl. Acad. Sci. 85, (1988) BioSci D145 lecture 4 page 38 ©copyright Bruce Blumberg All rights reserved

39 DNA sequence analysis (contd)
Landmarks in DNA sequencing (contd). Haemophilus influenzae (1.83 Mb) Mycoplasma genitalium (0.58 Mb) Saccharomyces cerevisiae genome (13 Mb) Methanococcus jannaschii (1.66 Mb) Escherichia coli (4.6 Mb) Bacillus subtilis (4.2 Mb) Borrelia burgdorferi (1.44 Mb) Archaeoglobus fulgidus (2.18 Mb) Helicobacter pylori (1.66 Mb) BioSci D145 lecture 4 page 39 ©copyright Bruce Blumberg All rights reserved

40 DNA sequence analysis (contd)
Landmarks in DNA sequencing (contd) Treponema pallidum (1.14 Mb) Caenorhabditis elegans genome (97 Mb) Deinococcus radiodurans (3.28 Mb) Drosophila melanogaster (120 Mb) Arabidopsis thaliana (115 Mb) Escherichia coli O157:H7 (4.1 Mb) 2001: draft human “genome”. 2002: mouse genome. 2002: Ciona intestinalis. 2003: “complete” human genome. 2004: rat genome. 2006: human “genome”, complete sequence of all chromosomes. Many more genomes underway; check JGI, Sanger and other web sites.

41 Complete DNA sequence (all nts both strands, no gaps)
DNA Sequence analysis Complete DNA sequence (all nts both strands, no gaps) complete sequence is desirable but takes time how long depends on size and strategy employed which strategy to use depends on various factors how large is the clone? cDNA genomic How fast is sequence required? sequencing strategies primer walking cloning and sequencing of restriction fragments progressive deletions Bidirectional, unidirectional Shotgun sequencing whole genome with mapping map first (C. elegans) map as you go (many) BioSci D145 lecture 4 page 41 ©copyright Bruce Blumberg All rights reserved

42 DNA Sequence analysis (contd)
Primer walking - walk from the ends with oligonucleotides sequence, back up ~50 nt from end, make a primer and continue Why back up? Need to see overlap to be sure about sequence you are reading BioSci D145 lecture 4 page 42 ©copyright Bruce Blumberg All rights reserved
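A minimal sketch of the "back up ~50 nt" rule: take the next primer from a window starting about 50 nt before the end of the reliable read, so that the following read overlaps sequence already known. The 20-nt primer length is an assumption, and real primer design also checks melting temperature, GC content and repeats, none of which is modeled here.

```python
def next_walking_primer(read, back_up=50, primer_len=20):
    """Return the next sequencing primer, taken back_up nt before the end of the read."""
    if primer_len > back_up or len(read) < back_up:
        raise ValueError("read too short or primer longer than the back-up window")
    start = len(read) - back_up
    return read[start:start + primer_len]


# Hypothetical 150-nt read; the primer comes from positions 100-119
read = "A" * 100 + "C" * 20 + "G" * 30
print(next_walking_primer(read))
```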

43 DNA Sequence analysis (contd)
Primer walking (contd). Advantages: very simple; no possibility of losing bits of DNA (unlike restriction mapping and deletion methods); no restriction map needed; best choice for short DNA. Disadvantages: slowest method, about a week between sequencing runs; oligos are not free (and not reusable); not feasible for large sequences. Applications: cDNA sequencing when time is not critical; targeted sequencing; verification; closing gaps in sequences.

44 DNA Sequence analysis (contd)
Cloning and sequencing of restriction fragments once the most popular method make a restriction map, subclone fragments sequence advantages straightforward directed approach can go quickly cloned fragments often useful otherwise RNase protection, nuclease mapping, in situ hybridization disadvantages possible to lose small fragments must run high quality analytical gels depends on quality of restriction map mistaken mapping -> wrong sequence restriction site availability applications sequencing small cDNAs isolating regions to close gaps BioSci D145 lecture 4 page 44 ©copyright Bruce Blumberg All rights reserved

45 DNA Sequence analysis (contd)
nested deletion strategies - sequential deletions from one end of the clone cut, close and sequence Approach make restriction map use enzymes that cut in polylinker and insert Religate, sequence from end with restriction site repeat until finished, filling in gaps with oligos advantages Fast, simple, efficient disadvantages limited by restriction site availability in vector and insert need to make a restriction map BioSci D145 lecture 4 page 45 ©copyright Bruce Blumberg All rights reserved

46 DNA Sequence analysis (contd)
nested deletion strategies (contd) Exonuclease III-mediated deletion cut with polylinker enzyme protect ends - 3’ overhang phosphorothioate cut with enzyme between first cut and the insert can’t leave 3’ overhang timed digestions with Exonuclease III stop reactions, blunt ends ligate and size select recombinants sequence advantages unidirectional processivity of enzyme gives nested deletions BioSci D145 lecture 4 page 46 ©copyright Bruce Blumberg All rights reserved

47 DNA Sequence analysis (contd)
Nested deletion strategies Exonuclease III-mediated deletion (contd) disadvantages need two unique restriction sites flanking insert on each side best used successively to get > 10kb total deletions may not get complete overlaps of sequences fill in with restriction fragments or oligos applications method of choice for moderate size sequencing projects cDNAs genomic clones good for closing larger gaps Small-scale sequence analysis – how is it practiced today? Primer walking ExoIII-mediated deletion with primer walking BioSci D145 lecture 4 page 47 ©copyright Bruce Blumberg All rights reserved

48 Genome sizes for most eukaryotes are large (10^8-10^9 bp)
Genome sequencing. The problem: genome sizes for most eukaryotes are large (10^8-10^9 bp). High quality sequences only about bp per run. The solution: break genome into lots of bits and sequence them all; reassemble with computer. The benefit: rapid increase in information about genome size, gene comparisons, etc. The cost: 3 x 10^9 bp (human haploid genome) ÷ 600 bp/reaction = 5 x 10^6 reactions for 1x coverage! Need both strands (x2), need overlaps and need to be sure of sequences. ~ reactions/runs required for a human-sized genome. About $1-2 per reaction these days, ~$8 commercially.
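The cost estimate above is simple division, sketched here so other coverage levels can be tried. The 10x value in the second call is only an example, not a figure from the slide.

```python
def reads_needed(genome_bp, read_bp, coverage=1):
    """Number of sequencing reactions needed for the given average coverage."""
    return coverage * genome_bp / read_bp


print(f"{reads_needed(3e9, 600):.1e} reactions for 1x")
print(f"{reads_needed(3e9, 600, coverage=10):.1e} reactions for 10x (example)")
```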

49 Genome sequencing (contd)
Shotgun sequencing NOT invented by Craig Venter Messing 1981 first description of shotgun sequencing Sanger lab developed current methods in 1983 approach blast genome into small chunks clone these chunks 3-5 kb, 8 kb plasmid 40 kb fosmid jump repetitive sequences sequence + assemble by computer A priori difficulties how to get nice uniform distribution how to assemble fragments what to do about repeats? How to minimize sequence redundancy? BioSci D145 lecture 4 page 49 ©copyright Bruce Blumberg All rights reserved
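"Sequence + assemble by computer" can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads with the longest suffix/prefix overlap. This is a sketch under generous assumptions (error-free reads, a single strand, no repeats); it is not how production assemblers work, and repeats are exactly where this greedy approach fails.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of contigs until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining contigs do not overlap: gaps
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Hypothetical reads drawn from ATGGCGTACGTTAGC
print(greedy_assemble(["GTACGTTA", "ATGGCGTAC", "GTTAGC"]))
```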

50 Genome sequencing(contd)

51 Genome sequencing(contd)

52 Genome sequencing (contd)
Shotgun sequencing (contd) How to minimize sequence redundancy? Best way to minimize redundancy is map before you start C. elegans was done this way - when the sequence was finished, it was FINISHED mapping took almost 10 years mapping much too tedious and nonprofitable for Celera who cares about redundancy, let’s sequence and make $$ There is scientific value to draft genomes, too. why does redundancy matter? Finished sequence today costs about $0.50/base Note that 10x, % coverage leaves at least 150 kb unsequenced BioSci D145 lecture 4 page 52 ©copyright Bruce Blumberg All rights reserved
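The remark that 10x coverage still leaves on the order of 150 kb unsequenced follows from a Poisson (Lander-Waterman style) approximation: if reads land uniformly at random, the chance that a given base is hit by no read at average coverage c is about e^(-c). The 3.3 Gb genome size below is an assumption chosen for illustration; cloning bias and repeats make real gaps larger than this estimate.

```python
import math


def unsequenced_bp(genome_bp, coverage):
    """Expected number of bases covered by no read, assuming uniformly random reads."""
    return genome_bp * math.exp(-coverage)


for c in (1, 5, 10):
    print(f"{c}x: ~{unsequenced_bp(3.3e9, c):,.0f} bp uncovered")
```

The exponential also shows why redundancy matters for cost: each added unit of coverage costs the same number of reads but closes a smaller and smaller share of the remaining gaps.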

53 Genome sequencing (contd)
Mapping by hybridization Mapping by fingerprinting BioSci D145 lecture 4 page 53 ©copyright Bruce Blumberg All rights reserved

54 Traditional (map first) vs STC (map as you go along) mapping

55 The human genome
In Feb 2001, Celera and the Human Genome Project published “draft” human genome sequences. Celera -> 39114; Ensembl -> 29691; consensus from all sources ~30K. Number of genes: C. elegans, 19,000; Arabidopsis, 25,000. Predictions had been from k human genes. What’s up with that? Are we only slightly more complicated than a weed? How can we possibly get a human with less than 2x the number of genes as C. elegans? Implications? UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harpers Magazine Feb, 2002

56 The answer – Gene sets don’t overlap completely (duh) Floor is 42K
The human genome. The answer: gene sets don’t overlap completely (duh). Floor is 42K (= 42113). UniGene build #236: 130,029 clusters (from EST and mRNA sequencing), up from 123,891 last year (85,793, 105,680, 128,826, 123,891 previous years). Important questions to be answered about what constitutes a “gene”.

57 Genome sequencing(contd)
Whole genome shotgun sequencing (Celera) premise is that rapid generation of draft sequence is valuable why bother trying to clone and sequence difficult regions? Basically just forget regions of repetitive DNA - not cost effective using this approach, genomes rarely are completely finished rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95% problems sequence may never be complete as is C. elegans much redundant sequence with many sparse regions and lots of gaps. Fragment assembly for regions of highly repetitive DNA is dubious at best “Finished” fly and human genomes lack more than a few already characterized genes BioSci D145 lecture 4 page 57 ©copyright Bruce Blumberg All rights reserved

58 The human genome (stopped here)
How finished is the human genome sequence? Draft sequence to high coverage Chromosome by chromosome finishing now Chr 22 – 1999 Chr 21 – 2000 Chr 20 – 2001 Chr 15 – 2003 Chr 6,7,Y-2003 Chr 13, May 2006 – all finished BioSci D145 lecture 4 page 58 ©copyright Bruce Blumberg All rights reserved

59 Genome sequencing (contd)
Knowing what we know now, how to approach a large new genome? Xenopus tropicalis: 1.7 Gb (about ½ human). BAC end sequencing. Whole genome shotgun. Gaps closed with BACs. 8.5x coverage (but > 9000 scaffolds for 18 chromosomes). Finishing now in process.

