BioSci 145B lecture 5 page 1 © copyright Bruce Blumberg 2004. All rights reserved BioSci 145B Lecture #5 5/4/2004 Bruce Blumberg –2113E McGaugh Hall -


1 BioSci 145B Lecture #5 5/4/2004
Bruce Blumberg
–2113E McGaugh Hall - office hours Wed 12-1 PM (or by appointment)
–phone 824-8573
–blumberg@uci.edu
TA – Curtis Daly cdaly@uci.edu
–2113 McGaugh Hall, 924-6873, 3116
–Office hours Tuesday 11-12
lectures will be posted on web pages after lecture
–http://eee.uci.edu/04s/05705/ - link only here
–http://blumberg-serv.bio.uci.edu/bio145b-sp2004
–http://blumberg.bio.uci.edu/bio145b-sp2004

2 Genome sequencing
The problem
–Genome sizes for most eukaryotes are large (10^8-10^9 bp)
–High quality sequences only about 600-800 bp/pass
The solution
–Break genome into lots of bits and sequence them all
–Reassemble with computer
The benefit
–Rapid increase in information about genome size, gene comparisons, etc.
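To get a feel for the scale of the problem, the sketch below counts how many sequencing passes would be needed just to read every base once, using the genome sizes and read lengths quoted on the slide. The function name `reads_needed` and the `coverage` parameter are illustrative assumptions; real projects sequence each base several times over.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Number of reads whose combined length equals coverage x genome size."""
    return math.ceil(genome_bp * coverage / read_bp)


if __name__ == "__main__":
    # Genome sizes and read lengths taken from the slide; 1x coverage is an
    # assumed lower bound, not a realistic project target.
    for genome_bp in (10**8, 10**9):
        for read_bp in (600, 800):
            n = reads_needed(genome_bp, read_bp)
            print(f"{genome_bp:.0e} bp genome, {read_bp} bp reads: {n:,} reads")
```

Even at a single pass per base, a genome of this size needs on the order of 10^5 to 10^6 reads, which is why the fragments must be reassembled by computer.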

3 Genome sequencing (contd)
Shotgun sequencing NOT invented by Craig Venter
–Messing 1981 first description of shotgun
–Sanger lab developed current methods in 1983
–approach
  blast genome into small chunks
  –Shearing is usual
  –4-cutters also used
  clone these chunks
  –In the early days, try to make small insert libraries, 0.5-1.5 kb
  –Now typically make 3 library types
    »3-5 kb, 8 kb plasmid
    »40 kb fosmid - to jump repetitive sequences

4 Genome sequencing (contd)
  sequence + assemble by computer
  –A priori difficulties
    how to assemble fragments
    –Software now very good
    what to do about repeats?
    –Fosmids and BAC STC help a lot
    how to get nice uniform distribution of sequences without too much redundancy?
    –Biggest problem, not really well resolved
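The "assemble by computer" step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under strong assumptions (error-free reads, all on the same strand, no repeats longer than the minimum overlap); it is not the software used by any genome project, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # whatever cannot be joined stays as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up reads
    print(greedy_assemble(reads))
```

Reads that share no sufficient overlap remain as separate contigs, which is the toy version of the gaps discussed on the following slides. Repeats break the approach because a repeated stretch gives several equally good overlaps, which is why longer-range links from fosmids and BACs are needed.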

5 Genome sequencing (contd)
–Assembled sequences always have gaps of various sizes
  how to cross these gaps?
  –Quickly and cost-effectively
  Need to link sequences somehow
  –How depends on the size of the gaps to be crossed

6 Genome sequencing (contd)
–For small gaps (up to 8 kb or so)
  often can close by sequencing both ends of clones
–For medium sized gaps (8-30 kb)
  Primer walking across a linking clone (cosmid or fosmid)

7 Genome sequencing (contd)
Large gaps require much more effort
–Identify large insert clones that span gap
  Typically from BAC end sequences
  May have to screen libraries to find
–Shotgun sequence these and assemble
–Close any small gaps remaining with primer walking
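The gap-closing rules from the last three slides can be summarized as a small lookup. The size cut-offs come from the slides (which describe them as approximate); the function name `gap_closure_strategy` is a hypothetical label used only for this sketch.

```python
def gap_closure_strategy(gap_kb: float) -> str:
    """Suggest a gap-closing approach from the estimated gap size in kb."""
    if gap_kb <= 0:
        raise ValueError("gap size must be positive")
    if gap_kb <= 8:
        # Small gaps: a plasmid insert can span them.
        return "sequence both ends of clones"
    if gap_kb <= 30:
        # Medium gaps: walk across a single linking clone.
        return "primer walk across a linking clone (cosmid or fosmid)"
    # Large gaps: find a spanning large-insert clone and treat it as a
    # small shotgun project of its own.
    return "find a spanning BAC, shotgun sequence it, then primer walk"


if __name__ == "__main__":
    for gap in (2, 15, 120):
        print(f"{gap} kb gap: {gap_closure_strategy(gap)}")
```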

8 Genome sequencing (contd)
Shotgun sequencing (contd)
–How to minimize sequence redundancy (re-sequencing the same region)?
  Best way to minimize redundancy is map before you start
  –C. elegans was done this way - when the sequence was finished, it was FINISHED
    »mapping took almost 10 years
  –mapping much too tedious and nonprofitable for Celera
    »who cares about redundancy, let’s sequence and make $$
  why does redundancy matter?
  –Finished sequence today costs about $0.50/base
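One way to see why redundancy matters is the standard Lander-Waterman approximation for random shotgun reads, which is not part of the slides and is brought in here only as an illustration. Under that model, at c-fold coverage a fraction of about e^(-c) of the genome remains unsequenced, so each extra fold of redundancy buys less and less new sequence. The sketch assumes uniformly random reads and ignores the minimum overlap needed to join two reads; the function names are hypothetical.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read at the given fold coverage."""
    return math.exp(-coverage)


def expected_gaps(genome_bp: int, read_bp: int, coverage: float) -> float:
    """Expected number of gaps between contigs (simplified Lander-Waterman)."""
    n_reads = genome_bp * coverage / read_bp
    return n_reads * math.exp(-coverage)


def finished_cost_usd(genome_bp: int, usd_per_base: float = 0.50) -> float:
    """Cost of finished sequence at the per-base price quoted on the slide."""
    return genome_bp * usd_per_base


if __name__ == "__main__":
    genome_bp, read_bp = 10**8, 700  # assumed example values
    for c in (1, 2, 4, 8):
        print(
            f"{c}x: {uncovered_fraction(c):.4%} uncovered, "
            f"~{expected_gaps(genome_bp, read_bp, c):,.0f} gaps"
        )
    print(f"finished cost: ${finished_cost_usd(genome_bp):,.0f}")
```

Because random sampling re-reads already covered regions far more often than it hits the remaining holes, a map that directs sequencing to where it is needed reduces wasted passes.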

9 Genome sequencing (contd)
–Mapping by fingerprinting
–Mapping by hybridization
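Mapping by fingerprinting rests on the idea that two clones covering the same stretch of genome share restriction fragments, so their digests show bands of matching size. The following is a minimal sketch of that comparison with made-up band sizes; the 2% size tolerance, the 50% threshold, and the function names are assumptions for illustration, not values from the lecture.

```python
def shared_bands(a: list[float], b: list[float], tol: float = 0.02) -> int:
    """Count bands in a that match a band in b within a relative tolerance.

    Each band in b can be matched at most once.
    """
    remaining = sorted(b)
    count = 0
    for size in sorted(a):
        for k, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                count += 1
                del remaining[k]
                break
    return count


def likely_overlap(
    a: list[float], b: list[float], min_fraction: float = 0.5, tol: float = 0.02
) -> bool:
    """Call two clones overlapping if enough of the smaller fingerprint is shared."""
    if not a or not b:
        return False
    return shared_bands(a, b, tol) / min(len(a), len(b)) >= min_fraction


if __name__ == "__main__":
    # Hypothetical fragment sizes in bp read off a fingerprinting gel.
    clone_a = [1200, 2500, 3100, 4800, 7600]
    clone_b = [650, 2510, 3090, 4800, 9000]
    clone_c = [500, 1500, 2000, 6000]
    print(likely_overlap(clone_a, clone_b))
    print(likely_overlap(clone_a, clone_c))
```

Chaining such pairwise calls across a whole library yields an ordered set of overlapping clones, which is the physical map.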

10 Genome sequencing (contd)
Figure: actual large insert fingerprinting gel

11 Traditional (map first) vs STC (map as you go along) mapping
Figure panels: Map before sequencing; Map as you go

12 The human genome
On Feb 12, 2001, Celera and the Human Genome Project published “draft” human genome sequences
–Celera -> 39114 (WGS)
–Ensembl -> 29691 (map as you go)
–Consensus from all sources ~30K
Number of genes
–C. elegans – 19,000
–Arabidopsis – 25,000
Predictions had been from 50-140K human genes
–What’s up with that?
–Are we only slightly more complicated than a weed?
–How can we possibly get a human with less than 2x the number of genes as C. elegans?
–Implications?
  UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harper’s Magazine, Feb 2002

13 The human genome
The answer – Somewhat sloppy science
–Gene sets don’t overlap completely
–Floor is 42K
–105,680 UniGene clusters from ESTs (down from 128,826 last year)
= 42113

14 Genome sequencing (contd)
Whole genome shotgun sequencing (Celera)
–premise is that rapid generation of draft sequence is valuable
–why bother trying to clone and sequence difficult regions?
  Basically just forget regions of repetitive DNA - not cost effective
  –R0t analysis suggests not many genes there anyway
–using this approach, genome was alleged to be 90% finished in 2001
  More than 95% today
  rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
–problems
  sequence may never be complete, as the C. elegans sequence is
  much redundant sequence with many sparse regions and lots of gaps
  Fragment assembly for regions of highly repetitive DNA is dubious at best
Map as you go method inherently more complete
–Sets up for finishing since an ordered set of overlapping BACs is produced
Both methods produce reasonable data given enough sequencing

15 The human genome
How finished is the human genome sequence?
–Draft sequence to high coverage
–Chromosome by chromosome finishing now
  Chr 22 – 1999
  Chr 21 – 2000
  Chr 20 – 2001
  Chr 15 – 2003
  Chr 6, 7, Y – 2003
  Chr 13, 19 – 2004

16 Genome sequencing (contd)
Knowing what we know now – how to approach a large new genome?
–Xenopus tropicalis 1.7 Gb (about ½ human)
–BAC end sequencing
–Whole genome shotgun
–Gaps closed with BACs
–8x coverage by end of 2004
–Finishing dependent on additional funding

17 Genome sequencing
DOE – Joint Genome Institute
–http://www.jgi.doe.gov/
–Numerous advances in sequencing technology
  Increased pass rate from ~70% to >90%
  Lowered cost nearly 3-fold

18 Other sequencing technologies
Sequencing by hybridization is most interesting
–Construct a high-density microchip with all possible combinations of a short oligonucleotide
  Up to 25-mers
  By photolithography
  –Synthesized on chip directly
–Label and hybridize fragment to be sequenced
–Wash stringently
–Read fluorescent spots
–Reconstruct sequence by computer
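The final step, "reconstruct sequence by computer", can be pictured as follows: the fluorescent spots tell you which oligonucleotides (k-mers) occur in the fragment, and the sequence is rebuilt by chaining k-mers that overlap by k-1 bases. The sketch below is a toy version that assumes perfect hybridization data, k of at least 2, and a target in which no (k-1)-mer occurs twice; it is not the algorithm of any particular chip platform, and the names `spectrum` and `reconstruct` are hypothetical.

```python
def spectrum(seq: str, k: int) -> set[str]:
    """All k-mers present in seq, i.e. the probes that would light up."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers: set[str]) -> str:
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases."""
    k = len(next(iter(kmers)))
    by_prefix: dict[str, list[str]] = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in kmers}

    # The first k-mer is the only one whose prefix is not some k-mer's suffix.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe (repeat or circular target)")

    seq = starts[0]
    used = {seq}
    while True:
        candidates = [m for m in by_prefix.get(seq[-(k - 1):], []) if m not in used]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("branch point: a repeat makes the order ambiguous")
        seq += candidates[0][-1]
        used.add(candidates[0])
    if used != kmers:
        raise ValueError("some probes could not be placed")
    return seq


if __name__ == "__main__":
    target = "ATGGCGTACCTGA"  # made-up fragment
    print(reconstruct(spectrum(target, 4)) == target)
```

The errors raised on repeats hint at why the approach struggles with unknown sequence: any repeat longer than k-1 bases makes the chain ambiguous.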

19 Other sequencing technologies (contd)
Sequencing by hybridization rarely used for de novo sequencing
–Extremely fast and useful to sequence something you already know the sequence of but want to identify mutations in
–Disease causing changes, e.g. in mitochondrial DNA
–SNP discovery
–Works best for examining sequence of <10 kb

20 Other sequencing technologies (contd)
http://www.affymetrix.com/products/arrays/index.affx
SNP discovery
–Photo shows mitochondrial chip
–Right panel shows pairs of normal (top) vs disease (bottom) (Leber’s Hereditary Optic Neuropathy)
  Top 3 disease mutations
  Bottom control with no change

21 Useful software for molecular biology (contd)
NCBI – www.ncbi.nlm.nih.gov
–main information and analysis resource
–indispensable resource

22 Useful software for molecular biology (contd)
NCBI – Blast – how to find similar genes
www.ncbi.nlm.nih.gov/BLAST/

23 Useful software for molecular biology (contd)
Why pay Celera?

24 Practice midterm
1. (6 points) Your laboratory works on the strange organisms that live around hydrothermal vents in the deep ocean as a model system for the first multicellular organisms. Your PI has developed a new method of culturing such organisms, making it possible to grow the wormlike animals found around the vents in the laboratory. One of the first things that needs to be done is to construct the molecular tools that will be required to characterize your assigned animal, the Pompeii worm (Alvinella pompejana), which can survive an environment as hot as 80 °C. The ultimate goal will be to establish an A. pompejana genome project including whole genome sequencing and mapping, an EST project and DNA microarrays.
The first goal is to make a genomic library. What type of library will you make, i.e., which type of vector? Justify your choice. What type of equipment will be required to make your library?

25 Practice midterm
1. answer
You should choose to make a BAC or PAC library. BAC is best for genome sequencing because it accepts large inserts, is stable, and the vector is small, facilitating shotgun sequencing.
Not much equipment is required beyond standard molecular biology laboratory equipment, an electroporator, and PFGE (pulsed field gel electrophoresis). PFGE is indispensable for isolating the large DNA needed to make good genomic libraries.

26 Practice midterm
2. (4 points) Describe a method to make a physical map of the A. pompejana genome in order to facilitate large-scale sequencing.
2. answer
Use a large insert genomic library to construct a map. Map the clones by fingerprinting, map as you go, or hybridization. Restriction mapping of the whole genome was NOT an acceptable answer.

27 Practice midterm
3. (5 points) You received an E. coli strain with the following genotype from a neighboring laboratory for the purposes of propagating your genomic library:
mcrA, Δ(mrr-hsdRMS-mcrBC), ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, endA1, nupG
(in every case above, the bacteria are DEFICIENT in the indicated gene product)
a) Is this a good strain for the type of genomic library you have chosen to make, i.e., does it have the necessary genetic markers for your library to be stable and readily screened?
b) If so, what are the desirable markers that the strain has? If not, which ones are missing?
c) Would the strain be suitable if you had made a YAC library? Why?
3. answer
a) suitable for PAC and BAC
b) is restriction deficient, and deoR. Some also pointed out that the strain should have lacZΔM15 for blue-white selection if BACs were being used.
c) strain is not suitable for YAC library because yeast artificial chromosomes can only be propagated in YEAST

28 Practice midterm
4. (5 points) A colleague has experimentally determined that the A. pompejana genome is 110 Mb – right between C. elegans (97 Mb) and Drosophila melanogaster (120 Mb). Describe a sequencing strategy that could allow the rapid generation of a draft genome sequence. How might you combine the mapping proposed in your answer to question 2 to facilitate the completion of the genome sequence?
4. answer
Whole genome shotgun will generate a rapid draft sequence. Combining this with the whole genome map made in question 2 will enable closing gaps.

29 Practice midterm
5. (6 points) As a side project, you decide to see if the A. pompejana genome contains homeobox genes. You dig into the laboratory archives and find a cDNA probe that contains the Drosophila melanogaster Antennapedia homeobox. What is the best way to find whether the A. pompejana genome contains homeobox genes? If so, how will you isolate genomic clones containing these homeobox genes? Let’s say you find 8 A. pompejana homeobox genes. Describe a quick way to tell whether they are located in one or more clusters as in Drosophila or C. elegans.
5. answer
Genomic Southern with A. pompejana DNA probed with the Antp homeobox to work out conditions.
Screen the genomic library you made with the Antp probe under these conditions.
Once you recover the 8 genes, hybridize them back to the large insert clones or to a Southern blot of a PFGE-separated 8-cutter digest of genomic DNA. Note whether more than 1 homeobox gene maps to each clone or fragment.

30 Practice midterm
7. (6 points) Remember that you also need to provide material for the EST project. This means that it is time to make cDNA libraries, right? Assume that the libraries you make will be used for more than just EST sequencing. What sort of vector will you choose? Should you go to the trouble of enriching the library for full-length cDNAs? If so, how? Should the libraries be standard, normalized, or subtracted? Justify your answer. If normalized or subtracted libraries are required, describe generally how you will make them.
7. answer
Plasmid vector (NOT PAC or BAC).
Yes, you should enrich for full-length cDNAs since the library will be used for multiple purposes.
Cap trap, oligo-capping or cap-affinity chromatography gets full-length mRNA, which should yield a library enriched for full-length cDNAs.
The libraries should be normalized since EST sequencing is contemplated and we don’t want to sequence the same thing many times.
Make normalized libraries by making driver from the library you wish to normalize, then hybridizing it back to ss-cDNA from that library to a low Cot value (5-10). After removing hybrids, use the remaining cDNA to make the normalized library.

31 Practice midterm
8. (4 points) What are the major differences between normalized and subtracted cDNA libraries? If you want to use a cDNA library to isolate genes expressed specifically in the tail of A. pompejana compared with the head, would it be better to normalize or subtract the probe that you will use? Explain your reasoning.
8. answer
Normalized libraries are depleted in abundant genes and enhanced in rare genes by self-hybridization. Subtracted libraries are depleted in genes that are common between two sources.
A subtracted probe is appropriate here since you wish to identify genes specifically expressed in the tail.

