Bio Sci 203 Lecture 2 – genomic and cDNA libraries Bruce Blumberg (blumberg@uci.edu) office – 2113E McGaugh Hall 824-8573 lab x46873, x43116 office hours MWF 11-12 or by appointment http://blumberg-serv.bio.uci.edu/bio203-2004/index.htm http://blumberg.bio.uci.edu/bio203-w2004/index.htm Link is also on main class web site Today genomic libraries factors affecting sequence clonability in E. coli cDNA library theory and construction BioSci 203 Blumberg lecture 2 page 1 ©copyright Bruce Blumberg 2001-2005. All rights reserved
What do we commonly use genomic libraries for? Genome sequencing gene cloning prior to targeted disruption or promoter analysis positional cloning genetic mapping Radiation hybrid, STS (sequence tagged sites) chromosome walking gene identification from large insert clones disease locus isolation and characterization Considerations before making a genomic library what will you use it for, i.e., what size inserts are required? Walking to a clone isolation of genes for knockouts Are high quality validated libraries available? Caveat emptor Drosophila ~50% of clones are not traceable to original plates Research Genetics Xenopus tropicalis BAC library is really Xenopus laevis apply stringent standards, your time is valuable BioSci 203 Blumberg lecture 2 page 2 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Genomic libraries (contd.) Considerations before making a genomic library (contd) availability of equipment? PFGE laboratory automation if not available locally, it may be better to use a commercial library when available Goals for a genomic library Faithful representation of genome clonability and stability of fragments essential >5 fold coverage is desirable (i.e., base library should have a complexity of five times the estimated genome size to have a 95% probability of identifying a clone. easy to screen plaques much easier to deal with colonies UNLESS you are dealing with libraries spotted in high density on filter supports easy to produce quantities of DNA for further analysis BioSci 203 Blumberg lecture 2 page 3 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Construction of a genomic library Prepare HMW DNA bacteriophage λ, cosmids or fosmids partial digest with frequent (4) cutter followed by sucrose gradient fractionation or gel electrophoresis Sau3A (^GATC) most frequently used, compatible with BamHI (G^GATCC) why can’t we use rare cutters? Ligate to phage or cosmid arms then package in vitro Stratagene >>> better than competition Vectors that accept larger inserts prepare DNA by enzyme digestion in agarose blocks why? Partial digest with frequent cutter Separate size range of interest by PFGE (pulsed field gel electrophoresis) ligate to vector and transform by electroporation What is the potential flaw for all these methods? BioSci 203 Blumberg lecture 2 page 4 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Construction of a genomic library (contd) What is the potential flaw for all these methods? Unequal representation of restriction sites, even 4 cutters in genome large regions may exist devoid of any restriction sites tend not to be in genes Solution? Shear DNA or cut with several 4 cutters, then methylate and attach linkers for cloning benefits should get accurate representation of genome can select restriction sites for particular vector (i.e., not limited to BamHI) pitfalls quality of methylases more steps potential for artefactual ligation of fragments molar excess of linkers BioSci 203 Blumberg lecture 2 page 5 ©copyright Bruce Blumberg 2001-2005. All rights reserved
YACs - Yeast artificial chromosomes YACs, BACs and PACs Three complementary approaches, each with its own strengths and weaknesses YACs - Yeast artificial chromosomes requires two vector arms, one with an ARS one with a centromere both fragments have selective markers trp and ura are commonly used background reduction is by dephosphorylation ligation is transformed into spheroplasts colonies picked into microtiter dishes containing media with cryoprotectant BioSci 203 Blumberg lecture 2 page 6 ©copyright Bruce Blumberg 2001-2005. All rights reserved
can propagate extremely large fragments YAC cloning YAC cloning (contd) advantages can propagate extremely large fragments may propagate sequences unclonable in E. coli disadvantages tedious to purify away from yeast chromosomes by PFGE grow slowly insert instability generally difficult to handle BioSci 203 Blumberg lecture 2 page 7 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Based on the E. coli F’ plasmid BAC cloning Based on the E. coli F’ plasmid partial digests are cloned into dephosphorylated vector ligation is transformed into E. coli by electroporation advantages large plasmids - handle with usual methods Stable - stringently controlled at 1 copy/cell Vectors are small ~7 kb – good for shotgun cloning strategies disadvantages low yield no selection against nonrecombinant clones (blue/white only) apparent size limitation BioSci 203 Blumberg lecture 2 page 8 ©copyright Bruce Blumberg 2001-2005. All rights reserved
derived from bacteriophage P1 P1 cloning P1cloning systems derived from bacteriophage P1 one of the primary tools of E. coli geneticists for many years like cosmids, infect cells with packaged DNA then recover as a plasmid. useful, but size limited to 95 kb by “headfull” packaging mechanism similar to bacteriophage λ BioSci 203 Blumberg lecture 2 page 9 ©copyright Bruce Blumberg 2001-2005. All rights reserved
PAC - P1 artificial chromosome PAC cloning PAC - P1 artificial chromosome combines best features of P1 and BAC cloning size selected partial digests are ligated to dephosphorylated vector and electrotransformed into E. coli. Stored as colonies in microtiter plates Selection against non-recombinants via SacBII selection (nonrecombinant cells convert sucrose into a toxic product) inducible P1 lytic replicon allows amplification of plasmid copy number BioSci 203 Blumberg lecture 2 page 10 ©copyright Bruce Blumberg 2001-2005. All rights reserved
all the advantages of BACS stability replication as plasmids PAC cloning (contd) PAC advantages all the advantages of BACS stability replication as plasmids stringent copy control selection against nonrecombinant clones inducible P1 lytic replicon addition of IPTG causes loss of copy control and larger yields disadvantages effective size limitation (~300 kb) Vector is large – lots of vector fragments from shotgun cloning PACs BioSci 203 Blumberg lecture 2 page 11 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Comparison of cloning systems BioSci 203 Blumberg lecture 2 page 12 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Which type of library to make Do I need to make a new library at all? Is the library I need available? http://bacpac.chori.org/home.htm PAC libraries are suitable for most purposes BAC libraries are most widely available If your organism only has YAC libraries available you may wish to make PAC or BACs Much easier to buy pools or gridded libraries for screening doesn’t always work What is the intended use? Will this library be used many times? e.g. for isolation of clones for knockouts if so, it pays to do it right who should make the library? Going rate for custom PAC or BAC library is 50K. Most labs do not have these resources if care is taken, construction is not so difficult BioSci 203 Blumberg lecture 2 page 13 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Screening of genomic libraries What types of probes are suitable for screening genomic libraries? BioSci 203 Blumberg lecture 2 page 14 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Screening of genomic libraries What types of probes are suitable for screening genomic libraries? suitable cDNAs (or mRNAs) genomic fragments longer oligonucleotides (> 30 mers) Not suitable antibodies (no protein expression) degenerate (mixed) oligonucleotides (genome complexity) DNA binding proteins (genome complexity) BioSci 203 Blumberg lecture 2 page 15 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Sequence stability in E. coli What are the sorts of factors that might modulate whether a sequence can be stably propagated in E. coli? 1 2 3 toxicity restriction recombination BioSci 203 Blumberg lecture 2 page 16 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Sequence stability in E. coli toxicity sequence may lead to the production of a toxic product or toxic levels of an otherwise innocuous product more problematic with cDNA than genomic clones restriction - Raleigh 1987 Meth. Enzymol. 152, 130-141 virtually all microorganisms have systems to destroy non-endogenous DNA host range restriction four classes of restriction endonucleases very important for cloning purposes are recently discovered systems that degrade DNA containing 5-methyl cytosine or 6-methyl adenine. If you are cloning genomic DNA, or hemimethylated cDNA these are very important! virtually all eukaryotic DNA contains 5-methyl cytosine and/or 6-methyl adenine mcrA,B,C - methylcytosine mrr - methyl adenine BioSci 203 Blumberg lecture 2 page 17 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Sequence stability in E. coli (contd) Restriction (contd) foreign DNA escapes restriction 1/105 for EcoK and EcoB, 1/10 for mcrA. one needs to be conscious of the mcr and mrr restriction status of strains and packaging extracts to be used. Recombination - Wyman and Wertman (1987) Meth Enzymol 152, 173-180 genomic DNA contains lots of repeated sequences direct repeats inverted repeats interspersed repeats (e.g. Alu) repeated sequences unstable in recombination proficient E. coli if cloned in: lambda plasmid cosmid seems not to apply to single copy vectors such as BAC and PAC What does this imply? ~30% of the human genome is unstable in plasmid or phage clones phages with such sequences either don’t grow at all or get shorter with time BioSci 203 Blumberg lecture 2 page 18 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Sequence stability in E. coli (contd) Recombination (contd) E. coli has a variety of recombination pathways. These are the major players in causing sequence underrepresentation recA required for all pathways recBCD - major recombination pathway sbcB,C - suppressor of B,C minor pathways recE recF recJ rule of thumb - the more recombination pathways mutated, the sicker the cells and the slower they grow major players for inverted repeats are recBCD and sbc recA is most important for stabilizing direct repeats and preventing plasmid concatamerization BioSci 203 Blumberg lecture 2 page 19 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Sequence stability in E. coli (contd) Plating a genomic library whenever possible, select a cell type that is recA, recD, sbcB and deficient in all restriction systems. Conveniently, EcoK, mcrB,C and mrr are all linked and often deleted together in strains can get more than 100 fold difference in numbers of phage between wild type and recombination deficient recD is preferred over recB,C because recD promotes rolling circle replication in lambda which improves yields BioSci 203 Blumberg lecture 2 page 20 ©copyright Bruce Blumberg 2001-2005. All rights reserved
What do I need to know about E. coli genetics? You look in a supplier’s catalog and see lots of E. coli with different genotypes of the following general form: F’{lacIq Tn10 (TetR)} mcrA, Δ(mrr-hsdRMS-mcrBC), Φ80lacZΔM15, ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, rpsL(StrR), endA1, nupG Does this make any difference for your experiments? Or should you simply follow the supplier’s instructions? Or just use whatever people in the next lab are using without thinking about it? BioSci 203 Blumberg lecture 2 page 21 ©copyright Bruce Blumberg 2001-2005. All rights reserved
What do I need to know about E. coli genetics? F’{lacIq Tn10 (TetR)} mcrA, Δ(mrr-hsdRMS-mcrBC), Φ80lacZΔM15, ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, rpsL(StrR), endA1, nupG restriction systems mcrA - cuts Cm5CGG mcrB,C - complex cuts at Gm5C mrr - restricts 6-methyl adenine containing DNA Why are these important? hsdRMS - EcoK restriction system R cuts 5'-AAC(N)6 GTGC-3’ M/S methylates A residues in this sequence for stability of long repeated sequences recA1 - deficient in general recombination recD - deficiency in Exonuclease V sbcB,C - Exonuclease I deoR - allows uptake of large DNA BioSci 203 Blumberg lecture 2 page 22 ©copyright Bruce Blumberg 2001-2005. All rights reserved
What do I need to know about E. coli genetics? (contd) for lac color selection lacZ ΔM15 either on F’ or on Φ80 prophage lacIq - constitutive expression of lac repressor. Prevents leaky expression of promoters containing lac operator for high quality DNA preps recA1 - deficient in general recombination endA1 - deficient in endonuclease I if you buy ESTs from Research Genetics (InVitrogen) or OpenBiosystems tonA - resistant to bacteriophage T1 for recombinant protein expression lon - protease deficiency OmpT - protease found in periplasmic space most important protease inhibitor for E. coli protein preps is pepstatin A BioSci 203 Blumberg lecture 2 page 23 ©copyright Bruce Blumberg 2001-2005. All rights reserved
What do I need to know about E. coli genetics? (contd) suppressors supE - inserts glutamine at UAG (amber) codons supF - inserts tyrosine at UAG (amber) codons many older phages have S100am which can only be suppressed by supF λZAP, λgt11, λZipLOX, BioSci 203 Blumberg lecture 2 page 24 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Construction of cDNA libraries What is a cDNA library? What are they good for? Collection of DNA copies representing the expressed mRNA population of a cell, tissue, organ or embryo Identifying and isolating expressed mRNAs functional identification of gene products cataloging expression patterns for a particular tissue EST sequencing and microarray analysis BioSci 203 Blumberg lecture 2 page 25 ©copyright Bruce Blumberg 2001-2005. All rights reserved
Determinants of library quality What constitutes a full-length cDNA? Strictly it is an exact copy of the mRNA full-length protein coding sequence considered acceptable for most purposes mRNA full-length, capped mRNAs are critical to making full-length libraries cytoplasmic mRNAs are best – WHY? 1st strand synthesis complete first strand needs to be synthesized issues about enzymes 2nd strand synthesis thought to be less important than 1st strand (probably not) choice of vector plasmids are best for EST sequencing phages are best for manual screening how will library quality be evaluated test with 2, 4, 6, 8 kb probes to ensure that these are well represented BioSci 203 Blumberg lecture 2 page 26 ©copyright Bruce Blumberg 2001-2005. All rights reserved
mRNA is isolated from source of interest cDNA synthesis Scheme mRNA is isolated from source of interest 1-2 ug is denatured and annealed to primer containing d(T)n reverse transcriptase copies mRNA into cDNA DNA polymerase I and Rnase H convert remaining mRNA into DNA cDNA is rendered blunt ended linkers or adapters are added for cloning cDNA is ligated into a suitable vector vector is introduced into bacteria Caveats there is lots of bad information out there much is derived from vendors who want to increase sales of their enzymes or kits all manufacturers do not make equal quality enzymes most kits are optimized for speed at the expense of quality small points can make a big difference in the final outcome BioSci 203 Blumberg lecture 2 page 27 ©copyright Bruce Blumberg 2001-2005. All rights reserved