Computational Biology Lecture #3: Mapping

Presentation transcript:

Computational Biology Lecture #3: Mapping. Bud Mishra, Professor of Computer Science and Mathematics. 10/1/2001. 11/27/2018 ©Bud Mishra, 2001

Restriction Enzyme
A type II sequence-specific restriction endonuclease is an enzyme that can "cut" a double-stranded DNA by breaking the phosphodiester bonds at specific "target" or "restriction" sites on the DNA.
Restriction sites:
- are completely determined by their base-pair composition;
- are sequences 4 to 8 base pairs long (the restriction pattern).

Restriction Patterns
Restriction patterns are "reverse palindromic": $s = s^{CR}$, i.e. the pattern equals its own complemented reverse.
Restriction enzyme   Restriction pattern (top strand / bottom strand, "|" marks the cut)
BamHI                G|GATCC  /  CCTAG|G
EcoRI                G|AATTC  /  CTTAA|G
HaeIII               GG|CC    /  CC|GG
HpaII                C|CGG    /  GGC|C
SalI                 G|TCGAC  /  CAGCT|G
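
The reverse-palindrome property can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function names and the `SITES` dictionary are illustrative, and the recognition sequences are the top strands from the table with the cut marker removed.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    """Complement every base, then reverse the string (s^CR)."""
    return s.translate(COMPLEMENT)[::-1]

def is_reverse_palindrome(s):
    """True if s equals its own reverse complement."""
    return s == reverse_complement(s)

# Top-strand recognition sequences from the table (cut marker removed).
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HaeIII": "GGCC",
    "HpaII": "CCGG",
    "SalI": "GTCGAC",
}

for enzyme, site in SITES.items():
    print(enzyme, site, is_reverse_palindrome(site))
```

Every site in the table passes the check, whereas an arbitrary string such as GATTACA does not.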

Restriction Enzymes
Bacterial immune systems against viral DNA:
- Bacteria use restriction enzymes to cleave invading foreign DNA.
- Bacteria protect their own DNA against cleavage by a methylation process.
Restriction enzymes are very useful in biotechnology as:
- biochemical scissors;
- biochemical markers.

Applications of Restriction Enzymes
- RFLP (Restriction Fragment Length Polymorphisms); polymorphism ≡ sequence variation within a population
- Restriction maps: fingerprints, double digestion maps, multiple complete digestion maps, ordered restriction maps
- Clone library (with partial digestion)
- DNA probes (small restriction fragments)

Digression: Brun's Sieve (Poisson Approximation)
Theorem: Let $W$ be a nonnegative integer-valued random variable such that $E[C_{W,i}] = \lambda^i / i!$ for each $i$, where $C_{W,i}$ denotes the binomial coefficient "$W$ choose $i$". Then $\Pr[W = M] \approx e^{-\lambda} \lambda^M / M!$.
Proof: Let the indicator variable $I_{W=j}$ be 1 if $W = j$ and 0 otherwise. Show that
$I_{W=j} = \sum_{k=0}^{\infty} C_{W,j+k} \, C_{j+k,k} \, (-1)^k$.

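Brun's Sieve
Since $\sum_{k=0}^{n} C_{n,k} (-1)^k = (1-1)^n$ is 1 for $n = 0$ and 0 otherwise:
$I_{W=j} = C_{W,j} \sum_{k=0}^{W-j} C_{W-j,k} (-1)^k = \sum_{k=0}^{W-j} C_{W,j} \, C_{W-j,k} (-1)^k = \sum_{k=0}^{\infty} C_{W,j+k} \, C_{j+k,k} (-1)^k$
By convention, $C_{W,j} = 0$ if $j > W$.
$\Pr[W = M] = E[I_{W=M}] = \sum_{k=0}^{\infty} E[C_{W,M+k}] \, C_{M+k,k} (-1)^k \approx \sum_{k=0}^{\infty} \frac{\lambda^{M+k}}{(M+k)!} \, C_{M+k,k} (-1)^k = \frac{\lambda^M}{M!} \sum_{k=0}^{\infty} \frac{(-\lambda)^k}{k!} = e^{-\lambda} \frac{\lambda^M}{M!}$ □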
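
The indicator identity at the heart of the proof is easy to verify numerically for small values. The sketch below is an illustration only (the function name is made up here); it relies on `math.comb` returning 0 when the lower index exceeds the upper one, which matches the convention stated on the slide.

```python
from math import comb

def indicator_via_sieve(w, j):
    """Evaluate sum_k C(w, j+k) * C(j+k, k) * (-1)^k.

    Terms with j + k > w vanish because comb() returns 0 there,
    so the infinite sum can be truncated at k = w - j.
    """
    upper = max(w - j, 0)
    return sum(comb(w, j + k) * comb(j + k, k) * (-1) ** k
               for k in range(upper + 1))

# The sum should equal 1 exactly when w == j and 0 otherwise.
for w in range(5):
    print(w, [indicator_via_sieve(w, j) for j in range(5)])
```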

Restriction Map: Resolution
$G$ = length of a genomic DNA
$p_k$ = probability that an arbitrary site is a restriction site for a $k$-cutter enzyme
$k$ = 4, 6 or 8 = cutting frequency
Uniform i.i.d. assumption: "All base pairs occur at any given position with equal probability and independently", so $p_k = 1/4^k$.

Numerical Values
$p_k = 1/4^k$, $\lambda_k = G p_k$
Symbol   Cut probability
p_4      1/256
p_6      1/4,096
p_8      1/65,536
p_10     1/1,048,576

Statistics of Restriction Sites
$X_i$ = Bernoulli r.v. indicating the event "there is a restriction site beginning at $i$."
$W = \sum_{i=1}^{G} X_i$ = total number of restriction sites in the genome = number of successes in $G$ independent trials
$C_{W,i}$ = number of "$i$-successful trials."
Consider the set of all "$i$-trials" (choices of $i$ distinct positions):
- There are $C_{G,i}$ of these.
- Each "$i$-successful trial" occurs with probability $p_k^i$.

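Applying Brun's Sieve
$E[C_{W,i}] = C_{G,i} \, p_k^i = \frac{G^{(i)}}{i!} p_k^i \approx \frac{(G p_k)^i}{i!} = \frac{\lambda_k^i}{i!}$, where $G^{(i)} = G(G-1)\cdots(G-i+1) \approx G^i$ for $i \ll G$.
$\Pr[\text{number of restriction sites} = M \mid G, \lambda_k] = \Pr[W = M] \approx e^{-\lambda_k} \lambda_k^M / M!$ □
$E[W] = \lambda_k = G p_k = G / 4^k$
$\sigma^2[W] = \lambda_k \Rightarrow \text{S.D.}[W] = G^{1/2} / 2^k$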
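
As a small illustration of these formulas, the sketch below computes the expected number of sites, its standard deviation, and the Poisson probability of seeing exactly M sites. The function names and the example genome length (chosen so that the mean is exactly 10 for an 8-cutter) are assumptions made for the example, not values from the lecture.

```python
import math

def site_count_stats(genome_length, k):
    """Return (mean, standard deviation) of the number of k-cutter sites,
    under the uniform i.i.d. model with p_k = 1 / 4**k."""
    lam = genome_length * 4.0 ** -k
    return lam, math.sqrt(lam)

def poisson_pmf(m, lam):
    """Poisson approximation to Pr[W = m]."""
    return math.exp(-lam) * lam ** m / math.factorial(m)

# Hypothetical genome of 10 * 4**8 = 655,360 bp cut by an 8-cutter.
lam, sd = site_count_stats(655_360, 8)
print(lam, sd)               # mean and S.D. of the site count
print(poisson_pmf(10, lam))  # probability of exactly 10 sites
```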

Statistics of Restriction Fragments
$\Pr[\text{a restriction fragment is of length } l] = (1 - p_k)^l \, p_k \approx e^{-l/\mu_k} / \mu_k$, where $\mu_k^{-1} = \log \frac{1}{1 - p_k}$
$W$ = r.v. with exponential distribution with mean $\mu_k$: $f_W(w) = e^{-w/\mu_k} / \mu_k$, $w > 0$ (here $W$ is a fragment length, not the site count)
$Z = \lfloor W \rfloor$ = r.v. giving the length of a restriction fragment in base pairs
$E[W] = \mu_k \approx 1/p_k = 4^k$
$\sigma^2[W] = \mu_k^2 \Rightarrow \text{S.D.}[W] = \mu_k \approx 4^k$
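
To see how close the exponential mean is to $4^k$, one can evaluate $\mu_k$ directly. This is a minimal sketch with an illustrative function name; it uses `math.log1p` only for numerical accuracy when $p_k$ is small.

```python
import math

def mean_fragment_length(k):
    """mu_k defined by 1/mu_k = log(1 / (1 - p_k)), with p_k = 4**-k."""
    p = 4.0 ** -k
    return 1.0 / -math.log1p(-p)  # -log1p(-p) == log(1 / (1 - p))

for k in (4, 6, 8):
    # mu_k sits about half a base pair below 4**k.
    print(k, 4 ** k, mean_fragment_length(k))
```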

Matching Rules for Restriction Fragments
Given two restriction fragments without any identifying markers, when can they be said to be the same?
We must account for small measurement errors: $\beta$ = relative sizing error.
Matching Rule (II): Two restriction fragments are said to match if their lengths $x$ and $y$ differ by at most a fraction $\beta$ (i.e., at most $100\beta$ %):
$-\beta \le 1 - y/x \le \beta$
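
The matching rule translates directly into a one-line predicate. The sketch below is illustrative (the function name and the sample lengths are assumptions); note that, as written on the slide, the rule is relative to $x$ and so is not exactly symmetric in $x$ and $y$.

```python
def fragments_match(x, y, beta):
    """Matching rule: -beta <= 1 - y/x <= beta, for lengths x, y > 0."""
    return abs(1.0 - y / x) <= beta

# With a 5% relative sizing error:
print(fragments_match(1000, 1040, 0.05))  # lengths differ by 4%
print(fragments_match(1000, 1060, 0.05))  # lengths differ by 6%
```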

False Positive Match Probability
Given: two randomly chosen distinct restriction fragments obtained by cleaving a large genomic DNA with the same restriction enzyme.
What is the probability that the matching rule accidentally identifies them as the same?

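False Positive Probability
$\mu_k$ = expected length of a restriction fragment
$x, y \sim \text{Exponential}(1/\mu_k)$, $f_X(x) = e^{-x/\mu_k} / \mu_k$
False positive probability
$= \int_0^{\infty} \left( \int_{x(1-\beta)}^{x(1+\beta)} \frac{e^{-y/\mu_k}}{\mu_k} \, dy \right) \frac{e^{-x/\mu_k}}{\mu_k} \, dx$
$= \int_0^{\infty} \left( \int_{v(1-\beta)}^{v(1+\beta)} e^{-u} \, du \right) e^{-v} \, dv$
$= \int_0^{\infty} \left( e^{-v(1-\beta)} - e^{-v(1+\beta)} \right) e^{-v} \, dv$
$= \frac{1}{2-\beta} - \frac{1}{2+\beta} = \frac{2\beta}{4 - \beta^2} \approx \beta/2$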
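
The closed form $2\beta/(4-\beta^2)$ can be sanity-checked by simulation. The sketch below is an illustration under the slide's model: fragment lengths are drawn as independent exponentials (rescaled so that $\mu_k = 1$, which does not affect the ratio $y/x$). The function names, the trial count and the seed are arbitrary choices, not from the lecture.

```python
import random

def false_positive_closed_form(beta):
    """Closed-form false positive probability 2*beta / (4 - beta**2)."""
    return 2.0 * beta / (4.0 - beta * beta)

def false_positive_monte_carlo(beta, trials=200_000, seed=0):
    """Estimate the same probability by sampling pairs of fragment lengths."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.expovariate(1.0)  # fragment lengths in units of mu_k
        y = rng.expovariate(1.0)
        if abs(1.0 - y / x) <= beta:
            hits += 1
    return hits / trials

beta = 0.05
print(false_positive_closed_form(beta))  # close to beta / 2
print(false_positive_monte_carlo(beta))  # should agree up to sampling noise
```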

Maps using Clones
Clone: a large fragment of genomic DNA that has been preselected. One can make faithful copies of a clone a large number of times from a small number of initial clones.
All location information for a clone is assumed to be lost. For instance, it is not known:
- which chromosome a clone belongs to,
- whether two clones overlap,
- what base-pair sequence the clone has, etc.

Clone Libraries
Commonly used clones                    Insert size
Lambda (λ)                              2 to 20 Kb
Cosmid (artificial plasmid)             20 to 45 Kb
BAC (Bacterial Artificial Chromosome)   100 to 200 Kb
YAC (Yeast Artificial Chromosome)       1 to 2 Mb

Clone Library
A preselected set of clones ≡ clone library.
Locations of the clones are assumed to be uniformly random and i.i.d.
All clones are roughly the same size.
$G$ = genome length, $L$ = clone length, $N$ = number of clones in a library
Coverage $c = NL/G$: the number of times the clones would cover the genome if they were concatenated end-to-end. It is also the expected number of clones covering any location of the genome.

Example
A BAC library for human: $G$ = 3,300 Mb, $L$ = 180 Kb, $N$ = 96,000
$c = NL/G = (96 \times 10^3 \times 180 \times 10^3) / (3.3 \times 10^9) \approx 5.2$, quoted here as a nominal $6\times$.
96,000 randomly chosen BACs from the human genome provide a (nominal) $6\times$ library.
Certain regions of the genome may be difficult to clone and hence may not be represented in the library.
A tiling path = a subset of clones that minimally covers the genome:
- Removal of any clone from the tiling path will leave some location of the genome uncovered.
- Every location of the genome is covered by no more than two clones.
- Every clone is overlapped by at most two other clones.
The coverage for a tiling path: $1 \le c_{TP} \le 2$
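
The coverage arithmetic for this example can be reproduced in a few lines. This is a minimal sketch; the function name is illustrative and all lengths are expressed in base pairs.

```python
def coverage(num_clones, clone_length, genome_length):
    """Coverage c = N * L / G, with L and G in the same units."""
    return num_clones * clone_length / genome_length

# BAC library from the example: N = 96,000, L = 180 Kb, G = 3,300 Mb.
c = coverage(96_000, 180_000, 3_300_000_000)
print(round(c, 2))
```

With these inputs the exact ratio is a little above 5, so the "6×" label is best read as a round nominal figure.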

Clone Library
[Figure: a genome, the clone library covering it, and the minimal tiling path selected from the library.]

Mapping A Single Clone
Decorate a clone with additional information, e.g.:
- Restriction pattern (ordered restriction map, fingerprints)
- End sequencing (500 base pairs on each end)
- Probes (PCR products, hybridization probes, etc.)
Restriction pattern: take a clone and completely digest it into small pieces (restriction fragments) with a restriction enzyme. The restriction fragments and their order are always the same for that clone.

Restriction Maps of a Clone
[Figure: a clone with restriction sites, cut into fragments numbered 1 to 6, shown in two forms:
- Ordered restriction map of the clone (ordered set of restriction fragments)
- Fingerprint, or unordered restriction map, of the clone (unordered collection of restriction fragments)]
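
The difference between the two representations can be made concrete with a small sketch. The clone length and cut positions below are made-up numbers, and the function names are illustrative: the ordered map keeps fragment lengths in left-to-right order, while the fingerprint keeps only the multiset of lengths (represented here as a sorted list).

```python
def ordered_map(clone_length, cut_sites):
    """Fragment lengths in their order along the clone."""
    cuts = [0] + sorted(cut_sites) + [clone_length]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def fingerprint(clone_length, cut_sites):
    """Unordered collection of fragment lengths, stored sorted."""
    return sorted(ordered_map(clone_length, cut_sites))

# Hypothetical clone of length 100 with five restriction sites.
sites = [10, 35, 40, 70, 95]
print(ordered_map(100, sites))   # [10, 25, 5, 30, 25, 5]
print(fingerprint(100, sites))   # [5, 5, 10, 25, 25, 30]
```

Two different ordered maps can share the same fingerprint, which is one reason overlap detection from fingerprints alone is error-prone.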

A Clone Map
Key question: given two clones, when can we say whether they overlap by simply examining their fingerprints or maps?
Issues:
- False positives and false negatives in overlap detection
- Ordering all the clones using the overlap properties
- Computing the tiling path
- Subcloning and sequencing (divide-and-conquer)

Amplification by Molecular Cloning
In vivo approach. Ingredients:
- Host organism: E. coli bacteria or yeast, which replicates a suitably modified foreign DNA.
- Cloning vector: the cell will not replicate any foreign DNA in the absence of a suitable cloning vector.
- Insert DNA: the foreign DNA to be amplified.
Vector and insert are combined to create a circular recombinant DNA, the "replicon".
[Figure: vector + insert → "replicon"]

Cloning
Step 1: Inserts and vectors with the same "sticky ends" are mixed together with ligase enzyme. This produces a circular replicon.
Step 2: Transform the host cells by exposing a population of hosts to the ligase mixture containing the replicons.
- The replicons are inserted into the host cells.
- Transformed host cells are transferred to culture dishes containing a solid growth medium.
- Cells divide, making a colony containing $2^{30} \approx 10^9$ copies of the insert in 10 hours.
Step 3: Identify the colonies of clones containing the copies of the inserts.
- Pick these colonies.
- Isolate and linearize the replicons.

Sequencing A Genome
A "divide-and-conquer" approach:
Step 1: Divide. Create a "high coverage" clone library by choosing many randomly located clones (e.g., 96,000 BAC clones, each of length 180 Kb, from a human genome of length 3,300 Mb: a nominal $6\times$ coverage BAC library).
Step 2: Contig. Use the clone overlap information to create the contigs (e.g., a $6\times$ coverage BAC library would yield $96{,}000 \times e^{-6} \approx 240$, i.e. roughly 200, contigs: about 10 contigs per chromosome, each of size about 10 Mb).
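
The contig estimate in Step 2 uses the rule of thumb that $N$ randomly placed clones at coverage $c$ form about $N e^{-c}$ contigs. The sketch below simply evaluates that expression; the function name is illustrative, and the formula is taken from the slide rather than derived here.

```python
import math

def expected_contigs(num_clones, coverage):
    """Estimated number of contigs, N * exp(-c), as used on the slide."""
    return num_clones * math.exp(-coverage)

print(round(expected_contigs(96_000, 6)))  # 238
```

The estimate is sensitive to the coverage value: lowering $c$ by one multiplies the expected number of contigs by $e \approx 2.7$.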

Sequencing A Genome
Step 3: Prune. Remove "non-essential" clones from the contigs to form a "minimal tiling path" (e.g., a minimal tiling path would consist of about 32,000 BAC clones).
Step 4: Shotgun sequencing. Subclone a BAC on the minimal tiling path into M13s. Generate sequence reads from the M13 subclones. Sequence reads = 300 to 1,000 bp, 95% accuracy.
Step 5: Contig the sequence reads.
Step 6: Assemble the sequences and close the gaps.

Finishing Phase
Filling the gaps between the contigs:
- Synthesize a primer from the end of the contig sequence.
- Generate a new read from the M13 subclone that starts with the synthesized primer.
If there is no such M13 subclone:
- Synthesize a pair of primers from the sequence at the ends of a "gap".
- Amplify the DNA across the gap by performing PCR on the clone DNA.
- Sequence the PCR product.

Sequence Assembly
Idealized assembly: assuming no error in the read sequences.
Shortest Common Superstring Problem (SCSP):
- Given: a set $\{s_i\}$, where each $s_i$ is a string over some alphabet.
- Find: the shortest string $S$ which contains each $s_i$ as a contiguous substring.
SCSP is NP-hard; its decision version is NP-complete.

Greedy Algorithm for Sequence Assembly
1. Find overlaps between pairs of sequence reads (only consider overlaps that span at least 15 bp).
2. Sort overlaps by decreasing length.
3. Merge reads into contigs according to the sorted list.
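
The greedy idea can be sketched in a few lines under the idealized assumptions above: error-free reads, exact suffix-prefix overlaps, and no reverse-complement reads. This is an illustration, not the lecture's implementation. Instead of sorting all overlaps once, it repeatedly merges the pair with the longest current overlap, which follows the same greedy principle. The function names, the `min_len` parameter and the toy reads are assumptions; the toy example uses a threshold of 4 rather than 15 because its reads are only 10 bases long.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b,
    or 0 if that length is below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=15):
    """Repeatedly merge the two contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != s and r in s for s in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA"]
print(greedy_assemble(reads, min_len=4))  # ['ATGGCGTACGTTAGCCATGA']
```

Greedy merging is a heuristic: it does not guarantee the shortest common superstring, and repeats in the genome can lead it to merge reads incorrectly.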