Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info.

Presentation transcript:


Molecular Scissors (restriction enzymes)
HindII, the first restriction enzyme, was discovered accidentally in 1970 during a study of how the bacterium Haemophilus influenzae takes up DNA from a virus.
It recognizes and cuts DNA at the sequence GTYRAC (Y = C or T, R = A or G), for example GTGCAC or GTTAAC.
(Figure: Molecular Cell Biology, 4th edition)

Discovering Restriction Enzymes
Werner Arber discovered restriction enzymes.
Daniel Nathans pioneered the application of restriction enzymes to the construction of genetic maps.
Hamilton Smith showed that a restriction enzyme cuts DNA in the middle of a specific sequence.
"My father has discovered a servant who serves as a pair of scissors. If a foreign king invades a bacterium, this servant can cut him in small fragments, but he does not do any harm to his own king. Clever people use the servant with the scissors to find out the secrets of the kings. For this reason my father received the Nobel Prize for the discovery of the servant with the scissors."
Werner Arber's daughter (from Nobel lecture)

Recognition Sites of Restriction Enzymes
(Figure: Molecular Cell Biology, 4th edition)

Uses of Restriction Enzymes
Recombinant DNA technology
Cloning
cDNA/genomic library construction
DNA mapping

Restriction Maps
A restriction map shows the positions of restriction sites in a DNA sequence.
If the DNA sequence is known, constructing the restriction map is a trivial exercise.
In the early days of molecular biology, DNA sequences were often unknown.
Biologists had to solve the problem of constructing restriction maps without knowing the DNA sequences.
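To illustrate why the map is trivial once the sequence is known, here is a minimal sketch that scans a sequence for every occurrence of a recognition site. The function name `restriction_sites`, the sample sequence, and the choice of GTTAAC as the site are assumptions made for illustration, and the sketch reports where the site starts rather than the exact cut position within it.

```python
def restriction_sites(sequence, site):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = sequence.find(site, start + 1)
    return positions


# Hypothetical sequence, invented for this example.
dna = "AAGTTAACCCGTGCACTTGTTAACG"
sites = restriction_sites(dna, "GTTAAC")
print(sites)
```

A single linear scan is enough here, which is the contrast the slide draws with the case where only fragment lengths are available.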

Full Restriction Digest
Cutting DNA at each restriction site creates multiple restriction fragments.
Is it possible to reconstruct the order of the fragments from the fragment sizes {3, 5, 5, 9}?

Full Restriction Digest: Multiple Solutions
Alternative orderings of the restriction fragments are consistent with the same fragment sizes.
(Figure: two alternative orderings of the fragments)
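The ambiguity can be made concrete by listing every distinct ordering of the fragment sizes {3, 5, 5, 9} from the full digest example. This is only a sketch; the helper names `distinct_orderings` and `cut_positions` are assumptions introduced here.

```python
from itertools import accumulate, permutations


def distinct_orderings(fragments):
    """Return every distinct left-to-right ordering of the fragment lengths."""
    return sorted(set(permutations(fragments)))


def cut_positions(ordering):
    """Convert an ordering of fragment lengths into interior cut positions."""
    # The last running total is the full DNA length, not a cut.
    return list(accumulate(ordering))[:-1]


orderings = distinct_orderings([3, 5, 5, 9])
print(len(orderings))
for ordering in orderings[:3]:
    print(ordering, cut_positions(ordering))
```

Every one of these orderings yields the same multiset of fragment sizes, so a full digest alone cannot distinguish between them.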

Measuring Length of Restriction Fragments
Restriction enzymes break DNA into restriction fragments.
Gel electrophoresis is a process for separating DNA by size and measuring the sizes of restriction fragments.
It can separate DNA fragments that differ in length by only 1 nucleotide, for fragments up to 500 nucleotides long.

Gel Electrophoresis
DNA fragments are injected into a gel positioned in an electric field.
DNA is negatively charged near neutral pH.
The sugar-phosphate (deoxyribose phosphate) backbone is acidic, so DNA has an overall negative charge.
DNA molecules move towards the positive electrode.

Gel Electrophoresis (cont'd)
DNA fragments of different lengths are separated according to size.
Smaller molecules move through the gel matrix more readily than larger molecules.
The gel matrix restricts random diffusion, so molecules of different lengths separate into different bands.

Gel Electrophoresis: Example
Direction of DNA movement: smaller fragments travel farther.
Distance traveled is (roughly) inversely proportional to the logarithm of molecule size.
Molecules of different sizes form distinct bands.
(Figure: Molecular Cell Biology, 4th edition)

Detecting DNA: Autoradiography
One way to visualize separated DNA bands on a gel is autoradiography:
The DNA is radioactively labeled.
The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present.

Detecting DNA: Fluorescence
Another way to visualize DNA bands in a gel is fluorescence:
The gel is incubated with a solution containing the fluorescent dye ethidium.
Ethidium binds to the DNA.
The DNA lights up when the gel is exposed to ultraviolet light.

Partial Restriction Digest
The DNA sample is exposed to the restriction enzyme for only a limited amount of time, so that it is not cut at all restriction sites.
This experiment generates the set of all possible restriction fragments between every two (not necessarily consecutive) cuts.
This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence.

Partial Digest Example
Partial digest results in the following 10 restriction fragments:
L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22}
X = {0, 5, 14, 19, 22}
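The relationship between X and L in this example can be checked directly: L is the multiset of all pairwise distances between points of X. A minimal sketch follows, with the helper name `delta` chosen here to mirror the ΔX notation used on the later slides.

```python
from itertools import combinations


def delta(points):
    """Return the sorted multiset of pairwise distances between the points."""
    return sorted(b - a for a, b in combinations(sorted(points), 2))


X = [0, 5, 14, 19, 22]
L = [3, 5, 5, 8, 9, 14, 14, 17, 19, 22]
print(delta(X))
print(delta(X) == sorted(L))
```

Sorting both lists is a simple way to compare multisets, since duplicates such as the two 5s and two 14s must be preserved.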

Partial Digest Problem
Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points.
Input: The multiset of pairwise distances L, containing C(n,2) = n(n-1)/2 integers.
Output: A set X of n integers such that ΔX = L, where ΔX denotes the multiset of all pairwise distances between points in X.

Note: It is not always possible to uniquely reconstruct a set X based only on ΔX.
For example, the sets X = {0, 2, 5} and X + 10 = {10, 12, 15} both produce ΔX = {2, 3, 5} as their partial digest set.
The sets {0, 1, 2, 5, 7, 9, 12} and {0, 1, 5, 7, 8, 10, 12} present a less trivial example of non-uniqueness. They both digest into:
{1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 8, 9, 10, 11, 12}

Homometric Sets
Two sets A and B are homometric if ΔA = ΔB.
A = {0, 1, 2, 5, 7, 9, 12}
B = {0, 1, 5, 7, 8, 10, 12}
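The homometric property of these two sets, and of a set and its shifted copy, can be verified with a short sketch. The function names `delta` and `homometric` are assumptions introduced for this example.

```python
from collections import Counter
from itertools import combinations


def delta(points):
    """Return the multiset of pairwise distances as a Counter."""
    return Counter(b - a for a, b in combinations(sorted(points), 2))


def homometric(A, B):
    """Two sets are homometric when their distance multisets are equal."""
    return delta(A) == delta(B)


A = [0, 1, 2, 5, 7, 9, 12]
B = [0, 1, 5, 7, 8, 10, 12]
shifted = [x + 10 for x in [0, 2, 5]]

print(homometric(A, B))
print(homometric([0, 2, 5], shifted))
```

A `Counter` is used so that repeated distances are counted with multiplicity rather than collapsed as they would be in a plain set.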

Partial Digest: Brute Force (exhaustive search)
1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence.
2. For every possible set X = {0, x_2, …, x_{n-1}, M}, compute the corresponding ΔX (i.e., pairwise distances).
3. If ΔX is equal to the experimental partial digest L, then X is the correct restriction map.

Partial Digest: Brute Force
To do this, we need to know n.
Note that C(n,2) = n!/[(n-2)! 2!] = n(n-1)/2.
Since |L| = C(n,2) = n(n-1)/2, n satisfies $n^2 - n - 2|L| = 0$, whose positive root is $n = (1 + \sqrt{1 + 8|L|})/2$.
For L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22} (i.e., our previous example), |L| = 10 and n = 5. (Recall that X = {0, 5, 14, 19, 22} in that example.)
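Recovering n from |L| amounts to solving the quadratic above for its positive root. A small sketch in exact integer arithmetic, where the function name `points_from_distance_count` is an assumption made for illustration:

```python
from math import isqrt


def points_from_distance_count(num_distances):
    """Solve n(n-1)/2 = num_distances for the positive integer n."""
    discriminant = 1 + 8 * num_distances
    root = isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("no integer n satisfies n(n-1)/2 = |L|")
    # The discriminant is odd, so its root is odd and 1 + root is even.
    return (1 + root) // 2


print(points_from_distance_count(10))
```

Using `isqrt` avoids floating-point rounding and makes it easy to reject an L whose size cannot come from any n.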

BruteForcePDP(L, n):
    M ← maximum element in L
    for every set of n - 2 integers 0 < x_2 < … < x_{n-1} < M
        X ← {0, x_2, …, x_{n-1}, M}
        form ΔX from X
        if ΔX = L
            return X
    output "no solution"
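A runnable sketch of this exhaustive search is given below. It uses `itertools.combinations` to enumerate the interior points; the names `brute_force_pdp` and `delta` are assumptions introduced here, and returning `None` stands in for the "no solution" output.

```python
from itertools import combinations


def delta(points):
    """Return the sorted multiset of pairwise distances between the points."""
    return sorted(b - a for a, b in combinations(sorted(points), 2))


def brute_force_pdp(L, n):
    """Try every set of n - 2 interior integers strictly between 0 and M."""
    target = sorted(L)
    M = target[-1]
    for interior in combinations(range(1, M), n - 2):
        X = [0, *interior, M]
        if delta(X) == target:
            return X
    return None  # corresponds to "no solution"


L = [3, 5, 5, 8, 9, 14, 14, 17, 19, 22]
X = brute_force_pdp(L, 5)
print(X)
```

The search returns the first valid map in enumeration order, which may be the mirror image of the map shown in the partial digest example; both have the same pairwise distances.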

AnotherBruteForcePDP(L, n):
    M ← maximum element in L
    for every set of n - 2 integers 0 < x_2 < … < x_{n-1} < M from L
        X ← {0, x_2, …, x_{n-1}, M}
        form ΔX from X
        if ΔX = L
            return X
    output "no solution"

AnotherBruteForcePDP: Example
L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22}, n = 5
Form all possible candidates X = {0, a, b, c, M}, where a < b < c are values from L smaller than M, until finding one for which ΔX = L.
Answer: X = {0, 5, 14, 19, 22}
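The restricted search can be sketched the same way, drawing interior points only from values that occur in L. One assumption to note: this sketch removes duplicate values and M from the candidate pool before enumerating, and the names `another_brute_force_pdp` and `delta` are introduced here for illustration.

```python
from itertools import combinations


def delta(points):
    """Return the sorted multiset of pairwise distances between the points."""
    return sorted(b - a for a, b in combinations(sorted(points), 2))


def another_brute_force_pdp(L, n):
    """Try only interior points that appear as distances in L."""
    target = sorted(L)
    M = target[-1]
    # Assumption: distinct values of L below M form the candidate pool.
    candidates = sorted(set(target) - {M})
    for interior in combinations(candidates, n - 2):
        X = [0, *interior, M]
        if delta(X) == target:
            return X
    return None  # corresponds to "no solution"


L = [3, 5, 5, 8, 9, 14, 14, 17, 19, 22]
X = another_brute_force_pdp(L, 5)
print(X)
print(another_brute_force_pdp([2, 998, 1000], 3))
```

Restricting the pool is safe because every interior point x is itself a distance (from 0 to x) and so must appear in L.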

BruteForcePDP: Efficiency
1. There are C(M-1, n-2) sets of n - 2 integers with values in the open range (0, M).
2. Creating X, forming ΔX from X, and comparing ΔX to L take time polynomial in n for each set (ΔX alone has n(n-1)/2 entries), which is small next to the number of sets.
3. So the number of sets examined is C(M-1, n-2), which is $O(M^{n-2})$.

AnotherBruteForcePDP: Efficiency
1. There are at most C(|L|, n-2) ways to choose n - 2 integers from L. Note that |L| = n(n-1)/2.
2. As before, the other steps take time polynomial in n for each set.
3. Since |L| < n^2, the number of sets examined is C(|L|, n-2), which is $O(n^{2n-4})$.

Compare AnotherBruteForcePDP with BruteForcePDP
AnotherBruteForcePDP is more efficient, but still slow.
Consider L = {2, 998, 1000} (n = 3, M = 1000): BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast.
Fewer sets are examined, but the runtime is still exponential: $O(n^{2n-4})$.
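The gap between the two searches can be quantified by counting the candidate sets each one examines in the worst case, using the formulas C(M-1, n-2) and C(|L|, n-2) from the efficiency slides. The function name `candidate_counts` is an assumption made for this sketch.

```python
from math import comb


def candidate_counts(L, n):
    """Return (BruteForcePDP count, AnotherBruteForcePDP count)."""
    M = max(L)
    all_values = comb(M - 1, n - 2)   # interior points drawn from 1..M-1
    from_L = comb(len(L), n - 2)      # interior points drawn from L
    return all_values, from_L


print(candidate_counts([2, 998, 1000], 3))
print(candidate_counts([3, 5, 5, 8, 9, 14, 14, 17, 19, 22], 5))
```

The first count depends on the length M of the DNA, while the second depends only on n, which is why long sequences with few sites favor the restricted search.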

A Better Algorithm…

PartialDigest(L)
    width ← maximum element in L
    DELETE(width, L)
    X ← {0, width}
    PLACE(L, X)

PLACE(L, X)
    if L is empty
        output X
        return
    y ← maximum element in L
    if Δ(y, X) ⊆ L
        add y to X and remove lengths Δ(y, X) from L
        PLACE(L, X)
        remove y from X and add lengths Δ(y, X) to L
    if Δ(width-y, X) ⊆ L
        add width-y to X and remove lengths Δ(width-y, X) from L
        PLACE(L, X)
        remove width-y from X and add lengths Δ(width-y, X) to L
    return

Notes:
1. DELETE(y, L) removes the value y from L.
2. Δ(y, X) denotes the multiset of distances between a point y and all points in a set X.
3. After each recursive call in PLACE, X and L are restored to their condition before the call, in case another branch in the search tree must be explored.
4. The algorithm lists all sets X with ΔX = L.

Consider an example where L = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10}…

Example
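The PartialDigest and PLACE procedures can be traced on L = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10} with the sketch below. It is one possible rendering, not the slides' own code: instead of modifying and then restoring L and X, each recursive call receives fresh copies, and the names `partial_digest`, `_place`, and `_distances` are assumptions introduced here.

```python
from collections import Counter


def _distances(y, X):
    """Multiset of distances between point y and every point in X."""
    return Counter(abs(y - x) for x in X)


def _place(remaining, X, width, solutions):
    if not remaining:
        solutions.append(sorted(X))
        return
    y = max(remaining)
    # The largest remaining distance is measured from one of the two ends.
    candidates = [y] if y == width - y else [y, width - y]
    for point in candidates:
        needed = _distances(point, X)
        if all(remaining[d] >= count for d, count in needed.items()):
            # Counter subtraction builds a new multiset, so nothing needs
            # to be restored when the recursive call returns.
            _place(remaining - needed, X + [point], width, solutions)


def partial_digest(L):
    """Return every set X (as a sorted list) with delta(X) equal to L."""
    remaining = Counter(L)
    width = max(remaining)
    remaining[width] -= 1
    solutions = []
    _place(+remaining, [0, width], width, solutions)
    return solutions


for X in partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]):
    print(X)
```

For this input the search finds two maps that are mirror images of each other, which is expected because reflecting a map leaves all pairwise distances unchanged.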

A Better Algorithm… Efficiency
For the ideal case, only one recursive call is made in PLACE each time PLACE is called. The amount of work done for the call is O(n) the first time, O(n-1) the second time, etc., and this continues for n times, so the total work is n + (n-1) + (n-2) + … + 1 = n(n+1)/2, which is $O(n^2)$.
For pathological cases where both recursive calls are made in PLACE (i.e., if both alternatives are viable) each time PLACE is called, the complexity is $O(2^n)$, where n is |X|.

Notes for Brute Force Approaches
BruteForcePDP and AnotherBruteForcePDP differ in only one line: the first considers every set of n - 2 integers 0 < x_2 < … < x_{n-1} < M, while the second considers only such sets whose values come from L.
We would like to use the same algorithm to solve both. To do this, let's put the values we will use to create X into an array called workingArray.
For BruteForcePDP we will be dealing with the numbers 1, 2, 3, 4, …, M-1. Let's call this set of values allValues.
For AnotherBruteForcePDP we will be dealing with the numbers from L except M and any duplicates. Let's call this set of values reducedL.
So, to use the same code for both BruteForcePDP and AnotherBruteForcePDP, all we need to do is put allValues or reducedL into workingArray, respectively, and then use workingArray:

GenericBruteForcePDP
    M ← maximum element in L
    if algorithm = BruteForcePDP
        workingArray ← allValues
    else
        workingArray ← reducedL
    for every set of n - 2 integers in workingArray
        X ← {0, x_2, …, x_{n-1}, M}
        form ΔX from X
        if ΔX = L
            return X
    output "no solution"

Notes for Brute Force Approaches
Our next problem is to generate every possible set of n - 2 integers from the values in workingArray. One way is to envision this as a tree search problem where the leaf nodes represent the possible arrangements of the values in workingArray.
For example, consider L = {2, 2, 5, 7, 9, 10} and n = 4. In this case, workingArray contains the values {2, 5, 7, 9}.
A simple (but naïve) tree lists every ordered arrangement of two values, while a second tree eliminates the redundancy by listing each set of two values only once. (Figures: the naïve tree and the tree without redundancy)
Permutations: P(x,y) = x!/(x-y)!, so P(4,2) = 4!/(4-2)! = 12
Combinations: C(x,y) = x!/[(x-y)! y!], so C(4,2) = 4!/[(4-2)! 2!] = 6
Of course, we really just want the leaf nodes. To produce them, we can simply perform a depth-first search, adding to the set of values at each of the n - 2 positions (from left to right) as we go deeper into the tree until all positions have been filled.
Our choice of value at any time will be made from the unused values in workingArray. When we use a value, we must remove it from workingArray so that it cannot be used at the next level. However, for this to work, we must restore workingArray to its previous state when we backtrack to a node.
The easiest way to do this is via recursion, in which case we only need to make a copy of workingArray before the next recursive call, remove the appropriate value from the copy, and then pass the copy. That way, when returning from the recursive call, workingArray will already be as it was before the recursion.

Notes for Brute Force Approaches (continued)
Let setOfIntegers be the collection of n - 2 integers that we must generate (i.e., a candidate map). Recall that this will begin with no values. Here is pseudocode for a depth-first traversal of the search tree, where X denotes {0} together with setOfIntegers and M:

depthFirst(setOfIntegers, workingArray)
    if setOfIntegers is complete (i.e., has no unfilled positions)
        if ΔX = L
            show setOfIntegers
        return
    for each position in workingArray
        V ← value at current position in workingArray (i.e., next unused value)
        workingArrayCopy ← workingArray
        remove V from workingArrayCopy
        setOfIntegersCopy ← setOfIntegers
        next available position in setOfIntegersCopy ← V
        depthFirst(setOfIntegersCopy, workingArrayCopy)
    return

Note: Before each recursive call we reduce the contents of workingArray and increase the number of values in setOfIntegers (i.e., the candidate map). We make copies of these arrays and pass them so that, upon return, both workingArray and setOfIntegers are as they were before being modified for the recursive call.
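The depth-first traversal can be sketched in Python as follows. One deliberate difference from the pseudocode, stated as an assumption: after choosing a value, this version passes on only the values that come after it in workingArray, so each set is generated once in increasing order rather than once per ordering. The names `depth_first`, `generic_brute_force`, and `delta` are introduced here for illustration.

```python
from itertools import combinations


def delta(points):
    """Return the sorted multiset of pairwise distances between the points."""
    return sorted(b - a for a, b in combinations(sorted(points), 2))


def depth_first(chosen, working, slots):
    """Yield every leaf: a list of `slots` increasing values from `working`."""
    if len(chosen) == slots:
        yield chosen
        return
    for i, value in enumerate(working):
        # New lists are built for the recursive call, so `chosen` and
        # `working` are unchanged when the call returns.
        yield from depth_first(chosen + [value], working[i + 1:], slots)


def generic_brute_force(L, n, use_all_values):
    """Run either brute-force variant by choosing the working array."""
    target = sorted(L)
    M = target[-1]
    if use_all_values:
        working = list(range(1, M))              # allValues
    else:
        working = sorted(set(target) - {M})      # reducedL
    return [[0, *leaf, M] for leaf in depth_first([], working, n - 2)
            if delta([0, *leaf, M]) == target]


print(list(depth_first([], [2, 5, 7, 9], 2)))
print(generic_brute_force([3, 5, 5, 8, 9, 14, 14, 17, 19, 22], 5, False))
```

With workingArray = {2, 5, 7, 9} and two positions to fill, this produces the six leaves counted by C(4,2), and unlike the pseudocode's early return it collects every matching map.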