Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info.

Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Molecular Scissors (restriction enzymes) Molecular Cell Biology, 4 th edition An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info HindII (first restriction enzyme): discovered accidentally in 1970 while studying how bacterium Haemophilus influenzae takes up DNA from the virus. Recognizes and cuts DNA at sequence GAATTC

Discovering Restriction Enzymes Werner Arber Daniel Nathans Hamilton Smith Werner Arber – discovered restriction enzymes Daniel Nathans - pioneered the application of restriction for the construction of genetic maps Hamilton Smith - showed that restriction enzyme cuts DNA in the middle of a specific sequence My father has discovered a servant who serves as a pair of scissors. If a foreign king invades a bacterium, this servant can cut him in small fragments, but he does not do any harm to his own king. Clever people use the servant with the scissors to find out the secrets of the kings. For this reason my father received the Nobel Prize for the discovery of the servant with the scissors". Daniel Nathans’ daughter (from Nobel lecture) An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Recognition Sites of Restriction Enzymes Molecular Cell Biology, 4 th edition An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Uses of Restriction Enzymes Recombinant DNA technology Recombinant DNA technology Cloning Cloning cDNA/genomic library construction cDNA/genomic library construction DNA mapping DNA mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Restriction Maps A map showing positions of restriction sites in a DNA sequence If DNA sequence is known then construction of restriction map is a trivial exercise In early days of molecular biology DNA sequences were often unknown Biologists had to solve the problem of constructing restriction maps without knowing DNA sequences An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Full Restriction Digest Cutting DNA at each restriction site creates multiple restriction fragments: Is it possible to reconstruct the order of the fragments from the sizes of the fragments {3,5,5,9} ? An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Full Restriction Digest: Multiple Solutions Alternative ordering of restriction fragments: vs An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Measuring Length of Restriction Fragments Restriction enzymes break DNA into restriction fragments. Restriction enzymes break DNA into restriction fragments. Gel electrophoresis is a process for separating DNA by size and measuring sizes of restriction fragments Gel electrophoresis is a process for separating DNA by size and measuring sizes of restriction fragments Can separate DNA fragments that differ in length in only 1 nucleotide for fragments up to 500 nucleotides long Can separate DNA fragments that differ in length in only 1 nucleotide for fragments up to 500 nucleotides long An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Gel Electrophoresis DNA fragments are injected into a gel positioned in an electric field DNA fragments are injected into a gel positioned in an electric field DNA are negatively charged near neutral pH DNA are negatively charged near neutral pH The ribose phosphate backbone of each nucleotide is acidic; DNA has an overall negative charge The ribose phosphate backbone of each nucleotide is acidic; DNA has an overall negative charge DNA molecules move towards the positive electrode DNA molecules move towards the positive electrode An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Gel Electrophoresis (cont’d) DNA fragments of different lengths are separated according to size DNA fragments of different lengths are separated according to size Smaller molecules move through the gel matrix more readily than larger molecules Smaller molecules move through the gel matrix more readily than larger molecules The gel matrix restricts random diffusion so molecules of different lengths separate into different bands The gel matrix restricts random diffusion so molecules of different lengths separate into different bands An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Gel Electrophoresis: Example Direction of DNA movement Smaller fragments travel farther Molecular Cell Biology, 4 th edition An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Distance traveled is (roughly) inversely proportional to the logarithm of molecule size Different sized molecules form distinct bands

Detecting DNA: Autoradiography One way to visualize separated DNA bands on a gel is autoradiography: One way to visualize separated DNA bands on a gel is autoradiography: The DNA is radioactively labeled The DNA is radioactively labeled The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present. The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present. An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Detecting DNA: Fluorescence Another way to visualize DNA bands in gel is fluorescence: Another way to visualize DNA bands in gel is fluorescence: The gel is incubated with a solution containing the fluorescent dye ethidium The gel is incubated with a solution containing the fluorescent dye ethidium Ethidium binds to the DNA Ethidium binds to the DNA The DNA lights up when the gel is exposed to ultraviolet light. The DNA lights up when the gel is exposed to ultraviolet light. An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Partial Restriction Digest The sample of DNA is exposed to the restriction enzyme for only a limited amount of time to prevent it from being cut at all restriction sites The sample of DNA is exposed to the restriction enzyme for only a limited amount of time to prevent it from being cut at all restriction sites This experiment generates the set of all possible restriction fragments between every two (not necessarily consecutive) cuts This experiment generates the set of all possible restriction fragments between every two (not necessarily consecutive) cuts This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Partial Digest Example Partial Digest results in the following 10 restriction fragments: Partial Digest results in the following 10 restriction fragments: An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22} X = {0, 5, 14, 19, 22}

Partial Digest Problem: Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points Goal: Given all pairwise distances between points on a line, reconstruct the positions of those points Input: The multiset of pairwise distances L, containing C(n,2) integers Input: The multiset of pairwise distances L, containing C(n,2) integers Output: A set X, of n integers, such that ΔX = L Output: A set X, of n integers, such that ΔX = L An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Note: It is not always possible to uniquely reconstruct a set X based only on ΔX. For example, the sets X = {0, 2, 5} and (X + 10) = {10, 12, 15} both produce ΔX={2, 3, 5} as their partial digest set. The sets {0,1,2,5,7,9,12} and {0,1,5,7,8,10,12} present a less trivial example of non-uniqueness. They both digest into: {1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 8, 9, 10, 11, 12} An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

01257912 0 12579 1 146811 2 35710 5 247 7 25 9 3 12 015781012 0 15781012 1 467911 5 2357 7 135 8 24 10 2 12 Homometric Sets An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Two sets A and B are homometric if  A =  B A = {0,1,2,5,7,9,12}B = {0,1,5,7,8,10,12}

Partial Digest: Brute Force (exhaustive search) 1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence. 2. For every possible set X={ 0, x 2, …,x n-1, M} X={ 0, x 2, …,x n-1, M} compute corresponding ΔX (i.e., pairwise distances) compute corresponding ΔX (i.e., pairwise distances) 3. If ΔX is equal to the experimental partial digest L, then X is the correct restriction map An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Partial Digest: Brute Force To do this, we will need to know n. Note that C(n,2) is n!/[(n-2)!2!] = n(n-1)/2 But |L| = C(n,2) = n(n-1)/2, so n 2 – n – 2|L| = 0 For L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22} (i.e., our previous example), |L| = 10 and n = 5. (Recall that X = {0, 5, 14, 19, 22} in that example.) An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

BruteForcePDP(L, n): M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M X ← {0, x 2, …, x n-1, M} form ΔX from X if ΔX = L return X output “no solution” An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

AnotherBruteForcePDP(L, n) M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M from L X ← { 0, x 2, …, x n-1, M } form ΔX from X if ΔX = L return X output “no solution” An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

AnotherBruteForcePDP(L, n) M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M from L X ← { 0, x 2, …, x n-1, M } form ΔX from X if ΔX = L return X output “no solution” An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Example: L = {3, 5, 5, 8, 9, 14, 14, 17, 19, 22} n=5 Form all possible variations of X = {0, a, b, c, M} until finding one for which ΔX= L (where a, b, and c are values < M from L) Answer: X = {0, 5, 14, 19, 22}

BruteForcePDP(L, n): M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M X ← {0, x 2, …, x n-1, M} form ΔX from X if ΔX = L return X output “no solution” An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Efficiency: 1.There are C(M-1,n-2) sets of integers having values in the range (0,M) 2.Creating X, forming ΔX from X, and comparing ΔX to L each requires a constant number of operations 3.So, efficiency is O(C(M-1,n-2))  O(M n-2 )

AnotherBruteForcePDP(L, n) M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M from L X ← { 0, x 2, …, x n-1, M } form ΔX from X if ΔX = L return X output “no solution” An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Efficiency: 1.There are C(|L|,n-2) sets of integers in L having values in the range [0,M]. Note that |L| = n(n-1)/2. 2.As before, the other processes each take a constant number of operations 3.So, efficiency is O(C(|L|,n-2))  O(n 2n-4 )

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info Compare AnotherBruteForcePDP with BruteForcePDP More efficient, but still slow Consider L = {2, 998, 1000} (n = 3, M = 1000), BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast Fewer sets are examined, but runtime is still exponential: O(n 2n-4 )

PartialDigest(L) width ← Maximum element in L DELETE(width, L) X ← {0, width} PLACE(L, X) if L is empty output X return y ← maximum element in L if Δ(y, X )  L Add y to X and remove lengths Δ(y, X) from L PLACE(L,X ) Remove y from X and add lengths Δ(y, X) to L if Δ(width-y, X )  L Add width-y to X and remove lengths Δ(width-y, X) from L PLACE(L,X ) Remove width-y from X and add lengths Δ(width-y, X) to L return A Better Algorithm… Notes: 1.DELETE(y, L) removes the value y from L. 2.Δ(y, X) denotes the multiset of distances between a point y and all points in a set X. 3.After each recursive call in PLACE, X and L are restored to their condition before the call in case another branch in the search tree must be explored 4.The algorithm lists all sets X with ΔX = L. Consider an example where L = { 2, 2, 3, 3, 4, 5, 6, 7, 8, 10}… An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Example

PartialDigest(L) width ← Maximum element in L DELETE(width, L) X ← {0, width} PLACE(L, X) if L is empty output X return y ← maximum element in L if Δ(y, X )  L Add y to X and remove lengths Δ(y, X) from L PLACE(L,X ) Remove y from X and add lengths Δ(y, X) to L if Δ(width-y, X )  L Add width-y to X and remove lengths Δ(width-y, X) from L PLACE(L,X ) Remove width-y from X and add lengths Δ(width-y, X) to L return A Better Algorithm… Efficiency: For the ideal case, only one recursive call is made in PLACE each time PLACE is called. The amount of work done for the call is O(n) the first time, O(n-1) the second time, etc., and this continues for n times, so the total work is n+(n-1)+(n-2)+…+1 = n(n+1)/2 or O(n 2 ). For pathological cases where both recursive calls are made in PLACE (i.e., if both alternatives are viable) each time PLACE is called, the complexity is O(2 n ) where n is |X|. An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Notes for Brute Force Approaches BruteForcePDP M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M X ← {0, x 2, …, x n-1, M} form ΔX from X if ΔX = L return X output “no solution” AnotherBruteForcePDP M ← maximum element in L for every set of n – 2 integers 0 < x 2 < … x n-1 < M from L X ← {0, x 2, …, x n-1, M} form ΔX from X if ΔX = L return X output “no solution” compare We would like to use the same algorithm to solve both. In order to do this, let’s put the values we will be using to create X into an array called workingArray. For BruteForcePDP we will be dealing with #s 1, 2, 3, 4, …, M-1. Let’s call this set of values allValues. For AnotherBruteForcePDP we will be dealing with #s from L except M and any duplicates. Let’s call this set of values reducedL. So, in order to use the same code for both BruteForcePDP and AnotherBruteForcePDP, all we need to do is put allValues or reducedL into workingArray, respectively, then use workingArray: GenericBruteForcePDP if algorithm = BruteForcePDP workingArray ← allValues else workingArray ← reducedL M ← maximum element in L for every set of n – 2 integers in workingArray X ← { 0, x 2, …, x n-1, M } form ΔX from X if ΔX = L return X output “no solution”

Notes for Brute Force Approaches Our next problem is to generate every possible set of n-2 integers from values in workingArray. One way is to envision this as a tree search problem where the leaf nodes represent the possible arrangements of the values in workingArray. For example, consider L = {2, 2, 5, 7, 9, 10} and n = 4. In this case, workingArray contains the values {2, 5, 7, 9}. A simple (but naïve) tree would look like this:But this tree eliminates redundancy : Of course, we really just want the leaf nodes. To produce them, we can simply perform a depth first search, adding to the set of values at each of the n-2 positions (from left to right) as we go deeper into the tree until all positions have been filled. Our choice of value at any time will be made from the unused values in workingArray. When we use a value, we must remove it from workingArray so that it cannot be used at the next level. However, in order for this to work, we must restore workingArray to its previous state when we backtrack to a node. The easiest way to do this is via recursion, in which case we only need to make a copy of workingArray before the next recursive call, remove the appropriate value from the copy, and then pass the copy. That way, when returning from the recursive call, workingArray will already be as it was before the recursion. (continued) Combinations: C(x,y) = x!/[(x-y)!y!] C(4,2) = 4!/[(4-2)!2!] = 6 Permutations: P(x,y) = x!/(x-y)! P(4,2) = 4!/(4-2)! = 12

Notes for Brute Force Approaches (continued) Let setOfIntegers be the collection of n-2 integers that we must generate (i.e., a candidate map). Recall that this will begin with no values. Here is pseudocode for a depth-first traversal of the search tree: depthFirst (setOfIntegers, workingArray) if setOfIntegers is complete (i.e., has no unfilled positions) if ΔX = L show setOfIntegers return for each position in workingArray V ← value at current position in workingArray (i.e., next unused value) workingArrayCopy ← workingArray remove V from workingArrayCopy setOfIntegersCopy ← setOfIntegers next available position in setOfIntegersCopy ← V depthFirst (setOfIntegersCopy, workingArrayCopy) return Note: Before each recursive call we are reducing the contents of workingArray and increasing the number of values in setOfIntegers (i.e., the candidate map). We make copies of these arrays and pass them so that upon return both workingArray and setOfIntegers are as they were before being modified for the recursive call.

Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info.

Similar presentations

Presentation on theme: "Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info.

Similar presentations

Presentation on theme: "Restriction Mapping An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info."— Presentation transcript:

Similar presentations

About project

Feedback