Model for Evaluation of DNA Synthesis

Created by: Ori Kaplan, Gilad Myerson
Supervised by: Gregory Linshiz, Weizmann Institute; Prof. Udi Shapiro, Weizmann Institute

Synthesizing DNA
Currently, there are few successful ways of synthesizing DNA. The most common is assembly PCR.
Existing methods are costly and slow (about 3 weeks from order to delivery of a DNA strand).
Synthesizers shown: Mer-Made6, ABI 3900

New Approach
Prof. Udi Shapiro / Gregory Linshiz: a new, confidential method of in-vitro DNA molecule synthesis.
Goal: synthesize DNA quicker, easier and cheaper.
Part of this method involves elongation of oligonucleotides.
Elongation success rate (until now) ≈ 80-90%.

Top Secret New Approach Elongation of DNA includes…..
Since the in-vitro elongation of oligonucleotides follows the pattern of synthetic DNA strand synthesis, here is a brief explanation of synthetic oligonucleotide synthesis.
Oligonucleotide synthesis is a remarkably simple process with far-reaching implications, and it is extremely useful in laboratory procedures. It is used to make primers, which are crucial in methods such as PCR. A custom oligonucleotide binds only to the region of DNA that is complementary to its sequence, which allows specific segments of DNA to be amplified. Custom synthesis also allows other sequences, such as restriction sites, to be added to the desired oligonucleotide. Custom oligonucleotides are generally about 50 bases long, which limits how many additional sequences can be added to the desired primer sequence.
Oligonucleotides are synthesized using DNA phosphoramidite monomer bases as building blocks. The active sites of the monomer bases are all chemically blocked in such a way that they can be unblocked at will with deblocking solutions.
Oligonucleotide synthesis involves 4 stages:
Stage 1: Deblocking. The first base, which is attached to the solid support, is at first inactive because all of its active sites are blocked (protected). To add the next base, the DMT group protecting the 5'-hydroxyl group must be removed. This is done by adding an acid. The 5'-hydroxyl group is now the only reactive group on the base monomer, which ensures that the next base binds only to that site.
Stage 2: Base condensation. The next base monomer cannot be added until it has been activated, which is achieved by adding tetrazole to the base. The active 5'-hydroxyl group of the preceding base and the newly activated phosphorus bind, loosely joining the two bases together.
Stage 3: Capping. Any unbound, active 5'-hydroxyl group is capped with a protective group, which prevents that strand from growing again. This is done by adding acetic anhydride and N-methylimidazole to the reaction column.
Stage 4: Oxidation. To stabilize the phosphate linkage, a solution of dilute iodine in water, pyridine, and tetrahydrofuran is added to the reaction column, oxidizing and strengthening the linkage.

Sequencing
After the DNA synthesis procedure, sequencing the new molecules indicates whether the right molecule was synthesized.
A chromatogram of DNA synthesis:

Chromatogram
What does a chromatogram portray?
“Clean” chromatogram: all molecules are identical (at a given position, all molecules carry A).
“Noisy” chromatogram: ambiguous (at a given position, some molecules carry A and some carry T).

The problem
Let's assume this is the sequencing result (a chromatogram with noise):
I. Is the experiment successful???
II. What needs to be changed in order to improve the method? pH, temperature, polymerase, dNTPs, concentrations…

The problem (cont'd)
Which result is better…?

Conventional Analysis
Clone to understand the sequencing.
Isolation cloning: isolate single molecules → read the exact sequence.
Cloning several oligos gives an insight into the method's degree of success.
Theoretically, clone all of them in order to see whether the experiment was successful.

Weizmann’s request
Cloning is very long, hard and expensive.
Please try to figure out a way to assess the degree of success “visually” using the chromatogram…

If we smile, will they think we understand???
OK… we'll try anyway

OK… I've got it, I've got it, I've got it...

A Solution???
Let's treat the graph like LEGO© and see what we can do with the pieces…

Perfect Sequencing
[Figure: clean chromatogram built from 10 molecules; legend A, C, T, G; base labels as shown: C A C T G A C A C G C T T A C T G C C G]

Mutations occur
[Figure: molecules carrying a substitution, deletions and an insertion add up to a “dirty” chromatogram]

Two ways to try to understand the graph
Sequence every single oligonucleotide (isolation cloning): impossible.
Sample sequencing and assessment of the result: statistically inaccurate.

Another Option
Mathematically “build” oligonucleotide molecules in such a way that the accumulated graph of those molecules is identical to the chromatogram.

Graph → Table
If I had 10 molecules, each n nucleotides long: how many bases of each kind do I have in each “place” (position)?
[Table: number of A, G, C and T bases at each position]

Table → Molecules
A random procedure builds molecules from the table.

Molecules → Graph
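The Graph → Table → Molecules → Graph round trip can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' program: the function names are hypothetical, all molecules are assumed to have the same length (so insertions and deletions, which shift positions, are not handled), and the "random procedure" is assumed to be an independent shuffle of each column.

```python
import random

BASES = "ACGT"

def molecules_to_table(molecules):
    """Count how many molecules carry each base at each position (a 4 x n table)."""
    n = len(molecules[0])
    table = {base: [0] * n for base in BASES}
    for molecule in molecules:
        for pos, base in enumerate(molecule):
            table[base][pos] += 1
    return table

def table_to_molecules(table, rng=random):
    """Randomly deal the bases counted at each position out to molecules.

    Assumes every column of the table sums to the same number of molecules.
    """
    n = len(table[BASES[0]])
    count = sum(table[base][0] for base in BASES)
    columns = []
    for pos in range(n):
        # One entry per molecule for this position, in random order.
        column = [base for base in BASES for _ in range(table[base][pos])]
        rng.shuffle(column)
        columns.append(column)
    return ["".join(columns[pos][m] for pos in range(n)) for m in range(count)]
```

Any set of molecules produced this way accumulates to the same table, which is exactly why many different molecule sets can explain one graph.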

New Problem
How do we choose the 100 molecules that build the graph?
Linear search has too many options to check: O(4^n). Choose 100 from 4^n.
If the oligo is 100 nucleotides long → n = 100: choose 100 molecules from 4^100 ≈ 1.6 × 10^60, i.e. C(1.6 × 10^60, 100) possibilities.

OK… we'll try... Smile, maybe they'll give us 100

OK… I've got it, I've got it, I've got it...

Reduced molecules: 4^n → 8n
Don’t choose from all possibilities; assume that each molecule has only one mutation (Edit Distance 1).
Select 100 molecules from 800 (instead of 1.6 × 10^60).
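A bank of Edit Distance 1 molecules could be generated as in the sketch below. The function name is hypothetical and the slides do not show how the bank was actually built. For a target of length n there are 3n substitutions, n deletions and 4(n + 1) insertions, 8n + 4 raw edits in total, which matches the "8n" on the slide. Some edits produce the same string (for example, deleting either base of "AA"), so the number of distinct molecules can be smaller.

```python
BASES = "ACGT"

def edit_distance_1_bank(target):
    """Return every distinct sequence that is one edit away from `target`."""
    bank = set()
    n = len(target)
    for i in range(n):
        # Deletion of the base at position i.
        bank.add(target[:i] + target[i + 1:])
        # Substitution of position i by each of the three other bases.
        for base in BASES:
            if base != target[i]:
                bank.add(target[:i] + base + target[i + 1:])
    for i in range(n + 1):
        # Insertion of each base before position i (i == n appends at the end).
        for base in BASES:
            bank.add(target[:i] + base + target[i:])
    return sorted(bank)
```

Returning a sorted list rather than a set gives each molecule a stable index, which is convenient if candidate results store bank indices instead of strings.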

Still a problem
How do we choose 100 molecules from 800?
Linear: C(n, k) = n! / (k!(n-k)!), so C(800, 100) ≈ 3 × 10^129 possibilities.
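The sizes of the two search spaces can be checked with exact integer arithmetic. This is a small sketch using Python's standard `math.comb`; the helper name `order_of_magnitude` is made up for illustration.

```python
import math

def order_of_magnitude(x):
    """Return floor(log10(x)) for a positive integer x of any size."""
    return math.floor(math.log10(x))

n = 100  # oligo length in nucleotides
k = 100  # number of molecules to choose

all_molecules = 4 ** n   # every possible sequence of length n
ed1_molecules = 8 * n    # approximate size of the Edit Distance 1 bank

print(order_of_magnitude(all_molecules))                 # molecules of length 100
print(order_of_magnitude(math.comb(all_molecules, k)))   # choices from all molecules
print(order_of_magnitude(math.comb(ed1_molecules, k)))   # choices from the ED1 bank
```

The last line prints 129, in agreement with the 3 × 10^129 above. Restricting to ED1 shrinks the space enormously, yet it is still far too large to enumerate.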

Genetic Algorithm

Genetic algorithm
Define initial mutation rates: deletions, insertions (?), substitutions (?).
Normalize the graph and convert it to a matrix (4 × n).
Build a molecule bank of “Edit Distance 1” (ED1) molecules.
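The slides do not say how the graph is normalized. One plausible reading, sketched below, scales the four base signals at each position so that every column of the 4 × n matrix sums to the number of molecules (100). The function name, the input layout (one list of peak heights per base) and the scaling rule are all assumptions.

```python
BASES = "ACGT"

def normalize_chromatogram(peaks, total=100):
    """Scale the four base signals at each position so each column sums to `total`.

    `peaks` maps each base to a list of signal heights, one per position.
    """
    n = len(peaks[BASES[0]])
    matrix = {base: [0.0] * n for base in BASES}
    for pos in range(n):
        column_sum = sum(peaks[base][pos] for base in BASES)
        if column_sum == 0:
            continue  # no signal at this position; leave the column at zero
        for base in BASES:
            matrix[base][pos] = total * peaks[base][pos] / column_sum
    return matrix
```

After this step, an entry of 90 for A at some position reads as "about 90 of the 100 molecules carry A here", which puts the chromatogram on the same scale as a count table built from 100 candidate molecules.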

Population
There is a population of 100; each entity in the population represents a single result.
Each result consists of 100 molecules (from the ED1 bank) that build up a graph.
The population is initialized using the mutation rate.
[Figure: a population of 100 results; one result holds 100 molecules]

Evaluation function
The current evaluation function is: F(e) = Σ_i Σ_j |M_ij − R_ij|, summed over all entries of the 4 × n matrices.
In the future the function will take the number of substitutions into consideration.
[Figure: F(e) compares the sum (∑) of an entity's molecules with the experiment result]
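The evaluation function is a sum of absolute differences between two 4 × n matrices. The sketch below assumes that M is the matrix accumulated from an entity's 100 molecules and R is the normalized experiment result; the slides do not define the two symbols explicitly, and the function name is hypothetical.

```python
def evaluate(model, reference):
    """F(e) = sum over all i, j of |M_ij - R_ij| for two 4 x n matrices.

    Both matrices are lists of four rows (one per base); lower is better,
    and 0 means the entity reproduces the reference exactly.
    """
    return sum(
        abs(m - r)
        for model_row, reference_row in zip(model, reference)
        for m, r in zip(model_row, reference_row)
    )

# Two positions; rows are A, C, G, T.
M = [[10, 0], [0, 10], [0, 0], [0, 0]]
R = [[8, 1], [2, 9], [0, 0], [0, 0]]
print(evaluate(M, R))  # 2 + 1 + 2 + 1 = 6
```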

Generation
Generation policy (current):
Replication: always replicate the best 10.
Crossover: biased choice of entities for crossover.
Mutations: (i) mutate the best 10; (ii) randomly mutate the whole population.
Local minimum policy: after 20 generations without improvement, shake the population.
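One generation under this policy might look like the sketch below. It is an interpretation, not the authors' code. Assumed details: entities are lists of indices into the ED1 bank, the bias is rank-based, crossover is single-point, the mutation rates are arbitrary, "mutate the whole population" is applied to the crossover children so that the replicated best 10 survive unchanged, and "shake" means heavily mutating everything except the best 10.

```python
import random

ELITE = 10        # "always replicate best 10"
STALL_LIMIT = 20  # generations without improvement before shaking

def mutate(entity, bank_size, rng, rate=0.05):
    """Swap each molecule for a random bank molecule with probability `rate`."""
    return [rng.randrange(bank_size) if rng.random() < rate else m for m in entity]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def next_generation(population, score, bank_size, rng, stalled=0):
    """Build the next population; `score` is an evaluation function, lower is better."""
    ranked = sorted(population, key=score)
    size = len(ranked)
    elite = [list(e) for e in ranked[:ELITE]]  # replication
    if stalled >= STALL_LIMIT:
        # Local minimum policy: keep the best, shake everything else.
        return elite + [mutate(e, bank_size, rng, rate=0.5) for e in ranked[ELITE:]]
    children = [mutate(e, bank_size, rng) for e in elite]  # mutations (i)
    weights = [size - rank for rank in range(size)]        # biased parent choice
    while len(elite) + len(children) < size:
        a, b = rng.choices(ranked, weights=weights, k=2)
        child = crossover(a, b, rng)
        children.append(mutate(child, bank_size, rng, rate=0.01))  # mutations (ii)
    return elite + children[: size - ELITE]
```

Because the best 10 are copied unchanged, the best score in the population can never get worse from one generation to the next. That makes "20 generations without improvement" easy to track with a simple counter in the calling loop.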

File Handling
Sequence data is initially in *.ab1 files. In order to utilize the data:
Retranslate the *.ab1 file (Sequencing Analysis)
Convert *.ab1 → *.txt (BioEdit)
Manage the *.txt file (Excel; also calculate the deletion rate)
Run the genetic algorithm

[Figure: No mutations - before]

[Figure: No mutations - after]

[Figure: 10 deletions at position 1, 10 deletions at position 9]

[Figure: 15 scattered substitutions - before]

[Figure: 15 scattered substitutions]

Setbacks
ED1: the result will never be 100% correct.
Genetic algorithm setbacks: heuristic, different final results between runs, local minima, choice of evaluation function…
No indication whether the results are correct.
The algorithm only deals with successful experiments.
Data input: noise interpretation, normalized data.

Advantages
New method of sequencing analysis.
Potentially saves many hours of isolation cloning.
Mathematically, the result is correct.
Development potential for different areas of research.

Personal View
Thrown into deep water → swam.
The idea will (hopefully) be practical and useful.
Learned a great deal: new programs, languages, methods.
Mathematical analysis of chromatogram sequencing: ever done before???

Thank you
