DNAQL a data model and query language for databases in DNA Jan Van den Bussche joint work with Joris Gillis, Robert Brijder Hasselt University, Belgium.

Slides:



Advertisements
Similar presentations
Lecture 24 MAS 714 Hartmut Klauck
Advertisements

Ashish Gupta Ashish Gupta Unremarkable Problem, Remarkable Technique Operations in a DNA Computer DNA : A Unique Data Structure ! Pros.
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
DNA Computing COMP308 I believe things like DNA computing will eventually lead the way to a “molecular revolution,” which ultimately will have a very dramatic.
Molecular Computation : RNA solutions to chess problems Dirk Faulhammer, Anthony R. Cukras, Richard J. Lipton, and Laura F. Landweber PNAS 2000; vol. 97:
1 DNA Computing: Concept and Design Ruoya Wang April 21, 2008 MATH 8803 Final presentation.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture11: Variants of Turing Machines Prof. Amos Israeli.
Montek Singh COMP Nov 15,  Two different technologies ◦ TODAY: DNA as biochemical computer  DNA molecules encode data  enzymes, probes.
Codes for Deletion and Insertion Channels with Segmented Errors Zhenming Liu Michael Mitzenmacher Harvard University, School of Engineering and Applied.
Mining Tree-Query Associations in a Graph Bart Goethals University of Antwerp, Belgium Eveline Hoekx Jan Van den Bussche Hasselt University, Belgium.
Presented By:- Anil Kumar MNW-882-2K11
Query Processing Presented by Aung S. Win.
DNA Computing on Surfaces
AP Biology: Chapter 14 DNA Technologies
Joost N. Kok Artificial Intelligence: from Computer Science to Molecular Informatics.
DNA Sequencing (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 8, 2005 ChengXiang Zhai Department of Computer Science University of Illinois,
DNA Computing on a Chip Mitsunori Ogihara and Animesh Ray Nature, vol. 403, pp Cho, Dong-Yeon.
Strand Design for Biomolecular Computation
Beyond Silicon: Tackling the Unsolvable with DNA.
Manipulating DNA.
Theory of Computation, Feodor F. Dragan, Kent State University 1 Regular expressions: definition An algebraic equivalent to finite automata. We can build.
Graph Theory And Bioinformatics Jason Wengert. Outline Introduction to Graphs Eulerian Paths & Hamiltonian Cycles Interval Graph & Shape of Genes Sequencing.
Algorithms and Running Time Algorithm: Well defined and finite sequence of steps to solve a well defined problem. Eg.,, Sequence of steps to multiply two.
DNA Computing.  Elements of complementary nature abound in nature. Complementary parts (in nature) can “self-assemble”. A universal principle?  This.
A new DNA computing model for the NAND gate based on induced hairpin formation 생물정보학 협동과정 소 영 제.
An Introduction to Genetic Algorithms Lecture 2 November, 2010 Ivan Garibay
Week 10Complexity of Algorithms1 Hard Computational Problems Some computational problems are hard Despite a numerous attempts we do not know any efficient.
Combinatorial Optimization Problems in Computational Biology Ion Mandoiu CSE Department.
Fast parallel molecular solution to the Hitting-set problem Speaker Nung-Yue Shi.
MA/CSSE 474 Theory of Computation Decision Problems DFSMs.
PHARMACOBIOTECHNOLOGY.  Recombinant DNA (rDNA) is constructed outside the living cell using enzymes called “restriction enzymes” to cut DNA at specific.
What is DNA Computing? Shin, Soo-Yong Artificial Intelligence Lab.
Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2.
CHAPTER 1 Regular Languages
The Inference via DNA Computing Piort Wasiewicz et al. Proceedings of the 1999 Congress on Evolutionary Computation, vol. 2, pp Cho, Dong-Yeon.
DNA computing on a chip Mitsunori Ogihara and Animesh Ray Nature, 2000 발표자 : 임예니.
1 Biological Computing – DNA solution Presented by Wooyoung Kim 4/8/09 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.
Amplification of a DNA fragment by Polymerase Chain Reaction (PCR) Ms. Nadia Amara.
Regular Expressions Chapter 6. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
A comparative approach for gene network inference using time-series gene expression data Guillaume Bourque* and David Sankoff *Centre de Recherches Mathématiques,
Solution of Satisfiability Problem on a Gel-Based DNA computer Ji Yoon Park Dept. of Biochem Hanyang University.
Towards Autonomous Molecular Computers Towards Autonomous Molecular Computers Masami Hagiya, Proceedings of GP, Nakjung Choi
LECTURE 4 Logic Design. LOGIC DESIGN We already know that the language of the machine is binary – that is, sequences of 1’s and 0’s. But why is this?
An Introduction to Genetic Algorithms Lecture 2 November, 2010 Ivan Garibay
The Church-Turing Thesis Chapter 18. Are We Done? FSM  PDA  Turing machine Is this the end of the line? There are still problems we cannot solve: ●
Some simple basics. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of the field.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
Cloning of PCR Fragment into T- Vector Jung-Min Choi Department of Biochemistry, College of Life Science and Biotechnology, Mouse Genetics and Laboratory.
Universal Turing Machine
On the Relation Between Simulation-based and SAT-based Diagnosis CMPE 58Q Giray Kömürcü Boğaziçi University.
Lecture 14: Theory of Automata:2014 Finite Automata with Output.
Chapter 1 INTRODUCTION TO THE THEORY OF COMPUTATION.
From the double helix to the genome
DNA Sequencing (Lecture for CS498-CXZ Algorithms in Bioinformatics)
Ultra Scale High Density Hybrid DNA Memory Mohamad Al-Sheikhly, William Bentley, Aris Christou, Joseph Silverman Department of Materials Science.
IIT Kharagpur & Kingston Uni
Modeling Arithmetic, Computation, and Languages
CSE 105 theory of computation
Solution of Satisfiability Problem on a Gel-Based DNA computer
DNA Computing and Molecular Programming
JSPS Project on Molecular Computing (presentation by Masami Hagiya)
DNA computing on surfaces
Molecular Computation by DNA Hairpin Formation
Error Correction Coding
DNA Computing Herman G. Meyer III Sept. 28, 2004.
Presentation transcript:

DNAQL a data model and query language for databases in DNA Jan Van den Bussche joint work with Joris Gillis, Robert Brijder Hasselt University, Belgium

Natural Computing 1.Conventional computing, inspired by nature – Evolutionary systems, algorithms, programs – Parallel systems, swarm computing 2.Physics as a computation model – Analog computers – Quantum computing 3.“Wet” computing: use hardware from nature ☞ DNA computing – Reprogrammed bacteria & viruses

DNA Computing: What it is NOT Solving NP-complete problems – First DNA computing experiment solved a small instance of the Hamiltonian Path problem – [Adleman, Science 1994] Genetic engineering – DNA computing works with dead material – Synthetic DNA Bioinformatics – Conventional databases, algorithms to store, analyse genetic information

DNA Computing: What it IS Use synthetic DNA molecules as data carrier Programmed nanotechnology Computation on the DNA carried out by: – Biotechnology laboratory protocols – Enzymes – DNA itself: self-assembly Computation goes on in: – In vitro: Test tube (watery solution) – DNA chips, diamond surfaces – In vivo (smart medicine)

DNA Single-stranded DNA molecule: = string over the 4-letter alphabet {A,C,G,T} – the string is called “strand” – the positions are called “bases”

Image credit: Madeleine Price Ball

DNA synthesis and sequencing Synthesis: – Input: string over {A,C,G,T} – Output: actual DNA single-stranded molecule Currently limited to length ~ 100 – but strands can be concatenated Sequencing: – Input: DNA single-stranded molecule – Output: string over {A,C,G,T} Quite reliable, redundancy

Data storage in DNA Enormous capacity – Theoretical capacity ~ 455 EB per gram – ~ 2.2 PB per gram with reliable encode & decode – [Goldman et al., Nature 2013] Very robust Long term – 1000nds of years – Can be easily copied Archiving

Databases in DNA? We need much more than mere archival write/read Efficient and flexible access Data model Query language ☞ DNA computing

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Base pairing Watson-Crick complementarity – A and T are complementary – C and G are complementary Complementary bases naturally form bonds “Base pairing”

Complementing strings Complement of a string: 1.Reverse the string; 2.Complement each base. E.g.

Hybridization When two single strands containing complementary substrings meet, they hybridize into a double-stranded complex A A A A CT G A G T T C A A Very stable at normal temperatures

Denaturation Undo base pairing by increasing temperature A A A A CT G A G T T C A A “Melting temperature” is higher for longer consecutive base pairings

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Data representation: alphabets 4-letter alphabet is a bit limiting Can use larger alphabet – Encode each letter by a DNA strand – DNA codewords Alphabet Λ of value bits – Atomic data values: strings of value bits Alphabet Ω of attributes Alphabet Θ of tags: # 1, # 2, …, # 9 – Used for punctuation, marking, splitting

Tuples as DNA strings Combined alphabet Σ = Λ ∪ Ω ∪ Θ Tuple t over relation schema R = A…B t = # 2 A# 3 t(A)# 4 …# 2 B# 3 t(B)# 4 Relation r over R: set of DNA strings Content of a test tube

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Selection Value bit a We want to retrieve all tuples from test tube r that contain a 1.Add complementary strand ā to test tube (in surplus quantities) 2.Will stick to requested tuples 3.Retrieve tuples bound to a sticker

Probing, Flush, Cleanup Immobilize the stickers so they can be retrieved – Tiny magnetic beads – Surface (DNA chip) Once a tuple sticks, tuple is immobilized too 1.Insert probes 2.Hybridize 3.Flush: wash away tuples that did not stick 4.Cleanup: recover remaining tuples

āāāā a a a a DNA chip

āāāā a a a a Cleanup

Selection expressed in DNAQL

Cartesian Product Concatenation: r x s = { t 1 t 2 : t 1 in r & t 2 in s } Assume r over AB and s over CD t 1 = # 2 A# 3 t 1 (A)# 4 # 2 B# 3 t 1 (B)# 4 t 2 = # 2 C# 3 t 2 (C)# 4 # 2 D# 3 t 2 (D)# 4 Use a length-two sticker:

Ligate Sticker will just hold tuples together temporarily (until denaturation) Apply ligase (an enzyme) to truly concatenate Single strand sticker Concatenation Before ligation After ligation sticker

Cartesian product in DNAQL? abbreviated

Nonterminating hybridization Each concatenation still ends with # 4, begins with # 2 Allows chain reaction

Solution (to avoid nontermination) Add # 5 at end of each tuple of r Add # 1 at beginning of each tuple of s let inlet in t1 t2 #5 #1

Getting rid of the # 5 # 1 Step 1: Blocking (Polymerase) Step 2: Bind to probe Step 3: Add sticker & Ligate Step 4: Splitting (Restriction enzymes)

Projection, renaming Using similar methods Reshuffling order of attributes – Ingenious procedure – Joris Gillis

Set difference Subtractive hybridization Most sensitive and error-prone operation

DNAQL operations so far Test-tube variables Probes Length-two stickers Union Difference Hybridize Ligate Flush Cleanup Split Block Block-from For-loop Block-except

Equality selection Select[A=B](r) = { t in r : t(A) = t(B) } We can already do: Select[θ a ](r) = { t in r : t contains ‘a’ } Variant: Select[A = i B](r) = { t in r : i-th bit of t(A) is ‘a’ } Add to DNAQL: – Block-except[i] operator, with i a counter variable – For-loop construct to iterate over i

For-loop DNAQL program for Select[A=B](r): (assumes only two value bits 0 and 1)

DNAQL

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Complexes Relation in DNA: set of DNA strings During execution of DNAQL program, more complex structures are formed Complexes formalized as directed graph Data model for DNAQL

DNA complex as a graph structure

Types If complexes are the “instances” in our data model, what are the “schemes”? Approach: – All data values are carried by strings of value bits – All other nodes are for structuring ➔ Type of a complex: – Replace all value strings by wildcard ‘*’

Type of a relation relationtype # 2 A# # 4 # 2 B# # 4 # 2 A# # 4 # 2 B# # 4 # 2 A# # 4 # 2 B# # 4 # 2 A# 3 *# 4 # 2 B# 3 *# 4 # 2 A# # 4 # 2 B# # 4 # 2 A# # 4 # 2 B# # 4

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Well-definedness of DNAQL operations Implementability by biotechnological operations imposes some preconditions Always well-defined: – Union – Ligate – Split – Cleanup

Well-definedness conditions Difference: – single strands only, all same length Blocking: – complex must be hybridized Hybridize: – termination – can be statically characterized in terms of absence of certain alternating cycles

Typechecking and inference Check well-definedness condition for operation statically, based on given input types Infer type for output, so that next operation can be typechecked

Type inference example e(x) = hybridize(x ∪ immob(ā)) If x : S then e(x) : T #3#3 * #4#4 #3#3 * #4#4 #3#3 * #4#4 type Stype T

Typechecking Cleanup Input: any complex (always well-defined) Output: denature, remove all stickers, probes, keep only longest strands Gel electrophoresis

Typechecking Cleanup Consider type S = A*A*A ∪ AA*AA “Dimension” of a complex: – Number of value bits used for data values – Like word length in a digital computer Suppose dimension = d – Strands of type A*A*A have length 2d+3 – Strands of type AA*AA length 4+d – 4+d < 2d+3 for all d ➔ If x : S then Cleanup(x) : A*A*A

Type inference algorithm Given input types for program: – Decides if “well-typed” – If so, computes result type Soundness: Well-typed programs always succeed on inputs of given type – Output guaranteed to be of computed result type Maximality: Converse to soundness – Only for individual operations Tightness

Talk Outline 1.DNA hybridization 2.Representing tuples, relations in DNA 3.Doing relational algebra by DNA computing 4.DNAQL, the language 5.DNA complexes: the DNAQL data model 6.Typechecking 7.Expressive power of DNAQL

Expressive power “String relational algebra” – For-loop over value bits – Value-bit selection Select[A = i a] Theorem: Every string-relational algebra expression, over given database schema, can be computed by a well-typed DNAQL program. [Yamamoto et al., DNA12, 2006] [Yeh et al., Simulation Practice & Theory, 2011]

Converse direction Theorem: Every well-typed DNAQL program, for given input types, can be simulated in the string-relational algebra – Need to allow finite number of cases on dimension – E.g., cases d 7

DNAQL to relational algebra If x : S then Hybridize(x) : T Store values in components of type S 1 in a relation R 1, similar for S 2 Then pairs of values in components of Hybridize(x) can be computed R 1 x R 2 Hybridization = Cartesian product! #4#4 * #2#2 * * #4#4 #2#2 * S1S1 S2S2 Type S Type T

Conclusion We have tried to find the equivalent of relational data model, relational algebra in the world of DNA computing Made a language by taking “minimal” set of DNA computing operations needed to simulate relational algebra Made a data model by abstracting the structures arising during the simulation Resulting DNAQL data model can stand on itself Satisfying equivalence with string-relational algebra

Outlook Experimental validation Simulation Modeling and analysis of errors ☞ Self-assembly models of DNA computing Further exploration of database aspects of Natural Computing Want to learn more? See paper in proceedings for textbooks, conferences, journals, sources, references