Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations Shuhei Denzumi1, Ryo Yoshinaka2,

Presentation transcript:

Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations. Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1). 1) Hokkaido University 2) JST ERATO Minato Discrete Structure Manipulation System Project
My name is Shuhei Denzumi.

Background. Research on string processing has become active. Massive online data comes from the internet and sensing networks, which raises string matching and string mining problems. Input data should be represented in compact form, and computation on the compressed structure is needed. (Diagram labels: Input, Compress, Data Structure, Operation, Result.)
Background. Research on string processing has become more active recently. The internet and sensing networks provide massive online data every day, and string matching and string mining are required to handle these data. Efficient data structures are therefore needed to solve these problems. We use data structures that represent data compactly and execute computation in compressed form.
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011

Manipulatable & Compact. A manipulatable compact data structure represents data in compressed form and has operations that manipulate the data in its compacted form. Such structures have received much attention in recent years. Examples: the Binary Decision Diagram (BDD), used in the LSI area, and the Deterministic Finite Automaton (DFA), used in the natural language processing area. (Diagram labels: Input, Compaction, Data Structure, Operation, D1, D2, D3.)
Among the many types of data structures, we focus on manipulatable compact data structures. Such a structure not only represents data in compact form but also has efficient methods for computing operations between multiple different structures. Examples include the Binary Decision Diagram and the Deterministic Finite Automaton. The Binary Decision Diagram (BDD) is widely used in LSI design. DFAs are applied to natural language processing.

Sequence Binary Decision Diagram. What is a Sequence BDD? The Sequence Binary Decision Diagram (SeqBDD, SDD) was introduced by Loekito, Bailey, and Pei (2009). It is a graph structure that represents finite sets of strings of finite length. The SDD's basic properties are unknown: minimization, size complexity, operation time. Applications: data mining, graph mining, human genome sequencing, text, …
The Sequence Binary Decision Diagram is a new manipulatable compact data structure. It was proposed by Loekito et al. in 2009. An SDD represents a finite set of strings of finite length. Because it is new, the basic properties of the SDD are still unknown. We think that SDDs are applicable to data mining, enumeration, and bioinformatics.

Family of BDDs. Compact representations for discrete structures, with rich algebraic operations.
BDD [Bryant 1986]: Boolean functions, e.g. xy ∨ yz ∨ zx and ¬xyz ∨ x¬yz ∨ xy¬z.
ZDD [Minato 1993]: sets of combinations, e.g. {{a}, {b}, {a, b}} and {{a}, {b}, {c}, {a, b, c}}.
SDD [Loekito et al. 2009]: sets of strings, e.g. {a, b, ab, bab, abbab} and {abc, acb, bac, bca}.
The SDD is a new member of the BDD family. The members of the family represent different types of discrete structures. The BDD, the original member of the family, manipulates Boolean functions. The ZDD manipulates sets of combinations.

Result.
Relationship to Acyclic Deterministic Finite Automata (ADFA): translation from an SDD to an ADFA and vice versa; an SDD is never larger than an ADFA; an SDD can be |Σ| times smaller than an ADFA.
Computational complexity of binary set operations: generalization of eight set operations; tight analysis of the time complexity of the binary set operation algorithm.
Experimental results: SDDs can be smaller than ADFAs; binary operation time.
Our results can be divided into two parts. First, we study the relationship between SDDs and ADFAs. We proved that an SDD can be |Σ| (alphabet size) times smaller than an ADFA. In the second part, we analyzed the time complexity of binary set operations on SDDs. We also conducted some experiments.

Preliminary

Definition. Σ: alphabet (totally ordered by ≺). An SDD is a directed acyclic graph made of internal nodes, the 1-terminal and 0-terminal nodes, and 1-edges and 0-edges. An internal node S is described by the triple τ(S) = 〈S.lab, S.1, S.0〉, where S.lab is the label, S.1 is the 1-child, and S.0 is the 0-child. Ordering rule: N.lab ≺ (N.0).lab. (Figure: nodes labeled a, b, …, z, ordered a ≺ b ≺ … ≺ z.)
Let Σ be an alphabet with a total order. An SDD is a directed acyclic graph. It is composed of internal nodes and the 0-terminal and 1-terminal nodes. Each internal node has exactly two outgoing edges, a 1-edge and a 0-edge. 1-edges are drawn as solid lines and 0-edges as broken lines. An internal node S is represented by the triple τ(S) = 〈S.lab, S.1, S.0〉. S.lab is the letter labeling the node. S.1 is the node pointed to by the 1-edge of S, and S.0 is the node pointed to by the 0-edge of S. We call them the 1-child and the 0-child, respectively. Internal nodes follow the ordering rule: the letter labeling a node must be smaller than the letter labeling the 0-child of that node.
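The definition above can be sketched as a small data type. This is a minimal illustration, not the authors' implementation: the class names `Terminal` and `Node`, the field names `one` and `zero`, and the helper `obeys_ordering` are hypothetical, and Python's built-in string comparison stands in for the total order ≺.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Terminal:
    value: int  # 1 for the 1-terminal, 0 for the 0-terminal


@dataclass(frozen=True)
class Node:
    lab: str        # S.lab, the letter labeling the node
    one: "Child"    # S.1, the 1-child
    zero: "Child"   # S.0, the 0-child


Child = Union[Terminal, Node]
ZERO, ONE = Terminal(0), Terminal(1)


def obeys_ordering(n: Child) -> bool:
    """Check N.lab < (N.0).lab at every internal node reachable from n.

    Terminals carry no label, so the rule only constrains internal 0-children.
    """
    if isinstance(n, Terminal):
        return True
    if isinstance(n.zero, Node) and not n.lab < n.zero.lab:
        return False
    return obeys_ordering(n.one) and obeys_ordering(n.zero)
```

For example, a chain of nodes labeled a, b, c linked by 0-edges satisfies the rule, while a node labeled b whose 0-child is labeled a does not.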

Semantics. L(N) is the set of strings that N represents: L(1-terminal) = {ε}, L(0-terminal) = {}, and L(N) = N.lab · L(N.1) ∪ L(N.0) for an internal node N. A path from the root to the 1-terminal node represents a string. (Figure: an SDD over {a, b} whose nodes represent {ε}, {}, {b}, {a, b}, {bb}, and {aa, ab, bb}.)
Every SDD node represents a set of strings. The 1-terminal node represents the singleton of the empty string. The 0-terminal node represents the empty set. The set of strings of an internal node is defined recursively by this formula. The set of strings of an SDD is the one represented by its root node. For convenience, we identify a node with the SDD that has the node as its root and consists of all the nodes reachable from it.
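The recursive definition of L(N) can be evaluated directly. The sketch below assumes a hypothetical encoding that is not taken from the slides: an internal node is a tuple `(lab, one_child, zero_child)` and the terminals are the strings `"1"` and `"0"`. It rebuilds the sets shown in the figure.

```python
ZERO, ONE = "0", "1"  # hypothetical encoding of the two terminal nodes


def language(n):
    """Return L(n) as a Python set, following L(N) = N.lab . L(N.1) U L(N.0)."""
    if n == ONE:
        return {""}      # the 1-terminal represents {epsilon}
    if n == ZERO:
        return set()     # the 0-terminal represents the empty set
    lab, one, zero = n
    return {lab + w for w in language(one)} | language(zero)


# The nodes from the slide's figure, built bottom-up.
n_b = ("b", ONE, ZERO)     # {b}
n_ab = ("a", ONE, n_b)     # {a, b}
n_bb = ("b", n_b, ZERO)    # {bb}
root = ("a", n_ab, n_bb)   # {aa, ab, bb}
```

Enumerating the set this way takes time proportional to the number of strings, so it is only useful for checking small examples.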

Comparison to ADFA. (Figure: an ADFA with accept and reject states beside the equivalent SDD; the nodes represent {b}, {a, b}, {bb}, and {aa, ab, bb}, and an ADFA state with outgoing edges a, b, c corresponds to a series of SDD nodes labeled a, b, c.)
Let's compare SDDs with Acyclic Deterministic Finite Automata. Roughly speaking, one can think of an SDD as a binarized ADFA. Here is a node of an ADFA with three outgoing edges labeled a, b, and c, which correspond to three SDD nodes labeled a, b, and c. In an SDD, one checks whether the input symbol is a and, if so, takes the 1-edge of the node labeled a; otherwise one checks whether it is b, and so on. Such a sequential process is represented by a series of SDD nodes in this way.
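The binarization described above can be sketched as a translation from an ADFA to an SDD. Everything about the encoding here is an assumption made for illustration: the ADFA is a dictionary `delta` mapping a state to its `{letter: target}` edges, `accepting` is a set of states, and SDD nodes are tuples `(lab, one_child, zero_child)` with terminals `"1"` and `"0"`.

```python
ZERO, ONE = "0", "1"  # hypothetical encoding of the two terminal nodes


def adfa_to_sdd(delta, accepting, start):
    """Turn each ADFA state into a chain of SDD nodes linked by 0-edges."""
    memo = {}

    def convert(q):
        if q not in memo:
            # The end of the 0-chain records whether the state itself accepts.
            node = ONE if q in accepting else ZERO
            # Build the chain from the largest letter down so that labels
            # increase along 0-edges (the ordering rule).
            for letter in sorted(delta.get(q, {}), reverse=True):
                child = convert(delta[q][letter])
                if child != ZERO:  # skip edges that lead only to a dead state
                    node = (letter, child, node)
            memo[q] = node
        return memo[q]

    return convert(start)


# An ADFA for {aa, ab, bb}: state 3 is the only accepting state.
delta = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 3}, 2: {"b": 3}}
sdd = adfa_to_sdd(delta, accepting={3}, start=0)
```

Each ADFA edge contributes at most one SDD node in this construction, which is consistent with the statement that an SDD node corresponds to an ADFA edge.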

Reduction process. Suppression: a node whose 1-child is the 0-terminal is deleted, because a · {} ∪ L(N.0) = L(N.0); after reduction every node has N.1 ≠ 0-terminal. In an ADFA this corresponds to removing edges that point to the dead state. Merging: τ(N) = τ(N') ⇒ N = N'. In an ADFA this corresponds to sharing all equivalent nodes. Theorem: under these rules, the SDD is unique and minimal, just as ADFAs have a unique canonical form.
An SDD can be reduced through this reduction process. The first rule is the suppression rule. If the 1-edge of an internal node points to the 0-terminal node, the node is deleted. By definition, such a node P with 0-child Q represents the same set of strings as its 0-child, L(P) = a · {} ∪ L(Q) = L(Q), so it is redundant. The second rule is the merging rule. If two different nodes have the same triple of label, 1-child, and 0-child, they are merged. In an ADFA, this rule corresponds to merging all equivalent nodes. After this reduction process, an SDD is in canonical form and is minimal, just as ADFAs have a canonical form.
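The two rules can be applied bottom-up to an unreduced diagram. The following is a sketch under the same hypothetical tuple encoding as before (internal node `(lab, one_child, zero_child)`, terminals `"1"` and `"0"`); the function name `reduce_sdd` and the `table` argument are illustrative and not from the slides.

```python
ZERO, ONE = "0", "1"  # hypothetical encoding of the two terminal nodes


def reduce_sdd(n, table):
    """Return the reduced form of n; `table` collects the distinct reduced nodes."""
    if n in (ZERO, ONE):
        return n
    lab, one, zero = n
    one = reduce_sdd(one, table)
    zero = reduce_sdd(zero, table)
    if one == ZERO:
        # Suppression rule: lab . {} U L(zero) = L(zero), so the node is redundant.
        return zero
    key = (lab, one, zero)
    # Merging rule: nodes with the same triple share a single entry.
    return table.setdefault(key, key)


# An unreduced diagram for {a, b} containing two suppressible nodes.
unreduced = ("a", ("b", ZERO, ONE), ("b", ONE, ("c", ZERO, ZERO)))
table = {}
reduced = reduce_sdd(unreduced, table)
```

After the call, `len(table)` is the number of internal nodes of the reduced SDD, which is the size measure used later in the talk.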

Characteristic. Almost isomorphic to Acyclic Deterministic Finite Automata. BDD/ZDD techniques are applicable: binary form, simple recursive algorithms, easy to implement, rich collection of operations. Use of hash tables: to share equivalent nodes and to share intermediate computations. (Diagram labels: BDD/ZDD, ADFA, SDD.)
As I have explained, an SDD is almost isomorphic to an ADFA. However, there are some differences between them. As an advantage of SDDs, we can apply BDD/ZDD techniques to them. The binary form allows simple recursive algorithms, which makes SDDs easier to implement. The rich collection of operations inherited from ZDDs can be applied to SDDs. Like other members of the BDD family, SDDs use hash tables, both to keep SDDs always reduced and to make set operations work efficiently.

Relationship to Acyclic Automata

Size. An SDD node corresponds to an ADFA edge. The description size is proportional to |N|, the number of internal nodes in an SDD N, and to |A|, the number of edges in an ADFA A.
We define the sizes of SDDs and ADFAs. The size of an SDD is the number of its internal nodes. The size of an ADFA, on the other hand, is the number of its edges. We use these measures because the description sizes are proportional to them.

Theorem: Size comparison. For an equivalent SDD and ADFA: from an ADFA A to an SDD N, the SDD is never larger (|N| ≤ |A|); from an SDD N to an ADFA A, the ADFA can be |Σ| times larger. An SDD can be |Σ| times smaller than an ADFA.
We compare the size of an SDD to that of an ADFA representing the same set of strings. We proved two theorems on size. First, an SDD is never larger than the ADFA. Second, an ADFA can be |Σ| times larger than the SDD; in other words, an SDD can be |Σ| times smaller than the ADFA. There are cases in which this bound is attained.

0-child sharing. (Figure: an SDD and the equivalent ADFA, with nodes and edges labeled a, b, c, d, e.)
This slide illustrates why an SDD can be smaller than an ADFA. Suppose that an ADFA has two distinct nodes that each have edges labeled with the same letters going to the same nodes, like the edges labeled c, d, and e. In the ADFA those edges are not merged, but in an SDD those three edges become nodes and are merged in this way. We have 8 edges in the right figure, while we have only 5 nodes in the left.

Example. The set {a^n b^i c^j | n = 0, …, 4; i, j = 0, 1}, represented by an ADFA A and an SDD S: |S| = 6, |A| = 14.
This slide shows a typical case in which the SDD is smaller than the ADFA. Assume that we want to represent three sets of strings, {a, b, c}, {b, c}, and {c}. An SDD can represent these sets with just 3 nodes, but an ADFA needs 6 edges. In an SDD the nodes are labeled, whereas in an ADFA the edges are labeled. Both SDDs and ADFAs merge nodes, and this causes the difference in size: the labeled edges of an ADFA cannot be merged, but an SDD can merge its labeled nodes. This is an example in which the SDD is asymptotically |Σ| times smaller. The number of SDD nodes is 6, and the number of ADFA edges is 14.

Experiment. Input: Canterbury corpus. BibleAll: bible.txt; BibleBi: all bigrams from bible.txt; Ecoli: E.coli.txt. (Fac) means that all factors of the input data are stored.
This experiment shows the ratio of the sizes of the SDD and the ADFA for different sets of strings. We can see the effect of merging in binary form on real data. From bible.txt we make two types of data. BibleAll is the set of sentences of the text. BibleBi is the set of all bigrams obtained from bible.txt. Ecoli is a single genome. (Fac) means that the SDD and the ADFA store all factors of the input. For BibleAll, the sizes are almost equal. For the other data, the SDDs are about 10 to 20% smaller than the ADFAs.

Binary Set Operation Algorithm
SDDs have inherited the algorithm for computing binary set operations from BDDs. We define the algorithm and analyze its time complexity.

Set operation. A binary set operation ♢ ∈ {∪, ∩, \, …}. Input: two SDDs P and Q. Output: the SDD R such that L(R) = L(P) ♢ L(Q), written P ♢ Q.
We would like to compute, for example, the union or intersection of two sets. By P ♢ Q we denote the unique SDD R that represents the set L(R) = L(P) ♢ L(Q). A binary set operation is thus a method that creates a new SDD from two given input SDDs P and Q.

Apply algorithm. Originally for BDDs [Bryant 1986], applied to SDDs. It is based on the definition L(N) = N.lab · L(N.1) ∪ L(N.0). In an operation, when P.lab = Q.lab: L(P) ♢ L(Q) = P.lab · (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0)). (Figure: nodes P = 〈a, P1, P0〉 and Q = 〈a, Q1, Q0〉 combine into P ♢ Q = 〈a, P1 ♢ Q1, P0 ♢ Q0〉.)
For binary set operations we can use the apply algorithm. It was proposed for BDDs by Bryant in 1986, but it can be applied to SDDs too. It computes P ♢ Q recursively, so that L(P ♢ Q) = L(P) ♢ L(Q). Based on the formula that defines the set of strings of an SDD node, the apply algorithm divides the operation P ♢ Q into two directions, the 0-side recursion and the 1-side recursion. The computation of P ♢ Q is simply decomposed into the two subproblems of computing P0 ♢ Q0 and P1 ♢ Q1, and this simple recursion is enough to create the SDD P ♢ Q.
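The recursion can be sketched as follows. This is an illustrative version rather than the authors' code: nodes are tuples `(lab, one_child, zero_child)` with terminals `"1"` and `"0"`, the operation is passed as a Boolean function on membership, and the helpers `getnode` and `split` are hypothetical names. The slide only states the case P.lab = Q.lab; the handling of different labels and of terminals below is an assumption (a node whose label is larger than x, or a terminal, contributes the empty set on the 1-side for x), and it requires `op(False, False)` to be `False`.

```python
ZERO, ONE = "0", "1"  # hypothetical encoding of the two terminal nodes


def getnode(x, one, zero):
    """Create a node, applying the suppression rule."""
    return zero if one == ZERO else (x, one, zero)


def split(n, x):
    """Return the (1-side, 0-side) of n with respect to the letter x."""
    if isinstance(n, tuple) and n[0] == x:
        return n[1], n[2]
    return ZERO, n  # no string of L(n) starts with x


def apply(op, p, q, opcache):
    key = (op, p, q)
    if key in opcache:
        return opcache[key]
    if not isinstance(p, tuple) and not isinstance(q, tuple):
        # Two terminals: decide membership of the empty string.
        r = ONE if op(p == ONE, q == ONE) else ZERO
    else:
        x = min(n[0] for n in (p, q) if isinstance(n, tuple))
        p1, p0 = split(p, x)
        q1, q0 = split(q, x)
        r = getnode(x, apply(op, p1, q1, opcache), apply(op, p0, q0, opcache))
    opcache[key] = r
    return r


def union(a, b):
    return a or b


def intersection(a, b):
    return a and b


def difference(a, b):
    return a and not b
```

With P representing {a, b} and Q representing {b, bb}, `apply(union, P, Q, {})` yields the SDD for {a, b, bb}, and `apply(intersection, P, Q, {})` yields the single node for {b}.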

Hash table technique. Two key-value hash tables are used.
Uniquetable: key 〈letter x, SDD node N1, SDD node N0〉; value: the SDD node N with τ(N) = 〈x, N1, N0〉.
Opcache: key 〈operation id ♢, SDD node P, SDD node Q〉; value: the SDD node R = P ♢ Q.
While computing set operations, we use two key-value hash tables. The first one is called the uniquetable. Its key is a triple 〈label, 1-child, 0-child〉, and its value is the node with that triple. It is used to prevent the creation of two nodes that have the same label, 1-child, and 0-child. The other is called the opcache. Its key is also a triple, 〈operation id ♢, SDD P, SDD Q〉, and its value is the node that results from the binary operation P ♢ Q. It is used to avoid repeating a computation that has already been done.

Node creation process. Any SDD node needed during computation is created via this process. Once an internal node is registered in the uniquetable, equivalent nodes are never created again. Check the uniquetable for the key 〈x, N1, N0〉. If it exists, return the registered node. If it does not exist, create a new node, register it, and return it.
The uniquetable is used in the process that creates nodes.
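The node creation process can be sketched with an ordinary dictionary as the uniquetable. The names `Node`, `uniquetable`, and `getnode` are hypothetical, and keying on `id()` of the children is a design choice of this sketch: it is safe here because every node stays alive inside the table, so identities are never reused.

```python
class Node:
    """An SDD node; terminals are two distinguished instances without a label."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


ZERO = Node(None, None, None)  # 0-terminal
ONE = Node(None, None, None)   # 1-terminal

uniquetable = {}  # key <x, N1, N0> -> the unique node with that triple


def getnode(x, n1, n0):
    """Return the unique node with triple <x, n1, n0>, creating it if needed."""
    key = (x, id(n1), id(n0))
    node = uniquetable.get(key)
    if node is None:
        node = Node(x, n1, n0)
        uniquetable[key] = node  # register so equivalent nodes are never rebuilt
    return node
```

Because equal triples always map to the same object, checking whether two SDDs are equal reduces to comparing their root nodes with `is`.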

Time complexity. When P ♢ Q is executed, every operation uses the opcache, so at most |P| × |Q| different instances of recursive calls are invoked (assuming that the access time to hash tables is constant). Naïve method: prepare a table of size |P| × |Q|. This method: no useless or redundant nodes. Theorem: the worst-case time is O(|P| |Q|), and there exist examples that need Ω(|P| |Q|) time, so matching lower and upper bounds are obtained. Check the opcache for the key 〈♢, P, Q〉. If it exists, P ♢ Q has already been computed, so return it. If it does not exist, continue the computation on the 0-side and the 1-side.
We show the time complexity of the apply algorithm. It is easy to see that there are at most |P| |Q| different instances of recursive calls. So we can complete the computation in O(|P| |Q|) time by storing the results of subproblems in the opcache. The naïve method prepares a table of size |P| |Q| to do this. Our method does not create needless nodes or multiple equivalent nodes. Although this does not improve the worst-case complexity, our experiments demonstrate that the apply algorithm is much more efficient than the naïve approach.
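The counting argument behind the upper bound can be written out as a short derivation. It is a restatement under the slide's assumption of constant-time hash access, not a quotation from the paper: the symbols T, c, P', and Q' are introduced here for illustration, and the "+2" accounts for the two terminal nodes, which the slide's count |P| × |Q| leaves implicit.

```latex
% Each distinct key <\diamond, P', Q'> is computed once and costs at most c,
% where P' ranges over the internal nodes of P and the two terminals
% (likewise Q' for Q).
\[
  T(P \diamond Q)
  \;\le\; c \cdot \bigl|\{(P', Q')\}\bigr|
  \;\le\; c\,(|P| + 2)(|Q| + 2)
  \;=\; O(|P|\,|Q|).
\]
```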

Experiment: operation time. We prepare two SDDs that store all factors of random texts of length n and measure the time to compute operations.
We prepare two SDDs that store all factors of given random texts of length n. These lines indicate the computation time of set operations on these SDDs. We can see that the apply algorithm runs in almost linear time in practice.

Conclusion.
Relationship to acyclic automata: an SDD can be |Σ| times smaller than an ADFA; for real data, SDDs are 10 to 20% more compact than ADFAs.
Computational complexity of binary set operations: the worst-case time complexity is quadratic; a tight time bound is analyzed; in our experiment, operation time is almost linear.
Future work: efficient implementation of various operations; a substring index on SDDs; a factor SDD construction algorithm.
Conclusion. We studied rela

Thank you!