Ternary Directed Acyclic Word Graphs (TDAWG) Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara Presented by Peera Liewlom (The Last Algorithm Group)

Presentation transcript:

1 Ternary Directed Acyclic Word Graphs (TDAWG) Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara Presented by Peera Liewlom (The Last Algorithm Group)

2 CIAA 2003 Eighth International Conference on Implementation and Application of Automata July 16-18, 2003, Santa Barbara, CA, USA Topic / Committee / Community

3 Why did I select this paper?
DAWG started in 1985, not so long ago
Continuing development: cDAWG, ASDAWG, morphic DAWG, WDAWG, SDAWG, two-tree DAWG, DASG, CSDAWG, etc.
TST: 1997–98, TDAWG: 2003
DAWG is widely applied in bioinformatics, NLP, graph theory, string matching, automata, etc.
Speed and space trends in huge data management
Topic for Algorithm Group: matches the interesting topics in this seminar group

4 Content
DFA (used in string matching problems)
DAWG
Ternary Search Tree
Paper: TDAWG, Experiment & Result
Paper: Conclusion
Paper: Discussion

5 DFA Deterministic Finite Automata

6 Formalities
Deterministic Finite Accepter (DFA) M = (Q, Σ, δ, q0, F)
Q : set of states
Σ : input alphabet
δ : transition function
q0 : initial state
F : set of final states
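The five components listed on this slide (states, input alphabet, transition function, initial state, final states) can be sketched as a short simulator. This is a minimal illustration, not code from the presentation: the function name `dfa_accepts`, the state names, and the example language (strings over {a, b} that begin with "ab") are all assumptions chosen for the example.

```python
from typing import Dict, Set, Tuple

def dfa_accepts(delta: Dict[Tuple[str, str], str],
                start: str,
                finals: Set[str],
                word: str) -> bool:
    """Run a DFA on `word` and report whether it ends in a final state."""
    state = start
    for ch in word:
        key = (state, ch)
        # A true DFA has a total transition function; treating a missing
        # entry as rejection is a convenience of this sketch.
        if key not in delta:
            return False
        state = delta[key]
    return state in finals

# Hypothetical example machine: accepts strings over {a, b} starting with "ab".
DELTA = {
    ("q0", "a"): "q1",     ("q0", "b"): "trap",
    ("q1", "a"): "trap",   ("q1", "b"): "q2",
    ("q2", "a"): "q2",     ("q2", "b"): "q2",
    ("trap", "a"): "trap", ("trap", "b"): "trap",
}
START = "q0"
FINALS = {"q2"}
```

Each input character costs one dictionary lookup here, so a run takes time proportional to the length of the word. How that lookup is stored (table, list, or tree) is the design question the later slides address.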

7 Set of States

8 Input Alphabet

9 Initial State

10 Set of Final States

11 Transition Function

12

13

14

15 Transition Function

16 Another Example accept

17 = { all substrings with prefix } accept

18 = { all strings without substring }

19 DAWG Directed Acyclic Word Graph

20 DAWG

21 DAWG

22 DAWG

23 cDAWG

24

25 TST Ternary Search Tree

26 TST History
Jon L. Bentley and Robert Sedgewick
Algorithms for Sorting and Searching Strings, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), January 1997
Ternary Search Trees, Dr. Dobb's Journal, April 1998
Dictionary of Algorithms and Data Structures, National Institute of Standards and Technology

27 BST DST TST
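A ternary search tree stores one character per node with three children: lower, equal, and higher. The sketch below is a minimal, unbalanced version written for illustration. The names `TSTNode`, `tst_insert`, and `tst_contains` are assumptions, not identifiers from the paper or from Bentley and Sedgewick's code.

```python
class TSTNode:
    """One TST node: a character plus lower / equal / higher children."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = None       # subtree for characters smaller than ch
        self.eq = None       # subtree for the next character of the word
        self.hi = None       # subtree for characters larger than ch
        self.is_end = False  # True if a stored word ends at this node

def tst_insert(node, word, i=0):
    """Insert non-empty `word` and return the (possibly new) subtree root."""
    if not word:
        raise ValueError("empty words are not stored in this sketch")
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node

def tst_contains(node, word):
    """Return True if `word` was inserted as a whole word."""
    if not word:
        return False
    i = 0
    while node is not None:
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(word):
            node = node.eq
            i += 1
        else:
            return node.is_end
    return False

def build_tst(words):
    """Convenience helper (hypothetical): build a TST from an iterable."""
    root = None
    for w in words:
        root = tst_insert(root, w)
    return root
```

The lower and higher links form a binary search tree over the characters that can occur at one position, while the equal link advances to the next position. This is the BST idea that the TDAWG applies to the outgoing transitions of each DAWG state.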

28

29 TDAWG Ternary Directed Acyclic Word Graph

30 Introduction
DFA → how to implement the transitions of each state? (time and space efficiency)
TST → "implants" a BST for the transitions
–Good time
DAWG → smallest DFA for all suffixes
–Good space
TDAWG
Proof: TDAWG vs. DAWG
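The idea of implanting a BST for the transitions of a state can be sketched as follows. This is an illustrative assumption about how such a state might look, not the authors' implementation: the classes `Edge` and `BSTState` and their methods are hypothetical names, and the tree is left unbalanced.

```python
class Edge:
    """One outgoing transition, doubling as a BST node keyed by its label."""
    def __init__(self, label, target):
        self.label = label
        self.target = target
        self.left = None    # edges with smaller labels
        self.right = None   # edges with larger labels

class BSTState:
    """An automaton state whose outgoing edges are kept in a BST."""
    def __init__(self):
        self.root = None

    def set_edge(self, label, target):
        """Add the transition `label -> target`, or overwrite an existing one."""
        if self.root is None:
            self.root = Edge(label, target)
            return
        node = self.root
        while True:
            if label == node.label:
                node.target = target
                return
            if label < node.label:
                if node.left is None:
                    node.left = Edge(label, target)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Edge(label, target)
                    return
                node = node.right

    def find_edge(self, label):
        """Return the target state for `label`, or None if there is no edge."""
        node = self.root
        while node is not None:
            if label == node.label:
                return node.target
            node = node.left if label < node.label else node.right
        return None
```

A state stores only the edges it actually has, like a linked list, but a lookup follows one root-to-leaf path instead of scanning every edge. Without balancing, that path can still degrade to linear length for unlucky insertion orders.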

31 Hypothesis / Theorem (1/2)
Time = construction + search (usable online)
DFA transition function; Σ = alphabet (Chinese and Japanese: ~1000 characters)
State table → O(|p|) search time, |p| = length of pattern p
Table uses very large memory
Linked list → O(|Σ| × |p|) search time
If Σ is large … problem for search time
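The gap between scanning a list of edges and searching an ordered structure can be made concrete by counting comparisons for a single transition lookup. The sketch below assumes an alphabet of 1000 symbols, matching the rough figure on this slide. Binary search over a sorted array stands in for a balanced search tree, and all names are hypothetical.

```python
def linear_lookup(edges, ch):
    """Scan (label, target) pairs in order; return (target, comparisons)."""
    comparisons = 0
    for label, target in edges:
        comparisons += 1
        if label == ch:
            return target, comparisons
    return None, comparisons

def binary_lookup(sorted_edges, ch):
    """Binary search over edges sorted by label; return (target, probes)."""
    lo, hi = 0, len(sorted_edges)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        label, target = sorted_edges[mid]
        if label == ch:
            return target, probes
        if label < ch:
            lo = mid + 1
        else:
            hi = mid
    return None, probes

# Assumed alphabet: 1000 consecutive code points starting at U+4E00.
ALPHABET = [chr(0x4E00 + i) for i in range(1000)]
EDGES = [(c, i) for i, c in enumerate(ALPHABET)]  # already sorted by label

worst_char = ALPHABET[-1]
linear_target, linear_cost = linear_lookup(EDGES, worst_char)
binary_target, binary_cost = binary_lookup(EDGES, worst_char)
```

For the last symbol the linear scan inspects all 1000 edges, while the binary search needs at most 10 probes. Multiplying either figure by the pattern length gives the per-pattern search cost that the slide contrasts.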

32 Hypothesis / Theorem (2/2)
For TDAWG
–Uses O(|S|) space
–Uses O(log|Σ| × |p|) search time
–Uses O(|Σ| × |S|^2) construction time (Bentley & Sedgewick)
–Uses O(|Σ| × |S|) construction time (this paper … applying Blumer's online DAWG construction)
Comparison: TDAWG vs. DAWG (table and linked list)
–Space, search time, construction time

33 TST → TDAWG

34 Online DAWG Construction
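For readers without the slide figures, online DAWG construction can be sketched with the widely taught suffix-automaton formulation, which processes the text one character at a time. This is a textbook-style sketch under stated assumptions, not the code behind the presentation: transitions are stored in Python dictionaries rather than tables, lists, or BSTs, and the names `State`, `build_dawg`, and `is_substring` are hypothetical.

```python
class State:
    """DAWG state: longest string length, suffix link, outgoing edges."""
    def __init__(self, length, link):
        self.length = length
        self.link = link   # index of the suffix-link target, -1 for none
        self.next = {}     # character -> state index

def build_dawg(text):
    """Build the DAWG (suffix automaton) of `text` online, left to right."""
    states = [State(0, -1)]   # state 0 is the initial state
    last = 0                  # state representing the whole text read so far
    for ch in text:
        cur = len(states)
        states.append(State(states[last].length + 1, -1))
        p = last
        # Add the new edge along the suffix-link chain until one exists.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone keeps q's edges but a shorter length.
                clone = len(states)
                states.append(State(states[p].length + 1, states[q].link))
                states[clone].next = dict(states[q].next)
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states

def is_substring(states, pattern):
    """Follow one edge per pattern character, starting at the initial state."""
    s = 0
    for ch in pattern:
        if ch not in states[s].next:
            return False
        s = states[s].next[ch]
    return True
```

Searching follows exactly one transition per pattern character, so the cost of a single transition lookup determines the overall search time. Replacing the dictionary with a per-state search tree is the change that distinguishes a TDAWG from this sketch.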

35 Online TDAWG Construction

36 Experiment Result

37 Conclusion
New data structure … TDAWG
Construction time (English text, alphabet size 256)
–TDAWG < linklistDAWG < tableDAWG
Space requirement
–linklistDAWG < TDAWG (~20%)
–tableDAWG not comparable on the same scale
Search time
–Short pattern: tableDAWG best, TDAWG < linklistDAWG
–Log curve vs. linear curve (long pattern?)

38 Discussion & Future Work
Asian languages (thousands of characters) should gain more in search time than English (256 characters), because of the O(log|Σ| × |p|) bound
Apply to other DAWGs … cDAWG, minimum DAWG, etc.
More efficiency with an AVL tree (AVL balancing)
Bioinformatics has 4 characters, but a sliding window of 12 characters gives 4^12
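The arithmetic behind the last point can be checked directly. The sketch below only evaluates the numbers named on the slide (4 characters, a window of 12) and compares the per-transition factor of a linear scan with that of a balanced search. The variable names are assumptions.

```python
import math

BASES = 4     # DNA alphabet size, as stated on the slide
WINDOW = 12   # sliding-window length, as stated on the slide

# Treating every window as one symbol gives an effective alphabet of 4^12.
effective_alphabet = BASES ** WINDOW

# Per-transition cost factors: linear scan vs. balanced binary search.
linear_factor = effective_alphabet
log_factor = math.log2(effective_alphabet)

print(effective_alphabet)  # 16777216
print(log_factor)          # 24.0
```

With about 16.8 million possible symbols, a linear scan of the edges is impractical, while a balanced tree needs about 24 comparisons per transition. That is why a large effective alphabet favors the logarithmic bound.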