1 Ternary Directed Acyclic Word Graphs (TDAWG) Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara Present by Peera Liewlom (The Last.


1 1 Ternary Directed Acyclic Word Graphs (TDAWG). Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara. Presented by Peera Liewlom (The Last Algorithm Group)

2 2 CIAA 2003 Eighth International Conference on Implementation and Application of Automata July 16-18, 2003, Santa Barbara, CA, USA Topic / Committee / Community

3 3 Why did I select this paper? DAWGs date from 1985, which is not so long ago. Development is continuing: cDAWG, ASDAWG, morphic DAWG, WDAWG, SDAWG, two-tree DAWG, DASG, CSDAWG, etc. TST: 1997–98; TDAWG: 2003. DAWGs are widely applied in bioinformatics, NLP, graph theory, string matching, automata, etc. Speed and space trends in huge data management. A topic for the Algorithm Group: it matches the topics of interest in this seminar group.

4 4 Content: DFA (used in string-matching problems); DAWG; Ternary Search Tree; Paper: TDAWG, Experiment & Result; Paper: Conclusion; Paper: Discussion

5 5 DFA Deterministic Finite Automata

6 6 Formalities. Deterministic Finite Accepter (DFA): M = (Q, Σ, δ, q0, F), where Q: set of states; Σ: input alphabet; δ: transition function; q0: initial state; F: set of final states
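
The five components listed on this slide can be written down directly as a small data structure. The following is a minimal sketch, assuming the conventional textbook names Q, Σ, δ, q0 and F for the components; the `DFA` class and the example machine `PREFIX_AB` (strings over {a, b} that begin with "ab") are hypothetical illustrations, not taken from the slides.

```python
class DFA:
    """A DFA as the 5-tuple (Q, Sigma, delta, q0, F). Illustrative names only."""

    def __init__(self, states, alphabet, delta, start, finals):
        self.states = states      # Q: set of states
        self.alphabet = alphabet  # Sigma: input alphabet
        self.delta = delta        # delta: dict mapping (state, symbol) -> state
        self.start = start        # q0: initial state
        self.finals = finals      # F: set of final states

    def accepts(self, word):
        """Run the automaton on word; accept if it halts in a final state."""
        state = self.start
        for ch in word:
            if ch not in self.alphabet:
                return False      # symbol outside Sigma: reject
            state = self.delta[(state, ch)]
        return state in self.finals


# Hypothetical example: all strings over {a, b} that start with "ab".
PREFIX_AB = DFA(
    states={"q0", "q1", "q2", "trap"},
    alphabet={"a", "b"},
    delta={
        ("q0", "a"): "q1", ("q0", "b"): "trap",
        ("q1", "a"): "trap", ("q1", "b"): "q2",
        ("q2", "a"): "q2", ("q2", "b"): "q2",
        ("trap", "a"): "trap", ("trap", "b"): "trap",
    },
    start="q0",
    finals={"q2"},
)
```

Here δ is total (every state has a transition for every symbol), which is why a non-accepting "trap" state is needed.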

7 7 Set of States

8 8 Input Alphabet

9 9 Initial State

10 10 Set of Final States

11 11 Transition Function

12 12

13 13

14 14

15 15 Transition Function

16 16 Another Example accept

17 17 = { all substrings with prefix } accept

18 18 = { all strings without substring }

19 19 DAWG Directed Acyclic Word Graph

20 20 DAWG

21 21 DAWG

22 22 DAWG

23 23 cDAWG

24 24

25 25 TST Ternary Search Tree

26 26 TST History. Jon L. Bentley and Robert Sedgewick: Fast Algorithms for Sorting and Searching Strings, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), January 1997. Ternary Search Trees, Dr. Dobb's Journal, April 1998. Dictionary of Algorithms and Data Structures, National Institute of Standards and Technology, http://www.nist.gov/

27 27 BST DST TST
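
To make the ternary search tree concrete, here is a minimal sketch of insertion and lookup in the style usually attributed to Bentley and Sedgewick. Each node holds one character and three children: lower, equal and higher. The names `TSTNode`, `tst_insert` and `tst_contains` are hypothetical, and the sketch makes no attempt to balance the tree.

```python
class TSTNode:
    """One node of a ternary search tree (illustrative sketch)."""

    def __init__(self, ch):
        self.ch = ch          # character stored at this node
        self.lo = None        # subtree for characters smaller than ch
        self.eq = None        # subtree for the next character of the word
        self.hi = None        # subtree for characters larger than ch
        self.is_end = False   # True if a stored word ends here


def tst_insert(node, word, i=0):
    """Insert word[i:] below node and return the (possibly new) subtree root."""
    if not word:
        raise ValueError("empty words are not supported in this sketch")
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def tst_contains(node, word):
    """Return True if word was inserted into the tree rooted at node."""
    if not word:
        return False
    i = 0
    while node is not None:
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 == len(word):
            return node.is_end
        else:
            i += 1
            node = node.eq
    return False
```

The lower and higher links behave like a binary search tree over the characters at one position, while the equal link advances to the next position, as in a digital search tree.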

28 28

29 29 TDAWG Ternary Directed Acyclic Word Graph

30 30 Introduction. DFA → how to implement the transitions of each state? (time and space efficiency). TST → “implant” a BST for the transitions: good time. DAWG → smallest DFA for all suffixes: good space. TDAWG. Proof: TDAWG vs. DAWG

31 31 Hypothesis / Theorem (1/2). Time = construction + search (usable online). DFA transition function; Σ = alphabet (Chinese and Japanese: ~1000 characters). State table → O(|p|) search time, where |p| is the length of the pattern p, but the table uses a very large amount of memory. Linked list → O(|Σ| × |p|) search time; if Σ is large, search time becomes a problem.
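
The trade-off on this slide can be illustrated with two ways of storing a state's outgoing transitions. This is a minimal sketch under stated assumptions: the alphabet size `SIGMA = 256` is assumed, a Python list of pairs stands in for a linked list, and the class and function names are hypothetical.

```python
SIGMA = 256  # assumed alphabet size (one slot per byte value)


class TableState:
    """Transitions in a table: O(1) per character, SIGMA slots per state."""

    def __init__(self):
        self.next = [None] * SIGMA

    def add(self, ch, target):
        self.next[ord(ch)] = target

    def step(self, ch):
        return self.next[ord(ch)]


class ListState:
    """Transitions in a list: compact, but scanned linearly per character."""

    def __init__(self):
        self.edges = []  # (label, target) pairs; stand-in for a linked list

    def add(self, ch, target):
        self.edges.append((ch, target))

    def step(self, ch):
        for label, target in self.edges:
            if label == ch:
                return target
        return None


def build_chain(state_cls, word):
    """Build a simple chain of states spelling word (toy automaton)."""
    start = state_cls()
    cur = start
    for ch in word:
        nxt = state_cls()
        cur.add(ch, nxt)
        cur = nxt
    return start


def walk(start, pattern):
    """Follow pattern from start; True if every transition exists."""
    state = start
    for ch in pattern:
        state = state.step(ch)
        if state is None:
            return False
    return True
```

With the table, `walk` costs one array access per pattern character but every state reserves `SIGMA` slots; with the list, memory is proportional to the number of edges but each step may scan up to |Σ| entries.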

32 32 Hypothesis / Theorem (2/2). For the TDAWG of a string S: O(|S|) space; O(log|Σ| × |p|) search time; O(|Σ| × |S|²) construction time (Bentley & Sedgewick); O(|Σ| × |S|) construction time (this paper, adapted from Blumer's online DAWG construction). Comparison: TDAWG vs. DAWG (table and linked list) on space, search time and construction time.

33 33 TST  TDAWG

34 34 Online DAWG Construction
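
The slide's figure is not available in this transcript. As a stand-in, the following is a minimal sketch of the standard online DAWG (suffix automaton) construction in the form it is commonly presented, processing the text one character at a time. It is not the paper's TDAWG algorithm: transitions are kept in a Python `dict` rather than a table, list or ternary tree, marking of final states is omitted, and all names (`State`, `build_dawg`, `is_substring`) are hypothetical.

```python
class State:
    """A DAWG state: longest string length, suffix link, outgoing edges."""

    def __init__(self, length=0, link=None):
        self.length = length
        self.link = link
        self.next = {}  # assumption: dict-based transitions for brevity


def build_dawg(text):
    """Build the automaton online; returns (root, list of all states)."""
    root = State()
    last = root
    states = [root]
    for ch in text:
        cur = State(last.length + 1)
        states.append(cur)
        p = last
        # Add the new edge along the suffix-link chain where it is missing.
        while p is not None and ch not in p.next:
            p.next[ch] = cur
            p = p.link
        if p is None:
            cur.link = root
        else:
            q = p.next[ch]
            if q.length == p.length + 1:
                cur.link = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = State(p.length + 1, q.link)
                clone.next = dict(q.next)
                states.append(clone)
                while p is not None and p.next.get(ch) is q:
                    p.next[ch] = clone
                    p = p.link
                q.link = clone
                cur.link = clone
        last = cur
    return root, states


def is_substring(root, pattern):
    """True if pattern can be read from the root, i.e. occurs in the text."""
    state = root
    for ch in pattern:
        state = state.next.get(ch)
        if state is None:
            return False
    return True
```

Because every path from the root spells a substring of the text, `is_substring` takes one transition per pattern character; how fast each transition is depends on how the edges of a state are stored.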

35 35 Online TDAWG Construction

36 36 Experiment Result

37 37 Conclusion. New data structure: the TDAWG. Construction time (English text, alphabet size 256): TDAWG < linked-list DAWG < table DAWG. Space requirement: linked-list DAWG < TDAWG, by about 20%; the table DAWG cannot be compared on the same scale. Search time: for short patterns the table DAWG is best, and TDAWG < linked-list DAWG; logarithmic curve vs. linear curve (long patterns?).

38 38 Discussion & Future Work. Asian languages (alphabets of ~1000s of characters) should benefit more in search time than English (alphabet size 256), because the search time is O(log|Σ| × |p|). Apply the idea to other DAWGs: cDAWG, minimum DAWG, etc. More efficiency with an AVL tree (AVL balancing). Bioinformatics has a 4-character alphabet, but a sliding window of 12 characters gives an alphabet of size 4^12.
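
The arithmetic behind this slide can be checked with a few lines. This is a rough sketch that counts worst-case comparisons per pattern character, assuming a linear scan costs up to |Σ| comparisons and a balanced binary search over the edges costs about ⌈log2 |Σ|⌉; these are counting assumptions, not measurements from the paper, and the function name is hypothetical.

```python
def comparisons_per_char(sigma):
    """Worst-case comparisons per character for an alphabet of size sigma."""
    return {
        "linked_list": sigma,                # scan every edge in the worst case
        "balanced_tree": (sigma - 1).bit_length(),  # ceil(log2(sigma)), sigma >= 1
    }


# Alphabet sizes mentioned on the slides: DNA, English text, Asian languages,
# and a 12-character sliding window over DNA.
ALPHABETS = {
    "DNA": 4,
    "English (bytes)": 256,
    "Asian languages (approx.)": 1000,
    "DNA, window of 12": 4 ** 12,   # 16777216
}

for name, sigma in ALPHABETS.items():
    print(name, sigma, comparisons_per_char(sigma))
```

Under these assumptions the gap between the two columns widens as the alphabet grows, which is the point the slide makes about large alphabets.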

