Published by Garey Poole. Modified over 9 years ago.
1 Ternary Directed Acyclic Word Graphs (TDAWG)
Satoru Miyamoto, Shunsuke Inenaga, Masayuki Takeda and Ayumi Shinohara
Presented by Peera Liewlom (The Last Algorithm Group)
2 CIAA 2003: Eighth International Conference on Implementation and Application of Automata, July 16-18, 2003, Santa Barbara, CA, USA
Topic / Committee / Community
3 Why did I select this paper?
DAWG started in 1985, not so long ago
Continuing development: cDAWG, ASDAWG, morphic DAWG, WDAWG, SDAWG, two-tree DAWG, DASG, CSDAWG, etc.
TST: 1997-98; TDAWG: 2003
DAWG is widely applied in bioinformatics, NLP, graph theory, string matching, automata, etc.
Speed and space trends in huge data management
Topic for the Algorithm Group: matches the interesting topics in this seminar group
4 Content
DFA (used in string matching problems)
DAWG
Ternary Search Tree
Paper: TDAWG, Experiment & Result
Paper: Conclusion
Paper: Discussion
5 DFA Deterministic Finite Automata
6 Formalities
Deterministic Finite Accepter (DFA): M = (Q, Σ, δ, q0, F)
Q: set of states
Σ: input alphabet
δ: transition function
q0: initial state
F: set of final states
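The five components listed above map directly onto a small data structure. The following is a minimal sketch, not taken from the slides: the class name `DFA`, the state names, and the example language (strings over {a, b} that start with "ab") are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Sigma, delta, q0, F). Field names are illustrative."""
    states: frozenset      # Q
    alphabet: frozenset    # Sigma
    delta: dict            # (state, symbol) -> state
    start: str             # q0
    finals: frozenset      # F

    def accepts(self, word):
        """Run the automaton on `word` and report whether it ends in a final state."""
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state in self.finals


# Hypothetical example: accept every string over {a, b} that has the prefix "ab".
prefix_ab = DFA(
    states=frozenset({"q0", "q1", "q2", "trap"}),
    alphabet=frozenset("ab"),
    delta={
        ("q0", "a"): "q1", ("q0", "b"): "trap",
        ("q1", "a"): "trap", ("q1", "b"): "q2",
        ("q2", "a"): "q2", ("q2", "b"): "q2",
        ("trap", "a"): "trap", ("trap", "b"): "trap",
    },
    start="q0",
    finals=frozenset({"q2"}),
)
```

Calling `prefix_ab.accepts("abba")` walks q0, q1, q2, q2, q2 and ends in a final state, while `prefix_ab.accepts("ba")` falls into the trap state after the first symbol.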
7 Set of States
8 Input Alphabet
9 Initial State
10 Set of Final States
11 Transition Function
15 Transition Function
16 Another Example accept
17 = { all substrings with prefix } accept
18 = { all strings without substring }
19 DAWG Directed Acyclic Word Graph
20 DAWG
21 DAWG
22 DAWG
23 cDAWG
25 TST Ternary Search Tree
26 TST History
Jon L. Bentley and Robert Sedgewick
–Fast Algorithms for Sorting and Searching Strings, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), January 1997.
–Ternary Search Trees, Dr. Dobb's Journal, April 1998.
Dictionary of Algorithms and Data Structures, National Institute of Standards and Technology, http://www.nist.gov/
27 BST DST TST
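To make the TST idea concrete: each node holds one character and three children, "lower", "equal" (advance to the next character) and "higher". The following is a minimal sketch of insertion and lookup in the spirit of Bentley and Sedgewick's structure. The names `TSTNode`, `tst_insert` and `tst_contains` are illustrative, and words are assumed to be non-empty.

```python
class TSTNode:
    """One ternary search tree node: a character plus lo / eq / hi children."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = None       # subtree for characters smaller than ch
        self.eq = None       # subtree for the next character of the word
        self.hi = None       # subtree for characters larger than ch
        self.is_end = False  # True if a stored word ends at this node


def tst_insert(node, word, i=0):
    """Insert the non-empty string `word` and return the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def tst_contains(node, word):
    """Return True if `word` was inserted earlier."""
    i = 0
    while node is not None and i < len(word):
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        else:
            i += 1
            if i == len(word):
                return node.is_end
            node = node.eq
    return False


root = None
for w in ["cat", "cap", "dog"]:
    root = tst_insert(root, w)
```

The lo/hi pointers behave like a binary search tree over the characters at one position, while the eq pointer behaves like a trie edge, which is why the slide places the TST between the BST and the DST.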
29 TDAWG Ternary Directed Acyclic Word Graph
30 Introduction
DFA: how should the transitions of each state be implemented? (time and space efficiency)
TST: "implants" a BST for the transitions
–Good time
DAWG: smallest DFA for all suffixes
–Good space
TDAWG
Proof: TDAWG vs. DAWG
31 Hypothesis / Theorem (1/2)
Time = construction + search (usable online)
DFA transition function δ; Σ = alphabet (Chinese and Japanese: ~1000 characters)
State table: O(|p|) search time, where |p| is the length of the pattern p
–The table uses very large memory
Linked list: O(|Σ| × |p|) search time
–If |Σ| is large … problem for search time
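The trade-off between the transition representations can be seen in a few lines. The following is a rough sketch under stated assumptions: `SIGMA = 256` assumes byte-sized characters, all function names are illustrative, and a sorted array searched with `bisect` stands in for a balanced search tree (it is not the BST layout of the paper).

```python
from bisect import bisect_left

SIGMA = 256  # assumed alphabet size: one slot per byte value


def table_state(edges):
    """Table row: |Sigma| slots per state, most of them empty."""
    row = [None] * SIGMA
    for sym, target in edges.items():
        row[ord(sym)] = target
    return row


def table_step(row, sym):
    """O(1) per symbol, so O(|p|) for a whole pattern."""
    return row[ord(sym)]


def list_state(edges):
    """Linked-list style: only the existing edges are stored."""
    return list(edges.items())


def list_step(pairs, sym):
    """Linear scan: up to |Sigma| comparisons per symbol."""
    for s, target in pairs:
        if s == sym:
            return target
    return None


def sorted_state(edges):
    """Sorted symbols plus parallel targets (stand-in for a balanced BST)."""
    syms = sorted(edges)
    return syms, [edges[s] for s in syms]


def sorted_step(state, sym):
    """Binary search: about log2(number of edges) comparisons per symbol."""
    syms, targets = state
    k = bisect_left(syms, sym)
    if k < len(syms) and syms[k] == sym:
        return targets[k]
    return None


edges = {"a": 1, "c": 2, "z": 3}  # hypothetical outgoing edges of one state
```

All three representations answer the same query; they differ in memory per state (256 slots versus 3 entries here) and in how many comparisons a lookup costs.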
32 Hypothesis / Theorem (2/2)
For TDAWG
–Uses O(|S|) space
–Uses O(log|Σ| × |p|) search time
–Uses O(|Σ| × |S|²) construction time (Bentley & Sedgewick)
–Uses O(|Σ| × |S|) construction time (this paper, adapted from Blumer's online DAWG construction)
Comparison: TDAWG vs. DAWG (table and linked list)
–Space, search time, construction time
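The "ternary" part of a TDAWG can be pictured as follows: each outgoing edge of a state becomes a small node with a left pointer, a right pointer, and a pointer to the target state. The sketch below is only an illustration of that idea, assuming an unbalanced binary search tree per state. The names `EdgeNode` and `TState` are hypothetical and the layout is not claimed to match the paper's implementation.

```python
class EdgeNode:
    """One outgoing edge: symbol, target state, and BST siblings."""
    __slots__ = ("sym", "target", "left", "right")

    def __init__(self, sym, target):
        self.sym = sym
        self.target = target
        self.left = None    # edges with smaller symbols
        self.right = None   # edges with larger symbols


class TState:
    """A state whose outgoing edges are stored in a binary search tree."""

    def __init__(self):
        self.root = None

    def set_edge(self, sym, target):
        """Add the edge labelled `sym`, or redirect it if it already exists."""
        if self.root is None:
            self.root = EdgeNode(sym, target)
            return
        node = self.root
        while True:
            if sym == node.sym:
                node.target = target
                return
            side = "left" if sym < node.sym else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, EdgeNode(sym, target))
                return
            node = child

    def step(self, sym):
        """Follow the edge labelled `sym`; return None if there is none."""
        node = self.root
        while node is not None:
            if sym == node.sym:
                return node.target
            node = node.left if sym < node.sym else node.right
        return None


state = TState()
for sym, target in [("m", 1), ("c", 2), ("t", 3), ("a", 4)]:
    state.set_edge(sym, target)  # integer targets stand in for states
```

Space stays proportional to the number of edges, as with the linked list, while a lookup follows one root-to-node path instead of scanning every edge. Without rebalancing, that path can still degenerate to a chain if symbols arrive in sorted order.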
33 TST TDAWG
34 Online DAWG Construction
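Since the construction figure does not survive in this transcript, here is a sketch of the well-known online (left-to-right) construction in the common suffix-automaton formulation, with suffix links and state cloning. It is an assumption that this matches the slide's figure step for step. Transitions are stored in Python dicts purely for brevity, and the names `State`, `build_dawg` and `is_substring` are illustrative.

```python
class State:
    """A DAWG state: outgoing edges, suffix link, and longest-string length."""

    def __init__(self, length=0, link=-1):
        self.next = {}      # symbol -> index of target state
        self.link = link    # suffix link (index), -1 for the initial state
        self.len = length   # length of the longest string reaching this state


def build_dawg(text):
    """Build the DAWG of `text` one character at a time; return the state list."""
    st = [State()]
    last = 0
    for ch in text:
        cur = len(st)
        st.append(State(st[last].len + 1))
        p = last
        # Add the new edge along the suffix-link chain while it is missing.
        while p != -1 and ch not in st[p].next:
            st[p].next[ch] = cur
            p = st[p].link
        if p == -1:
            st[cur].link = 0
        else:
            q = st[p].next[ch]
            if st[p].len + 1 == st[q].len:
                st[cur].link = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(st)
                copy = State(st[p].len + 1, st[q].link)
                copy.next = dict(st[q].next)
                st.append(copy)
                while p != -1 and st[p].next.get(ch) == q:
                    st[p].next[ch] = clone
                    p = st[p].link
                st[q].link = clone
                st[cur].link = clone
        last = cur
    return st


def is_substring(st, pattern):
    """Every path from the initial state spells a substring of the text."""
    s = 0
    for ch in pattern:
        s = st[s].next.get(ch)
        if s is None:
            return False
    return True


dawg = build_dawg("abcbc")
```

Treating every state as accepting turns the automaton into a substring index, so `is_substring(dawg, "bcb")` succeeds and `is_substring(dawg, "cc")` fails. Replacing the per-state dict with a per-state search tree is the step that leads toward a TDAWG.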
35 Online TDAWG Construction
36 Experiment Result
37 Conclusion
New data structure: TDAWG
Construction time (English text, |Σ| = 256)
–TDAWG < linked-list DAWG < table DAWG
Space requirement
–linked-list DAWG < TDAWG (by ~20%)
–table DAWG cannot be compared on the same scale
Search time
–Short patterns: table DAWG is best; TDAWG < linked-list DAWG
–Log curve vs. linear curve (long patterns?)
38 Discussion & Future Work
Asian languages (~1000s of characters) should gain more in search time than English (256 characters), because the search time is O(log|Σ| × |p|)
Apply to other DAWGs: cDAWG, minimum DAWG, etc.
More efficiency with an AVL tree (AVL balancing)
Bioinformatics has only 4 characters, but a sliding window of 12 characters gives 4^12 symbols
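The alphabet sizes mentioned above can be put side by side with a quick calculation. This is only a rough estimate: it assumes a balanced search tree (the AVL suggestion), approximates the cost of one transition as ceil(log2 |Σ|) comparisons, and takes 1000 as the Asian-language alphabet size from the slides. The function name is illustrative.

```python
import math


def comparisons_per_symbol(sigma):
    """Rough per-transition cost in a balanced tree: ceil(log2(sigma))."""
    return math.ceil(math.log2(sigma))


ALPHABETS = {
    "English text (bytes)": 256,
    "Asian text (assumed ~1000 characters)": 1000,
    "DNA, sliding window of 12": 4 ** 12,  # 16,777,216 distinct windows
}

for name, sigma in ALPHABETS.items():
    # A linear scan may need up to sigma comparisons per symbol instead.
    print(f"{name}: |Sigma| = {sigma}, "
          f"tree ~{comparisons_per_symbol(sigma)} vs. list up to {sigma}")
```

Under these assumptions the tree needs about 8, 10 and 24 comparisons per symbol respectively, so the gap to a linear scan widens quickly as the alphabet grows.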