Published by Karen Warren. Modified over 9 years ago.
1
Finite State Automata and Tries Sambhav Jain IIIT Hyderabad
2
Think!
How would you store a dictionary in a computer?
How would you search for an entry in that dictionary?
- Say every word is exactly 10 characters long and each character can be any letter from 'a' to 'z'.
  E.g. aaaaaaaaaa, abcdefghij, ... etc.
  As a regular expression: Language = [a-z]{10}
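As a small illustration of the language [a-z]{10}, the membership test can be written directly as a regular expression. This is only a sketch; the helper name in_language is made up for this example and is not part of the slides.

```python
import re

# Hypothetical helper: membership test for the language [a-z]{10}.
WORD = re.compile(r"[a-z]{10}")

def in_language(word: str) -> bool:
    """Return True if `word` is exactly ten lowercase letters a-z."""
    return WORD.fullmatch(word) is not None

print(in_language("abcdefghij"))
print(in_language("abc"))
```

A regular expression only answers "is this string well formed?"; storing and searching an actual dictionary is the problem the following slides address.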
3
A Simple Way
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
....
zzzzzzzzzz
A linear sorted list of entries
4
A Simple Way
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
....
zzzzzzzzzz
Entries to be stored = 26^10 = 1.41167096 × 10^14
Each entry has 10 characters and each character takes 1 byte, so the list needs about 1.41 × 10^15 bytes (~1,400 TB)
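The size of the flat list can be checked with a few lines of arithmetic. The sketch below assumes one byte per character, as on the slide, and counts all 10 characters of every entry; the variable names are chosen for this example only.

```python
ALPHABET_SIZE = 26
WORD_LENGTH = 10
BYTES_PER_CHAR = 1  # assumption taken from the slide: one byte per character

# Number of distinct strings in [a-z]{10}.
entries = ALPHABET_SIZE ** WORD_LENGTH

# Every entry stores WORD_LENGTH characters.
total_bytes = entries * WORD_LENGTH * BYTES_PER_CHAR

print(entries)      # 141167095653376
print(total_bytes)  # 1411670956533760
```

Whatever unit is used, the point of the slide stands: a flat list of every string is far too large to store.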
5
Smart Way!
[Figure: one column per character position, each column listing the letters a b c d ... w x y z]
6
Smart Way!
[Figure: one column per character position, each column listing the letters a b c d ... w x y z]
Total storage = 26 x 10 = 260 bytes
Traverse 10 nodes to look up a word
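The layered layout on this slide can be sketched as a list of ten letter sets, one per character position. This is an illustrative sketch, not the author's implementation; the names layers and accepts are assumptions made for this example.

```python
import string

WORD_LENGTH = 10

# One layer per character position; each layer holds the 26 allowed letters.
layers = [set(string.ascii_lowercase) for _ in range(WORD_LENGTH)]

def accepts(word: str) -> bool:
    """Walk the layers once: one node visited per character position."""
    if len(word) != len(layers):
        return False
    return all(ch in layer for ch, layer in zip(word, layers))

# Total number of stored symbols: 26 letters x 10 positions.
stored = sum(len(layer) for layer in layers)
print(stored)  # 260
```

The saving comes from sharing: every word passes through the same ten layers, so nothing is stored once per word.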
7
Does it work for Natural Language?
Oxford Advanced English Learner, 20th Edition
- A quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED
What about inflections?
- eat, eats, eaten, eating, ...
What about multiple inflections?
- beauty, beautiful, beautifully, ...
8
Example (Store & Search)
[Figure: trie for eat, eats, eaten, eating; node labels e, a, t, s, e, n, ing]
9
Example
[Figure: the same trie with a new branch added; node labels e, e, a, s, t, n, ing, b]
10
Example
[Figure: the same trie with further branches added; node labels e, e, a, s, t, n, ing, b, f, s, a]
11
Example
[Figure: the same trie with further branches added; node labels e, e, a, s, t, n, ing, b, f, s, a, t, e, i, r, w]
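The store and search operations on such a trie can be sketched as follows. This is a minimal character-per-node trie written for illustration; the class names TrieNode and Trie are assumptions, and the stored words are the forms of "eat" mentioned earlier.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Store a word, reusing any prefix that is already present."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word: str) -> bool:
        """Return True only if exactly this word was stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ["eat", "eats", "eaten", "eating"]:
    trie.insert(w)

print(trie.search("eaten"))  # True
print(trie.search("eati"))   # False
```

The four forms share the path e-a-t, so the common prefix is stored once; lookup cost depends on the word length, not on the number of stored words.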
12
Inflectional morphology
Deals with the word forms of a root when there is no change in lexical category.
Each word form gives different values of features such as gender, number, person, etc.
13
Paradigm
For a given root, there are many word forms with different features.
Ex. Forms of the Hindi root laDakA (boy)
         | Direct   Oblique
---------|------------------
Singular | laDakA   laDake
Plural   | laDake   laDakoM
14
Paradigm
- 'laDakoM' is plural with oblique case
  - given by the feature structure {num=pl, case=obl}
- 'laDake' stands for two feature structures
  + Singular oblique (Ex. laDake ne kahA...)
    - where oblique means 'laDake' is followed by a postposition marker
  + Plural direct case (Ex. laDake Aye)
15
Paradigm
o Paradigms
  - What operation is done on the root to obtain the word forms
  - Model using pairs: (delete string, add string), where O stands for the empty string
       | direct   oblique
    ---|-----------------------
    sg | (O,O)    (A,e)
    pl | (A,e)    (A,oM)
o List roots with the paradigms they follow:
  - ghoDA follows paradigm laDakA
  - charkhA follows paradigm laDakA
  - laDakA follows paradigm laDakA
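The (delete string, add string) model can be sketched in a few lines. In this sketch the empty Python string plays the role of O, and the names LADAKA_PARADIGM and word_forms are assumptions made for illustration.

```python
# (delete, add) pairs of the laDakA paradigm; "" stands for O (empty string).
LADAKA_PARADIGM = {
    ("sg", "direct"): ("", ""),
    ("sg", "oblique"): ("A", "e"),
    ("pl", "direct"): ("A", "e"),
    ("pl", "oblique"): ("A", "oM"),
}


def word_forms(root: str, paradigm: dict) -> dict:
    """Apply every (delete, add) pair of the paradigm to the root."""
    forms = {}
    for features, (delete, add) in paradigm.items():
        if not root.endswith(delete):
            raise ValueError(f"{root!r} does not end with {delete!r}")
        stem = root[: len(root) - len(delete)]
        forms[features] = stem + add
    return forms


for root in ["laDakA", "ghoDA", "charkhA"]:
    print(root, word_forms(root, LADAKA_PARADIGM))
```

Because ghoDA and charkhA follow the same paradigm, only the root list grows when new words are added; the table of operations is stored once.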
16
[Figure: trie over the paths l-a-D-a-k and k-a-p-a-D in which the suffix branches (A, e, o-M, and the I/i forms) are spelled out separately under each stem]
17
Abstracting out suffixes k l | | a a | | p D | | a --------- | | | D #1 a A | | k (#1) I #1: Corresponds to paradigm for 'laDakA' 17Finite State Automata and Tries
18
- Suffix trie (forward)
          #1
          |
   --------------
   |      |      |
   e      o      A
          |
          M
19
Can we further optimize our search?
- Use knowledge of paradigms
- Use suffix tree
20
Store the suffix tree in main memory
Store the rest of the entries, categorized by paradigm, on hard disk
Do a backward search in the suffix tree
Identify the paradigm
Search only in that paradigm set
E.g. if '-ing' occurs, you will not search among words like home, cat, god, ...
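The steps above can be sketched as follows, using the laDakA paradigm as the only paradigm. This is a sketch under stated assumptions: the dictionaries SUFFIXES and ROOTS_BY_PARADIGM and the function names are invented for this example, and an in-memory set stands in for the per-paradigm root list that the slide keeps on hard disk.

```python
END = None  # marker key: a suffix ends at this trie node

# paradigm -> {surface suffix: string that was deleted from the root}
SUFFIXES = {"laDakA": {"": "", "e": "A", "oM": "A"}}

# Roots grouped by paradigm (stand-in for the on-disk part).
ROOTS_BY_PARADIGM = {"laDakA": {"laDakA", "ghoDA", "charkhA"}}


def build_reverse_suffix_trie(suffixes):
    """Store every suffix back to front so it can be matched from the word end."""
    root = {}
    for paradigm, table in suffixes.items():
        for suffix, deleted in table.items():
            node = root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node.setdefault(END, []).append((paradigm, suffix, deleted))
    return root


def analyze(word, trie, roots):
    """Backward search: find a suffix, identify its paradigm, check the root there."""
    results = []
    node = trie
    i = len(word)
    while True:
        for paradigm, suffix, deleted in node.get(END, []):
            candidate = word[: len(word) - len(suffix)] + deleted
            if candidate in roots[paradigm]:  # search only this paradigm's set
                results.append((candidate, paradigm, suffix))
        if i == 0:
            break
        i -= 1
        node = node.get(word[i])
        if node is None:
            break
    return results


trie = build_reverse_suffix_trie(SUFFIXES)
print(analyze("ghoDoM", trie, ROOTS_BY_PARADIGM))
print(analyze("laDake", trie, ROOTS_BY_PARADIGM))
```

Only the small suffix trie is consulted for every query; the larger root list is touched once per candidate paradigm, which is the saving the slide describes.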
21
Finite State Automata
Trie is a data structure
FSA is the computational approach
Slight difference in representation
- Putting characters on edges rather than nodes
22
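[Figure: FSA with two paths from the start state, l-a-D-a-k and k-a-p-a-D, with characters on the edges; both paths join a shared state through empty (0) transitions. From the shared state, arcs e and A each lead to an accepting state (+), and arc o leads to a state from which M reaches an accepting state (+).]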
23
FSA
o A deterministic finite-state machine is formally defined by
  - Q: A finite set of states (Ex.: {q0,q1,q2})
  - SIGMA: A finite input alphabet (Ex.: {a,b,c})
  - Start state: A state in Q from which the machine starts (Ex.: q0)
  - F: A set of accepting states (Ex.: {q2})
  - DELTA(q,i): A transition function or transition matrix where:
    - q MEMBER Q, i MEMBER SIGMA,
    - DELTA(q,i) MEMBER Q
    Thus, DELTA: Q x SIGMA --> Q
24
RECOGNITION Problem
Till now we were handling only the RECOGNITION problem
If the FSA reaches a final state at the end of the input string then EXIST
Else NOT
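The recognition rule can be sketched with the transition function stored as a dictionary keyed by (state, symbol). This is an illustrative sketch: build_dfa and recognize are names chosen for this example, states are plain integers with 0 as the start state, and a missing transition is treated as rejection (the transition function is left partial rather than adding an explicit dead state).

```python
def build_dfa(words):
    """Build DELTA and F for a word list; characters label the edges."""
    delta = {}          # (state, symbol) -> next state
    accepting = set()   # F
    next_state = 1      # state 0 is the start state
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        accepting.add(state)
    return delta, accepting


def recognize(word, delta, accepting, start=0):
    """Return "EXIST" if the machine ends in a final state, else "NOT"."""
    state = start
    for ch in word:
        state = delta.get((state, ch))
        if state is None:  # no transition defined: reject
            return "NOT"
    return "EXIST" if state in accepting else "NOT"


delta, accepting = build_dfa(["laDakA", "laDake", "laDakoM"])
print(recognize("laDake", delta, accepting))  # EXIST
print(recognize("laDako", delta, accepting))  # NOT
```

Note that "laDako" ends in a valid state that is not final, which is why the check against F happens only after the whole input has been consumed.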
25
But we seek analyzed output
We want the machine to tell
- Root
- Gender
- Number
- Person
- Case
- Etc.
26
Finite State Transducer
An FST is like the finite state automaton defined earlier, except that each arc is labelled by a pair of symbols i:o, where
  i: symbol in the input string
  o: symbol output by the FST when the arc is taken
+ Ex. arc in a finite state transducer corresponding to 'e' in 'laDake'
            e : ((+pl, +direct), (+sg, -direct))
  q1 +----------------->--------------------+ q2
  The pair of symbols i : o
  - i is: 'e'
  - o is: '((+pl, +direct), (+sg, -direct))'
+ Ex. Morph Analyzer: Match the input with i; if successful, go ahead and produce o in the output
27
o Formally: Finite state transducer
  - Q: Finite set of states q0,..., qN
  - SIGMA_IN: Finite set of input symbols
  - SIGMA_OUT: Finite set of output symbols
  - q0: Start state (q0 MEMBER Q)
  - F: Set of final accepting states (F SUBSET Q)
  - DELTA(q, i:o): For every state q, gives the set of states that can be reached from q with i in SIGMA_IN and o in SIGMA_OUT.
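A transducer of this kind can be sketched for the forms of laDakA. This is only an illustration under stated assumptions: the state numbering, the name transduce, and the output strings are invented for this example, and the ambiguity of 'e' is modelled as two separate arcs with the same input rather than one arc carrying a pair of outputs. The feature notation {num=..., case=...} follows the paradigm slides.

```python
STEM = "laDak"

# DELTA: (state, input symbol) -> list of (output string, next state)
delta = {}

# Identity arcs for the stem: each input character is copied to the output.
for k, ch in enumerate(STEM):
    delta[(k, ch)] = [(ch, k + 1)]

stem_end = len(STEM)   # state reached after reading "laDak"
final = stem_end + 1   # single accepting state
mid = stem_end + 2     # state between 'o' and 'M'

# Suffix arcs: restore the root ending "A" and emit the feature structure.
delta[(stem_end, "A")] = [("A {num=sg, case=dir}", final)]
delta[(stem_end, "e")] = [
    ("A {num=sg, case=obl}", final),
    ("A {num=pl, case=dir}", final),
]
delta[(stem_end, "o")] = [("A", mid)]
delta[(mid, "M")] = [(" {num=pl, case=obl}", final)]

FINAL_STATES = {final}


def transduce(word, state=0, output=""):
    """Return every output string produced on a path that ends in a final state."""
    if not word:
        return [output] if state in FINAL_STATES else []
    results = []
    for out, nxt in delta.get((state, word[0]), []):
        results.extend(transduce(word[1:], nxt, output + out))
    return results


print(transduce("laDake"))
print(transduce("laDakoM"))
```

For 'laDake' the sketch yields two analyses, matching the two feature structures listed for that form; an input with no accepting path yields an empty list.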
28
Example on board
29
Tools for FSA
- Lex
- OpenFST (www.openfst.org/)
- AT&T FSM Toolkit (http://www2.research.att.com/~fsmtools/fsm/)