Published by Damien Brunelle. Modified over 6 years ago.
1
Introducing Domain and Typing Bias in Automata Inference
François Coste, Daniel Fredouille (speaker), Christopher Kermorvant, Colin de la Higuera
INRIA/IRISA (France), Robert Gordon University (UK), Université de Montréal (Canada), EURISE, Université Jean Monnet (France)
2
Automata Inference Search Space
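Pruning with counter-examples
[Figure: the search space between the MCA (maximal canonical automaton) and the UA (universal automaton); generalization moves from the MCA toward the UA, and counter-examples prune part of the space. Sample: S+ = {baaa, aaa, bba}, with L(MCA) = S+.]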
3
Where to introduce bias in the state merging framework?
A ← MCA
while merge_choice(A, q1, q2) do
    A' ← merge(A, q1, q2)
    if compatible(A') then
        A ← A'
    endif
endwhile
return A
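A minimal Python sketch of this loop, under stated assumptions (it is not the authors' implementation): A starts as the MCA of the positive sample, a merge is represented by a partition of the MCA states, merge_choice simply enumerates pairs of states in order (an RPNI-like order), and compatible(A') only checks that no string of a hypothetical negative sample is accepted. The helper names build_mca, quotient and accepts are illustrative.

```python
from itertools import combinations

def build_mca(positive):
    """MCA: one branch per example, all branches sharing only state 0."""
    transitions, finals, n = [], set(), 1
    for word in positive:
        current = 0
        for letter in word:
            transitions.append((current, letter, n))
            current, n = n, n + 1
        finals.add(current)
    return n, transitions, finals

def quotient(block_of, transitions, finals):
    """Automaton obtained by merging MCA states that belong to the same block."""
    delta = {}
    for src, letter, dst in transitions:
        delta.setdefault((block_of[src], letter), set()).add(block_of[dst])
    return delta, {block_of[q] for q in finals}

def accepts(automaton, word, start):
    """Nondeterministic acceptance: track the set of current states."""
    delta, finals = automaton
    current = {start}
    for letter in word:
        current = set().union(*(delta.get((q, letter), set()) for q in current))
    return bool(current & finals)

def state_merging(positive, negative):
    n, transitions, finals = build_mca(positive)
    block_of = list(range(n))                      # A <- MCA (every state alone)
    for q1, q2 in combinations(range(n), 2):       # merge_choice(A, q1, q2)
        b1, b2 = block_of[q1], block_of[q2]
        if b1 == b2:
            continue
        candidate = [b1 if b == b2 else b for b in block_of]  # A' <- merge(A, q1, q2)
        automaton = quotient(candidate, transitions, finals)
        if not any(accepts(automaton, w, candidate[0]) for w in negative):  # compatible(A')
            block_of = candidate                                             # A <- A'
    return quotient(block_of, transitions, finals), block_of[0]

S_PLUS = ["baaa", "aaa", "bba"]
S_MINUS = ["b", "ab"]          # hypothetical counter-examples
automaton, start = state_merging(S_PLUS, S_MINUS)
```

Because merging only adds paths, every string of S+ stays accepted, while the compatibility test keeps the counter-examples rejected.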
4
Syntactic and semantic bias
5
Language Bias
Background knowledge on the syntax of strings
[Figure: within the set of all strings Σ*, the (possibly infinite) set of counter-examples L- is excluded; its complement, the domain Lg = Σ* − L-, is the more general language allowed, and the inferred language lies inside Lg.]
6
Language Bias: Formalisation
The set L- is given by an automaton: L- = L(A-) (if Lg is given instead, it must be complemented).
The algorithm ensures: L(A-) ∩ L(A) = Ø
Example: Lg = correct boolean expressions; L- = strings containing a forbidden pattern, e.g. ‘¬)’ should not appear in a correct boolean expression.
[Diagram: strings and A- → automata inference → automaton]
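A minimal sketch of the check L(A-) ∩ L(A) = Ø, assuming both automata are ε-free NFAs stored as (initial states, transition dict, final states), a representation chosen here for illustration. The intersection is empty exactly when no pair of final states is reachable in the product automaton. The alphabet, the forbidden-pattern construction and the single-word "inferred" automata are hypothetical examples.

```python
from collections import deque

ALPHABET = {"p", "q", "¬", "∧", "∨", "(", ")"}   # hypothetical boolean-expression alphabet

def forbidden_pattern_nfa(alphabet, pattern):
    """NFA for Σ* pattern Σ*: accepts every string containing `pattern`."""
    k = len(pattern)
    delta = {}
    for letter in alphabet:
        delta.setdefault((0, letter), set()).add(0)       # loop before the pattern
        delta.setdefault((k, letter), set()).add(k)       # loop after the pattern
    for i, letter in enumerate(pattern):
        delta.setdefault((i, letter), set()).add(i + 1)   # read the pattern itself
    return {0}, delta, {k}

def single_word_nfa(word):
    """Stand-in for an inferred automaton: accepts exactly `word`."""
    delta = {(i, x): {i + 1} for i, x in enumerate(word)}
    return {0}, delta, {len(word)}

def intersection_is_empty(a, b):
    """True iff L(a) ∩ L(b) = Ø, by exploring reachable pairs of the product."""
    init_a, delta_a, final_a = a
    init_b, delta_b, final_b = b
    letters = {x for (_, x) in delta_a} & {x for (_, x) in delta_b}
    start = {(p, q) for p in init_a for q in init_b}
    seen, queue = set(start), deque(start)
    while queue:
        p, q = queue.popleft()
        if p in final_a and q in final_b:
            return False          # some word is accepted by both automata
        for x in letters:
            for p2 in delta_a.get((p, x), ()):
                for q2 in delta_b.get((q, x), ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return True

A_MINUS = forbidden_pattern_nfa(ALPHABET, "¬)")
ok = intersection_is_empty(A_MINUS, single_word_nfa("(¬p)"))    # pattern absent
bad = intersection_is_empty(A_MINUS, single_word_nfa("(p∧¬)"))  # contains '¬)'
```

The exploration visits at most |Q(A-)| × |Q(A)| pairs of states.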
7
Typing Bias
Background knowledge on the semantics of strings, represented in the “shape” of the target automaton.
Example strings: ...CSKPGVIFLTKRSRQVRQC... ...FLTKVIRCSKPSRQVCGFL... ...GVKPIFLTKRSRQVCCSKP... ...FCSKGVIGVIPLTKSKSRQ...
8
Typing Bias: Formalisation
Since types may be known on an infinite number of strings, we need a finite way to express this knowledge [KH02].
Typing function: Σ* × Σ × Σ* → T, which maps a left context, a typed element and a right context to a type; e.g. left context acbbcacbabc, typed element b, right context abcccbabc.
Typing automaton: [Figure: automaton with transitions labeled Σ−a, a, b and Σ]
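To make the signature Σ* × Σ × Σ* → T concrete, here is a small Python sketch. It assumes, purely for illustration, that the type of a letter is the state a deterministic typing automaton reaches after reading the left context followed by that letter, so the right context is ignored. The automaton over {a, b, c} is hypothetical and deliberately partial, so some letters receive no type.

```python
from typing import Optional

ALPHABET = {"a", "b", "c"}

# Hypothetical partial DFA: state 0 loops on Σ−{a}, a leads to 1, b leads from 1 to 2,
# and state 2 loops on Σ. No transition leaves state 1 on a or c.
DELTA = {(0, x): 0 for x in ALPHABET - {"a"}}
DELTA[(0, "a")] = 1
DELTA[(1, "b")] = 2
DELTA.update({(2, x): 2 for x in ALPHABET})

def typing(left: str, element: str, right: str) -> Optional[int]:
    """Type of `element` in context (left, right): the state reached after
    reading left + element, or None when the automaton gets stuck (untyped)."""
    state: Optional[int] = 0
    for letter in left + element:
        state = DELTA.get((state, letter))
        if state is None:
            return None
    return state  # `right` is unused in this simplified model
```

For example, typing("cc", "a", "b") gives type 1, while typing("a", "c", "") is undefined, an instance of an incomplete typing function.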
9
Typing bias: Examples
Prosite motifs: typing protein sequences. The motifs can be transformed into typing automata.
Motif PDOC00028 (zinc finger): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Brill tagger: typing strings in natural language. The tagger is NOT a typing automaton, but it can still be partially used (see paper).
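As a sketch of how a PROSITE motif can type a protein sequence, the Python below converts a subset of the PROSITE pattern syntax (single residues, x, [..], {..} and repetition counts; anchors are not handled) into a regular expression with the standard re module, then labels every residue covered by a match. It uses regular expressions rather than an explicit typing automaton; the function names and the "motif" label are illustrative assumptions, and residues outside any match stay untyped.

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden}, with optional (n) / (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        base, low, high = m.groups()
        if base == "x":
            base = "."                       # any residue
        elif base.startswith("{"):
            base = "[^" + base[1:-1] + "]"   # forbidden residues
        if low is None:
            parts.append(base)
        elif high is None:
            parts.append(f"{base}{{{low}}}")
        else:
            parts.append(f"{base}{{{low},{high}}}")
    return "".join(parts)

def motif_types(sequence, pattern, label="motif"):
    """Type each residue: `label` inside a motif occurrence, None elsewhere."""
    types = [None] * len(sequence)
    for match in re.finditer(prosite_to_regex(pattern), sequence):
        for i in range(match.start(), match.end()):
            types[i] = label
    return types

ZINC = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
```

Here prosite_to_regex(ZINC) yields C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H.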
10
Bias: Improvements on the state of the art
Semantic biases are no longer mixed with syntactic ones (both a conceptual and an algorithmic improvement).
The proposed formalisation relaxes all the constraints of the previous formalism [KH02]:
- Non-determinism allowed:
  - any kind of language/typing automaton is allowed
  - any kind of inferred automaton is allowed
- Incomplete typing function allowed:
  - typing is not required to be given on all strings and/or all letters of strings
  - typing is not required to cover the examples
- Other: the typing automaton can have an identical type for two different states, …
11
What kind of Background Knowledge?
Sample annotation: knowledge on the set of examples
- Parenthesizing (for CFG) [Sa92, SM03]
- Typing [GBE96, KH02]
Formalised BK: knowledge on any string
[Diagram: strings → automata inference → automaton]
12
Algorithms and experiments
13
Algorithm: overview
Complexity: O(N1 × N2) per merge; the N1 factor is amortized over successive merges.
Particular cases identified:
- O(1): specialization of a grammar (an extension and better understanding of the [KH02] results)
- O(merge operator): DFA/UFA with typing on the examples
Idea: look for common acceptances between the BK and inferred automata.
14
Algorithm: Language inconsistency detection
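[Figure: inferred automaton (transitions b, a, c) and BK automaton (transitions Σ−a, a, b, Σ). Legend: pairs of states that “share a common prefix word”.]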
15
Algorithm: Language inconsistency detection
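[Figure: inferred automaton (transitions b, a, c) and BK automaton (transitions Σ−a, a, b, Σ). Legend: pairs of states that “share a common suffix word”.]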
16
Algorithm: Typing projection
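[Figure: inferred automaton (transitions b, a, c) and BK automaton (transitions Σ−a, a, b, Σ). Legend: pairs of states that “share both a common prefix and a common suffix word”.]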
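The three relations on these slides can be sketched on the product of the inferred automaton A and a BK automaton. This is a hedged reconstruction from the slide captions, not the authors' algorithm. Assuming ε-free NFAs stored as (initial states, transition dict, final states), a pair (q, p) shares a common prefix word when it is reachable from a pair of initial states, and a common suffix word when a pair of final states is reachable from it. A common-prefix pair of two final states then signals a language inconsistency with A-, and for the typing projection each state q of A collects the BK states p with which it shares both. All example automata are hypothetical, and treating every typing-automaton state as final is an assumption.

```python
from collections import deque

def product_edges(a, b):
    """Edges ((p, q), (p2, q2)) of the product automaton, one per shared letter."""
    (_, delta_a, _), (_, delta_b, _) = a, b
    return {((p, q), (p2, q2))
            for (p, x), targets_a in delta_a.items()
            for (q, y), targets_b in delta_b.items() if x == y
            for p2 in targets_a for q2 in targets_b}

def reachable(start, edges):
    """All pairs reachable from `start` along `edges` (breadth-first search)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    seen, queue = set(start), deque(start)
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def common_prefix_pairs(a, b):
    return reachable({(p, q) for p in a[0] for q in b[0]}, product_edges(a, b))

def common_suffix_pairs(a, b):
    reversed_edges = {(dst, src) for src, dst in product_edges(a, b)}
    return reachable({(p, q) for p in a[2] for q in b[2]}, reversed_edges)

def language_inconsistent(a, a_minus):
    """L(a) ∩ L(a_minus) ≠ Ø iff a common-prefix pair is made of two final states."""
    return any(p in a[2] and q in a_minus[2] for p, q in common_prefix_pairs(a, a_minus))

def typing_projection(a, typing_automaton):
    """Map each state of `a` to the typing-automaton states it shares both words with."""
    both = common_prefix_pairs(a, typing_automaton) & common_suffix_pairs(a, typing_automaton)
    projection = {}
    for q, p in both:
        projection.setdefault(q, set()).add(p)
    return projection

A = ({0}, {(0, "a"): {1}, (0, "c"): {1}, (1, "b"): {2}}, {2})     # accepts ab, cb
T = ({0}, {(0, "b"): {0}, (0, "c"): {0}, (0, "a"): {1}, (1, "b"): {2},
           (2, "a"): {2}, (2, "b"): {2}, (2, "c"): {2}}, {0, 1, 2})  # all states final
A_MINUS = ({0}, {(0, "a"): {0}, (0, "b"): {0}, (0, "c"): {0, 1}, (1, "b"): {2},
                 (2, "a"): {2}, (2, "b"): {2}, (2, "c"): {2}}, {2})  # contains "cb"
```

With these definitions, typing_projection(A, T) maps state 1 of A to {0, 1}, since "a" and "c" lead to different states of T, and language_inconsistent(A, A_MINUS) holds because A accepts "cb".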
17
Experimental Results: Artificial Data
Target automata and examples generated with Gowachin; inference with the RPNI algorithm.
[Figure: recognition level on a test set (y-axis) as |S+ ∪ S-| increases (x-axis), with one curve per increasing “size” of BK.]
18
Experimental Results: Typing on real data
Task: ATIS
Algorithm: Alergia
Typing: part-of-speech tags (Brill tagger)
Evaluation: perplexity and coverage
Best results
19
Background knowledge can now be introduced in regular grammatical inference (GI)!
New! Algorithms can express complex BK:
- They handle syntax and semantics independently.
- They handle non-determinism and incompleteness.
The algorithms have been tested on both artificial and real data; they need to be tested on more real-world data.
Theoretical basis?
- What amount of knowledge is needed to identify the target?
- What are the links with MAT and similar frameworks?
Extensions
- Using these biases in heuristics (handling noise?)
- Extensions to more powerful representations (CFG, ...)