Introducing Domain and Typing Bias in Automata Inference

François Coste, INRIA/IRISA (France)
Daniel Fredouille (speaker), Robert Gordon University (UK)
Christopher Kermorvant, Université de Montréal (Canada)
Colin de la Higuera, EURISE, Université Jean Monnet (France)

Automata Inference Search Space
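Pruning with counter-examples.
[Figure: the search space is a lattice of automata ordered by generalization, from the MCA (maximal canonical automaton, most specific) to the UA (universal automaton, most general). Example: S+ = {baaa, aaa, bba}, L(MCA) = S+.]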

Where to introduce bias in the state merging framework?
A ← MCA
while merge_choice(A, q1, q2) do
    A' ← merge(A, q1, q2)
    if compatible(A') then
        A ← A'
    endif
endwhile
return A
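The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it starts from a prefix tree acceptor rather than the MCA, represents each candidate automaton as a quotient (a partition of the tree's states), tries merges in a naive fixed order instead of a real merge_choice heuristic, and takes compatible to mean "rejects every negative example". The names build_pta, quotient_accepts and infer are hypothetical.

```python
from itertools import combinations

def build_pta(positives):
    """Prefix tree acceptor: one state per prefix of the positive sample."""
    states = {""}
    for word in positives:
        for i in range(1, len(word) + 1):
            states.add(word[:i])
    return sorted(states, key=lambda s: (len(s), s)), set(positives)

def quotient_accepts(states, finals, block_of, word):
    """Run the (possibly non-deterministic) quotient automaton on word."""
    delta = {}
    for s in states:
        if s:  # tree edge: prefix s[:-1] --s[-1]--> s, mapped onto blocks
            delta.setdefault((block_of[s[:-1]], s[-1]), set()).add(block_of[s])
    final_blocks = {block_of[f] for f in finals}
    current = {block_of[""]}
    for symbol in word:
        current = set().union(*(delta.get((b, symbol), set()) for b in current))
    return bool(current & final_blocks)

def infer(positives, negatives):
    """Generic state-merging loop: keep a merge only if it stays compatible."""
    states, finals = build_pta(positives)
    block_of = {s: s for s in states}  # initially every state is its own block

    def compatible(candidate):
        # Assumption: compatibility = no negative example is accepted.
        return not any(quotient_accepts(states, finals, candidate, w)
                       for w in negatives)

    merged = True
    while merged:
        merged = False
        blocks = sorted(set(block_of.values()), key=lambda s: (len(s), s))
        for q1, q2 in combinations(blocks, 2):  # naive merge_choice
            candidate = {s: (q1 if b == q2 else b) for s, b in block_of.items()}
            if compatible(candidate):
                block_of = candidate
                merged = True
                break
    return states, finals, block_of

positives = ["baaa", "aaa", "bba"]
negatives = ["b", "ab"]
states, finals, block_of = infer(positives, negatives)
```

With an empty set of negative examples every merge is accepted, so the sketch collapses to a single state, which corresponds to the universal automaton.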

Syntactic and semantic bias

Language Bias
Background knowledge on the syntax of strings.
[Figure: within the set of all strings (Σ*), the (infinite) set of counter-examples (L-) is excluded; the inferred language lies inside the domain Lg = Σ* − L-, which is the most general language allowed.]

Language Bias: Formalisation
The set L- is given by an automaton: L- = L(A-) (complementation is needed if Lg is given instead).
The algorithm ensures: L(A-) ∩ L(A) = ∅
Examples:
- Lg: correct boolean expressions
- L-: forbidden patterns, e.g. '¬)' should not appear in a correct boolean expression
[Figure: strings → automata inference → automaton, with A-.]
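One standard way to test the condition that L(A-) and L(A) have an empty intersection is to explore the product of the two automata and look for a reachable pair of accepting states. The sketch below assumes NFAs without epsilon-transitions, stored as plain dictionaries; this representation and the name intersection_is_empty are illustrative choices, not taken from the slides. The character "!" stands in for "¬".

```python
from collections import deque

def intersection_is_empty(a, b):
    """True if no word is accepted by both NFAs a and b.

    Each NFA is a dict with keys "initial" (set of states), "finals"
    (set of states) and "delta" mapping (state, symbol) to a set of states.
    """
    start = {(p, q) for p in a["initial"] for q in b["initial"]}
    seen = set(start)
    queue = deque(start)
    while queue:
        p, q = queue.popleft()
        if p in a["finals"] and q in b["finals"]:
            return False  # some common word reaches this pair
        for (src, symbol), p_targets in a["delta"].items():
            if src != p:
                continue
            for q2 in b["delta"].get((q, symbol), ()):
                for p2 in p_targets:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return True

# A-: every string over {"!", ")", "x"} that contains the forbidden pattern "!)".
a_minus = {
    "initial": {0},
    "finals": {2},
    "delta": {
        (0, "!"): {0, 1}, (0, ")"): {0}, (0, "x"): {0},
        (1, ")"): {2},
        (2, "!"): {2}, (2, ")"): {2}, (2, "x"): {2},
    },
}
# Candidate accepting (x|!)*: it never produces ")", so it is consistent.
safe = {"initial": {"s"}, "finals": {"s"},
        "delta": {("s", "x"): {"s"}, ("s", "!"): {"s"}}}
# Candidate accepting exactly "!)": it violates the language bias.
bad = {"initial": {"s"}, "finals": {"t"},
       "delta": {("s", "!"): {"u"}, ("u", ")"): {"t"}}}

safe_ok = intersection_is_empty(a_minus, safe)
bad_ok = intersection_is_empty(a_minus, bad)
```

In a state-merging loop, a merge whose result fails this test would be rejected, in the same way as a merge that accepts a counter-example.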

Typing Bias
Background knowledge on the semantics of strings, represented in the "shape" of the target automaton.
Example strings:
...CSKPGVIFLTKRSRQVRQC...
...FLTKVIRCSKPSRQVCGFL...
...GVKPIFLTKRSRQVCCSKP...
...FCSKGVIGVIPLTKSKSRQ...

Typing Bias: Formalisation
As we may know the types of an infinite number of strings, we need a formalism to express this knowledge [KH02].
Typing function: Σ* × Σ × Σ* → T
Example: in the string acbbcacbabc b abcccbabc, the typed element is b, with left context acbbcacbabc and right context abcccbabc.
[Figure: typing automaton with transitions labelled Σ−a, a, b and Σ.]

Typing bias: Examples
Prosite motifs: typing protein sequences. The motifs can be transformed into typing automata.
Motif PDOC00028 (Zinc-finger): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Brill tagger: typing strings in natural language. The machine is NOT a typing automaton but can still be partially used (see paper).
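As a rough illustration of why Prosite motifs are regular, the sketch below translates the Zinc-finger motif into a Python regular expression. It is a stand-in for the typing-automaton construction, which the slides do not detail. It handles only the syntax used in this motif (single residues, x wildcards, bracketed alternatives, repetition counts); the function name prosite_to_regex and the test sequence are made up for the example.

```python
import re

# One motif element: a residue, the wildcard x, or a bracketed set,
# optionally followed by a repetition count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple Prosite pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported Prosite element: " + element)
        residue, low, high = match.groups()
        atom = "." if residue == "x" else residue  # x means any residue
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(atom + repeat)
    return "".join(parts)

motif = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(motif)

# Synthetic sequence built to fit the motif (not a real protein).
sequence = "CAACAAAL" + "A" * 8 + "HAAAH"
hit = re.search(regex, sequence)
```

Each position matched by the expression could then be labelled with a type, which is the role the typing automaton plays during inference.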

Bias: Improvements on the state of the art
Semantic biases are no longer mixed with syntactic ones (both a conceptual and an algorithmic improvement).
The proposed formalisation relaxes all the constraints of the previous formalism [KH02]:
- Non-determinism allowed: any kind of language/typing automaton, any kind of inferred automaton
- Incomplete typing function allowed: typing need not be given on all strings and/or all letters of strings, and need not cover the examples
- Other: the typing automaton can have an identical type for two different states, …

What kind of Background Knowledge ?
Sample annotation: knowledge on the example set
- Parenthesizing (for CFG) [Sa92, SM03]
- Typing [GBE96, KH02]
Formalised BK: knowledge on any string
[Figure: strings → automata inference → automaton.]

Algorithms and experiments

Algorithm: overview
Idea: look for common acceptances between the BK and the inferred automata.
Complexity: O(N1 × N2) per merge; the N1 factor is amortized across merges.
Particular cases identified:
- O(1): specialization of a grammar (extension and better understanding of the [KH02] results)
- O(merge operator): DFA/UFA with typing on examples

Algorithm: Language inconsistency detection
[Figure: two automata with transitions labelled b, a, c and Σ−a, a, b, Σ; linked states "share a common prefix word".]

Algorithm: Language inconsistency detection
[Figure: two automata with transitions labelled b, a, c and Σ−a, a, b, Σ; linked states "share a common suffix word".]

Algorithm: Typing projection
[Figure: two automata with transitions labelled b, a, c and Σ−a, a, b, Σ; linked states "share both a common prefix and a common suffix word".]

Experimental Results: Artificial Data
Automaton and examples generated with Gowachin; RPNI algorithm.
Recognition level on a test set (y-axis), along 2 dimensions:
- Increasing |S+ ∪ S-| (x-axis)
- Increasing "size" of BK (different curves)

Experimental Results: Typing on real data
Task: ATIS
Algorithm: Alergia
Typing: part-of-speech tags (Brill tagger)
Evaluation: perplexity and coverage
[Figure: best results.]

Background Knowledge can now be introduced in regular GI !
New! Algorithms can express complex BK:
- They handle syntax and semantics independently.
- They handle non-determinism and incompleteness.
Algorithms have been tested on both artificial and real data; they need to be tested on more real-world data.
Theoretical basis?
- Amount of knowledge needed to identify the target?
- What are the links with MAT and similar?
Extensions:
- Using these biases in heuristics (handling noise?)
- Extensions to more powerful representations (CFG...)
