An Index of Data Size to Extract Decomposable Structures in LAD. Hirotaka Ono, Mutsunori Yagiura, Toshihide Ibaraki (Kyoto University)


1 An Index of Data Size to Extract Decomposable Structures in LAD. Hirotaka Ono, Mutsunori Yagiura, Toshihide Ibaraki (Kyoto University)

2 Overview
1. Overview of LAD
2. Decomposability
   - Importance & motivation
3. An index of decomposability
   - Number of data vectors needed to extract reliable decomposable structures
   - Based on probabilistic analyses
4. Numerical experiments
5. Conclusion

3 Logical Analysis of Data (LAD)
For a phenomenon:
Input:
- T: positive examples (the phenomenon occurs)
- F: negative examples (the phenomenon does not occur)
Output: discriminant function
- f(x): a logical explanation of the phenomenon

4 Example: influenza
(1 = Yes, 0 = No)

     Fever  Headache  Cough  Snivel  Stomachache
T    1      1         0      1       1
     1      0         1      1       1
     1      1         1      1       0
F    1      0         0      1       1
     1      1         0      0       0
     0      1         0      1       1

T: set of patients having influenza
F: set of patients having common cold
An example of discriminant functions:
Discriminant function f(x) represents knowledge "influenza". One form of knowledge acquisition.

5 Guideline to find a discriminant function
- Simplicity
- Explain the structure of the phenomenon
[Figure: positive and negative examples in {0,1}^n space]
We focus on decomposability.

6 Decomposability
S0 = {1, 4, 5}, S1 = {2, 3}

     x1  x2  x3  x4  x5  h(x[S1])
T    1   1   0   1   1   1
     1   0   1   1   1   1
     1   1   1   1   0   1
F    1   0   0   1   1   0
     1   1   0   0   0   1
     0   1   0   1   1   1

h(x[S1]) = x2 ∨ x3
f(x) = x1x2x4 ∨ x1x3x4 = x1x4 h(x[S1])  (decomposable!)
f is decomposable ⇔ f(x) = g(x[S0], h(x[S1]))
(T, F) is decomposable ⇔ ∃ decomposable discriminant f
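The example on this slide can be checked mechanically. The following is a minimal sketch, assuming h is the OR of x2 and x3 and g is the AND of x1, x4 and h, which is consistent with the six data vectors listed. The helper name `project` is hypothetical and only used for illustration.

```python
from itertools import product

# Data from the slide: each vector is (x1, x2, x3, x4, x5).
T = [(1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (1, 1, 1, 1, 0)]  # positive examples
F = [(1, 0, 0, 1, 1), (1, 1, 0, 0, 0), (0, 1, 0, 1, 1)]  # negative examples

S0 = (1, 4, 5)  # 1-based attribute indices, as on the slide
S1 = (2, 3)


def project(x, S):
    """Return x[S], the restriction of x to the 1-based indices in S."""
    return tuple(x[i - 1] for i in S)


def h(xs1):
    """Assumed inner function: h(x[S1]) = x2 OR x3."""
    x2, x3 = xs1
    return x2 | x3


def g(xs0, hval):
    """Assumed outer function: x1 AND x4 AND h (x5 is not used)."""
    x1, x4, _x5 = xs0
    return x1 & x4 & hval


def f(x):
    """Decomposed discriminant f(x) = g(x[S0], h(x[S1]))."""
    return g(project(x, S0), h(project(x, S1)))


def f_dnf(x):
    """The same function written as x1 x2 x4 OR x1 x3 x4."""
    x1, x2, x3, x4, _x5 = x
    return (x1 & x2 & x4) | (x1 & x3 & x4)


def forms_agree():
    """Check that both forms coincide on all 32 points of {0,1}^5."""
    return all(f(x) == f_dnf(x) for x in product((0, 1), repeat=5))
```

Evaluating `f` on every vector in `T` and `F` shows whether the decomposed form separates the two sets, and `forms_agree()` compares it with the disjunctive form over the whole cube.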

7 Another example: concept of "square"
Square: f(x1, x2, x3)
- x1: contains a right angle
- x2: the lengths of all edges are equal
- x3: the number of vertices is 4
Square: f(x1, x2, x3) = g(x1, h(x2, x3))
- h: rhombus
  - x2: the lengths of all edges are equal
  - x3: the number of vertices is 4

8 The number of data and decomposable structures
Case 1: The size of given data is small.
- Advantage: Less computational time is needed to find a decomposable structure.
- Disadvantage: Decomposable structures easily exist in the data (because there are fewer constraints), so most decomposable structures are deceptive.

9 The number of data and decomposable structures
Case 2: The size of given data is large.
- Advantage: Deceptive decomposable structures will not be found.
- Disadvantage: More computational time is needed.
How many data vectors should be prepared to extract real decomposable structures?
Index of decomposability

10 Overview of our approach
(T, F) is decomposable ⇔ conflict graph of (T, F) is bipartite (Boros et al. 1994)
Assume that (T, F) is the set of l randomly chosen vectors from {0, 1}^n.
1. Compute the probability of an edge to appear in the conflict graph
2. Regard the conflict graph as a random graph
   Investigate the probability of the conflict graph to be non-bipartite

11 Conflict graph
[Figure: example data vectors and the resulting conflict graph]
(T, F) is decomposable ⇔ conflict graph of (T, F) is bipartite
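The bipartiteness test can be sketched in code. The slide's own definition of the conflict graph did not survive in this transcript, so the sketch below assumes the following construction: the vertices are the projections x[S1] of the data vectors, and two vertices are joined when some a in T and some b in F agree on S0 (so that h must take different values on them). The names `conflict_graph` and `two_coloring` are hypothetical.

```python
from collections import defaultdict, deque


def project(x, S):
    """Return x[S] for 1-based attribute indices S."""
    return tuple(x[i - 1] for i in S)


def conflict_graph(T, F, S0, S1):
    """Build adjacency sets over S1-projections (assumed definition).

    An edge joins a[S1] and b[S1] whenever a in T and b in F satisfy
    a[S0] == b[S0].
    """
    t_by_key = defaultdict(set)
    f_by_key = defaultdict(set)
    for a in T:
        t_by_key[project(a, S0)].add(project(a, S1))
    for b in F:
        f_by_key[project(b, S0)].add(project(b, S1))

    adj = {project(x, S1): set() for x in list(T) + list(F)}
    for key, t_vertices in t_by_key.items():
        for u in t_vertices:
            for v in f_by_key.get(key, ()):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def two_coloring(adj):
    """Return a 0/1 coloring if the graph is bipartite, else None (BFS)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle (or self-loop) found
    return color
```

When a coloring exists, it can serve directly as a candidate h on the observed projections; when it does not, no h is consistent with the data for that choice of (S0, S1).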

12 Probability of an edge to appear in conflict graph There exists a linked pair. A pair of vectors is called linked if

13 Define a random variable by where edge appears in the conflict graph. We want to compute. There exists a linked pair.

14 How to compute
is easier to compute.
1. Both of
2. They have different values (i.e., 0 and 1).
Notation: L = |T| + |F|, p : q = |T| : |F|, M = 2^n, m = 2^|S0|

15 Approximation of By Inclusion and Exclusion Principle,

16 Random graph
In our analysis, the edge probability r is taken to be the probability of an edge to appear in the conflict graph.
Random graph G(N, r)
- N: the number of vertices
- Each edge e = (u, v) appears in G(N, r) with probability r independently
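The random graph model G(N, r) is simple to simulate. The sketch below samples such graphs and estimates by Monte Carlo how often they fail to be bipartite; it is an illustration under the independence assumption stated on the slide, not the analysis used by the authors. The function names and the default `trials` and `seed` values are hypothetical.

```python
import random
from collections import deque


def sample_gnr(N, r, rng):
    """Sample G(N, r): each of the N(N-1)/2 edges appears independently."""
    adj = {v: set() for v in range(N)}
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < r:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_bipartite(adj):
    """Breadth-first 2-coloring; False as soon as an odd cycle is met."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def estimate_nonbipartite(N, r, trials=200, seed=0):
    """Fraction of sampled graphs that are not bipartite."""
    rng = random.Random(seed)
    hits = sum(not is_bipartite(sample_gnr(N, r, rng)) for _ in range(trials))
    return hits / trials
```

Sweeping `r` for a fixed `N` with `estimate_nonbipartite` gives an empirical picture of where the transition from mostly bipartite to mostly non-bipartite graphs occurs.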

17 Probability of a random graph to be non-bipartite
Y_odd: random variable representing the number of odd cycles in G(N, r)
Pr(Y_odd ≥ 1): probability that G(N, r) is not bipartite
Markov's inequality: Pr(Y_odd ≥ 1) ≤ E[Y_odd]
The number of sequences of k vertices
For sufficiently large N,
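The Markov bound can be evaluated numerically. The slide's formulas are missing from this transcript, so the sketch below relies on a standard counting fact rather than on the authors' exact expressions: the complete graph on N vertices contains N(N-1)...(N-k+1)/(2k) cycles of length k, and each appears in G(N, r) with probability r^k. The function names are hypothetical.

```python
def expected_odd_cycles(N, r):
    """E[Y_odd] = sum over odd k in [3, N] of N(N-1)...(N-k+1) / (2k) * r**k."""
    total = 0.0
    # c holds N(N-1)...(N-k+1) * r**k, updated incrementally; start at k = 2.
    c = N * (N - 1) * r * r
    for k in range(3, N + 1):
        c *= (N - k + 1) * r
        if k % 2 == 1:
            total += c / (2 * k)
    return total


def markov_bound(N, r):
    """Upper bound on Pr(G(N, r) is not bipartite), capped at 1."""
    return min(1.0, expected_odd_cycles(N, r))
```

For large N the product N(N-1)...(N-k+1) is close to N^k for small k, so the terms behave roughly like (Nr)^k / (2k); the bound is therefore informative mainly when Nr is below 1.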

18 Assumptions
- probabilities p and q are given by p : q = |T| : |F|
- conflict graph is a random graph
Our index
- Probability of an edge to appear in conflict graph
- Threshold for a random graph to be bipartite or not
(|S0| + |S1| = n)

19 Our index If, tends to have many deceptive decomposable structures. If tends to have no deceptive decomposable structure.

20 Numerical Experiments
Randomly generated data
- Target functions are not decomposable
- Dimensions of data are n = 10, 20
- Two types of data: biased and not biased
1. Prepare non-decomposable randomly generated functions and construct 10 (T, F)s for each data size
2. Check their decomposability
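An experiment of this shape might be sketched as follows. Several details are assumptions not taken from the slides: the target is a uniformly random truth table (which is not guaranteed to be non-decomposable, unlike the targets described above), a fresh target is drawn for each trial, decomposability is tested for one fixed partition (S0, S1) of 0-based indices, and the test joins x[S1]-projections of a positive and a negative example that agree on S0. All function names are hypothetical.

```python
import random
from collections import defaultdict, deque
from itertools import product


def is_decomposable(T, F, S0, S1):
    """2-color the (assumed) conflict graph of (T, F) for partition (S0, S1)."""
    t_by_key = defaultdict(set)
    for a in T:
        t_by_key[tuple(a[i] for i in S0)].add(tuple(a[i] for i in S1))
    adj = defaultdict(set)
    for b in F:
        v = tuple(b[i] for i in S1)
        for u in t_by_key.get(tuple(b[i] for i in S0), ()):
            adj[u].add(v)
            adj[v].add(u)
    color = {}
    for start in list(adj):
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def decomposable_ratio(n, sampling_ratio, S0, S1, trials=10, seed=0):
    """Fraction of sampled (T, F)s that are decomposable for (S0, S1).

    sampling_ratio is a fraction in [0, 1] of the 2**n points of {0,1}^n.
    """
    rng = random.Random(seed)
    points = list(product((0, 1), repeat=n))
    size = round(sampling_ratio * len(points))
    count = 0
    for _ in range(trials):
        # Assumption: a random truth table stands in for the target function.
        table = {x: rng.randint(0, 1) for x in points}
        sample = rng.sample(points, size)
        T = [x for x in sample if table[x] == 1]
        F = [x for x in sample if table[x] == 0]
        count += is_decomposable(T, F, S0, S1)
    return count / trials
```

Calling `decomposable_ratio` over a range of sampling ratios for a small n (for example n = 6 with S0 = (0, 1, 2) and S1 = (3, 4, 5)) produces a curve of the same kind as the plots of decomposable ratio against sampling ratio.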

21 Randomly generated data
[Plot: ratio of decomposable (T, F)s (%) against sampling ratio (%), with our index marked]

22 Randomly generated data
[Plot: ratio of decomposable (T, F)s (%) against sampling ratio (%), with our index marked]

23 Real-world data: Breast Cancer in Wisconsin (a.k.a. BCW)
- Already binarized
- The dimension is n = 11
- Comparison with randomly generated data with the same n, p and q

24 BCW and randomly generated data
[Plots, one for BCW and one for randomly generated data: ratio of decomposable (T, F)s (%) against sampling ratio (%), with our index marked]

25 Discussion and conclusion An index to extract reliable decomposable structures Computational experiments on random & real-world data - proposed index is a good estimate - |S 0 |  1 or |S 1 |  2  threshold behavior is not clear

26 Future work
- Analyses on sharpness of the threshold behavior: to know sufficient |T| + |F| to extract reliable decomposable structures
- Apply similar approach to other classes of Boolean functions
[Figure: #decomposable structures against |T| + |F|, marking the proposed index and the value we want to estimate]

