An Index of Data Size to Extract Decomposable Structures in LAD
Hirotaka Ono, Mutsunori Yagiura, Toshihide Ibaraki (Kyoto Univ.)


Overview of this presentation
1. What is LAD?
2. Decomposable structures in LAD, and their significance
3. An index of decomposability, based on probabilistic analysis
4. Numerical experiments
5. Conclusion

Logical Analysis of Data (LAD)
For a phenomenon:
Input: positive examples (vectors for which the phenomenon occurs) and negative examples (vectors for which the phenomenon does not occur).
Output: a discriminant function, i.e., a logical explanation of the phenomenon.

Example: influenza
Attributes: Fever, Headache, Cough, Snivel, Stomachache (1 = Yes, 0 = No).
T: the set of patients having influenza.
F: the set of patients having common cold.
A discriminant function that separates T from F represents the knowledge "influenza"; finding one is one kind of knowledge acquisition.
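
To make the setting concrete, here is a minimal Python sketch of a pdBf (partially defined Boolean function); the data and the candidate function below are our own hypothetical illustration, not the table from the slide:

    # A pdBf given by positive and negative examples over the attributes
    # (Fever, Headache, Cough, Snivel, Stomachache).
    T = {(1, 1, 1, 0, 0), (1, 1, 0, 1, 0)}   # hypothetical influenza patients
    F = {(0, 1, 0, 1, 0), (1, 0, 0, 1, 1)}   # hypothetical common-cold patients

    def candidate(x):
        """A hypothetical discriminant function: Fever AND Headache."""
        return x[0] == 1 and x[1] == 1

    def is_discriminant(f, T, F):
        """f discriminates (T, F) iff f = 1 on all of T and f = 0 on all of F."""
        return all(f(a) for a in T) and not any(f(b) for b in F)

    print(is_discriminant(candidate, T, F))   # -> True for this toy data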

Guidelines for finding a discriminant function:
- Simplicity
- Explaining the structure of the phenomenon

Decomposable function
A general function treats all attributes at once; a decomposable function has the form f(x) = g(x[S0], h(x[S1])), where the sub-function h is a decomposable structure. Such a structure serves both guidelines: it is simpler, and it helps explain the structure of the phenomenon.

Example: concept of "square"
Attributes: x1: the lengths of all edges are equal; x2: the number of vertices is 4; x3: contains a right angle; x4: the area is over 100.

    shape   x1  x2  x3  x4
    i        1   1   1   0
    ii       1   1   1   1
    iii      0   1   1   0
    iv       1   0   0   1
    v        1   1   0   1

Example: concept of "square" (continued)
Square = (the lengths of all edges are equal) AND (the number of vertices is 4) AND (contains a right angle).
With the sub-concept rhombus = (the lengths of all edges are equal) AND (the number of vertices is 4), this becomes Square = rhombus AND (contains a right angle).
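
In code, the decomposition reads as follows; a small sketch (our illustration, taking S1 = {x1, x2} and S0 = {x3}) expressing "square" through the sub-concept "rhombus":

    # Decomposition f(x) = g(x[S0], h(x[S1])) for the "square" example.
    def h(x1, x2):
        """Sub-concept 'rhombus': all edges equal AND 4 vertices."""
        return x1 and x2

    def g(x3, sub):
        """'Square' in terms of the sub-concept and the remaining attribute."""
        return sub and x3

    def square(x1, x2, x3, x4):
        # x4 (area over 100) turns out to be irrelevant to the concept.
        return g(x3, h(x1, x2))

The hierarchy makes both the role of "rhombus" and the irrelevance of x4 explicit.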

Hierarchical structures and decomposable structures
A concept is explained directly in terms of attributes.

Hierarchical structures and decomposable structures
Introducing a sub-concept, the concept is explained in terms of attributes and the sub-concept, which is itself explained by attributes. Such a hierarchy corresponds to a decomposable structure.

Past research on decomposable structures
- Finding basic decomposable functions (e.g., f(x) = g(x[S0], h(x[S1]))) for given (T, F) and attribute sets (S0, S1): polynomial time [Boros et al., 1994]
- Finding other classes (positive, Horn, and their mixtures) of decomposable functions for given (T, F) and attribute sets (S0, S1) [Makino et al., 1995]
- Finding a (positive) decomposable function for given (T, F) when (S0, S1) is not given: a heuristic algorithm [Ono et al., 1999]

The number of data vectors and decomposable structures
Case 1: the size of the given data set is small.
- Advantage: less computational time is needed to find a decomposable structure.
- Disadvantage: decomposable structures easily exist in the data (because there are fewer constraints), so most decomposable structures found are deceptive.

The number of data vectors and decomposable structures
Case 2: the size of the given data set is large.
- Advantage: deceptive decomposable structures will not be found.
- Disadvantage: more computational time is needed.
How many data vectors should be prepared to extract real decomposable structures? This is what our index of decomposability estimates.

Overview of our approach
Assume that the data set is a set of vectors randomly chosen from {0,1}^n. Key fact: (T, F) is decomposable iff the conflict graph of (T, F) is bipartite.
1. Compute the probability of an edge appearing in the conflict graph.
2. Regard the conflict graph as a random graph, and investigate the probability that it is non-bipartite.

Conflict graph
The vertices of the conflict graph are the projections of the data vectors onto S1; an edge joins two projections whenever a linked pair of data vectors (defined below) forces them to receive different values under h. Decomposability of (T, F): (T, F) is decomposable with respect to (S0, S1) iff the conflict graph of (T, F) is bipartite.
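
This characterization yields a direct decomposability test. The sketch below is our reconstruction under the standard definition (an edge joins the S1-projections of a in T and b in F whenever a[S0] = b[S0]); it builds the conflict graph and checks bipartiteness by BFS 2-coloring:

    from collections import defaultdict, deque

    def proj(x, S):
        return tuple(x[i] for i in S)

    def conflict_graph(T, F, S0, S1):
        # Edge between u = a[S1] and v = b[S1] whenever a in T and b in F
        # agree on S0: any such pair forces h(u) != h(v).
        edges = defaultdict(set)
        for a in T:
            for b in F:
                if proj(a, S0) == proj(b, S0):
                    u, v = proj(a, S1), proj(b, S1)
                    edges[u].add(v)
                    edges[v].add(u)
        return edges

    def is_bipartite(edges):
        # BFS 2-coloring; a self-loop or an odd cycle makes it fail.
        color = {}
        for start in edges:
            if start in color:
                continue
            color[start] = 0
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v in edges[u]:
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False
        return True

    def decomposable(T, F, S0, S1):
        return is_bipartite(conflict_graph(T, F, S0, S1))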

Random graph G(n, p)
n: the number of vertices. Each edge appears in the graph with probability p, independently. In our analysis, p is taken to be the probability of an edge appearing in the conflict graph.

Probability of an edge appearing in the conflict graph
Edge (u, v) appears in the conflict graph iff there exists a linked pair: a pair of vectors (a, b) is called linked (with respect to u and v) if a[S0] = b[S0], a[S1] = u, and b[S1] = v.

Define a random variable X_uv by X_uv = 1 if edge (u, v) appears in the conflict graph, and X_uv = 0 otherwise; the edge appears iff there exists a linked pair one of whose members is in T and the other in F. We want to compute p = Pr[X_uv = 1].

How to compute p?
Assumptions on the generation of vectors: the data vectors are randomly sampled from {0,1}^n without replacement, and a sampled vector is put in T with probability q and in F with probability 1 - q.

How to compute p?
The probability that a fixed linked pair contributes an edge is easier to compute; it requires that
1. both vectors of the pair are sampled, and
2. they have different values (i.e., one is labeled 0 and the other 1).

Upper and lower bounds on p
Upper bound: obtained by Markov's inequality and the linearity of expectation.
Lower bound: obtained by the principle of inclusion and exclusion.

Approximation of p
Combining the two bounds yields an approximation of the edge probability p.

Probability of the random graph being non-bipartite
X: random variable for the number of odd cycles in G(n, p).
P_bip: probability that G(n, p) is bipartite.
We compute an approximation of 1 - P_bip = Pr[X >= 1] <= E[X] (Markov's inequality). Counting the number of sequences of k vertices (each odd cycle of length k corresponds to 2k such sequences) gives E[X] = sum over odd k >= 3 of n(n-1)···(n-k+1) p^k / (2k).
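
This step can be sanity-checked empirically. A small Monte Carlo sketch (our illustration, reusing is_bipartite from the earlier sketch) estimates the probability that G(n, p) is non-bipartite:

    import random

    def random_graph(n, p):
        # Sample G(n, p): each of the C(n, 2) possible edges independently.
        edges = defaultdict(set)
        for i in range(n):
            for j in range(i + 1, n):
                if random.random() < p:
                    edges[i].add(j)
                    edges[j].add(i)
        return edges

    def prob_non_bipartite(n, p, trials=1000):
        # Monte Carlo estimate of Pr[G(n, p) is non-bipartite].
        bad = sum(not is_bipartite(random_graph(n, p)) for _ in range(trials))
        return bad / trials

    # Near the classical threshold np = 1 the estimate rises sharply, e.g.
    # compare prob_non_bipartite(200, 0.5 / 200) with prob_non_bipartite(200, 2.0 / 200).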

When does P_bip stay bounded away from 0?
Upper bound: since n(n-1)···(n-k+1) <= n^k, the Taylor series sum over odd k of x^k / k = artanh(x) gives E[X] <= (artanh(np) - np) / 2, which remains bounded whenever np <= c < 1.

When does P_bip tend to 0?
Lower bound: if np >= c > 1, then for sufficiently large n the graph contains odd cycles with high probability, so P_bip tends to 0 (here c and the other constants do not depend on n).

Our index
Our index combines the two ingredients: the probability p of an edge appearing in the conflict graph, and the threshold (essentially np = 1) for the random graph G(n, p) to be bipartite or not.

Our index
If the number of data vectors is below the index, (T, F) has many deceptive decomposable structures. If it is above the index, (T, F) tends to have no deceptive decomposable structure.

Numerical experiments
1. Prepare randomly generated non-decomposable target functions, and construct 10 data sets for each data size.
2. Check their decomposability (see the sketch after this list).
Randomly generated data: the target functions are not decomposable; the dimensions of the data are fixed in advance; two types of data are used, with the distribution over T and F biased and not biased.
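
Schematically, the experimental loop can be written as below (a sketch under our own assumptions: a caller-supplied target function, uniform sampling without replacement, and the decomposable test from the earlier sketch; the dimensions, sample sizes, and generators of the actual experiments are the authors'):

    import itertools
    import random

    def decomposability_ratio(target, n, S0, S1, sizes, trials=10):
        # For each data size m, draw `trials` samples of m distinct vectors
        # from {0,1}^n, label them with the target function, and report the
        # fraction of resulting pdBfs that are decomposable w.r.t. (S0, S1).
        space = list(itertools.product((0, 1), repeat=n))
        ratios = {}
        for m in sizes:
            hits = 0
            for _ in range(trials):
                sample = random.sample(space, m)
                T = [x for x in sample if target(x)]
                F = [x for x in sample if not target(x)]
                hits += decomposable(T, F, S0, S1)
            ratios[m] = hits / trials
        return ratios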

Randomly generated data
[Plot: ratio of decomposable pdBfs (%) vs. sampling ratio (%), with our index marked.]

Randomly generated data
[Plot: ratio of decomposable pdBfs (%) vs. sampling ratio (%).]

Real-world data
Breast Cancer in Wisconsin (a.k.a. BCW), already binarized. We compare it with randomly generated data of the same size and dimension.

BCW and randomly generated data
[Plots: ratio of decomposable pdBfs (%) vs. sampling ratio (%), for BCW and for the randomly generated data.]

Discussion and conclusion
In most cases, our index is a good estimate of the threshold point of decomposability; i.e., it is useful for knowing how many data vectors are indispensable to extract decomposable structures. In some parameter settings, however, the threshold behavior of our index is not clear. We suppose this is caused by violations of two assumptions in our analysis:
- that the edges in the conflict graph appear independently, and
- that the edge probability p is small.

Future work
- Under other distributions of the data vectors, determine how many data vectors are enough to extract decomposable structures.
- Apply this kind of approach to other classes of Boolean functions.