Automatic threshold selection for conditional independence tests in learning a Bayesian network

Rafi Bojmel, supervised by Dr. Boaz Lerner
Department of Electrical Engineering, Ben-Gurion University

Introduction
Machine learning (ML) investigates the mechanisms by which knowledge is acquired through experience. Hard-core ML applications include web search engines, on-line help services, document processing (text classification, OCR), biological data analysis and military applications. The Bayesian network (BN) has become one of the most studied machine-learning models for knowledge representation and probabilistic inference.

Bayesian network (BN)
A BN is an efficient graphical model that represents the joint probability distribution of a set of random variables: the nodes are the variables and the arcs encode their probabilistic dependences.

BN structure learning: find the in/dependence relations between the variables, and the set of conditional probability distributions, that represent the problem most accurately. There are two types of algorithm:
- Search-and-score: heuristically search for a model and evaluate each candidate using a score.
- Constraint-based (CB): construct the network by analyzing dependency relationships among the nodes; these relationships are measured by performing conditional independence (CI) tests.

Figure 1. BN of a simple apple/orange classifier.

After learning the structure, we can use it for Bayesian inference. For example, we measure attributes 1-4, use the statistical relations among them, and answer the question "how likely is the fruit to be an apple?" A minimal sketch of such a query follows.
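The poster gives no numbers for the classifier in Figure 1, so the sketch below assumes a hypothetical two-attribute version of it: a class node Fruit with observed attributes color and shape, and invented CPT values, for illustration only. It shows how the BN factorization turns measured attributes into a posterior over the class.

```python
# Bayesian inference in a tiny fruit classifier (all numbers hypothetical).
# Factorization: P(fruit, color, shape) = P(fruit) P(color|fruit) P(shape|fruit).

p_fruit = {"apple": 0.5, "orange": 0.5}          # prior over the class node
p_color = {("red", "apple"): 0.7, ("orange", "apple"): 0.3,
           ("red", "orange"): 0.1, ("orange", "orange"): 0.9}
p_shape = {("round", "apple"): 0.8, ("oblong", "apple"): 0.2,
           ("round", "orange"): 0.9, ("oblong", "orange"): 0.1}

def posterior(color, shape):
    """P(fruit | color, shape), by enumerating the factorized joint."""
    joint = {f: p_fruit[f] * p_color[(color, f)] * p_shape[(shape, f)]
             for f in p_fruit}
    z = sum(joint.values())                      # normalizing constant
    return {f: p / z for f, p in joint.items()}

print(posterior("red", "round"))   # {'apple': 0.861..., 'orange': 0.138...}
```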
Constraint-based (CB) approach: the PC algorithm
The PC algorithm is a CB algorithm that starts from the complete undirected graph and then "thins" it down to the accurate structure. Advantage: short run time. Disadvantage: an arbitrary significance level (threshold) must be chosen to decide on independencies.

Figure 2. The three-stage PC algorithm (Stages I-III). In the first stage, an edge is removed if and only if its two nodes are found mutually independent or conditionally mutually independent by CI tests.

Stage I: find all independent pairs of nodes and remove their edges accordingly, based on conditional independence (CI) tests. That is, remove the edge between Xi and Xj if I(Xi; Xj | S) < ε, where I is the conditional mutual information (CMI), S is the conditioning set and ε is the selected threshold.

Stages II+III: orient the edges.

A simplified sketch of Stage I follows.
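This sketch assumes discrete data given as a list of tuples and shows only conditioning orders 0 and 1; the full PC algorithm keeps growing the conditioning set, drawn from the current neighbours of the tested pair, until no further edges can be removed (Spirtes et al., 2000).

```python
# PC Stage I (simplified): thin the complete undirected graph with CI tests.
from collections import Counter
from itertools import combinations
from math import log

def cmi(data, i, j, S):
    """Empirical conditional mutual information I(Xi; Xj | S), in nats."""
    n = len(data)
    c_ijs = Counter((r[i], r[j], tuple(r[k] for k in S)) for r in data)
    c_is = Counter((r[i], tuple(r[k] for k in S)) for r in data)
    c_js = Counter((r[j], tuple(r[k] for k in S)) for r in data)
    c_s = Counter(tuple(r[k] for k in S) for r in data)
    return sum(nijs / n * log(nijs * c_s[s] / (c_is[(xi, s)] * c_js[(xj, s)]))
               for (xi, xj, s), nijs in c_ijs.items())

def stage_one(data, n_vars, eps):
    """Remove the edge Xi-Xj whenever I(Xi; Xj | S) < eps for some S."""
    edges = set(combinations(range(n_vars), 2))    # complete undirected graph
    for order in (0, 1):                           # conditioning-set sizes
        for (i, j) in sorted(edges):               # iterate over a frozen copy
            others = [k for k in range(n_vars) if k not in (i, j)]
            for S in combinations(others, order):
                if cmi(data, i, j, S) < eps:       # CI test: Xi and Xj independent
                    edges.discard((i, j))
                    break
    return edges
```

With ε fixed by hand this is the manual-threshold variant; the ATS techniques below select ε automatically at each step.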
Automatic Threshold Selection (ATS)
The challenge: automate the learning process by finding a technique that selects the best threshold (ε) candidate, while preserving or improving classification accuracy. Two methods are proposed: Zero Crossing Decision (ZCD) and Best Candidate (BC). Both are based on the hypothesis that a histogram/PDF of the CMI values of a given database allows differentiating between dependent and independent nodes, as demonstrated in Figure 3.

Figure 3. Ideal mutual-information probability density function (PDF).

Zero Crossing Decision (ZCD)
Run the PC algorithm as usual. At each step where a threshold selection is needed:
1. Draw a histogram of the CMI values of all node pairs in the database for the relevant conditioning order.
2. Select the first CMI value at which the histogram crosses the CMI-counter axis (see Figure 4); a sketch follows.

Figure 4. Illustration of the ATS-ZCD technique for conditioning orders 0 and 1.
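A minimal sketch of the zero-crossing rule, assuming equal-width binning with a fixed bin count (the poster does not specify how the histogram is binned):

```python
# ZCD: pick the left edge of the first empty histogram bin, i.e. the first
# CMI value at which the histogram of CMI values crosses the count axis.

def zcd_threshold(cmi_values, bins=50):
    lo, hi = min(cmi_values), max(cmi_values)
    width = (hi - lo) / bins or 1.0        # guard against a degenerate range
    counts = [0] * bins
    for v in cmi_values:
        b = min(int((v - lo) / width), bins - 1)
        counts[b] += 1
    for b, count in enumerate(counts):
        if count == 0:                     # first zero crossing
            return lo + b * width
    return None                            # no crossing: selection fails
```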
Best Candidate (BC)
A hill-climbing technique in which the best candidate threshold is selected out of a set of candidates, separately for each conditioning-set size; see the sketch below.
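The poster does not detail the search, so the sketch below assumes greedy hill climbing over a sorted candidate list, with score(ε) standing for any scalar quality measure of a threshold, e.g. the cross-validated accuracy of the classifier learned with it (an assumption, not the poster's stated criterion).

```python
# Best Candidate (BC): greedy hill climbing over candidate thresholds.

def best_candidate(candidates, score, max_steps=100):
    """Start at the median candidate; move to a neighbour while it improves."""
    cand = sorted(candidates)
    i = len(cand) // 2
    best = score(cand[i])
    for _ in range(max_steps):
        moved = False
        for j in (i - 1, i + 1):           # examine both neighbours
            if 0 <= j < len(cand) and score(cand[j]) > best:
                i, best, moved = j, score(cand[j]), True
                break
        if not moved:                      # local maximum reached
            break
    return cand[i]
```

Because the selection runs separately per conditioning-set size, different orders may end up with different thresholds.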
Experiments and Results
Ten real-world databases were taken from the UCI Repository (Newman et al., 1998). All databases were analyzed in a CV10 (10-fold cross-validation) experiment. CI tests were carried out using the normalized CMI criterion I* = I/N, where N = ∏(|Xi| · |Xj| · |S|), with the threshold selected by either of the ATS techniques. Prediction accuracies of the PC algorithm using manual threshold selection, ZCD and BC, together with the best performance achieved by other CB algorithms, are summarized in Table 1.

Table 1. Mean classification accuracy of the PC and other CB algorithms in a CV10 experiment. Entries are classification accuracies in percent, with standard deviations in brackets where reported; n is the number of variables.

Database     n    PC, manual      PC, ATS-ZCD (order 0)   PC, ATS-BC      Best CB from previous experiments
Australian   15   72.5  (15.3)    86.2  (6.0)             85.51 (0.52)    86.2  (1.5)
Car           7   93.8  (2.4)     85.07 (1.83)                            92.94 (1.06)
Cmc          10   51.3  (3.6)     50.92 (2.3)                             51.12 (3.16)
Corral        7   100   (0)       84.53 (15.45)                           98.52 (3.31)
Crx          16   86.4  (2.6)     84.5  (3.8)                             86.38 (2.63)
Flare        11   84.3  (2.2)     83.9  (2.5)             84.3  (2.5)     84.3  (2.54)
Iris          5   96.0  (3.4)     94    (4.9)                             96.0  (4.35)
Led-7         8   71.6  (2.8)     NO CNV                  73.31 (1.8)     73.59 (1.56)
Mofn         11   94.1  (7.7)     100   (0)               81.4            93.16
Voting       17   97.1  (3.4)     NO CNV                  95.64 (1.87)    95.87 (1.71)
Average           84.71 (15.4)    86.7* (15.7)            82.3  (12.8)    85.8  (14.2)

* 'NO CNV' (no convergence) databases were left out of the average values (ignored); comparison between average values is therefore not reliable.

Conclusions
On 8 of the 10 databases (80%), each of the ATS techniques improved the accuracy of the PC algorithm, as well as that of any other CB algorithm examined in previous papers. Conditioning sets of order zero dominate, which indicates that direct, simple relations between attributes are significantly more common in real life. The results testify that it is possible both to enjoy an automatic process and to improve performance. Further research is required to validate and improve the proposed techniques.

Literature cited
Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
Cheng, J. & Greiner, R. Learning Bayesian belief network classifiers: algorithms and system. Proc. 14th Canadian Conference on Artificial Intelligence, pages 141-150, 2001.
Murphy, K. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, vol. 33, 2001. http://www.cs.ubc.ca/~murphyk/
Newman, D. J., Hettich, S., Blake, C. L. & Merz, C. J. UCI Repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
Spirtes, P., Glymour, C. & Scheines, R. Causation, Prediction, and Search, 2nd edition, MIT Press, 2000.
Yehezkel, R. & Lerner, B. Recursive autonomy identification for Bayesian network structure learning. The 10th International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), pages 429-436, 2005.