Published by Roderick Sutton. Modified over 9 years ago.
1
Automatic Threshold Selection for Conditional Independence Tests in Learning a Bayesian Network
Rafi Bojmel, supervised by Dr. Boaz Lerner
2
Overview
Machine Learning (ML) investigates the mechanisms by which knowledge is acquired through experience.
Hard-core ML-based applications:
  Web search engines, on-line help services
  Document processing (text classification, OCR)
  Biological data analysis, military applications
The Bayesian network (BN) has become one of the most studied machine learning models for knowledge representation, probabilistic inference and, recently, also classification.
3
BN Example (1): Chest Clinic (Asia) Problem
Nodes: Recent visit to Asia (A), Tuberculosis (T), Smoker (S), Lung cancer (L), Bronchitis (B), Either Tuberculosis or Lung cancer (E), Positive X-ray (X), Dyspnea (shortness of breath) (D)
Example probability tables:
                 A=yes   A=no
  P(A)           50%     50%

                 D=yes   D=no
  P(D | B=yes)   90%     10%
  P(D | B=no)    5%      95%
4
BN Example (2): Chest Clinic (Asia) Problem
Nodes: Recent visit to Asia, Tuberculosis, Smoker, Lung cancer, Bronchitis, Either Tuberculosis or Lung cancer, Positive X-ray, Dyspnea (shortness of breath)
Highlighted: Markov blanket of Lung cancer
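The Markov blanket of a node consists of its parents, its children, and the other parents of its children. The slide's figure did not survive extraction, so the sketch below assumes the standard edge set of the Chest Clinic (Asia) network; the `EDGES` list and the `markov_blanket` helper are illustrative names, not taken from the slides.

```python
# Assumed (parent, child) edges of the standard Chest Clinic (Asia) network.
EDGES = [
    ("Asia", "Tuberculosis"),
    ("Smoker", "LungCancer"),
    ("Smoker", "Bronchitis"),
    ("Tuberculosis", "Either"),
    ("LungCancer", "Either"),
    ("Either", "XRay"),
    ("Either", "Dyspnea"),
    ("Bronchitis", "Dyspnea"),
]


def markov_blanket(node, edges):
    """Return parents, children, and co-parents of children of `node`."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    # Other parents of this node's children ("spouses").
    spouses = {p for p, c in edges if c in children and p != node}
    return parents | children | spouses


print(sorted(markov_blanket("LungCancer", EDGES)))
# ['Either', 'Smoker', 'Tuberculosis']
```

Under this assumed structure, Tuberculosis enters the blanket only because it shares the child "Either" with Lung cancer.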
5
Bayesian Networks
Learning Bayesian networks
  Structure learning
    Search-and-score
    Constraint-based
  Parameter learning
Inference (e.g., classification)
Bayesian network; Structure/Graph
6
BN Structure Learning
Database → Training set → Model construction
Database → Test set → Bayesian inference (classification)
Two main approaches to BN structure learning:
  Search-and-score: uses a heuristic search method.
  Constraint-based (CB): analyzes dependency relationships among nodes, using conditional independence (CI) tests.
The PC algorithm is a constraint-based algorithm.
Example database (D = Dyspnea, X = X-ray, E = Either, B = Bronchitis, L = Lung cancer, T = Tuberculosis, S = Smoker, A = Asia):
  #    D  X  E  B  L  T  S  A
  1    1  0  0  1  0  0  1  0
  2    1  0  0  1  0  0  0  0
  3    1  0  1  1  1  0  1  1
  4    0  1  1  0  1  1  1  0
  5    0  1  1  0  0  1  0  1
  6    1  0  0  0  0  0  0  0
  ...
7
PC algorithm (1)
Inputs:
  V: set of variables (and corresponding database)
  I*(Xi, Xj | {S}) <> ε: a test of conditional independence
  ε: threshold
  Order{V}: ordering of V
Output:
  Directed acyclic graph (DAG)
Notation:
  Xi, Xj = any two nodes in the graph
  I*(Xi, Xj | {S}) = normalized conditional mutual information
  {S} = subset of variables (other than Xi, Xj)
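As a minimal sketch of such a threshold test, the code below estimates the conditional mutual information from counts in a discrete database and compares a normalized value with ε. The slides do not say how I* is normalized; dividing by the log of the smaller variable cardinality (an upper bound on the conditional mutual information) is an assumption made here so the score falls in [0, 1]. The function names and the default ε are hypothetical.

```python
from collections import Counter
from math import log


def cmi(rows, i, j, cond=()):
    """Empirical I(Xi; Xj | S) in nats; rows are tuples of discrete values."""
    n = len(rows)
    key = lambda r: tuple(r[k] for k in cond)
    n_xyz = Counter((r[i], r[j], key(r)) for r in rows)
    n_xz = Counter((r[i], key(r)) for r in rows)
    n_yz = Counter((r[j], key(r)) for r in rows)
    n_z = Counter(key(r) for r in rows)
    total = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts.
        total += (c / n) * log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


def normalized_cmi(rows, i, j, cond=()):
    """ASSUMED normalization: divide by log(min cardinality of Xi, Xj)."""
    card = min(len({r[i] for r in rows}), len({r[j] for r in rows}))
    if card < 2:
        return 0.0  # a constant variable carries no information
    return cmi(rows, i, j, cond) / log(card)


def is_independent(rows, i, j, cond=(), epsilon=0.01):
    """Declare Xi and Xj independent given S when the score is below epsilon."""
    return normalized_cmi(rows, i, j, cond) < epsilon


# Toy data, columns (Z, X, Y): X and Y are both copies of Z.
rows = [(0, 0, 0), (1, 1, 1)] * 50
print(is_independent(rows, 1, 2))             # False: X and Y are dependent
print(is_independent(rows, 1, 2, cond=(0,)))  # True: independent given Z
```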
8
PC algorithm (2)
The algorithm contains three stages:
  Stage I: Start from the complete graph and find an undirected graph using conditional independence tests.
  Stage II: Find some head-to-head links (V-structures): X – Y – Z becomes X → Y ← Z.
  Stage III: Orient all those links that can be oriented.
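The first two stages can be sketched as follows. This is a simplified illustration, not the authors' implementation: the conditional independence test is passed in as a callable (here a hand-written oracle standing in for the threshold test), Stage III is omitted, and all function names are hypothetical.

```python
from itertools import combinations


def pc_skeleton(variables, is_independent):
    """Stage I: start from the complete graph and remove edges via CI tests."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    order = 0  # size of the conditioning set
    while any(len(adj[x]) - 1 >= order for x in variables):
        for x in variables:
            for y in list(adj[x]):
                # Conditioning sets are drawn from x's other neighbours.
                for s in combinations(sorted(adj[x] - {y}), order):
                    if is_independent(x, y, set(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        order += 1
    return adj, sepset


def orient_v_structures(adj, sepset):
    """Stage II: orient X - Y - Z as X -> Y <- Z when Y did not separate X, Z."""
    arrows = set()
    for y in adj:
        for x, z in combinations(sorted(adj[y]), 2):
            if z not in adj[x] and y not in sepset.get(frozenset((x, z)), set()):
                arrows.add((x, y))
                arrows.add((z, y))
    return arrows


# Oracle for a true collider X -> Y <- Z: X and Z are independent
# only when Y is NOT in the conditioning set.
def collider_oracle(a, b, s):
    return {a, b} == {"X", "Z"} and "Y" not in s


adj, sepset = pc_skeleton(["X", "Y", "Z"], collider_oracle)
print(sorted(orient_v_structures(adj, sepset)))
# [('X', 'Y'), ('Z', 'Y')]
```

In practice the oracle would be replaced by a data-driven test, which is where the choice of threshold ε enters.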
9
PC Algorithm Simulation
[Figure: Chest Clinic network with nodes Recent visit to Asia, Tuberculosis, Smoker, Lung cancer, Bronchitis, Either Tuberculosis or Lung cancer, Positive X-ray, Dyspnea (shortness of breath)]
Stage I ... END
Stage II: V-structure
Stage III: Precise structure
10
Threshold Selection: existing methods
Arbitrary (trial-and-error) selection
  Disadvantages: haphazardness, inaccuracy, time
Likelihood or classifier-accuracy based selection
  Disadvantages: exponential run-time
The "risk" in selecting the wrong threshold:
  Too small → too many edges → causality, run-time
  Too large → lose important edges → inaccuracy
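The effect of the threshold on the number of retained edges can be illustrated with a small sweep. The scores below are made-up order-0 dependence values for four candidate edges, chosen only for illustration; they do not come from the slides or from any real database, and `edges_kept` is a hypothetical helper.

```python
# Hypothetical normalized dependence scores for candidate edges (illustration only).
scores = {
    ("S", "L"): 0.12,
    ("S", "B"): 0.09,
    ("A", "T"): 0.015,
    ("A", "X"): 0.002,
}


def edges_kept(scores, epsilon):
    """An edge survives when its dependence score reaches the threshold."""
    return sorted(pair for pair, value in scores.items() if value >= epsilon)


for eps in (0.001, 0.01, 0.1):
    print(eps, len(edges_kept(scores, eps)), "edges kept")
```

A very small ε keeps all four candidate edges, including the weakest one, while a large ε leaves a single edge.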
11
Threshold selection - Novel Technique (1)
Based on Mutual Information Probability Density Functions: I*(Xi, Xj | {S})
  Calculate the MI values, I*(Xi, Xj | {S}), for different sizes (orders) of the condition set, S.
  Create histograms (PDF estimation technique).
Techniques to define the best threshold automatically:
  Zero-Crossing-Decision (ZCD)
  Best-Candidate (BC)
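The histogram step can be sketched as below: a list of MI values for one order of the condition set is binned and scaled so that the bars integrate to one. The input values are hypothetical, the bin count and range are arbitrary choices, and the sketch stops at the density estimate; it does not implement ZCD or BC, which the slides do not specify in text.

```python
def histogram_density(values, bins=10, lo=0.0, hi=1.0):
    """Return per-bin densities (count / (n * bin width)) over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so out-of-range values land in the edge bins.
        k = max(0, min(int((v - lo) / width), bins - 1))
        counts[k] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


# Hypothetical normalized MI values for one order (illustration only).
mi_values = [0.01, 0.02, 0.03, 0.05, 0.41, 0.45, 0.90]
density = histogram_density(mi_values, bins=5)
print(density)
```

Most of the mass sits in the lowest bin here, which is the kind of shape a threshold would be chosen against.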
12
Threshold selection - Novel Technique (2)
13
Zero-Crossing-Decision (ZCD)
[Figure panels: ZCD (order=0), ZCD (order=1)]
14
Experiment and Results
Classification experiments were performed on 8 real-world databases (UCI Repository).
  Database sizes: 128 - 3,200 cases
  Graph sizes: 5 - 17 nodes
  Dimension of class variable: 2 - 10
15
Summary
The PC algorithm requires selecting a threshold for structure learning, which is a time-consuming process that also undermines automatic structure learning.
Initial examination of our novel techniques suggests that they have the potential both to automate the process and to improve performance.
Further research is under way to validate and improve the proposed techniques.