Kansas State University Department of Computing and Information Sciences Kansas State University KDD Lab (www.kddresearch.org)www.kddresearch.org Permutation.


1 Permutation Genetic Algorithms for Score-Based Bayesian Network Structure Learning
Monday, 16 August 2004
William H. Hsu and Roby Joehanes
Joint work with: Haipeng Guo, Benjamin B. Perry, Julie A. Thornton
Thanks to: Jeffrey M. Barber, Andrew King, Chris Meyer
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
This presentation is: http://www.kddresearch.org/KSU/CIS/CCCT-20040816.ppt
Computing, Communications and Control Technologies (CCCT) 2004

2 Research Overview
Graphical Models of Probability
–Markov graphs
–Bayesian (belief) networks
–Causal semantics
–Direction-dependent separation (d-separation) property
Learning and Reasoning: Problems, Algorithms
–Inference: exact and approximate
    Junction tree – Lauritzen and Spiegelhalter (1988)
    (Bounded) loop cutset conditioning – Horvitz and Cooper (1989)
    Variable elimination – Dechter (1996)
–Structure learning
    K2 algorithm – Cooper and Herskovits (1992)
    Variable ordering problem – Larrañaga (1996), Hsu et al. (2002, 2004)
Probabilistic Reasoning in Machine Learning, Data Mining
Current Research and Open Problems

3 Graphical Models of Probability
Conditional Independence
–X is conditionally independent (CI) from Y given Z iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
–Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning) ⇔ T ⊥ R | L
Bayesian (Belief) Network
–Acyclic directed graph model B = (V, E, Θ) representing CI assertions over Ω
–Vertices (nodes) V: denote events (each a random variable)
–Edges (arcs, links) E: denote conditional dependencies
Markov Condition for BBNs (Chain Rule): P(X1, X2, …, Xn) = Π i=1..n P(Xi | parents(Xi))
Example BBN
–X1: Age
–X2: Gender
–X3: Exposure-To-Toxins
–X4: Smoking
–X5: Cancer
–X6: Serum Calcium
–X7: Lung Tumor
P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative) = P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
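The example factorization on this slide can be checked mechanically. The following is a minimal Python sketch, assuming the parent sets implied by the seven factors in the example product; every probability value in it is a made-up placeholder, not a number from the presentation, and the names PARENTS, CPT, and joint_probability are hypothetical.

```python
from math import prod

# Parent sets implied by the factors P(T) P(F) P(L|T) P(N|T,F) P(N|L,N) P(N|N) P(N|N).
PARENTS = {
    "Age": (),
    "Gender": (),
    "Exposure": ("Age",),
    "Smoking": ("Age", "Gender"),
    "Cancer": ("Exposure", "Smoking"),
    "SerumCalcium": ("Cancer",),
    "LungTumor": ("Cancer",),
}

# Hypothetical CPT entries, keyed by (value, parent values).
# Only the rows needed for the single query below are filled in.
CPT = {
    "Age": {("20s", ()): 0.25},
    "Gender": {("Female", ()): 0.5},
    "Exposure": {("Low", ("20s",)): 0.8},
    "Smoking": {("Non-Smoker", ("20s", "Female")): 0.7},
    "Cancer": {("No-Cancer", ("Low", "Non-Smoker")): 0.95},
    "SerumCalcium": {("Negative", ("No-Cancer",)): 0.9},
    "LungTumor": {("Negative", ("No-Cancer",)): 0.9},
}


def joint_probability(assignment):
    """Multiply one CPT entry per variable: P(x) = prod_i P(x_i | parents(x_i))."""
    factors = []
    for var, parents in PARENTS.items():
        key = (assignment[var], tuple(assignment[p] for p in parents))
        factors.append(CPT[var][key])
    return prod(factors)


query = {
    "Age": "20s",
    "Gender": "Female",
    "Exposure": "Low",
    "Smoking": "Non-Smoker",
    "Cancer": "No-Cancer",
    "SerumCalcium": "Negative",
    "LungTumor": "Negative",
}
joint = joint_probability(query)
```

Seven table lookups replace one entry in a full joint table over all seven variables, which is the practical payoff of the Markov condition.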

4 Model Averaging Procedure (Schuurmans et al.)
General-Case BBN Structure Learning: Use Inference to Compute Scores
Optimal Strategy: Bayesian Model Averaging
–Assumption: models h ∈ H are mutually exclusive and exhaustive
–Combine predictions of models in proportion to marginal likelihood
    Compute conditional probability of hypothesis h given observed data D
    i.e., compute expectation over unknown h for unseen cases
    Let h ≡ structure, parameters Θ ≡ CPTs
Equation labels: Posterior Score; Marginal Likelihood; Prior over Structures; Likelihood; Prior over Parameters
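Only the labels of this slide's equations are available here. The block below is the standard textbook way to write the quantities those labels name, using h, H, D, and Θ as defined on the slide; it is offered as a sketch and may differ from the slide's exact notation. The symbol x for an unseen case is an assumption.

```latex
\begin{align*}
\underbrace{P(h \mid D)}_{\text{posterior score}}
  &\propto \underbrace{P(D \mid h)}_{\text{marginal likelihood}}
     \; \underbrace{P(h)}_{\text{prior over structures}} \\
P(D \mid h)
  &= \int \underbrace{P(D \mid h, \Theta)}_{\text{likelihood}}
     \; \underbrace{P(\Theta \mid h)}_{\text{prior over parameters}} \, d\Theta \\
P(x \mid D)
  &= \sum_{h \in H} P(x \mid h, D) \, P(h \mid D)
\end{align*}
```

The last line is the model-averaging step: because the models in H are assumed mutually exclusive and exhaustive, the prediction for an unseen case x is the posterior-weighted sum of the individual models' predictions.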

5 Greedy Score-Based Algorithm for Structure Learning (K2, Cooper & Herskovits)
Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
  FOR i ← 1 to n DO  // arbitrary ordering of variables {x1, x2, …, xn}
    WHILE (Parents[xi].Size < Max-Parents) DO  // find best candidate parent
      Best ← argmax j>i (P(D | xj ∈ Parents[xi]))  // max Dirichlet score
      IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
      ELSE BREAK  // no candidate improves the score, so stop adding parents
  RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]
–BN2O (3-layer) graphical model for patient monitoring in surgical anesthesia
–Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
–K2: finds BBN different in only 1 edge from gold standard (elicited from expert)
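The greedy loop above can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the BNJ implementation: variables are column indices, the score is the Cooper-Herskovits log marginal likelihood with a uniform Dirichlet prior, and candidate parents are the variables that precede xi in the ordering (the convention of Cooper and Herskovits; the slide indexes candidates as j > i). The function names and the toy data set are hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of one family under a uniform Dirichlet prior."""
    r = arity[child]
    counts = Counter()  # N_ijk: (parent configuration, child value) -> count
    totals = Counter()  # N_ij: parent configuration -> count
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts[(cfg, row[child])] += 1
        totals[cfg] += 1
    score = 0.0
    for cfg, n_ij in totals.items():
        # log[(r-1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!)
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(counts[(cfg, k)] + 1) for k in range(r))
    return score


def k2(data, order, arity, max_parents):
    """Greedy parent selection for each variable, given a fixed ordering."""
    parents = {x: [] for x in order}
    for i, x in enumerate(order):
        best_score = k2_log_score(data, x, parents[x], arity)
        candidates = set(order[:i])  # assumption: predecessors in the ordering
        while len(parents[x]) < max_parents and candidates:
            new_score, best = max(
                (k2_log_score(data, x, parents[x] + [c], arity), c)
                for c in candidates
            )
            if new_score <= best_score:
                break  # no candidate improves the score
            parents[x].append(best)
            candidates.remove(best)
            best_score = new_score
    return parents


# Toy data: column 1 copies column 0; column 2 is independent of both.
toy_data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
learned = k2(toy_data, order=[0, 1, 2], arity=[2, 2, 2], max_parents=2)
```

On the toy data the sketch links column 1 to column 0 and leaves column 2 without parents. Because the candidate set depends entirely on the ordering, a poor ordering can hide good parents, which is the variable ordering problem the permutation GA addresses.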

6 BNJ Development History
Bayesian Network Tools (BNTools, 2000-2002)
–Junction Tree
–Editor
–Structure learning (K2)
BNJ v1 (2002-2003)
–Semistructured data format (XML) based on MSBN, XBN
–ConverterFactory: Hugin, Ergo, Netica, MSBN
–Importance sampling
–Other inference algorithms (Guo: Multi-Start Hill Climbing, Tabu Search)
BNJ v2 (2003-2004)
–Relational Models
–Wizards: Learning, Inference
BNJ v3 (2004-present)
–Visualization Framework
–Run Mode: online constraint propagation
–Refactoring for speed: 600-2000% speedup over v2
–Better memory management

7 BNJ Graphical User Interface: Editor
Figure: Asia (Chest Clinic) Network (© 2004 KSU BNJ Development Team)

8 Permutation GA for Greedy Structure Learning
Genetic Wrapper for Change of Representation and Inductive Bias Control
Diagram labels:
–Inputs: D: Training Data; Inference Specification
–[1] Genetic Algorithm: α, Candidate Representation; f(α), Representation Fitness
–[2] Representation Evaluator for Learning Problems: D train (Inductive Learning), D val (Inference)
–Output: Optimized Representation
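The wrapper idea, a GA proposing candidate orderings α and an evaluator returning f(α), can be sketched in Python as below. This is a hedged illustration, not the authors' code: the crossover is an order-preserving operator in the style of OX, selection is simple truncation with elitism, and toy_fitness is a stand-in for the real evaluator, which would learn a structure from D train with the candidate ordering and score it by inference on D val. All function names and parameter defaults are assumptions.

```python
import random


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1 in place; fill the rest in p2's relative order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    middle = p1[a:b + 1]
    taken = set(middle)
    rest = [g for g in p2 if g not in taken]
    return rest[:a] + middle + rest[a:]


def swap_mutation(perm, rng, rate):
    """With probability `rate`, exchange two randomly chosen positions."""
    perm = list(perm)
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def permutation_ga(fitness, n, pop_size=20, generations=30, rate=0.2, seed=0):
    """Evolve orderings of n variables; `fitness` maps an ordering to a number."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]  # survivors also serve as parents
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng, rate))
        pop = elite + children
    return max(pop, key=fitness)


def toy_fitness(order):
    """Placeholder fitness: fraction of variables sitting at their own index."""
    return sum(1 for i, v in enumerate(order) if i == v) / len(order)


best_order = permutation_ga(toy_fitness, n=6)
child = order_crossover([0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0], random.Random(1))
```

Both operators always return valid permutations, so every individual is a legal variable ordering for the greedy learner and no repair step is needed.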

9 Fitness Function

10 Results: Asia (Chest Clinic)
Figure: Histogram of estimated fitness for all 8! = 40320 permutations of Asia variables
Table: Results for Asia (5000 samples per fitness evaluation in D val and D test)
K2FS     Samples   Best f of final gen
5000     1500      0.944
10000    1500      0.960
20000    150       0.935
20000    450       0.977
20000    1500      0.978
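With only 8 variables, every ordering of the Asia network can be evaluated, which is what makes a full fitness histogram possible. A minimal Python sketch of that exhaustive enumeration follows; the fitness function here is a toy placeholder, not the inference-based fitness used for the results above, and the function names and bin count are assumptions.

```python
from collections import Counter
from itertools import permutations


def fitness_histogram(variables, fitness, bins=10):
    """Evaluate every ordering and tally fitness values in [0, 1] into equal bins."""
    hist = Counter()
    for order in permutations(variables):
        f = fitness(order)
        hist[min(int(f * bins), bins - 1)] += 1
    return hist


def toy_fitness(order):
    """Placeholder fitness: fraction of variables sitting at their own index."""
    return sum(1 for i, v in enumerate(order) if i == v) / len(order)


hist = fitness_histogram(range(8), toy_fitness)
total = sum(hist.values())  # one count per ordering of 8 variables
```

The same enumeration is out of reach for the 37-vertex ALARM network, where the number of orderings is 37!, so a search procedure such as the permutation GA is needed there.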

11 Results: ALARM-13

12 BNJ Core [1] Design

13 BNJ Core [2] Graph Architecture
Figure: CPCS-54 Network (© 2004 KSU BNJ Development Team)

14 BNJ Graphical User Interface: Network
Figure: ALARM Network (© 2004 KSU BNJ Development Team)

15 BNJ Visualization [1] Framework
© 2004 KSU BNJ Development Team

16 BNJ Visualization [2] Pseudo-Code Annotation (Code Page)
Figure: ALARM Network (© 2004 KSU BNJ Development Team)

17 BNJ Visualization [3] Network
Figure: Poker Network (© 2004 KSU BNJ Development Team)

18 Current Work: Features in Progress
Scalability
–Large networks (50+ vertices, 10+ parents)
–Very large data sets (10^6+)
Other Visualizations
–K2 for structure learning
–Conditioning
BNJ v1-2 ports
–Guo’s dissertation algorithms
–Importance sampling (CABeN)
Lazy Evaluation
Figure: Barley Network (© 2004 KSU BNJ Development Team)

19 Future Work: Desired Features
Grid Computing
–Very large networks (200+ vertices)
New Visualizations
–Variable Elimination (difficult)
–Other structure learning
New Representations
–Relational Graphical Models
–Dynamic Bayes nets
–Decision Networks
BNJ v1-2 Reimplementations
–Database GUI
–Wizards

20 Current Research Topics: Bioinformatics
Microarray experiment (adapted from Friedman et al. (2000), http://www.cs.huji.ac.il/labs/compbio/)
–Treatment 1 (Control): Messenger RNA (mRNA) Extract 1
–Treatment 2 (Pathogen): Messenger RNA (mRNA) Extract 2
–cDNA
–DNA Hybridization Microarray (under LASER)
Learning Environment (diagram labels)
–D: Data (User, Microarray)
–[A] Structure Learning: G1, G2, G3, G4, G5
–G = (V, E)
–[B] Parameter Estimation: G1, G2, G3, G4, G5
–B = (V, E, Θ)
–D val (Model Validation by Inference)
–Specification
–Fitness (Inferential Loss)

21 References: Graphical Models and Inference Algorithms
Inference Algorithms
–Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988)
    http://citeseer.nj.nec.com/huang94inference.html
–(Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989)
    http://citeseer.nj.nec.com/shachter94global.html
–Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996)
    http://citeseer.nj.nec.com/dechter96bucket.html
–Recommended Books
    Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
    Castillo, Gutierrez, Hadi (1997)
    Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
–Stochastic Approximation
    http://citeseer.nj.nec.com/cheng00aisbn.html
Bioinformatics
–European Bioinformatics Institute Tutorial: Brazma et al. (2001)
    http://www.ebi.ac.uk/microarray/biology_intro.htm
–K-State BMI Group: literature survey and resource catalog (2002)
    http://www.kddresearch.org/Groups/Bioinformatics

22 Acknowledgements
Kansas State University Lab for Knowledge Discovery in Databases
–Undergraduates
    Jeff Barber
    Andrew King
–Graduate Students
    Chris Meyer
    Julie A. Thornton
Other Universities
–Carnegie Mellon University: Dr. Clark Glymour, Dr. Richard Scheines
–Iowa State University: Dr. Vasant Honavar, Dr. Dimitris Margaritis, Dr. Jin Tian
BNJ v3 Test Sites

23 For More Information
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Bayes Net Toolbox (BNT) – Murphy (1997-present)
–Distribution page
    http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
–Development group
    http://groups.yahoo.com/group/BayesNetToolbox
Bayesian Network tools in Java (BNJ) – Hsu et al. (2000-present)
–Distribution page
    http://bnj.sourceforge.net
–Development group
    http://groups.yahoo.com/group/bndev
–Current (re)implementation projects for KSU KDD Lab
    Continuous state: Minka (2002) – Hsu, Barber
    Formats: XML BNIF (MSBN), Netica – Guo, Hsu
    Bounded cutset conditioning – Chandak
    Space-efficient DBN inference

