Constructing a Large Node Chow-Liu Tree Based on Frequent Itemsets Kaizhu Huang, Irwin King, Michael R. Lyu Multimedia Information Processing Laboratory.


Constructing a Large Node Chow-Liu Tree Based on Frequent Itemsets
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king, ...}
ICONIP 2002, November 19, 2002, Orchid Country Club, Singapore

Outline
 Background: Probabilistic Classifiers, Chow-Liu Tree
 Motivation
 Large Node Chow-Liu Tree
 Experimental Results
 Conclusion

A Typical Classification Problem
 Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

Background
 Probabilistic Classifiers
The classification function is defined as
c(A1, A2, ..., An) = argmax_c P(C = c | A1, A2, ..., An) = argmax_c P(A1, A2, ..., An | C = c) P(C = c) / P(A1, A2, ..., An),
where the denominator P(A1, A2, ..., An) is a constant for a given instance of A1, A2, ..., An. The joint probability P(A1, A2, ..., An | C = c) is not easily estimated from the dataset, so an assumption has to be made about the distribution, i.e., about the dependence or independence relationships among the variables.
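To make the argmax concrete, here is a minimal sketch (not the authors' code) of Bayes-rule classification; the names `prior` and `joint_likelihood`, and the toy naive-independence likelihood, are illustrative assumptions.

```python
import numpy as np

def classify(x, classes, prior, joint_likelihood):
    # P(A1..An) is constant for a fixed instance x, so the argmax over
    # classes only needs the numerator P(c) * P(x | c).
    scores = [prior[c] * joint_likelihood(x, c) for c in classes]
    return classes[int(np.argmax(scores))]

# Toy usage with a fully independent (naive) likelihood, one possible
# assumption about the joint distribution.
prior = {0: 0.6, 1: 0.4}
cond = {0: [0.9, 0.2, 0.7], 1: [0.3, 0.8, 0.4]}   # P(Ai = 1 | c)

def naive_likelihood(x, c):
    return np.prod([p if xi == 1 else 1 - p for xi, p in zip(x, cond[c])])

print(classify((1, 0, 1), [0, 1], prior, naive_likelihood))   # -> 0
```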

Background
 Chow-Liu Tree (CLT)
Assumption: a dependence tree exists among the variables, given the class variable C. (Figure: a tree dependence structure over the attributes.)
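For intuition, a minimal sketch of how a dependence tree can be drafted from data in the Chow-Liu style: weight every attribute pair by empirical mutual information and grow a maximum-weight spanning tree. In the class-conditional setting one such tree would typically be built per class from that class's data; the function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mutual_information(x, y):
    # Empirical mutual information I(X; Y) between two discrete columns.
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_edges(data):
    # data: (n_samples, n_attrs) array of discrete attribute values.
    # Weight every attribute pair by mutual information, then grow a
    # maximum-weight spanning tree with Prim's algorithm.
    n_attrs = data.shape[1]
    w = np.zeros((n_attrs, n_attrs))
    for i in range(n_attrs):
        for j in range(i + 1, n_attrs):
            w[i, j] = w[j, i] = mutual_information(data[:, i], data[:, j])
    in_tree, edges = {0}, []
    while len(in_tree) < n_attrs:
        i, j = max(((i, j) for i in in_tree for j in range(n_attrs)
                    if j not in in_tree), key=lambda e: w[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Toy usage: attribute 1 copies attribute 0, attribute 2 is unrelated,
# so the strongest edge in the tree is (0, 1).
data = np.array([[0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]] * 25)
print(chow_liu_edges(data))
```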

Background
 Chow-Liu Tree
Advantages: accuracy comparable with some of the state-of-the-art classifiers; the tree structure gives it resistance to the over-fitting problem and a decomposition characteristic.
Disadvantages: it cannot model non-tree dependence relationships among attributes or variables.
(Figure: a dependence tree over the attributes, rooted at the class variable.)

Motivation
1. Fig. (b) can represent the same independence relationship as Fig. (a): given B and E, A, C, and D are independent of one another.
2. Fig. (b) is still a tree structure, which inherits the advantages of a tree.
3. By combining several nodes, a large node tree structure can represent a non-tree structure.
This motivates our Large Node Chow-Liu Tree approach. (Figures (a) and (b) show the two structures, with the class variable at the root.)

Overview of Large Node Chow-Liu Tree (LNCLT)
Step 1. Draft the Chow-Liu tree: draft the CL-tree of the dataset according to the CLT algorithm.
Step 2. Refine the Chow-Liu tree: refine the CL-tree into a large node Chow-Liu tree based on some combination rules.
(Figure: the underlying structure, the tree drafted in Step 1, and the refined tree after Step 2, which represents the same independence relationship.)

Combination Rules
 Bounded cardinality: the cardinality of each large node should not be greater than a bound k.
 Frequent itemsets: each large node should be a frequent itemset.
 Father-son or sibling relationship: the nodes in a large node should have a father-son or sibling relationship.

Combination Rules (1)
 Bounded Cardinality: the cardinality of each large node (the number of nodes combined into it) should not be greater than a bound k. For example, if we set k to the number of attributes or variables, the LNCLT degenerates into a single large node, and such a "one node tree" loses all the merits of a tree.

Combination Rules (2)
 Frequent Itemsets: a frequent itemset is a set of attributes that frequently occur with each other. Food store example: if customers who buy {bread} are very likely to also buy {butter}, then {bread, butter} is called a frequent itemset. Frequent itemsets are natural candidates for large nodes, since the attributes in a frequent itemset act much like a single attribute: they frequently occur with each other at the same time. (A small mining sketch is given below.)
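For illustration only, a brute-force sketch of mining size-bounded frequent itemsets with a minimum-support threshold (the paper uses Apriori [AS1994], which prunes candidates level by level; the names and thresholds below are assumptions):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, k):
    # Count every candidate itemset of size 2..k and keep those whose
    # support (fraction of transactions containing it) reaches min_support.
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    for size in range(2, k + 1):
        for cand in combinations(items, size):
            support = sum(set(cand) <= set(t) for t in transactions) / n
            if support >= min_support:
                frequent[cand] = support
    return frequent

# Toy usage: {bread, butter} comes out frequent.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"},
           {"milk"}, {"bread", "butter"}]
print(frequent_itemsets(baskets, min_support=0.5, k=2))
```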

Combination Rules (3)
 Father-son or sibling relationship
 Combining father-son or sibling nodes increases the data fitness of the tree structure on the dataset (proved in the paper).
 Combining father-son or sibling nodes keeps the graphical structure a tree, whereas combining nodes that are neither father-son nor siblings may result in a non-tree structure.
(Figures: a father-son combination and a sibling combination.)
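A small hedged sketch of this rule: with the drafted tree stored as a child-to-parent map, two nodes are allowed to merge only if one is the parent of the other or they share a parent. The helper name is hypothetical.

```python
def combinable(u, v, parent):
    # True if u and v have a father-son or sibling relationship in the
    # tree given by the `parent` map (the root maps to None).
    father_son = parent.get(u) == v or parent.get(v) == u
    siblings = parent.get(u) is not None and parent.get(u) == parent.get(v)
    return father_son or siblings
```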

Constructing the Large Node Chow-Liu Tree
1. Generate the frequent itemsets: call Apriori [AS1994] to generate the frequent itemsets whose size is no greater than k. Record all the frequent itemsets together with their frequencies in a list L.
2. Draft the Chow-Liu tree: draft the CL-tree of the dataset according to the CLT algorithm.
3. Combine nodes based on the combination rules: iteratively combine the frequent itemset with maximum frequency that satisfies the combination condition (father-son or sibling relationship), until L is empty. (A sketch of this combination loop is given below.)
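Putting the pieces together, a simplified sketch of step 3 under the assumption of 2-itemsets: repeatedly take the most frequent remaining itemset, merge it if its members satisfy the father-son or sibling condition, and skip itemsets that overlap an already merged one. It reuses the hypothetical `combinable` helper above and is not the authors' implementation.

```python
def combine_nodes(frequent, parent):
    # frequent: dict mapping a 2-itemset (u, v) -> its frequency.
    # parent: child -> parent map for the drafted Chow-Liu tree.
    remaining = sorted(frequent, key=frequent.get, reverse=True)
    large_nodes, used = [], set()
    for u, v in remaining:
        if used & {u, v}:                 # overlaps an already merged itemset
            continue
        if combinable(u, v, parent):      # father-son or sibling relationship
            large_nodes.append((u, v))
            used |= {u, v}
    return large_nodes
```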

Example: Constructing the LNCLT
We assume k is 2. After step 1, we get the frequent itemsets {A, B}, {A, C}, {B, C}, {B, E}, {B, D}, {D, E}, with f({B, C}) > f({A, B}) > f({B, E}) > f({B, D}) > f({D, E}), where f(*) denotes the frequency of a frequent itemset. (b) is the CLT obtained in step 2.
1. {A, C} does not satisfy the combination condition; filter out {A, C}.
2. f({B, C}) is the largest and {B, C} satisfies the combination condition; combine them into (c).
3. Filter out the frequent itemsets that overlap with {B, C}; {D, E} is left.
4. {D, E} is a frequent itemset and satisfies the combination condition; combine them into (d).

Experimental Setup
 Dataset: MNIST handwritten digit database (28x28 gray-level bitmaps); training dataset size: ...; testing dataset size: ...
 Experimental environment: platform: Windows 2000; development tool: Visual C++ 6.0

Experiments  Data fitness Comparison

Experiments  Data fitness Comparison

Experiments  Recognition Rate

Future Work
 Evaluate our algorithm extensively on other benchmark datasets.
 Examine other combination rules.

Conclusion
 A novel Large Node Chow-Liu Tree is constructed based on frequent itemsets.
 LNCLT can partially overcome the disadvantage of CLT, i.e., its inability to represent non-tree structures.
 We demonstrate theoretically and experimentally that our LNCLT model has better data fitness and better prediction accuracy.

Main References
 [AS1994] R. Agrawal and R. Srikant (1994). Fast algorithms for mining association rules. In Proc. VLDB 1994.
 [Chow, Liu 1968] C. K. Chow and C. N. Liu (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), pp. 462-467.
 [Friedman 1997] N. Friedman, D. Geiger, and M. Goldszmidt (1997). Bayesian network classifiers. Machine Learning, 29, pp. 131-163.
 [Cheng 1997] J. Cheng, D. A. Bell, and W. Liu (1997). Learning belief networks from data: an information theory based approach. In Proceedings of ACM CIKM'97.
 [Cheng 2001] J. Cheng and R. Greiner (2001). Learning Bayesian belief network classifiers: algorithms and system. In E. Stroulia and S. Matwin (Eds.), AI 2001, LNAI 2056.

Q & A Thanks.