Using Discretization and Bayesian Inference Network Learning for Automatic Filtering Profile Generation
Authors: Wai Lam and Kon Fan Low
Presenter: Kyu-Baek Hwang
Contents
- Introduction
- Overview of the approach
- Automatic document pre-processing
  - Feature selection
  - Feature discretization
- Learning Bayesian networks
- Experiments and results
- Conclusions and future work
Information Filtering
The Filtering Profile
An information filtering system deals with users who have a relatively stable, long-term information need. Such an information need is usually represented by a filtering profile.
Construction of the Filtering Profile
1. Collect training data through interactions with users, e.g., gathering user feedback on relevance judgments for a certain information need or topic.
2. Analyze this training data and construct the filtering profile using machine learning techniques.
3. Use the filtering profile to determine the relevance of a new document.
The Uncertainty Issue
It is difficult to specify absolutely whether a document is relevant to a topic, since it may match the topic only partially. Ex) "the economic policy of government". A probabilistic approach is appropriate for this kind of task.
An Overview of the Approach
For each topic:
- Transform each document into an internal form
- Feature selection
- Discretization of the feature values
- Gather training data through interactions with users
- Bayesian network learning
Document Representation
All stop words are eliminated. Ex) "the", "are", "and", etc.
The remaining words are stemmed. Ex) "looks" → "look", "looking" → "look", etc.
A document is represented as a vector. Each element in the vector is either the word frequency or the word weight. The word weight of term i is calculated as

  w_i = tf_i × log(N / n_i)

where tf_i is the frequency of term i in the document, N is the total number of documents, and n_i is the number of documents that contain term i.
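The tf·idf-style weight described above can be sketched in Python (a minimal illustration; the tiny corpus and stemmed tokens are invented, not taken from the paper):

```python
import math

def word_weights(doc_terms, corpus):
    """Weight each term by tf * log(N / n_i): term frequency in the
    document times the log inverse document frequency over the corpus."""
    N = len(corpus)                           # total number of documents
    weights = {}
    for term in set(doc_terms):
        tf = doc_terms.count(term)            # frequency in this document
        n_i = sum(term in d for d in corpus)  # documents containing the term
        weights[term] = tf * math.log(N / n_i)
    return weights

corpus = [["econom", "polici", "govern"],
          ["student", "take", "cours"],
          ["govern", "announc", "polici"]]
w = word_weights(corpus[0], corpus)
# "econom" appears in 1 of 3 documents; "polici" in 2 of 3
```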
Word Frequency Representation of a Document
Term id | Term    | Frequency
--------|---------|----------
21      | gover   | 3
17      | annouc  | 1
98      | take    | …
34      | student | 4
…       | …       | …
Feature Selection
The expected mutual information measure is given as

  EMIM(W_i, C_j) = Σ_{w ∈ {0,1}} Σ_{c ∈ {C_j, ¬C_j}} P(w, c) log [ P(w, c) / ( P(w) P(c) ) ]

where W_i is a feature and C_j denotes the event that the document is relevant to topic j. The mutual information measures the information contained in the term W_i about topic j. After selecting the highest-scoring features, a document is represented as a feature vector (f_1, f_2, …, f_n).
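A sketch of the expected mutual information computation, assuming binary term presence/absence and binary relevance (the smoothing-free counting here is an illustrative choice, not necessarily the paper's exact estimator):

```python
import math

def expected_mutual_information(docs, labels, term):
    """Expected mutual information between the presence/absence of a
    term and the relevance/non-relevance of a document. Cells with
    zero joint probability contribute nothing and are skipped."""
    N = len(docs)
    mi = 0.0
    for w in (True, False):          # term present / absent
        for c in (True, False):      # document relevant / not
            joint = sum((term in d) == w and l == c
                        for d, l in zip(docs, labels)) / N
            pw = sum((term in d) == w for d in docs) / N
            pc = sum(l == c for l in labels) / N
            if joint > 0:
                mi += joint * math.log(joint / (pw * pc))
    return mi

# a perfectly predictive term carries log(2) nats about relevance
docs = [{"a"}, {"a"}, {"b"}, {"b"}]
labels = [True, True, False, False]
mi = expected_mutual_information(docs, labels, "a")
```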
Discretization Scheme
The goal of discretization is to find a mapping m such that each feature value is represented by a discrete value. The mapping is characterized by a series of threshold levels (0, w_1, …, w_k) where 0 < w_1 < w_2 < … < w_k. The mapping m has the property

  m(q) = i  if  w_{i-1} ≤ q < w_i

where q is the feature value (taking w_0 = 0).
Predefined-Level Discretization
One determines the discretization level k and the threshold values in advance. Ex) Integers between 0 and 15 are discretized into three levels by the threshold values 5.5 and 10.5.
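A minimal sketch of predefined-level discretization, using the slide's example thresholds:

```python
import bisect

def discretize(q, thresholds):
    """Map a feature value to a discrete level via fixed thresholds:
    level i means thresholds[i-1] < q < thresholds[i]."""
    return bisect.bisect_left(thresholds, q)

thresholds = [5.5, 10.5]          # three levels for integers 0..15
levels = [discretize(q, thresholds) for q in range(16)]
# 0..5 -> level 0, 6..10 -> level 1, 11..15 -> level 2
```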
Lloyd’s Algorithm Consider the distribution of feature values.
Step 1: Determine the discretization level k.
Step 2: Select the initial threshold levels (y_1, y_2, …, y_{k-1}).
Step 3: Repeat the following for all i:
  - Calculate the mean feature value μ_i of the i-th region.
  - Generate all possible threshold levels between μ_i and μ_{i+1}.
  - Select the threshold level that minimizes the distortion measure

      D = Σ_i Σ_{q in region i} (q − μ_i)²

Step 4: If the distortion of the new set of threshold levels is less than that of the old set, go to Step 3; otherwise stop.
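The iteration above can be sketched as a one-dimensional Lloyd quantizer. One simplification: instead of searching all candidate thresholds between adjacent region means, this sketch places each threshold at the midpoint of the two means, which minimizes the squared-error distortion for fixed means:

```python
def lloyd_thresholds(values, k, iters=50):
    """One-dimensional Lloyd quantizer: alternately assign values to
    regions and move each threshold to the midpoint of the adjacent
    region means, until the distortion stops improving."""
    values = sorted(values)
    lo, hi = values[0], values[-1]
    # initial thresholds: split the value range evenly
    thresholds = [lo + (hi - lo) * i / k for i in range(1, k)]

    def regions(ths):
        regs = [[] for _ in range(k)]
        for v in values:
            regs[sum(v > t for t in ths)].append(v)
        return regs

    def distortion(regs, means):
        return sum((v - m) ** 2 for r, m in zip(regs, means) for v in r)

    best = float("inf")
    for _ in range(iters):
        regs = regions(thresholds)
        means = [sum(r) / len(r) if r else 0.0 for r in regs]
        new_ths = [(means[i] + means[i + 1]) / 2 for i in range(k - 1)]
        new_regs = regions(new_ths)
        new_means = [sum(r) / len(r) if r else 0.0 for r in new_regs]
        d = distortion(new_regs, new_means)
        if d >= best:                  # no improvement: stop
            break
        best, thresholds = d, new_ths
    return thresholds

# two well-separated clusters: the threshold settles between them
ths = lloyd_thresholds([1, 2, 3, 10, 11, 12], 2)
```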
Relevance Dependence Discretization (1/3)
Consider the dependency between the feature and the relevance of the topic. The relevance information entropy is given as

  Ent(S) = − Σ_j P(C_j | S) log₂ P(C_j | S)

where S is a group of feature values and P(C_j | S) is the proportion of values in S belonging to relevance class C_j.
Relevance Dependence Discretization (2/3)
The partition entropy of the regions induced by a threshold w is defined as

  E(w; S) = (|S₁| / |S|) Ent(S₁) + (|S₂| / |S|) Ent(S₂)

where S₁ is the subset of S with feature values smaller than w and S₂ = S − S₁. The more homogeneous the regions, the smaller the partition entropy. The partition entropy controls the recursive partition algorithm.
Relevance Dependence Discretization (3/3)
A criterion for the recursive partition algorithm is as follows: a partition on threshold w is accepted only if

  Ent(S) − E(w; S) > log₂(|S| − 1) / |S| + Δ(w; S) / |S|

where Δ(w; S) is defined as

  Δ(w; S) = log₂(3^k − 2) − [ k·Ent(S) − k₁·Ent(S₁) − k₂·Ent(S₂) ]

and k, k₁, and k₂ are the numbers of relevance classes in the partitions S, S₁, and S₂, respectively.
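A sketch of one step of the entropy-based partitioning: choosing the threshold that minimizes the partition entropy. The MDL stopping test described above is omitted here for brevity:

```python
import math

def entropy(labels):
    """Relevance information entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split(pairs):
    """Given (feature value, relevance label) pairs, choose the
    threshold w minimizing |S1|/|S| * Ent(S1) + |S2|/|S| * Ent(S2)."""
    pairs = sorted(pairs)
    n = len(pairs)
    best_w, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                   # no threshold between equal values
        w = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for v, c in pairs if v < w]
        right = [c for v, c in pairs if v >= w]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if e < best_e:
            best_w, best_e = w, e
    return best_w, best_e

# labels change exactly at value 2.5, so that split has zero entropy
w, e = best_split([(1, 0), (2, 0), (3, 1), (4, 1)])
```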
Bayesian Inference for Document Classification
The probability of C_j given the document D follows from Bayes' theorem:

  P(C_j | D) = P(D | C_j) P(C_j) / P(D)
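Bayes' theorem for the relevance decision can be illustrated directly (the likelihoods and priors below are made-up numbers, not results from the paper):

```python
def posterior(likelihoods, priors):
    """Bayes' theorem: P(Cj | D) = P(D | Cj) P(Cj) / P(D),
    with the evidence P(D) obtained by summing over all classes."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# two classes: relevant vs non-relevant
post = posterior([0.8, 0.3], [0.1, 0.9])
# P(relevant | D) = 0.8 * 0.1 / (0.8 * 0.1 + 0.3 * 0.9)
```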
Background of Bayesian Networks
Inference uses the evidence from the nodes that have observations to compute the probability of other nodes in the network.
Learning Bayesian Networks
- Parametric learning: the conditional probability for each node is estimated from the training data.
- Structural learning: best-first search over structures, scored by MDL. A classification-based network simplifies the structural learning process.
MDL Score for Bayesian Networks
The MDL (Minimum Description Length) score for a Bayesian network B is defined as

  MDL(B) = Σ_X MDL(X)

where X ranges over the nodes in the network. The score for each node is calculated as

  MDL(X) = L_network(X) + L_data(X)
Complexity of the Network Structure
L_network is the network description length; it corresponds to the topological complexity of the network and is computed as the cost of encoding the node's parameters:

  L_network(X) = (log₂ N / 2) · (s_X − 1) ∏_j s_j

where N is the number of training documents, s_j is the number of possible states the parent variable T_ji can take, and s_X is the number of states of X.
Accuracy of the Network Structure
The data description length is given by the following formula:

  L_data(X) = − Σ_{x, π} M(x, π) log₂ ( M(x, π) / M(π) )

where M(·) is the number of training cases that match a particular instantiation (a value x of node X together with values π of its parents). The more accurate the network, the shorter this length.
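A counts-based sketch of the data description length for a single node. The representation of training cases as dictionaries is an illustrative choice, not the paper's:

```python
import math

def data_description_length(cases, node, parents):
    """Data description length of one node: the sum over training
    cases of -log2( M(x, pi) / M(pi) ), where M counts the cases
    matching an instantiation. This is the negative log-likelihood
    term of the MDL score, computed from raw counts."""
    def m(assign):
        return sum(all(case[k] == v for k, v in assign.items())
                   for case in cases)
    total = 0.0
    for case in cases:
        joint = {k: case[k] for k in parents + [node]}
        parent = {k: case[k] for k in parents}   # m({}) = total case count
        total -= math.log2(m(joint) / m(parent))
    return total

# W is fully determined by its parent C: zero extra bits needed;
# without the parent edge, each case costs one bit
cases = [{"C": 1, "W": 1}, {"C": 1, "W": 1},
         {"C": 0, "W": 0}, {"C": 0, "W": 0}]
with_parent = data_description_length(cases, "W", ["C"])
without_parent = data_description_length(cases, "W", [])
```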
The Process of Information Filtering based on Bayesian Network Learning
1. Gather the training documents.
2. For all training documents, determine the relevance to each topic.
3. Select features for each topic (5 and 10 features were used in the experiments).
4. Discretize the feature values.
5. Learn a Bayesian network for each topic.
6. Set the probability threshold value for the relevance decision.
Each Bayesian network corresponds to a filtering profile.
Document Collections
- Reuters-21578: 29 topics. In chronological order, the earlier documents were chosen as the training set and the remaining documents were used as the test set.
- FBIS (Foreign Broadcast Information Service): 38 topics used in TREC (Text REtrieval Conference). Again, the chronologically earlier documents formed the training set and the rest the test set.
Evaluation Metrics for Information Retrieval
                      | Truly relevant | Truly non-relevant
Judged relevant       | n1             | n2
Judged non-relevant   | n3             | n4
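The usual retrieval metrics follow from the contingency table; note that the mapping of n1..n4 to specific cells is an assumption here, since the slide's table layout did not survive extraction:

```python
def evaluation_metrics(n1, n2, n3, n4):
    """Standard retrieval metrics from the 2x2 contingency table,
    assuming n1 = relevant & retrieved, n2 = non-relevant & retrieved,
    n3 = relevant & missed, n4 = non-relevant & correctly rejected."""
    precision = n1 / (n1 + n2)
    recall = n1 / (n1 + n3)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = evaluation_metrics(8, 2, 4, 86)
# precision = 8/10, recall = 8/12
```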
Filtering Performance of the Bayesian Network on the Reuters Collection
Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach
Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach
Filtering Performance of the Bayesian Network on the FBIS Collection
Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach
Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach
Conclusions and Future Work
- Discretization methods for feature values.
- Structural learning of Bayesian networks.
- With large training data, better filtering performance than the naïve Bayesian approach.