Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Mining Positive and Negative Patterns for Relevance Feature Discovery
Presenter: Cheng-Hui Chen
Authors: Yuefeng Li, Abdulmohsen Algarni, Ning Zhong
KDD 2010
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
Over the years, people have often held the hypothesis that pattern-based methods should describe user preferences better than term-based ones, but many experiments do not support this hypothesis.
Many text mining methods consider only the frequency distributions of terms.
Objectives
The innovative technique presented in this paper makes a breakthrough on this difficulty.
The proposal is to consider both the distributions of terms and their specificities when using them for text mining and classification.
Methodology
(Overview figure: a frequency weight and a specificity weight are combined into a new term weight.)
Definitions
Frequent pattern
─ Absolute support: sup_a(X) = |coverset(X)|, the number of positive documents containing the termset X.
─ Relative support: sup_r(X) = |coverset(X)| / |D+|.
─ A termset X is called frequent if sup_a(X) (or sup_r(X)) >= min_sup.
Closed pattern
─ Cls(X) = termset(coverset(X))
─ A termset X is called closed if and only if X = Cls(X), i.e., sup_a(X1) < sup_a(X) for all patterns X1 ⊃ X.
Closed sequential pattern: the same closure condition applied to sequential patterns.
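The support and closure definitions above can be sketched directly in code. This is a minimal illustration, assuming the positive set D+ is a list of documents, each represented as a set of terms; the names coverset, sup_a, sup_r, and Cls follow the slide notation.

```python
# Sketch of the frequent-pattern and closed-pattern definitions, assuming
# D+ is a list of documents, each a set of terms (slide notation preserved).

def coverset(X, D_pos):
    """Positive documents whose term set contains every term of X."""
    return [d for d in D_pos if X <= d]

def sup_a(X, D_pos):
    return len(coverset(X, D_pos))           # absolute support

def sup_r(X, D_pos):
    return sup_a(X, D_pos) / len(D_pos)      # relative support

def cls(X, D_pos):
    """Cls(X) = termset(coverset(X)): terms shared by all covering docs."""
    docs = coverset(X, D_pos)
    return set.intersection(*docs) if docs else set(X)

def is_closed(X, D_pos):
    return X == cls(X, D_pos)

D_pos = [{"t1", "t2"}, {"t1", "t2", "t3"}, {"t2", "t4"}]
print(sup_a({"t1"}, D_pos))   # 2
print(cls({"t1"}, D_pos))     # {"t1", "t2"} -> {"t1"} is not closed
```

In this toy example {"t1"} is not closed because every document covering it also contains "t2", so {"t1", "t2"} carries the same support.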
The deploying method
To improve the efficiency of pattern taxonomy mining (PTM), an algorithm SPMining(D+, min_sup) discovers the closed sequential patterns.
─ For a given term t, its support (also called its weight) in the discovered patterns is obtained by deploying the patterns onto their constituent terms.
─ The resulting rank is assigned to every incoming document d to decide its relevance.
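The deploying step can be sketched as follows. This is an illustrative reading, assuming each discovered pattern carries its relative support, each term receives an equal share of the support of every pattern containing it, and a document's rank sums the weights of the terms it contains; the paper's exact normalization may differ.

```python
# Hedged sketch of deploying pattern supports onto terms and ranking a
# document. Assumption: weight(t) accumulates sup_r(p)/|p| over patterns p
# containing t; the paper's exact normalization may differ.
from collections import defaultdict

def deploy(patterns):
    """patterns: list of (termset, relative_support) pairs."""
    weight = defaultdict(float)
    for terms, sup in patterns:
        for t in terms:
            weight[t] += sup / len(terms)   # equal share of the support
    return weight

def rank(doc, weight):
    """Sum of weight(t) over every weighted term present in doc."""
    return sum(w for t, w in weight.items() if t in doc)

patterns = [({"global", "economy"}, 0.6), ({"economy"}, 0.8)]
w = deploy(patterns)
print(round(rank({"economy", "news"}, w), 2))   # 1.1
```

The unseen term "news" contributes nothing, while "economy" collects support from both patterns.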
Mining Algorithms
(Figure: pseudo-code of the pattern-mining algorithm.)
Specificity of low-level features
The specificity of a given term t in the training set D = D+ ∪ D− is defined as follows:
─ (equation shown on the slide)
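Since the slide's equation was lost with the image, here is one plausible sketch of a term-specificity score over D = D+ ∪ D−. The exact formula — spe(t) = (|coverset+(t)| − |coverset−(t)|) / |D+| — is an assumption for illustration: terms concentrated in positive documents score high, terms common in negative documents score low.

```python
# Hedged sketch of term specificity over D = D+ ∪ D-.
# Assumed formula (not verified against the paper):
#   spe(t) = (|coverset+(t)| - |coverset-(t)|) / |D+|

def specificity(t, D_pos, D_neg):
    n_pos = sum(1 for d in D_pos if t in d)   # positive docs containing t
    n_neg = sum(1 for d in D_neg if t in d)   # negative docs containing t
    return (n_pos - n_neg) / len(D_pos)

D_pos = [{"mining", "pattern"}, {"mining", "pattern"}]
D_neg = [{"mining", "sports"}, {"sports"}]
print(specificity("pattern", D_pos, D_neg))   # 1.0 (only in D+)
print(specificity("mining", D_pos, D_neg))    # 0.5 (also in one D- doc)
```

"pattern" appears only in positive documents and gets the maximum score, while "mining" is penalized for also covering a negative document.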
Revision of discovered features
─ (revision equations shown on the slide)
Revision Algorithms
(Figure: pseudo-code of the feature-revision algorithm.)
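The revision idea — classifying low-level terms into three categories, as the conclusions later recommend — can be sketched as follows. The threshold theta and the update rules here are illustrative assumptions, not the paper's exact equations: positive-specific terms are boosted, general terms kept, and negative-specific terms penalized.

```python
# Hedged sketch of weight revision via three term categories.
# The threshold and update rules are illustrative assumptions only.

def revise(weights, spe, theta=0.2):
    revised = {}
    for t, w in weights.items():
        s = spe.get(t, 0.0)
        if s > theta:              # positive specific term: boost
            revised[t] = w * (1 + s)
        elif s < -theta:           # negative specific term: s < 0 shrinks w
            revised[t] = w * (1 + s)
        else:                      # general term: keep as is
            revised[t] = w
    return revised

weights = {"pattern": 1.0, "mining": 1.0, "sports": 1.0}
spe = {"pattern": 0.5, "mining": 0.0, "sports": -0.5}
print(revise(weights, spe))   # pattern boosted, mining kept, sports shrunk
```

The three-way split lets negative documents reduce the influence of misleading terms without discarding general vocabulary.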
Experiments
Data
─ This research uses Reuters Corpus Volume 1 (RCV1) and the 50 assessor topics to evaluate the proposed model.
Baselines
─ Up-to-date pattern-mining methods
─ Well-known term-based methods
Experiments
The well-known term-based methods:
─ The Rocchio model
─ BM25
─ SVM
Experiments
(Result tables and figures comparing the proposed model with the baselines.)
Conclusions
Compared with the state-of-the-art models, the experiments on RCV1 and the TREC topics demonstrate that the effectiveness of relevance feature discovery can be significantly improved by the proposed approach.
This paper recommends classifying low-level terms into three categories in order to largely improve the performance of the revision.
Comments
Advantages
─ The effectiveness of relevance feature discovery can be significantly improved by the proposed approach.
Drawback
─ …
Applications
─ Text mining
─ Classification