Published by Adele Hancock. Modified over 9 years ago.
META-LEARNING FOR AUTOMATIC SELECTION OF ALGORITHMS FOR TEXT CLASSIFICATION
Karol Furdík, Ján Paralič, Gabriel Tutoky
Karol.Furdik@intersoft.sk, {Jan.Paralic, Gabriel.Tutoky}@tuke.sk
Technical University of Košice, Slovakia
September 24-26, 2008, University of Zagreb, Varaždin, Croatia
2/22 Introduction
Text classification
- A method for extracting knowledge from textual documents
- Originally designed as a semi-automatic procedure in which users were responsible for selecting the proper classification settings
- Most applications, e.g. the KP-Lab project (http://www.kp-lab.org), require fully automated text classification
Meta-learning
- Makes it possible to automate the text classification process by automatically selecting the proper algorithms
K. Furdik, J. Paralič, G. Tutoky: Meta-learning for automatic selection of algorithms for text classification. CECIIS 2008, University of Zagreb, Varaždin, Croatia, September 24-26, 2008
Theoretical analyses
4/22 Text classification: a two-step process
Creation of the classifier
- Training set of documents -> Preprocessing of documents -> Learning of classifier -> Classifier
Usage of the classifier
- Document of unknown category -> Preprocessing of current document -> Classifier application -> Categorized document
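The two steps can be illustrated with a minimal Python sketch. The whitespace tokenizer, the term-frequency profiles, and the toy training documents are assumptions made for illustration only; the slides do not say which preprocessing or learning algorithm is used.

```python
from collections import Counter

def preprocess(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()

def learn_classifier(training_set):
    """Step 1: build one term-frequency profile per category."""
    profiles = {}
    for text, category in training_set:
        profiles.setdefault(category, Counter()).update(preprocess(text))
    return profiles

def classify(profiles, text):
    """Step 2: preprocess the new document and pick the best-matching category."""
    tokens = preprocess(text)

    def score(category):
        profile = profiles[category]
        # Share of the category's term mass covered by the document's tokens.
        return sum(profile[t] for t in tokens) / sum(profile.values())

    return max(profiles, key=score)

# Toy training set (hypothetical documents and category names).
training_set = [
    ("wheat corn harvest prices", "grain"),
    ("corn exports rise", "grain"),
    ("bank raises interest rates", "money"),
    ("central bank interest policy", "money"),
]
profiles = learn_classifier(training_set)
predicted = classify(profiles, "Corn harvest expected to rise")
```

The same `preprocess` function is applied in both steps, which mirrors the two preprocessing boxes in the diagram.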
5/22 Meta-learning, MUDOF algorithm
MUDOF: Meta-learning Using Document Feature Characteristics
- Introduced in 2002 by Wai and Kwok-Yin
Meta-learning targets:
- Selection of algorithms for classifiers
- Algorithms are selected at the category level (a different algorithm can be selected for each category)
- Automate and optimize the classifier creation process
6/22 Meta-learning: scheme (1/4)
Diagram blocks shown at this step:
Construction of the meta-model
- Training set for creation of the meta-model (TM)
- Values of effectiveness
- Testing set of documents (TE)
Usage of the meta-model
- Meta-model
- Classifier
7/22 Values of effectiveness
- The algorithms A_1, ..., A_n are applied one by one to the categories C_1, ..., C_m from the training set
- n x m binary classifiers are created
- The binary classifiers are evaluated on the testing data collection
- The effectiveness of each algorithm on each category is obtained
- This is the most computationally demanding step in meta-learning
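A minimal sketch of this step, assuming F1 as the effectiveness measure and two toy "algorithms" invented for illustration (`train_always_positive` and `train_keyword` are hypothetical stand-ins, not the algorithms used by the authors). Each algorithm is trained once per category, giving n x m binary classifiers.

```python
def f1_score(true_labels, predicted_labels):
    """F1 of the positive class for two parallel lists of booleans."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_always_positive(docs, labels):
    """Toy algorithm: accepts every document."""
    return lambda tokens: True

def train_keyword(docs, labels):
    """Toy algorithm: accepts documents containing a term seen only in positives."""
    positive_terms = set().union(*[d for d, y in zip(docs, labels) if y])
    negative_terms = set().union(*[d for d, y in zip(docs, labels) if not y])
    keywords = positive_terms - negative_terms
    return lambda tokens: bool(keywords & tokens)

def effectiveness_matrix(algorithms, categories, train_docs, test_docs):
    """Train one binary classifier per (algorithm, category) and score it."""
    matrix = {}
    for name, train in algorithms.items():
        for category in categories:
            train_y = [category in labels for _, labels in train_docs]
            classifier = train([tokens for tokens, _ in train_docs], train_y)
            test_y = [category in labels for _, labels in test_docs]
            predicted = [classifier(tokens) for tokens, _ in test_docs]
            matrix[(name, category)] = f1_score(test_y, predicted)
    return matrix

# Toy data: each document is (set of terms, set of category labels).
train_docs = [
    ({"wheat", "harvest"}, {"grain"}),
    ({"corn", "exports"}, {"grain"}),
    ({"bank", "rates"}, {"money"}),
    ({"bank", "policy"}, {"money"}),
]
test_docs = [
    ({"wheat", "exports"}, {"grain"}),
    ({"bank", "loans"}, {"money"}),
    ({"corn", "prices"}, {"grain"}),
]
algorithms = {"always_positive": train_always_positive, "keyword": train_keyword}
matrix = effectiveness_matrix(algorithms, ["grain", "money"], train_docs, test_docs)
```

The nested loop makes the cost visible: every additional algorithm or category adds a full round of training and evaluation.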
8/22 Meta-learning: scheme (2/4)
Diagram blocks shown at this step:
Construction of the meta-model
- Training set for creation of the meta-model (TM)
- Values of effectiveness
- Testing set of documents (TE)
- Feature characteristics of particular categories
Usage of the meta-model
9/22 Feature characteristics
- Categories are characterized statistically
- Examples of characteristics:
  - PosTr: ratio of positive to negative instances
  - AvgDocLen: average document length
  - AvgTermVal: average term weight
  - AvgTopInfoGain: average information gain of the best m terms
  - NumInfoGainThres: number of terms whose information gain exceeds a threshold value
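A rough sketch of how some of these characteristics might be computed for one category. The following choices are assumptions, not taken from the slides: information gain is computed on binary term presence, AvgDocLen is averaged over the category's positive documents, and the values `top_m=2` and `threshold=0.5` are arbitrary. AvgTermVal is omitted because the term weighting scheme is not specified.

```python
import math

def entropy(positives, total):
    """Binary entropy (in bits) of a set with `positives` positive members."""
    if total == 0:
        return 0.0
    p = positives / total
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def info_gain(docs, labels, term):
    """Information gain of the presence of `term` with respect to the labels."""
    n = len(docs)
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = (
        len(with_term) / n * entropy(sum(with_term), len(with_term))
        + len(without_term) / n * entropy(sum(without_term), len(without_term))
    )
    return entropy(sum(labels), n) - conditional

def feature_characteristics(docs, labels, top_m=2, threshold=0.5):
    """Assumes at least one positive and one negative document."""
    positives = [d for d, y in zip(docs, labels) if y]
    negatives = len(docs) - len(positives)
    vocabulary = sorted({t for d in docs for t in d})
    gains = sorted((info_gain(docs, labels, t) for t in vocabulary), reverse=True)
    return {
        "PosTr": len(positives) / negatives,
        "AvgDocLen": sum(len(d) for d in positives) / len(positives),
        "AvgTopInfoGain": sum(gains[:top_m]) / top_m,
        "NumInfoGainThres": sum(1 for g in gains if g > threshold),
    }

# Toy collection: token lists and binary labels for a hypothetical category.
docs = [
    ["wheat", "harvest", "prices"],
    ["corn", "exports"],
    ["bank", "rates"],
    ["bank", "policy", "rates", "rise"],
]
labels = [True, True, False, False]
features = feature_characteristics(docs, labels)
```

In this toy collection only "bank" and "rates" separate the two classes perfectly, so they are the only terms above the threshold.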
10/22 Meta-learning: scheme (3/4)
Diagram blocks shown at this step:
Construction of the meta-model
- Training set for creation of the meta-model (TM)
- Values of effectiveness
- Testing set of documents (TE)
- Feature characteristics of particular categories
Usage of the meta-model
- Meta-model
11/22 Meta-model
- Models the relations between the feature characteristics of categories and the effectiveness of algorithms
- The meta-model can be:
  - Prediction (MUDOF_R): linear regression
  - Classification (MUDOF_K): k-NN
- Meta-model advantages:
  - An “engine” for selecting the proper algorithms
  - Can be used for more than one collection of documents
  - In the ideal case, it is sufficient to learn the meta-model only once; it can then be reused for selecting algorithms
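A simplified sketch of the regression variant. It assumes a single feature characteristic per category and fits one least-squares line per algorithm; a fuller version would regress on several characteristics at once. The feature values and effectiveness scores below are made-up illustration numbers, not results from the presentation, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_meta_model(feature_values, effectiveness):
    """Fit one regression line per algorithm.

    feature_values: {category: feature value}
    effectiveness:  {algorithm: {category: effectiveness}}
    """
    categories = sorted(feature_values)
    xs = [feature_values[c] for c in categories]
    return {
        algorithm: fit_line(xs, [scores[c] for c in categories])
        for algorithm, scores in effectiveness.items()
    }

def predict_effectiveness(meta_model, x):
    """Predicted effectiveness of every algorithm for feature value x."""
    return {alg: a + b * x for alg, (a, b) in meta_model.items()}

# Made-up meta-training data for three categories and two algorithms.
feature_values = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
effectiveness = {
    "A1": {"C1": 0.2, "C2": 0.5, "C3": 0.8},
    "A2": {"C1": 0.6, "C2": 0.5, "C3": 0.4},
}
meta_model = fit_meta_model(feature_values, effectiveness)

low = predict_effectiveness(meta_model, 0.2)
high = predict_effectiveness(meta_model, 0.8)
best_low = max(low, key=low.get)     # algorithm chosen for a low feature value
best_high = max(high, key=high.get)  # algorithm chosen for a high feature value
```

Once fitted, the model needs only the feature value of a new category, which is why it can be reused without re-running the costly evaluation step.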
12/22 Meta-learning: scheme (4/4)
Diagram blocks shown at this step:
Construction of the meta-model
- Training set for creation of the meta-model (TM)
- Values of effectiveness
- Testing set of documents (TE)
- Feature characteristics of particular categories
Usage of the meta-model
- Training set for creation of the classifier (TC)
- Feature characteristics of particular categories
- Meta-model
- Selection of algorithms for particular categories
- Learning of classifiers
- Classifier
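The usage phase can be sketched with a k-NN style meta-model: each category of the classifier training set is described by its feature characteristics, and the algorithm that performed best on the most similar meta-training categories is selected. The function name, the two-dimensional feature vectors, and the algorithm labels A1 and A2 are hypothetical, and the features are left unscaled for brevity.

```python
import math
from collections import Counter

def select_algorithm_knn(meta_examples, features, k=3):
    """Majority vote over the k nearest meta-training categories.

    meta_examples: list of (feature vector, best algorithm) pairs.
    """
    nearest = sorted(meta_examples, key=lambda ex: math.dist(ex[0], features))[:k]
    votes = Counter(best for _, best in nearest)
    return votes.most_common(1)[0][0]

# Made-up meta-training examples: (PosTr, AvgDocLen) and the best algorithm.
meta_examples = [
    ((0.10, 40.0), "A2"),
    ((0.20, 55.0), "A2"),
    ((0.15, 50.0), "A2"),
    ((0.90, 120.0), "A1"),
    ((0.80, 110.0), "A1"),
    ((0.85, 130.0), "A1"),
]

# Feature characteristics of the categories in the classifier training set.
tc_categories = {"C1": (0.12, 45.0), "C2": (0.88, 125.0)}

# One algorithm is selected per category; classifiers are then trained with it.
selection = {
    category: select_algorithm_knn(meta_examples, features)
    for category, features in tc_categories.items()
}
```

In practice the feature characteristics would likely need normalization so that a large-valued feature such as document length does not dominate the distance.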
Experiments
14/22 Data description
Reuters-21578
- 10 788 documents; 90 categories
- TM (3815); TC (3961); TE (3019)
- Unbalanced data

  No. of categories    Approx. no. of positive instances
  2                    up to 1500
  14                   100-550
  44                   10-100
  30                   <10

20 Newsgroups
- 19 997 documents; 20 categories
- TC (10 025); TE (9972)
- Well-balanced data

  No. of categories    Approx. no. of positive instances
  20                   1000
15/22 Experiment 1 (1/3)
- Tests the meta-learning approach on a single data set (the Reuters text collection)
- Assumption: the training set is divided into:
  - Training set for creation of the meta-model (TM)
  - Training set for creation of the classifier (TC)
- Target: higher effectiveness of the final classifier compared with the base classifiers
16/22 Experiment 1 (2/3)
Classifier effectiveness (F1 optimized measure)
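For reference, the standard definition of the F1 measure for a binary classifier is given below, where TP, FP and FN denote the numbers of true positives, false positives and false negatives. The slide does not state how the measure was optimized, so only the base definition is shown.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
\]
```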
17/22 Experiment 1 (3/3)
Selection of algorithms: over AVERAGE
18/22 Experiment 2 (1/3)
- Tests the usability of the meta-learning approach on two different document collections (Reuters and 20 Newsgroups)
- Assumptions:
  - The training set of one data collection is used to create the meta-model
  - The training set of the other data collection is used to create the classifier
- Targets:
  - Fully automatic selection of algorithms without re-learning the meta-model (a meta-model learned on another data collection is used)
  - Better effectiveness of the classifier
19/22 Experiment 2 (2/3)
Classifier effectiveness (F1 optimized measure)
20/22 Experiment 2 (3/3)
Selection of algorithms: over AVERAGE
21/22 Conclusion
Advantages of meta-learning
- Fully automated text categorization: the selection of algorithms is automatic
- Increased effectiveness of the final classifier (on one data collection)
- One meta-model can be used for various data collections
Disadvantages of meta-learning
- High demands on computing capacity and time
Thank you for your attention