1
Text Classification Using Stochastic Keyword Generation
Cong Li, Ji-Rong Wen and Hang Li
Microsoft Research Asia
August 22nd, 2003
2
Outline Introduction Text Classification Using Stochastic Keyword Generation Experimental Results Conclusion and Future Work
3
Introduction
Supervised Text Classification
  Question: how can additional data be used in training to improve performance?
New Text Classification Problem
  Summaries of the texts, which are more indicative of their contents, are available in training
  Note: summaries are not available at classification time
  Example: classification at a help desk
4
Example
Email
  When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem.
Categories
  Empty Outlook Message
  Cannot Open Word File
Summary
  receive emails; some emails have no subject and message body
5
Outline Introduction Text Classification Using Stochastic Keyword Generation Experimental Results Conclusion and Future Work
6
New Text Classification Problem
Spaces
  Users' emails: space X
  Categories: space Y
  Engineers' summaries (for training): space S
Assumption
  Summaries are much easier to classify
(Table comparing conventional and new classification by training data and test data; cell contents not recoverable.)
7
Text Classification Using SKG
Conventional Text Classification
  email: x ∈ X
    When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem.
  classification -> category: y ∈ Y
    Empty Outlook Message
Text Classification Using SKG
  email: x ∈ X (the same email as above)
  SKG -> probability vector for x: (email 0.75, receive 0.68, subject 0.45, body 0.45, …)
  classification -> category: y ∈ Y
    Empty Outlook Message
8
Stochastic Keyword Generation
Generating Keywords from a Given Text
  Stochastic Keyword Generation (SKG): generate keywords and their conditional probabilities of occurrence given the text
Example
  Text: When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem.
  Stochastic Keyword Generation output: emails 0.75, receive 0.68, subject 0.45, body 0.45
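The idea of producing one occurrence probability per keyword can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes one simplified naive Bayes classifier per summary keyword (using only the words present in the text), whitespace tokenization, and add-one smoothing. The names `KeywordModel`, `train_skg`, and `generate_keywords` are hypothetical.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and trim punctuation (illustrative choice)."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if w]


class KeywordModel:
    """Estimates P(keyword occurs in the summary | words of the text).

    Assumption: a simplified naive Bayes model over the words that are
    present in the text, with add-one smoothing.
    """

    def __init__(self):
        self.doc_count = {True: 0, False: 0}
        self.word_count = {True: defaultdict(int), False: defaultdict(int)}
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_count[label] += 1
            for word in set(tokenize(text)):
                self.word_count[label][word] += 1
                self.vocab.add(word)
        return self

    def prob(self, text):
        words = set(tokenize(text)) & self.vocab
        total = self.doc_count[True] + self.doc_count[False]
        log_score = {}
        for label in (True, False):
            n = self.doc_count[label]
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for word in words:
                score += math.log((self.word_count[label][word] + 1) / (n + 2))
            log_score[label] = score
        # Normalize the two log scores into a probability.
        top = max(log_score.values())
        pos = math.exp(log_score[True] - top)
        neg = math.exp(log_score[False] - top)
        return pos / (pos + neg)


def train_skg(texts, summaries):
    """Train one KeywordModel per word that appears in any training summary."""
    summary_tokens = [set(tokenize(s)) for s in summaries]
    keywords = set().union(*summary_tokens)
    return {
        kw: KeywordModel().fit(texts, [kw in toks for toks in summary_tokens])
        for kw in keywords
    }


def generate_keywords(models, text):
    """Map a new text to {keyword: estimated probability of occurrence}."""
    return {kw: model.prob(text) for kw, model in models.items()}
```

Calling `generate_keywords(train_skg(texts, summaries), new_text)` returns a keyword-to-probability mapping of the same shape as the example output on this slide. Summaries are needed only by `train_skg`, which matches the setting where they are unavailable at classification time.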
9
SKG Model (diagram not recoverable; input: new text x)
10
Model for Each Keyword (diagram not recoverable; input: new text x)
11
Learning Using SKG (diagram not recoverable; stages: SKG, classification)
12
Outline Introduction Text Classification Using Stochastic Keyword Generation Experimental Results Conclusion and Future Work
13
Data in Experiments
Data of the Help Desk of Microsoft
  2517 texts from 52 categories
  About 10000 unique words in texts
  About 1500 unique words in summaries
  Conducted stopword removal, but not stemming
Training/Test Split
  5-fold cross validation
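A 5-fold split over the 2517 texts can be sketched as below. The slides do not say how the folds were drawn, so the shuffling, the seed, and the function name `k_fold_indices` are assumptions made for illustration.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Assumption: items are shuffled once and dealt round-robin into k folds,
    so fold sizes differ by at most one.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train, test
```

Each text appears in exactly one test fold, and every model is trained on the remaining four folds.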
14
Experimental Settings
Classifiers
  Linear SVM (Platt 1998; Dumais et al. 1998)
  Perceptron algorithm with margins (PAM) (Li et al. 2002)
Methods
  Text classification using SKG
  Methods for comparison:
    Prior
    Texts for training
    Summaries for training
    Texts plus summaries for training
    Deterministic keyword generation (DKG)
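For readers unfamiliar with margin-based perceptrons, the following is a generic sketch of the idea: update whenever an example is not classified with at least a fixed margin. It is not claimed to match the exact algorithm of Li et al. (2002); the function names, the single `margin` parameter, and the default values are assumptions.

```python
def train_margin_perceptron(examples, margin=1.0, learning_rate=1.0, epochs=100):
    """Train a binary linear classifier on (features, label) pairs, label in {+1, -1}.

    Assumption: a plain perceptron that also updates on examples whose
    functional margin y * (w . x + b) does not exceed `margin`.
    """
    dim = len(examples[0][0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        updates = 0
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= margin:
                weights = [
                    w + learning_rate * label * x for w, x in zip(weights, features)
                ]
                bias += learning_rate * label
                updates += 1
        if updates == 0:  # every example already clears the margin
            break
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1
```

For the 52-category task, one such binary classifier per category (one-versus-rest) would be a natural arrangement, though the slides do not state how the multi-class case was handled.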
15
Experimental Results
Method            Top 1 Accuracy (%)   Top 3 Accuracy (%)
Prior             34.1                 47.8
Text (PAM)        58.7                 73.7
Sum (PAM)         50.0                 69.1
Text+Sum (PAM)    56.2                 70.4
SKG (PAM)         63.6                 78.9
Text (SVM)        57.3                 76.7
Sum (SVM)         56.7                 71.2
Text+Sum (SVM)    47.4                 73.7
SKG (SVM)         61.5                 81.5
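Top 1 and Top 3 accuracy count a test text as correct when its true category is among the 1 or 3 highest-scoring categories. A minimal sketch of that metric follows; the function name and the dictionary-based input format are assumptions.

```python
def top_k_accuracy(scores, true_labels, k):
    """Percentage of items whose true label is among the k best-scoring labels.

    `scores` is a list of {category: score} dicts, one per test item.
    """
    hits = 0
    for per_category, truth in zip(scores, true_labels):
        ranked = sorted(per_category, key=per_category.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(true_labels)
```

Top 3 accuracy can never be lower than Top 1 accuracy for the same classifier, which is consistent with every row of the table.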
16
SKG versus DKG
Method        Top 1 Accuracy (%)   Top 3 Accuracy (%)
DKG (PAM)     59.8                 73.9
SKG (PAM)     63.6                 78.9
DKG (SVM)     57.6                 76.9
SKG (SVM)     61.5                 81.5
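The slides do not define DKG. One plausible reading, assumed here purely for illustration, is that DKG keeps only a hard yes/no decision per keyword, while SKG passes the probabilities themselves to the classifier. The function names and the 0.5 threshold are hypothetical.

```python
def skg_features(keyword_probs, vocabulary):
    """Feature vector of keyword probabilities, in vocabulary order."""
    return [keyword_probs.get(keyword, 0.0) for keyword in vocabulary]


def dkg_features(keyword_probs, vocabulary, threshold=0.5):
    """Binary feature vector: 1.0 if the keyword is predicted present.

    Assumption: a keyword counts as generated when its probability
    reaches `threshold`; the slides do not specify this rule.
    """
    return [
        1.0 if keyword_probs.get(keyword, 0.0) >= threshold else 0.0
        for keyword in vocabulary
    ]


# Probabilities taken from the SKG example slide.
example_probs = {"emails": 0.75, "receive": 0.68, "subject": 0.45, "body": 0.45}
example_vocab = ["emails", "receive", "subject", "body"]
```

Under this reading, `skg_features(example_probs, example_vocab)` gives `[0.75, 0.68, 0.45, 0.45]`, while `dkg_features(example_probs, example_vocab)` gives `[1.0, 1.0, 0.0, 0.0]`. The binary version discards the evidence for "subject" and "body", which is one way the soft representation could retain more information.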
17
Discussion
email: x ∈ X
  When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem.
SKG -> probability vector for x: (email 0.75, receive 0.68, subject 0.45, body 0.45, …)
classification -> category: y ∈ Y
  Empty Outlook Message
summary (an element of S)
  receive emails; some emails have no subject and message body
18
Outline Introduction Text Classification Using Stochastic Keyword Generation Experimental Results Conclusion and Future Work
19
Conclusion
  Text classification using SKG significantly outperforms the methods that do not use it (for example, 63.6% versus 58.7% Top 1 accuracy for SKG versus texts only, with PAM)
Future Work
  Theoretical analysis of the problem and the proposed method
  Application to different settings
20
Thank You
21
Supervised Text Classification
Classifiers
  Naïve Bayesian classifiers
  Symbolic rules
  Margin classifiers: SVM, boosting, perceptron algorithm with margins
Feature Selection and Transformation
  Feature selection: information gain, mutual information, etc.
  Feature transformation: multi-layer NN, kernel tricks, etc.
22
Mapping between Spaces
Statistical Machine Translation
  Probabilistic models between two languages
Information Retrieval
  Mapping between query terms and document terms
Statistics-based Summarization
  Probabilistic models between spaces of documents and abstracts