1
Sentence Classifier for Helpdesk Emails
Anthony
6 June 2006
Supervisors: Dr. Yuval Marom, Dr. David Albrecht
2
Outline of Topics
- Introduction
- Domain
- Approach
- Feature Selection
- Context-based Sentence Classification
- Experiments and Results
- Question & Answer
3
Introduction
- Increased usage of emails: communication, organizing workflow, managing tasks
- A need to process these emails to better organize them
- Existing tasks: email summarization, email classification, spam filtering
- Use of words only
4
Sentence Types
- Can they be useful for existing tasks?
- Depend on the domain
- Examples: Invitation, Instruction, Suggestion, Complaint
5
Domain
- Domain: email helpdesk
- Specifically, email responses (more structured)
- Availability of data
- Analyze the efficiency of email responses
6
Sample Email
This is with reference to your email regarding ‘Service Pack 1’
Download and install the latest service packs from the link provided
Compaq insight Manager 7
http://www.hp.com/
Please email us in case of any queries
7
Thesis Aim
- Develop a sentence classifier
- Focus:
  - Investigate several feature selection methods
  - Investigate the use of context in sentence classification
8
Motivation
Applications:
- Identify informative sentences to summarize biographies
- Classify sentences in online product reviews
- Use important sentences to help classify documents
- Use sentences in the email to determine the sender’s intention
9
Approach
- Determine the sentence types
- Create training set
- Classification methods: Naïve Bayes, Decision Trees, SVM
10
Class Proportion
Class              %
Statement          28.5
Thanking           15.3
Request            9.8
Salutation         8.7
Instruction        8.5
Instruction-item   6.3
URL                5.4
Response-ack       4.2
Suggestion         3.7
Specification      2.8
Signature          2.2
Apology            1.5
Questions          1.5
Others             1.5
11
Feature Set
- Sentences need to be transformed into an appropriate representation for most classification algorithms
- Common features:
  - Bag-of-words (“version”, “latest”, “software”, “install”)
  - Bigram, trigram (“thank you”, “we are sorry”)
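The bag-of-words, bigram, and trigram features named on this slide can be illustrated with a minimal sketch. The tokenizer, the function names, and the example sentence are assumptions made for illustration; the slides do not say how the thesis tokenized sentences or encoded the features.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngram_features(sentence, max_n=3):
    """Count every n-gram of length 1..max_n that occurs in the sentence."""
    tokens = tokenize(sentence)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Hypothetical helpdesk sentence: unigrams form the bag-of-words,
# and longer n-grams capture phrases such as "thank you".
feats = ngram_features("Thank you for your email")
print(sorted(feats))
```

Each sentence becomes a sparse mapping from feature name to count, which is the kind of representation a Naïve Bayes, decision tree, or SVM learner can consume after the names are mapped to vector indices.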
12
Feature Selection Purpose
- High-dimensional feature space: (tens of) thousands of features
- A need to reduce the feature space to:
  - Improve computational efficiency
  - Remove redundant features
  - (Possibly) improve classification accuracy
13
Feature Selection Methods
- Stop-words removal (“of”, “the”, “a”)
- Lemmatization
- Sentence frequency
- Information gain
- Chi-square
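As an illustration of one of the listed methods, the sketch below scores a term against a class with the chi-square statistic. The slides do not give the formula used in the thesis; this assumes the standard 2x2 contingency-table form, and the toy sentences, labels, and function name are invented for the example.

```python
def chi_square(term, target_class, labelled_sentences):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    labelled_sentences is a list of (set_of_tokens, class_label) pairs.
    """
    a = b = c = d = 0
    for tokens, label in labelled_sentences:
        has_term = term in tokens
        in_class = label == target_class
        if has_term and in_class:
            a += 1  # term present, sentence in class
        elif has_term:
            b += 1  # term present, sentence in another class
        elif in_class:
            c += 1  # term absent, sentence in class
        else:
            d += 1  # term absent, sentence in another class
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# Invented toy training set, not data from the thesis.
data = [
    ({"thank", "you"}, "Thanking"),
    ({"thank", "you", "for", "the", "email"}, "Thanking"),
    ({"please", "install", "the", "patch"}, "Instruction"),
    ({"install", "the", "latest", "driver"}, "Instruction"),
]

score_thank = chi_square("thank", "Thanking", data)
score_the = chi_square("the", "Thanking", data)
print(score_thank, score_the)
```

A selection step would then keep only the highest-scoring terms. In this toy set “thank” scores higher than “the”, which appears in both classes; a stop-word such as “the” would also be removed outright by the stop-word filter.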
14
Context-based Sentence Classification
- Classify a sequence of sentences extracted from an email
- Context of a sentence refers to its surrounding sentences
Example:
...
Set serial speed at least 38.4K.
Issue AT^H carriage return.
Begin your ASCII file upload.
…
15
Context-based Sentence Classification
- Assume given the class of previous sentence
- Find the upper bound of improvement
Example:
...
Set the speed at least 38.4
Issue AT^H carriage return
Begin your ASCII file upload
…
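One way to picture "assume given the class of previous sentence" is to append the true class of the preceding sentence to each sentence's feature set. This is a sketch under assumptions: the feature name `prev_class=...`, the `START` marker, and the class labels attached to the example sentences are hypothetical, and the slides do not state how the thesis encoded context.

```python
def add_context(features_per_sentence, classes):
    """Return copies of the feature dicts, each extended with the class
    of the previous sentence (or START for the first sentence)."""
    augmented = []
    previous = "START"
    for feats, cls in zip(features_per_sentence, classes):
        extended = dict(feats)  # copy so the input is left unchanged
        extended["prev_class=" + previous] = 1
        augmented.append(extended)
        previous = cls
    return augmented


# Bag-of-words features for the three example sentences on the slide.
features = [
    {"set": 1, "the": 1, "speed": 1},
    {"issue": 1, "carriage": 1, "return": 1},
    {"begin": 1, "your": 1, "upload": 1},
]
# Hypothetical gold labels; the slide does not state the classes.
classes = ["Instruction-item", "Instruction-item", "Instruction-item"]

augmented = add_context(features, classes)
print(augmented[1])
```

Because the previous class comes from the gold labels rather than from the classifier's own predictions, results obtained this way indicate an upper bound on what context can contribute.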
16
Evaluation Metrics (1)
- Evaluation for each category
- Evaluation for average of all categories
- Common metrics: Precision, Recall, F1-measure
17
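Evaluation Metrics (2)
For a category, let P be the set of sentences the classifier assigns to it and A the set of sentences that actually belong to it.
$\text{Precision} = \frac{|P \cap A|}{|P|}$
$\text{Recall} = \frac{|P \cap A|}{|A|}$
$F_1\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$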
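The per-category metrics can be computed as in the following sketch, which uses the standard definitions of precision, recall, and F1. The function name and the toy label lists are invented for illustration and are not results from the thesis.

```python
def precision_recall_f1(gold, predicted, category):
    """Precision, recall, and F1 for one category, given parallel label lists."""
    true_positive = sum(
        1 for g, p in zip(gold, predicted) if g == category and p == category
    )
    n_predicted = sum(1 for p in predicted if p == category)
    n_actual = sum(1 for g in gold if g == category)
    precision = true_positive / n_predicted if n_predicted else 0.0
    recall = true_positive / n_actual if n_actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented toy labels.
gold = ["Request", "Request", "Thanking", "Statement"]
predicted = ["Request", "Statement", "Thanking", "Statement"]

request_scores = precision_recall_f1(gold, predicted, "Request")
thanking_scores = precision_recall_f1(gold, predicted, "Thanking")
print(request_scores, thanking_scores)
```

Here one of the two Request sentences is found and nothing else is labelled Request, so precision is 1.0 and recall is 0.5. An average over all categories would combine these per-category values; the slides do not say which averaging scheme was used.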
18
Experiment and Results
F1-measure
Classifier       Without Feature Selection   With Feature Selection
Naïve Bayes      0.666                       0.814
Decision Trees   0.829                       0.844
SVM              0.883                       0.888
19
Effect of Feature Selection
20
Class-by-Class Analysis
21
Effect of Context
F1-measure
Classifier       Without Context   With Context
Naïve Bayes      0.814             0.844
Decision Trees   0.844             0.846
SVM              0.888             0.864
22
Analysis on Context
Classifier       Corrections   Misclassifications   Difference
Naïve Bayes      45            11                   34
Decision Trees   8             6                    2
SVM              25            11                   14
Example:
...
Set the speed at least 38.4
Issue AT^H carriage return.
Begin your ASCII file upload.
…
23
Conclusion
- Feature selection methods have a positive effect
- SVM > Decision Trees > Naïve Bayes
- Context shows minor improvement
- Need more data
24
Future Work
- Parse trees: consider the structure of the sentence
- Viterbi algorithm: find the best sequence of classes to map to the sequence of sentences
- Forward-backward algorithm: include next sentences as the context to predict the current sentence
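The Viterbi idea mentioned here, finding the best sequence of classes for a sequence of sentences, can be sketched as follows. This is a generic textbook version and not an implementation from the thesis; the two classes and every probability below are invented, and the sketch assumes all probabilities are strictly positive.

```python
import math


def viterbi(emissions, transitions, start):
    """Return the most probable class sequence.

    emissions: one dict per sentence, class -> P(sentence | class)
    transitions: dict (previous_class, class) -> P(class | previous_class)
    start: dict class -> P(class at the first sentence)
    """
    classes = list(start)
    # best[i][c]: log-probability of the best path ending in class c at sentence i
    best = [{c: math.log(start[c]) + math.log(emissions[0][c]) for c in classes}]
    back = []
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for c in classes:
            prev = max(
                classes,
                key=lambda p: best[-1][p] + math.log(transitions[(p, c)]),
            )
            scores[c] = (
                best[-1][prev]
                + math.log(transitions[(prev, c)])
                + math.log(emission[c])
            )
            pointers[c] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final class.
    path = [max(classes, key=lambda c: best[-1][c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Invented numbers: the middle sentence looks slightly more like a Statement
# in isolation, but the transitions favour staying in Instruction.
start = {"Instruction": 0.5, "Statement": 0.5}
transitions = {
    ("Instruction", "Instruction"): 0.8,
    ("Instruction", "Statement"): 0.2,
    ("Statement", "Instruction"): 0.3,
    ("Statement", "Statement"): 0.7,
}
emissions = [
    {"Instruction": 0.9, "Statement": 0.1},
    {"Instruction": 0.4, "Statement": 0.6},
    {"Instruction": 0.9, "Statement": 0.1},
]

path = viterbi(emissions, transitions, start)
print(path)
```

Classifying each sentence independently would label the middle sentence a Statement, whereas the sequence-level search keeps all three as Instruction, which is the kind of correction that context is meant to provide.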
25
Question and Answer