1
Q/A System, First Stage: Classification
Project by: Abdullah Alotayq, Dong Wang, Ed Pham
2
Query Processing
Classification package: Mallet
Classifiers: MaxEnt, DecisionTree, C45, NaiveBayes, AdaBoost, Winnow, Balanced Winnow, Bagging Trainer, etc.
3
Main Techniques
4
Features
– Semantic
– Morphological
– Neighboring (syntactic)
5
Stemming
NLTK stemmer
6
N-grams
Bigrams:
7
Trigrams:
– Poor classification results: 0.48, 0.478
– Not a good strategy.
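A minimal sketch of how bigram and trigram features could be extracted from a question. The whitespace tokenization, the lowercasing, and the function names `ngrams` and `ngram_features` are assumptions made for illustration; the slides do not show the project's own extraction code.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(question: str, n: int) -> List[str]:
    """Lowercase, split on whitespace (an assumption), join each n-gram with '_'."""
    tokens = question.lower().split()
    return ["_".join(gram) for gram in ngrams(tokens, n)]


question = "Who wrote the Iliad ?"  # made-up example question
bigram_feats = ngram_features(question, 2)
trigram_feats = ngram_features(question, 3)
print(bigram_feats)
print(trigram_feats)
```

A short question yields only a handful of trigrams, and each one is unlikely to recur across questions. That sparsity is one plausible reason for the weak trigram results reported here.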
8
NER (Named Entity Recognition)
Used NLTK's pre-trained NER model for this task.
6 types of NE.
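The sketch below shows one way named-entity output could be turned into classifier features. It assumes the tagger's output has already been reduced to `(text, label)` pairs. The function name `ne_features`, the `NE=` feature prefix, and the `NE=NONE` fallback are hypothetical choices for illustration and are not taken from the project. The six labels are the ones listed in the frequency tables of this deck.

```python
from collections import Counter
from typing import Dict, List, Tuple

# The six entity types reported in this deck's frequency tables.
NE_TYPES = ("GSP", "FACILITY", "GPE", "PERSON", "LOCATION", "ORGANIZATION")


def ne_features(entities: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count entities per type; emit a fallback feature when none is found.

    `entities` is assumed to be a list of (text, label) pairs produced by
    some NE tagger; labels outside NE_TYPES are ignored.
    """
    counts = Counter(label for _, label in entities if label in NE_TYPES)
    feats = {f"NE={label}": n for label, n in counts.items()}
    if not feats:
        # Hypothetical fallback so that entity-free questions still get a feature.
        feats["NE=NONE"] = 1
    return feats


with_entities = ne_features([("Homer", "PERSON"), ("Greece", "GPE")])
without_entities = ne_features([])
print(with_entities)
print(without_entities)
```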
9
Frequencies
Training Data:
Type          Freq.
GSP           22
FACILITY      3
GPE           1203
PERSON        600
LOCATION      21
ORGANIZATION  622
10
Test Data:
Type          Freq.
GSP           2
FACILITY      0
GPE           90
PERSON        35
LOCATION      3
ORGANIZATION  42
11
No named entity detected
In training data: 3533 instances (64.8%)
In test data: 353 instances (70.6%)
-> data sparseness problem
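As a quick check on the scale of the sparseness problem, the counts and percentages above imply approximate data set sizes. The totals computed below are inferred by arithmetic from the reported figures. They are not stated on the slides, and the helper `implied_total` is a hypothetical name.

```python
def implied_total(count: int, share: float) -> int:
    """Estimate a data set size from a count and the share it represents."""
    return round(count / share)


# Reported figures: instances with no named entity detected.
train_no_ne, train_share = 3533, 0.648
test_no_ne, test_share = 353, 0.706

train_total = implied_total(train_no_ne, train_share)  # inferred, not from the slides
test_total = implied_total(test_no_ne, test_share)     # inferred, not from the slides

# Instances for which the NE features carry any signal at all.
train_with_ne = train_total - train_no_ne
test_with_ne = test_total - test_no_ne
print(train_total, train_with_ne)
print(test_total, test_with_ne)
```

Under these inferred totals, the NE features can only help on roughly a third of the instances, which fits the small difference between the NER and non-NER accuracies reported in this deck.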
12
NER Results & Future Work
Test data accuracy = 0.802
We might try other NE tools that offer more NE types and cover a larger share of the training and test data.
13
Binary and Real Values
Testing for potential improvement.
Best-performing classifiers:
For binary values:
– BalancedWinnow: test data accuracy = 0.804
– MaxEnt: mean test accuracy = 0.78
For real values:
– BalancedWinnow: test data accuracy = 0.784
– MaxEnt: test data accuracy = 0.758
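The difference between the two feature encodings can be sketched as follows. This assumes that "real values" means raw token counts and "binary" means presence or absence. The slides do not define either term, so both readings, along with the function names, are assumptions.

```python
from collections import Counter
from typing import Dict, List


def real_valued_features(tokens: List[str]) -> Dict[str, int]:
    """Map each token to the number of times it occurs (assumed 'real value')."""
    return dict(Counter(tokens))


def binary_features(tokens: List[str]) -> Dict[str, int]:
    """Map each distinct token to 1, ignoring how often it occurs."""
    return {tok: 1 for tok in set(tokens)}


tokens = "what is the name of the river".split()  # made-up example question
real_feats = real_valued_features(tokens)
bin_feats = binary_features(tokens)
print(real_feats["the"], bin_feats["the"])
```

For short questions, most tokens occur once, so the two encodings differ only on a few repeated words.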
14
Data set 1:
Type                  Trainer         Result
Binary                BalancedWinnow  0.804
                      DecisionTree    0.68
                      MaxEnt          0.756
                      NaiveBayes      0.546
Real Values           BalancedWinnow  0.784
                      DecisionTree    0.42
                      MaxEnt          0.758
                      NaiveBayes      0.54
NER Binary            BalancedWinnow  0.802
                      DecisionTree    0.5
                      MaxEnt          0.772
                      NaiveBayes      0.54
NER Real Values       DecisionTree    0.48
                      MaxEnt          0.768
                      NaiveBayes      0.538
Bigrams Binary        MaxEnt          0.702
                      NaiveBayes      0.624
                      BalancedWinnow  0.76
Bigrams Real Values   MaxEnt          0.698
                      NaiveBayes      0.624
                      BalancedWinnow  0.76
Trigrams Binary       NaiveBayes      0.4
                      BalancedWinnow  0.478
Trigrams Real Values  NaiveBayes      0.4
                      BalancedWinnow  0.478
15
Data set 2:
Type                 Trainer         Result
Binary               BalancedWinnow  0.74
                     MaxEnt          0.74
                     NaiveBayes      0.72
Real Values          BalancedWinnow  0.784
                     MaxEnt          0.75
                     NaiveBayes      0.71
Stemmed Binary       BalancedWinnow  0.78
                     MaxEnt          0.76
                     NaiveBayes      0.76
Stemmed Real Values  BalancedWinnow  0.75
                     MaxEnt          0.77
16
Proposed future improvements
– WordNet senses
– Class-specific related words
17
Issues
Poor performance on some refinements.
– Low accuracy scores: 0.42, 0.54
Memory-consuming classifiers.
– Classifiers showed some error messages.
18
Successes
– Made progress in creating the system.
– Gained hands-on experience with classifiers and NLP packages.
– Learned ways to improve classification results.
19
Readings that helped
Employing Two Question Answering Systems in TREC-2005, Sanda Harabagiu et al.
20
Software packages used
– Mallet
– NLTK
– Porter stemmer
– Self-written code files
– Stanford Parser, Berkeley Parser