1
Deliverable #2: Question Classification
Group 5
Caleb Barr, Maria Alexandropoulou
2
Software used
– Java, for feature extraction
– Illinois Chunker, to extract chunks
– Python
  – Automating classification tasks
  – Preprocessing of data when necessary
– Mallet, for the classification task
3
System Properties
Classification algorithms
– MaxEnt
– NaiveBayes
Training data, the sum of:
– Li and Roth training set 5 (5500 questions)
– TREC-2004
Test data
– Li and Roth test data set
– TREC-2005.xml
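Since the runs target coarse classes, the coarse label has to be separated from the fine one somewhere in preprocessing. The Li and Roth files label each question as `COARSE:fine` followed by the question text. A minimal sketch, assuming that layout; the function name `split_coarse_label` and the example line are hypothetical, and this is not the group's actual preprocessing code:

```python
def split_coarse_label(line):
    """Split a 'COARSE:fine question text' line into (coarse, fine, question)."""
    # The label is the first whitespace-delimited token on the line.
    label, _, question = line.strip().partition(" ")
    # The coarse class precedes the colon, the fine class follows it.
    coarse, _, fine = label.partition(":")
    return coarse, fine, question


# Made-up example line in the assumed format.
example = "NUM:date When was the bridge built ?"
coarse, fine, question = split_coarse_label(example)
print(coarse, "|", fine, "|", question)
```

Only `coarse` would be kept as the class label when training a coarse classifier.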
4
System Properties (cont.)
Features extracted
Focused on syntactic features, since we targeted coarse classification (per the conclusion in Li and Roth)
– Unigrams
– Bigrams
– Trigrams
– Chunks with POS tags, e.g. [NP (DT) (JJ) (NN)]
– Head NP/VP chunks as in Li and Roth, e.g. [NP (DT the) (JJS oldest)] in "What is the oldest profession?"
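The n-gram templates listed above can be illustrated with a short sketch. The group's extractor was written in Java; the Python version below is only an illustration, and it assumes whitespace tokenization, no lowercasing, and `_` as the joiner, none of which the slides specify. The names `ngrams` and `extract_features` are hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(question, orders=(1, 2, 3)):
    """Collect n-gram features of the requested orders for one question."""
    tokens = question.split()  # assumption: input is already tokenized by spaces
    feats = []
    for n in orders:
        feats.extend(ngrams(tokens, n))
    return feats


# Unigrams + bigrams for the example question from the slide.
features = extract_features("What is the oldest profession ?", orders=(1, 2))
print(features)
```

Passing a different `orders` tuple selects a different feature template, for example `(1,)` for unigrams only.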
5
Runs performed
Runs were performed for all combinations of classification algorithms and feature templates, e.g.
– MaxEnt, Unigrams
– NaiveBayes, Unigrams, Bigrams, Chunks
– etc.
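One way to enumerate such a grid of runs is sketched below. It reads "all combinations" as every non-empty subset of the five feature templates, crossed with both classifiers; the slides do not state the exact set of runs, so that reading and the names used here are assumptions.

```python
from itertools import combinations, product

CLASSIFIERS = ["MaxEnt", "NaiveBayes"]
TEMPLATES = ["Unigrams", "Bigrams", "Trigrams", "Chunks", "Heads"]


def feature_subsets(templates):
    """Yield every non-empty subset of the feature templates, smallest first."""
    for size in range(1, len(templates) + 1):
        for subset in combinations(templates, size):
            yield subset


# Each run pairs one classifier with one subset of feature templates.
runs = list(product(CLASSIFIERS, feature_subsets(TEMPLATES)))
print(len(runs))  # 2 classifiers x 31 non-empty subsets = 62
```

Each `(classifier, subset)` pair could then drive one training and evaluation call from the automation script.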
6
Charts
8
Conclusions
Maximum test accuracy
– TREC10: 0.892 (Unigrams + Bigrams + Heads, MaxEnt)
– TREC2005: 0.81758 (Unigrams + Bigrams + Heads, NaiveBayes; MaxEnt was very close)
Trigrams affect accuracy negatively: a bad feature
9
Sample confusion matrix for our best accuracy
TREC_10_MaxEnt_UnigramBigramHeads:
label       0    1    2    3    4    5  total
0 DESC    136    2    -    -    -    -    138
1 ENTY     12   76    -    2    -    4     94
2 ABBR      2    -    7    -    -    -      9
3 HUM       1    4    -   59    -    1     65
4 NUM       9    3    -    -   98    3    113
5 LOC       5    6    -    -    -   70     81
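The reported TREC10 accuracy can be checked against this matrix: the diagonal sums to 446 of 500 questions. A minimal sketch, reading each dash as zero and assuming rows are gold labels and columns are predictions (the slide does not state the orientation, and only the per-class figure depends on it):

```python
LABELS = ["DESC", "ENTY", "ABBR", "HUM", "NUM", "LOC"]
MATRIX = [
    [136, 2, 0, 0, 0, 0],
    [12, 76, 0, 2, 0, 4],
    [2, 0, 7, 0, 0, 0],
    [1, 4, 0, 59, 0, 1],
    [9, 3, 0, 0, 98, 3],
    [5, 6, 0, 0, 0, 70],
]


def accuracy(matrix):
    """Fraction of all instances that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Diagonal count divided by row total, assuming rows are gold labels."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


print(accuracy(MATRIX))  # 446 / 500 = 0.892
print(per_class_recall(MATRIX, LABELS))
```

Under that row orientation, ABBR (7 of 9) and ENTY (76 of 94) have the lowest per-class rates, while DESC is almost always recovered.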