A Novel Approach to Event Duration Prediction
Pranav Khaitan, Divye Raj Khilnani, Ye Jin
Introduction
  Predicting event duration is a challenging problem, and solving it can address major challenges faced by question answering systems.
  Examples:
    Liverpool will be playing Inter Milan this Friday.
      The duration of the match is in hours.
    The United States has been fighting a cold war with the Soviet Union.
      The duration of the war was in decades.
Duration is Non-trivial
  The same event can have different bounds in different contexts.
    James watched a movie. (Hour)
    James watched the birds fly. (Minute)
  More Features: Subject, Object, Grammatical, Part of Speech, Tense, Modality, Context, Class, Hypernym, Aspect
System Design
  Feature Extraction: Parse Tree, Web Count, Hypernym, Named Entity Recognition
  Feature Selection: χ² score, MI score, Empirical observation
  Learning and Classification:
    Supervised Learning: Naïve Bayes, Logistic Regression, Maximum Entropy
    Unsupervised Learning: Agglomerative Clustering, Multinomial Clustering
  Evaluation: Precision, Recall, F1, Kappa, Approximate Agreement
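The evaluation measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the slides do not say which kappa variant was used, so Cohen's kappa for two label sequences is assumed here, and the toy labels ("hour", "minute") and function names are hypothetical.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, target):
    """Precision, recall and F1 for one duration class `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences (assumed variant)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if both sequences were drawn independently
    # from their own label distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: gold annotations vs. classifier output.
gold = ["hour", "hour", "minute", "minute"]
predicted = ["hour", "minute", "minute", "minute"]

p, r, f1 = precision_recall_f1(gold, predicted, "hour")
kappa = cohen_kappa(gold, predicted)
```

On this toy data the classifier agrees with the gold labels on three of four events, but half of that agreement is expected by chance, which is why kappa is lower than raw accuracy.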
Feature Analysis
  Subject-object: Jonathan is watching a movie vs. Jonathan is watching an advertisement
  Base verb lemmatization: eating, ate, has eaten, will be eating
  Tense: Jonathan will play football in the evening vs. Jonathan has been playing football for the past ten years
  Sentential Dependencies: He read the report quickly vs. He read the report slowly
  Part of speech tagging: The government’s move was anticipated
  Named Entity Recognition: The body will define the role of the United Nations
Feature Analysis
  Hypernyms
  Contextual Features
  Web Counts
  Generic Features: Modality, Aspect, Class
  Report Feature
Results
Feature Selection
  Total extracted features: 10,000+, so the feature set needs to be scaled down.
  The MI score for features drops quickly.
  Effectiveness of feature selection
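Ranking features by MI score and keeping only the top few can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: features are treated as binary (present or absent in an event), MI is measured in bits against the duration label, and the feature names and the four toy events are hypothetical.

```python
import math
from collections import Counter


def mutual_information(samples, feature):
    """MI in bits between the presence of `feature` and the class label.

    `samples` is a list of (feature_set, label) pairs.
    """
    n = len(samples)
    joint = Counter((feature in feats, label) for feats, label in samples)
    present = Counter(feature in feats for feats, _ in samples)
    labels = Counter(label for _, label in samples)
    mi = 0.0
    for (has_feature, label), count in joint.items():
        p_joint = count / n
        p_indep = (present[has_feature] / n) * (labels[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_top_k(samples, k):
    """Return the k features with the highest MI (ties broken by name)."""
    vocabulary = set().union(*(feats for feats, _ in samples))
    ranked = sorted(vocabulary, key=lambda f: (-mutual_information(samples, f), f))
    return ranked[:k]


# Hypothetical toy events: the object decides the duration, the verb does not.
samples = [
    ({"verb=watch", "obj=movie"}, "hour"),
    ({"verb=watch", "obj=birds"}, "minute"),
    ({"verb=play", "obj=movie"}, "hour"),
    ({"verb=play", "obj=birds"}, "minute"),
]

top_features = select_top_k(samples, 2)
```

In this toy set the object features carry one full bit of information about the label while the verb features carry none, so a cutoff after the top two features discards only uninformative ones. With 10,000+ real features the same ranking gives a natural cutoff where the MI score flattens out.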
Unsupervised Clustering
Conclusion
  Significant gain in event duration prediction accuracy using supervised learning
  Unsupervised learning results look promising and give an opportunity to do duration prediction across domains with little annotated data
  Important to select features automatically and reduce human involvement
  Classification Task   Our Results   Feng Pan et al.   Human Agreement
  Coarse Grain          75.16%        70.3%             87.7%
  Fine Grain            63.69%        65.8%             79.8%