Deliverable #2: Question Classification
Group 5: Caleb Barr, Maria Alexandropoulou

Software used
– Java for feature extraction
– Illinois Chunker for extracting chunks
– Python for automating the classification tasks and for preprocessing the data when necessary
– Mallet for the classification task
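As an illustration of how the Python automation layer can drive Mallet, here is a minimal sketch; the install path, file names, and exact command-line flags are assumptions and may differ across Mallet versions.

```python
# Minimal sketch of driving Mallet's command-line tools from Python.
# The install path, file names, and exact flag spellings are assumptions;
# check your Mallet version's documentation before relying on them.
import subprocess

MALLET = "mallet-2.0.8/bin/mallet"   # hypothetical install location

def run(args):
    """Run one Mallet command and fail loudly if it errors."""
    subprocess.run([MALLET] + args, check=True)

def train_and_classify(train_txt, test_txt, trainer="MaxEnt"):
    # 1. Import the whitespace-delimited feature file into Mallet's binary format.
    run(["import-file", "--input", train_txt, "--output", "train.vectors"])

    # 2. Train a classifier (trainer can be "MaxEnt" or "NaiveBayes").
    run(["train-classifier", "--input", "train.vectors",
         "--trainer", trainer, "--output-classifier", "qc.classifier"])

    # 3. Classify the held-out questions with the trained model.
    run(["classify-file", "--input", test_txt,
         "--classifier", "qc.classifier", "--output", "predictions.txt"])

if __name__ == "__main__":
    train_and_classify("train_features.txt", "test_features.txt", trainer="MaxEnt")
```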

System Properties
Classification algorithms
– MaxEnt
– Naive Bayes
Training data
– Combination of the Li and Roth training set 5 (5,500 questions) and TREC-2004
Test data
– Li and Roth test data set
– TREC-2005.xml
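A minimal sketch of the training-data merge, assuming the Li and Roth set 5 and the TREC-2004 questions are available as plain-text, one-question-per-line label files; the file names are hypothetical.

```python
# Hypothetical sketch: build one training file from the Li and Roth training
# set 5 plus the TREC-2004 questions. File names are placeholders, not the
# project's actual paths.
def combine_training_data(paths, out_path="train_combined.txt"):
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line:                      # skip blank lines
                        out.write(line + "\n")

combine_training_data(["li_roth_train_5500.label", "trec_2004_questions.label"])
```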

System Properties (cont.)
Features extracted: we focused on syntactic features since we targeted coarse classification (following the conclusion in Li and Roth)
– Unigrams
– Bigrams
– Trigrams
– Chunks with POS tags, e.g. [NP (DT) (JJ) (NN)]
– Head NP/VP chunks as in Li and Roth, e.g. [NP (DT the) (JJS oldest)] in "What is the oldest profession ?"
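A minimal sketch of the n-gram portion of the feature extraction (the project's actual extraction was done in Java, and the chunk and head-chunk features come from the Illinois Chunker output, which is omitted here); the feature-name prefixes and the output format styled after Mallet's import-file input are assumptions.

```python
# Sketch of n-gram feature extraction for one question, emitting a line in a
# Mallet-importable format: "<instance id> <label> <feature> <feature> ...".
# Chunk and head-chunk features from the Illinois Chunker are omitted here.
def ngram_features(tokens, n, prefix):
    """Return one feature string per n-gram, e.g. 'bi=What_is'."""
    return [prefix + "_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def question_to_instance(qid, label, question):
    tokens = question.split()                    # the real system tokenizes more carefully
    feats = []
    feats += ngram_features(tokens, 1, "uni=")
    feats += ngram_features(tokens, 2, "bi=")
    feats += ngram_features(tokens, 3, "tri=")   # trigrams hurt accuracy in our runs
    return f"{qid} {label} " + " ".join(feats)

print(question_to_instance("q1", "HUM", "Who wrote the Iliad ?"))
# -> q1 HUM uni=Who uni=wrote ... bi=Who_wrote ... tri=Who_wrote_the ...
```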

Runs performed
Runs were performed for all combinations of classification algorithms and feature templates, e.g.
– MaxEnt with Unigrams
– Naive Bayes with Unigrams + Bigrams + Chunks
– etc.
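One way to script "all combinations" is to cross the two trainers with every non-empty subset of feature templates; a sketch follows, with template names as placeholders for the feature code described above.

```python
# Sketch: enumerate every (trainer, feature-template subset) run configuration.
# Template names are placeholders standing in for the extraction code above.
from itertools import combinations

TRAINERS = ["MaxEnt", "NaiveBayes"]
TEMPLATES = ["Unigrams", "Bigrams", "Trigrams", "Chunks", "Heads"]

def all_runs():
    """Yield every (trainer, non-empty feature-template subset) pair."""
    for trainer in TRAINERS:
        for r in range(1, len(TEMPLATES) + 1):
            for subset in combinations(TEMPLATES, r):
                yield trainer, subset

for trainer, subset in all_runs():
    print(trainer, "+".join(subset))   # e.g. "MaxEnt Unigrams+Bigrams+Heads"
```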

Charts

Conclusions
Maximum test accuracy
– TREC-10 (the Li and Roth test set): Unigrams + Bigrams + Heads with MaxEnt
– TREC-2005: Unigrams + Bigrams + Heads with Naive Bayes (MaxEnt was very close)
Trigrams affected accuracy negatively: a bad feature

Sample confusion matrix for our best accuracy, TREC_10_MaxEnt_UnigramBigramHeads
Labels: 0 = DESC, 1 = ENTY, 2 = ABBR, 3 = HUM, 4 = NUM, 5 = LOC
(The matrix gives, for each label, counts against labels 0–5 plus a row total.)
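For reference, the metrics read off such a matrix reduce to a few sums; a generic sketch follows, assuming rows are gold labels and columns are predicted labels in the same 0–5 order as above.

```python
# Generic helpers for a confusion matrix like the one above, assuming
# matrix[i][j] = count of questions with gold label i predicted as label j.
LABELS = ["DESC", "ENTY", "ABBR", "HUM", "NUM", "LOC"]   # indices 0-5 as above

def accuracy(matrix):
    """Diagonal (correct) count divided by the total number of questions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall(matrix, i):
    """Fraction of gold-label-i questions predicted as i (row-wise)."""
    return matrix[i][i] / sum(matrix[i])

def precision(matrix, i):
    """Fraction of predicted-label-i questions whose gold label is i (column-wise)."""
    return matrix[i][i] / sum(row[i] for row in matrix)
```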