Q/A System First Stage: Classification Project by: Abdullah Alotayq, Dong Wang, Ed Pham.


Query Processing: Classification
Package: Mallet
Classifiers: MaxEnt, DecisionTree, C45, NaiveBayes, AdaBoost, Winnow, BalancedWinnow, BaggingTrainer, etc.
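Mallet's `import-file` command reads, by default, one instance per line in the order name, label, data. The following is a minimal sketch of writing questions in that layout. The question IDs, labels, and helper names are hypothetical and not taken from the slides.

```python
def format_instance(name, label, text):
    """Return one line in the layout: name, label, then the data."""
    # Collapse internal whitespace so each instance stays on a single line.
    data = " ".join(text.split())
    return f"{name} {label} {data}"


def write_instances(rows, path):
    """Write (name, label, text) triples to `path`, one instance per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, label, text in rows:
            handle.write(format_instance(name, label, text) + "\n")


# Hypothetical examples; the labels are illustrative only.
rows = [
    ("q1", "LOCATION", "Where is the Eiffel Tower ?"),
    ("q2", "PERSON", "Who wrote   Hamlet ?"),
]
lines = [format_instance(*row) for row in rows]
```

A file produced by `write_instances` could then be imported into Mallet and passed to any of the trainers listed above.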

Main Techniques

Features: semantic, morphological, neighboring (syntactic)

Stemming: NLTK stemmer
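As a rough illustration of the stemming step, the sketch below reduces tokens to stems with NLTK's `PorterStemmer`. The slides do not show the actual preprocessing code, so the function name and the example words are assumptions. The fallback branch is a crude stand-in that only keeps the sketch runnable without NLTK; it is not the Porter algorithm.

```python
try:
    from nltk.stem import PorterStemmer  # NLTK's Porter stemmer; needs no corpus download
    _stem = PorterStemmer().stem
except ImportError:
    def _stem(word):
        """Crude suffix stripper used only when NLTK is missing; NOT the Porter algorithm."""
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word


def stem_tokens(tokens):
    """Lowercase each token and reduce it to its stem."""
    return [_stem(token.lower()) for token in tokens]


# Two inflected forms collapse to one feature after stemming.
stems = stem_tokens(["Invented", "invents"])
```

Mapping inflected forms to one stem shrinks the feature space, which is the usual motivation for stemming before classification.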

N-grams Bigrams:

Trigrams: poor classification results; not a good strategy.
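For reference, bigram and trigram features can be produced with a sliding window over the tokens. This is a minimal sketch; the example question and the space-joined feature naming are assumptions, not the project's actual code.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "what is the capital of france".split()
bigrams = ngrams(tokens, 2)    # first item: "what is"
trigrams = ngrams(tokens, 3)   # first item: "what is the"
```

Longer n-grams recur less often across short questions, so each trigram feature is seen fewer times in training. That may be one reason the trigram results were poor, though the slides do not state the cause.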

NER (Named Entity Recognition): NLTK's pre-trained NER model was used for this task. 6 types of NE.
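One way to feed NE types to a classifier is to add one feature per detected type. The sketch below assumes the tagger has already returned a list of NE type labels for a question; the `NE_` prefix and the function name are hypothetical, and the tagger call itself is left out.

```python
from collections import Counter


def add_ne_features(token_features, ne_labels):
    """Return a copy of the token features extended with one count per NE type."""
    features = dict(token_features)
    for ne_type, count in Counter(ne_labels).items():
        features[f"NE_{ne_type}"] = count
    return features


# Hypothetical inputs: one question with an ORGANIZATION, one with no entity.
features = add_ne_features({"who": 1, "founded": 1}, ["ORGANIZATION"])
plain = add_ne_features({"what": 1}, [])
```

An instance with no detected entity gains no extra feature, so the NE signal only helps on the instances where the tagger fires.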

Frequencies, Training Data:

Type          Freq.
GSP           22
FACILITY      3
GPE           1203
PERSON        600
LOCATION      21
ORGANIZATION  622

Frequencies, Test Data:

Type          Freq.
GSP           2
FACILITY      0
GPE           90
PERSON        35
LOCATION      3
ORGANIZATION  42

No named entity detected: 3533 instances (64.8%) in the training data and 353 instances (70.6%) in the test data. This is a data sparseness problem.
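The slides give counts and percentages but not the dataset sizes. A small arithmetic sketch recovers the sizes those figures imply; the totals are inferred from the reported numbers, not stated in the source.

```python
def implied_total(count, percent):
    """Return the dataset size implied by `count` being `percent` percent of it."""
    return round(count / (percent / 100))


train_total = implied_total(3533, 64.8)   # about 5452 training instances
test_total = implied_total(353, 70.6)     # about 500 test instances

# Instances that contain at least one detected entity.
train_with_ne = train_total - 3533
test_with_ne = test_total - 353
```

Under these inferred totals, roughly 1919 training and 147 test instances carry an entity. The type frequencies sum to 2471 and 172 respectively, which suggests the frequency tables count entity mentions, with some instances containing more than one.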

NER Results & Future Work
Test data accuracy=
We might try other NE tools, which could provide more NE types and cover a larger share of the training and test data.

Binary and Real Values
Testing for potential improvement. Best-performing classifiers:
For Binary:
– BalancedWinnow: Test data accuracy=
– MaxEnt: Test accuracy mean = 0.78
For Real Values:
– BalancedWinnow: Test data accuracy=
– MaxEnt: Test data accuracy= 0.758
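The slides do not define the two feature encodings. A common reading, assumed here, is that binary features record only whether a token occurs, while real-valued features record how often it occurs. The sketch below shows both on one hypothetical question.

```python
from collections import Counter


def real_valued_features(tokens):
    """Map each token to its occurrence count (one reading of 'real values')."""
    return dict(Counter(tokens))


def binary_features(tokens):
    """Map each distinct token to 1, recording presence only."""
    return {token: 1 for token in set(tokens)}


tokens = "how many states are in the united states".split()
counts = real_valued_features(tokens)   # "states" maps to 2
presence = binary_features(tokens)      # "states" maps to 1
```

Because questions are short and tokens rarely repeat, the two encodings often coincide, which would be consistent with the small accuracy differences reported between them.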

Data set 1:

Type                  Trainer         Result
Binary                BalancedWinnow  0.804
Binary                DecisionTree    0.68
Binary                MaxEnt          0.756
Binary                NaiveBayes      0.546
Real Values           BalancedWinnow  0.784
Real Values           DecisionTree    0.42
Real Values           MaxEnt          0.758
Real Values           NaiveBayes      0.54
NER Binary            BalancedWinnow  0.802
NER Binary            DecisionTree    0.5
NER Binary            MaxEnt          0.772
NER Binary            NaiveBayes      0.54
NER Real Values       DecisionTree    0.48
NER Real Values       MaxEnt          0.768
NER Real Values       NaiveBayes      0.538
Bigrams Binary        MaxEnt          0.702
Bigrams Binary        NaiveBayes      0.624
Bigrams Binary        BalancedWinnow  0.76
Bigrams Real Values   MaxEnt          0.698
Bigrams Real Values   NaiveBayes      0.624
Bigrams Real Values   BalancedWinnow  0.76
Trigrams Binary       NaiveBayes      0.4
Trigrams Binary       BalancedWinnow  0.478
Trigrams Real Values  NaiveBayes      0.4
Trigrams Real Values  BalancedWinnow  0.478

Data set 2:

Type                 Trainer         Result
Binary               BalancedWinnow  0.74
Binary               MaxEnt          0.74
Binary               NaiveBayes      0.72
Real Values          BalancedWinnow  0.784
Real Values          MaxEnt          0.75
Real Values          NaiveBayes      0.71
Stemmed Binary       BalancedWinnow  0.78
Stemmed Binary       MaxEnt          0.76
Stemmed Binary       NaiveBayes      0.76
Stemmed Real Values  BalancedWinnow  0.75
Stemmed Real Values  MaxEnt          0.77

Proposed future improvements: WordNet senses; class-specific related words

Issues
Performing poorly on some refinements.
– Low accuracy scores.
Memory-consuming classifiers.
– Classifiers showed some error messages.

Successes
Made progress in creating the system.
Gained hands-on experience with classifiers and NLP packages.
Learned ways to improve classification results.

Readings that helped: "Employing Two Question Answering Systems in TREC-2005", Sanda Harabagiu et al.

Software packages used: Mallet, NLTK, Porter stemmer, self-written code files, Stanford Parser, Berkeley Parser