
NTCIR 2005 1/21 ASQA: Academia Sinica Question Answering System for CLQA (IASL). Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang, Chia-Wei Wu, Cheng-Lung Sung, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu. Academia Sinica, Taipei

2/21 Outline
Design Principles
System Architecture
Question Processing
Passage Retrieval
Answer Extraction
Answer Ranking
Performance
Conclusion

3/21 The Design Principles of ASQA
Reduce cost by adopting existing components: InfoMap (a knowledge representation framework), Mencius (an NER engine), AutoTag (a Chinese word segmentation tool), Lucene (an open-source IR engine), and SVMLight and opennlp.maxent (machine learning packages).
Minimize system complexity: only shallow NLP techniques are used; we want to see how well a Chinese QA system performs without deep NLP techniques.
Incorporate human knowledge with machine learning methods: a knowledge editing tool, knowledge as machine learning features, and knowledge as the dominant strategy.

4/21 Chinese Word Segmentation
Chinese text lacks explicit word boundaries, so word segmentation is a necessary step in many Chinese applications. Several word segmentation tools exist, but none is designed for QA, so combination rules are applied to form words that are meaningful for our QA system.
Example: 第一銅鐵公司 (First Copper and Iron Corp.) is segmented by the tool as 第一 (Neu) 銅 (Na) 鐵 (Na) 公司 (Nc); the combination rules merge this into 第一 (Neu) 銅鐵 (Na) 公司 (Nc).
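A minimal sketch of what such a combination rule might look like; the rule shown (merge runs of adjacent single-character common nouns) and the POS tags are simplified assumptions for illustration, not the actual ASQA rule set.

```python
def combine_segments(tokens):
    """tokens: list of (word, pos) pairs produced by the segmenter."""
    merged = []
    for word, pos in tokens:
        if (merged
                and pos == "Na"
                and merged[-1][1] == "Na"
                and len(merged[-1][0]) == 1
                and len(word) == 1):
            # Merge with the previous single-character noun, e.g. 銅 + 鐵 -> 銅鐵.
            prev_word, _ = merged.pop()
            merged.append((prev_word + word, "Na"))
        else:
            merged.append((word, pos))
    return merged

# 第一/銅/鐵/公司  ->  第一/銅鐵/公司
print(combine_segments([("第一", "Neu"), ("銅", "Na"), ("鐵", "Na"), ("公司", "Nc")]))
```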

5/21 Architecture of ASQA
[Architecture diagram: Question Processing (AutoTag, Mencius, ME, SVM, InfoMap) produces Segments, QType, QFocus and QLimitations; Passage Retrieval (Lucene, AutoTag) queries word and character indexes built over the documents and returns Passages; Answer Extraction (Mencius, Filter) produces Answer Candidates; Answer Ranking produces the final Answers.]
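To make the data flow concrete, here is a hypothetical sketch of the pipeline suggested by the diagram; the function names are placeholders, not the real module interfaces.

```python
def asqa_answer(question, process_question, retrieve_passages,
                extract_candidates, rank_answers):
    analysis = process_question(question)                 # QType, segments, QFocus, QLimitations
    passages = retrieve_passages(analysis)                # Lucene over word/character indexes
    candidates = extract_candidates(passages, analysis)   # NER plus QType filtering
    return rank_answers(candidates, analysis)             # final ranked answers
```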

6/21 Question Processing [architecture diagram repeated; see slide 5]

7/21 Question Processing
Goal: capture what the user wants.
Question classification: accurately classify a Chinese question into a question type. Example: Chinese question 奧運的發源地在哪裡? (Where is the birthplace of the Olympics?); question type Q_LOCATION|地.
QFocus analysis: capture other detailed information about the question, such as the QFocus, named entities (NE), time expressions, and the QFDescription.

8/21 Taxonomy of Question Types

9/21 A Hybrid Approach to Chinese Question Classification
Hybrid approach: SVM (machine-learned binary classifiers, one per question type) and InfoMap (a knowledge representation framework with syntactic templates for classifying questions).
Features for the SVM QC model: characters, character bigrams, and the HowNet main definition.
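A minimal sketch of the surface feature extraction implied by this feature list; the feature encoding and the hownet_lookup callback are illustrative assumptions, not the ASQA implementation.

```python
def char_ngram_features(question, hownet_lookup=None):
    """Build a sparse feature dict from a Chinese question string."""
    feats = {}
    for ch in question:                          # character unigram features
        feats["uni=" + ch] = 1
    for a, b in zip(question, question[1:]):     # character bigram features
        feats["bi=" + a + b] = 1
    if hownet_lookup is not None:                # hypothetical HowNet main-definition lookup
        for definition in hownet_lookup(question):
            feats["def=" + definition] = 1
    return feats

print(char_ngram_features("奧運的發源地在哪裡?"))
```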

10/21 A Hybrid Approach to Chinese Question Classification (cont.)
InfoMap and SVM are integrated according to their individual advantages: the InfoMap templates for matching question types are designed for high precision, while the SVM model, which includes the HowNet main definition as a semantic feature, has better recall. InfoMap is therefore used as the dominant strategy, and the system falls back to SVM only when no InfoMap template matches.
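The combination can be sketched as a simple fallback; classify_with_infomap and classify_with_svm are hypothetical stand-ins for the template matcher and the trained classifiers.

```python
def classify_question(question, classify_with_infomap, classify_with_svm):
    qtype = classify_with_infomap(question)   # high-precision template matching
    if qtype is not None:
        return qtype                          # dominant strategy wins when a template fires
    return classify_with_svm(question)        # higher-recall machine-learned fallback
```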

11/21 QFocus Analysis
QFocus analysis is a tagging problem, which differs from QType classification. It extracts several types of information:
QFocus (QF): the category name of the answer.
Time (TI): time or date expressions.
Named entities (NE): PERSON, LOCATION, ORGANIZATION.
QF description (QFD): other descriptions of the answer.
Examples:
請問 [2000年/TI] 的 [G8高峰會/NE] 在 [日本/NE] 何地舉行? (Which place in Japan hosted the G8 summit in 2000?)
請問 [芬蘭第一位女總統/QF] 為誰? (Who is Finland's first woman president?)
請問 [2000年/TI] [沉沒於北極圈巴倫支海/QFD] 的 [俄羅斯核子潛艇/QF] 的名字? (Which Russian nuclear submarine sank in the Barents Sea in 2000?)

12/21 A Hybrid Approach to QFocus Analysis
Combine syntactic rules with an ME model.
ME model: QFocus analysis is treated as a tagging problem; the features are context words, context POS tags, and previous tags; the model is trained on 718 tagged question sentences.
Example syntactic rules:
A noun string immediately after 的 or 之 is tagged QF.
A noun string immediately before 是, 為, 於, or 在 is tagged QF.
A string quoted by 「」 or （） is tagged QFD.
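As an illustration of the rule style, here is a sketch of the quoted-string rule above; the regular expression is an assumption for illustration, not the actual ASQA rule engine.

```python
import re

def tag_qfd_spans(question):
    """Tag spans quoted by 「」 or （） as QFD."""
    spans = []
    for match in re.finditer(r"「([^」]+)」|（([^）]+)）", question):
        spans.append((match.group(1) or match.group(2), "QFD"))
    return spans

print(tag_qfd_spans("請問台灣童謠「天黑黑」是由哪位作曲家所創作?"))
# -> [('天黑黑', 'QFD')]
```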

13/21 Passage Retrieval with Lucene [architecture diagram repeated; see slide 5]

14/21 Passage Retrieval with Lucene
The required (+) operator: the Initial Query (IQ) marks quoted terms and noun terms as required; the Relaxed Query (RQ) marks no term as required.
The boosting (^) operator: quoted terms are boosted by 2, nouns by 1.2, and verbs by 0.7.
Runtime workflow: query with the IQ against the word index and the character index and sort the results; if nothing is returned, query with the RQ against both indexes and sort; then stop.
Example question: 請問台灣童謠「天黑黑」是由哪位作曲家所創作? (Which composer wrote the Taiwanese children's song 「天黑黑」?)
Initial query: +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2
Relaxed query: "作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2
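A sketch of how the two query strings above could be assembled. The boost weights follow the slide (quoted terms 2, nouns 1.2, verbs 0.7); the (term, kind) input format is an assumption for illustration.

```python
def build_queries(terms):
    """terms: list of (term, kind) pairs with kind in {'quoted', 'noun', 'verb'}."""
    boost = {"quoted": 2, "noun": 1.2, "verb": 0.7}
    initial, relaxed = [], []
    for term, kind in terms:
        clause = '"%s"^%g' % (term, boost[kind])
        relaxed.append(clause)
        # IQ: quoted terms and nouns are required (+); verbs stay optional.
        initial.append("+" + clause if kind in ("quoted", "noun") else clause)
    return " ".join(initial), " ".join(relaxed)

iq, rq = build_queries([("作曲家", "noun"), ("台灣", "noun"), ("創作", "verb"),
                        ("童謠", "noun"), ("天黑黑", "quoted")])
print(iq)  # +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2
print(rq)  # "作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2
```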

15/21 Answer Extraction [architecture diagram repeated; see slide 5]

16/21 Answer Extraction
The top five passages are sent to the answer extraction module.
Named entity recognition (Mencius): PERSON, LOC, and ORG are recognized by the ME-based NER engine; fine-grained and other coarse-grained types are identified with the taxonomy and templates in InfoMap.
Answer filtering: answer candidates that are incompatible with the QType are filtered out; the compatibility of question and answer types is defined by a mapping table.
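A sketch of the filtering step with such a mapping table; the table entries shown are illustrative assumptions, not the actual ASQA mapping.

```python
# Illustrative QType -> compatible answer types table (assumed entries).
COMPATIBLE = {
    "Q_PERSON":       {"PERSON"},
    "Q_LOCATION":     {"LOCATION"},
    "Q_ORGANIZATION": {"ORGANIZATION"},
}

def filter_candidates(candidates, qtype):
    """candidates: list of (answer_string, answer_type) pairs."""
    allowed = COMPATIBLE.get(qtype, set())
    return [(ans, atype) for ans, atype in candidates if atype in allowed]

print(filter_candidates([("日本", "LOCATION"), ("G8高峰會", "ORGANIZATION")],
                        "Q_LOCATION"))   # only the LOCATION candidate survives
```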

17/21 Answer Ranking [architecture diagram repeated; see slide 5]

18/21 Answer Ranking
Answer candidates are ranked by a ranking score that is computed from the QFocus analysis results and combines an NE score, QFocus scores, and a cue score.
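The slides do not give the combination formula, so the following is only an illustrative sketch that combines the three components as a weighted sum with made-up weights.

```python
def ranking_score(ne_score, qfocus_score, cue_score,
                  w_ne=1.0, w_qf=1.0, w_cue=1.0):
    # Hypothetical linear combination; the real ASQA scoring is not specified here.
    return w_ne * ne_score + w_qf * qfocus_score + w_cue * cue_score

candidates = {"答案A": (0.8, 0.5, 0.2), "答案B": (0.4, 0.9, 0.6)}
ranked = sorted(candidates, key=lambda a: ranking_score(*candidates[a]), reverse=True)
print(ranked)   # candidate names ordered by descending ranking score
```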

19/21 System Performance on the CLQA Chinese-to-Chinese Task

20/21 Performance [figures: Fig. 1, Fig. 2, Fig. 3, Fig. 4]

21/21 Conclusions
We have demonstrated that an effective Chinese QA system can be built with shallow NLP techniques by integrating knowledge templates (InfoMap) with machine learning methods (SVM, ME), and that an open-source IR engine is usable for Chinese QA. In future work we would like to incorporate deeper NLP techniques, such as parsing and event structure/relation extraction.