4 ~ Question Answering.

Question Answering (QA)
- Query: a natural-language question, e.g. "What is the capital city of Spain?"
- Output: the exact answer, not a document
- QA technique: a combination of IR and NLP (Natural Language Processing)

Types of Questions
- Person (occupation, role, position): Who is George W Bush?
- Organization (name, activities, etc.): What is UNICEF?
- Location: What is the capital city of Somalia?
- Time of an event: When did Napoleon die?
- Manner in which something happened (How): How did Ayrton Senna die?

Evaluation
- Performed by human assessors
- Answers are judged on:
  - Responsiveness (of the answer and document-ID pair)
  - Accuracy (of each answer)
- Judgments assigned:
  - Wrong (W): the answer is incorrect
  - Unsupported (U): the answer is correct but the document does not support it
  - Inexact (X): the answer and document ID are correct but the answer is too long
  - Right (R): the answer and the document are both correct
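The four judgments can be read as a small decision rule. The sketch below is only an illustration of that rubric: the function name, the three boolean inputs, and the order in which the checks are applied are assumptions, and in practice the decision is made by a human assessor.

```python
def judge(answer_correct: bool, document_supports: bool, answer_exact: bool) -> str:
    """Map an assessor's three observations to a W/U/X/R judgment.

    The precedence of the checks (W before U before X) is an assumption.
    """
    if not answer_correct:
        return "W"  # Wrong: the answer is incorrect
    if not document_supports:
        return "U"  # Unsupported: correct answer, document does not support it
    if not answer_exact:
        return "X"  # Inexact: correct answer and document, but answer too long
    return "R"      # Right: answer and document are both correct
```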

Moldovan et al. (1999). LASSO: A Tool for Surfing the Answer Net

System Architecture
The system consists of three modules:
- Query processing module
- Paragraph indexing module (search engine)
- Answer processing (extraction) module

LASSO System Architecture

Query Processing Module
- Determine the type of question
- Determine the type of answer expected
- Build a focus for the answer. The focus is a word or sequence of words that defines the question and disambiguates it by indicating what the question is looking for, or what it is about.
- Transform the question into queries for the search engine

Question Type and Answer Type
Q-Class: What
Q-Sub Class | Answer Type | Example of Question | Focus
Basic What | Money / Number / Definition / Title / Undefined | What was the monetary value of the Nobel Peace Prize in 1989? | monetary value
What – Who | Person / Organization | What costume designer decided that Michael Jackson should only wear one glove? | costume designer
What – When | Date | In what year did Ireland elect its first woman president? | year
What – Where | Location | What is the capital of Uruguay? | capital

Question Type and Answer Type
Q-Class: Who
Q-Sub Class | Answer Type | Example of Question | Focus
- | Person / Organization | Who is the author of the book “The Iron Lady: A Biography of Margaret Thatcher”? | author
Q-Class: Whom
Q-Sub Class | Answer Type | Example of Question | Focus
- | Person / Organization | Whom did the Chicago Bulls beat in the 1993 championship? | Chicago Bulls

Question Type and Answer Type
Q-Class: Where
Q-Sub Class | Answer Type | Example of Question | Focus
- | Location | Where is the Taj Mahal? | Taj Mahal
Q-Class: When
Q-Sub Class | Answer Type | Example of Question | Focus
- | Date | When did the Jurassic Period end? | Jurassic Period

Question Type and Answer Type
Q-Class: Which
Q-Sub Class | Answer Type | Example of Question | Focus
Which – What | Organization | Which Japanese car maker had its biggest percentage of sales in the domestic market? | Japanese car maker
Which – Who | Person | Which former Ku Klux Klan member won an elected office in the U.S.? | Ku Klux Klan member
Which – When | Date | In which year was New Zealand excluded from the ANZUS alliance? | year
Which – Where | Location | Which city has the oldest relationship as sister city with Los Angeles? | city

Question Type and Answer Type
Q-Class: How
Q-Sub Class | Answer Type | Example of Question | Focus
Basic How | Manner | How did Socrates die? | Socrates
How many | Number | How many people died when the Estonia sank in 1994? | people
How long | Time / Distance | How long does it take to travel from Tokyo to Niigata? | -
How much | Money / Price | How much did Mercury spend on advertising in 1993? | Mercury
How much <modifier> | Undefined | How much stronger is the new carbon material invented by the Tokyo Institute compared with another material? | new carbon material

Question Type and Answer Type (continued: How)
Q-Sub Class | Answer Type | Example of Question | Focus
How far | Distance | How far is Madrid from Barcelona? | Madrid
How tall | Number | How tall is Mount Everest? | Mount Everest
How rich | Undefined | How rich is Bill Gates? | Bill Gates
How large |  | How large is the Arctic? | Arctic

Question Type and Answer Type
Q-Class: Name
Q-Sub Class | Answer Type | Example of Question | Focus
Name who | Person / Organization | - | -
Name where | Location | - | -
Name what | Title | - | -
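As a rough illustration of how the question stem can select the expected answer type, here is a minimal sketch covering only the classes whose answer type follows from the stem alone (Who, Whom, Where, When, How). The function name, the auxiliary-verb list, and the word-matching approach are assumptions for illustration and are not taken from LASSO. What, Which, and Name questions also need the focus, so the sketch returns None for them.

```python
# Stems are checked in order, so longer stems come before the basic "how".
STEM_TO_ANSWER_TYPE = [
    ("how many", "Number"),
    ("how long", "Time/Distance"),
    ("how far", "Distance"),
    ("how tall", "Number"),
    ("how rich", "Undefined"),
    ("how", "Manner"),
    ("who", "Person/Organization"),
    ("whom", "Person/Organization"),
    ("where", "Location"),
    ("when", "Date"),
]

# Assumed, crude test for "How much <modifier>": anything other than an
# auxiliary verb after "how much" is treated as a modifier.
AUXILIARIES = {"did", "does", "do", "is", "was", "were", "are", "will", "would"}


def expected_answer_type(question):
    """Return the expected answer type, or None when the focus is needed."""
    words = question.lower().split()
    if words[:2] == ["how", "much"]:
        is_plain = len(words) > 2 and words[2] in AUXILIARIES
        return "Money/Price" if is_plain else "Undefined"
    for stem, answer_type in STEM_TO_ANSWER_TYPE:
        stem_words = stem.split()
        if words[:len(stem_words)] == stem_words:
            return answer_type
    return None  # What / Which / Name: decided with the help of the focus
```

For example, `expected_answer_type("How far is Madrid from Barcelona?")` yields the Distance type from the table above.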

Extracting Question Keywords
- There are 8 ordered heuristics
- Each heuristic returns a set of keywords, which is added in the same order to the question keywords
- Initially: the first 6 heuristics are considered
- In the retrieval loop: the other two heuristics are added

The 6 heuristics:
1. Whenever quoted expressions are recognized in a question, all non-stop words of the quotation become keywords
2. All named entities recognized as proper nouns (nouns that name a specific person, place, or thing) are selected as keywords
3. All complex nominals and their adjectival modifiers are selected as keywords. Example: an occasional cup of coffee (a cup of coffee that someone drinks occasionally)
4. All other complex nominals are selected as keywords
5. All nouns and their adjectival modifiers are selected as keywords
6. All other nouns recognized in the question are selected as keywords

The other two heuristics:
7. All verbs from the question are selected as keywords
8. The question focus is added to the keywords
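The ordered application of the heuristics can be sketched as below. This is a simplified illustration, not the LASSO implementation: without a part-of-speech tagger, proper nouns are approximated by capitalized non-initial words, and the nominal and noun heuristics (3 to 6) are collapsed into "remaining non-stop words". The stop-word list and all function names are assumptions.

```python
import re

# Small illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "was", "did", "what", "who",
              "whom", "when", "where", "which", "how", "that", "to", "its"}


def tokens(text):
    return re.findall(r"[A-Za-z0-9']+", text)


def quoted_keywords(question):
    """Heuristic 1: non-stop words inside quoted expressions."""
    normalized = question.replace("“", '"').replace("”", '"')
    found = []
    for quotation in re.findall(r'"([^"]+)"', normalized):
        found += [t for t in tokens(quotation) if t.lower() not in STOP_WORDS]
    return found


def proper_noun_keywords(question):
    """Heuristic 2, approximated: capitalized words after the first word."""
    return [t for t in tokens(question)[1:]
            if t[0].isupper() and t.lower() not in STOP_WORDS]


def remaining_keywords(question):
    """Stand-in for heuristics 3 to 6: every remaining non-stop word."""
    return [t for t in tokens(question) if t.lower() not in STOP_WORDS]


def extract_keywords(question, heuristics):
    """Apply heuristics in order, keeping first occurrences only."""
    keywords, seen = [], set()
    for heuristic in heuristics:
        for keyword in heuristic(question):
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                keywords.append(keyword)
    return keywords


INITIAL_HEURISTICS = [quoted_keywords, proper_noun_keywords, remaining_keywords]
```

Because each heuristic appends only keywords not already present, the earlier, more specific heuristics determine the order of the final keyword list.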

Sample

LASSO System Architecture

Search Engine
1st development: Vector Space Model
- The model does not allow retrieval of only those documents that include all of the keywords
- The model computes a similarity measure between the document and the query, based on the cosine similarity value
2nd development: Boolean Model
- The requirements on the search engine are much more rigid
- Documents should be retrieved only when all of the keywords are present in the document
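The Boolean requirement (retrieve a document only when every keyword occurs in it) can be sketched with an inverted index and set intersection. This is a minimal illustration with assumed function names and naive whitespace tokenization, not the search engine used by LASSO.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?\"'()")].add(doc_id)
    return index


def boolean_and(index, keywords):
    """Return the IDs of documents that contain ALL of the keywords."""
    postings = [index.get(keyword.lower(), set()) for keyword in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    1: "Montevideo is the capital of Uruguay.",
    2: "Uruguay borders Brazil.",
    3: "The capital of Spain is Madrid.",
}
index = build_inverted_index(docs)
matches = boolean_and(index, ["capital", "Uruguay"])
```

A vector space model would instead give documents 2 and 3 a non-zero cosine score as well, since each shares one keyword with the query.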

Paragraph Filtering
Background:
- The number of documents containing the keywords that the search engine returns may be large, since only weak Boolean operators were used
Steps:
- Segment the retrieved documents into sentences and paragraphs
- Sentence separators: punctuation
- Paragraph separators that were implemented: HTML tags, empty lines, and paragraph indentation
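The segmentation step can be sketched with regular expressions. The patterns below are assumptions chosen for illustration: only the p, br, and div tags are treated as HTML separators, and sentences are split after ".", "!" or "?", which will mis-split abbreviations.

```python
import re

# Paragraph separators: selected HTML tags, empty lines, or a line break
# followed by an indented line.
PARAGRAPH_SEP = re.compile(
    r"<\s*/?\s*(?:p|br|div)\b[^>]*>|\n[ \t]*\n|\n(?=[ \t]+\S)",
    re.IGNORECASE,
)
# Sentence separators: whitespace that follows closing punctuation.
SENTENCE_SEP = re.compile(r"(?<=[.!?])\s+")


def split_paragraphs(text):
    """Split a retrieved document into non-empty paragraphs."""
    return [p.strip() for p in PARAGRAPH_SEP.split(text) if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph into sentences on punctuation."""
    return [s.strip() for s in SENTENCE_SEP.split(paragraph) if s.strip()]


document = ("<p>First one. Second one!</p>\n\n"
            "Next paragraph here.\n    Indented paragraph.")
paragraphs = split_paragraphs(document)
```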

Paragraph Ordering ~ Step 1. Paragraph Windowing ~

Paragraph Ordering ~ Step 2. Paragraph Scoring ~
- Same_word_sequence_distance: computes the number of words from the question that are recognized in the same sequence in the current paragraph-window
- Max_Distance_score: represents the number of words that separate the most distant keywords in the window
- Missing_keywords_score: computes the number of unmatched keywords
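The three scores can be sketched for a single paragraph-window as follows. This is one possible reading of the definitions, not the LASSO code: the same-word-sequence score is taken here to be the longest run of consecutive question words that also appear consecutively in the window, and the function and dictionary key names are assumptions.

```python
def paragraph_window_scores(question_words, keywords, window_words):
    """Compute the three scores for one paragraph-window."""
    question = [w.lower() for w in question_words]
    window = [w.lower() for w in window_words]
    keyword_set = {k.lower() for k in keywords}

    # Longest run of question words found in the same order, contiguously,
    # in the window (dynamic programming over token pairs).
    longest = 0
    previous = [0] * (len(window) + 1)
    for i in range(1, len(question) + 1):
        current = [0] * (len(window) + 1)
        for j in range(1, len(window) + 1):
            if question[i - 1] == window[j - 1]:
                current[j] = previous[j - 1] + 1
                longest = max(longest, current[j])
        previous = current

    # Number of words lying between the first and last matched keyword.
    positions = [i for i, w in enumerate(window) if w in keyword_set]
    distance = max(positions[-1] - positions[0] - 1, 0) if positions else 0

    # Keywords that do not occur in the window at all.
    missing = len(keyword_set - set(window))

    return {
        "same_word_sequence": longest,
        "max_distance": distance,
        "missing_keywords": missing,
    }


scores = paragraph_window_scores(
    "what is the capital of uruguay".split(),
    ["capital", "uruguay"],
    "montevideo is the capital city of uruguay".split(),
)
```

In this example the shared run is "is the capital", two words ("city of") separate the two keywords, and no keyword is missing.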

Thank you

Topic Choices
- Relevance Feedback with Query Expansion
- Relevance Feedback with Term Reweighting
- CLIR – Controlled Vocabulary
- CLIR Free Text – Thesaurus Based
- CLIR Free Text – Dictionary Based
- CLIR Free Text – Parallel Corpus
- CLIR Free Text – Comparable Corpus
- CLIR Free Text – Unaligned Corpus

Written Report
- Data description
  - Documents: minimum 50 items
  - Queries: minimum 10 queries
- Resources used
  - Hardware
  - Software
- Flowchart of the problem-solving method

CD
- Program source code
- Readme file containing instructions for the data and the program