4 ~ Question Answering.

1 4 ~ Question Answering

2 QA (Question Answering)
Query: a natural-language question, e.g. What is the capital city of Spain?
Output: the exact answer, not a document
QA technique: a combination of IR and NLP (Natural Language Processing)

3 Types of Questions
Person (occupation, role, position): Who is George W Bush?
Organization (name, activities, etc.): What is UNICEF?
Location: What is the capital city of Somalia?
Time of an event: When did Napoleon die?
How something happened (How): How did Ayrton Senna die?

4 Evaluation
Performed by human assessors. Answers are judged on:
Responsiveness (of the answer-docid pair)
Exactness (of each answer)
Judgments assigned:
Wrong (W): the answer is wrong
Unsupported (U): the answer is correct but the document does not support it
Inexact (X): the answer and the document id are correct but the answer is too long
Right (R): the answer and the document are both correct

5 Moldovan et al. (1999), LASSO: A Tool for Surfing the Answer Net

6 System Architecture
The system consists of three modules:
Query processing module
Paragraph indexing module (search engine)
Answer processing (extraction) module

7 LASSO System Architecture

8 Query Processing Module
Determine the type of question
Determine the type of answer expected
Build a focus for the answer: a word or sequence of words that defines the question and disambiguates it by indicating what the question is looking for, or what it is about
Transform the question into queries for the search engine

9 Question Type and Answer Type
Q-Class: What
Q-Sub Class | Answer Type | Example of Question | Focus
Basic What | Money / Number / Definition / Title / Undefined | What was the monetary value of the Nobel Peace Prize in 1989? | monetary value
What-Who | Person / Organization | What costume designer decided that Michael Jackson should only wear one glove? | costume designer
What-When | Date | In what year did Ireland elect its first woman president? | year
What-Where | Location | What is the capital of Uruguay? | capital

10 Question Type and Answer Type
Q-Class: Who
Q-Sub Class | Answer Type | Example of Question | Focus
- | Person / Organization | Who is the author of the book “The Iron Lady: A Biography of Margaret Thatcher”? | author

Q-Class: Whom
Q-Sub Class | Answer Type | Example of Question | Focus
- | Person / Organization | Whom did the Chicago Bulls beat in the 1993 championship? | Chicago Bulls

11 Question Type and Answer Type
Q-Class: Where
Q-Sub Class | Answer Type | Example of Question | Focus
- | Location | Where is the Taj Mahal? | Taj Mahal

Q-Class: When
Q-Sub Class | Answer Type | Example of Question | Focus
- | Date | When did the Jurassic Period end? | Jurassic Period

12 Question Type and Answer Type
Q-Class: Which
Q-Sub Class | Answer Type | Example of Question | Focus
Which-What | Organization | Which Japanese car maker had its biggest percentage of sales in the domestic market? | Japanese car maker
Which-Who | Person | Which former Ku Klux Klan member won an elected office in the U.S.? | Ku Klux Klan member
Which-When | Date | In which year was New Zealand excluded from the ANZUS alliance? | year
Which-Where | Location | Which city has the oldest relationship as sister city with Los Angeles? | city

13 Question Type and Answer Type
Q-Class: How
Q-Sub Class | Answer Type | Example of Question | Focus
Basic How | Manner | How did Socrates die? | Socrates
How many | Number | How many people died when the Estonia sank in 1994? | people
How long | Time / Distance | How long does it take to travel from Tokyo to Niigata? | -
How much | Money / Price | How much did Mercury spend on advertising in 1993? | Mercury
How much <modifier> | Undefined | How much stronger is the new carbon material invented by the Tokyo Institute compared with another material? | new carbon material

14 Question Type and Answer Type
Q-Class: How (continued)
Q-Sub Class | Answer Type | Example of Question | Focus
How far | Distance | How far is Madrid from Barcelona? | Madrid
How tall | Number | How tall is Mount Everest? | Mount Everest
How rich | Undefined | How rich is Bill Gates? | Bill Gates
How large |  | How large is the Arctic? | Arctic

15 Question Type and Answer Type
Q-Class: Name
Q-Sub Class | Answer Type
Name who | Person / Organization
Name where | Location
Name what | Title
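
The tables on these slides can be read as a lookup from the question stem to the expected answer type. The following is a minimal sketch of that idea, assuming a plain prefix match. The rule table `RULES` and the function `classify_question` are hypothetical names introduced for illustration and are not taken from LASSO. The What, Which and Name classes need the focus to choose an answer type, so the sketch leaves them undefined, and it does not separate "How much <modifier>" from "How much".

```python
# Hypothetical rule table: (question stem, Q-class, expected answer type).
# Longer stems are listed first so that "how many" wins over "how"
# and "whom" wins over "who".
RULES = [
    ("how many", "How", "Number"),
    ("how much", "How", "Money / Price"),
    ("how long", "How", "Time / Distance"),
    ("how far", "How", "Distance"),
    ("how tall", "How", "Number"),
    ("how", "How", "Manner"),
    ("whom", "Whom", "Person / Organization"),
    ("who", "Who", "Person / Organization"),
    ("where", "Where", "Location"),
    ("when", "When", "Date"),
]


def classify_question(question):
    """Return (Q-class, answer type) by matching the start of the question."""
    text = question.strip().lower()
    for stem, q_class, answer_type in RULES:
        if text.startswith(stem):
            return q_class, answer_type
    # What / Which / Name questions depend on the focus, which this
    # sketch does not compute.
    return "Other", "Undefined"


for q in ["How did Socrates die?",
          "How far is Madrid from Barcelona?",
          "What is the capital of Uruguay?"]:
    print(q, "->", classify_question(q))
```

A real classifier would also have to handle stems that are not at the start of the question, such as "In what year ...".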

16 Extracting Question Keywords
There are 8 ordered heuristics.
Each heuristic returns a set of keywords, which is added in the same order to the question keywords.
Initially: the first 6 heuristics are considered.
In the retrieval loop: the other two heuristics are added.

17 The 6 heuristics
1. Whenever quoted expressions are recognized in a question, all non-stop words of the quotation become keywords.
2. All named entities recognized as proper nouns (nouns that name a specific person, place, or thing) are selected as keywords.
3. All complex nominals and their adjectival modifiers are selected as keywords. Example: an occasional cup of coffee (a cup of coffee that someone drinks occasionally).

18 The 6 heuristics (continued)
4. All other complex nominals are selected as keywords.
5. All nouns and their adjectival modifiers are selected as keywords.
6. All other nouns recognized in the question are selected as keywords.

19 The other two heuristics
7. All verbs from the question are selected as keywords.
8. The question focus is added to the keywords.
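
The eight ordered heuristics can be sketched as a function that appends keywords in heuristic order. This is only an illustration under several assumptions: the question arrives already tagged by some upstream part-of-speech tagger, using a simplified hypothetical tag set (`NNP` proper noun, `NN` common noun, `JJ` adjective, `VB` verb), a complex nominal is approximated as a run of two or more common nouns, and the quoted words and focus words are supplied by the caller. The names `extract_keywords`, `noun_groups` and `STOP_WORDS` are invented for this sketch.

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "was", "did", "do", "does", "its"}


def _add(keywords, words):
    """Append words in order, skipping stop words and duplicates."""
    for word in words:
        if word.lower() not in STOP_WORDS and word not in keywords:
            keywords.append(word)


def noun_groups(tagged):
    """Return (adjectives, nouns) for every run of the form JJ* NN+."""
    groups, adjectives, nouns = [], [], []
    for word, tag in list(tagged) + [("", "END")]:
        if tag == "NN":
            nouns.append(word)
            continue
        if nouns:  # a run of common nouns has just ended
            groups.append((adjectives, nouns))
            adjectives, nouns = [], []
        adjectives = adjectives + [word] if tag == "JJ" else []
    return groups


def extract_keywords(tagged, quoted_words=(), focus_words=(), in_retrieval_loop=False):
    keywords = []
    groups = noun_groups(tagged)
    _add(keywords, quoted_words)                              # heuristic 1
    _add(keywords, [w for w, t in tagged if t == "NNP"])      # heuristic 2
    for adjs, nouns in groups:                                # heuristic 3
        if len(nouns) > 1 and adjs:
            _add(keywords, adjs + nouns)
    for adjs, nouns in groups:                                # heuristic 4
        if len(nouns) > 1:
            _add(keywords, nouns)
    for adjs, nouns in groups:                                # heuristic 5
        if adjs:
            _add(keywords, adjs + nouns)
    for adjs, nouns in groups:                                # heuristic 6
        _add(keywords, nouns)
    if in_retrieval_loop:
        _add(keywords, [w for w, t in tagged if t == "VB"])   # heuristic 7
        _add(keywords, focus_words)                           # heuristic 8
    return keywords


# "In what year did Ireland elect its first woman president?"
tagged = [("In", "IN"), ("what", "WP"), ("year", "NN"), ("did", "VB"),
          ("Ireland", "NNP"), ("elect", "VB"), ("its", "PRP"),
          ("first", "JJ"), ("woman", "NN"), ("president", "NN")]
print(extract_keywords(tagged))
print(extract_keywords(tagged, focus_words=["year"], in_retrieval_loop=True))
```

Because keywords are appended in heuristic order, the list keeps the priority of the heuristics, so a caller can add or drop the last entries when adjusting a query.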

20 Sample

21 LASSO System Architecture

22 Search Engine
1st development: Vector Space Model
The model does not allow retrieval to be restricted to documents that include all of the keywords.
The model computes a similarity measure between the document and the query, based on the cosine similarity value.
2nd development: Boolean Model
The requirements on the search engine are much more rigid.
Documents should be retrieved only when all of the keywords are present in the document.
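
The difference between the two developments can be shown on a toy collection. The sketch below assumes raw term counts as vector weights (no idf weighting) and lower-cased, whitespace-split tokens. The function names and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between raw term-count vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def boolean_and(query_terms, doc_terms):
    """True only when every query keyword occurs in the document."""
    return set(query_terms) <= set(doc_terms)


docs = {
    "d1": "madrid is the capital of spain".split(),
    "d2": "lisbon is the capital of portugal".split(),
}
query = ["capital", "spain"]

for doc_id, terms in docs.items():
    print(doc_id,
          round(cosine_similarity(query, terms), 3),
          boolean_and(query, terms))
```

Document d2 lacks "spain" yet still receives a non-zero cosine score, so a vector space engine would return it. The Boolean AND test rejects it, which is the stricter behaviour described for the second development.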

23 Paragraph Filtering
Background:
The number of documents containing the keywords that the search engine returns may be large, since only weak Boolean operators were used.
Steps:
Segment the retrieved documents into sentences and paragraphs.
Sentence separators: punctuation.
Paragraph separators: HTML tags, empty lines, and paragraph indentation.
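
The segmentation step can be sketched with regular expressions. This is a rough illustration that assumes `<p>` and `<br>` are the only HTML tags treated as paragraph separators and that a sentence ends at `.`, `!` or `?` followed by whitespace, so abbreviations such as "U.S." would be split incorrectly. The pattern and function names are invented for this sketch.

```python
import re

PARAGRAPH_BREAK = re.compile(
    r"</?p\b[^>]*>|<br\s*/?>"   # HTML paragraph and line-break tags
    r"|\n\s*\n"                  # empty lines
    r"|\n(?=[ \t]+\S)",          # a new line that starts with indentation
    re.IGNORECASE,
)
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_paragraphs(document):
    """Split a document into paragraphs with normalized whitespace."""
    parts = PARAGRAPH_BREAK.split(document)
    return [" ".join(p.split()) for p in parts if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph into sentences at terminal punctuation."""
    return [s for s in SENTENCE_BREAK.split(paragraph) if s]


document = ("<p>Madrid is the capital of Spain. It lies on the Manzanares.</p>\n\n"
            "Lisbon is in Portugal.\n"
            "    Indented paragraph here.")
for paragraph in split_paragraphs(document):
    print(split_sentences(paragraph))
```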

24 Paragraph Ordering ~ Step 1. Paragraph Windowing ~

25 Paragraph Ordering ~ Step 2. Paragraph Scoring ~
Same_word_sequence_distance: computes the number of words from the question that are recognized in the same sequence in the current paragraph-window.
Max_Distance_score: represents the number of words that separate the most distant keywords in the window.
Missing_keywords_score: computes the number of unmatched keywords.
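
The three scores can be sketched for a single paragraph-window as follows. The slide does not say how "recognized in the same sequence" is measured, so the sketch assumes the length of the longest common subsequence between the keyword list and the window, which is only one possible reading. It also assumes lower-cased tokens, and it does not combine the scores because the slide gives no combination rule. The function names are invented for illustration.

```python
def same_word_sequence_score(keywords, window):
    """Length of the longest common subsequence of keywords and window."""
    prev = [0] * (len(window) + 1)
    for k in keywords:
        cur = [0]
        for j, w in enumerate(window, start=1):
            cur.append(prev[j - 1] + 1 if k == w else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def max_distance_score(keywords, window):
    """Number of words between the first and last keyword in the window."""
    wanted = set(keywords)
    positions = [i for i, w in enumerate(window) if w in wanted]
    return positions[-1] - positions[0] - 1 if len(positions) > 1 else 0


def missing_keywords_score(keywords, window):
    """Number of distinct keywords that do not occur in the window."""
    present = set(window)
    return sum(1 for k in set(keywords) if k not in present)


window = "montevideo is the capital city of uruguay".split()
keywords = ["capital", "uruguay"]
print(same_word_sequence_score(keywords, window))  # 2
print(max_distance_score(keywords, window))        # 2 ("city", "of")
print(missing_keywords_score(keywords, window))    # 0
```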


