Lecture 30 Dialogues Question Answering

1 Lecture 30 Dialogues Question Answering
November 9, 2005

2 Information-State and Dialogue Acts
If we want a dialogue system to be more than just form-filling, it needs to:
- Decide when the user has asked a question, made a proposal, or rejected a suggestion
- Ground the user's utterance, ask clarification questions, suggest plans
This suggests that a conversational agent needs:
- Sophisticated models of interpretation and generation, in terms of speech acts and grounding
- A more sophisticated representation of dialogue context than just a list of slots

3 Information-state architecture
The information-state architecture consists of:
- A dialogue act interpreter
- A dialogue act generator
- A set of update rules, which update the dialogue state as acts are interpreted and generate dialogue acts
- A control structure to select which update rules to apply
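To make the architecture concrete, here is a minimal, hypothetical sketch in Python: an information state, update rules each with a precondition and an effect, and a control structure that applies whichever rules fire after a dialogue act is interpreted. All names, rule contents, and act representations are invented for illustration, not taken from a real system.

```python
# Minimal sketch of an information-state update loop (invented example, not a real system).
from dataclasses import dataclass, field

@dataclass
class InfoState:
    ungrounded: list = field(default_factory=list)     # user acts not yet grounded
    open_questions: list = field(default_factory=list)
    agenda: list = field(default_factory=list)         # dialogue acts the system plans to produce

# An update rule = precondition on (state, act) plus an effect that changes the state
def rule_ground_user_act(state, act):
    if act["speaker"] == "user":
        state.ungrounded.append(act)
        state.agenda.append({"type": "ACK"})            # queue a grounding act
        return True
    return False

def rule_register_question(state, act):
    if act["speaker"] == "user" and act["type"] == "QUESTION":
        state.open_questions.append(act["content"])
        state.agenda.append({"type": "ANSWER", "question": act["content"]})
        return True
    return False

UPDATE_RULES = [rule_ground_user_act, rule_register_question]

def control_loop(state, interpreted_act):
    """Control structure: apply every update rule whose precondition holds."""
    for rule in UPDATE_RULES:
        rule(state, interpreted_act)
    return state.agenda  # the dialogue act generator would realize these as utterances

state = InfoState()
print(control_loop(state, {"speaker": "user", "type": "QUESTION",
                           "content": "what time is the meeting"}))
```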

4 Information-state

5 Dialogue acts Also called “conversational moves”
A dialogue act is an act with (internal) structure related specifically to its dialogue function. It:
- Incorporates ideas of grounding
- Incorporates other dialogue and conversational functions that Austin and Searle didn't seem interested in

6 Verbmobil task Two-party scheduling dialogues
Speakers were asked to plan a meeting at some future date. The data were used to design conversational agents which would help with this task (a cross-language, translating, scheduling assistant).

7 Verbmobil Dialogue Acts
THANK: thanks
GREET: Hello Dan
INTRODUCE: It's me again
BYE: Allright, bye
REQUEST-COMMENT: How does that look?
SUGGEST: June 13th through 17th
REJECT: No, Friday I'm booked all day
ACCEPT: Saturday sounds fine
REQUEST-SUGGEST: What is a good day of the week for you?
INIT: I wanted to make an appointment with you
GIVE_REASON: Because I have meetings all afternoon
FEEDBACK: Okay
DELIBERATE: Let me check my calendar here
CONFIRM: Okay, that would be wonderful
CLARIFY: Okay, do you mean Tuesday the 23rd?

8 DAMSL: forward looking func.
STATEMENT: a claim made by the speaker
INFO-REQUEST: a question by the speaker
CHECK: a question for confirming information
INFLUENCE-ON-ADDRESSEE (= Searle's directives):
  OPEN-OPTION: a weak suggestion or listing of options
  ACTION-DIRECTIVE: an actual command
INFLUENCE-ON-SPEAKER (= Austin's commissives):
  OFFER: speaker offers to do something
  COMMIT: speaker is committed to doing something
CONVENTIONAL (other):
  OPENING: greetings
  CLOSING: farewells
  THANKING: thanking and responding to thanks

9 DAMSL: backward looking func.
AGREEMENT: speaker's response to previous proposal
  ACCEPT: accepting the proposal
  ACCEPT-PART: accepting some part of the proposal
  MAYBE: neither accepting nor rejecting the proposal
  REJECT-PART: rejecting some part of the proposal
  REJECT: rejecting the proposal
  HOLD: putting off response, usually via subdialogue
ANSWER: answering a question
UNDERSTANDING: whether speaker understood previous
  SIGNAL-NON-UNDER.: speaker didn't understand
  SIGNAL-UNDER.: speaker did understand
    ACK: demonstrated via continuer or assessment
    REPEAT-REPHRASE: demonstrated via repetition or reformulation
    COMPLETION: demonstrated via collaborative completion

10 (no transcript text for this slide)

11 Automatic Interpretation of Dialogue Acts
How do we automatically identify dialogue acts? Given an utterance, decide whether it is a QUESTION, STATEMENT, SUGGEST, or ACK. Recognizing illocutionary force will be crucial to building a dialogue agent. Perhaps we can just look at the form of the utterance to decide?

12 Can we just use the surface syntactic form?
YES-NO-Qs have auxiliary-before-subject syntax: Will breakfast be served on USAir 1557?
STATEMENTs have declarative syntax: I don't care about lunch
COMMANDs have imperative syntax: Show me flights from Milwaukee to Orlando on Thursday night

13 Surface form != speech act type
Utterance / Locutionary force / Illocutionary force:
Can I have the rest of your sandwich? / Question / Request
I want the rest of your sandwich / Declarative / Request
Give me your sandwich! / Imperative / Request

14 Dialogue act disambiguation is hard! Who’s on First?
Abbott: Well, Costello, I'm going to New York with you. Bucky Harris the Yankee's manager gave me a job as coach for as long as you're on the team.
Costello: Look Abbott, if you're the coach, you must know all the players.
Abbott: I certainly do.
Costello: Well you know I've never met the guys. So you'll have to tell me their names, and then I'll know who's playing on the team.
Abbott: Oh, I'll tell you their names, but you know it seems to me they give these ball players now-a-days very peculiar names.
Costello: You mean funny names?
Abbott: Strange names, pet names...like Dizzy Dean...
Costello: His brother Daffy
Abbott: Daffy Dean...
Costello: And their French cousin.
Abbott: French?
Costello: Goofe'
Abbott: Goofe' Dean. Well, let's see, we have on the bags, Who's on first, What's on second, I Don't Know is on third...
Costello: That's what I want to find out.
Abbott: I say Who's on first, What's on second, I Don't Know's on third.

15 Dialogue act ambiguity
Who's on first? INFO-REQUEST or STATEMENT

16 Dialogue Act ambiguity
"Can you give me a list of the flights from Atlanta to Boston?"
This looks like an INFO-REQUEST; if so, the answer is: YES.
But really it's a DIRECTIVE or REQUEST, a polite form of: "Please give me a list of the flights…"
What looks like a QUESTION can be a REQUEST.

17 Dialogue Act ambiguity
Similarly, what looks like a STATEMENT can be a QUESTION:
Us (OPEN-OPTION): I was wanting to make some arrangements for a trip that I'm going to be taking uh to LA uh beginning of the week after next
Ag (HOLD): OK uh let me pull up your profile and I'll be right with you here. [pause]
Ag (CHECK): And you said you wanted to travel next week?
Us (ACCEPT): Uh yes.

18 Indirect speech acts
Utterances which use a surface statement to ask a question
Utterances which use a surface question to issue a request

19 DA interpretation as statistical classification
Lots of clues in each sentence can tell us which DA it is:
Words and collocations:
- "Please" or "would you": good cue for REQUEST
- "Are you": good cue for INFO-REQUEST
Prosody:
- Rising pitch is a good cue for INFO-REQUEST
- Loudness/stress can help distinguish yeah/AGREEMENT from yeah/BACKCHANNEL
Conversational structure:
- "Yeah" following a proposal is probably AGREEMENT; "yeah" following an INFORM is probably a BACKCHANNEL

20 HMM model of dialogue act interpretation
A dialogue is modeled as an HMM:
- The hidden states (the thing we are trying to detect) are the dialogue acts
- The observation sequences (what we see) are sentences; each observation is one sentence, including words + acoustics
We have three probabilistic models:
- Conversational structure: probability of one dialogue act following another, e.g. P(Answer | Question)
- Words and syntax: probability of a sequence of words given a dialogue act, e.g. P("do you" | Question)
- Prosody: probability of prosodic features given a dialogue act, e.g. P(rise at end of sentence | Question)
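To make the three models concrete, here is a small, self-contained sketch of Viterbi decoding over dialogue acts, where a toy transition table plays the role of the conversational-structure model and a hand-written emission score stands in for the word and prosody models. The probabilities and cue words are invented, not trained.

```python
import math

# Toy illustration of the HMM view: hidden states = dialogue acts,
# observations = (words, prosody) per utterance. All numbers are made up.
ACTS = ["QUESTION", "STATEMENT", "BACKCHANNEL"]

trans = {  # P(act_t | act_{t-1}) -- the conversational-structure model
    ("<start>", "QUESTION"): 0.4, ("<start>", "STATEMENT"): 0.5, ("<start>", "BACKCHANNEL"): 0.1,
    ("QUESTION", "QUESTION"): 0.2, ("QUESTION", "STATEMENT"): 0.7, ("QUESTION", "BACKCHANNEL"): 0.1,
    ("STATEMENT", "QUESTION"): 0.3, ("STATEMENT", "STATEMENT"): 0.3, ("STATEMENT", "BACKCHANNEL"): 0.4,
    ("BACKCHANNEL", "QUESTION"): 0.4, ("BACKCHANNEL", "STATEMENT"): 0.5, ("BACKCHANNEL", "BACKCHANNEL"): 0.1,
}

def emission_logprob(act, words, rising_pitch):
    """log P(words | act) + log P(prosody | act); hand-written stand-in scores."""
    score = 0.0
    if words[:2] in (["do", "you"], ["are", "you"]):
        score += math.log(0.6 if act == "QUESTION" else 0.1)
    if words == ["yeah"]:
        score += math.log(0.7 if act == "BACKCHANNEL" else 0.1)
    score += math.log(0.8 if (rising_pitch and act == "QUESTION") else 0.3)
    return score

def viterbi(utterances):
    """utterances: list of (word list, rising_pitch flag). Returns the best act sequence."""
    paths = {act: (math.log(trans[("<start>", act)]) +
                   emission_logprob(act, *utterances[0]), [act]) for act in ACTS}
    for obs in utterances[1:]:
        new_paths = {}
        for act in ACTS:
            best_prev = max(ACTS, key=lambda p: paths[p][0] + math.log(trans[(p, act)]))
            score = (paths[best_prev][0] + math.log(trans[(best_prev, act)]) +
                     emission_logprob(act, *obs))
            new_paths[act] = (score, paths[best_prev][1] + [act])
        paths = new_paths
    return max(paths.values())[1]

print(viterbi([(["do", "you", "want", "coffee"], True), (["yeah"], False)]))
```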

21 An example of dialogue act detection: Correction Detection
Despite all these clever confirmation/rejection strategies, dialogue systems still make mistakes (surprise!). If the system misrecognizes an utterance and either rejects it or, via confirmation, displays its misunderstanding, then the user has a chance to make a correction, by:
- Repeating themselves
- Rephrasing
- Saying "no" to the confirmation question

22 Corrections
Unfortunately, corrections are harder to recognize than normal sentences! Swerts et al. (2000): corrections are misrecognized twice as often (in terms of WER) as non-corrections. Why? Prosody seems to be the largest factor: hyperarticulation. An English example from Liz Shriberg: "NO, I am DE-PAR-TING from Jacksonville." (There is also a German example from Bettina Braun, from a talking elevator.)

23 A Labeled dialogue (Swerts et al)

24 Machine Learning and Classifiers
Given a labeled training set, we can build a classifier to label observations into classes:
- Decision trees
- Regression
- SVMs
I won't introduce the algorithms here, but they are at the core of NLP / computational linguistics / speech / dialogue.

25 Machine learning to detect user corrections
Build classifiers using features like:
- Lexical information (words such as "no", "correction", "I don't", swear words)
- Prosodic features (various increases in F0 range, pause duration, and word duration that correlate with hyperarticulation)
- Length
- ASR confidence
- LM probability
- Various dialogue features (repetition)
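As an illustration, here is a toy sketch (not the actual Swerts et al. system) of a decision-tree correction detector over hand-built features of this kind. The feature encoding, the dictionary keys, and the two training examples are all invented.

```python
# Toy correction detector: a decision tree over hand-built features.
from sklearn.tree import DecisionTreeClassifier

def features(turn):
    """turn: dict with hypothetical keys describing one user utterance."""
    return [
        int("no" in turn["words"]),          # lexical cue
        turn["f0_range_increase"],           # prosodic cue (hyperarticulation)
        turn["pause_duration"],              # prosodic cue
        len(turn["words"]),                  # length
        turn["asr_confidence"],              # ASR confidence
        int(turn["repeats_previous_turn"]),  # dialogue feature: repetition
    ]

train_turns = [  # invented training data: (turn, is_correction)
    ({"words": ["no", "i", "said", "boston"], "f0_range_increase": 0.8,
      "pause_duration": 0.6, "asr_confidence": 0.35, "repeats_previous_turn": True}, 1),
    ({"words": ["i", "want", "a", "flight", "to", "boston"], "f0_range_increase": 0.1,
      "pause_duration": 0.1, "asr_confidence": 0.92, "repeats_previous_turn": False}, 0),
]
X = [features(t) for t, _ in train_turns]
y = [label for _, label in train_turns]
clf = DecisionTreeClassifier().fit(X, y)

print(clf.predict([features({"words": ["no", "departing", "from", "jacksonville"],
                             "f0_range_increase": 0.7, "pause_duration": 0.5,
                             "asr_confidence": 0.4, "repeats_previous_turn": True})]))
```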

26 New Idea: Modeling a dialogue system as a probabilistic agent
A conversational agent can be characterized by the current knowledge of the system:
- A set of states S the agent can be in
- A set of actions A the agent can take
- A goal G, which implies:
  - A success metric that tells us how well the agent achieved its goal
  - A way of using this metric to create a strategy or policy π for what action to take in any particular state
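A minimal sketch of this framing follows, with an invented state representation, an invented reward function standing in for the success metric, and a hand-written policy standing in for one that could be learned.

```python
from dataclasses import dataclass

ACTIONS = ["no_confirm", "implicit_confirm", "explicit_confirm", "reject"]

@dataclass(frozen=True)
class DialogueState:
    slots_filled: int           # how many task slots are already filled
    asr_confidence_bucket: str  # "low", "mid", or "high"

def reward(dialogue_succeeded: bool, num_turns: int) -> float:
    """Success metric: reward task completion, penalize long dialogues (toy numbers)."""
    return (20.0 if dialogue_succeeded else -20.0) - 1.0 * num_turns

def policy(state: DialogueState) -> str:
    """A hand-written policy pi: state -> action; the point of the framing is that
    such a policy could instead be learned to maximize expected reward."""
    if state.asr_confidence_bucket == "low":
        return "reject"
    if state.asr_confidence_bucket == "mid":
        return "explicit_confirm"
    return "implicit_confirm" if state.slots_filled < 3 else "no_confirm"

print(policy(DialogueState(slots_filled=1, asr_confidence_bucket="mid")))
print(reward(dialogue_succeeded=True, num_turns=8))
```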

27 What do we mean by actions A and policies π?
Kinds of decisions a conversational agent needs to make:
- When should I ground/confirm/reject/ask for clarification on what the user just said?
- When should I ask a directive prompt, and when an open prompt?
- When should I use user, system, or mixed initiative?

28 A threshold is a human-designed policy!
Could we learn what the right action is (rejection, explicit confirmation, implicit confirmation, no confirmation) by learning a policy which, given various information about the current state, dynamically chooses the action that maximizes dialogue success?

29 Answering English Questions by Computer
Jim Martin, University of Colorado, Computer Science

30 Question Answering from text
An idea originating in the IR community. With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers from textbases. QA: give the user a (short) answer to their question, perhaps supported by evidence.
The common person's view? [From a novel:] "I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogota … I'm the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd." (M. Marshall, The Straw Men, HarperCollins Publishers, 2002)
Of course, it is a difficult area in which to evaluate system performance, as different people have different ideas of what constitutes an answer, especially if the systems are trying to return exact answers.

31 People want to ask questions…
Examples from the AltaVista query log:
- who invented surf music?
- how to make stink bombs
- where are the snowdens of yesteryear?
- which english translation of the bible is used in official catholic liturgies?
- how to do clayart
- how to copy psx
- how tall is the sears tower?
Examples from the Excite query log (12/1999):
- how can i find someone in texas
- where can i find information on puritan religion?
- what are the 7 wonders of the world
- how can i eliminate stress
- What vacuum cleaner does Consumers Guide recommend
Questions make up around 12–15% of query logs.

32 AskJeeves
AskJeeves is probably the most hyped example of "question answering". It largely does pattern matching to match your question against its own knowledge base of questions. If that works, you get the human-curated answers to that known question. If that fails, it falls back to regular web search. A potentially interesting middle ground, but a fairly weak shadow of real QA.

33 Online QA Examples
AnswerBus, an open-domain question answering system; Ionaut; EasyAsk, AnswerLogic, AnswerFriend, Start, Quasm, Mulder, Webclopedia, etc.

34 Finding Answers in Text
Not a new idea… (Simmons et al. 1965):
1. Take an encyclopedia and load it onto a computer.
2. Take a question and parse it into a logical form.
3. Perform simple information retrieval to get relevant texts.
4. Parse those into a logical form.
5. Match and rank.

35 Simmons
Question: What do worms eat? Pattern: Worms eat ???
Candidates:
- Worms eat grass
- Grass is eaten by worms
- Birds eat worms
- Worms eat their way through the ground
- Horses with worms eat grain

36 Finding Answers in Text
Fundamentally, this is about modifying, processing, enriching or marking up both the question and potential answer texts to allow a simple match. All current systems do pretty much that.

37 People do ask questions…
Examples from various query logs:
- Which english translation of the bible is used in official Catholic liturgies?
- How tall is the sears tower?
- How can i find someone in texas
- Where can i find information on puritan religion?
- What are the 7 wonders of the world
- How can i eliminate stress
- What vacuum cleaner does Consumers Guide recommend

38 Question Answering at TREC
The question answering track at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?". For the first three years, systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
- IR-style Mean Reciprocal Rank (MRR) scoring: 1, 0.5, 0.33, 0.25, 0.2, 0 for a first correct answer at rank 1, 2, 3, 4, 5, 6+
- Mainly named-entity answers (person, place, date, …)
From 2002, systems are only allowed to return a single exact answer ("Mozart was born in 1756."), and a notion of confidence has been introduced. The reason for wanting only exact answers is to get systems to accurately pinpoint the answer within the text. Systems can then provide as much context as the user requests, but to return context you need to know the extent of the answer within the text, i.e., exactly what piece of text constitutes just the answer and nothing else.

39 The TREC Document Collection
The current collection uses news articles from the following sources: AP newswire, New York Times newswire, and Xinhua News Agency newswire. In total there are 1,033,461 documents in the collection, about 3 GB of text. Clearly this is too much text to process entirely using advanced NLP techniques, so the systems usually consist of an initial information retrieval phase followed by more advanced processing. Many supplement this text with use of the web and other knowledge bases.

40 Full-Blown System
- Parse and analyze the question
- Formulate queries suitable for use with an IR system (search engine)
- Retrieve ranked results
- Break them into suitable units
- Perform NLP on those units
- Rank snippets based on the NLP processing
- Done

41 QA Block Architecture
Q → Question Processing → (keywords) → Passage Retrieval → (passages) → Answer Extraction → A
- Question Processing: captures the semantics of the question; selects keywords for passage retrieval
- Passage Retrieval: extracts and ranks passages using surface-text techniques (backed by document retrieval)
- Answer Extraction: extracts and ranks answers using NL techniques, guided by the question semantics
- Supporting resources: WordNet, a parser, and a named-entity recognizer (NER)

42 Question Processing
Two main tasks:
- Determine the type of the answer
- Extract keywords from the question and formulate a query

43 Answer Types Factoid questions… Who, where, when, how many…
The answers fall into a limited and somewhat predictable set of categories:
- Who questions are going to be answered by…
- Where questions…
Generally, systems select answer types from a set of named entities, augmented with other types that are relatively easy to extract.

44 Answer Types Of course, it isn’t that easy…
- Who questions can have organizations as answers: Who sells the most hybrid cars?
- Which questions can have people as answers: Which president went to war with Mexico?

45 Answer Type Taxonomy
Contains ~9000 concepts reflecting expected answer types. Merges named entities with the WordNet hierarchy.

46 Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question. But remember our notion of matching: it doesn't do any good to do something complex here if it can't also be done in potential answer texts.

47 Keyword Selection
The answer type indicates what the question is looking for, but that doesn't really help in finding relevant texts (i.e., "OK, let's look for texts with people in them"). Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.

48 Lexical Terms Extraction
Questions are approximated by sets of unrelated words (lexical terms), similar to bag-of-words IR models. Examples from the TREC QA track:
Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer

49 Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
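A rough sketch of these heuristics in Python follows, assuming the question has already been tokenized, POS-tagged, and NE-tagged elsewhere. The token annotations, the stopword list, and the collapsing of heuristics 3-6 into a single noun rule are simplifications for illustration.

```python
STOPWORDS = {"what", "does", "the", "is", "of", "a", "in", "did", "how", "much", "to", "on"}

def select_keywords(tokens):
    """tokens: list of dicts with keys "word", "pos", "in_quotes", "in_named_entity",
    "is_answer_type_word" produced by earlier question-processing steps."""
    heuristics = [
        lambda t: t["in_quotes"] and t["word"].lower() not in STOPWORDS,              # 1. quoted non-stopwords
        lambda t: t["pos"] == "NNP" and t["in_named_entity"],                         # 2. NNPs in named entities
        lambda t: t["pos"].startswith("NN"),                                          # 3-6. nominals/nouns (collapsed)
        lambda t: t["pos"].startswith("VB") and t["word"].lower() not in STOPWORDS,   # 7. verbs
        lambda t: t["is_answer_type_word"],                                           # 8. the answer type word
    ]
    selected, seen = [], set()
    for rule in heuristics:               # apply rules in priority order
        for t in tokens:
            if rule(t) and t["word"] not in seen:
                selected.append(t["word"])
                seen.add(t["word"])
    return selected

question = [  # "What does the Peugeot company manufacture?"
    {"word": w, "pos": p, "in_quotes": False, "in_named_entity": ne, "is_answer_type_word": False}
    for w, p, ne in [("What", "WP", False), ("does", "VBZ", False), ("the", "DT", False),
                     ("Peugeot", "NNP", True), ("company", "NN", False),
                     ("manufacture", "VB", False)]
]
print(select_keywords(question))  # -> ['Peugeot', 'company', 'manufacture']
```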

50 Passage Retrieval
(The pipeline diagram from slide 41 is repeated here; this part of the lecture covers the Passage Retrieval stage.)

51 Passage Extraction Loop
Passage Extraction Component:
- Extracts passages that contain all selected keywords
- Passage size is dynamic
- Start position is dynamic
Passage quality and keyword adjustment:
- In the first iteration, use the first 6 keyword selection heuristics
- If the number of passages is lower than a threshold → the query is too strict → drop a keyword
- If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
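A schematic sketch of that relaxation loop; the toy retrieval function, the thresholds, and the keyword ordering are stand-ins rather than the actual system.

```python
MIN_PASSAGES, MAX_PASSAGES = 10, 500   # illustrative thresholds

def retrieve_passages(keywords, corpus):
    """Toy retrieval: a 'passage' is any sentence containing all of the keywords."""
    passages = []
    for doc in corpus:
        for sentence in doc.split("."):
            if all(k.lower() in sentence.lower() for k in keywords):
                passages.append(sentence.strip())
    return passages

def passage_extraction_loop(ranked_keywords, corpus, max_iters=10):
    """ranked_keywords: keywords ordered from most to least important."""
    active = list(ranked_keywords)
    passages = retrieve_passages(active, corpus)
    for _ in range(max_iters):   # guard against oscillating between add and drop
        if len(passages) < MIN_PASSAGES and len(active) > 1:
            active.pop()                                  # too strict: drop the least important keyword
        elif len(passages) > MAX_PASSAGES and len(active) < len(ranked_keywords):
            active.append(ranked_keywords[len(active)])   # too relaxed: add the next keyword back
        else:
            break
        passages = retrieve_passages(active, corpus)
    return passages, active
```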

52 Passage Scoring
Passages are scored based on keyword windows. For example, if a question has the set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four different windows are built over the matched sequence k1 k2 k3 k2 k1, each covering a different combination of the repeated keyword matches.

53 Passage Scoring
Passage ordering is performed using a sort that involves three scores:
- The number of words from the question that are recognized in the same sequence in the window
- The number of words that separate the most distant keywords in the window
- The number of unmatched keywords in the window
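A toy sketch of computing those three quantities for one window; the in-order count is simplified to the longest left-to-right subsequence of question keywords, and the sort direction for each score is an assumption.

```python
def window_scores(window_tokens, question_keywords):
    """Return (same_order, span, unmatched) for one keyword window."""
    matched = [i for i, tok in enumerate(window_tokens) if tok in question_keywords]
    # 1. question words recognized in the same sequence as in the question
    #    (simplified: longest left-to-right subsequence of question keywords)
    qi = 0
    for tok in window_tokens:
        if qi < len(question_keywords) and tok == question_keywords[qi]:
            qi += 1
    # 2. words separating the most distant matched keywords in the window
    span = (matched[-1] - matched[0]) if matched else 0
    # 3. question keywords with no match in the window
    unmatched = sum(1 for k in question_keywords if k not in window_tokens)
    return qi, span, unmatched

def rank_windows(windows, question_keywords):
    # more in-order matches is better; a smaller span is better; fewer unmatched is better
    def key(w):
        same_order, span, unmatched = window_scores(w, question_keywords)
        return (-same_order, span, unmatched)
    return sorted(windows, key=key)

print(rank_windows([["worms", "eat", "grass"], ["grass", "is", "eaten", "by", "worms"]],
                   ["worms", "eat"]))
```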

54 Answer Extraction
(The pipeline diagram from slide 41 is repeated here; this part of the lecture covers the Answer Extraction stage.)

55 Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in "Raiders of the Lost Ark", plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."

56 Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in "Raiders of the Lost Ark", plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
Best candidate answer: Christa McAuliffe

57 Features for Answer Ranking
- Number of question terms matched in the answer passage
- Number of question terms matched in the same phrase as the candidate answer
- Number of question terms matched in the same sentence as the candidate answer
- Flag set to 1 if the candidate answer is followed by a punctuation sign
- Number of question terms matched, separated from the candidate answer by at most three words and one comma
- Number of terms occurring in the same order in the answer passage as in the question
- Average distance from the candidate answer to question term matches
[SIGIR '01]

58 Evaluation
Evaluation of this kind of system is usually based on some kind of TREC-like metric. In Q/A the most frequent metric is mean reciprocal rank (MRR): you're allowed to return N answers, your score for a question is 1/rank of the first right answer, and scores are averaged over all the questions you answer.
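A small sketch of MRR scoring as described here; the exact-string-match notion of a correct answer and the rank cutoff are simplifying assumptions.

```python
def mrr(ranked_answers, gold_answers, max_rank=5):
    """TREC-style MRR: 1/rank of the first correct answer within the top max_rank, else 0."""
    total = 0.0
    for answers, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(answers[:max_rank], start=1):
            if answer == gold:          # stand-in correctness judgment: exact string match
                total += 1.0 / rank
                break
    return total / len(gold_answers)

# Toy example: the first question is answered at rank 2 (score 0.5),
# the second at rank 1 (score 1.0), so MRR = 0.75.
print(mrr([["1755", "1756"], ["Paris"]], ["1756", "Paris"]))
```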

59 LCC/UTD/SMU

60 Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts. The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together. But…

61 The Web is Different
On the web, popular factoids are likely to be expressed in a gazillion different ways, at least a few of which will likely match the way the question was asked. So why not just grep (or agrep) the web using all or pieces of the original question?

62 AskMSR
Process the question by:
- Forming a search engine query from the original question
- Detecting the answer type
Get some results.
Extract answers of the right type, based on how often they occur.

63 Step 1: Rewrite the questions
Intuition: the user's question is often syntactically quite close to sentences that contain the answer.
Where is the Louvre Museum located? / The Louvre Museum is located in Paris.
Who created the character of Scrooge? / Charles Dickens created the character of Scrooge.

64 Query rewriting
Classify the question into seven categories:
- Who is/was/are/were…?
- When is/did/will/are/were…?
- Where is/are/were…?
a. Hand-crafted category-specific transformation rules, e.g., for Where questions, move 'is' to all possible locations and look to the right of the query terms for the answer:
"Where is the Louvre Museum located?" →
"is the Louvre Museum located" / "the is Louvre Museum located" / "the Louvre is Museum located" / "the Louvre Museum is located" / "the Louvre Museum located is"
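A small sketch of the "move 'is' to all possible positions" rewrite for Where questions, as shown above (simplified; the real AskMSR system also attaches a weight to each rewrite).

```python
def where_is_rewrites(question):
    """question: e.g. 'Where is the Louvre Museum located?'"""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        return [question]   # not a "Where is ..." question; leave it alone
    rest = words[2:]        # drop "Where", keep the rest; re-insert "is" everywhere
    rewrites = []
    for pos in range(len(rest) + 1):
        candidate = rest[:pos] + ["is"] + rest[pos:]
        rewrites.append(" ".join(candidate))
    return rewrites

for r in where_is_rewrites("Where is the Louvre Museum located?"):
    print(f'"{r}"')
# "is the Louvre Museum located", "the is Louvre Museum located", ...
```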

65 Step 2: Query search engine
- Send all rewrites to a web search engine
- Retrieve the top N answers
- For speed, rely just on the search engine's "snippets", not the full text of the actual documents

66 Step 3: Gathering N-Grams
- Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
- Weight of an n-gram: its occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document
Example, for "Who created the character of Scrooge?": Dickens, Christmas Carol 78, Charles Dickens 75, Disney, Carl Banks, A Christmas, Christmas Carol 45, Uncle
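A compact sketch of this counting step; the snippets and rewrite-rule weights below are invented.

```python
from collections import Counter

def gather_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: list of (snippet text, rewrite-rule weight) pairs.
    Each n-gram occurrence adds the weight of the rewrite rule that fetched it."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

snippets = [
    ("Charles Dickens created the character of Scrooge", 5.0),           # from an exact rewrite
    ("Scrooge appears in A Christmas Carol by Charles Dickens", 1.0),    # from a loose rewrite
]
print(gather_ngrams(snippets).most_common(5))
```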

67 Step 4: Filtering N-Grams
- Each question type is associated with one or more "data-type filters" = regular expressions for answer types
- Boost the score of n-grams that match the expected answer type
- Lower the score of n-grams that don't match
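A minimal sketch of such filtering; the regular expressions, boost, and penalty values are illustrative assumptions, not AskMSR's actual filters.

```python
import re

ANSWER_TYPE_FILTERS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),        # a plausible year
    "who":  re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),  # capitalized name-like phrase
}

def filter_ngrams(scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type, penalize the rest."""
    pattern = ANSWER_TYPE_FILTERS.get(question_type)
    if pattern is None:
        return dict(scores)
    return {ng: s * (boost if pattern.search(ng) else penalty) for ng, s in scores.items()}

print(filter_ngrams({"Charles Dickens": 75, "the character": 30}, "who"))
```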

68 Step 5: Tiling the Answers
Overlapping n-grams are tiled (merged) and the old n-grams are discarded. In the slide's example, "Charles Dickens" (score 20), "Dickens" (15), and "Mr Charles" (10) are merged into "Mr Charles Dickens" with score 45.
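A rough sketch of one way to implement this tiling, greedily merging the highest-scoring n-gram with any overlapping or contained n-gram and summing their scores; this is a simplified reading of the step, not the exact AskMSR algorithm.

```python
def tile_pair(a, b):
    """Return the merged string if b is contained in or overlaps a at a word boundary, else None."""
    aw, bw = a.split(), b.split()
    if all(w in aw for w in bw):                 # b is contained in a
        return a
    for k in range(min(len(aw), len(bw)), 0, -1):
        if aw[-k:] == bw[:k]:                    # suffix of a == prefix of b
            return " ".join(aw + bw[k:])
        if bw[-k:] == aw[:k]:                    # suffix of b == prefix of a
            return " ".join(bw + aw[k:])
    return None

def tile_answers(scored):
    """scored: dict mapping n-gram -> score; merge until no more tiles apply."""
    items = dict(scored)
    changed = True
    while changed:
        changed = False
        best = max(items, key=items.get)
        for other in list(items):
            if other == best:
                continue
            merged = tile_pair(best, other)
            if merged is not None:
                new_score = items[best] + items[other]   # sum scores, discard old n-grams
                items.pop(best)
                items.pop(other)
                items[merged] = new_score
                changed = True
                break
    return items

print(tile_answers({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}))
# -> {'Mr Charles Dickens': 45}
```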

69 Results Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions Technique does ok, not great (would have placed in top 9 of ~30 participants) But with access to the Web… They do much better, would have come in second on TREC 2001 Be suspicious of any after the bake-off is over metrics 9/22/2018

70 Harder Questions Factoid question answering is really pretty silly.
A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time:
- Who is Condoleezza Rice?
- Who is Mahmoud Abbas?
- Why was Arafat flown to Paris?

