Published by Milo Heath. Modified over 9 years ago.
1
Aware
Discovering characteristics of habitable question answering systems with iterative formative evaluation
Bill Ogden, Ron Zacharski
New Mexico State University
2
CR L COMPUTING RESEARCH LABORATORY
Habitability
Watt (1968)
–A language is considered habitable if users can express everything that is needed for a task using language they would expect the system to understand.
–Describes how easily, naturally, and effectively users can use language to express themselves within the constraints of a system language.
–If there are 26 ways that a user population would be likely to ask a question, a habitable system will process all 26.
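One way to make the "26 ways" criterion concrete is to measure what fraction of a population's phrasings a system processes. The sketch below is an illustration only: `habitability_coverage`, `toy_system`, and the four sample phrasings are hypothetical names and data, not part of the study or of any real system.

```python
from typing import Callable, Iterable


def habitability_coverage(
    phrasings: Iterable[str], can_process: Callable[[str], bool]
) -> float:
    """Return the fraction of user phrasings that the system can process."""
    items = list(phrasings)
    if not items:
        raise ValueError("need at least one phrasing")
    handled = sum(1 for phrasing in items if can_process(phrasing))
    return handled / len(items)


# Hypothetical stand-in for a Q&A system: it only accepts
# questions that begin with the word "what".
def toy_system(question: str) -> bool:
    return question.lower().startswith("what")


# Hypothetical phrasings of one information need.
phrasings = [
    "What does Burundi export?",
    "Which goods does Burundi export?",
    "Does Burundi export tea?",
    "what are Burundi's exports?",
]

coverage = habitability_coverage(phrasings, toy_system)
print(f"coverage = {coverage:.2f}")
```

Under Watt's definition a fully habitable system would score 1.0 on the phrasings its users actually produce; the toy system here handles only two of the four.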
3
Today
Progress with three methodologies
–User protocol analysis
–Wizard of Oz dialog
–Formative evaluation
Collaboration
4
Project Goals
Identify and/or develop new interface elements aiding Q&A visibility.
Innovative user interfaces are best achieved through iterative user testing.
New evaluation methodologies will emerge from interactive Q&A user testing.
5
User Protocol
Task, e.g. “You are not sure about the safety of genetically engineered foods, and would like to find more information and research on this topic. Name four potential types of safety problems that have been raised.”
8 tasks, 6 users, 2-3 sessions.
Recorded screen and voice.
User protocols are now being analyzed.
9
Dialog pre-evaluation
We used primarily Wizard responses.
Two NIST analysts, each with 10 tasks.
Surprises:
–Users asked complex questions and were satisfied with simple answers.
10
Referring Expressions in the Wizard of Oz Study
Kehler (2000) analyzed referring expressions in a Wizard of Oz (WOz) study simulating a multimodal travel guide application.
While reference resolution for human-human conversations is extremely difficult, reference resolution for human-computer conversations can be computed by means of a simple model.
He found that all third person pronominal forms (24% of the referring expressions) referred to an entity introduced by an NP or displayed as an object on the display.
11
Referring expressions
217 total
Third person pronominal forms accounted for only 3.6% of referring expressions.
Only 11% were anaphoric expressions, and 1/3 of these did not have NP antecedents.
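The percentages on this slide can be turned into rough counts out of the 217 expressions. The sketch below does only that arithmetic; the resulting counts are approximations reconstructed from rounded percentages, not figures reported in the study, and `approx_count` is a hypothetical helper.

```python
TOTAL_EXPRESSIONS = 217  # total referring expressions reported on the slide


def approx_count(share: float, total: int = TOTAL_EXPRESSIONS) -> int:
    """Approximate count implied by a rounded percentage share."""
    return round(share * total)


# Approximations only: the slide reports percentages, not raw counts.
third_person_pronouns = approx_count(0.036)   # 3.6% of 217
anaphoric = approx_count(0.11)                # 11% of 217
without_np_antecedent = round(anaphoric / 3)  # 1/3 of the anaphoric ones

print(third_person_pronouns, anaphoric, without_np_antecedent)
```

On these assumptions the slide's figures correspond to roughly 8 third person pronominal forms and about 24 anaphoric expressions, of which about 8 lacked an NP antecedent.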
12
Examples
Definite Determiner
the operating system used by microsoft has been infiltrated with a computer virus which disrupts the overall system operation. I would like to know what microsoft is doing to correct the situation.
Demonstrative Determiner
i would like to know what microsoft is doing to correct the situation and what statements they have made as to their progress in this problem solution
13
Examples
There was only one occurrence of a third person pronoun whose referent was introduced in an earlier sentence by the system:
Wizard: Perhaps the following is relevent: I can find reports of nine recent joint ventures with Japanese companies. Do you want to view all or just the first one
User: all of them
14
START Evaluation
START Natural language system
Developed by Boris Katz at MIT's Artificial Intelligence Laboratory
Answers questions in English about the MIT AI Laboratory, geography, and assorted other topics
15
Iterative Formative Evaluation
Short, empirical design evaluation studies
Focus is on improvement, not validation
Fixing ‘details’ often leads to expected productivity gains
16
START examples
===> which countries export tea?
Sorry - I don't know.
===> what does Burundi export?
Burundi Exports - commodities: coffee, tea, sugar, cotton, hides
===> Does Burundi export tea?
I'm not following what you're saying. Please try it a different way
===> Does China export tea?
I am sorry to say I don't know whether or not China exports tea.
17
START examples
===> Is Montana bigger than California?
Unfortunately, I don't know whether Montana is bigger than California.
===> Is Mars bigger than Mercury?
Mars has a diameter of 6,780 km, and Mercury has a diameter of 4,880 km. Mars' diameter is 1,900 km larger than Mercury's.
Source: Planetary Sciences at the National Space Science Data Center
18
START examples
===> How big is Montana?
The area of Montana is 145,556 square miles.
Source: START KB
===> How big is California?
There are 155,973 square miles in California.
Source: START KB
19
START Evaluation
“find the three biggest: countries in Asia, states in the US, planets.”
Order made a difference.
Users will echo the system's responses.
20
START Evaluation
QUERY: What are the three largest countries in Asia?
START's reply
The following are the countries ranked in the top three places for area among countries in Asia.
China
India
Indonesia
22
START Evaluation
QUERY: What are the states ranked in the top three places for area among the states in the United States?
START's reply
The word RANKED may be misspelled. Please choose one of the following:
Ranged
Yanked
Accept Word
Abort
23
Decomposition in START
QUERY: What is the population of the capitol of Greenland?
I don't know the answer to your question.
QUERY: What is the capitol of Greenland?
Nuuk is the capital of Greenland.
QUERY: What is the population of Nuuk?
Main Entry: Nuuk
Pronunciation: 'nük
Variant(s): or Godt·håb /'got-"hop/
Usage: geographical name
town capital of Greenland on SW coast population 12,181
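The exchange above shows the user doing the decomposition by hand: first the capital, then its population. A minimal sketch of doing the same chaining automatically follows. It does not describe how START works; `KB`, `lookup`, and `answer_nested` are hypothetical, and the toy knowledge base holds only the two facts START returned above.

```python
from typing import Optional, Sequence

# Toy knowledge base (hypothetical structure); the two facts are the
# ones START returned for the sub-questions on this slide.
KB = {
    ("capital", "Greenland"): "Nuuk",
    ("population", "Nuuk"): "12,181",
}


def lookup(attribute: str, entity: str) -> Optional[str]:
    """Answer a single-step question, or None if the fact is unknown."""
    return KB.get((attribute, entity))


def answer_nested(attributes: Sequence[str], entity: str) -> Optional[str]:
    """Answer a nested question by resolving the innermost attribute first.

    ["population", "capital"] applied to "Greenland" means
    "the population of the capital of Greenland".
    """
    current = entity
    for attribute in reversed(attributes):
        result = lookup(attribute, current)
        if result is None:
            return None  # one sub-question failed, so the chain fails
        current = result
    return current


print(answer_nested(["population", "capital"], "Greenland"))
```

Each sub-question is one the system could already answer on its own, so the nested question fails only when a link in the chain is missing.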
24
Solutions
Fix the system
Use dialog
“I don’t know if Montana is bigger than Texas but I have information you can use to calculate it yourself”
Give partial answers.
Montana area: 145,556 square miles
California area: 155,973 square miles
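The "give partial answers" idea can be sketched as a fallback: when the comparison itself cannot be answered, return the facts the system does hold. This is an illustration under stated assumptions, not START's behavior; `partial_answer` and `AREAS_SQ_MI` are hypothetical, and the two area values are the ones START reported.

```python
from typing import Dict

# Areas as reported by START on the earlier slide (square miles).
AREAS_SQ_MI: Dict[str, int] = {"Montana": 145_556, "California": 155_973}


def partial_answer(a: str, b: str, areas: Dict[str, int]) -> str:
    """Reply to "Is a bigger than b?" with whatever area facts are known."""
    known = [(name, areas[name]) for name in (a, b) if name in areas]
    if not known:
        return f"I don't know whether {a} is bigger than {b}."
    lines = [
        f"I don't know if {a} is bigger than {b}, "
        "but I have information you can use to calculate it yourself:"
    ]
    lines += [f"{name} area: {area:,} square miles" for name, area in known]
    return "\n".join(lines)


reply = partial_answer("Montana", "California", AREAS_SQ_MI)
print(reply)
```

The reply keeps the admission that the comparison is unknown, so the user is not misled, while still handing over the two facts needed to finish the task.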
25
Project Goals for Collaboration
–Identify the characteristics of habitable Aquaint systems
–Use prototype Aquaint systems for iterative formative evaluation
26
Remaining Issue
Using surrogate users/tasks may not capture ‘real’ Q&A user behavior.
We are looking for ways to observe users who are working on their own questions.
–Lack of control will be offset by richness of behavior