Structured Browsing for Unstructured Text


1 Structured Browsing for Unstructured Text
IDEA NAVIGATION: Structured Browsing for Unstructured Text
Robin Stewart, MIT CSAIL
Gregory Scott, Tufts University
Vladimir Zelevinsky, Endeca
CHI 2008 • Florence, Italy

2 WHY?

3 Medici, Clinton
Data set: US news corpus, year 2000
We wanted to give an example that somehow relates to Florence.

4 Q: What did Hillary Clinton propose in October 2000?

5 A: let’s search!
• hillary clinton proposed: 30 results. “Mr. Lazio proposed the repeal of [...] ran against Hillary Clinton.”
• mrs. clinton proposed: 25 results. Mrs. Clinton! Mr. Lazio! Mr. Lazio! Mrs. Clinton! Arrrgh!
• “clinton proposed”: 112 results. Some guy named Bill…?
• “mrs. clinton proposed”: 1 result. Only one?! Hmm…

6 IDEA
What if we could search for:
• Subject (noun phrase): (Hillary) Clinton
• Verb phrase: propose(d), or a synonym
• Object (noun phrase): ???
Subject + verb phrase + object = an IDEA

7 Not keywords: IDEAS

8 HOW?

9 How do we obtain ideas from data?
Example sentence, with part-of-speech tags (N = noun, V = verb, A = adjective):
It/N is/V said/V Mrs./N Clinton/N promises/V new/A jobs/N will/V be/V created/V by her/N
We extract all triples from all sentences of all documents in our corpus, using these steps:
• part-of-speech tagging
• noun / verb phrase extraction
• sentence structure analysis
• anaphora resolution
• passive tense flipping
• triple filtering
• hierarchy generation
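A minimal sketch of how these steps could turn the example sentence “It is said Mrs. Clinton promises new jobs will be created by her.” into triples. It is not the prototype’s implementation: the tags are hand-entered in place of a real tagger, and the anaphora heuristic (a personal pronoun refers to the most recent capitalized noun phrase), the passive rule, and the filter (drop triples whose subject is still a pronoun such as the dummy “It”) are all simplifying assumptions.

```python
# Hand-entered tags for the example sentence. This list is a hypothetical
# stand-in for a real part-of-speech tagger's output.
# N = noun/pronoun, V = verb, A = adjective, P = preposition.
TAGGED = [
    ("It", "N"), ("is", "V"), ("said", "V"), ("Mrs.", "N"), ("Clinton", "N"),
    ("promises", "V"), ("new", "A"), ("jobs", "N"), ("will", "V"),
    ("be", "V"), ("created", "V"), ("by", "P"), ("her", "N"),
]

PRONOUNS = {"it", "he", "she", "him", "her", "they", "them"}
PERSONAL = PRONOUNS - {"it"}  # "it" is often a dummy subject, so it is not resolved
BE_FORMS = {"be", "is", "are", "was", "were", "been", "being"}


def chunk(tagged):
    """Phrase extraction: merge runs of N/A tags into NPs and runs of V tags into VPs."""
    chunks = []
    for word, tag in tagged:
        kind = {"N": "NP", "A": "NP", "V": "VP"}.get(tag, tag)
        if chunks and chunks[-1][0] == kind:
            chunks[-1][1].append(word)
        else:
            chunks.append((kind, [word]))
    return [(kind, " ".join(words)) for kind, words in chunks]


def resolve_anaphora(chunks):
    """Toy anaphora resolution: a personal pronoun becomes the last capitalized NP."""
    resolved, antecedent = [], None
    for kind, text in chunks:
        if kind == "NP" and text.lower() in PERSONAL and antecedent:
            text = antecedent
        elif kind == "NP" and text[0].isupper() and text.lower() not in PRONOUNS:
            antecedent = text
        resolved.append((kind, text))
    return resolved


def extract_triples(chunks):
    """Structure analysis: find NP-VP-NP, flipping passive 'NP be-VP by NP'."""
    triples = []
    for i in range(len(chunks) - 2):
        (k1, subj), (k2, verb), (k3, obj) = chunks[i:i + 3]
        if k1 != "NP" or k2 != "VP":
            continue
        if k3 == "NP":
            triples.append((subj, verb, obj))
        elif (k3 == "P" and obj.lower() == "by" and i + 3 < len(chunks)
              and chunks[i + 3][0] == "NP"
              and BE_FORMS & set(verb.lower().split())):
            # Passive voice: the agent after "by" becomes the subject, and the
            # participle (last word of the verb phrase) stands in for the verb.
            triples.append((chunks[i + 3][1], verb.split()[-1], subj))
    return triples


def filter_triples(triples):
    """Triple filtering: drop triples whose subject is still a pronoun, e.g. "It"."""
    return [t for t in triples if t[0].lower() not in PRONOUNS]


triples = filter_triples(extract_triples(resolve_anaphora(chunk(TAGGED))))
print(triples)
# [('Mrs. Clinton', 'promises', 'new jobs'), ('Mrs. Clinton', 'created', 'new jobs')]
```

The ordering mirrors the slide: pronouns are resolved before passives are flipped, so the flipped triple gets “Mrs. Clinton” rather than “her” as its subject.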

10 Hierarchy generation:
• Nouns are grouped by head noun: [Mrs. + Hillary + Bill + President] → Clinton
• Verbs are grouped by hypernyms (broadening synonyms): [say + tell + propose + suggest + declare] → express
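The two grouping rules can be sketched as below, assuming the head of an English noun phrase is its last word and using a small hand-written hypernym table (`HYPERNYM`, hypothetical) in place of the WordNet lookup the talk mentions. A real system would also lemmatize verbs (“proposed” → “propose”) before looking them up.

```python
from collections import defaultdict

# Hypothetical hand-written hypernym table standing in for a WordNet lookup;
# a real system would query the lexical database and lemmatize verbs first.
HYPERNYM = {"say": "express", "tell": "express", "propose": "express",
            "suggest": "express", "declare": "express"}


def head_noun(phrase):
    """Approximate the head of an English noun phrase by its last word."""
    return phrase.split()[-1]


def group_by(items, key):
    """Group items under the parent term returned by `key`, keeping input order."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


nouns = ["Mrs. Clinton", "Hillary Clinton", "Bill Clinton", "President Clinton"]
verbs = ["say", "tell", "propose", "suggest", "declare"]

noun_tree = group_by(nouns, head_noun)
verb_tree = group_by(verbs, lambda v: HYPERNYM.get(v, v))  # unknown verbs group under themselves
print(noun_tree)  # {'Clinton': ['Mrs. Clinton', 'Hillary Clinton', 'Bill Clinton', 'President Clinton']}
print(verb_tree)  # {'express': ['say', 'tell', 'propose', 'suggest', 'declare']}
```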

11 WHAT?

12 Here’s the interface of our prototype system, which contains about 9,000 news articles from October 2000. We group all of the extracted subject-verb-object triples into a navigable summary widget on the left, with each column sorted by frequency. Let’s use idea navigation to answer the question Vladimir asked: what did Hillary Clinton propose? We see that “Clinton” is in the “Subject” column, so let’s click that.
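A frequency-sorted summary widget amounts to one count table per column. This sketch uses made-up placeholder triples (not data from the prototype) and `collections.Counter.most_common`, which lists terms from most to least frequent, with ties kept in first-seen order.

```python
from collections import Counter

# Hypothetical placeholder triples (subject, verb, object); not real data.
TRIPLES = [
    ("Mrs. Clinton", "proposed", "plan A"),
    ("Mrs. Clinton", "said", "statement X"),
    ("Mr. Lazio", "proposed", "plan B"),
    ("Mrs. Clinton", "proposed", "plan B"),
    ("Mr. Lazio", "said", "statement Y"),
    ("Bill Clinton", "said", "statement X"),
]
COLUMNS = ("Subject", "Verb", "Object")


def summarize(triples):
    """One (term, count) list per column, most frequent term first."""
    return {name: Counter(t[i] for t in triples).most_common()
            for i, name in enumerate(COLUMNS)}


for name, entries in summarize(TRIPLES).items():
    print(name, entries)
# First line printed: Subject [('Mrs. Clinton', 3), ('Mr. Lazio', 2), ('Bill Clinton', 1)]
```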

13 We can see that the system has grouped many Clintons under that heading. Let’s click “Mrs. Clinton.”

14 Meanwhile, the verb and object columns have been updated automatically to display only the triples that have “Mrs. Clinton” as their subject, so we see only the things that Mrs. Clinton did. We want to find things she proposed… well, proposing is a type of expressing, so let’s try that.
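The refinement behavior described on slides 13 and 14 can be sketched as filtering the triple list and recomputing the other columns from whatever survives. The placeholder triples and the helper names (`refine`, `refine_by_head`, `facet`) are hypothetical; selecting a group heading such as “Clinton” is modeled here by matching the head noun.

```python
from collections import Counter

# Hypothetical placeholder triples (subject, verb, object); not real data.
TRIPLES = [
    ("Mrs. Clinton", "proposed", "plan A"),
    ("Mrs. Clinton", "said", "statement X"),
    ("Mr. Lazio", "proposed", "plan B"),
    ("Mrs. Clinton", "proposed", "plan B"),
    ("Mr. Lazio", "said", "statement Y"),
    ("Bill Clinton", "said", "statement X"),
]
SUBJECT, VERB, OBJECT = 0, 1, 2


def refine(triples, column, value):
    """Selecting a term keeps only the triples with that exact term in `column`."""
    return [t for t in triples if t[column] == value]


def refine_by_head(triples, column, head):
    """Selecting a group heading (e.g. "Clinton") keeps every phrase with that head noun."""
    return [t for t in triples if t[column].split()[-1] == head]


def facet(triples, column):
    """Recompute one column of the widget from the surviving triples."""
    return Counter(t[column] for t in triples).most_common()


all_clintons = refine_by_head(TRIPLES, SUBJECT, "Clinton")
mrs_clinton = refine(all_clintons, SUBJECT, "Mrs. Clinton")
print(facet(all_clintons, SUBJECT))  # [('Mrs. Clinton', 3), ('Bill Clinton', 1)]
print(facet(mrs_clinton, VERB))      # [('proposed', 2), ('said', 1)]
```

Because every column is recomputed from the same filtered list, Mr. Lazio’s triples drop out of the verb and object columns as soon as a subject is selected.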

15 Yep, here’s “proposed”, which WordNet has grouped under “express”. We click it.

16 Now we see all five triples that match
On the right, you can see the sentences that the triples were extracted from, which give some of the context of each idea. If you’re interested, you can click “in full” to see the full article. One last note: at any point, users can narrow the results with a keyword search using this search box.
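One way to support the results pane is for each triple to carry its source sentence and a document id, so the active refinements can be mapped back to sentences and an optional keyword can narrow them further. This is a sketch under assumptions: the `Triple` fields, the `doc_id` link target, and the placeholder sentences are hypothetical, and the keyword is matched against the source sentence only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    verb: str
    obj: str
    sentence: str  # sentence the triple was extracted from
    doc_id: str    # hypothetical article id behind the "in full" link


def matching_sentences(triples, subject=None, verb=None, keyword=None):
    """Return (sentence, doc_id) for triples matching every active refinement."""
    hits = []
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if verb is not None and t.verb != verb:
            continue
        # Assumption: the keyword is matched against the source sentence only.
        if keyword is not None and keyword.lower() not in t.sentence.lower():
            continue
        hits.append((t.sentence, t.doc_id))
    return hits


# Hypothetical placeholder sentences; not taken from the corpus.
TRIPLES = [
    Triple("Mrs. Clinton", "proposed", "plan A", "Mrs. Clinton proposed plan A on Monday.", "doc-1"),
    Triple("Mrs. Clinton", "proposed", "plan B", "Mrs. Clinton proposed plan B in a speech.", "doc-2"),
    Triple("Mr. Lazio", "proposed", "plan C", "Mr. Lazio proposed plan C.", "doc-3"),
]

print(matching_sentences(TRIPLES, subject="Mrs. Clinton", verb="proposed"))
print(matching_sentences(TRIPLES, subject="Mrs. Clinton", verb="proposed", keyword="speech"))
```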

17 Different types of search tasks:
User study
Different types of search tasks:
• Noun-verb relationship: “What did Hillary Clinton propose?”
• Abstract / subjective: “Find quotations that you consider controversial”
We carried out a formative evaluation to test whether users would understand the idea navigation interface after a brief introduction, choose to use it when given the option alongside a standard search box, and successfully complete tasks with its help. We gave 11 users a range of search tasks that either depended on a noun-verb relationship (like the one we have just seen) or were abstract or subjective (such as “find quotations that you consider controversial”).

18 Users progressively abandoned the search box
Result: Users progressively abandoned the search box
Example session (the initial search fails; idea navigation provides a path to an answer):
• User searches for “controversy”... scans sentences... starts over.
• Refines by verb: “express” → “say”... scans...
• Searches for “offensive”... scans... starts over.
• Searches for “race black”… no results.
• Searches for “african american”.
• Refines by verb: “resegregate”... and reads the article.
The primary result of the study was that users usually started out with keyword search and then progressed to idea navigation when the search results turned out to be inadequate. For example, searching for “controversy” doesn’t actually return controversial articles in most cases. When this failed, users turned to idea navigation and found promising terms such as “resegregate”, which did lead to controversial articles. Overall, the users made 100 idea navigation refinements and 61 searches. All tasks were completed successfully, and 79% of them were completed with idea navigation as the final search step.
• Overall: 100 idea navigation refinements vs. 61 searches
• 79% of completed tasks used idea navigation as the final step

19 Future work
• More sentence structures (gerunds as noun phrases?), similarity-grouping methods, and facet refinement features
• Test with other domains: health science; legal; patents
• Comparative user study
There are many ways that our prototype system could be extended and enhanced, including the ability to extract triples from more sentence types, better techniques for grouping similar terms together, and more query refinement features, such as the ability to select multiple refinements in the same column. We also expect that Idea Navigation will be most useful in domains such as health science, legal case studies, and patents, where users often need to search for concepts that depend on a noun-verb relationship. Finally, we would like to do a more extensive, comparative user study to find out how idea navigation compares to other search components such as metadata facets and tag clouds.

20 Contributions
• A method of extracting subject-verb-object triples that can be presented to end users
• A faceted browsing interface for summarizing and easily navigating these extracted ideas
In summary, we’ve developed a method of extracting subject-verb-object triples that are suitable for summarizing and presenting directly to end users, and we’ve designed a faceted browsing-style interface for summarizing these triples and letting users easily navigate through them to find what they want. Last, we’d like to thank our colleagues at Endeca and MIT CSAIL for their advice, feedback, and support. Thank you.
Many thanks to all our colleagues at Endeca and MIT CSAIL

