Structured Browsing for Unstructured Text

IDEA NAVIGATION: Structured Browsing for Unstructured Text
Robin Stewart, MIT CSAIL
Gregory Scott, Tufts University
Vladimir Zelevinsky, Endeca
CHI 2008 • Florence, Italy

WHY?

Medici / Clinton
Data set: US news corpus, year 2000
Note: we wanted to give an example that somehow relates to Florence.

Q: What did Hillary Clinton propose in October 2000?

A: Let’s search!
hillary clinton proposed: 30 results. “Mr. Lazio proposed the repeal of [...] ran against Hillary Clinton.”
mrs. clinton proposed: 25 results. Mrs. Clinton! Mr. Lazio! Mr. Lazio! Mrs. Clinton! Arrrgh!
“clinton proposed”: 112 results. Some guy named Bill…?
"mrs. clinton proposed": 1 result. Only one?! Hmm….

IDEA
What if we could search for:
Subject (noun phrase): (Hillary) Clinton
Verb phrase: propose(d), or a synonym
Object (noun phrase): ???

Not keywords
IDEAS

HOW?

How do we obtain ideas from data?
Example sentence: “It is said Mrs. Clinton promises new jobs will be created by her.”
Part-of-speech tags shown on the slide: N V V N N V A N V V V N
Processing steps:
part-of-speech tagging
noun / verb phrase extraction
sentence structure analysis
anaphora resolution
passive voice flipping
triple filtering
hierarchy generation
Note: we extract all triples from all sentences of all documents in our corpus.
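Two of the steps listed above, passive flipping and anaphora resolution, can be sketched on the example sentence. This is a minimal illustration that assumes the earlier stages have already produced a raw triple; the `Triple` type, its `passive` flag, and the antecedent table are hypothetical names introduced here and are not taken from the prototype.

```python
from typing import Dict, NamedTuple


class Triple(NamedTuple):
    subject: str
    verb: str
    obj: str
    passive: bool = False  # hypothetical flag set by sentence structure analysis


def flip_passive(t: Triple) -> Triple:
    """Turn a passive triple (patient, verb, agent) into active (agent, verb, patient)."""
    if not t.passive:
        return t
    return Triple(subject=t.obj, verb=t.verb, obj=t.subject, passive=False)


def resolve_anaphora(t: Triple, antecedents: Dict[str, str]) -> Triple:
    """Replace pronouns with their antecedents using a supplied lookup table."""
    return t._replace(
        subject=antecedents.get(t.subject, t.subject),
        obj=antecedents.get(t.obj, t.obj),
    )


# Raw triple for "... new jobs will be created by her."
raw = Triple(subject="new jobs", verb="create", obj="her", passive=True)
# Assumed output of an anaphora resolver for this sentence
antecedents = {"her": "Mrs. Clinton"}

idea = resolve_anaphora(flip_passive(raw), antecedents)
print(idea)
```

After both steps the triple reads as an active statement with a named subject, which is the form a Subject / Verb / Object summary needs.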

Hierarchy generation:
Nouns by head noun: [Mrs. + Hillary + Bill + President] → Clinton
Verbs by hypernyms (broadening synonyms): [say + tell + propose + suggest + declare] → express
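The two grouping rules can be sketched as follows. This is an illustration only: the head noun is approximated as the last word of the phrase, and the `HYPERNYM` table is a small hand-written stand-in for a WordNet hypernym lookup, so both are assumptions rather than the prototype's method.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hand-written stand-in for a WordNet hypernym lookup (assumption for illustration)
HYPERNYM: Dict[str, str] = {
    "say": "express",
    "tell": "express",
    "propose": "express",
    "suggest": "express",
    "declare": "express",
}


def head_noun(noun_phrase: str) -> str:
    """Approximate the head noun as the last word of the phrase."""
    return noun_phrase.split()[-1]


def verb_group(verb: str) -> str:
    """Map a verb to its broader term, or to itself if none is known."""
    return HYPERNYM.get(verb, verb)


def group_by(items: Iterable[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Collect items under the heading returned by key, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


noun_groups = group_by(
    ["Mrs. Clinton", "Hillary Clinton", "Bill Clinton", "President Clinton", "Mr. Lazio"],
    head_noun,
)
verb_groups = group_by(
    ["say", "tell", "propose", "suggest", "declare", "create"],
    verb_group,
)
print(noun_groups)
print(verb_groups)
```

Each heading ("Clinton", "express") becomes an expandable entry in a column, with the original phrases as its children.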

WHAT?

Here’s the interface of our prototype system which contains about 9,000 news articles from October 2000. We group all of the extracted subject-verb-object triples into a navigable summary widget on the left, with each column sorted by frequency. Let’s use idea navigation to answer the question Vladimir asked: what did Hillary Clinton propose? We see that “Clinton” is in the “Subject” column, so let’s click that.

We can see that the system has grouped many Clintons under the heading. Let’s click “Mrs. Clinton.”

Meanwhile, the verb and object columns have been updated automatically to only display the triples which have “Mrs. Clinton” as their subject - so we see only the things that Mrs. Clinton did. We want to find things she proposed… well, proposing is a type of expressing so let’s try that.
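The column update described here amounts to filtering the triples by the selected value and recounting each column. The sketch below assumes triples are plain (subject, verb, object) tuples; the `refine` and `columns` helpers and the toy triples are invented for illustration and are not data or code from the prototype.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)

# Invented toy triples, not data from the corpus
TRIPLES: List[Triple] = [
    ("Mrs. Clinton", "propose", "tax cut"),
    ("Mrs. Clinton", "propose", "school plan"),
    ("Mrs. Clinton", "visit", "Buffalo"),
    ("Mr. Lazio", "propose", "repeal"),
]


def refine(
    triples: List[Triple],
    subject: Optional[str] = None,
    verb: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Keep only the triples that match every selected refinement."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (verb is None or t[1] == verb)
        and (obj is None or t[2] == obj)
    ]


def columns(triples: List[Triple]) -> Dict[str, List[Tuple[str, int]]]:
    """Build the three summary columns, each sorted by descending frequency."""
    names = ("Subject", "Verb", "Object")
    return {
        name: Counter(t[i] for t in triples).most_common()
        for i, name in enumerate(names)
    }


view = columns(refine(TRIPLES, subject="Mrs. Clinton"))
print(view["Verb"])
```

Selecting a verb next would call `refine` again with both `subject` and `verb` set, which narrows the Object column in the same way.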

Yep, here’s “proposed” which has been grouped under “express” by WordNet. We click it.

Now we see all five triples that match. On the right, you can see the sentences that the triples were extracted from, to get some of the context of the idea. If interested, you can click on “in full” to see the full article. One last note: at any point, users can narrow the results via a keyword search using this search box.

User study
Different types of search tasks:
Noun-verb relationship: “What did Hillary Clinton propose?”
Abstract / subjective: “Find quotations that you consider controversial”
We carried out a formative evaluation to test whether users would understand the idea navigation interface after a brief introduction, choose to use it when given the option alongside a standard search box, and successfully complete tasks with its help. We gave 11 users a range of search tasks that either depended on a noun-verb relationship (like the one we have just seen) or were abstract or subjective (such as “find quotations that you consider controversial”).

Result: Users progressively abandoned the search box
Example session:
User searches for “controversy”... scans sentences... starts over.
Refines by verb: “express” → “say”... scans...
Searches for “offensive”... scans... starts over.
…
Searches for “race black”… no results.
Searches for “african american”
Refines by verb: “resegregate”... and reads the article.
Slide annotations: Initial search fails. Idea Navigation provides a path to an answer.
• Overall: 100 idea navigation refinements vs. 61 searches
• 79% of completed tasks used idea navigation as the final step
The primary result of the study was that users usually started out by using keyword search, and then progressed to idea navigation when the search results turned out to be inadequate. For example, searching for “controversy” doesn’t actually return controversial articles in most cases. When this failed, users turned to idea navigation and found promising terms such as “resegregate”, which did lead to controversial articles. Overall, the users made 100 idea navigation refinements and 61 searches. All tasks were successfully completed, and 79% of them were completed with idea navigation as the final search step.

Future work
More: sentence structures (gerunds as noun phrases?); similarity-grouping methods; facet refinement features
Test with other domains: health science; legal; patents
Comparative user study
There are many ways that our prototype system could be extended and enhanced, including the ability to extract triples from more sentence types, use better techniques to group similar terms together, and provide more query refinement features such as the ability to select multiple refinements in the same column. We also expect that Idea Navigation will be most useful in domains such as health science, legal case studies, and patents, where users often need to search for concepts that depend on a noun-verb relationship. Finally, we would like to do a more extensive, comparative user study to find out how idea navigation compares to other search components such as metadata facets and tag clouds.

Contributions
A method of extracting subject-verb-object triples that can be presented to end users
A faceted browsing interface for summarizing and easily navigating these extracted ideas
In summary, we’ve developed a method of extracting subject-verb-object triples that are suitable for summarizing and presenting directly to end users, and we’ve designed a faceted browsing-style interface for summarizing these triples and letting users easily navigate through them to help find what they want. And last, we’d like to thank our colleagues at Endeca and MIT CSAIL for their advice, feedback, and support. Thank you.