Oxygen Indexing Relations from Natural Language Jimmy Lin, Boris Katz, Sue Felshin Oxygen Workshop, January, 2002.

1 oxygen — Indexing Relations from Natural Language
Jimmy Lin, Boris Katz, Sue Felshin
Oxygen Workshop, January 2002

2 The Information Access Problem
Widespread electronic access to knowledge… but users are overwhelmed with information!
– Different sites.
– Different formats.
– Different access protocols.
Two different methods:
– Natural language question answering: START.
– Information retrieval: Web search engines.

3 START: Natural Language Processing
Sophisticated natural language processing.
– Syntax: the structure of language.
– Semantics: the meaning of language.

4 Tradeoffs
Advantages:
– Returns “Just the Right Information.”
– High precision.
– Intuitive and easy to use.
Disadvantages:
– Coverage is narrow. Annotations are wonderful, but…
– Trained individuals are required to build the knowledge base.
– Expanding the knowledge coverage is time intensive.

5 Information Retrieval
Use of Boolean, probabilistic, or vector-space models.

6 Tradeoffs
Advantages:
– Fast, automatic, large-scale indexing.
– Open-domain, broad coverage.
Disadvantages:
– Users are required to sort through irrelevant documents.
– The “bag-of-words” paradigm can’t capture meaning:
  The bird ate the snake. / The snake ate the bird.
  the meaning of life / a meaningful life
  the house by the river / the river by the house
  the largest planet’s volcanoes / the planet’s largest volcanoes
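The order-blindness of the bag-of-words paradigm can be made concrete in a few lines: each sentence is reduced to a multiset of tokens, so the two opposite-meaning sentences above collapse to the same representation. A minimal sketch (not code from the talk):

```python
from collections import Counter

def bag_of_words(sentence):
    """Lowercase, strip the trailing period, and count tokens --
    the order-free 'bag-of-words' view of a sentence."""
    return Counter(sentence.lower().strip(".").split())

# Two sentences with opposite meanings...
a = bag_of_words("The bird ate the snake.")
b = bag_of_words("The snake ate the bird.")

# ...produce identical bags, so a bag-of-words retrieval model
# cannot tell them apart.
print(a == b)  # True
```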

7 Best of Both Worlds
NLP + IR = high precision + broad coverage
Syntactic relations for Question Answering:
– Automatically extractable from natural language text.
– Amenable to large-scale indexing, retrieval, and matching.
– Reliable for capturing “meaning.”
[Diagram: precision vs. coverage, with NLP high-precision/narrow-coverage, IR broad-coverage/low-precision, and the high-precision, broad-coverage corner marked as unexplored area.]

8 Lessons Learned from START
Borrow Ternary Expressions:
– To capture syntactic relations.
– Proven to be suitable for representing natural language.
– Leverage previous experience.
– Simplified for large-scale storage and retrieval.
Match questions and answers…
– At the Ternary Expression level.
– Bring to bear sophisticated linguistic techniques: synonymy, ontological relations, transformational rules, etc.
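As a rough illustration of matching at the Ternary Expression level, the sketch below represents each expression as a (subject, relation, object) triple and scores a candidate sentence by how many of the question's triples it satisfies, using "?" as a wildcard for the questioned slot. The triples are hand-written stand-ins for parser output, and this simplified matcher is an assumption for illustration, not START's actual representation:

```python
def triple_matches(q, s):
    """A question triple matches a sentence triple when every slot
    is equal or the question's slot is the wildcard '?'."""
    return all(qs == "?" or qs == ss for qs, ss in zip(q, s))

def score(question_triples, sentence_triples):
    """Count question triples satisfied by some sentence triple."""
    return sum(any(triple_matches(q, s) for s in sentence_triples)
               for q in question_triples)

# "What do frogs eat?" -- frogs are the eater; the object is unknown.
question = [("frog", "eat", "?")]

# "Adult frogs eat mainly insects." vs. "Alligators eat frogs."
answer = [("frog", "eat", "insect")]
distractor = [("alligator", "eat", "frog")]

# The relation structure separates answer from distractor even
# though both sentences contain the keywords "frog" and "eat".
print(score(question, answer), score(question, distractor))  # 1 0
```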

9 Using Syntactic Relations
Syntactic relations as a clue to meaning:
  the largest planet’s volcanoes / the planet’s largest volcanoes
  The bird ate the snake. / The snake ate the bird.
  the house by the river / the river by the house
  the meaning of life / a meaningful life

10 System Architecture
[Diagram: Documents feed a Natural Language Parser, which performs Indexing into a Database of relations. Questions (“How tall is the Sears Tower?”, “Who killed Lincoln?”, “Where is Belize located?”) are parsed and passed to a Matcher, which returns answers: “1,454 feet tall”, “John Wilkes Booth”, “Central America”.]

11 The Experiment
Corpus: World Encyclopedia
– 20,000 articles.
– 50 megabytes in size.
Test set: 16 sample questions.
Test system (syntactic relations):
– Index of relations created at the sentence level.
– Matcher returns corpus sentences that have the most relations in common with the question.
Baseline system (Boolean retrieval):
– Inverted index created at the sentence level.
– All words stemmed, stopwords dropped.
– Matcher returns corpus sentences that have the most keywords in common with the question.
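The baseline system can be sketched as a sentence-level inverted index. The stopword list and suffix-stripping stemmer below are crude stand-ins for the real components the slide leaves unnamed (a real system would use something like the Porter stemmer); the tie in the output shows why pure keyword overlap returns distractors alongside real answers:

```python
from collections import defaultdict

# Toy stopword list; an assumption, not the experiment's actual list.
STOPWORDS = {"the", "a", "an", "of", "and", "do", "what", "is", "in"}

def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    cleaned = text.lower().replace(".", "").replace(",", "").replace("?", "")
    return [stem(w) for w in cleaned.split() if w not in STOPWORDS]

def build_index(sentences):
    """Sentence-level inverted index: keyword -> ids of sentences containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for term in tokenize(sentence):
            index[term].add(sid)
    return index

def retrieve(index, question):
    """Rank sentence ids by number of keywords shared with the question."""
    scores = defaultdict(int)
    for term in tokenize(question):
        for sid in index.get(term, ()):
            scores[sid] += 1
    return sorted(scores, key=scores.get, reverse=True)

corpus = [
    "Adult frogs eat mainly insects and other small animals.",      # answers it
    "Alligators eat many kinds of small animals, including frogs.", # distractor
]
idx = build_index(corpus)
# Both sentences share the stemmed keywords "frog" and "eat" with the
# question, so keyword overlap cannot separate answer from distractor.
print(retrieve(idx, "What do frogs eat?"))
```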

12 Results
[Chart: precision per question number, comparing Relations Indexing against Keyword Indexing.]

13 Numerical Data
Average precision:
– Relations: 0.84
– Baseline: 0.29
Average number of sentences returned:
– Relations: 4.0
– Baseline: 43.9
Average number of correct sentences (per question):
– Relations: 3.1
– Baseline: 5.9

14 Our Test Set
Specifically crafted…
– To highlight the ambiguities of natural language.
– To demonstrate relations that are critical to question answering.
Examples (similar words, different meanings):
  What do frogs eat? / What eats frogs?
  What eats snakes? / What do snakes eat?
  What countries have invaded Russia? / What countries has Russia invaded?
  What does Japan import? / What does the United States import from Japan?
  When do lions hunt? / Where are lions hunted?
  Who defeated the Spanish Armada? / Who did the Spanish Armada defeat?

15 Sample Results
What do frogs eat?
(1) Alligators eat many kinds of small animals that live in or near the water, including fish, snakes, frogs, turtles, small mammals, and birds.
(2) Some bats catch fish with their claws, and a few species eat lizards, rodents, small birds, tree frogs, and other bats.
(3) Bowfins eat mainly other fish, frogs, and crayfish.
(4) Adult frogs eat mainly insects and other small animals, including earthworms, minnows, and spiders.
(5) Kookaburras eat caterpillars, fish, frogs, insects, small mammals, snakes, worms, and even small birds.
… (32)
Retrieving based on relations produces “Just the Right Information.”

16 More Sample Results
What is the world's largest country?
(1) Russia is the world's largest country in terms of area.
(2) In terms of population, China is the world's largest country.
(3) France ranks as the world's second largest wine-producing country, after Italy.
(4) Germany is the world's third largest manufacturer of automobiles; Japan and the United States are the largest automobile-producing countries.
(5) …
Retrieving based on relations produces “Just the Right Information.”

17 Conclusion
Language is complicated… but keywords are not enough.
Extraction of certain syntactic relations from large amounts of text is practical.
Question answering using syntactic relations:
– Automatically generated.
– Indexed on a large scale.
Significant improvement in precision.

