Oxygen Indexing Relations from Natural Language Jimmy Lin, Boris Katz, Sue Felshin Oxygen Workshop, January, 2002.

Presentation transcript:


The Information Access Problem
Widespread electronic access to knowledge…
But users are overwhelmed with information!
– Different sites.
– Different formats.
– Different access protocols.
Two different methods:
– Natural language question answering: START.
– Information retrieval: Web search engines.

START: Natural Language Processing
Sophisticated natural language processing.
– Syntax: the structure of language.
– Semantics: the meaning of language.

Tradeoffs
Advantages:
– Returns “Just the Right Information.”
– High precision.
– Intuitive and easy to use.
Disadvantages:
– Coverage is narrow.
Annotations are wonderful, but…
– Trained individuals are required to build the knowledge base.
– Expanding the knowledge coverage is time intensive.

Information Retrieval
Use of Boolean, probabilistic, or vector-space models.

Tradeoffs
Advantages:
– Fast, automatic, large scale indexing.
– Open-domain, broad coverage.
Disadvantages:
– Users are required to sort through irrelevant documents.
– “Bag-of-words” paradigm can’t capture meaning.
Examples:
– The bird ate the snake. / The snake ate the bird.
– the meaning of life / a meaningful life
– the house by the river / The river by the house
– the largest planet’s volcanoes / the planet’s largest volcanoes
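The bag-of-words limitation can be illustrated with a minimal Python sketch. The `bag_of_words` helper is a hypothetical name introduced here, not part of any system described in the slides; it simply counts lowercased words and discards their order. Three of the example pairs then become indistinguishable.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into words, and count them; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Pairs taken from the slide: same words, different meaning.
pairs = [
    ("The bird ate the snake.", "The snake ate the bird."),
    ("the house by the river", "The river by the house"),
    ("the largest planet's volcanoes", "the planet's largest volcanoes"),
]

for first, second in pairs:
    same_bag = bag_of_words(first) == bag_of_words(second)
    print(same_bag, "|", first, "|", second)
```

Each comparison reports that the two phrases have identical bags, so a purely keyword-based index has no basis for telling them apart.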

Best of Both Worlds
NLP + IR = high precision + broad coverage
Syntactic relations for Question Answering:
– Automatically extractable from natural language text.
– Amenable to large-scale indexing, retrieval, and matching.
– Reliable for capturing “meaning.”
[Figure: axes Precision and Coverage; labels NLP, IR, and “unexplored area”]

Lessons Learned from START
Borrow Ternary Expressions.
– To capture syntactic relations.
– Proven to be suitable for representing natural language.
– Leverage previous experience.
– Simplified for large-scale storage and retrieval.
Match Questions and Answers…
– At the Ternary Expression level.
– Bring to bear sophisticated linguistic techniques:
  – Synonymy
  – Ontological relations
  – Transformational rules
  – … etc.

Using Syntactic Relations
Syntactic relations as a clue to meaning.
– the largest planet’s volcanoes / the planet’s largest volcanoes
– The bird ate the snake. / The snake ate the bird.
– the house by the river / The river by the house
– the meaning of life / a meaningful life
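As a rough sketch of why relations help, the phrases above can be written as (subject, relation, object) triples. The triples below are hand-written for illustration, and the relation labels (`eat`, `has`, `is`) are assumptions rather than the output or notation of the authors' parser.

```python
# Hand-written (subject, relation, object) triples. The labels are
# illustrative assumptions, not the output of the authors' parser.
RELATIONS = {
    "The bird ate the snake.": {("bird", "eat", "snake")},
    "The snake ate the bird.": {("snake", "eat", "bird")},
    "the largest planet's volcanoes": {
        ("planet", "has", "volcanoes"),
        ("planet", "is", "largest"),
    },
    "the planet's largest volcanoes": {
        ("planet", "has", "volcanoes"),
        ("volcanoes", "is", "largest"),
    },
}

def same_relations(first, second):
    """Two phrases count as equivalent only if their triple sets are identical."""
    return RELATIONS[first] == RELATIONS[second]

print(same_relations("The bird ate the snake.", "The snake ate the bird."))
print(same_relations("the largest planet's volcanoes", "the planet's largest volcanoes"))
```

Both comparisons come out as different, because each triple records which word fills which role, whereas a word count does not.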

System Architecture
[Diagram components: Documents, Natural Language Parser, Relations, Database, Indexing, Matcher]
Sample document text: Abraham Lincoln, the 16th president of the United States…
Sample questions and answers:
– How tall is the Sears Tower? 1,454 feet tall
– Who killed Lincoln? John Wilkes Booth
– Where is Belize located? Central America

The Experiment
Corpus: World Encyclopedia
– 20,000 articles.
– 50 Megabytes in size.
Test Set: 16 sample questions
Test System (syntactic relations):
– Index of relations created at the sentence level.
– Matcher returns corpus sentences that have the most relations in common with the question.
Baseline System (Boolean retrieval system):
– Inverted index created at the sentence level.
– All words stemmed, stopwords dropped.
– Matcher returns corpus sentences that have the most keywords in common with the question.
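The two matchers described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the three sentences are shortened from the sample results, the triples are hand-written rather than parsed, the stopword list is minimal, stripping a trailing "s" stands in for a real stemmer, and a `None` slot in a question triple is a hypothetical wildcard for the thing being asked about.

```python
import re
from collections import defaultdict

# (sentence, hand-written triples); both are illustrative assumptions.
CORPUS = [
    ("Adult frogs eat mainly insects.", [("frog", "eat", "insect")]),
    ("Bowfins eat mainly other fish, frogs, and crayfish.",
     [("bowfin", "eat", "fish"), ("bowfin", "eat", "frog"), ("bowfin", "eat", "crayfish")]),
    ("Alligators eat fish, snakes, and frogs.",
     [("alligator", "eat", "fish"), ("alligator", "eat", "snake"), ("alligator", "eat", "frog")]),
]
STOPWORDS = {"what", "do", "and", "mainly", "other"}

def keywords(text):
    """Lowercase, drop stopwords, strip a trailing 's' as a stand-in stemmer."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") else w for w in words if w not in STOPWORDS}

def build_inverted_index(corpus):
    """Map each keyword to the ids of the sentences that contain it."""
    index = defaultdict(set)
    for sid, (sentence, _) in enumerate(corpus):
        for word in keywords(sentence):
            index[word].add(sid)
    return index

def top_scoring(scores):
    """Return the ids of the sentences that share the highest score."""
    if not scores:
        return []
    best = max(scores.values())
    return sorted(sid for sid, score in scores.items() if score == best)

def keyword_search(question, index):
    """Baseline: sentences with the most keywords in common with the question."""
    scores = defaultdict(int)
    for word in keywords(question):
        for sid in index.get(word, ()):
            scores[sid] += 1
    return top_scoring(scores)

def matches(pattern, triple):
    """A None slot in the question pattern matches any value."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def relation_search(patterns, corpus):
    """Test system: sentences with the most relations in common with the question."""
    scores = {}
    for sid, (_, triples) in enumerate(corpus):
        hits = sum(any(matches(p, t) for t in triples) for p in patterns)
        if hits:
            scores[sid] = hits
    return top_scoring(scores)

index = build_inverted_index(CORPUS)
print(keyword_search("What do frogs eat?", index))         # keyword baseline
print(relation_search([("frog", "eat", None)], CORPUS))    # "What do frogs eat?"
print(relation_search([(None, "eat", "frog")], CORPUS))    # "What eats frogs?"
```

In this toy corpus the keyword baseline returns all three sentences for "What do frogs eat?", since each contains both "frog" and "eat", while the relation matcher returns only the sentence in which frogs are the ones eating. Reversing the question reverses the relation pattern and selects the other two sentences.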

Results
[Chart: precision by question number, Relations Indexing versus Keyword Indexing]

Numerical Data
Average Precision:
– Relations: 0.84
– Baseline: 0.29
Average Number of Sentences Returned:
– Relations: 4.0
– Baseline: 43.9
Average Number of Correct Sentences (per question):
– Relations: 3.1
– Baseline: 5.9
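Precision for one question is the number of correct sentences divided by the number of sentences returned. Dividing the averages above gives 3.1 / 4.0 ≈ 0.78 and 5.9 / 43.9 ≈ 0.13, which differ from the reported 0.84 and 0.29. One possible explanation, which the slides do not confirm, is that precision was computed per question and then averaged. The sketch below shows how the two calculations can diverge; the counts in it are made up for illustration and are not the experiment's data.

```python
def precision(correct, returned):
    """Fraction of returned sentences that are correct (0.0 if nothing was returned)."""
    return correct / returned if returned else 0.0

def mean_precision(results):
    """Average of the per-question precision values."""
    return sum(precision(c, r) for c, r in results) / len(results)

def pooled_precision(results):
    """Total correct sentences divided by total returned sentences."""
    total_correct = sum(c for c, _ in results)
    total_returned = sum(r for _, r in results)
    return precision(total_correct, total_returned)

# Hypothetical (correct, returned) counts for two questions; NOT the slide's data.
hypothetical = [(1, 1), (2, 8)]

print(mean_precision(hypothetical))    # (1.0 + 0.25) / 2
print(pooled_precision(hypothetical))  # 3 / 9
```

A question with few returned sentences weighs as much as any other in the per-question mean, but contributes little to the pooled ratio, so the two figures need not agree.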

Our Test Set
Specifically crafted…
– To highlight the ambiguities of natural language.
– To demonstrate relations that are critical to question answering.
Examples (similar words, different meanings):
– What do frogs eat? / What eats frogs?
– What eats snakes? / What do snakes eat?
– What countries have invaded Russia? / What countries have Russia invaded?
– What does Japan import? / What does the United States import from Japan?
– When do lions hunt? / Where are lions hunted?
– Who defeated the Spanish Armada? / Who did the Spanish Armada defeat?

Sample Results
What do frogs eat?
(1) Alligators eat many kinds of small animals that live in or near the water, including fish, snakes, frogs, turtles, small mammals, and birds.
(2) Some bats catch fish with their claws, and a few species eat lizards, rodents, small birds, tree frogs, and other bats.
(3) Bowfins eat mainly other fish, frogs, and crayfish.
(4) Adult frogs eat mainly insects and other small animals, including earthworms, minnows, and spiders.
(5) Kookaburras eat caterpillars, fish, frogs, insects, small mammals, snakes, worms, and even small birds.
… (32)
Retrieving based on relations produces “Just the Right Information.”

More Sample Results
What is the world's largest country?
(1) Russia is the world's largest country in terms of area.
(2) In terms of population, China is the world's largest country.
(3) France ranks as the world's second largest wine-producing country, after Italy.
(4) Germany is the world's third largest manufacturer of automobiles; Japan and the United States are the largest automobile-producing countries.
(5) …
Retrieving based on relations produces “Just the Right Information.”

Conclusion
Language is complicated…
But keywords are not enough.
Extraction of certain syntactic relations from large amounts of text is practical.
Question Answering using Syntactic Relations.
– Automatically generated.
– Indexed on a large scale.
Significant improvement in precision.