Text Mining in InQuery
Vasant Kumar, Peter Richards
August 25th, 1999

History
- InQuery was originally a research product of the Center for Intelligent Information Retrieval at the University of Massachusetts, Amherst
- A commercial-strength InQuery API from Sovereign Hill Software
- InQuery 5.0, with LCA and a graphical user interface

Outline
- Text mining
- Text mining using “local context analysis” (LCA)
- Text mining using “top concepts”
- Concept recognizers
- Demonstration of LCA and “top concepts”
- Q & A

Text Mining
- Helps find the needle in the haystack
- Query expansion
- Discovers interesting relationships between concepts
- Discovers characteristics of the database

Concepts
- Words
- Noun phrases
- People names
- Company names
- User-defined concepts

Local Context Analysis (LCA)
- Associates a query with a ranked list of concepts for each of several concept types (noun phrases, people names, ...)
- Concept association is done on the fly:
  - no complex databases have to be created
  - changes to the database are taken into account immediately

Background
- The unit of retrieval is a passage (local context), in contrast to a whole document in regular search
- A passage is a window of n words
- Overlapping passages are used
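The slides specify only the window length n. A minimal Python sketch of overlapping passages follows, assuming the document is already tokenized into words. The `stride` parameter (how many words apart consecutive windows start, so that they overlap by n - stride words) and the name `split_into_passages` are hypothetical, not taken from InQuery.

```python
from typing import List


def split_into_passages(words: List[str], n: int, stride: int) -> List[List[str]]:
    """Split a tokenized document into overlapping windows of n words.

    Consecutive windows start `stride` words apart (hypothetical parameter),
    so they overlap by n - stride words when stride < n. The final window
    may be shorter than n.
    """
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    passages: List[List[str]] = []
    start = 0
    while start < len(words):
        passages.append(words[start:start + n])
        if start + n >= len(words):
            break  # this window already reaches the end of the document
        start += stride
    return passages
```

For example, a 7-word document with n = 4 and stride = 2 yields windows starting at words 0, 2 and 4, the last one only three words long.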

LCA Process
1. Generate candidate passages (sub-documents)
2. Extract concepts and their statistics
3. Apply the LCA algorithm to rank the concepts for each concept type

Step 1: Generate Candidate Passages
- Split the documents into passages (virtual sub-documents)
- Evaluate the query on these passages to generate a weight for each passage
- Rank the passages
- Select the m best passages
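Step 1 can be sketched as follows. The slides do not say how InQuery weights a passage against the query, so this sketch substitutes a simple tf·idf sum computed over passages; `make_passages`, `top_passages`, the `stride` parameter and the weighting itself are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Passage = Tuple[str, int, List[str]]  # (document id, start offset, words)


def make_passages(docs: Dict[str, List[str]], n: int, stride: int) -> List[Passage]:
    """Split every tokenized document into overlapping windows of n words."""
    passages: List[Passage] = []
    for doc_id, words in docs.items():
        start = 0
        while start < len(words):
            passages.append((doc_id, start, words[start:start + n]))
            if start + n >= len(words):
                break
            start += stride
    return passages


def top_passages(query: List[str], passages: List[Passage], m: int) -> List[Tuple[float, Passage]]:
    """Weight each passage against the query (stand-in tf-idf) and keep the m best."""
    total = len(passages)
    # Passage frequency: how many passages contain each query term.
    pf = {t: sum(1 for _, _, words in passages if t in words) for t in set(query)}
    scored = []
    for passage in passages:
        tf = Counter(passage[2])
        score = sum(tf[t] * math.log(1 + total / pf[t]) for t in query if pf[t] > 0)
        if score > 0:  # passages matching no query term are not candidates
            scored.append((score, passage))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep input order
    return scored[:m]
```

Any other passage-level retrieval function could replace the scoring line without changing the rank-and-cut structure.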

Step 2: Extract Concepts
- For every passage in the candidate list, extract the passage text from its document
- Pass each candidate passage through a set of “concept recognizers” to extract a concept list for each concept type
- Generate passage-level statistics for all concepts and query terms
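A minimal sketch of Step 2, assuming each recognizer is a function from a passage's words to a list of concepts. Both recognizers here are hypothetical toys (a stopword-filtered word recognizer and a run-of-capitalized-words name recognizer), and the shape of the per-passage statistics record is likewise an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Recognizer = Callable[[List[str]], List[str]]

STOPWORDS = {"a", "an", "and", "as", "in", "of", "the", "to"}  # hypothetical list


def word_recognizer(words: List[str]) -> List[str]:
    """Toy recognizer: every non-stopword token is a single-word concept."""
    return [w.lower() for w in words if w.lower() not in STOPWORDS]


def capitalized_run_recognizer(words: List[str]) -> List[str]:
    """Toy name recognizer: runs of two or more capitalized tokens."""
    runs: List[str] = []
    current: List[str] = []
    for w in words + [""]:  # the empty sentinel flushes the final run
        if w[:1].isupper():
            current.append(w)
            continue
        if len(current) >= 2:
            runs.append(" ".join(current))
        current = []
    return runs


def passage_statistics(
    passages: List[List[str]], query: Set[str], recognizers: Dict[str, Recognizer]
) -> List[Dict]:
    """Per candidate passage: query-term counts and concept counts for each type."""
    stats = []
    for words in passages:
        stats.append({
            "query": Counter(w.lower() for w in words if w.lower() in query),
            "concepts": {ctype: Counter(rec(words)) for ctype, rec in recognizers.items()},
        })
    return stats
```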

Step 3: Apply LCA Algorithm
- Generate local context statistics for concepts and query terms (specific to the set of candidate passages)
- Use the LCA algorithm to compute concept weights from both the passage-level and the local-context-level statistics
- Rank the concepts and select the top n
- Repeat the above steps for every concept type
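The slides do not give the LCA weighting itself. The sketch below is a simplified version loosely following the LCA formulation published by Xu and Croft: for each concept c and query term t, the co-occurrence af(c, t) = Σ_j ft_j · fc_j is summed over the candidate passages, and the per-term factors are multiplied, so a concept that co-occurs with every query term tends to outrank one that co-occurs heavily with only one. The `delta` constant, the log(m + 1) normalization and the optional `idf` weights are assumptions of this sketch, not InQuery's documented values. It handles one concept type at a time.

```python
import math
from typing import Dict, List, Optional


def lca_scores(
    passages: List[Dict[str, Dict[str, int]]],
    query: List[str],
    delta: float = 0.1,
    idf: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Score concepts of one type by co-occurrence with each query term.

    Each passage is {"query": {term: count}, "concepts": {concept: count}}.
    """
    idf = idf or {}
    m = len(passages)
    norm = math.log(m + 1)  # > 0 whenever there is at least one passage
    concepts = {c for p in passages for c in p["concepts"]}
    scores: Dict[str, float] = {}
    for c in concepts:
        score = 1.0
        for t in query:
            # Local-context statistic: passage-level counts multiplied and summed.
            af = sum(p["query"].get(t, 0) * p["concepts"].get(c, 0) for p in passages)
            factor = delta + math.log(1 + af) * idf.get(c, 1.0) / norm
            score *= factor ** idf.get(t, 1.0)
        scores[c] = score
    return scores


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Rank concepts by score (ties broken alphabetically) and keep the best n."""
    return [c for c, _ in sorted(scores.items(), key=lambda cs: (-cs[1], cs[0]))[:n]]
```

With three passages where "price war" co-occurs 4 times with "oil" but once with "opec", and "crude output" co-occurs twice with each, "crude output" ranks first despite its smaller total co-occurrence (4 versus 5). Calling `lca_scores` once per concept type corresponds to the last bullet of Step 3.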

Text Mining Using Top Concepts
- Retrieve documents
- Extract concepts from each document using “concept recognizers”
- Find the most frequently occurring concepts for each concept type
- Persist the most frequently occurring concepts
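A minimal sketch of the top-concepts approach, assuming the retrieved documents are plain strings and each recognizer maps a document to a list of concepts. A JSON file stands in for whatever store InQuery actually persists to; `top_concepts` and `persist` are hypothetical names.

```python
import json
from collections import Counter
from typing import Callable, Dict, List

Recognizer = Callable[[str], List[str]]


def top_concepts(docs: List[str], recognizers: Dict[str, Recognizer], k: int) -> Dict[str, List[list]]:
    """Return the k most frequent concepts of each type, with counts, across all documents."""
    result: Dict[str, List[list]] = {}
    for ctype, recognize in recognizers.items():
        counts: Counter = Counter()
        for text in docs:
            counts.update(recognize(text))
        # Lists rather than tuples so the structure survives a JSON round trip unchanged.
        result[ctype] = [[concept, n] for concept, n in counts.most_common(k)]
    return result


def persist(top: Dict[str, List[list]], path: str) -> None:
    """Save the top concepts as JSON so they can be reloaded without re-mining."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(top, f, indent=2)
```

Unlike the LCA steps, this counts raw concept frequencies over whole retrieved documents rather than weighting co-occurrence with query terms in passages.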

Noun Phrase Recognizer
- Tokenization to generate words
- Part-of-speech tagging (noun, verb, etc.)
- Selection of noun phrases
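The three stages can be illustrated with a toy recognizer. The tiny hand-written lexicon is a hypothetical stand-in for a real part-of-speech tagger, unknown words are assumed to be nouns, and the phrase pattern (optional adjectives followed by one or more nouns) is an assumption; the slides do not say which pattern InQuery selects.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a trained part-of-speech tagger.
# Words not listed are assumed to be nouns.
LEXICON: Dict[str, str] = {
    "a": "DET", "the": "DET", "and": "CONJ", "for": "PREP", "of": "PREP",
    "local": "ADJ", "new": "ADJ", "finds": "VERB", "improves": "VERB",
}


def tokenize(text: str) -> List[str]:
    """Tokenization: lowercase alphanumeric words (hyphenated words kept whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Part-of-speech tagging by lexicon lookup, defaulting to NOUN."""
    return [(tok, LEXICON.get(tok, "NOUN")) for tok in tokens]


def noun_phrases(text: str) -> List[str]:
    """Selection: keep maximal runs matching ADJ* NOUN+."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    for word, pos in tag(tokenize(text)) + [("", "END")]:  # END flushes the last phrase
        if pos == "NOUN":
            current.append(word)
            has_noun = True
            continue
        if has_noun:  # any non-noun closes a phrase that already has a noun
            phrases.append(" ".join(current))
            current, has_noun = [], False
        if pos == "ADJ":
            current.append(word)  # adjectives may open the next phrase
        else:
            current = []  # adjectives not followed by a noun are discarded
    return phrases
```

For example, `noun_phrases("Local context analysis improves query expansion for new concepts.")` returns `['local context analysis', 'query expansion', 'new concepts']`.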

Other Recognizers
- Company and people name recognizers
  - based on pattern-matching rules
  - use external lists of names for normalization and as additional evidence
- User-defined recognizer
  - uses a user-provided list of concepts (single- or multi-word)
  - generates a state machine
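One way to realize the user-defined recognizer is to compile the concept list into a trie over tokens, which is a simple deterministic state machine, and then scan the text for the longest listed concept starting at each position. This is a sketch under that assumption: the slides do not describe InQuery's state machine, and `build_state_machine` and `recognize` are hypothetical names.

```python
from typing import Dict, List, Tuple

Transitions = List[Dict[str, int]]


def build_state_machine(concepts: List[str]) -> Tuple[Transitions, Dict[int, str]]:
    """Compile single- and multi-word concepts into a token-level trie.

    transitions[state][token] gives the next state (state 0 is the start);
    accepting[state] is the concept recognized when the scan reaches that state.
    """
    transitions: Transitions = [{}]
    accepting: Dict[int, str] = {}
    for concept in concepts:
        state = 0
        for tok in concept.lower().split():
            if tok not in transitions[state]:
                transitions.append({})
                transitions[state][tok] = len(transitions) - 1
            state = transitions[state][tok]
        accepting[state] = concept
    return transitions, accepting


def recognize(tokens: List[str], transitions: Transitions, accepting: Dict[int, str]) -> List[str]:
    """Emit the longest listed concept starting at each position, left to right."""
    words = [t.lower() for t in tokens]
    found: List[str] = []
    i = 0
    while i < len(words):
        state, j, best = 0, i, None
        while j < len(words) and words[j] in transitions[state]:
            state = transitions[state][words[j]]
            j += 1
            if state in accepting:
                best = (accepting[state], j)  # remember the longest match so far
        if best is None:
            i += 1
        else:
            found.append(best[0])
            i = best[1]  # resume after the matched concept
    return found
```

Preferring the longest match means “text mining” is reported instead of “text” when both are in the list. Because the scan restarts at every position, its cost grows with text length times the longest concept; an Aho-Corasick automaton would scale better for very large concept lists.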

Demonstration of LCA and Top Concepts in InQuery 5.1