1
Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum '99, May 5-6, 1999
2
Agenda Information on the Internet. Boolean Retrieval Model and the Internet. Concept-Based Retrieval (RUBRIC / CS3). CS3 and Boolean Search Engines. Future Work.
3
Information on the Internet Large volume. Rapid growth rate. Wide variations in quality and type.
4
Boolean Retrieval Model and the Internet Most Internet search engines are based on the Boolean Retrieval Model. The Boolean Retrieval Model is relatively easy to implement. Limitations: – Inability to assign weights to query or document terms. – Inability to rank retrieved documents. – Naïve users have difficulty using it.
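To make the limitations above concrete, here is a minimal sketch of Boolean retrieval over an inverted index. The function names and the sample documents are illustrative assumptions, not part of the presentation. Note that each query returns a plain set of document ids, with no weights and no ordering.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term (set intersection)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Documents containing at least one term (set union)."""
    return set().union(*(index.get(t, set()) for t in terms))


# Hypothetical three-document collection.
docs = {
    1: "colloid transport in groundwater",
    2: "colloid chemistry",
    3: "groundwater hydrogeology",
}
index = build_index(docs)
print(boolean_and(index, "colloid", "groundwater"))
print(boolean_or(index, "colloid", "hydrogeology"))
```

A document either matches or it does not; nothing in the result says which match is better, which is the gap concept-based retrieval aims to close.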
5
Concept-Based Retrieval Addresses shortcomings of the Boolean Retrieval Model. Search requests are specified in terms of concepts structured as rule-base trees.
6
Development of Rule-Base Trees (General) Top-down refinement strategy. Support for AND / OR relationships. Support for user-defined weights.
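One way to picture a rule-base tree with AND / OR relationships and user-defined weights is the small data structure below. This is a sketch under assumptions: the class names, the weight range, and the example weights are hypothetical and are not taken from RUBRIC or CS3. The example tree borrows its terms from the hydrogeology query shown later in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A concept (internal node) or a search term (leaf) in a rule-base tree."""
    label: str
    op: str = "TERM"  # "AND", "OR", or "TERM" for a leaf
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    """A weighted link from a concept to a sub-concept or term."""
    weight: float  # user-defined weight; a 0..1 range is assumed here
    child: Node


def term(label: str) -> Node:
    return Node(label)


def concept(label: str, op: str, *weighted_children) -> Node:
    """Build an internal node from (weight, child) pairs."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    return Node(label, op, [Edge(w, c) for w, c in weighted_children])


def leaf_terms(node: Node) -> List[str]:
    """Collect the search terms at the leaves, left to right."""
    if node.op == "TERM":
        return [node.label]
    found = []
    for edge in node.children:
        found.extend(leaf_terms(edge.child))
    return found


# Top-down refinement: start from the concept, then refine into sub-concepts.
transport = concept("colloid transport", "AND",
                    (1.0, term("colloid")),
                    (1.0, term("environmental transport")))
hydrogeology = concept("hydrogeology", "OR",
                       (0.9, term("hydrogeology")),
                       (0.7, term("dnapl")),
                       (0.8, transport))
print(leaf_terms(hydrogeology))
```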
8
Development of Rule-Base Trees (CS3) Concept-Set Structuring System (CS3). CS3 supports the creation, storage, and modification of user-defined concepts. Post-processing of results of sub-queries. CS3 user interface.
9
CS3 User Interface
10
Evaluation of Rule-Base Trees (RUBRIC) Run-time, bottom-up analysis. Propagation of weight values (MIN / MAX). Disadvantage of run-time analysis.
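The bottom-up MIN / MAX propagation can be sketched as follows. The slide does not spell out the exact rule, so this assumes one common reading: a leaf scores 1 if its term occurs in the document and 0 otherwise, each child's score is scaled by its edge weight, an AND node takes the minimum, and an OR node takes the maximum. The tuple encoding and the weights are hypothetical.

```python
def evaluate(node, doc_terms):
    """Score one document against a rule-base tree, bottom-up.

    node is ("TERM", term) or (op, [(weight, child), ...]) with op "AND" or "OR".
    """
    kind = node[0]
    if kind == "TERM":
        return 1.0 if node[1] in doc_terms else 0.0
    # Scale each child's score by the weight on its edge (assumed rule).
    scores = [weight * evaluate(child, doc_terms) for weight, child in node[1]]
    return min(scores) if kind == "AND" else max(scores)


tree = ("OR", [
    (0.9, ("TERM", "hydrogeology")),
    (0.7, ("TERM", "dnapl")),
    (0.8, ("AND", [
        (1.0, ("TERM", "colloid")),
        (1.0, ("TERM", "environmental transport")),
    ])),
])

print(evaluate(tree, {"dnapl"}))
print(evaluate(tree, {"colloid", "environmental transport"}))
print(evaluate(tree, {"colloid"}))
```

Because the whole tree is walked for every document at query time, the cost grows with both tree size and collection size, which is one plausible reading of the run-time disadvantage noted above.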
12
Evaluation of Rule-Base Trees (CS3) Static, bottom-up analysis. Construct Minimal Term Set (MTS). Propagation of terms. CS3 user interface.
13
MTS (Minimal Term Set) An MTS for a topic is a set of terms such that if every term in the set appears in a document, the document gets a retrieval status value (RSV) larger than 0. If a document satisfies no MTS, its RSV is 0. A topic can have more than one MTS. A user can choose among those MTSs to perform a search suited to their needs.
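A static, bottom-up construction of the minimal term sets might look like the sketch below. It assumes every edge weight is positive, so weights are ignored here: a leaf yields its own term, an OR node collects the sets of its children, and an AND node combines one set from each child. The tuple encoding is hypothetical, and CS3's actual algorithm may differ.

```python
from itertools import product


def minimal_term_sets(node):
    """Return the minimal term sets of a tree as a set of frozensets.

    node is ("TERM", term) or (op, [child, ...]) with op "AND" or "OR".
    """
    kind = node[0]
    if kind == "TERM":
        return {frozenset([node[1]])}
    child_sets = [minimal_term_sets(child) for child in node[1]]
    if kind == "OR":
        # Any one child's term set is enough to satisfy an OR node.
        candidates = set().union(*child_sets)
    else:
        # An AND node needs one term set from every child, merged together.
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    # Keep only sets that have no proper subset among the candidates.
    return {s for s in candidates if not any(t < s for t in candidates)}


tree = ("OR", [
    ("TERM", "hydrogeology"),
    ("TERM", "dnapl"),
    ("AND", [("TERM", "colloid"), ("TERM", "environmental transport")]),
])

for mts in sorted(minimal_term_sets(tree), key=sorted):
    print(sorted(mts))
```

Since the sets are computed once from the tree rather than once per document, the result can be reused across searches.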
18
Concept-Based Retrieval and Boolean Search Engines CS3 is designed to interface with existing Boolean search engines. U.S. Department of Energy’s “Information-Bridge” search engine. U.S. Department of Transportation’s “National Transportation Library” search engine.
19
System Architecture [diagram; components shown: Client (Java applet), CORBA, CGI, Server (Java), Server (Java/C++), JDBC, Oracle, DOE InfoBridge … etc.]
20
Information-Bridge and CS3 Search request: Boolean vs. concept. Output: non-ranked vs. ranked. Calculation of RSV: – Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of all the weights in S plus the maximum weight in S.
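The RSV rule stated above can be written out directly. In this sketch each MTS expression is a (term set, weight) pair; the function names, sample weights, and sample documents are hypothetical, and only the formula (sum of satisfied weights plus the largest satisfied weight) comes from the slide.

```python
def rsv(doc_terms, weighted_mts):
    """RSV of a document: sum of the weights of the satisfied MTS
    expressions plus the maximum weight among them; 0 if none is satisfied."""
    satisfied = [w for terms, w in weighted_mts if terms <= doc_terms]
    if not satisfied:
        return 0.0
    return sum(satisfied) + max(satisfied)


def rank(docs, weighted_mts):
    """Return (doc_id, rsv) pairs with RSV > 0, highest RSV first."""
    scored = [(doc_id, rsv(terms, weighted_mts)) for doc_id, terms in docs.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)


# Hypothetical weights for three MTS expressions.
weighted_mts = [
    (frozenset({"hydrogeology"}), 0.5),
    (frozenset({"dnapl"}), 0.25),
    (frozenset({"colloid", "environmental transport"}), 0.75),
]

# Hypothetical documents, each reduced to the set of terms it contains.
docs = {
    "d1": {"hydrogeology", "dnapl"},
    "d2": {"colloid", "environmental transport"},
    "d3": {"colloid"},
}

print(rank(docs, weighted_mts))
```

Adding the maximum on top of the sum favors documents that satisfy at least one heavily weighted expression over documents that satisfy several weak ones.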
21
Information-Bridge and CS3 (Example) Boolean search request (“Environmental Science Network” form): – (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)). Concept (CS3): – “Hydrogeology”. – Rule-base tree.
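To hand a concept to a Boolean engine, a set of term sets can be flattened into a query string like the one above: terms within a set are joined with AND, and the sets are joined with OR. This is only a sketch. The function name is hypothetical, and the quoting and operator syntax mimic the example query rather than any documented Information-Bridge query grammar.

```python
def to_boolean_query(term_sets):
    """Render term sets as an OR of AND-clauses, in a stable order."""
    clauses = []
    # Sort for a deterministic result: shorter sets first, then alphabetically.
    for terms in sorted(term_sets, key=lambda s: (len(s), sorted(s))):
        quoted = ['"' + t + '"' for t in sorted(terms)]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            clauses.append("(" + " AND ".join(quoted) + ")")
    return "(" + " OR ".join(clauses) + ")"


term_sets = [
    {"Hydrogeology"},
    {"Dnapl"},
    {"Colloid*", "Environmental Transport"},
]
print(to_boolean_query(term_sets))
```

The output lists the same clauses as the example request, in a different order; the two are logically equivalent because OR is commutative.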
22
CS3 Hydrogeology Rule Base
23
CS3 search results
24
Current and Future Work Conduct experiments to evaluate effectiveness (future). Investigate alternative methods to compute RSVs [KADR00, KDR01*]. Learning edge weights through relevance feedback [KR00]. Thesaurus-based rule-base generation [KLR00].
25
Relevant URLs www.cacs.usl.edu/~linc-projects/cs3/ [LJRT99*]. Raghavan home page: publications since 1991.