1
March 8, 2000 IS 240: Principles of Information Retrieval
The SMART System: Progress Report on System Acquisition and Set-Up
Danyel Fisher, Jonathan Henke, Jason Hong, Jonathan Huang, Jeane Stetson
2
Background
Developed 1961-64 at Harvard
Maintained at Cornell University
Tested at every TREC conference
Emphasis: automatic retrieval (rather than interactive)
Vector-based analysis, tf x idf weighting
Current version: 13.3 (we have 11.0)
3
Bibliography
Salton, Gerard. The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1971.
Salton, Gerard. "Developments in Automatic Text Retrieval." Science, 1991 Aug 30, v253 n5023:
TREC Proceedings
SMART Staff. "User's Manual for the SMART Information Retrieval System." Technical Report 71-95, Revised April Cornell University (1974).
Buckley, C. Implementation of the SMART Information Retrieval System. Technical Report , Cornell University (1985).
4
Indexing (Creating a Collection)
Document pre-parsing: recognize document structure and convert to a standard format
Finding & handling indexable information: parsing, stopword removal, stemming, term clustering, synonym dictionaries, etc.
Query handling: parsing, stopword removal, stemming, etc. (parallel to document handling)
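The parsing, stopword removal, and stemming steps can be illustrated with a minimal Python sketch. This is not SMART's code (SMART is written in C): the stopword list, the suffix list, and the names `tokenize`, `crude_stem`, and `index_terms` are all assumptions made for illustration, and the suffix stripping is only a rough stand-in for a real stemmer. The same function is applied to documents and queries, mirroring the parallel handling noted above.

```python
import re

# Hypothetical, tiny stopword list for illustration; not SMART's actual list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

# Crude suffix stripping for illustration only; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Parse, drop stopwords, and stem; used for documents and queries alike."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]


if __name__ == "__main__":
    print(index_terms("The indexing of documents"))
    print(index_terms("index documents"))
```

Because the document and the query pass through the same steps, both inputs above reduce to the same term list, which is what lets them be compared at retrieval time.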
5
Indexing (Creating a Collection)
Retrieval methods: term weighting and similarity evaluation
Default: standard tf x idf weighting, vector inner product
Output format & display
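A small Python sketch of tf x idf weighting with inner-product similarity may make the default retrieval method concrete. It uses one common textbook variant (raw term frequency multiplied by log(N / document frequency)); whether SMART's default uses exactly this formula is not confirmed by these slides, and the function names `idf_table`, `tfidf_vector`, `inner_product`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(terms, idf):
    """Sparse vector of tf * idf weights; terms unseen in the collection are dropped."""
    tf = Counter(terms)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def inner_product(u, v):
    """Sum of products over the terms the two sparse vectors share."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v[t] for t, w in u.items() if t in v)


def rank(query, docs):
    """Return (score, doc_index) pairs, highest score first."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = [(inner_product(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        ["smart", "retrieval"],
        ["vector", "retrieval"],
        ["vector", "space", "vector"],
    ]
    for score, i in rank(["smart"], docs):
        print(i, round(score, 3))
```

In this toy collection, only the first document contains the query term, so it is the only one with a nonzero score. Terms that appear in every document get an idf of zero under this variant and therefore contribute nothing to the ranking.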
6
Indexing: Customizable Elements
Document location & format
Indexable information & index format
Query format
Retrieval method (document/query comparison)
Output/display format
7
System Architecture
350 source files
45,000 lines of code
Can include user-programmed modules
8
Set-up Procedure
Download source code: ftp://ftp.cs.cornell.edu/pub/smart
Compile
Look for documentation
Indexing completed using default settings
Unable to complete query yet
Unable to examine index
Cannot verify success of indexing!
9
System Documentation
Minimal
Poorly explained
Cryptic
Uses its own specific terminology
10
Problems Faced
Virtually every feature is customizable
Somewhere there are people who know how to do the customization...
"SMART suffers from the advantages and disadvantages of most academic research software. It's designed to be extremely flexible (as long as you know what you're doing!)" - SMART manual
Documentation is too high level.
11
Further Steps
Complete a query using default settings.
Identify specific files for adjusting each customizable feature.
Determine how to modify each feature.
12
Recommendations & Advice
Find someone who has actually worked with the system before.
Understanding operation requires examination of C source code.
Customization requires modifying / creating C code.