Seamless Searching of Numeric and Textual Resources. Funded by a National Library Leadership Grant from the Institute of Museum and Library Services. Michael Buckland, Aitao Chen, Fredric Gey and Ray Larson. Friday Afternoon Seminar, Feb 14,
From numbers to texts: Iritani, Evelyn. "Normalizing ties to Vietnam important steps for U.S. firms; California stands to profit handsomely when barriers fall to trade with fast-growing country." Los Angeles Times v114 (July 12, 1995):D1. An article found using the keywords “Import” and “Vietnam” as the query.
From text to numbers: "U.S. bans import of most European meat". Los Angeles Times v116, n14 (Dec 14, 1997):A22. (On fear of mad cow disease.) "Ban on cattle and sheep is extended to all Europe." New York Times v147, sec1 (Dec 14, 1997):16(N), 42(L). (The U.S. Agriculture Department responds to threat of 'Mad Cow' disease.) Topic of interest: imports of beef to the United States from Britain. The sources at http://govinfo.kerr.orst.edu/import/import.html show: no reported edible beef imports from the United Kingdom.
Seamless Search Project Goals: Phase I: The development and demonstration of a library gateway supporting searches of both text and socio-economic numeric databases. Phase II: The demonstration of a library gateway supporting searches between text and numeric databases.
Data Sets to Create Entry Vocabulary Indexes: MELVYL MARC Files. Number of MARC records in the training data set: ~4,246,000. A sample training record extracted from a MARC record: Book title: “A study of operant conditioning under delayed reinforcement in early infancy”. LC Subject Headings: Infant psychology. Operant conditioning.
Statistical association of title words and LCSH. [Figure: three columns linked through the middle one. Title Words: behavior, infant, infancy, psychology, child, attitude, baby, development. Doc IDs: doc1, doc2, doc3, doc4, doc5. LCSHs: Infant psychology; Operant conditioning; Infant development; Psychology; Parent and child.]
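The association between title words and subject headings can be illustrated with a small sketch. The slides do not say which statistic the project used, so the log-likelihood ratio (G²) below is an assumption chosen for illustration, as is summing weights for multi-word queries. The function names and all training records except the first (which is the sample record from the earlier slide) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep purely alphabetic tokens (no stemming, no stoplist)."""
    return [w for w in text.lower().split() if w.isalpha()]

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table."""
    def h(*counts):
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)
    return 2.0 * (h(k11, k12, k21, k22)
                  - h(k11 + k12, k21 + k22)
                  - h(k11 + k21, k12 + k22))

def build_evi(records):
    """Map each title word to a weight-ranked list of (heading, weight)."""
    n = len(records)
    word_df, head_df, pair_df = Counter(), Counter(), Counter()
    for title, headings in records:
        words, heads = set(tokenize(title)), set(headings)
        word_df.update(words)
        head_df.update(heads)
        pair_df.update((w, head) for w in words for head in heads)
    evi = defaultdict(list)
    for (w, head), k11 in pair_df.items():
        k12 = word_df[w] - k11      # records with the word but not the heading
        k21 = head_df[head] - k11   # records with the heading but not the word
        k22 = n - k11 - k12 - k21   # records with neither
        if k11 * k22 > k12 * k21:   # keep positive associations only
            evi[w].append((head, llr(k11, k12, k21, k22)))
    for w in evi:
        evi[w].sort(key=lambda pair: -pair[1])
    return dict(evi)

def rank_headings(evi, query, top=5):
    """Sum per-word weights to rank headings for a free-form query (an assumption)."""
    scores = Counter()
    for w in tokenize(query):
        for head, weight in evi.get(w, []):
            scores[head] += weight
    return scores.most_common(top)

# Toy training set; only the first record comes from the slides.
records = [
    ("A study of operant conditioning under delayed reinforcement in early infancy",
     ["Infant psychology", "Operant conditioning"]),
    ("Infant behavior and development", ["Infant psychology", "Infant development"]),
    ("Attitude of parent and child", ["Parent and child"]),
    ("Psychology of learning", ["Psychology"]),
]
evi = build_evi(records)
print(rank_headings(evi, "early infancy"))
```

Swapping the word and heading roles in `build_evi` would give the reverse (LCSH to words) index in the same way.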
Word to LCSH Entry Vocabulary Index (EVI). List of the LCSHs that are most closely associated, statistically, with the query word: alcoholism. Columns: Rank, LCSH, Weight. From rank 1: alcoholism; alcoholic; alcohol; alcoholism and employment; drug abuse; alcohol, ethyl; drinking of alcoholic beverages; substance abuse.
Words to LCSH Entry Vocabulary Index (EVI). List of LCSHs that are most closely associated, statistically, with the German query word: Wirtschaftspolitik. Columns: Rank, LCSH, Weight. From rank 1: economic policy; german (west); switzerland; regional planning; economics. Weight shown: 92.14. Note: The top-ranked LCSH “economic policy” happens to be the English translation of the German word “Wirtschaftspolitik”.
Words to LCSH Entry Vocabulary Index (EVI). List of LCSHs that are most closely associated, statistically, with the phrase “peanut butter” as a query. Columns: Rank, LCSH, Weight. From rank 1: peanut; cookery (peanut butter); cookery (peanuts); peanut industry; peanut butter; butter; schulz, charles m; cookery.
Word to LCSH Entry Vocabulary Index (EVI). List of LCSHs that are most closely associated with the query: Vietnam War. Columns: Rank, LCSH, Weight. From rank 1: world war, vietnamese conflict, united states world war, vietnam. Note: “Vietnam War” is not an established (authorized) LCSH. The established LCSH is “Vietnamese conflict”.
LCSH to Words Entry Vocabulary Index. List of words that are most closely associated, statistically, with the Library of Congress Subject Heading: Alcoholism. Columns: Rank, Words, Weight. From rank 1: alcohol; alcoholism; abuse; drug; drink; alcoholic; treatment; prevention; problem; addiction.
EVI-based Access to MELVYL. [Diagram. Components: Web browser; Web server (httpd, CGI); EVI access; HTTP/Z39.50 gateway access; MELVYL Z39.50 server; other Z39.50 servers. Flows: free-form query; ranked list of LCSHs; search results; full MARC record. Protocols: HTTP, Z39.50.]
Counting California Database. A collection of some 3,000 numeric tables, organized into 16 topics and 184 subtopics. Sample topics: Banking, Finance and Insurance; Elections; Population and Demographics; Social Services and Public Assistance. Sample subtopics under Agriculture and Natural Resources: Farms and Farming; Fishing; Forestry and Lumber; Minerals.
Enhanced Access to Counting California Database. Two approaches: (1) Conventional probabilistic retrieval of numeric tables using table captions, mapping the query to the text of the captions. (2) Access to numeric tables through the words-to-subtopic entry vocabulary index. Sample record: education; libraries; STATISTICS, STATEWIDE SUMMARY BY TYPE OF LIBRARY CALIFORNIA, TO. A sample record created from
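Ranking table captions against a free-form query can be sketched as follows. The slides only say "conventional probabilistic retrieval" and do not name the ranking formula, so BM25 is used here purely as a stand-in; the captions, function names, and parameter values are hypothetical.

```python
import math
from collections import Counter

def tokens(text):
    """Lowercase tokens with surrounding punctuation removed (no stemming)."""
    stripped = (t.strip(".,:;()").lower() for t in text.split())
    return [t for t in stripped if t]

def bm25_rank(query, captions, k1=1.2, b=0.75):
    """Return (score, caption) pairs, best first, using a plain BM25 formula."""
    docs = [tokens(c) for c in captions]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    query_terms = set(tokens(query))
    scored = []
    for caption, doc in zip(captions, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, caption))
    return sorted(scored, key=lambda pair: -pair[0])

# Hypothetical captions, not taken from the Counting California database.
sample_captions = [
    "Public library statistics, statewide summary by type of library",
    "Personal income tax returns by county",
    "Farms and farming: acreage by crop",
]
ranked = bm25_rank("public libraries in California", sample_captions)
for score, caption in ranked:
    print(f"{score:.3f}  {caption}")
```

Because this sketch does not stem, "libraries" fails to match "library"; that kind of vocabulary mismatch is what the words-to-subtopic entry vocabulary index is meant to bridge.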
Probabilistic Access to Counting California Database. The query “public libraries in California” returns a ranked list of captions.
EVI-based Access to Counting California Database. Ranked list of subtopics that are most closely associated, statistically, with the query: personal/individual income tax. From rank 1: income; government earnings and tax revenues; property tax; property tax; personal income tax. Weight shown: 59.99.
Numeric Tables with Subtopic: Personal income tax.
Traverse Searching Between Online Catalogs and Numeric Databases. [Diagram labels: online catalog; numeric database; search interface (one on each side); EVI; LCSH; MARC; new query; search results; captions; numeric table.]
Melvyl MARC record as source of a query
Extract from MARC as a query. Any caption can become a query.
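Turning a catalog record into a query for the numeric database might look like the sketch below. The slides do not say which fields were extracted; using the title (MARC tag 245) and topical subject headings (tag 650) is an assumption, and the dictionary representation of a record, the stopword list, and the function name are hypothetical.

```python
# Small illustrative stoplist; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "under"}

def query_from_marc(record, tags=("245", "650")):
    """Build a query string from selected MARC fields, dropping
    stopwords and repeated words while keeping first-seen order."""
    seen, terms = set(), []
    for tag in tags:
        for value in record.get(tag, []):
            cleaned = value.lower().replace(".", " ").replace(",", " ")
            for word in cleaned.split():
                if word not in STOPWORDS and word not in seen:
                    seen.add(word)
                    terms.append(word)
    return " ".join(terms)

# The sample training record from the earlier slide, as a tag -> values dict.
record = {
    "245": ["A study of operant conditioning under delayed reinforcement in early infancy"],
    "650": ["Infant psychology.", "Operant conditioning."],
}
query = query_from_marc(record)
print(query)
```

The same function applies in the other direction if a table caption is wrapped as `{"245": [caption]}`, which mirrors the point that any caption can become a query.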
Final Report on “Seamless Searching of Numeric and Textual Resources” Project. Two sequels: 1. Adding search by place: “Going Places in the Catalog: Improved Geographic Access,” funded by a National Library Leadership Project from the Institute of Museum and Library Services. 2. Multilingual Search Across Multiple Genres: proposal submitted Feb 13, 2003!