
March 2006, NaCTeM – Ray R. Larson
Metadata as Infrastructure for Information Retrieval and Text Mining
Prof. Ray R. Larson
University of California, Berkeley, School of Information

Overview
Metadata as Infrastructure
– What, Where, When and Who?
What are Entry Vocabulary Indexes?
– Notion of an EVI
– How are EVIs built?
Time Period Directories
– Mining metadata for new metadata

Metadata as Infrastructure
The difference between memorization and understanding lies in knowing the context and relationships of whatever is of interest.
When setting out to learn about a new topic, a well-tested practice is to follow the traditional 5 Ws and the H: Who? What? When? Where? Why? and How?

Metadata as Infrastructure
The reference collections of paper-based libraries provide a structured environment for resources: encyclopedias and subject catalogs, gazetteers, chronologies, and biographical dictionaries offer direct support for at least What, Where, When, and Who.
The digital environment does not yet provide an effective and easily exploited infrastructure comparable to the traditional reference library.

What?
Searching texts by topic, e.g. Dewey, LCSH, any subject index, or category scheme applied to documents.
Two kinds of mapping in every search:
– Documents are assigned to topic categories, e.g. Dewey.
– Queries have to map to topic categories, e.g. Dewey's Relativ Index, which maps from ordinary words and phrases to Decimal Classification numbers.
Also mapping between topic systems, e.g. US Patent Classification and International Patent Classification.

WHAT searches involve mapping to controlled vocabularies
[Diagram labels: Texts; Thesaurus/Ontology]

Start with a collection of documents.

Classify and index with a controlled vocabulary, or use a pre-indexed collection.
[Diagram label: Index]

Problem: Controlled vocabularies can be difficult for people to use.
– An index term may look like: pass mtr veh spark ign eng
– In Library of Congress subject headings, use "Economic Policy" for "Wirtschaftspolitik".
[Diagram label: Index]

Solution: Entry Level Vocabulary Indexes.
pass mtr veh spark ign eng = Automobile
[Diagram labels: Index; EVI]

WHAT and Entry Vocabulary Indexes
EVIs are a means of mapping from the user's vocabulary to the controlled vocabulary of a collection of documents…

Building and Searching EVIs
User has a question but is unfamiliar with the domain they want to search.
User selects a subject domain of interest. Domains to select from: Engineering, Medicine, Biology, Social science, etc.
Has an Entry Vocabulary Module been built?
NO: Building an Entry Vocabulary Module (EVI)
– Download a set of training data from an Internet DB indexed with a controlled vocabulary.
– Part of speech tagging (for noun phrases).
– Extract terms (words and noun phrases) from titles and abstracts.
– Build associations between extracted terms & controlled vocabularies.
YES: Searching
– Use an existing EVI.
– Map user's query to ranked list of controlled vocabulary terms.
– User selects search terms from the ranked list of terms returned by the EVI.

Technical Details
Building an Entry Vocabulary Module (EVI):
– Download a set of training data from an Internet DB indexed with a controlled vocabulary.
– Part of speech tagging (for noun phrases).
– Extract terms (words and noun phrases) from titles and abstracts.
– Build associations between extracted terms & controlled vocabularies.

Association Measure

        C     ¬C
 t      a     b
¬t      c     d

where t is the occurrence of a term and C is the occurrence of a class in the training set.
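The four cells of the contingency table can be tallied directly from training records. The following is a minimal sketch, assuming each training record is a pair of (extracted terms, assigned classes); the function names and the toy records are illustrative and do not come from the original system.

```python
from collections import Counter

def build_cooccurrence(records):
    """Tally document counts for terms, classes, and (term, class) pairs.

    records: iterable of (terms, classes) pairs, one per training document.
    """
    term_counts, class_counts, pair_counts = Counter(), Counter(), Counter()
    n_docs = 0
    for terms, classes in records:
        n_docs += 1
        terms, classes = set(terms), set(classes)  # count each document once
        term_counts.update(terms)
        class_counts.update(classes)
        pair_counts.update((t, c) for t in terms for c in classes)
    return term_counts, class_counts, pair_counts, n_docs

def contingency(term, cls, term_counts, class_counts, pair_counts, n_docs):
    """Return (a, b, c, d) for one term and one class."""
    a = pair_counts[(term, cls)]       # term present, class assigned
    b = term_counts[term] - a          # term present, class not assigned
    c = class_counts[cls] - a          # term absent, class assigned
    d = n_docs - a - b - c             # neither
    return a, b, c, d

# Toy training data (hypothetical).
RECORDS = [
    (["car", "engine"], ["Automobiles"]),
    (["car"], ["Automobiles"]),
    (["policy"], ["Economic policy"]),
    (["engine"], ["Engines"]),
]
COUNTS = build_cooccurrence(RECORDS)
```

Calling `contingency("engine", "Automobiles", *COUNTS)` on the toy data gives one document in each of the four cells.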

Association Measure
Maximum likelihood ratio (see Dunning):

$$W(C,t) = 2\left[\log L(p_1, a, a+b) + \log L(p_2, c, c+d) - \log L(p, a, a+b) - \log L(p, c, c+d)\right]$$

where

$$\log L(p, k, n) = k \log p + (n-k)\log(1-p)$$

and

$$p_1 = \frac{a}{a+b}, \qquad p_2 = \frac{c}{c+d}, \qquad p = \frac{a+c}{a+b+c+d}$$
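The likelihood ratio weight can be computed in a few lines. This is a sketch under stated assumptions: the helper takes its arguments in the order (p, successes, trials), which is how the weight formula calls it, and it treats 0·log(0) as 0 so that empty cells do not raise errors. The function names are illustrative.

```python
import math

def log_l(p, k, n):
    """Binomial log-likelihood k*log(p) + (n-k)*log(1-p); 0*log(0) counts as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total

def association_weight(a, b, c, d):
    """Likelihood ratio weight W(C, t) from contingency cells a, b, c, d."""
    if a + b == 0 or c + d == 0:
        raise ValueError("both rows of the contingency table must be non-empty")
    p1 = a / (a + b)                 # P(class | term present)
    p2 = c / (c + d)                 # P(class | term absent)
    p = (a + c) / (a + b + c + d)    # P(class) overall
    return 2.0 * (log_l(p1, a, a + b) + log_l(p2, c, c + d)
                  - log_l(p, a, a + b) - log_l(p, c, c + d))
```

When the term and the class are independent (for example, one document in every cell) the weight is zero; it grows as the term becomes a stronger indicator of the class.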

Alternatively
Because the evidence terms for a class in an EVI can be treated as a document, you can also apply IR techniques and use the top-ranked classes for classification or query expansion.
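One way to picture the IR alternative: collect the evidence terms for each class into a pseudo-document and rank the classes against the query. The sketch below uses a simple TF-IDF score as a stand-in; the slides do not say which ranking model was used, and the class labels and evidence terms are made up for illustration.

```python
import math
from collections import Counter

def rank_classes(query_terms, class_evidence, top_k=3):
    """Rank class labels by a TF-IDF match between query and evidence terms.

    class_evidence: dict mapping class label -> list of evidence terms.
    """
    n_classes = len(class_evidence)
    # Document frequency: in how many class pseudo-documents each term occurs.
    df = Counter()
    for terms in class_evidence.values():
        df.update(set(terms))
    scores = {}
    for label, terms in class_evidence.items():
        tf = Counter(terms)
        score = 0.0
        for q in query_terms:
            if tf[q]:
                idf = math.log(1 + n_classes / df[q])
                score += (1 + math.log(tf[q])) * idf
        if score > 0:
            scores[label] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical evidence terms per controlled-vocabulary class.
EVIDENCE = {
    "Automobiles": ["car", "engine", "car", "automobile"],
    "Economic policy": ["policy", "economy"],
    "Engines": ["engine", "combustion"],
}
```

The top-ranked labels can then be offered to the user as search terms or added to the query.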

Find "Plutonium" in: Arabic, Chinese, Greek, Japanese, Korean, Russian, Tamil
[Diagram labels: Statistical association; Digital library resources]

EVI example
User query: Automobile
EVI 1 index term: pass mtr veh spark ign eng
EVI 2 index term: automobiles OR internal combustion engines

But why stop there?
[Diagram labels: Index; EVI]

Which EVI do I use?
[Diagram: three Index/EVI pairs]

EVI to EVIs
[Diagram: three Index/EVI pairs; EVI 2]

Find "Plutonium" in: Arabic, Chinese, Greek, Japanese, Korean, Russian, Tamil
Why not treat language the same way?

It is also difficult to move between different media forms
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; EVI]

Searching across data types
Different media can be linked indirectly via metadata, but often (e.g. for socio-economic numeric data series) you also need to specify WHERE to get correct results.

But texts associated with numeric data can be mapped as well…
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; captions; EVI]

EVI to Numeric Data example
[Diagram labels: EVI; LCSH; marcnew; query; search results; captions; numeric table; numeric database; online catalog; search interface 1; search interface]

But there are also geographic dependencies…
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; captions; Maps/Geo Data; EVI]

WHERE: Place names are problematic…
– Variant forms: St. Petersburg, Санкт Петербург, Saint-Pétersbourg, ...
– Multiple names: Cluj, in Romania / Roumania / Rumania, is also called Klausenburg and Kolozsvar.
– Name changes: Bombay → Mumbai.
– Homographs: Vienna, VA, and Vienna, Austria; 50 Springfields.
– Anachronisms: No Germany before 1870.
– Vague: e.g. Midwest, Silicon Valley.
– Unstable boundaries: 19th century Poland; Balkans; USSR.
Use a gazetteer!
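A gazetteer handles several of these problems by indexing every known name form against a place record. Below is a minimal sketch with a hypothetical in-memory gazetteer; the record layout and function names are assumptions for illustration, and the coordinates are approximate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    preferred_name: str
    feature_type: str
    lat: float            # approximate latitude, for illustration only
    lon: float            # approximate longitude, for illustration only
    variants: tuple = ()  # variant, former, and other-language names

def build_index(places):
    """Map every case-folded name form to the places that carry it."""
    index = {}
    for place in places:
        for name in (place.preferred_name, *place.variants):
            index.setdefault(name.casefold(), []).append(place)
    return index

def lookup(index, name):
    """Return all places matching a name; homographs yield several."""
    return index.get(name.casefold(), [])

GAZETTEER = [
    Place("Saint Petersburg", "populated place", 59.94, 30.31,
          ("St. Petersburg", "Санкт Петербург", "Saint-Pétersbourg")),
    Place("Cluj", "populated place", 46.77, 23.60,
          ("Klausenburg", "Kolozsvar")),
    Place("Mumbai", "populated place", 19.08, 72.88, ("Bombay",)),
    Place("Vienna, Austria", "populated place", 48.21, 16.37, ("Vienna",)),
    Place("Vienna, VA", "populated place", 38.90, -77.27, ("Vienna",)),
]
INDEX = build_index(GAZETTEER)
```

Looking up "Bombay" resolves to the Mumbai record, while "Vienna" returns two candidates that the interface has to disambiguate, for example by showing both on a map.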

WHERE. Geo-temporal search interface.
Place names found in documents. Gazetteer provided lat. & long. Places displayed on map. Timebar.

Zoom on map. Click on place for a list of records. Click on record to display text.

Catalogs and gazetteers should talk to each other!
Geographic sort / display of catalog search result.
[Screenshot labels: Catalog search; Gazetteer search]

So geographic search becomes part of the infrastructure
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI]

WHEN: Search by time is also weakly supported…
Calendars are the standard for time, but people use the names of events to refer to time periods.
Named time periods resemble place names in being:
– Unstable: European War, Great War, First World War
– Multiple: Second World War, Great Patriotic War
– Ambiguous: Civil war in different centuries in England, USA, Spain, etc.
Places have temporal aspects & periods have geographical aspects: when the Stone Age was varies by region.

Similarity between place names and period names
Suggests a similar solution: a gazetteer-like Time Period Directory.
Gazetteer:
– Place name – Type – Spatial markers (Lat & long) – When
Time Period Directory:
– Period name – Type – Time markers (Calendar) – Where
Note the symmetry in the connections between Where and When.

Solution - Time Period Directories
Initial development involved mining the Library of Congress Subject Authority file for named time periods…

LC MARC Authorities Records
sh
Magdeburg (Germany) History Siege,
g Sieges Germany
Work cat.: : Besselmeier, S. Warhafftige history vnd beschreibung des Magdeburgischen Kriegs,
Cath. encyc. (Magdeburg: besieged ( ) by the Margrave Maurice of Saxony)
Ox. encyc. reformation (Magdeburg:... during the siege of Magdeburg...)

timePeriodEntry: Time Period Directory Instance
Contains the components described below.
- periodID: Unique identifier
- periodName: Period name, can be repeated for alternative names
  – Information about language, script, transliteration scheme
  – Source information and notes (where was the period name mentioned)
- descriptiveNotes: Description of time period
- dates: Calendar and date format
  – Begin & end date (exact, earliest, latest, most-likely, advocated-by-source, ongoing)
  – Notes, sources
- periodClassification: Period type, e.g. Period of Conflict, Art movement
  – Can plug in different classification schemes
  – Can be repeated for several classifications
- location: Places associated with the time period
  – Contains both place name and entry to a gazetteer providing more specific place information like latitude / longitude coordinates
  – Can plug in different location indicators (e.g. ADL gazetteer, Getty Thesaurus of Geographic Names)
  – Recently added coordinates for direct use
- relatedPeriod: Related time periods
  – periodID of related periods
  – Information about relationship type (part-of, successor etc.)
  – Can plug in different relationship type schemes
- entryMetadata: Notes about creator / creation of instance
  – Entry date
  – Modification date
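To make the entry structure concrete, here is a simplified sketch of a time period record and a lookup that filters by name, place, and year. It is an assumption-laden reduction of the schema: dates are collapsed to begin and end years, locations to plain place names, and the identifiers (`tp-0001` and so on) and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TimePeriodEntry:
    period_id: str                  # periodID
    period_names: List[str]         # periodName, repeated for alternative names
    begin_year: int                 # dates, reduced to whole years
    end_year: int
    classifications: List[str] = field(default_factory=list)    # periodClassification
    locations: List[str] = field(default_factory=list)          # location (place names only)
    related_periods: Dict[str, str] = field(default_factory=dict)  # periodID -> relationship type

def find_periods(entries, name=None, place=None, year=None):
    """Return entries matching every criterion that is supplied."""
    results = []
    for entry in entries:
        names = [n.casefold() for n in entry.period_names]
        if name is not None and name.casefold() not in names:
            continue
        if place is not None and place not in entry.locations:
            continue
        if year is not None and not (entry.begin_year <= year <= entry.end_year):
            continue
        results.append(entry)
    return results

# Hypothetical sample directory.
DIRECTORY = [
    TimePeriodEntry("tp-0001", ["First World War", "Great War", "European War"],
                    1914, 1918, ["Period of Conflict"], ["Europe"]),
    TimePeriodEntry("tp-0002", ["Civil War"], 1642, 1651,
                    ["Period of Conflict"], ["England"]),
    TimePeriodEntry("tp-0003", ["Civil War"], 1861, 1865,
                    ["Period of Conflict"], ["USA"]),
    TimePeriodEntry("tp-0004", ["Civil War"], 1936, 1939,
                    ["Period of Conflict"], ["Spain"]),
]
```

A search for "Civil War" alone is ambiguous and returns three entries; adding a place or a year narrows it to one, which illustrates why the directory ties each period to both time markers and locations.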

Time periods by named location

Catalog Search Result

Web Interface - Access by map

Zoomable interface gives access to geographically focused info…

Web Interface - Access by timeline
Link initiates search of the Library of Congress catalog for all records relating to this time period.

WHEN and WHAT
These named time periods are derived from Library of Congress catalog subject headings, so they can be used for catalog searching, which finds books on topics important for that time period.

Time period directories link via the place (or time)
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI; Time Period Directory; Time lines, Chronologies]

WHEN, WHERE and WHO
Catalog records found from a time period search commonly include names of persons important at that time. Their names can be forwarded to, e.g., biographies in the Wikipedia encyclopedia.

Place and time are broadly important across numerous tools and genres, including, e.g., language atlases, library catalogs, biographical dictionaries, bibliographies, archival finding aids, museum records, etc.
Biographical dictionaries are heavy on place and time:
– Emanuel Goldberg: Born Moscow; PhD under Wilhelm Ostwald, Univ. of Leipzig; Director, Zeiss Ikon, Dresden; Moved to Palestine; Died Tel Aviv.
Life as a series of episodes involving Activity (WHAT), WHERE, WHEN, and WHO else.

A new form of biographical dictionary would link to all
[Diagram labels: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI; Time Period Directory; Time lines, Chronologies; Biographical Dictionary]

A Metadata Infrastructure
CATALOGS: Archives, Historical Societies, Libraries, Museums, Public Television, Publishers, Booksellers
RESOURCES: Audio, Images, Numeric Data, Objects, Texts, Virtual Reality, Webpages
INTERMEDIA INFRASTRUCTURE:

Facet    Authority Control         Special Display Tools
WHAT     Thesaurus                 Syndetic Structure
WHERE    Gazetteer                 Maps
WHEN     Time Period Directory     Timelines
WHO      Biographical Dictionary   Text and Images

Learners
Dossiers

Acknowledgements
Electronic Cultural Atlas Initiative project.
This work was partially supported by the Institute of Museum and Library Services through a National Leadership Grant for Libraries, award number LG , Oct Sept 2006, entitled "Supporting the Learner: What, Where, When and Who" – See:
Michael Buckland, Fred Gey, Vivien Petras, Matt Meiske, Kim Carl
Contact: