Prof. Ray R. Larson University of California, Berkeley School of Information Developing a Metadata Infrastructure for Information Access: What, Where,

Presentation transcript:

Prof. Ray R. Larson University of California, Berkeley School of Information Developing a Metadata Infrastructure for Information Access: What, Where, When and Who?

Overview
• Metadata as Infrastructure
  – What, Where, When and Who?
• What are Entry Vocabulary Indexes?
  – Notion of an EVI
  – How are EVIs built?
• Time Period Directories
  – Mining metadata for new metadata
• 4W Demo
• New Project: Bringing Lives to Light

Metadata as Infrastructure
• The difference between memorization and understanding lies in knowing the context and relationships of whatever is of interest. When setting out to learn about a new topic, a well-tested practice is to follow the traditional “5Ws and the H”: Who?, What?, When?, Where?, Why?, and How?

Metadata as Infrastructure
• The reference collections of paper-based libraries provide a structured environment for resources, with encyclopedias and subject catalogs, gazetteers, chronologies, and biographical dictionaries, offering direct support for at least What, Where, When, and Who.
• The digital environment does not yet provide an effective, and easily exploited, infrastructure comparable to the traditional reference library.

What?
Searching texts by topic, e.g. Dewey, LCSH, any subject index, or category scheme applied to documents.
Two kinds of mapping in every search:
– Documents are assigned to topic categories, e.g. Dewey.
– Queries have to map to topic categories, e.g. Dewey’s Relativ Index from ordinary words/phrases to Decimal Classification numbers.
Also mapping between topic systems, e.g. US Patent classification and International Patent Classification.

‘What’ searches involve mapping to controlled vocabularies. [Diagram: Texts; Thesaurus/Ontology]

Building a Search Term Recommender: Start with a collection of documents.

Classify and index with a controlled vocabulary, or use a pre-indexed collection. [Diagram: Index]

Problem: Controlled vocabularies can be difficult for people to use.
Examples: “pass mtr veh spark ign eng”; in Library of Congress subj, use “Economic Policy” for “Wirtschaftspolitik”. [Diagram: Index]

Solution: Entry Level Vocabulary Indexes. Index EVI “pass mtr veh spark ign eng” = “Automobile”

“What” and Entry Vocabulary Indexes
• EVIs are a means of mapping from the user’s vocabulary to the controlled vocabulary of a collection of documents…

Building and Searching EVIs
Searching:
1. User has a question but is unfamiliar with the domain he wants to search.
2. User selects a subject domain of interest. Domains to select from: Engineering, Medicine, Biology, Social science, etc.
3. Has an Entry Vocabulary Module been built?
   – YES: Use an existing EVI.
   – NO: Build an Entry Vocabulary Module (EVI), see below.
4. Map user’s query to ranked list of controlled vocabulary terms.
5. User selects search terms from the ranked list of terms returned by the EVI.
Building an Entry Vocabulary Module (EVI):
1. Download a set of training data from an Internet DB indexed with a controlled vocabulary.
2. Part of speech tagging (for noun phrases).
3. Extract terms (words and noun phrases) from titles and abstracts.
4. Build associations between extracted terms & controlled vocabularies.

Technical Details: Building an Entry Vocabulary Module (EVI)
1. Download a set of training data from an Internet DB indexed with a controlled vocabulary.
2. Part of speech tagging (for noun phrases).
3. Extract terms (words and noun phrases) from titles and abstracts.
4. Build associations between extracted terms & controlled vocabularies.

Association Measure
         C     ¬C
   t     a     b
   ¬t    c     d
Where t is the occurrence of a term and C is the occurrence of a class in the training set
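The four cells a, b, c, d can be counted directly from indexed training records. The following is a minimal Python sketch, not the project's actual code: the `TRAINING` records, the `extract_terms` tokenizer, and the function names are all hypothetical, and plain word tokens stand in for the part-of-speech tagging and noun-phrase extraction described on the slides.

```python
import re
from collections import Counter

# Hypothetical training records: (title/abstract text, assigned controlled-vocabulary classes).
TRAINING = [
    ("Fuel economy of passenger automobiles", {"pass mtr veh spark ign eng"}),
    ("Automobile engine emissions", {"pass mtr veh spark ign eng"}),
    ("Monetary policy and inflation", {"Economic policy"}),
    ("Industrial strategy and economic policy", {"Economic policy"}),
]

def extract_terms(text):
    """Return the set of lowercased word tokens (a stand-in for noun-phrase extraction)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def contingency_tables(records):
    """Return {(term, class): (a, b, c, d)} for every pair that co-occurs at least once.

    a: term and class both present    b: term present, class absent
    c: term absent, class present     d: neither present
    """
    n = len(records)
    term_count, class_count, joint = Counter(), Counter(), Counter()
    for text, classes in records:
        terms = extract_terms(text)
        term_count.update(terms)
        class_count.update(classes)
        joint.update((t, cls) for t in terms for cls in classes)
    tables = {}
    for (t, cls), a in joint.items():
        b = term_count[t] - a
        c = class_count[cls] - a
        d = n - a - b - c
        tables[(t, cls)] = (a, b, c, d)
    return tables

TABLES = contingency_tables(TRAINING)
```

Each record is counted once per term and once per class, so the four cells always sum to the number of training records.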

Association Measure
• Maximum likelihood ratio (viz. Dunning):
\[ W(C,t) = 2\,\bigl[\log L(p_1, a, a+b) + \log L(p_2, c, c+d) - \log L(p, a, a+b) - \log L(p, c, c+d)\bigr] \]
where
\[ \log L(p, k, n) = k \log p + (n - k) \log(1 - p) \]
and
\[ p_1 = \frac{a}{a+b}, \qquad p_2 = \frac{c}{c+d}, \qquad p = \frac{a+c}{a+b+c+d}. \]
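The likelihood ratio can be computed from the four contingency counts alone. This is a minimal Python sketch under one stated assumption: the binomial log-likelihood takes the success count k before the trial count n (so k = a, n = a + b for the first row), which is the reading consistent with Dunning's statistic. The function names are illustrative.

```python
import math

def log_l(p, k, n):
    """Binomial log-likelihood k*log(p) + (n - k)*log(1 - p), treating 0*log(0) as 0."""
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    return xlogy(k, p) + xlogy(n - k, 1 - p)

def association(a, b, c, d):
    """Likelihood-ratio association W(C, t) between a term t and a class C.

    a: t and C, b: t without C, c: C without t, d: neither.
    Assumes both rows are non-empty (a + b > 0 and c + d > 0).
    """
    p1 = a / (a + b)                  # P(C | t)
    p2 = c / (c + d)                  # P(C | not t)
    p = (a + c) / (a + b + c + d)     # P(C) overall
    return 2 * (log_l(p1, a, a + b) + log_l(p2, c, c + d)
                - log_l(p, a, a + b) - log_l(p, c, c + d))
```

The statistic is zero when the class is equally likely with and without the term, and grows as the two rates diverge. It does not by itself say whether the association is positive or negative, so a ranking would typically also check that p1 exceeds p2.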

Alternatively
• Because the “evidence” terms in EVIs can be considered a document, you can also use IR techniques and use the top-ranked classes for classification or query expansion
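One way to read this alternative: pool the terms that co-occurred with each class into a single "evidence document" per class, then rank those documents against the user's query. The sketch below uses a simple TF-IDF weighting with length normalization as the IR technique; that choice, the `CLASS_EVIDENCE` data, and the function name are assumptions for illustration, not the method used in the project.

```python
import math
from collections import Counter

# Hypothetical evidence documents: terms seen with each class in training data.
CLASS_EVIDENCE = {
    "pass mtr veh spark ign eng": "automobile car engine automobile fuel motor vehicle".split(),
    "Economic policy": "economy policy inflation fiscal policy government".split(),
}

def rank_classes(query, evidence=CLASS_EVIDENCE):
    """Rank classes by the length-normalized TF-IDF score of the query terms."""
    n = len(evidence)
    doc_freq = Counter(t for terms in evidence.values() for t in set(terms))
    idf = {t: math.log(1 + n / df) for t, df in doc_freq.items()}
    query_terms = query.lower().split()
    scores = {}
    for cls, terms in evidence.items():
        tf = Counter(terms)
        norm = math.sqrt(sum((freq * idf[t]) ** 2 for t, freq in tf.items()))
        scores[cls] = sum(tf[t] * idf.get(t, 0.0) for t in query_terms) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The top-ranked classes can then be offered to the user as suggested controlled-vocabulary terms, or added to the query for expansion.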

Find Plutonium In Arabic Chinese Greek Japanese Korean Russian Tamil Statistical association Digital library resources

EVI example
User Query: “Automobile”
– EVI 1 Index term: “pass mtr veh spark ign eng”
– EVI 2 Index term: “automobiles” OR “internal combustion engines”

But why stop there? Index EVI

“Which EVI do I use?” Index EVI Index EVI Index EVI

EVI to EVIs Index EVI Index EVI Index EVI EVI 2

Find Plutonium In Arabic Chinese Greek Japanese Korean Russian Tamil Why not treat language the same way?

Support for the Learner with a Query
Any resource: Audio, Images, Texts, Numeric data, Objects, Virtual reality, Webpages
Any catalog: Archives, Libraries, Museums, TV, Publishers
Facet    Vocabulary                          Displays
WHAT     Thesaurus, e.g. LCSH                Cross-references
WHERE    Gazetteer                           Map
WHEN     Period directory                    Timeline
WHO      Biograph. dict., e.g. Who’s Who     Personal relations

It is also difficult to move between different media forms. [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; EVI]

Searching across data types
• Different media can be linked indirectly via metadata, but often (e.g. for socio-economic numeric data series) you also need to specify WHERE to get correct results

But texts associated with numeric data can be mapped as well… [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; captions; EVI]

But there are also geographic dependencies… [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; captions; Maps/Geo Data; EVI]

WHERE: Place names are problematic…
• Variant forms: St. Petersburg, Санкт Петербург, Saint-Pétersbourg, ...
• Multiple names: Cluj, in Romania / Roumania / Rumania, is also called Klausenburg and Kolozsvar.
• Name changes: Bombay → Mumbai.
• Homographs: Vienna, VA, and Vienna, Austria; 50 Springfields.
• Anachronisms: No Germany before 1870
• Vague: e.g. Midwest, Silicon Valley
• Unstable boundaries: 19th century Poland; Balkans; USSR
• Use a gazetteer!
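A gazetteer handles several of these problems by keying every known name form to a place record with coordinates. The following minimal Python sketch shows variant forms, name changes, and homographs being resolved; the `Place` record layout, the identifiers, and the function names are hypothetical, and the coordinates are approximate values included only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    place_id: str
    preferred: str
    country: str
    lat: float           # approximate, for illustration only
    lon: float
    variants: tuple = ()

GAZETTEER = [
    Place("p1", "Saint Petersburg", "Russia", 59.94, 30.31,
          ("St. Petersburg", "Санкт Петербург", "Saint-Pétersbourg")),
    Place("p2", "Cluj", "Romania", 46.77, 23.59, ("Klausenburg", "Kolozsvar")),
    Place("p3", "Mumbai", "India", 19.08, 72.88, ("Bombay",)),
    Place("p4", "Vienna", "Austria", 48.21, 16.37),
    Place("p5", "Vienna", "United States", 38.90, -77.27),
]

def build_index(places):
    """Map every case-folded name form (preferred or variant) to its place records."""
    index = {}
    for place in places:
        for name in (place.preferred, *place.variants):
            index.setdefault(name.casefold(), []).append(place)
    return index

INDEX = build_index(GAZETTEER)

def lookup(name, country=None):
    """Return all places matching a name; homographs yield several candidates."""
    candidates = INDEX.get(name.casefold(), [])
    if country is not None:
        candidates = [p for p in candidates if p.country == country]
    return list(candidates)
```

A lookup for "Vienna" returns both candidates, which is the point at which an interface would ask the user to choose or use other context to disambiguate. Vague regions, anachronisms, and unstable boundaries need more than a point coordinate and are not covered by this sketch.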

WHERE. Geo-temporal search interface. Place names found in documents. Gazetteer provided lat. & long. Places displayed on map. Timebar

Zoom on map. Click on place for a list of records. Click on record to display text.

So geographic search becomes part of the infrastructure. [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI]

WHEN: Search by time is also weakly supported…
• Calendars are the standard for time
• But people use the names of events to refer to time periods
• Named time periods resemble place names in being:
  – Unstable: European War, Great War, First World War
  – Multiple: Second World War, Great Patriotic War
  – Ambiguous: “Civil war” in different centuries in England, USA, Spain, etc.
• Places have temporal aspects & periods have geographical aspects: when the Stone Age was varies by region

Vocabularies are the key!
Want: Kung-fu movies? Use LCSH: Hand-to-hand fighting, oriental, in motion pictures.
Linking vocabularies: WHAT, WHERE, WHEN
– Library subject headings: Topic – Geographic subdivision – Chronological subdivision
– Place name gazetteer: Place name – Type – Spatial markers (Lat & long) – When
– Time Period Directory: Period name – Type – Time markers (Calendar) – Where
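The Time Period Directory record named here (period name, type, calendar markers, where) can be sketched as a small data structure. In this minimal Python example the `Period` class, its field names, and both query functions are hypothetical; the three entries illustrate the "Civil war" ambiguity, where the place is what selects the right calendar range.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Period:
    name: str     # period name
    kind: str     # type of period
    start: int    # time markers as calendar years
    end: int
    where: str    # place the period applies to

DIRECTORY = [
    Period("Civil War", "war", 1642, 1651, "England"),
    Period("Civil War", "war", 1861, 1865, "United States"),
    Period("Civil War", "war", 1936, 1939, "Spain"),
]

def resolve(name, where=None):
    """Return periods with this name, optionally narrowed by place."""
    hits = [p for p in DIRECTORY if p.name.casefold() == name.casefold()]
    if where is not None:
        hits = [p for p in hits if p.where == where]
    return hits

def periods_covering(year, where=None):
    """Return periods whose calendar range includes the given year."""
    return [p for p in DIRECTORY
            if p.start <= year <= p.end and (where is None or p.where == where)]
```

The same shape works in reverse for a gazetteer entry carrying a "when" field, which is what lets a search move from a place to its periods and from a period to its places.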

Time period directories link via the place (or time). [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI; Time Period Directory; Time lines, Chronologies]

WHEN: Time Period Directory Timeline Link to Catalog Link to Wikipedia

WHO: Biographical Dictionary
Complex relationships. Life events metadata:
– WHAT: Actions (prisoner)
– WHERE: Places (Holstein)
– WHEN: Times
– WHO: People (Margaret Sambiria)
Need external links

Any document, object, or performance
Any resource: Audio, Images, Texts, Numeric data, Objects, Virtual reality, Webpages
Any catalog: Archives, Libraries, Museums, TV, Publishers
Connect it with its context – and other resources.
Facet    Vocabulary                          Displays
WHAT     Thesaurus, e.g. LCSH                Cross-references
WHERE    Gazetteer                           Map
WHEN     Period directory                    Timeline
WHO      Biograph. dict., e.g. Who’s Who     Personal relations

Demo of search interface

Entry Vocabulary Index suggests correct LCSH with different spelling

Related places

Potentially related people

Potentially related periods

Mostly in India, 16th-18th century

Find out more about this area.

Different Browsing Options!

Zooming in to South Asia Restricting time frame Select

More information about the country of India…

Wikipedia; CIA Factbook; BBC; Ethnologue; Berkeley Natural History Museums

Historical events – linked to Library catalog & Wikipedia : none avail. for this time period

ECAI Cultural Atlases: presenting history in its geographical & chronological contexts

Mongol Empire Video

Demo Interface

New Project: Bringing Lives to Light: Biography in Context Ray R. Larson, Michael Buckland, Fredric Gey University of California, Berkeley

Overview
• Focussing on the Who in Who, What, Where and When
• Types of Biographical Markup

WHEN, WHERE and WHO
• Catalog records found from a time period search commonly include names of persons important at that time. Their names can be forwarded to, e.g., biographies in the Wikipedia encyclopedia.

Place and time are broadly important across numerous tools and genres including, e.g., Language atlases, Library catalogs, Biographical dictionaries, Bibliographies, Archival finding aids, Museum records, etc.
Biographical dictionaries are also heavy on place and time:
Emanuel Goldberg: Born Moscow; PhD under Wilhelm Ostwald, Univ. of Leipzig; Director, Zeiss Ikon, Dresden; Moved to Palestine; Died Tel Aviv.
Life as a series of episodes involving Activity (WHAT), WHERE, WHEN, and WHO else.

A new form of biographical dictionary would link to all. [Diagram: Texts; Numeric datasets; Thesaurus/Ontology; Gazetteers; captions; Maps/Geo Data; EVI; Time Period Directory; Time lines, Chronologies; Biographical Dictionary]

Projected Work
• Develop XML markup for Biographical Events
• Most likely to be adaptation and extension of existing biographical event markup
  – Example: EAC/EAD
• Harvest biographical resources
  – Wikipedia, etc.
• Integrate as next generation of current interface

EAC/EAD Biographical Note
– 1892, May 7: Born, Glencoe, Ill.
– A.B., Yale University, New Haven, Conn.
– Married Ada Hitchcock
– Served in United States Army

Wikipedia data
Life events metadata:
– WHAT: Actions (prisoner)
– WHERE: Places (Holstein)
– WHEN: Times
– WHO: People (Margaret Sambiria)
Need external links
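A life event of this kind could be serialized as a small XML fragment with one element per facet. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`lifeEvent`, `what`, `where`, `when`, `who`) are invented for illustration and are not taken from EAC, EAD, or the project's eventual markup.

```python
import xml.etree.ElementTree as ET

def life_event_xml(what, where=None, when=None, who=()):
    """Serialize one life event as XML, omitting facets that are unknown."""
    event = ET.Element("lifeEvent")              # hypothetical element names throughout
    ET.SubElement(event, "what").text = what
    if where:
        ET.SubElement(event, "where").text = where
    if when:
        ET.SubElement(event, "when").text = when
    for person in who:
        ET.SubElement(event, "who").text = person
    return ET.tostring(event, encoding="unicode")

# The example from the slide: the time facet is left empty there, so it is omitted here.
EXAMPLE = life_event_xml("prisoner", where="Holstein", who=["Margaret Sambiria"])
```

Keeping each facet in its own element is what would allow the place to be linked to a gazetteer entry, the time to a period directory, and the person to another biographical record.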

A Metadata Infrastructure
CATALOGS: Archives, Historical Societies, Libraries, Museums, Public Television, Publishers, Booksellers
RESOURCES: Audio, Images, Numeric Data, Objects, Texts, Virtual Reality, Webpages
INTERMEDIA INFRASTRUCTURE:
Facet    Authority Control          Special Display Tools
WHAT     Thesaurus                  Syndetic Structure
WHERE    Gazetteer                  Maps
WHEN     Time Period Directory      Timelines
WHO      Biographical Dictionary
Learners; Dossiers

Acknowledgements
• Electronic Cultural Atlas Initiative project
• This work is being supported by the Institute of Museum and Library Services through a National Leadership Grant for Libraries
• Contact: