Evidence from Metadata LBSC 796/INFM 718R Session 9: November 5, 2007 Douglas W. Oard.

Slides:



Advertisements
Similar presentations
Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
Advertisements

ETD Preservation Workshop Session Four: Collection Management for Preservation Gail McMillan, Virginia Tech.
UKOLN, University of Bath
February Harvesting RDF metadata Building digital library portals with harvested metadata workshop EU-DL All Projects concertation meeting DELOS.
METS: An Introduction Structuring Digital Content.
Developing a Metadata Exchange Format for Mathematical Literature David Ruddy Project Euclid Cornell University Library DML 2010 Paris 7 July 2010.
Module 5a: Authority Control and Encoding Schemes IMT530: Organization of Information Resources Winter 2007 Michael Crandall.
Search and Ye Shall Find (maybe) Seminar on Emergent Information Technology August 20, 2007 Douglas W. Oard.
Discovery and Delivery Week 8 LBSC 671 Creating Information Infrastructures.
Leveraging Your Taxonomy to Increase User Productivity MAIQuery and TM Navtree.
Discove r Humanities and Social Science Electronic Thesaurus - HASSET Faceted search HASSET is the subject thesaurus that the UK Data Service uses to index.
Evidence from Metadata LBSC 796/CMSC 828o Session 6 – March 1, 2004 Douglas W. Oard.
Information Retrieval in Practice
Information Retrieval Review
RDF Kitty Turner. Current Situation there is hardly any metadata on the Web search engine sites do the equivalent of going through a library, reading.
Introducing Symposia : “ The digital repository that thinks like a librarian”
1 CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
Resource Description Framework ( RDF ) Xinxia An.
IMT530- Organization of Information Resources1 Feedback Like exercises –But want more instructions and feedback on them –Wondering about grading on these.
Overview of Search Engines
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
Guest Lecture LIS 656, Spring 2011 Kathryn Lybarger.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
Metadata Standards and Applications 4. Metadata Syntaxes and Containers.
Metadata and identifiers for e- journals Copenhagen Juha Hakala Helsinki University Library
Metadata Week 4 LBSC 671 Creating Information Infrastructures.
1 © Netskills Quality Internet Training, University of Newcastle Metadata Explained © Netskills, Quality Internet Training.
The Boolean Retrieval Model LBSC 708A/CMSC 838L Session 2 - September 11, 2001 Philip Resnik.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Metadata Week 4 LBSC 671 Creating Information Infrastructures.
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
Evidence from Metadata LBSC 796/INFM 718R Session 9: April 6, 2011 Douglas W. Oard.
By: Dan Johnson & Jena Block. RDF definition What is Semantic web? Search Engine Example What is RDF? Triples Vocabularies RDF/XML Why RDF?
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Metadata Helen Aristar Dry Eastern Michigan University LINGUIST List.
Lifecycle Metadata for Digital Objects September 11, 2002 Major archival and digital library metadata schemes.
Overview of EAD Jenn Riley Metadata Librarian Digital Library Program.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Metadata Bridget Jones Information Architecture I February 23, 2009.
 Why do we catalog?  Why do we classify?  What aspects are important?  What aspects can we let go of?
Introduction to metadata
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
Structure of IR Systems INST 734 Module 1 Doug Oard.
Evidence from Metadata INST 734 Doug Oard Module 8.
RDA DAY 1 – part 2 web version 1. 2 When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
The physical parts of a computer are called hardware.
Metadata : an overview XML and Educational Metadata, SBU, London, 10 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
THE BIBFRAME EDITOR AND THE LC PILOT Module 3 – Unit 1 The Semantic Web and Linked Data : a Recap of the Key Concepts Library of Congress BIBFRAME Pilot.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Evidence from Content INST 734 Module 2 Doug Oard.
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
Evidence from Metadata INST 734 Doug Oard Module 8.
PubMed …featuring more than 20 million citations for biomedical literature from MEDLINE, life science journals, and online books.
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Feature Assignment LBSC 878 February 22, 1999 Douglas W. Oard and Dagobert Soergel.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
The ___ is a global network of computer networks Internet.
Metadata & Repositories Jackie Knowles RSP Support Officer.
Dublin Core Basics Workshop Lisa Gonzalez KB/LM Librarian.
Attributes and Values Describing Entities. Metadata At the most basic level, metadata is just another term for description, or information about an entity.
Some basic concepts Week 1 Lecture notes INF 384C: Organizing Information Spring 2016 Karen Wickett UT School of Information.
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Attributes and Values Describing Entities.
Cataloging the Internet
The new RDA: resource description in libraries and beyond
Attributes and Values Describing Entities.
Presentation transcript:

Evidence from Metadata LBSC 796/INFM 718R Session 9: November 5, 2007 Douglas W. Oard

Problems with “Free Text” Search Homonymy –Terms may have many unrelated meanings –Polysemy (related meanings) is less of a problem Synonymy –Many ways of saying (nearly) the same thing Anaphora –Alternate ways of referring to the same thing

Behavior Helps, But not Enough Privacy limits access to observations Queries based on behavior are hard to craft –Explicit queries are rarely used –Query by example requires behavior history “Cold start” problem limits applicability

A “Solution:” Concept Retrieval Develop a concept inventory –Uniquely identify concepts using “descriptors” –Concept labels form a “controlled vocabulary” –Organize concepts using a “thesaurus” Assign concept descriptors to documents –Known as “indexing” Craft queries using the controlled vocabulary

Two Ways of Searching Write the document using terms to convey meaning Author Content-Based Query-Document Matching Document Terms Query Terms Construct query from terms that may appear in documents Free-Text Searcher Retrieval Status Value Construct query from available concept descriptors Controlled Vocabulary Searcher Choose appropriate concept descriptors Indexer Metadata-Based Query-Document Matching Query Descriptors Document Descriptors

Boolean Search Example Canine AND Fox –Doc 1 Canine AND Political action –Empty Canine OR Political action –Doc 1, Doc 2 The quick brown fox jumped over the lazy dog’s back. Document 1 Document 2 Now is the time for all good men to come to the aid of their party. Volunteerism Political action Fox Canine Descriptor Doc 1Doc 2 [Canine] [Fox] [Political action] [Volunteerism]

Applications When implied concepts must be captured –Political action, volunteerism, … When terminology selection is impractical –Searching foreign language materials When no words are present –Photos w/o captions, videos w/o transcripts, … When user needs are easily anticipated –Weather reports, yellow pages, …

Agenda  Designing metadata Generating metadata Putting the pieces together

Aspects of Metadata What kinds of objects can we describe? –MARC, Dublin Core, FRBR, … How can we convey it? –MODS, RDF, OAI-PMH, METS What can we say? –LCSH, MeSH, PREMIS, … What can we do with it? –Discovery, description, reasoning

Functional Requirements for Bibliographic Records (FRBR) Work (e.g., a specific play) –Expression (e.g., a specific performance) Manifestation (e.g., a specific publisher’s DVD) –Item (e.g., a specific DVD) Responsible Entities (person, corporate body) Subject (concept, object, event, place)

FRBR in OCLC’s FictionFinder

Dublin Core Goals: –Easily understood, implemented and used –Broadly applicable to many applications Approach: –Intersect several standards (e.g., MARC) –Suggest only “best practices” for element content Implementation: –Initially 15 optional and repeatable “elements” Refined using a growing set of “qualifiers” –Now extended to 22 elements

Dublin Core Elements (version 1.1) Content Title Subject [LCSH, MeSH, …] Description Type Coverage [spatial, temporal, …] Related resource Rights Instantiation Date [Created, Modified, Copyright, …] Format Language Identifier [URI, Citation, …] Responsibility Creator Contributor Source Publisher

Resource Description Framework XML schema for describing resources Can integrate multiple metadata standards –Dublin Core, P3P, PICS, vCARD, … Dublin Core provides a XML “namespace” –DC Elements are XML “properties DC Refinements are RDF “subproperties” –Values are XML “content”

A Rose By Any Other Name … <rdf:RDF xmlns:rdf=" xmlns:dc=" <rdf:Description rdf:about=" Rose Bush A Guide to Growing Roses Describes process for planting and nurturing different kinds of rose bushes

Open Archives Initiative- Protocol for Metadata Harvesting (OAI-PMH)

Metadata Encoding and Transmission Standard (METS) Descriptive metadata (e.g., subject, author) Administrative metadata (e.g., rights, provenance) Technical metadata (e.g., resolution, color space) Behavior (which program can render this?) Structural map (e.g., page order) –Structural links (e.g., Web site navigation links) Files (the raw data) Root (meta-metadata!)

Open Archival Information System (OAIS) Reference Model

Agenda Designing metadata  Generating metadata Putting the pieces together

Thesaurus Design Thesaurus must match the document collection –Literary warrant Thesaurus must match the information needs –User-centered indexing Thesaurus can help to guide the searcher –Broader term (“is-a”), narrower term, used for, …

Challenges Changing concept inventories –Literary warrant and user needs are hard to predict Accurate concept indexing is expensive –Machines are inaccurate, humans are inconsistent Users and indexers may think differently –Diverse user populations add to the complexity Using thesauri effectively requires training –Meta-knowledge and thesaurus-specific expertise

Machine-Assisted Indexing Goal: Automatically suggest descriptors –Better consistency with lower cost Approach: Rule-based expert system –Design thesaurus by hand in the usual way –Design an expert system to process text String matching, proximity operators, … –Write rules for each thesaurus/collection/language –Try it out and fine tune the rules by hand

Machine-Assisted Indexing Example //TEXT: science IF (all caps) USE research policy USE community program ENDIF IF (near “Technology” AND with “Development”) USE community development USE development aid ENDIF near: within 250 words with: in the same sentence Access Innovations system:

Machine Learning: kNN Classifier

“Folksonomies”

“Named Entity” Tagging Machine learning techniques can find: –Location –Extent –Type Two types of features are useful –Orthography e.g., Paired or non-initial capitalization –Trigger words e.g., Mr., Professor, said, …

Normalization Variant forms of names (“name authority”) –Pseudonyms, partial names, citation styles Acronyms and abbreviations –Organizations, political entities, projects, … Co-reference resolution –References to roles or objects rather than names –Anaphoric pronouns for an antecedent name

Example: Bibliographic References

Agenda Designing metadata Generating metadata  Putting the pieces together

Supporting the Search Process Source Selection Search Query Selection Ranked List Examination Document Delivery Document Query Formulation IR System Indexing Index Acquisition Collection

Putting It All Together Free TextBehaviorMetadata Topicality Quality Reliability Cost Flexibility

Before You Go! On a sheet of paper, please briefly answer the following question (no names): What was the muddiest point in today’s class?