Automatic Metadata Generation
Charles Duncan

JISC Project
March – July 2009
Gather use cases both to inform uptake of available automatic metadata tools and to inform future tool requirements
Deliverables
  – Synthesis report on automated metadata generation and its uses at national and international levels
  – General guidance document on different automated metadata generation approaches for service providers in HE
  – Priorities for required tools and services with an outline of costs and benefits

Generic View
Applicable to:
  – The digital library, eLearning, Scholarly Communications, eScience, Curation and Preservation

Importance of USE
Generating metadata is worthless unless there is a clear USE for that metadata
Generation use cases will require matching metadata use examples

Questions to consider
Where useful metadata lies
What tools exist to extract metadata
How these tools should be integrated into the deposit process
How the many different formats of resources can be handled

Why use metadata?
Discovery
  – Search
  – Refining searches
  – Exposed information allows human judgement
Recommendation service
  – Tag clouds
  – Popularity measures (promote resources and resource owners)
Ability to get additional information (tracks, film details, etc.)
Organising information helps retain knowledge
Stakeholder-specific – benefits for suppliers/consumers
Making links with other people with similar profiles
Auditing – ability to identify gaps, quality management

Where useful metadata lies
The way people organise their resources
Behaviour (playlists)
Personal profiles
Image metadata (embedded and transportable)
  – PDF, office docs, MP3, video (MPEG, DVD)
Databases (IMDb, albums, Amazon, bar codes, ISBN, etc.)
Identity
  – Authenticated in a role, attribution: capture of ownership information and affiliation
Controlled vocabularies – mapping

Golddust c-values, user oriented
Image geographic info (EXIF): GPS location and direction (e.g. iPhone/Mac photo manager)
Dynamic metadata
  – Use of object, comments, citations, tracking use and e.g. location in a VLE
  – AMeGA report
User tagging – Flickr
Recommendation service
  – Metadata – resources
  – Metadata – users

What tools exist to extract metadata
iTunes
From input
From databases
Metadata “scrapers” – e.g. Zotero, RefWorks (ProQuest)
OpenURL link resolvers (identifier standards)
iPhoto face recognition
Transcription of audio (e.g. Dolphin)
Text mining – frequency of word use, context of word use (wordle.com, Autonomy)
Google, Amazon, Last.fm, Spotify (can also use negative results – dislikes)
Creating thumbnails, validating file format (see RepoMMan, JHOVE, DROID)
ROAR harvests and checks file formats in repositories
Output to multiple formats

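
The "frequency of word use" form of text mining listed above can be illustrated with a minimal Python sketch. It is not how any of the named tools work internally; the function name `keyword_candidates`, the tiny stopword list and the minimum word length are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real tool would use a fuller one.
STOPWORDS = {"the", "and", "from", "for", "with", "that", "this", "are", "was"}


def keyword_candidates(text, top_n=5):
    """Return the top_n most frequent non-stopwords as (word, count) pairs."""
    # Lower-case the text and keep alphabetic runs only.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords and very short words before counting.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = (
        "Metadata helps discovery. Metadata tools extract metadata "
        "from resources, and tools help."
    )
    # The most frequent terms are candidate subject keywords for the resource.
    print(keyword_candidates(sample, top_n=2))
```

The resulting terms are only candidates: a person or a controlled vocabulary mapping would still decide which of them become subject metadata.
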
How to integrate tools into deposit
Scraping – adding own metadata – converting formats – storing
iTunes ripping a CD – what is the deposit process? (Gracenote)
Size of the community matters – common objects that many people use
Integration tools for AMG, deposit and repositories/archives

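
The chain "scraping, adding own metadata, converting formats, storing" can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not a description of any repository's deposit API: every function name here (`scrape`, `add_own_metadata`, `convert`, `store`, `deposit`) is hypothetical, the "scraper" only reads the file name, and the archive is a plain dictionary.

```python
import json
import os


def scrape(resource_name):
    """Hypothetical extraction step: derive a title and format from the file name."""
    stem, extension = os.path.splitext(resource_name)
    return {"title": stem.replace("_", " "), "format": extension.lstrip(".").lower()}


def add_own_metadata(record, supplied):
    """Merge depositor-supplied fields over the automatically generated ones."""
    merged = dict(record)
    merged.update(supplied)
    return merged


def convert(record):
    """Convert the record to a storage format (JSON is an assumption here)."""
    return json.dumps(record, sort_keys=True)


def store(archive, key, serialised):
    """Stand-in for the repository: keep the serialised record in a dict."""
    archive[key] = serialised


def deposit(archive, resource_name, supplied):
    """Run the four steps in order and return the final record."""
    record = add_own_metadata(scrape(resource_name), supplied)
    store(archive, resource_name, convert(record))
    return record


if __name__ == "__main__":
    archive = {}
    print(deposit(archive, "annual_report.PDF", {"creator": "A. Depositor"}))
```

Keeping each step as a separate function mirrors the integration question on the slide: any one step can be swapped for a real tool without changing the others.
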
Handle different formats
Formats for resources
Formats for metadata

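
As one illustration of "formats for metadata", the sketch below writes the same record both as JSON and as simple Dublin Core style XML using only the Python standard library. The record fields and the wrapping `metadata` element are assumptions for the example; the namespace URI is the Dublin Core elements namespace.

```python
import json
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_json(record):
    """Serialise the record as JSON with a stable key order."""
    return json.dumps(record, sort_keys=True)


def to_dublin_core_xml(record):
    """Serialise the record as XML, one dc:* element per field.

    The wrapper element name 'metadata' is an assumption, and field names
    are assumed to be valid Dublin Core element names.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for field, value in sorted(record.items()):
        child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = {"title": "Automatic Metadata Generation", "creator": "Charles Duncan"}
    print(to_json(record))
    print(to_dublin_core_xml(record))
```

Holding the record in one neutral structure and converting on output is one way to support "output to multiple formats" without re-extracting the metadata for each target.
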
Use case 1
Overview
Metadata Generation
Metadata Use

Use case 2
Overview
Metadata Generation
Metadata Use

Use case 3
Overview
Metadata Generation
Metadata Use