Building Virtual Collections

by Nancy Stuart, University of Victoria
For the BCLA pre-conference Beyond Limits: Building Open Access Collections
April 19, 2007
11/18/2018

Introduction
This session will examine the underpinnings of building virtual collections: metadata and application profiles.
Who I am – Technical Services Librarian and ETD Project Manager at the University of Victoria, and member of the CARL IR Metadata Working Group

Good afternoon and welcome to this session on building virtual collections. In this half hour I will give you an overview of the back-room detail of building virtual collections and present the important issues to consider. By the end of the session you will have a better understanding of metadata and the important role it plays in resource discovery. Because standards are so important, you will also have a better understanding of what a metadata application profile is and the role it plays in building your virtual collections. I first got involved with virtual collections when we started an ETD (Electronic Theses and Dissertations) Pilot Project at UVic. There is nothing like a real project to help you get your feet wet. As I became more involved with the ETD Project, I started taking courses and reading the literature on metadata and application profiles. In 2005 I joined the Canadian Repositories Metadata Interest Group, and in 2006 I volunteered to help create a CARL metadata application profile for the CARL harvester. I will talk in more detail about each of these as we go through the session.

Session Outline
CARL vision
CARL harvester
Metadata
Application profiles
CARLCore
Institutional Repositories
Challenges

This afternoon we will look at CARL's vision for building virtual collections in Canada. We will look at the CARL harvester: how it started and its role in building virtual collections. I will give an overview of metadata: what it is and its role in resource discovery. We will then look at application profiles: what they are and their role in resource discovery. I will spend some time on the CARLCore application profile and its development, and I will finish by briefly looking at the state of IRs and the challenges we face in building our virtual collections.

CARL vision for virtual collections
CARL Institutional Repository Project
Federated searching
Scholarly publishing
Open Access environment

CARL's vision for creating virtual collections is to implement institutional repositories as a coordinated and integrated strategy to aggregate the digital research output of academic institutions. To further this vision, the CARL Institutional Repositories Pilot Project was launched. It was a Canadian initiative to implement institutional repositories at several Canadian research libraries, ensuring that Canadian institutions remain at the leading edge of innovation in scholarly publishing. The Project continues to facilitate discussion of lessons learned and best practices for implementing IRs, and paves the way for other Canadian universities by examining the feasibility of IRs in the Canadian context. One goal is to provide a single search interface for all IRs across Canada; over the past 5 years this has led to the development of the CARL harvester. The development of institutional repositories at academic institutions encourages scholarly publishing in the Open Access environment. You will hear more about this from three institutions later this afternoon. For more information, visit the CARL IR Project online resource portal.

Virtual collections
Definition
Small or large
Text, images, maps
Big picture

First I would like to put virtual collections in context. Virtual collections are simply groups of items in digital format. They can be mixed, containing text, images or maps. They can be small, such as a database of a few hundred historical photographs in a small library, or large, such as the Our Roots project or the Peel’s Prairie Provinces project. Now that we have entered the virtual or digital world, we think differently about the items in our collections. To build a virtual collection, we need to consider how patrons will access it and what metadata is needed for resource discovery. So let's step back a moment and look at the big picture of how we have moved from print to digital.

Print to Digital

Print
Description: Books – stacks; Maps – drawers; Videos – film desk
Access points: author, title, subject; governed by AACR2/MARC
Local catalogue: card, OPAC
Union catalogue: ftp, WorldCat

Digital
Description: Books – PDFs; Images – JPGs; Music – MP3s
Metadata (DC/VRA): author, title, keyword; governed by best practices
Local catalogue: OPAC, DSpace, local DB; application profile
Union catalogue: OAI-PMH, CARL IR, OAIster; application profile

In the print world we had collections of books, maps and videos housed in different areas of the library. We created a catalogue, first a card catalogue and later an online catalogue, with access points such as author, title and subject so that we could discover the resources in our collection. These access points were governed by a complex set of rules and coding, generally AACR2 and MARC. We then created union catalogues like WorldCat and sent the data about our resources to them by FTP. In the virtual world we operate differently. In a virtual collection all the items are stored together in a database or repository. We create metadata such as author, title and keywords to aid resource discovery. The metadata schemas are less complex; you may be familiar with Dublin Core or VRA Core. Instead of complex rules we now have “best practices”. If your local repository supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a harvester can collect the metadata in your repository.

Repositories & Harvesters
What is a digital repository?
Data provider
Contains electronic texts, maps, learning objects, etc. & metadata
What is a harvester?
Service provider
Harvests the metadata
Single search interface

We can think of the individual IR as a data provider. Typically IRs adopt the OAI-PMH technical framework to expose their metadata. Examples of repositories are local instances of DSpace or EPrints, which can expose their metadata for harvesting through OAI-PMH. Harvesters have two functions. The first is to harvest, or gather, the metadata from individual repositories. The second is to provide a single search interface for all the metadata they harvest, which is why harvesters are also called service providers. The CARL harvester is a good example; others you may be familiar with are OAIster and NDLTD for theses and dissertations.
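To make the data provider / service provider split concrete, here is a minimal Python sketch of what a harvester does with one repository's OAI-PMH ListRecords response: pull out each record's identifier and its unqualified Dublin Core (oai_dc) fields, and note the resumptionToken that signals more records remain. The sample XML, identifier and field values are made up for illustration; this is not the code behind the CARL harvester or any particular repository.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed, made-up ListRecords response of the kind a repository
# (data provider) returns to a harvester (service provider).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.ca:1828/123</identifier>
        <datestamp>2007-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2006</dc:date>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry a header but no metadata
            for elem in dc:
                # "{http://purl.org/dc/elements/1.1/}title" -> "title"
                name = elem.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((elem.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    # An empty resumptionToken marks the last page of results.
    return records, (token or None)
```

Calling `parse_list_records(SAMPLE_RESPONSE)` yields one record whose `dc` dictionary holds lists of values (Dublin Core elements are repeatable), plus the token `"abc123"` the harvester would send back to fetch the next page.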

CARL harvester
Established Mar. 2004
Hosted at Simon Fraser University
Uses OAI-PMH
Currently 12 institutions; 12 more coming online this summer
April 2007: over 30,000 records

So let's now turn to the CARL harvester. It was established in March 2004 to enhance access to Canadian IRs and was intended to be an integral component of CARL's proactive support of institutional repositories. First, a little background. The CARL service is based on the Public Knowledge Project's (PKP) Open Archives Harvester software. The Library staff at SFU, where it is hosted, simplified it to be more consistent with the unqualified Dublin Core metadata schema used by the participating repositories. Using OAI-PMH, the harvester sends a request to each participating IR daily. There are now over 30,000 records from 12 participating institutions in this Canadian virtual collection, all accessible through the CARL harvester search interface. We look forward to significant growth as 12 more IRs come online this summer, doubling the number of participating institutions.
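The daily request described above can be sketched as an OAI-PMH selective harvest: the harvester asks only for records added or changed since its last visit by passing a `from` date, and follows any resumptionToken with a request that carries only the verb and the token. This Python sketch assumes a hypothetical base URL and a simple one-day window; the actual CARL harvester is PKP's PHP software, and its scheduling and parameters may differ.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date is not None:
            # Selective harvesting: only records added or changed since from_date.
            params["from"] = from_date.isoformat()
    return f"{base_url}?{urlencode(params)}"


def daily_harvest_url(base_url, today):
    """Request everything changed since yesterday (a daily incremental harvest)."""
    return list_records_url(base_url, from_date=today - timedelta(days=1))
```

For example, `daily_harvest_url("https://repository.example.ca/oai", date(2007, 4, 19))` builds `https://repository.example.ca/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2007-04-18`. Day-level dates are used because every OAI-PMH repository must support that granularity.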

Metadata
MARC is a metadata schema; coded and complex
Dublin Core is a metadata schema designed for digital resources; plain language and simple
Consistency in values, such as Date and Type, affects resource discovery

Metadata is a word used a lot in the library digital environment, and I want to put it in context so that its role in building virtual collections is clear. What is metadata? Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information resource. It is often called data about data, or information about information. Everyone is familiar with library WebOPACs, our online catalogues to our print and electronic collections. Traditional library cataloguing is a form of metadata, and MARC and the rule sets used with it, such as AACR2, are metadata standards. MARC is a metadata schema that uses coded tags to describe a resource: for example, the 100 tag contains the author's name and the 650 tag contains a subject heading. MARC is a very detailed schema with many specific tags. Another metadata schema I am sure you have heard about is Dublin Core, a simple yet effective element set for describing a wide range of networked resources, intended to facilitate discovery. It has been in development for a number of years. It is less complex than MARC: it has a basic set of 15 elements, and each element is identified by a name rather than a coded number, for example Creator (the author) or Date. The CARL harvester is designed to harvest simple, or unqualified, Dublin Core. One challenge is achieving consistency in the values used in some elements, such as Date and Type, which has a direct bearing on resource discovery. Metadata application profiles were developed to meet this challenge.
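The contrast between MARC's coded tags and Dublin Core's named elements, and the value-consistency problem with Date, can be illustrated with a tiny crosswalk. This Python sketch uses four mappings from the Library of Congress MARC to Dublin Core crosswalk (100 to Creator, 245 to Title, 260 $c to Date, 650 to Subject) and reduces free-text dates to a four-digit year. It assumes the relevant subfield text has already been pulled out of each field; real MARC records carry subfields and indicators that this sketch ignores, and the sample values are invented.

```python
import re

# Four mappings from the Library of Congress MARC-to-Dublin Core crosswalk.
# A real crosswalk covers many more tags and subfields.
MARC_TO_DC = {
    "100": "creator",  # main entry, personal name
    "245": "title",    # title statement
    "260": "date",     # publication date (subfield $c)
    "650": "subject",  # topical subject heading
}

# Four digits not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def normalize_year(value):
    """Reduce a free-text date such as 'c2006.' or '2006-05-01' to 'YYYY'."""
    match = YEAR.search(value)
    return match.group(1) if match else None


def marc_to_dc(fields):
    """Convert (tag, value) pairs into a Dublin Core dict of repeatable lists."""
    record = {}
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # tag not covered by this small crosswalk
        if element == "date":
            value = normalize_year(value)
            if value is None:
                continue  # no recognizable year, e.g. "n.d."
        record.setdefault(element, []).append(value.strip())
    return record
```

With input such as `[("100", "Doe, Jane"), ("260", "c2006."), ("650", "Metadata")]`, the coded tags become the named elements `creator`, `date` and `subject`, and the date arrives as a consistent `"2006"` rather than `"c2006."`.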

Application profiles
A defined element set, policies and “best practices”
Standard vocabularies, e.g. DCMI Type vocabulary
Guidelines
CARLCore - CARL harvester application profile

So what is a metadata application profile? It is a set of metadata elements, policies and guidelines defined for a particular application or implementation. In practice, an organization or user community with an institutional repository can create an application profile that stipulates which metadata elements are required, recommended or optional, and that clarifies “best practices” for the use of specific elements. Most application profiles are human-readable, that is, printed or online documents, but they can also be expressed in machine-readable form using XML. We could consider this for CARLCore Version 2.0. An application profile can also standardize the values used in certain elements, such as Type and Language. Some standard vocabularies already exist, such as the Dublin Core Metadata Initiative (DCMI) Type Vocabulary. We are working on a Type map plug-in and hope to create a Type vocabulary based on one created by the Université de Montréal. One of the tools we hope to create is a template for editing a map file: because individual repositories can use qualified DC while the harvester uses unqualified DC, a map file is needed to map each element. At best, application profiles are guidelines; they cannot be enforced, but if followed they can enhance resource discovery in virtual collections. CARL is supporting a Metadata Application Working Group made up of repository implementers across Canada. Its mandate is to develop an application profile to guide institutions contributing metadata to the CARL harvester. Two people instrumental in moving this initiative forward are Mark Jordan from SFU and Kathleen Shearer from CARL. Under their guidance and leadership, the Metadata Application Working Group produced the first draft application profile at the Access 2006 preconference session on Institutional Repositories in Ottawa last October. The final version 1.0 will be released next month.
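A map file of the kind described above can be pictured as a lookup table from qualified DC refinements to the unqualified element each one refines, plus a second table that normalizes local Type labels to the DCMI Type Vocabulary. The Python sketch below is hypothetical: the refinement-to-element pairs follow the DCMI Terms definitions, but the local type labels and their targets are invented examples, not the CARL Type map plug-in or the Université de Montréal vocabulary.

```python
# Hypothetical map file contents: DCMI Terms refinements mapped to the
# unqualified Dublin Core element each one refines.
REFINEMENT_MAP = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "dateCopyrighted": "date",
    "dateAccepted": "date",
    "extent": "format",
    "isPartOf": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}

# Hypothetical local type labels mapped to DCMI Type Vocabulary terms.
TYPE_MAP = {
    "thesis": "Text",
    "article": "Text",
    "photograph": "StillImage",
    "map": "Image",
    "video": "MovingImage",
    "audio recording": "Sound",
}


def to_unqualified(record):
    """Map a qualified DC record (element -> list of values) to unqualified DC."""
    out = {}
    for element, values in record.items():
        # Elements not in the map (already unqualified) keep their own name.
        target = REFINEMENT_MAP.get(element, element)
        for value in values:
            if target == "type":
                # Unknown labels pass through unchanged for a person to review.
                value = TYPE_MAP.get(value.strip().lower(), value)
            out.setdefault(target, []).append(value)
    return out
```

A record with `dateCopyrighted`, `abstract` and a local type of `"Thesis"` comes out with `date`, `description` and `"Text"`, which is what an unqualified DC harvester can index consistently. Keeping the tables as plain data is what would make a template for editing them practical.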

ETD Application Profile
Name of Term: dateCopyrighted
Term URI:
Label: Date
Defined By:
Source Definition: Date of a statement of copyright
Source Comments:
[Domain] Comments: Use the year of copyright from the thesis title page.
Type of term: element-refinement
Refines: date
Refined By: —
Obligation: M
Occurrence: —

Here is an example of one element from an ETD application profile. As the Label shows, this is the element Date, but its qualified DC term name is dateCopyrighted. The definition is found at the term URI. The source definition, from the Dublin Core Metadata Initiative list of terms, is “Date of a statement of copyright”. The [Domain] Comments describe how the term is used in this application: “Use the year of copyright from the thesis title page”. The Type of term is element refinement, meaning this is a qualified DC term that refines, or qualifies, the element Date. The obligation M means this element is mandatory in this application. There is no one way to write an application profile, but there are examples and templates you can follow.
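Because application profiles can be made machine-readable, one way to picture the ETD entry above is as a data structure that records can be checked against. In this Python sketch only the dateCopyrighted entry comes from the slide; the other terms, the “R” (recommended) and “O” (optional) codes, and the `check_record` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TermSpec:
    """One entry of an application profile, modelled on the ETD example."""
    name: str               # qualified DC term name, e.g. "dateCopyrighted"
    label: str              # display label, e.g. "Date"
    refines: Optional[str]  # unqualified element this term refines, if any
    obligation: str         # "M" mandatory, "R" recommended, "O" optional
    comment: str = ""       # domain comments for this application


# Hypothetical machine-readable profile; only dateCopyrighted is from the slide.
ETD_PROFILE = [
    TermSpec("title", "Title", None, "M"),
    TermSpec("creator", "Author", None, "M"),
    TermSpec("dateCopyrighted", "Date", "date", "M",
             "Use the year of copyright from the thesis title page."),
    TermSpec("abstract", "Abstract", "description", "R"),
]


def check_record(record, profile):
    """List missing mandatory and recommended terms (optional ones are ignored)."""
    problems = []
    for spec in profile:
        if record.get(spec.name):
            continue  # term present with at least one value
        if spec.obligation == "M":
            problems.append(f"missing mandatory term: {spec.name}")
        elif spec.obligation == "R":
            problems.append(f"missing recommended term: {spec.name}")
    return problems
```

A record with only a title and creator would be reported as missing the mandatory dateCopyrighted and the recommended abstract. Since profiles are guidelines rather than enforceable rules, a check like this suits a submission form or a quality-control report better than a hard rejection.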

CARLCore
CARLCore Metadata Application Profile
Collaborative effort of members across Canada
Based on unqualified Dublin Core
Version 1.0 released mid May

So let's look at the application profile for the CARL harvester, which we aptly named CARLCore. CARLCore sets out a practical set of guidelines for improving the consistency of the metadata aggregated in the CARL harvester. It is based on unqualified Dublin Core and is the collaborative effort of 7 working group members across Canada. The first draft was publicized in October, and since then we have been incorporating feedback and editing the language for consistency. The final version 1.0 will be out by mid May. Having an application profile for the CARL harvester is important: individual institutional repositories will be able to use CARLCore to guide their own best practices and develop their own application profiles.

Institutional Repositories
Create their own metadata application profiles
CARL encourages members to implement IRs
CARL harvester is one common point of access to Canadian IRs

With CARLCore as their model, each institution with a repository should be writing its own metadata application profiles. Specific communities or collections within an institutional repository can have their own application profile; at UVic we have developed an ETD Application Profile just for the ETD collection in our IR. The use of application profiles will ultimately enhance resource discovery of scholarly output across Canada. In the following session you will hear from three IRs.

This Harvester is the search service for the CARL Institutional Repositories Pilot Project and aggregates material from each of the participating Canadian institutions, allowing users to seamlessly search all of the repositories at once, using one common point of access.

CARLCore element - Description
Definition: An account of the content of the resource.
Obligation: Optional
Generated By: Person
Recommended Encoding: None
Element Guidelines: Repeatable. Free text. May include an abstract, table of contents, summary description of the contents, or a URI which points to a description.
Examples:
Illustrates the use of a Webcam. [Image]
This thesis describes the high velocity vibration of the proton ... [Text]
Saturn. Galiano. Mayne. North and South Pender. [Text]

Here is an element, Description, from our CARLCore application profile. It is laid out quite differently from the ETD example I showed, but it includes the guidelines needed for consistent usage within an application.
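In a harvested record, the Repeatable guideline simply means the element may occur more than once. This Python sketch serializes a record into the oai_dc format that OAI-PMH harvesters expect, writing one dc:description per value and omitting any element with no values, as an Optional element may be. The record content is hypothetical, and the schemaLocation attributes a complete oai_dc record would normally carry are left out for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional "oai_dc:" and "dc:" prefixes.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(record):
    """Serialize element -> list-of-values into an oai_dc XML string.

    Repeatable elements such as description become one dc:description per
    value; an element with an empty list produces no output at all.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

For instance, a record carrying both an abstract and a table of contents as free-text descriptions yields two `<dc:description>` elements under a single `<oai_dc:dc>` root.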

Challenges to building virtual collections
Canadian context means French and English
Author-generated metadata
Collective vs. institutional level
Diverse and decentralized

There are still many challenges. One that always confronts a Canadian national initiative is how to account for both the French and English metadata created by contributing institutions. To quote Mark Jordan: “It may prove challenging to use unqualified Dublin Core to express metadata in both languages or to allow limiting searches to records in a particular language”. Another is author-created metadata. Many IRs allow authors to submit metadata, and even with input guidelines, quality control is expensive; having library staff create quality metadata is expensive too. These are challenges for administrators to consider. A further challenge is to present an application profile for the CARL harvester that allows effective resource discovery at the aggregator level while still allowing rich metadata creation at the local or institutional level. CARL member institutions are diverse and decentralized, yet the ability to search across all Canadian institutions for scholarly research material is an exciting and challenging goal. I know that with creative energy and financial support it can be accomplished. So, to recap, metadata and application profiles are the two key underpinnings of building good virtual collections. I hope this has helped you better understand what is involved in building virtual collections. We can thank CARL for taking the initiative and supporting us as we build a Canadian scholarly virtual collection.
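One small, partial step toward limiting searches by language is to normalize the dc:language values that repositories send, since the same language may arrive as “English”, “eng” or “en”. This Python sketch maps a few hypothetical variants to ISO 639-1 codes and filters on them. It assumes repositories populate dc:language at all, and it only addresses the language of the resource; it does not solve the harder problem Mark Jordan describes of expressing the metadata itself in both languages within unqualified Dublin Core.

```python
# Hypothetical variants of language values as they might arrive from
# different repositories, normalized to ISO 639-1 two-letter codes.
LANGUAGE_MAP = {
    "en": "en", "eng": "en", "english": "en", "anglais": "en",
    "fr": "fr", "fre": "fr", "fra": "fr", "french": "fr",
    "français": "fr", "francais": "fr",
}


def normalize_language(value):
    """Return an ISO 639-1 code, or None if the value is not recognized."""
    return LANGUAGE_MAP.get(value.strip().lower())


def filter_by_language(records, code):
    """Keep records whose dc:language values include the requested code."""
    return [
        rec for rec in records
        if code in {normalize_language(v) for v in rec.get("language", [])}
    ]
```

Records with no dc:language value are simply excluded from a language-limited search, which is one reason consistent values, encouraged through an application profile, matter for the Canadian context.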

More information
Nancy Stuart: nstuart@uvic.ca
CARL harvester at:
CARL website:
CARL IR Project

Please try out the CARL harvester and add it to the resources on your library gateways. Are there any questions or comments?
