1
University of North Florida
Meeting 2016
Preparing your Digital Commons Metadata to be Harvested into the OAI: A Best Practice
Marielle Veve, University of North Florida
June 3, 2016
2
Learning objectives:
1) Get some basic knowledge about the Open Archives Initiative
2) Customize a metadata submission form in Digital Commons in order to display specific metadata fields
3) Map the metadata to the appropriate BePress proprietary schema elements so that it displays correctly in the Open Archives Initiative (OAI) for harvesting purposes
4) Choose the most appropriate schema to display metadata for particular collections
5) Choose which metadata elements to display and which ones to hide from the Digital Commons public display
3
What is the Open Archives Initiative (OAI)?
The Open Archives Initiative (OAI) develops interoperability standards, most notably the Protocol for Metadata Harvesting (OAI-PMH), which allows metadata from institutional repositories, or “data providers”, all over the world to be harvested and brought together in one place. The harvested metadata is usually in the Dublin Core schema (simple or qualified) and mostly describes digital or digitized items of a scholarly nature in all formats, such as electronic theses & dissertations, digitized manuscripts, still images, moving images, and audio. To get access to these records, users need to use an OAI “service provider” (OAI catalogs).
4
How does the Open Archives Initiative (OAI) work?
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
So how does the Open Archives Initiative (OAI) work? First, the data providers (for example, academic institutions or universities) provide metadata to describe the resources they host in an OAI-compliant repository. This type of repository, such as Digital Commons, makes the metadata available for others to harvest. The metadata from the different participating institutional repositories can then be harvested and brought together in one place, making it openly accessible to any interested party. The metadata is displayed in the Dublin Core schema (simple or qualified), depending on how the request for it is formulated. Finally, an OAI service provider performs a feed request for the particular set of metadata it is interested in harvesting. A feed request can be placed for the metadata of a specific collection that belongs to an institution, in a particular schema. The purpose of a metadata request can be to create an aggregated catalog that contains one type of material (for example, a catalog that contains only digitized Civil War manuscripts from different institutions), to generate other metadata records for local purposes, or simply to let an institution see how its metadata for a particular collection maps to the OAI and correct it.
Diagram labels:
OAI Data Provider: members (e.g. academic institutions and universities) create records in the Dublin Core schema to describe the digitized items located in their IR.
Harvesting: metadata from the different participating institutional repositories is harvested and hosted in one place, making it openly accessible to any interested parties.
OAI Service Provider: members (e.g. an organization or university) harvest the metadata from the OAI to build an end-user interface (e.g. a catalog) or records to describe these materials.
5
Making an OAI metadata feed request
What we NEED:
1) Institutional repository base URL
2) Name of collection set
3) Schema in which the metadata can be displayed
To place a feed request for a set of metadata for a particular collection, the service provider needs to specify three things in the request: first, the base URL of the institutional repository they wish to harvest metadata from; second, the name of the collection set they wish to harvest; and third, the metadata schema they want it displayed in. To get these three pieces of information, the user will have to perform three types of searches:
6
OAI Data Providers
First, to get the institutional repository base URL, the user needs to go to the OAI registry of data providers, which is available from the OAI website. The registry looks something like this. It contains a list of the registered data providers from around the world that make their metadata available for others to harvest. Most of the IRs at US universities are listed here. The list contains the base URLs that can be used to place a request to a particular institution.
7
Making an OAI metadata request
1) Institutional repository base URL
Ex. of IR base URL
Here is an example of an institutional repository base URL. It is the base URL of my institution, the University of North Florida. As you can see, it gives the type of IR, which in our case is Digital Commons, followed by the institution's code, and then OAI. So now we have the first piece of information we need to perform a metadata feed request to the OAI. Next we need the second piece of information to attach to this base URL, which specifies the name of the collection we wish to harvest from.
8
Making an OAI metadata request
2) Name of collection set
IR base URL + ?verb=ListSets
Collection sets
The second piece of information is the name of the collection set. To get the name of a particular collection set you are interested in, you need to request a list of all the collection sets that exist within a particular IR. To place this request, type the institutional repository base URL followed by the command ?verb=ListSets. (See this example, where I used my institution's repository base URL and added the command to look for sets at the end.) This request provided me with a list of the collection sets contained in my IR.
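The ListSets step can be sketched in Python as follows. This is only an illustration under stated assumptions: the base URL is a hypothetical placeholder (not a real repository), and the sample response is a hand-written, trimmed stand-in for what a repository might return. Only the OAI-PMH namespace and the setSpec/setName element names come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical placeholder; substitute the base URL listed for your repository.
BASE_URL = "https://repository.example.edu/oai"

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample standing in for a real ListSets response (assumption).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>etd</setSpec><setName>Electronic Theses and Dissertations</setName></set>
    <set><setSpec>maps</setSpec><setName>Maps</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets_url(base_url: str) -> str:
    """Append the ListSets command to a repository base URL."""
    return f"{base_url}?{urlencode({'verb': 'ListSets'})}"


def parse_sets(xml_text: str) -> list[tuple[str, str]]:
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for set_el in root.iterfind(".//oai:set", OAI_NS):
        spec = set_el.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = set_el.findtext("oai:setName", default="", namespaces=OAI_NS)
        pairs.append((spec, name))
    return pairs


if __name__ == "__main__":
    print(list_sets_url(BASE_URL))
    for spec, name in parse_sets(SAMPLE_RESPONSE):
        print(spec, "=", name)
```

The setSpec value is the one to copy into later requests; the setName is only the human-readable label.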
9
Making an OAI metadata request
3) Schema in which the metadata can be displayed
IR base URL + ?verb=ListMetadataFormats
List of metadata schemas
Now we need the third piece of information to perform a metadata feed request to the OAI: the metadata schema we want our feed displayed in. To get a list of the metadata formats (or schemas) that a specific institutional repository provides, type the institutional repository base URL followed by the command ?verb=ListMetadataFormats. As seen in this example, the request returns a list of the metadata formats (or schemas) provided by my institutional repository.
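As a rough sketch, the same request can be built and its response read in Python. The base URL is a hypothetical placeholder and the sample response is hand-written and trimmed (a real response also carries schema and namespace details for each format). The prefixes shown are oai_dc, which OAI-PMH requires every repository to support, and qdc, the example prefix used in this presentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical placeholder; substitute your repository's base URL.
BASE_URL = "https://repository.example.edu/oai"

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, trimmed stand-in for a ListMetadataFormats response (assumption).
SAMPLE_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>qdc</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats_url(base_url: str) -> str:
    """Append the ListMetadataFormats command to a repository base URL."""
    return f"{base_url}?{urlencode({'verb': 'ListMetadataFormats'})}"


def parse_prefixes(xml_text: str) -> list[str]:
    """Return the metadataPrefix values offered by the repository."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iterfind(".//oai:metadataPrefix", OAI_NS)]


if __name__ == "__main__":
    print(list_formats_url(BASE_URL))
    print(parse_prefixes(SAMPLE_FORMATS))
```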
10
Making an OAI metadata feed request
After getting these 3 things:
1) Institutional repository base URL: Ex.
2) Name of collection set: Ex. etd
3) Schema in which the metadata can be displayed: Ex. qdc
Now we can generate a complete request:
After getting the three pieces of information needed to perform a metadata feed request (the institutional repository base URL, the name of the collection set I want, and the metadata schema I want the material displayed in), I can go ahead and generate the request to the OAI. In this example, I generated a feed request to see all the metadata that is available for ETDs from the University of North Florida IR in Qualified Dublin Core.
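Putting the three pieces together might look like the sketch below. It assumes the request uses the OAI-PMH ListRecords verb with its standard metadataPrefix and set arguments. The base URL is a hypothetical placeholder, while "etd" and "qdc" are the example values from the slide. The exact set name should be copied from the repository's own ListSets response.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: str, metadata_prefix: str) -> str:
    """Build a complete feed request from the three pieces of information."""
    query = urlencode({
        "verb": "ListRecords",              # OAI-PMH verb that returns full records
        "metadataPrefix": metadata_prefix,  # schema the feed should be displayed in
        "set": set_spec,                    # collection set to harvest
    })
    return f"{base_url}?{query}"


# Hypothetical base URL; set name and schema are the slide's example values.
url = list_records_url("https://repository.example.edu/oai", "etd", "qdc")
print(url)
```

Using urlencode rather than plain string concatenation means set names containing characters such as colons are escaped correctly.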
11
Customizing metadata upload form:
1) Make an institutional decision on which metadata elements will be used to describe each type of material hosted in the Digital Commons IR.
[Slide images: Maps, Photographs, Documents, Theses & Dissertations]
So now let's talk about customizing the metadata upload form in Digital Commons. Before any metadata is customized, the first step is to make an institutional decision (or list) of the descriptive metadata elements the institution wishes to use to describe each type of material it will host in its institutional repository. For example, most institutions that use Digital Commons host digitized materials from their special collections, such as photographs, documents, or maps. In addition, some also host scholarship produced by the university, such as ETDs (electronic theses & dissertations), conference proceedings, presentations, or grey literature. There are usually general guidelines that can be followed to describe each type of material, but the decision can differ between institutions, as each one prioritizes a different set of collections.
12
Same material type (photos), different institutions
-For that reason, some institutions provide more detailed metadata for certain types of collections, while other institutions provide less detail for the same type of material. This slide illustrates an example. For the same type of material (a photograph), the institution on the left provided many different types of elements for description, while the one on the right used other types of elements. [Point differences, sh, keywords…]
13
Customizing metadata upload form:
2) Make a list of metadata elements to be used for each type of material
[Slide images: Photographs, Maps, Documents, Theses & Dissertations]
Now make a list of the metadata elements that will be used to describe each type of material. Common fields for special collections materials are: creator's name (or "unknown"), title, keywords, abstract, and creation date (or a circa date if the exact date is not known but can be estimated). For maps, the coordinates of the exact location covered by the map can be useful metadata. For electronic theses & dissertations, useful fields include the authorized (NACO) heading for the author, which distinguishes them from other authors with the same name; the season and year of publication; the department and college that granted the degree; the advisors' names; and author-provided keywords. In general, when deciding which metadata elements to include for each material, take into consideration the elements that will be most useful to the users of these materials. The purpose of these fields is to describe materials, but not all the fields included in the metadata submission form need to be shown in the Digital Commons public display. You can still choose which ones are displayed publicly and which ones are hidden and used only for organizational purposes.
14
Customizing metadata upload form:
3) Compare metadata elements in the “wish” list with elements included in the Digital Commons metadata template
[Slide image: Theses & Dissertations]
After a list of the metadata elements that will be used to describe each type of material has been compiled, compare the elements in each list with the elements already included in the Digital Commons metadata submission form for that format. The example displayed here compares the metadata elements in the wish list with the elements included in the Digital Commons submission form for electronic theses & dissertations. Are there any elements in your wish list that were not included in the Digital Commons template?
15
Customizing metadata upload form:
4) For the metadata elements in the “wish” list that are NOT included in the Digital Commons metadata template, request BePress to add these additional fields
[Slide image: Theses & Dissertations]
If so, for the metadata elements in the “wish” list that were NOT included in the Digital Commons metadata template, make a request to BePress to add these additional fields to your institution's metadata submission form for that type of material.
16
Mapping of fields to the OAI:
Now, for the fields that were included in the Digital Commons metadata submission template, check IF they mapped and HOW they mapped to the OAI feed. Digital Commons displays its metadata in the OAI in 3 schemas: Simple Dublin Core, Qualified Dublin Core, and OAI Dublin Core. Check how the metadata for a specific collection mapped into these 3 schemas. In this example we will see how the metadata in the Digital Commons submission template for photographs mapped into 2 different schemas: Simple Dublin Core (SDC) and Qualified Dublin Core (QDC). As you can see in this example, some of the metadata fields from the original submission template display differently in each schema, while others display the same way. For instance, QDC uses a “creation date” element where SDC uses the plain “date” element. The abstract/description field also differs between the two schemas. These are things to take into consideration later when choosing a schema to display your metadata.
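One way to make this comparison systematic is to record, for each template field, the element it lands in under each schema and then list the differences. The sketch below is illustrative only. The field names are hypothetical, the qualified element names follow the dotted style used in this presentation's crosswalk, and the simple element names are the standard Dublin Core date and description elements. Check the actual feeds for the real values.

```python
from typing import Optional

# Illustrative mappings for a photograph template (assumed, not harvested).
SIMPLE_DC = {
    "title": "dc:title",
    "creator": "dc:creator",
    "creation date": "dc:date",
    "abstract": "dc:description",
}
QUALIFIED_DC = {
    "title": "dc:title",
    "creator": "dc:creator",
    "creation date": "dc:date.created",
    "abstract": "dc:description.abstract",
}


def compare_mappings(
    a: dict[str, str], b: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return fields whose element differs between two schemas.

    A value of None means the field does not appear in that schema at all.
    """
    diffs = {}
    for field in sorted(set(a) | set(b)):
        elem_a, elem_b = a.get(field), b.get(field)
        if elem_a != elem_b:
            diffs[field] = (elem_a, elem_b)
    return diffs


if __name__ == "__main__":
    for field, (sdc, qdc) in compare_mappings(SIMPLE_DC, QUALIFIED_DC).items():
        print(f"{field}: SDC={sdc} QDC={qdc}")
```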
17
Mapping metadata to BePress Proprietary Schema:
Now, for the metadata fields that were NOT included in the Digital Commons submission template and that you will request to add later, check how they could be mapped into the OAI feed. Because BePress uses a proprietary schema, make sure you map your desired fields to the appropriate element names used in that schema. You will need to tell BePress how you would like your newly added metadata fields to map into the OAI.
18
Tools for Mapping metadata:
Description of desired field for particular format type | Needed fields in DigCom template | How that maps into QDC element in OAI
Author of ETD, authorized form | NACO Name & Birth Date | <dc:creator> (will display as 2nd creator element)
Title | | <dc:title>
Pub. place | *Will not be added at this point | *Will not show in OAI mapping
Publisher | |
Pub. date | Year of publication | <dc:date.created>
Physical Description | |
Degree | Degree Name | <dc:thesis.degree.name>
Season pub (Spring, Summer or Fall) | Season of publication | <dc:date.issued>
Abstract | | <dc:description.abstract>
Rights | |
LCSH (controlled subjects) | Controlled terms (will show in one element) | <dc:subject.lcsh>
Author provided subjects (uncontrolled) | Keywords (will show in separate elements) | <dc:subject>
Thesis advisor(s) in inverted order | For each advisor: LCNA or AACR2 form of (first, sec, thi..) advisor name | <dc:contributor.advisor>
Univ & Dept. who granted degree | NACO controlled Corp Body | <dc:thesis.degree.grantor>
URL | | <dc:identifier>
Fixed fields | ? |
To further assist you in mapping metadata fields from your IR submission template to the OAI feed, you can use a crosswalk between the metadata in your template and the metadata in the OAI using the BePress proprietary schema. Here is an example of a crosswalk I currently use for my institution's metadata: the first column contains the description of the desired field for a particular format, the second column shows how to name that field in the DigCom template, and the third column specifies the QDC element that should be used to map and display that field in the OAI feed.
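A crosswalk like this one can also be kept in a small machine-readable form so it is easy to query when preparing the request to BePress. The sketch below is one possible representation, not a BePress format. The class and function names are hypothetical, only a subset of the rows is included, and None marks a cell that is empty or unmapped in the crosswalk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrosswalkRow:
    description: str               # desired field for the format type
    template_field: Optional[str]  # field name in the DigCom template
    qdc_element: Optional[str]     # QDC element in the OAI feed


# Subset of the ETD crosswalk rows; None means no value was given.
CROSSWALK = [
    CrosswalkRow("Author of ETD, authorized form", "NACO Name & Birth Date", "dc:creator"),
    CrosswalkRow("Pub. place", None, None),
    CrosswalkRow("Pub. date", "Year of publication", "dc:date.created"),
    CrosswalkRow("Degree", "Degree Name", "dc:thesis.degree.name"),
    CrosswalkRow("Season pub", "Season of publication", "dc:date.issued"),
    CrosswalkRow("LCSH (controlled subjects)", "Controlled terms", "dc:subject.lcsh"),
    CrosswalkRow("Author provided subjects (uncontrolled)", "Keywords", "dc:subject"),
    CrosswalkRow("Univ & Dept. who granted degree", "NACO controlled Corp Body",
                 "dc:thesis.degree.grantor"),
]


def element_for(rows: list[CrosswalkRow], template_field: str) -> Optional[str]:
    """Look up the QDC element a template field should map to."""
    for row in rows:
        if row.template_field == template_field:
            return row.qdc_element
    return None


def unmapped(rows: list[CrosswalkRow]) -> list[str]:
    """List desired fields that will not show in the OAI mapping."""
    return [row.description for row in rows if row.qdc_element is None]


if __name__ == "__main__":
    print(element_for(CROSSWALK, "Keywords"))
    print(unmapped(CROSSWALK))
```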
19
Choosing appropriate schema:
Now test your newly customized metadata template by making sure its metadata maps correctly, and in the way you want, to the different feeds in the OAI. The example displayed here shows how the metadata elements in the upload form for ETDs would display in the OAI after some customizations were integrated into the mapping. Check and compare how the same descriptive metadata would display in the different schemas, Simple DC vs. QDC. Which metadata schema provides the best description of your sources? Answering this question will help you choose the most appropriate metadata schema for each collection.
20
Metadata elements to display or hide from Digital Commons public display:
After making sure all the fields you want for a specific collection are there and map correctly to the OAI, a decision needs to be made on which metadata fields will be shown in the DigCom public display and which will be kept hidden and used only for administrative purposes. In the example displayed here for ETDs, one field that should be hidden from the public, but can still be used for administrative purposes in the OAI, is the NACO authorized form of heading for author names, which sometimes contains the author's birth date. Another is the inverted form of the advisor's name, because this field is used for harvesting purposes and would not make sense if displayed to the public. Metadata elements that make sense to display to the public are the title, the abstract, the names of the advisors in direct order, the embargo period (if any), and the institution and department that awarded the degree.
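The display-or-hide decision can be written down as a simple lookup before it is sent to BePress. The sketch below is purely illustrative: it does not reflect how Digital Commons stores this setting, the field names are hypothetical labels for the ETD fields discussed above, and the sample record is invented.

```python
# True = shown in the public display, False = hidden (administrative use only).
FIELD_VISIBILITY = {
    "title": True,
    "abstract": True,
    "advisor (direct order)": True,
    "embargo period": True,
    "degree grantor": True,
    "author (NACO authorized form)": False,
    "advisor (inverted order)": False,
}


def public_view(record: dict[str, str]) -> dict[str, str]:
    """Keep only the fields marked as public; unlisted fields stay hidden."""
    return {
        field: value
        for field, value in record.items()
        if FIELD_VISIBILITY.get(field, False)
    }


# Invented sample record, for illustration only.
sample_record = {
    "title": "Sample thesis title",
    "author (NACO authorized form)": "Doe, Jane, 1990-",
    "advisor (direct order)": "John Smith",
    "advisor (inverted order)": "Smith, John",
}

if __name__ == "__main__":
    print(public_view(sample_record))
```

Defaulting unlisted fields to hidden is a deliberate choice here, so that a newly added administrative field is not exposed by accident.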
21
Final thoughts & recommendations:
- The more specific and detailed the metadata you want to provide, the more granular your metadata schema should be.
- Qualified Dublin Core is a good schema option for providing a high level of granularity and flexibility in metadata.
- Qualified Dublin Core is one of the most commonly used metadata schemas.
- Because some of the fields in the template will be filled in by students and others by library staff, supply instructions on which fields should be filled in only by IR administrators. (Ex. NACO headings & LCSH)
- Remember that good metadata needs to make sense outside its context, so take into consideration outside users who are not familiar with your institution.
Finally, I want to leave you with some closing thoughts and recommendations for designing your metadata and for choosing the most appropriate schema to display metadata for a particular collection:
-The more specific and detailed the metadata you want to provide, the more granular your metadata schema should be. In my experience, QDC is a good schema to choose because it provides a higher level of granularity and detail than Simple DC, while also providing more flexibility and being one of the most commonly used metadata schemas. According to Professor Jung-ran Park at Drexel University in Philadelphia, an expert in metadata, QDC is one of the most used XML schemas in IRs. In the Journal of Library Metadata she said that: “A trend of Qualified DC being used (40.6 percent) more often than Unqualified DC (25.4 percent) is noteworthy”. She also mentioned in another article that “DC metadata schema is the second most widely employed according to this study, with Qualified DC used by 40.6 percent of responding institutions and Unqualified DC used by 25.4 percent.” Another good reason to use QDC with the Digital Commons repository is that it integrates some elements recommended by the Networked Digital Library of Theses & Dissertations (NDLTD) metadata standard, which allow for the display of ETD-appropriate fields.
-Another recommendation: because some of the fields in the template will be filled in by students and others by library staff, instructions such as “only for Adm use” should be placed under the staff-only fields. (Examples of fields that should be filled in only by library staff are the NACO authority headings and LCSH.)
-Finally, remember that good metadata needs to make sense outside its context, so when deciding which metadata fields will be displayed in the OAI, consider outside users who are not familiar with your institution.
22
References:
BePress. (2015). BePress Proprietary Schema for Qualified Dublin Core. Retrieved from
BePress. (2015). Digital Commons and OAI-PMH: Harvesting repository records, 2-6. Retrieved from
Park, J.R., & Brenza, A. (2015). Evaluation of semi-automatic metadata generation tools: A survey of the current state of the art. Information Technology and Libraries, 34(3), 24. doi: /ital.v34i3.5889
Park, J.R., & Tosaka, T. (2010). Metadata creation practices in digital repositories and collections: Schemata, selection criteria, and interoperability. Information Technology and Libraries, 29(3), 108 and 114. doi:
Veve, M. (2016). From Digital Commons to OCLC: A Tailored Approach for Harvesting and Transforming ETD Metadata into High-Quality Records. Code4Lib, 33.
23
Questions? Contact: m.veve@unf.edu