Lucas Mak & Lisa Lorenzo, Michigan State University Libraries


1 How to get FAST fast? Automating LCSH to FAST heading conversion with FAST Linked Data API
Lucas Mak & Lisa Lorenzo, Michigan State University Libraries

2 Background
The MSUL Digital Repository is a Fedora repository with an Islandora front end
Metadata comes from a variety of sources, sometimes with LCSH
Great! But long subject strings didn’t display nicely on item pages and were definitely too long to use in facets
Recent metadata normalization project: decision to use FAST headings across all collections within the repository
Michigan State University Libraries manages a Fedora digital repository that contains a variety of collections, including electronic theses and dissertations, unique cultural heritage collections, and periodicals from the library’s turfgrass collection. These resources come from different places within the library or university, so metadata comes to the repository in various formats and levels of detail. A few collections come from the library catalog and have Library of Congress Subject Headings, which is fortunate since that provides rich descriptive data for those items. Initially, the repository team also described collections not from the catalog with LCSH, but found that especially long, compound subject strings didn’t display well in the repository. The text would often wrap to the next line, and that, combined with design limitations of the Islandora interface, made subject display confusing because it wasn’t clear where each individual subject string started and ended. The length and complexity of the strings also didn’t lend themselves to use in search facets, which are meant to fit into a narrow column and work best when they convey broader subject areas than a complex and granular compound subject string (think about facets on an e-commerce website, where categories are usually broad, like ‘shirts’ or ‘pants’, rather than ‘red shirts in size small’).
Recently, the digital repository moved from having each collection on its own, separate site to having all collections located on the same website. In order to facilitate search and display with this new infrastructure, we began a metadata normalization project to ensure that we were using our main descriptive metadata format, MODS, in the same way across all of our collections. Part of this project included the decision to use FAST headings in each collection, either exclusively or alongside LCSH. Generally, though, even if our records have LCSH, we only display FAST headings. As of now, subjects only appear as part of the metadata display on item pages, but we’re planning on using them in facets in the future.

3 FAST for linked data
Easier to assign URIs to subject fragments than long strings
Example: 650 _0 $a Women veterans $z United States.
id.loc.gov: Women veterans--United States → No match
FAST conversion: Women veterans--United States → Women veterans (OCoLC)fst → United States (OCoLC)fst
In addition to the display issues I mentioned earlier, it’s also easier to assign URIs to FAST headings than to LCSH. This is because the complex subject strings we can generate with LCSH often don’t have exact matches in the Library of Congress’s linked data service, id.loc.gov, and there isn’t really a great way to clearly assign URIs to different parts of a compound subject string. Conversely, with FAST, any subject included in that vocabulary will have a URI that is an exact match for the entire string. As you can see in this example, the subject heading Women veterans with the geographic subdivision for United States doesn’t return a URI match with id.loc.gov, but using the FAST converter splits the heading into two parts, each of which has a URI.

4 FAST for linked data
Easier to assign URIs to subject fragments than long strings
Example: 650 _0 $a Women veterans $z United States.
<mods:subject valueURI="…">
  <mods:topic>Women veterans</mods:topic>
</mods:subject>
<mods:subject valueURI="…">
  <mods:geographic>United States</mods:geographic>
</mods:subject>
This is what that ends up looking like in MODS in our repository. These URIs were used in an experimental linked data project where we dynamically created JSON-LD records in order to improve our repository’s ranking and the quality of the text shown in Google search results.
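As a rough illustration of the MODS shape shown on this slide, the following Python sketch builds one `mods:subject` element carrying a `valueURI`. It is not the presenters' code (their pipeline is XSLT); the helper name `fast_subject` and the example.org URI are hypothetical placeholders, since the real URIs are not preserved in this transcript.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS)


def fast_subject(kind: str, label: str, uri: str) -> ET.Element:
    """Build <mods:subject valueURI="..."><mods:KIND>label</mods:KIND></mods:subject>.

    `kind` is the MODS child element name, e.g. "topic" or "geographic".
    """
    subject = ET.Element(f"{{{MODS}}}subject", {"valueURI": uri})
    child = ET.SubElement(subject, f"{{{MODS}}}{kind}")
    child.text = label
    return subject


# Placeholder URI for illustration only; a real record would carry the FAST URI.
element = fast_subject("topic", "Women veterans", "http://example.org/fast/0")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Calling the helper once per FAST heading ("topic" for Women veterans, "geographic" for United States) would give the two separate subject elements shown above.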

5 In the not so distant past …
At the beginning, we used OCLC’s FAST converter to flip LCSH into FAST headings. However, there were two main issues:
MARC is the only allowed input format. First, not every digital collection has MARC records. Second, as we move away from generating MarcXML records for our digital collections, this tool will no longer be usable for us.
An extra step outside the ingest process. This tool can’t be incorporated into our relatively automated ingest process, which is not sustainable in the long run if we need to ingest more and more digital objects.

6 So, what to do …
OCLC provides API access to its FAST Linked Data Service, which allows searching through the SRU (Search/Retrieval via URL) protocol. Since this is an XML-based protocol, we can incorporate this conversion process into our metadata creation process, and hence the ingestion process.
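To make the SRU idea concrete, here is a minimal Python sketch of how a search URL could be assembled. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU standard; the base URL is a made-up placeholder, and the use of SRU version 1.1 and the CQL relation `all` are assumptions, so check OCLC's documentation for the real endpoint and supported options.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the endpoint given in OCLC's FAST API documentation.
FAST_SRU_BASE = "https://fast.example.org/search"


def build_sru_url(index: str, term: str, max_records: int = 10,
                  base: str = FAST_SRU_BASE) -> str:
    """Return an SRU searchRetrieve URL for one index/term pair."""
    # Escape backslashes and double quotes so the term is a valid quoted CQL string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    cql = f'{index} all "{escaped}"'  # "all" = every keyword must occur (assumed relation)
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": str(max_records),
    }
    return f"{base}?{urlencode(params)}"


url = build_sru_url("oclc.topic", "Women veterans")
print(url)
```

The response to such a request is XML, which is what lets an XSLT-based workflow consume it directly.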

7 Implementation
Diagram labels: LCSH/LCGFT Headings, MarcXML, MODS, FAST Headings
The XSLT script extracts LCSH from MODS and MarcXML records, searches the LCSH/LCGFT strings against the FAST dataset through the Linked Data API, retrieves the XML results and verifies matches, and plugs the matched FAST headings back into the source XML records.
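The first step of that pipeline, pulling LCSH out of a source record, might look roughly like this in Python. The presenters do this in XSLT, so this is only an illustrative stand-in. It assumes MODS records mark LCSH subjects with `authority="lcsh"`, and the sample record is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def extract_lcsh(mods_root: ET.Element) -> List[List[Tuple[str, str]]]:
    """Return one list of (element name, text) parts per LCSH subject."""
    subjects = []
    # Assumption: LCSH subjects are flagged with authority="lcsh".
    for subject in mods_root.findall(".//mods:subject[@authority='lcsh']", MODS_NS):
        parts = []
        for child in subject:
            local_name = child.tag.split("}", 1)[-1]  # drop the namespace part
            text = (child.text or "").strip()
            if text:
                parts.append((local_name, text))
        if parts:
            subjects.append(parts)
    return subjects


# Invented sample record, for illustration only.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:subject authority="lcsh">
    <mods:topic>Women veterans</mods:topic>
    <mods:geographic>United States</mods:geographic>
  </mods:subject>
  <mods:subject authority="fast">
    <mods:topic>Veterans</mods:topic>
  </mods:subject>
</mods:mods>"""

lcsh_subjects = extract_lcsh(ET.fromstring(SAMPLE))
print(lcsh_subjects)
```

Keeping each part's element name (topic, geographic, and so on) is what later allows a search index to be chosen per part.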

8 Searching the API
Index                Description                            MARC
oclc.altlc           LC Source headings                     Entire string
oclc.topic           Keywords in topical headings           650 $a, $x; 6XX $x
oclc.geographic      Keywords in geographical headings      6XX $z; 651 $a, $z
oclc.eventName       Keywords in event headings             650 $a; 611
oclc.personalName    Keywords in personal headings          600 $a, $b, $c, $d, $q
oclc.corporateName   Keywords in corporate name headings    610 $a, $b
oclc.form            Keywords in form headings              655 $a; 6XX $v
Adapted from
The API provides a number of indexes that we can search against. Each subject string has to be broken down into its constituent strings based on their types. The search index is picked based on the type of each constituent string. We also try to match the entire string when necessary.

9 Searching the API
Breaking down an LCSH string
Examples:
650 _0 $a Women veterans $z United States
Women veterans → oclc.topic
United States → oclc.geographic
650 _0 $a World War, $v Personal narratives, American
World War, → oclc.eventName
Personal narratives, American → oclc.form
or
World War, Personal narratives, American → oclc.altlc
Since 650 $a can be either an event or a topic, 650 $a is searched against both the “topic” and “event” indexes.
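The breakdown above can be sketched as a small Python function that turns one parsed subject field into a list of (index, term) searches. This is an illustration of the idea, not the presenters' XSLT. It covers only 650/651/655 main headings plus the $x, $z and $v subdivisions from the index table, leaves out name headings (600/610/611), and assumes the whole-string fallback joins parts with "--".

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)

# Subdivision codes map to the same index in any 6XX field.
SUBDIVISION_INDEX = {"x": "oclc.topic", "z": "oclc.geographic", "v": "oclc.form"}

# Indexes to try for the main heading ($a), keyed by MARC tag.
MAIN_INDEXES = {
    "650": ["oclc.topic", "oclc.eventName"],  # 650 $a may be a topic or an event
    "651": ["oclc.geographic"],
    "655": ["oclc.form"],
}


def plan_searches(tag: str, subfields: List[Subfield]) -> List[Tuple[str, str]]:
    """Return the (index, term) pairs to try for one LCSH field."""
    searches: List[Tuple[str, str]] = []
    for code, value in subfields:
        term = value.strip()
        if code == "a":
            indexes = MAIN_INDEXES.get(tag, [])
        elif code in SUBDIVISION_INDEX:
            indexes = [SUBDIVISION_INDEX[code]]
        else:
            indexes = []  # subfield not covered by this sketch
        searches.extend((index, term) for index in indexes)
    # Fallback: the entire string against the LC source-heading index.
    whole = "--".join(value.strip() for _, value in subfields)  # separator is an assumption
    searches.append(("oclc.altlc", whole))
    return searches


plan = plan_searches("650", [("a", "Women veterans"), ("z", "United States")])
for index, term in plan:
    print(index, "->", term)
```

Listing both `oclc.topic` and `oclc.eventName` for 650 $a mirrors the note on the slide: the script cannot know in advance which kind of heading it has, so it tries both.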

10 Verifying Search Results
Search string: Military nursing
We have to sort through a bunch of FAST authority records that come back from the search. First, the script compares the 1XX in the FAST record against the search string. In this case, it is a match both to the 150 and the corresponding 750. At the same time, we also get the 001 and turn it into a URI.
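A minimal Python sketch of this verification step, assuming the API returns MARCXML authority records: it compares the 1XX heading with the search string and, on a match, derives a URI from the 001. The sample record and its identifier are invented. The URI pattern (base URL plus the 001 with its "fst" prefix and leading zeros removed) is an assumption, as is joining subfields with "--". The 7XX comparison mentioned above is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
FAST_URI_BASE = "http://id.worldcat.org/fast/"  # assumed URI pattern


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.casefold().split())


def heading_of(record: ET.Element) -> Optional[str]:
    """Return the record's 1XX heading, with subfields joined by '--'."""
    for field in record.findall("marc:datafield", MARC_NS):
        if field.get("tag", "").startswith("1"):
            parts = [(sf.text or "").strip()
                     for sf in field.findall("marc:subfield", MARC_NS)
                     if (sf.get("code") or "").isalpha()]
            return "--".join(parts)
    return None


def match_uri(record: ET.Element, search_string: str) -> Optional[str]:
    """Return a FAST URI if the 1XX heading equals the search string, else None."""
    heading = heading_of(record)
    if heading is None or normalize(heading) != normalize(search_string):
        return None
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    if control is None or not control.text:
        return None
    ident = control.text.strip()
    if ident.startswith("fst"):
        ident = ident[3:]
    return FAST_URI_BASE + ident.lstrip("0")


# Invented record with a made-up identifier, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">fst00000001</controlfield>
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Military nursing</subfield>
  </datafield>
</record>"""

record = ET.fromstring(SAMPLE)
print(match_uri(record, "Military nursing"))
```

Requiring the whole normalized heading to be equal is deliberately strict: a keyword search can return many records that merely contain the search words.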

11 Verifying Search Results
Search string: World War, Veterans
But sometimes the search will pull up FAST headings that are obsolete. Luckily, when a heading is obsolete, there are links to the current authorized FAST headings. In this case, since “World War ( )” is an event while “Veterans” is a topic, this topical LCSH heading is split into two FAST headings, plus a chronological facet.

12 UMI conversion table
In addition to converting LCSH into FAST, we also added another element to the FAST conversion process for our electronic theses and dissertations collection. In some cases, ETD records in the repository have not yet passed through cataloging and therefore don’t yet have MARC records in our catalog. In this case, we convert records from ProQuest that contain UMI headings, a controlled vocabulary of around 400 academic subjects created by ProQuest. Since these records don’t contain LCSH, they would be bypassed by the process of querying OCLC’s API and creating FAST headings. To address this, we created an additional process within our XSLT for these records that includes a conversion table for flipping UMI headings directly into FAST.
This conversion table had to be created manually. This was time consuming and presented a few challenges, the first of which was finding a comprehensive list of UMI headings. These change from year to year and can be found in ProQuest’s dissertation submission forms. There wasn’t always an obvious conversion from UMI to FAST, so some of the headings are close but not exact. Some UMI headings were best expressed by multiple FAST headings, such as “Ancient languages”, which we decided to convert to two headings: “Language and languages” and “History, Ancient.”
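A conversion table like this is essentially a one-to-many lookup. The Python sketch below shows the idea with the single mapping mentioned in the talk; the function name and the decision to report unmapped headings separately are illustrative choices, and the real table lives in the presenters' XSLT.

```python
from typing import Dict, List, Tuple

# One UMI heading may map to several FAST headings.
# Only the example from the talk is included here; the real table has hundreds of rows.
UMI_TO_FAST: Dict[str, List[str]] = {
    "Ancient languages": ["Language and languages", "History, Ancient"],
}


def umi_to_fast(umi_headings: List[str]) -> Tuple[List[str], List[str]]:
    """Return (FAST headings, UMI headings with no mapping), preserving order."""
    fast: List[str] = []
    unmapped: List[str] = []
    for heading in umi_headings:
        targets = UMI_TO_FAST.get(heading.strip())
        if targets is None:
            unmapped.append(heading)
            continue
        for target in targets:
            if target not in fast:  # avoid duplicates when mappings overlap
                fast.append(target)
    return fast, unmapped


fast_headings, missing = umi_to_fast(["Ancient languages", "Unlisted subject"])
print(fast_headings, missing)
```

Returning the unmapped headings makes it easy to spot UMI terms that have changed since the table was compiled.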

13 Limitations & Observations
Challenge: NACO personal names without widespread use aren’t included in FAST
Example: 610 _0 $a Harrison, Dorothy M. $q (Dorothy McDonald), $d
id.loc.gov: Harrison, Dorothy M. (Dorothy McDonald), → id.loc.gov/authorities/names/no
FAST conversion: Harrison, Dorothy M. (Dorothy McDonald), → No match
We’ve come across one relatively significant limitation in using exclusively FAST headings for display: not all names that are included in LCSH are part of the FAST vocabulary. This is especially obvious in our oral history collection, the Women’s Overseas Service League Oral History Project. Even though the names in those records have NACO authorities, since they only appear in our unique resources and not across a variety of records in WorldCat, they haven’t been added to FAST. Since our repository is indexed using Apache Solr, we worked around this by creating an indexed field that combines FAST topics and geographic headings with LCSH names in order to display them together on item pages.

14 Limitations & Observations
Individual conference/sporting event instances
FAST treats them as events
No qualifier unless there is a conflict, the name does not convey the meaning of the conference, or the type of event is not clear from the name*
Example: ǂa Olympic Games ǂn (10th : ǂd 1932 : ǂc Los Angeles, Calif.) → ǂa Olympic Games. ǂ2 fast ǂ0 (OCoLC)fst
Personal names rarely used as subjects don’t get their own FAST headings. Individual conference or sporting event names are flipped into their collective names in OCLC WorldCat.
* Chan, Lois Mai & O’Neill, Edward T. (2010). FAST: faceted application of subject terminology: principles and applications. Santa Barbara, CA: Libraries Unlimited. p. 107

15 Limitations & Observations
LCSH free-floating subdivisions may not have a corresponding stand-alone FAST heading
Some must be used as a topical subdivision in FAST
Example: “Officials and employees”
Courts -- Officials and employees ǂ0 (DLC)sh → 650 _7 Courts ǂx Officials and employees. ǂ2 fast ǂ0 (OCoLC)fst
Washington (D.C.) -- Officials and employees → 651 _7 Washington (D.C.) ǂ2 fast ǂ0 (OCoLC)fst; 650 _7 Employees. ǂ2 fast ǂ0 (OCoLC)fst
OCLC seems to have extra logic for converting free-floating subdivisions in WorldCat.

16 Limitations & Observations
Topical vs Form subdivision
Example: “Trials, litigation, etc.”
No corresponding topical FAST heading
Has corresponding form FAST heading
ǂa Simpson, O. J., ǂd ǂx Trials, litigation, etc. → ǂa Simpson, O. J., ǂd ǂ2 fast ǂ0 (OCoLC)fst; 650 _7 ǂa Trials. ǂ2 fast ǂ0 (OCoLC)fst
Some LCSH free-floating subdivisions can be used as both topical and form subdivisions. In this case, “Trials, litigation, etc.” can be used as either a form or a topical subdivision per the Subject Cataloging Manual. However, only the form heading has a corresponding FAST heading. If the item is about the trial of a person, OCLC flips “Trials, litigation, etc.” to “Trials” in WorldCat.

17 Outcome & Next Step
Ran the conversion for ETD and another repository collection
13,760 FAST subjects generated for ETDs
6800+ unique FAST subjects in the repository
Further refine result verification step
Occasional mismatches are not caught
Linked data FAST subject browser
Using <skos:related> elements in FAST RDF files to build browsing functionalities
Integrating Islandora discovery with local catalog

