1
Accommodating Diverse Search Requirements over a Fedora Repository
Michael Durbin and Jon W. Dunn
Fedora User Group – Open Repositories 2008
April 3, 2008
2
Background
o Indiana University Digital Library Program
    Started in 1997
o Diversity of formats and collections
    Text, image, musical scores, audio, video, …
o Diversity of search systems
    DLXS, XTF, Lucene, DB2 NSE, Oracle Text
o Current project to unify architecture for storage, discovery, and delivery around Fedora
July 16, 2015 Fedora Users Group - Open Repositories 2008
3
Search System Development
o Phase one: create a search architecture and template for an image-based search and discovery application
o Phase two: extend the template and architecture to support more advanced search and discovery applications over different object types
4
PHASE I: CREATING A BASIC IMAGE SEARCH
5
Phase One: Simple Image Search
o Slocum puzzle collection: ideal test case
o Small number of objects
o Simple content model
    Each object represents a single physical puzzle
    Basic metadata: METS, MODS, DC
    RELS-EXT isMemberOf relationship with a collection object
    Pre-scaled derivative images
7
Requirements: Identifier Resolution
o External identifiers rather than Fedora PIDs
    Seamless migration to Fedora
    No commitment to any underlying repository architecture
o Requirement: Quickly resolve our identifier (PURL) to the Fedora PID
8
Requirements: PURL Identifier Resolution
[Diagram: OCLC PURL Resolver and a hypothetical ID resolution service]
PURL: http://purl.dlib.indiana.edu/iudl/lilly/slocum/thumbnail/LL-SLO-004696
Fedora URL: http://fedora.dlib.indiana.edu:8080/fedora/get/iudl:19794/THUMBNAIL
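The resolution step on this slide amounts to a lookup from an external identifier to a Fedora PID, followed by construction of a Fedora URL. A minimal Python sketch of that idea follows. Only the two example URLs come from the slide; the function name, the in-memory lookup table, and the assumption that the second-to-last path segment of the PURL names the datastream are hypothetical.

```python
from urllib.parse import urlparse

# Base URL taken from the Fedora example on the slide.
FEDORA_BASE = "http://fedora.dlib.indiana.edu:8080/fedora/get"

# Hypothetical lookup table; a real service would query an index instead.
ID_TO_PID = {"LL-SLO-004696": "iudl:19794"}


def resolve_purl(purl: str) -> str:
    """Map a PURL of the assumed form .../<view>/<item-id> to a Fedora URL."""
    parts = urlparse(purl).path.strip("/").split("/")
    view, item_id = parts[-2], parts[-1]
    pid = ID_TO_PID[item_id]  # raises KeyError for an unknown identifier
    # Assumption: the datastream ID is the upper-cased view name.
    return f"{FEDORA_BASE}/{pid}/{view.upper()}"
```

Keeping the table behind one function is what would let the backing store change (for example to a search index) without touching callers.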
9
Requirements: Keyword and Fielded Search
o Very basic search requirements for any discovery and delivery web application
    Keyword search should maximize discovery
    MODS fields should be searchable to maximize accuracy of matches
    Search results paging
    Support for simple Boolean operators
    Wildcard searches are a requirement
    Full metadata record (MODS) returned
10
Remaining Requirements
o User interface
    Extensible, Reusable, Customizable
o Service oriented approach
    Centralize core search system
    Standards-based access for integration with other services and end-user tools
11
Requirements: Search System
[Diagram]
UI Layer: Slocum Webapp, Generic Search Webapp
Search Layer: PURL Resolution, Fielded Search, Fedora Integration
12
Solutions: Search Protocol
o Search and Retrieve via URL (SRU)
    One of very few standard search protocols
    Extremely powerful and flexible query language (CQL)
    Can return records of any type
    Most commonly used with DC, MODS, MARCXML
    Has mechanisms for extension in case special needs arise
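To make the protocol concrete, here is a minimal Python sketch that builds an SRU searchRetrieve request URL carrying a CQL query. The parameter names (operation, version, query, startRecord, maximumRecords, recordSchema) are standard SRU request parameters; the function name, the default values, and the "mods" schema short name are assumptions, since schema names are defined by each server.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1,
                   maximum_records=10, record_schema="mods"):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # CQL, e.g. fielded or Boolean
        "startRecord": start_record,         # 1-based offset, used for paging
        "maximumRecords": maximum_records,   # page size
        "recordSchema": record_schema,       # assumed server-defined name
    }
    return f"{base_url}?{urlencode(params)}"
```

Paging through results is then a matter of advancing startRecord by maximumRecords on each request.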
13
Search System Solutions: SRU
[Diagram]
UI Layer: Slocum Webapp, Generic Search Webapp
Protocol between layers: SRU
Search Layer: PURL Resolution, Fielded Search, Fedora Integration
14
Solutions: Existing Products
o Fedora Search
    Good for finding items based on basic Fedora metadata, but not for more sophisticated searching
o Fedora Resource Index Search
    Also limited to searching basic metadata, not the content of datastreams
15
Solutions: Existing Products
o Fedora Generic Search Service (GSearch)
    Hooks into Fedora
    Works with Lucene
    Easy to customize search fields through XSLT transformation of existing metadata
o OCLC SRU/W Implementation
    Relatively complete implementation in Java, with ongoing development
    Others have had success using it with Lucene
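The field customization described for GSearch is done with XSLT. As an illustration of the same idea in a different technique, the Python sketch below uses the standard library's ElementTree instead of XSLT to pull values out of a MODS record into named index fields. The field names and the paths chosen are hypothetical; only the MODS namespace URI is standard.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping: index field name -> path inside the MODS record.
FIELD_PATHS = {
    "dc.title": "mods:titleInfo/mods:title",
    "dc.subject": "mods:subject/mods:topic",
}


def index_fields(mods_xml: str) -> dict:
    """Return {field name: [values]} extracted from one MODS record."""
    root = ET.fromstring(mods_xml)
    fields = {}
    for name, path in FIELD_PATHS.items():
        values = [el.text.strip()
                  for el in root.findall(path, MODS_NS)
                  if el.text and el.text.strip()]
        if values:  # omit fields that have no values in this record
            fields[name] = values
    return fields
```

Each entry in the mapping plays the role that one template rule would play in an XSLT stylesheet.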
16
Search System
[Diagram: the Fedora Generic Search Service updates the Lucene index; the OCLC SRU Implementation, through a Lucene database extension, reads the index and exposes it over SRU]
17
Phase 1 Solution: General Applicability
o Pieces of this solution have been used for other image collections
o SRU is used to expose these collections to OneSearch@IU, our federated search service
o The XSLT that assigned metadata to Lucene index fields was a solid base for the indexing needs of other collections.
18
Phase 1 Solution: Lingering Problems
o Our XSLT for the Generic Search Service wasn’t perfect
o Some complications prevented full automation
o We punted on getting the perfect Lucene analyzer configuration
19
PHASE II: EXTENDING FOR DIFFERENT COLLECTIONS
20
EVIA Digital Archive
21
Requirement: EVIADA Video Annotation Collection
[Diagram labels: Video Object, Field Collection Object, Custom Annotation Software, Field Collection]
22
Requirement: EVIADA Video Annotation Collection
o Complex data model
    One Fedora object which is addressable and discoverable in parts
o New features
    Faceted Search and Browse
    Extensive custom fields
23
Requirements: IN Harmony Sheet Music Collection
24
Requirements: IN Harmony Sheet Music Collection
o Complex content model
    Three types of objects below the collection
        Sheet music
        Individual Score
        Page Image
[Image: Chariot Race March]
25
Requirements: IN Harmony Sheet Music Collection
o New features
    Faceted Search and Browse
    Exact match searches
    Date range searches
    Dozens of very specific fields
    Sorting by date or title
26
Options:
o Extend our existing implementation
    All too appealing because of familiarity and “sunk costs”
    Major conflicts between existing model and desired model could result in unmaintainable “hackish” implementations
o Switch to a new infrastructure
    Would be great, if something existed that met our needs without having to rework everything
o Some combination
    Best of both worlds?
27
Options: Faceted Search and Browse
o Use Solr
    Built-in support for facets
    Is a service layer with an XML response
    But do we really want to abandon SRU, or maintain two search service protocols?
28
Options: Faceted Search and Browse
o Extend SRU Implementation
    Avoids the need for yet another service layer
    Has wide reuse potential
    Could be backed by Solr without substantially more effort
29
Solution: Faceted Search over SRU
[Diagram: SRU Service (now with facet support)]
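The slides do not say how the facet extension computes or returns its counts. As an illustration only, the Python sketch below shows the counting that sits behind any facet display: for each facet field, how many of the matching records carry each value. The record layout (a dict of field name to list of values) and all names are hypothetical, and nothing here reflects the actual SRU response format.

```python
from collections import Counter


def facet_counts(records, facet_fields):
    """Count, per facet field, how many records carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for record in records:
        for field in facet_fields:
            # A record may repeat a value; count it once per record.
            for value in set(record.get(field, [])):
                counts[field][value] += 1
    return counts
```

A browse interface would typically show the most common values first, which `Counter.most_common()` provides directly.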
30
Solution: Other SRU Improvements
o More complete CQL support
    Easy improvements
        Operators (and, or, not, any, all)
        Application-specific fields
31
Solutions: Other SRU Improvements
o More complete CQL support
    Difficult improvements
        “cql.exact” relation
        facet implementation
        sort support
Example query: dc.subject exact “United Kingdom”
[Diagram: index fields dc.subject and dc.subject.exact; dc.subject and dc.subject.sort]
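The field names in the diagram suggest that exact-match and sort support rely on parallel copies of each index field. A minimal Python sketch of that reading follows: it picks the underlying field for a CQL index according to the relation used or whether the field is wanted for sorting. The ".exact" and ".sort" suffixes come from the slide; the function and its selection logic are assumptions.

```python
def lucene_field_for(index: str, relation: str = "=",
                     for_sort: bool = False) -> str:
    """Choose the underlying field name for a CQL index (assumed scheme)."""
    if for_sort:
        # Sorting is assumed to use a dedicated copy of the field.
        return f"{index}.sort"
    if relation in ("exact", "cql.exact"):
        # Exact matching is assumed to use a separately indexed copy.
        return f"{index}.exact"
    return index  # default: the ordinary keyword field
```

Under this scheme the example query dc.subject exact “United Kingdom” would be answered from dc.subject.exact rather than dc.subject.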
32
Options: Index Generation
[Diagram: Fedora Generic Search Service or Homegrown Solution]
33
Reconsideration: GSearch
o Limited by the one-to-one relationship between Lucene documents and Fedora objects
o Storing valid XML in CDATA in Lucene is messy and prone to error as the metadata becomes more diverse
o We really only use it to generate a Lucene index
34
Consideration: Solr
o Robust wrapper for Lucene
    Exposes service to update index
    Exposes search features as a service
    Abstracts away much of the complexity of Lucene
o Migrating existing search indexes would be prohibitively time consuming, but it might be the best tool to bring up new collections
35
Solution: Custom index service
o A service whose initial functionality is simply to create and maintain Lucene index directories that are served by SRU.
    Can easily be extended/configured to use different search engines or to delegate the process entirely (perhaps to Solr)
o Support for existing GSearch style XSLT
o Simple Java interface to allow for easy index implementations.
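The slide describes a simple Java interface for pluggable index writers but does not show it. The Python sketch below illustrates the general shape such a design could take, with in-memory lists standing in for Lucene index directories. All class and method names are hypothetical. The compound writer shows how one repository object can yield several index documents, which a one-document-per-object design cannot do.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class IndexWriter(ABC):
    """Turns one repository object into zero or more index documents."""

    @abstractmethod
    def documents_for(self, obj: dict) -> List[dict]:
        ...


class BasicIndexWriter(IndexWriter):
    def documents_for(self, obj: dict) -> List[dict]:
        # One document per object.
        return [{"pid": obj["pid"], "title": obj["title"]}]


class CompoundModelIndexWriter(IndexWriter):
    def documents_for(self, obj: dict) -> List[dict]:
        # One document per addressable part of the object.
        return [{"pid": obj["pid"], "part": p["id"], "title": p["title"]}
                for p in obj.get("parts", [])]


class IndexService:
    """Keeps several named indexes, each fed by its own writer."""

    def __init__(self) -> None:
        self.writers: Dict[str, IndexWriter] = {}
        self.indexes: Dict[str, List[dict]] = {}

    def register(self, name: str, writer: IndexWriter) -> None:
        self.writers[name] = writer
        self.indexes[name] = []

    def update(self, obj: dict) -> None:
        for name, writer in self.writers.items():
            # Drop stale documents for this object, then add fresh ones.
            kept = [d for d in self.indexes[name] if d["pid"] != obj["pid"]]
            self.indexes[name] = kept + writer.documents_for(obj)
```

Delegating to another engine would, in this sketch, mean adding a writer whose documents are sent to that engine instead of being kept locally.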
36
Search Service
[Diagram]
Components: OCLC SRU Implementation, Custom Index Service
Index writers: Basic Index Writer, GSearch Style XSLT Index Writer, New Style XSLT Index Writer, Compound Model Java Index Writer
Lucene databases: configured for quick id resolution, configured for basic search, configured for advanced search, configured for compound model searches
37
Search Service
[Diagram]
Components: OCLC SRU Implementation, Custom Index Service, Solr
Index writers: Basic Index Writer, GSearch Style XSLT Index Writer, New Style XSLT Index Writer, Compound Model Java Index Writer, Solr Wrapping Index
Lucene databases: configured for quick id resolution, configured for basic search, configured for advanced search, configured for compound model searches
Solr database: configured to interface with Solr
38
Future Plans
o Full text searching
    Search text of entire books or journals
    Determine where in the hierarchy the match occurred
    Provide snippets with highlighted matches in context for the search results listing
o Solutions
    XTF, Solr through our custom index service
39
Conclusion
o Most of the work is configuring the index, which is a requirement that cannot be avoided.
o Migration doesn’t have to be difficult or disruptive
o Always be willing and able to consider new products and technologies
40
Thanks! Any Questions?
o www.dlib.indiana.edu
o wiki.dlib.indiana.edu/confluence/x/AQI
o midurbin@indiana.edu
o jwd@indiana.edu