Published by Kristopher Nelson. Modified over 8 years ago.
1
Omega, an integrated system for improved access to a digital collection
Eric Sieverts, Section Innovation & Development
or: how to keep up with … / or: how to compete with …
2
© eric sieverts UB Utrecht e.sieverts@library.uu.nl http://www.library.uu.nl/medew/it/eric
3
from (aleph) to (omega)
modern (university) libraries are hybrid libraries:
physical collection with largely physical services
+
digital (virtual) collection with largely digital (virtual) services
4
physical collection
integrated library software (aleph)
regular library catalog with:
– cataloging module for catalogers
– "online public access catalog" for users
suite of coupled administrative modules for:
– lending & borrowers
– serial issues registration
– ordering
– …
5
characteristics of physical collection
"known" administrative processes
data for complete "objects" (book, serial, volume) and not for smaller information entities (journal article, book chapter)
very limited amounts of textual information digitally available:
– content metadata: title + keywords / subject headings or codes
– formal metadata of the objects
– no tables of contents or abstracts or … (yet?)
very limited retrieval possibilities

characteristics of digital collection
different administrative processes:
– contracts & licences for access and use (instead of physical ownership)
– registration & check on accessibility (instead of receiving physical objects)
– check on completeness & format of received "bytes" (instead of checking for damage)
– different workflow
– …
"items" are mostly separate articles
large amounts of text digitally provided / made available
much greater retrieval possibilities
6
digital collection
7
still any problems?
administrative:
different administrative processes are not yet sufficiently supported by existing library applications
retrieval:
providers of digital content all provide their own separate retrieval systems, each covering only its own content
8
retrieval problem
not very user-friendly if users need to perform searches in all those systems separately
not very user-friendly if all those systems have different search interfaces
12
the underlying question
will our patrons still make sufficient use of our expensive (and important!) library resources?
searching must be as easy as …; if not, they will only use …
13
retrieval solution ?
14
integration of sources / search systems
two types of approach:
– meta-search system which sends parallel queries to the search systems of the individual sources (distributed / federated search): the METALIB-approach
– indexing the content of all sources in your own central search system (local search engine): the OMEGA-approach
15
meta-search solution (most common)
requirements:
– pertinent material / content must already have its own search systems
– these search systems must be externally accessible (through the internet) or locally hosted
– possibilities for structured communication with these systems (sending detailed queries, interpreting answers)
mostly easily met
16
integrated system: metasearch / portal solution
[diagram: a query-generator / result-collector, driven by configuration data for targets, queries several search systems (each with its own index and files), locally or over the internet, via Z39.50, http, xml or an internal api]
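a minimal sketch of the query-generator / result-collector idea, assuming each target can be wrapped as a function that takes a query and returns records. the target names, the sample titles and the helper make_target are hypothetical stand-ins; a real connector would speak Z39.50, http or xml to the target instead of filtering a local list.

```python
from concurrent.futures import ThreadPoolExecutor


def make_target(name, titles):
    """Build a fake search system (hypothetical) that matches on substrings."""
    def search(query):
        q = query.lower()
        return [{"source": name, "title": t} for t in titles if q in t.lower()]
    return search


# Hypothetical targets standing in for separately hosted search systems.
TARGETS = {
    "catalog": make_target("catalog", ["Digital libraries", "Library history"]),
    "articles": make_target("articles", ["Federated search in digital libraries"]),
}


def metasearch(query, targets=TARGETS):
    """Send the same query to every target in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in targets.items()}
        merged = []
        for name in sorted(futures):  # fixed order keeps the output reproducible
            merged.extend(futures[name].result())
    return merged
```

calling metasearch("digital") returns one merged list in which every record still names the source it came from, which is what an integrated result presentation needs.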
17
also access to full-text resources?
– mostly no integrated metasearch facility directly within digitally available articles from multiple publishers simultaneously
– metasearch mostly restricted to bibliographic databases
– dynamic "reference linking" from retrieval results (with e.g. SFX) to determine whether there is any access to the digital full-text of a retrieved article
(>> still, quite often no full-text is reached)
18
meta-search solution
metasearch software (like Metalib) can communicate with various types of search systems:
– Z39.50 protocol (especially bibliographic databases): quite standardised, but not very advanced
– interaction based on xml (e.g. new SRU-protocol): quite flexible, great expectations, but at this moment not yet very widely supported
– http-protocol / web-forms ("screen-scraping"): widely used, but not very structured, nor stable over time
– local "legacy"-systems: no (open) standards used
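to illustrate the xml-based option: an SRU request is an ordinary http GET whose parameters carry a CQL query, and the answer comes back as xml. the sketch below only builds such a URL; the base URL is a made-up placeholder, the function name is hypothetical, and whether a given target supports the dc.title index is an assumption.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL for a CQL query (sketch only)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real service.
url = sru_search_url("http://sru.example.org/catalog",
                     'dc.title = "digital library"')
```

compared with screen-scraping a web form, both the request and the response have a defined structure, so the configuration per target stays small.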
19
meta-search solution
pros:
– uniform search interface for all systems
– single query formulation
– (mostly) integrated presentation of retrieved answers from the various systems
– implementation technically not very complicated
– no heavy local search system to be managed & hosted
– also suitable for content that, for some reason, cannot be indexed by your own search engine
20
meta-search solution
cons:
– offers only the common denominator of search functionality
– no advanced functions available
– uniform search functionality not always really uniform
– often still no more than 10 databases simultaneously searchable
– often complicated configuration specifications (for Z39.50 and for http: url-syntax & screen-scraping)
– efforts needed for configuration management
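the "common denominator" point can be made concrete: a function is only safe to offer in the uniform interface if every target supports it. a small sketch, with hypothetical target names and capability labels:

```python
# Hypothetical capability lists; real targets would have to be inspected
# or configured by hand.
CAPABILITIES = {
    "target_a": {"keyword", "boolean", "truncation", "proximity"},
    "target_b": {"keyword", "boolean", "truncation"},
    "target_c": {"keyword", "boolean"},
}


def common_functionality(capabilities):
    """Return the search functions that every target supports."""
    sets = list(capabilities.values())
    return set.intersection(*sets) if sets else set()
```

here only keyword and boolean search survive, although two of the three targets could do more.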
21
meta-search solution
because of these disadvantages, the future application of metasearch in Utrecht:
– primarily for simple searches for first-time users
– ... for guiding users to the right bibliographic databases
– and to be used for very fragmented small databases
– with the recommendation to use the more powerful native search interfaces of important databases directly (SilverPlatter products, Pubmed, ...)
(will it be able to compete with Google?)
22
local central index (search engine)
examples:
UB Utrecht - Omega-system (12 million records)
"metadata" of articles from a large number of scientific journals from many different publishers and producers, to which Utrecht has full-text access
Univ. Michigan - OAIster (6 million records)
metadata (Dublin Core) from 600 "institutional repositories" with (scientific) publications, "harvested" through the Open Archives protocol (OAI-PMH)
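harvesting through OAI-PMH comes down to repeated http requests with verb=ListRecords, first with a metadataPrefix (oai_dc for Dublin Core) and then with the resumptionToken from the previous answer. the sketch below only builds those URLs; the repository address is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, resumption_token=None,
                         metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL (sketch only)."""
    if resumption_token:
        # In OAI-PMH the resumptionToken is an exclusive argument: it is sent
        # without metadataPrefix when continuing a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```

a harvester would fetch the first URL, parse the xml, and keep requesting with each returned token until no token is left.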
23
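integrated system: local central index solution (Omega)
[diagram: an indexer, driven by indexing rules for targets, reads document text files obtained over the internet and builds a central index; search runs against that central index, and results carry full-text links to the documents]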
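a toy version of the central index, assuming plain-text input: one inverted index for all sources, with a full-text link stored per document. the class name, the sample links and the simple AND-matching are illustrative assumptions, not a description of the Omega implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CentralIndex:
    """One inverted index over documents from all providers (toy sketch)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.links = {}                   # document id -> full-text link

    def add(self, doc_id, text, full_text_link):
        self.links[doc_id] = full_text_link
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return (id, link) pairs for documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [(doc_id, self.links[doc_id]) for doc_id in sorted(hits)]
```

because every document goes through the same tokenizer and the same index, retrieval behaves identically for all sources, which is the main argument for this approach.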
24
local central index
pros:
– guaranteed really uniform retrieval facilities (just a single search engine)
– possibility to offer advanced retrieval functionality, because we can decide ourselves which search engine to implement and how to configure it
– integrated uniform result presentation is realised automatically
25
local central index
cons:
– heavy system (search engine + content) to be hosted and managed
– requires additional negotiations with publishers to get the metadata
– nonetheless cannot be realised for all "content"
– requires local standardisation of the structure of the content from different providers (filtering, conversion)
– …
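what "local standardisation" means in practice: each provider delivers its own field names and conventions, so every record is converted to one common structure before indexing. a sketch with invented provider names and field names (none of them taken from a real delivery format):

```python
# Hypothetical per-provider field mappings onto one common record structure.
FIELD_MAPS = {
    "publisher_a": {"ArticleTitle": "title", "Authors": "authors",
                    "URL": "full_text_link"},
    "publisher_b": {"ti": "title", "au": "authors", "link": "full_text_link"},
}


def normalise(provider, record):
    """Convert one provider record to the common structure (sketch only)."""
    out = {"title": "", "authors": [], "full_text_link": ""}
    # conversion: rename provider fields to the common field names
    for source_field, target_field in FIELD_MAPS[provider].items():
        if source_field in record:
            out[target_field] = record[source_field]
    # filtering: split a ';'-separated author string and tidy whitespace
    if isinstance(out["authors"], str):
        out["authors"] = [a.strip() for a in out["authors"].split(";")
                          if a.strip()]
    out["title"] = " ".join(out["title"].split())
    return out
```

each new provider then costs one extra mapping plus whatever special filtering its data needs, which is where the configuration effort goes.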
26
when a local index?
if you can get hold of (almost?) all pertinent searchable "content" (even if only metadata)
– can be realised for material from (some / large) publishers (like Elsevier, JStor, Springer, Ebsco, etc.)
– cannot be realised for material from publishers who do not (yet) want to / understand / are able to
– not for databases with search systems already associated & intertwined with (access to) the data (like SilverPlatter, CSA, Pubmed, etc.)
27
the Utrecht solution: Omega
integrated system with:
– administrative modules specifically adapted to our digital requirements (MySQL), (almost) separate from the regular library system
– metadata repository for storage of data (in XML) received from publishers and other providers
– search engine indexing as many publications as possible for which we have a licence for full-text access:
  – from (large) publishers providing us with "metadata"
  – from selected university repositories
  – our own full-text material (dissertations, local articles, …)
28
the Utrecht solution: Omega
design of the administrative modules largely based on a detailed analysis of the workflows of the various processes and on the requirements of the administrative staff
29
Omega-search: strategic spearhead
because (again): searching must be as easy as …; if not, they will only use …
30
the Utrecht solution: Omega
unique selling points towards the users:
– integrated uniform access to all full-text material to which Utrecht can get access (all you find, you get full-text on your screen)
– advanced retrieval functionality (beyond simple boolean) for at least titles, authors and abstracts of journal articles & other material, linking to the full-text (the present >12 million records do not yet cover all material)
– browsable access: complete list with >7000 journal titles linking to full-text through TOCs (partly still on publishers' sites)
– integrated current awareness, shopping cart, bookshelf, etc.
good competition with Google! (?)
31
why use Autonomy?
32
why use the Autonomy search engine?
– for some years we already had a working prototype of our search system for the public
– the old search engine had to be replaced (missing functionality, no further support & development)
– careful selection path for new software
33
selection path
long list: apr, autonomy, collexis, convera, eidetica, fast, fulcrum, google, inxight, irion, northernlight, verity k2, verity ultraseek
(narrowed down via functional requirements and request for information)
short list: autonomy, irion, verity k2
(narrowed down via proof of concept)
omega
34
proof of concept
3 prototypes, each with (the same) 1 million documents
tested by team of subject specialists:
– emphasis on search functionality
– probabilistic search & relevance ranking
– quality of language technology: word stemming, fuzzy search
– analysis & comparison of search results
investigated by ICT team, among other things:
– "accessibility" and maturity of software
– experience of other users
– what has to be developed by ourselves?
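for readers unfamiliar with the two language-technology criteria, a toy illustration: stemming maps word variants to one form, fuzzy search tolerates spelling errors. both functions are simplified stand-ins (the stemmer is a crude suffix stripper, the fuzzy match uses Python's difflib) and say nothing about how the tested engines implement these features.

```python
import difflib


def naive_stem(word):
    """Toy suffix stripper; not a real stemming algorithm such as Porter's."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Return up to three index terms that closely resemble a misspelt term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3,
                                     cutoff=cutoff)
```

with these, "searching" and "searches" both reduce to "search", and the misspelt "retreival" still finds "retrieval" in the vocabulary.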
35
how to proceed?
36
line of action
1. rebuild the old system with Autonomy, offering users their familiar look-and-feel
2. implement required additional functionality
3. implement new user interface, based on user survey
44
step 2: implementing additional functionality that had long been desired
– filters on additional formal characteristics (parametric search)
– personal query-alerts
– personal bookshelf
step 3: realising new user interface
– based on user survey
– "usability" study
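parametric search in its simplest form: after the text query has produced hits, they are narrowed down on exact values of formal fields. a sketch with hypothetical field names (year, language); a real engine would apply such filters inside the index rather than on a Python list.

```python
def parametric_filter(records, **criteria):
    """Keep the records whose formal fields equal every given criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]


# Hypothetical hits from a text query, with formal characteristics attached.
hits = [
    {"title": "Hybrid libraries", "year": 2004, "language": "en"},
    {"title": "Digitale collecties", "year": 2004, "language": "nl"},
    {"title": "Metasearch", "year": 2003, "language": "en"},
]
recent_english = parametric_filter(hits, year=2004, language="en")
```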
45
sneak preview of new interface design
classified - confidential
46
future decisions
which additional functionality and possibilities (offered by the Autonomy software) to include in our user interface:
– relevance feedback
– more-like-this
– concept and term extraction
– result clustering & visualisation
– autoclassification
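as an illustration of what "more-like-this" asks of a search engine: given one document, rank the others by similarity. the sketch uses cosine similarity over raw term counts, a textbook technique chosen here for brevity; it is an assumption for illustration and not a description of how the Autonomy software computes similarity.

```python
import math
import re
from collections import Counter


def vector(text):
    """Represent a text as a bag of lower-cased terms with counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def more_like_this(doc_id, docs, top_n=3):
    """Return ids of the documents most similar to docs[doc_id]."""
    base = vector(docs[doc_id])
    scored = [(cosine(base, vector(text)), other)
              for other, text in docs.items() if other != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]
```

relevance feedback works along similar lines: the documents a user marks as relevant supply the terms for the next query.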
47
questions ?