
1 Ωmega, an integrated system for improved access to a digital collection
Eric Sieverts, Section Innovation & Development
or: how to keep up with
or: how to compete with

2 © eric sieverts UB Utrecht e.sieverts@library.uu.nl http://www.library.uu.nl/medew/it/eric

3 from ℵ (aleph) to Ω (omega)
modern (university) libraries are hybrid libraries:
physical collection with largely physical services
+ digital (virtual) collection with largely digital (virtual) services

4 physical collection
integrated library software (ℵ aleph)
regular library catalog with:
–cataloging module for catalogers
–"online public access catalog" for users
suite of coupled administrative modules for:
–lending & lenders
–serial issues registration
–ordering
–…..

5 characteristics of physical collection
"known" administrative processes
data for complete "objects" (book, serial, volume) and not for smaller information entities (journal article, book chapter)
very limited amounts of textual information digitally available
–content metadata: title + keywords / subject headings or codes
–formal metadata of the objects
–no tables of contents or abstracts or … (yet?)
→ very limited retrieval possibilities
characteristics of digital collection
different administrative processes
–contracts & licences for access and use (instead of physical property)
–registration & check on accessibility (instead of receiving physical objects)
–check on completeness & format of received "bytes" (instead of checking for damages)
–different workflow
–….
"items" are mostly separate articles
large amounts of text digitally provided / made available
→ much greater retrieval possibilities

6 digital collection

7 still any problems?
administrative: the different administrative processes are not yet sufficiently supported by existing library applications
retrieval: providers of digital content each provide their own separate retrieval system, for their own content only

8 retrieval problem
not very user-friendly if users need to perform searches in all those systems separately
not very user-friendly if all those systems have different search interfaces


12 the underlying question
will our patrons still make sufficient use of our expensive (and important!) library resources?
searching must be as easy as …; if not, they will use only …

13 retrieval solution?

14 integration of sources / search systems
two types of approach:
meta-search system which sends parallel queries to the search systems of the individual sources (distributed / federated search): the METALIB-approach
indexing the content of all sources in your own central search system (local search engine): the OMEGA-approach
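The first approach (parallel queries to every source, then merging the answers) can be sketched as follows. This is a minimal illustration only: the two in-memory "sources" and the function names are hypothetical stand-ins, not part of Metalib or of any real target.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote search systems; a real target would be
# queried over Z39.50, xml or http rather than read from an in-memory list.
SOURCES = {
    "catalog": ["digital libraries", "library automation"],
    "articles": ["metasearch in libraries", "protein folding"],
}


def search_source(name, query):
    """Return (source, title) pairs whose title contains the query string."""
    return [(name, title) for title in SOURCES[name] if query.lower() in title.lower()]


def federated_search(query):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        hits = []
        for future in futures:
            hits.extend(future.result())
    # Sorting gives one integrated result list instead of one list per source.
    return sorted(hits)
```

The second approach moves this work to indexing time: all records are collected once into a single local index, so a search never has to wait for remote systems.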

15 meta-search solution (most common)
requirements:
pertinent material / content must already have its own search system
these search systems must be externally accessible (through the internet) or locally hosted
possibilities for structured communication with these systems (sending detailed queries, interpreting answers)
these requirements are mostly easily met

16 integrated system: metasearch / portal solution
[diagram; labels: query-generator / result-collector, configuration data for targets, internet, search, index, files; connections: Z39.50, http, xml, internal api]

17 also access to full-text resources?
mostly no integrated metasearch facility directly within digitally available articles from several publishers simultaneously
metasearch mostly restricted to bibliographic databases
dynamic "reference linking" from retrieval results (with e.g. SFX) to determine whether there is any access to the digital full-text of a retrieved article
(>> still, quite often no full-text is reached)
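Reference linking of this kind typically passes the citation of a retrieved article to a link resolver as URL parameters. The sketch below assumes OpenURL 0.1 style key names (genre, issn, volume, spage, date); the resolver address and the function name are hypothetical.

```python
from urllib.parse import urlencode


def openurl_for_article(resolver_base, issn, volume, start_page, year):
    """Build a link-resolver URL that describes one journal article."""
    params = {
        "genre": "article",   # kind of item being requested
        "issn": issn,         # identifies the journal
        "volume": volume,
        "spage": start_page,  # first page of the article
        "date": year,
    }
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver address, for illustration only.
link = openurl_for_article("https://resolver.example.org/sfx", "1234-5678", 12, 345, 2004)
```

The resolver, not the search system, decides whether the library has a licence for the full text, which is why a link can still end without full-text.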

18 meta-search solution
metasearch software (like Metalib) can communicate with various types of search systems:
–Z39.50 protocol (especially bibliographic databases): quite standardised, but not very advanced
–interaction based on xml (e.g. new SRU-protocol): quite flexible, great expectations, but at this moment not yet very widely supported
–http-protocol / web-forms ("screen-scraping"): widely used, but not very structured, nor stable over time
–local "legacy"-systems: no (open) standards used
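To illustrate the xml-based option: an SRU search is an ordinary http request whose parameters name the operation and carry a CQL query, with the answer returned as xml. The sketch below only builds such a request URL; the parameter names follow SRU version 1.1 as far as I know it, and the server address is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,             # CQL, e.g. title = "digital library"
        "maximumRecords": max_records,  # size of the first result page
    }
    return base_url + "?" + urlencode(params)


# Hypothetical SRU server, for illustration only.
url = sru_search_url("https://sru.example.org/db", 'title = "digital library"')
```

Compared with screen-scraping, nothing here depends on the layout of a web page, which is what makes this kind of interface more stable over time.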

19 meta-search solution
pros:
–uniform search interface for all systems
–single query formulation
–(mostly) integrated presentation of retrieved answers from the various systems
–implementation technically not very complicated
–no heavy local search system to be managed & hosted
–also suitable for content that, for some reason, cannot be indexed by your own search engine

20 meta-search solution
cons:
–offers only common denominator of search functionality
–no advanced functions available
–uniform search functionality not always really uniform
–often still no more than 10 databases simultaneously searchable
–often complicated configuration specifications (for Z39.50 and for http: url-syntax & screen-scraping)
–efforts needed for configuration management
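The "common denominator" point can be made concrete with a small sketch: a query can only use the features that every selected target supports. The target names and feature sets below are invented for illustration.

```python
# Hypothetical targets and the search features each one supports.
TARGET_FEATURES = {
    "catalog_z3950": {"boolean", "truncation"},
    "publisher_web_form": {"boolean"},
    "xml_gateway": {"boolean", "truncation", "proximity"},
}


def common_features(targets):
    """Return the features that can safely be offered across all targets."""
    return set.intersection(*(TARGET_FEATURES[t] for t in targets))
```

With all three targets selected only "boolean" survives; dropping the web form would bring "truncation" back, which is why a uniform interface is not always really uniform in practice.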

21 meta-search solution
because of these disadvantages, the future application of metasearch in Utrecht:
–primarily for simple searches by first-time users
–... for guiding users to the right bibliographic databases
–and to be used for very fragmented small databases
–with recommendation to use the more powerful native search interfaces of important databases directly (SilverPlatter products, PubMed, ...)
(will it be able to compete with Google?)

22 local central index (search engine)
examples:
UB Utrecht - Omega-system (12 million records): "metadata" of articles from a large number of scientific journals from many different publishers and producers, to which Utrecht has full-text access
Univ. Michigan - OAIster (6 million records): metadata (Dublin Core) from 600 "institutional repositories" with (scientific) publications, "harvested" through the Open Archives Initiative protocol (OAI-PMH)

23 integrated system: local central index solution (Ωmega)
[diagram; labels: internet, document text files, indexer, indexing-rules for targets, central index, search, full-text links]
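The indexer, central index and search components named in this diagram can be sketched with a tiny inverted index. This is a generic illustration of the technique, not the Autonomy engine or the actual Omega implementation; the function names and sample records are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Indexer: map every term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index, query):
    """Search: return ids of records containing all query terms (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up records standing in for metadata received from different providers.
INDEX = build_index({
    "a": "Digital library access",
    "b": "Library catalog",
    "c": "Protein folding",
})
```

Because every record passes through the same indexer, all sources are searched with identical rules, whatever their origin.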

24 local central index
pros:
guaranteed really uniform retrieval facilities (just a single search engine)
possibility to offer advanced retrieval functionality, because we can decide ourselves what search engine to implement and how to configure it
integrated uniform result presentation automatically realised

25 local central index
cons:
heavy system (search engine + content) to be hosted and managed
requires additional negotiations with publishers to get the metadata
nonetheless cannot be realised for all "content"
requires local standardisation of structure of the content from different providers (filtering, conversion)
….
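The local standardisation step (filtering, conversion) amounts to mapping each provider's own field names onto one common record structure. A minimal sketch, with invented provider names and field names:

```python
# Hypothetical field mappings: the providers and their field names are
# invented for illustration, not taken from any real publisher feed.
FIELD_MAPS = {
    "provider_a": {"ArticleTitle": "title", "AuthorList": "authors", "Abs": "abstract"},
    "provider_b": {"ti": "title", "au": "authors", "ab": "abstract"},
}


def normalise(provider, record):
    """Convert one provider record to the common local structure."""
    out = {}
    for source_field, local_field in FIELD_MAPS[provider].items():
        value = record.get(source_field)
        if value:  # filtering: drop empty or missing fields
            out[local_field] = " ".join(str(value).split())  # tidy whitespace
    return out
```

Each new provider then costs one extra mapping rather than a change to the search engine, but the mappings themselves have to be maintained as providers change their formats.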

26 when a local index?
if you can get hold of (almost?) all pertinent searchable "content" (even if only metadata)
–to be realised for material from (some / large) publishers (like Elsevier, JSTOR, Springer, EBSCO, etc.)
–not to be realised for material from publishers who do not (yet) want to, do not understand, or are not able to
–not for databases whose search systems are already associated & intertwined with (access to) the data (like SilverPlatter, CSA, PubMed, etc.)

27 the Utrecht solution: Ωmega
integrated system with:
administrative modules, specifically adapted to our digital requirements (MySQL), (almost) separate from the regular library system
metadata repository for storage of data (in XML) received from publishers and other providers
search engine indexing as many publications as possible for which we have a licence for full-text access
–from (large) publishers providing us with "metadata"
–from selected university repositories
–our own full-text material (dissertations, local articles, …)

28 the Utrecht solution: Ωmega
design of the administrative modules largely based on detailed analysis of the workflows of the various processes and on requirements of the administrative staff

29 Ωmega-search: strategic spearhead
because (again): searching must be as easy as …; if not, they will use only …

30 the Utrecht solution: Ωmega
unique selling points towards the users:
–integrated uniform access to all full-text material to which Utrecht can get access (all you find, you get full-text on your screen)
–advanced retrieval functionality (beyond simple boolean) for at least titles, authors and abstracts of journal articles & other material, linking to the full-text (the present >12 million records do not yet cover all material)
–browsable access: complete list with >7000 journal titles linking to full-text through TOCs (partly still on publishers' sites)
–integrated current awareness, shopping cart, bookshelf, etc.
good competition with Google! (?)

31 why use Autonomy?

32 why use the Autonomy search engine?
for some years we already had a working prototype of our search system for the public
the old search engine had to be replaced (missing functionality, no further support & development)
careful selection path for new software

33 selection path
long list: apr, autonomy, collexis, convera, eidetica, fast, fulcrum, google, inxight, irion, northernlight, verity k2, verity ultraseek
short list: autonomy, irion, verity k2
stages: functional requirements, request for information, proof of concept
outcome: omega

34 proof of concept
3 prototypes, each with (the same) 1 million documents
tested by team of subject specialists:
–emphasis on search functionality
–probabilistic search & relevance ranking
–quality of language technology: word stemming, fuzzy search
–analysis & comparison of search results
investigated by ICT team, among others:
–"accessibility" and maturity of software
–experience of other users
–what to be developed by ourselves?
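Two of the language-technology features tested here, word stemming and fuzzy search, can be illustrated with Python's standard library. The stemmer below is deliberately naive and the vocabulary is made up; commercial engines such as those on the short list use far more elaborate methods.

```python
from difflib import get_close_matches

# Made-up index vocabulary, for illustration only.
VOCABULARY = ["library", "catalog", "journal"]


def naive_stem(word):
    """Very rough English plural stripping: libraries -> library."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def fuzzy_lookup(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return index terms that closely resemble the (stemmed) query word."""
    return get_close_matches(naive_stem(word), vocabulary, n=3, cutoff=cutoff)
```

A misspelled plural such as "libreries" is first reduced to "librery" and then matched against the vocabulary by string similarity, so it still reaches the index term "library".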

35 how to proceed?

36 line of action
1. rebuild the old system with Autonomy, offering users their familiar look-and-feel
2. implement required additional functionality
3. implement new user interface, based on user survey


44 step 2: implementing additional functionality that had long been desired
–filters on additional formal characteristics (parametric search)
–personal query-alerts
–personal bookshelf
step 3: realising new user interface, based on:
–user survey
–"usability" study
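Parametric search and personal query-alerts can both be sketched as simple filters over metadata records. The records, field names and function names below are hypothetical and say nothing about how Autonomy or Omega implements these features.

```python
# Made-up metadata records, for illustration only.
RECORDS = [
    {"id": 1, "title": "Digital libraries", "year": 2003, "type": "article"},
    {"id": 2, "title": "Library portals", "year": 2004, "type": "article"},
    {"id": 3, "title": "Digital archives", "year": 2004, "type": "dissertation"},
]


def parametric_filter(records, **criteria):
    """Keep records whose formal characteristics match every criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def alert_hits(records, saved_terms, seen_ids):
    """Query-alert: records matching a saved query that the user has not seen yet."""
    return [
        r for r in records
        if r["id"] not in seen_ids
        and all(term.lower() in r["title"].lower() for term in saved_terms)
    ]
```

For example, `parametric_filter(RECORDS, year=2004, type="article")` narrows a result set by year and document type, while `alert_hits` would be run periodically against newly indexed records.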

45 sneak preview of new interface design classified - confidential

46 future decisions
which additional functionality and possibilities (offered by the Autonomy software) to include in our user interface:
–relevance feedback
–more-like-this
–concept and term extraction
–result clustering & visualisation
–autoclassification

47 questions?

