Digital Libraries (DL): Awareness and Discovery
Ariel Frank
Dept. of Computer Science, Bar-Ilan University
Joint research with Nir Yom Tov, Alon Kadury & Elina Masevich
Presentation motivation
Ad hoc and unsound use of Search Engines (SEs) does not help in retrieving quality information on the Web.
Digital Libraries (DLs), on the other hand, provide high-quality retrieval of authoritative results, especially for exploratory search.
However, awareness and discovery of DLs on the Web are still lacking.
So what can be done about it?

Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions

Google/SE Awareness

So how to overcome Googlism?!

Often heard sayings
“What – is there something to search with besides search engines?”
“Sure I know all about search engines – I always use google.”
“Sure I know all about directories – I always use yahoo!”
“Sorry, never heard about digital libraries.”
“Listen, I’m used to classical libraries.”
“I can find only E-books in a digital library, no?”

Digital Library Vision?!
Sample list of Digital Libraries
LOC – Library of Congress American Memory (http://memory.loc.gov/ammem/)
NSDL – National Science DL (http://nsdl.org)
IPL – Internet Public Library (http://www.ipl.org)
CDL – California DL (http://www.cdlib.org)
ADL – Alexandria DL (http://www.alexandria.ucsb.edu)
BL – British Library (http://www.bl.uk/)
NZDL – New Zealand DL (http://www.nzdl.org/)
Einstein Archives Online (http://www.alberteinstein.info/)

Search Engines: Which kind to use? The right one
[Diagram labels: Web; Index; Directory; Search Engine; Meta-Search Engine; General and Specialty (shown twice)]
When not to use SEs?
You know it all.
You prefer asking friends (or paid experts).
You know the Web site for it (and didn’t forget the exact URL or have auto-completion or bookmark or can access through another known site).
You already found a specific/relevant digital library or database (maybe in Invisible Web).
Tired of paid inclusions, SE spamming, and sponsored commercial results.
Tired of chasing down useless URLs.

When to use an Index?
Need to search for a narrow piece of information.
Have a specific objective/site in mind.
Want to find/rank many related Web sites.
Want to factor quantity in (index has crawler-based results).
Need to check/fix spelling (based on Web statistics).

When to use a Directory?
Clear about the exact topic of your query.
Need general information on a rather broad topic/category.
Want to amass knowledge on a fairly wide subject.
Would like to browse (and then search) a certain area.
Want to factor quality in (directory has human-powered results), not quantity.
Need information that is usually carefully evaluated and even annotated.

When to use a Meta-SE?
When single Basic-SE fails to provide good results.
One-stop shopping – prefer to search multiple SEs/sites at once to get blended ranked results (so as to save effort/time).
When the query is simple (complex fields/options don't usually work).
Searching for multi-faceted topics.
Want to get clustered results to focus search on the relevant keywords.
Looking for current events/news.

When to use a Specialty-SE?
When general-SE fails to provide good results.
When your target is very topic/technology specific.
Want to find more than just Web pages/sites.
Need more results from the Invisible Web.
Want your search terms to more likely have the meanings you intended them to have.

SE Quantity vs. DL Quality?
[Figure labels: SE, DL]

SE vs. DL Potential Coverage
[Figure labels: Resources, Relevant, SE, DL]
Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions

Classical (Analogical) Library

So What is a Digital Library?
There are scores of definitions.
Most are very general and verbose.
A managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network.
Arms, W. Y., Digital Libraries, MIT Press, Cambridge, 2000.

Definition - A Digital Library is:
1. Collection of digital objects
2. Collection of knowledge structures
3. Collection of library services
4. Library Categories: Domain, Focus & Topic
5. Quality Control
6. Preservation/Persistence

1. Collection of Digital Objects
Documents (e.g., texts, HTML pages)
Books
Journals
Multimedia (images, audio, video, etc.)
Charts/Maps
Data objects available directly or indirectly

2. Collection of Knowledge Structures
Metadata: Standards, Markup
Indices, Catalogs, Guides
Taxonomies, Ontologies, Thesauri
Dictionaries, Glossaries, Concordances
Gazetteers
Abstracts/Summaries

3. Collection of Library Services
Management (computerization, communication)
Collections development
Search (query formulation) and Browse interfaces
Multi-access/use for varied users
Online Help, Reference, Consultation
Logging, statistics and Performance Measurement Evaluation (PME)
SDI: Selective Dissemination of Information (Push mode)

4. Library Categories: Domain, Focus & Topic
Domain: belongs to an area (DNS TLDs).
  – edu, com, org, gov, us, il, ac.il, co.il, …
Focus: created to serve a certain community of users/patrons.
  – Academic, Public, National, School, …
Topic: the subject of the collection; can be relatively finely-grained.
  – Law, Medicine, Music, Web, …

5. Quality Control
Selection criteria.
All material is assessed and authorized (“certified”).
Adhere to licensing and copyrights.
Use of Digital Rights Management (DRM).
Integrity enforced (proven quality).
Use of filtering.
Support for profiling/stereotyping.

6. Preservation/Persistence
Access and usage is long term
Serves as an archive
Scanning and digitization
Quality reproduction of material
Material persistency
  – paper vs. digital media
  – digital formats (software tools)

Need for a delicate balance
Web Repositories Hierarchy
Search Engine (SE): Basic SE (BSE), Meta SE (MSE), Popularity SE (PSE)
Digital Library (DL): Stand-alone DL (SDL), Federated DL (FDL), Harvested DL (HDL)
Directory (Catalog, Guide, Subject Gateway)

Types of DLs
Stand-alone Digital Library (SDL) – also self-contained, several collections
Federated Digital Library (FDL) – also confederated, networked
Harvested Digital Library (HDL) – also distributed

Stand-alone Digital Library (SDL)
The regular (classical) DL.
Implemented locally in a fully computerized fashion, with networked access.
Self-contained material:
  – edited/generated
  – scanned/digitized
  – purchased
Single or several digital collections.

Federated Digital Library (FDL)
Contains several autonomous libraries.
Based on common focus and topic.
Usually heterogeneous repositories.
Connected via a network.
Forms a flat unified library.
Transparent user interface.
The major problem is interoperability.

Harvested Digital Library (HDL)
Virtual library providing metadata-based access to relevant items distributed over the network.
Objects harvested into metadata (protocol was Harvest/SOIF, nowadays OAI-PMH can be used).
Harvests digital objects, not full DLs.
But has regular DL characteristics.

SDL vs. HDL

Parallel Evolution of SEs and DLs
1st Generation
  Search Engines: Basic SE (BSE) – includes Robots, Indices, Directories, basic/advanced user interfaces.
  Digital Libraries: Stand-alone (SDL) – local, classical, focused material, digitized or scanned.
2nd Generation
  Search Engines: Meta SE (MSE) – uses several basic-SEs simultaneously (federated search), ranks gathered pages by relevancy.
  Digital Libraries: Federated (FDL) – comprised of autonomous SDLs representing related, possibly heterogeneous, network repositories.
3rd Generation
  Search Engines: Popularity SE (PSE) – uses link analysis and use frequency measures to filter and rank the Web pages.
  Digital Libraries: Harvested (HDL) – contains only summaries and metadata structures; domain focused, of fine granularity.
Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions

Why are SEs overused?
I always use Google/Yahoo!
It’s just a quick search!
The truth? – not sure what I’m looking for.
I’m too used to using SEs.
SEs are more general, no?
SEs always give me enough answers.
SEs don’t care what my topic/domain is!

SE vs. DL - Server Side

SE vs. DL - Client Side

So what was the message?

Qualitative IR from Digital Library?!
Fact: Quantity orientation in SE.
Fact: Quality orientation in DL.
Assumption (?): Accessible DLs in the sought-after domain.
Assumption (?): Usable information retrieval interfaces for DLs.
Result: High quality information retrieval from digital libraries!

Why are DLs underused (social)?
Too used to classical libraries (fond memories).
No public awareness (an unknown entity).
No public relations (unlike for Portals/SEs).
No money in it (marketing, banners, services).
If it’s a library, you have to pay to use it, no?
Are DLs up-to-date at all (as much as SEs)?
No DLs in my language (localization).

Why are DLs underused (general)?
Portals don’t offer DLs (services).
Aren’t DLs part of the Invisible/Deep Web?
DLs are just for experts!
Many interests – will need to know many DLs.
How to find them at all (need a jump-start)?
How to find relevant ones (sounds like search)?
How to find the right one (too many around)?
Lack of domain coverage (no DL in my area).

Why are DLs underused (technical)?
SEs crawl/index DLs, no?
Aren’t directories enough?
Aren’t SSEs (Specialized SEs) enough?
Too focused/limited (too fine granularity).
Need know-how to use DLs (unlike for SEs).
Non-usable interfaces (not user-friendly).
Mostly textual, not multimedia (like SEs are).

DL Awareness & Discovery Problems
Lack of use of and familiarity with DLs.
Hard to locate and identify DLs scattered around the Web.
Not enough metadata kept for and on the DLs.
DLs’ topic, focus and user interfaces are not always clear and usable.
So how to tilt the balance of SE/DL use?

Sample (Digital) Library Directories
Berkeley LibWeb (Library Servers via Web) – http://sunsite.berkeley.edu/Libweb/
Academic Info: Digital Libraries – http://www.academicinfo.net/digital.html
Google Directory: Digital Libraries – http://directory.google.com/Top/Reference/Libraries/Digital/
Librarians’ Index to the Internet – http://lii.org/

Use General SEs and DL Directories?
Why not just use large general SEs?
  – noisy results, metadata not sufficient, too many (re)tries to get relevant results.
Why not just use existing DL Directories?
  – messy categorization, non-friendly UI, not all libraries are DLs, not really DL Directories.

Some possible directions/solutions
Get SEs to better index, reference, and advertise DLs.
Provide specialized SEs for locating DLs.
Construct and enhance DL directories.
DL coverage of more topics/domains.
Employ SE-like interfaces in DLs:
  – user-friendly interface (Google-like)
  – easy-to-use site (usability like in SE)

If more time... we could SEEk more

Theory vs. Practice?

Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions
SELFDL Goals
SELFDL: Search Engine Locator For Digital Libraries
Discover/identify/classify/generate DL resources/sites in the (in)visible Web.
Supply search tools for users to find relevant DLs for their needs.
Provide better, usable (thin) interfaces for locating DLs.
Raise awareness, knowledge, discovery and use of DLs.

Naming

SELFDL Model/Architecture
[Diagram components: Index, Directory, Meta]

SELFDL – gateway to world of DLs

SELFDL techniques
Harness SE technologies to locate DLs on the Web using:
  – Extractors: extract DLs from DL directories.
  – Crawlers: focused crawl in search of DLs.
  – Scripts: interface with Google/Yahoo APIs.
Use site analysis (search for DL terms).
Support Extended DC (Dublin Core) metadata for each DL.
Provide SELFDL database indexing.

DLs Identification test
Manual collection of a list of 65 terms that could indicate that a Web site is a DL.
Check whether there is a statistically significant association between each of the terms and the fact that a Web site is a DL.
The initial statistical test included 100 manually identified DLs and 100 random Web sites.
The statistical measure used (in SPSS) was cross tabulation, tested with Chi-square, phi coefficient and Cramer’s V.
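To make the identification test concrete, here is a minimal sketch of the statistics named on the slide, computed for one term on a 2x2 cross tabulation (DL vs. random site, term present vs. absent). The counts are hypothetical and are not the study's data; the function name is also an assumption. For a 2x2 table Cramer's V equals the absolute value of phi, and the p-value below uses the one-degree-of-freedom chi-square distribution.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square, phi, Cramer's V and p-value for the table [[a, b], [c, d]].

    Rows: DL sites / random sites. Columns: term present / term absent.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    diff = a * d - b * c
    chi2 = n * diff ** 2 / denom
    phi = diff / math.sqrt(denom)
    cramers_v = abs(phi)  # holds for 2x2 tables only
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square with 1 degree of freedom
    return chi2, phi, cramers_v, p_value


# Hypothetical counts: the term appears on 60 of 100 DLs and 10 of 100 random sites.
chi2, phi, cramers_v, p_value = chi_square_2x2(60, 40, 10, 90)
print(f"chi2={chi2:.2f} phi={phi:.3f} V={cramers_v:.3f} p={p_value:.2e}")
```

A term would count as indicative when the p-value falls below the chosen significance level; the level used in the study is not stated on the slide.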
Results of DLs Identification test
Terms that have been found to be statistically significant:
1. documents, book(s), journal(s), electronic/internet/web resource(s)
2. catalog(s)/catalogue(s)
3. ask a librarian, patron(s)
4. digital library, library, digital collection(s)
5. copyright(s)
6. preservation/preserve, digitization/digitize

SELFDL Directory UI

SELFDL Directory classifications
Digital Library classified by Topic, Focus and Domain:
Topic (DDC): Life Science: DDC 570; Earth Science: DDC 550; Biology: DDC 574
Focus (Breeding): Children; Academic; Professional
Domain (IANA): Countries - .IL; Commercial - .COM; Educational - .EDU

Example DDC topic’s tree

SELFDL Directory results example

Advantages of SELFDL Directory
Contains just DLs.
Better classification/perspective based on domain/focus/topic.
Provides user-friendly interface; like Google Directory.
Additional metadata (based on DC).

SELFDL Index UI

SELFDL Index
Results from Web focused crawling.
Can be searched for specific DL criteria:
  – keywords
  – DL type (SDL, FDL, HDL)
  – DL media/content (audio, E-books, E-serials, theses, movies, etc.)
  – Protocol support (OAI-PMH)

SELFDL Index example queries
topic:biology domain:com algebra domain:com source:crawler focus:children type:SDL protocol:OAI topic:math media:ebooks
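The field:value syntax in the example queries can be illustrated with a small parser. This is only a sketch of how such queries might be interpreted, not the SELFDL implementation: the function names, the catalog records and the `description` field are hypothetical, and the way the slide's terms are grouped into individual queries below is an assumption.

```python
def parse_query(query):
    """Split a query into free-text keywords and field:value filters."""
    keywords, filters = [], {}
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field.lower()] = value.lower()
        else:
            keywords.append(token.lower())
    return keywords, filters


def matches(record, keywords, filters):
    """True if the record satisfies every filter and mentions every keyword."""
    for field, wanted in filters.items():
        if str(record.get(field, "")).lower() != wanted:
            return False
    text = str(record.get("description", "")).lower()
    return all(word in text for word in keywords)


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    {"name": "Example Math DL", "topic": "math", "domain": "com", "type": "SDL",
     "source": "crawler", "description": "Algebra and calculus e-books"},
    {"name": "Example Kids DL", "topic": "biology", "domain": "org", "type": "SDL",
     "focus": "children", "source": "directory", "description": "Biology for children"},
]


def search(query, catalog=CATALOG):
    """Return the names of catalog entries matching the query."""
    keywords, filters = parse_query(query)
    return [rec["name"] for rec in catalog if matches(rec, keywords, filters)]


print(search("algebra domain:com source:crawler"))
print(search("focus:children type:SDL"))
```

Treating unprefixed tokens as keywords and prefixed tokens as exact-match filters keeps the query language as simple as an ordinary search box while still exposing the DL-specific criteria.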
SELFDL Index results example

Advantages of SELFDL Index
Built according to insights/techniques of various studies in the field.
Supports directory and crawler results.
Provides specialized SE for DLs.
Easy to use query interface.
Supports advanced keywords search.

SELFDL Meta

SELFDL Meta Engine
Can be searched for DL keywords like in an ordinary search engine.
Intersects SE (i.e., Google/Yahoo API) results with SELFDL database to extract the current DLs to be returned as query response.
Performs like a regular SE – convenient for public use.
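The intersection step described above can be sketched as follows. This is an illustration under stated assumptions, not the SELFDL code: search-engine results are taken to be an ordered list of URLs, the SELFDL database is reduced to a list of known DL site URLs, and matching is done on the host name. The function names and the example paths are hypothetical.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Normalize a URL to its host name, ignoring a leading 'www.'."""
    host = urlsplit(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def meta_search(se_results, known_dl_urls):
    """Keep SE hits whose site is a known DL, in SE rank order, one hit per site."""
    known = {site_key(url) for url in known_dl_urls}
    seen, hits = set(), []
    for url in se_results:
        key = site_key(url)
        if key in known and key not in seen:
            seen.add(key)
            hits.append(url)
    return hits


# Hypothetical SE result list checked against two DLs from the sample list.
se_results = [
    "http://www.example.com/page",
    "http://www.nzdl.org/cgi-bin/library",
    "http://nsdl.org/",
    "http://www.nzdl.org/about",
]
known_dls = ["http://nsdl.org", "http://www.nzdl.org/"]
print(meta_search(se_results, known_dls))
```

Preserving the search engine's order is one possible design; the slides mention that different result orders are supported, which would be a re-sort of the returned list.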
SELFDL intersects with Google & Yahoo! results
[Diagram labels: Yahoo!, SELFDL, Google, Relevant DLs]

SELFDL Meta results example

Google “Sponsored” DL Interface

Advantages of SELFDL Meta
Provides all the advantages of the SELFDL model (UI, metadata).
Supports query interface for terms, like existing SEs.
Supports intersection between SEs’ results and relevant DLs.
Supports different orders of results.

SELFDL prototype testing methods
Efficiency measures were computed for Directory and Meta.
Satisfaction surveys were given to users before and after SELFDL use.
A check was carried out to find the best GUI for SELFDL (regular or Google-like).

Efficiency testing methods
A series of queries was evaluated for result relevancy.
The F-measure was used as the efficiency measure, where:
  P – Precision of results
  R – Relative recall of results
  F – Harmonic mean of P and R (equal weights): F = 2PR/(P+R)
The two components tested were SELFDL Meta and SELFDL Directory.
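As a worked illustration of the measure, the sketch below computes P, R and F for one query. The slide does not define relative recall; the code assumes the common reading in which recall is measured against a pool of relevant results gathered from all compared systems. The function names and the sample data are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_and_relative_recall(retrieved, relevant_pool):
    """Precision and relative recall of one system's results.

    relevant_pool is assumed to hold the relevant results found by any of
    the compared systems for the same query.
    """
    retrieved, relevant_pool = set(retrieved), set(relevant_pool)
    hits = len(retrieved & relevant_pool)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_pool) if relevant_pool else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 2 of them among 3 pooled relevant DLs.
p, r = precision_and_relative_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r, f_measure(p, r))
```

Because F is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it suits a comparison between a directory and a meta engine with different strengths.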
SELFDL Directory vs. DL Directories
[Chart labels: R, P]

SELFDL Meta vs. Google & Yahoo
[Chart labels: R, P]

Users’ satisfaction surveys
1. Usability of Web utilities.
2. Ease of locating DLs.
3. Ease of identifying if site is DL.
4. DL results relevance.
5. DL metadata readability.

Google DL Interface

Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions

RIDDLE Goals
RIDDLE: Resource Inquiry and Discovery in a DL Environment
Enable creation of HDLs by harvesting (filtering) relevant SDLs using OAI-PMH.
Enable construction of HDLs based on composition of lower-level HDLs, so as to increase the coverage of DLs’ topics.
Enable information exchange with SELFDL.
Raise awareness, knowledge, discovery and use of DLs.

Example of topics’ composition
[Topic tree, by level: University; Life Sciences, Exact Sciences, Social Sciences; Chemistry, Computer Science; Hardware, Software]

OAI-PMH Protocol
OAI-PMH – Open Archives Initiative (OAI) Protocol for Metadata Harvesting.
Tackles the lack of uniformity and interoperability between data repositories, which makes information sharing between repositories difficult.
Addresses these problems by defining the way queries are sent to repositories and the way answers are received.
Mandates at least one metadata format for repositories to use – Dublin Core (DC).
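To show what "defining the way queries are sent" looks like in practice, here is a minimal sketch that builds OAI-PMH request URLs. The verbs (`Identify`, `ListRecords`) and arguments (`metadataPrefix`, `set`, `from`) are part of the protocol, and `oai_dc` is its mandatory Dublin Core format. The base URL and the set name `ddc:570` are hypothetical, and the helper names are assumptions; no request is actually sent.

```python
from urllib.parse import urlencode


def identify_url(base_url):
    """URL of an Identify request, which returns repository-level information."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """URL of a ListRecords request, optionally restricted to a set and a start date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # selective harvesting by Set
    if from_date:
        params["from"] = from_date  # e.g. "2007-01-01"
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint and set name.
BASE = "http://repository.example.org/oai"
print(identify_url(BASE))
print(list_records_url(BASE, set_spec="ddc:570"))
```

The repository answers each request with an XML document; long result lists are paged with a resumption token, which this sketch leaves out.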
RIDDLE Model/Architecture
Layer 1 – Internet
Layer 2 – Data Providers
Layer 3 – Service Providers
Layer 4 – Aggregated Service Providers
Layer 5 – Presentation
[Other diagram labels: Web, SDL, HDL, Aggregated HDLs, Web interfaces, OAI-PMH, Enhanced OAI-PMH]

Use of OAI-PMH for FDLs/HDLs
OAI-PMH was planned to support harvesting, as manifested in its name, and also in its design (i.e., selective harvesting using “Sets”).
However, the number of FDLs that use the protocol is relatively large, while there are very few HDLs that employ it.
Since HDLs, unlike FDLs, filter the information, and not just federate it, we investigate ways by which HDLs can filter information using the OAI-PMH protocol.

Levels of information filtering
There are 3 levels where information filtering can be done, though each level has its own problems, mostly caused by lack of uniformity between SDLs:
1. Item-level metadata – relates to the (well-known) problems with the use of DC entries.
2. Group-level metadata – the use of OAI-PMH Sets for selective harvesting is not well defined, so it cannot easily be used for relating to groups of items.
3. Library-level metadata – description of the metadata of this level is not well defined.
Creation of HDLs using OAI-PMH is not fully supported.

Suggested extensions to OAI-PMH
Lack of uniformity in SDLs using OAI-PMH prevents effective creation of HDLs.
The suggestion is to provide better harvesting/filtering capabilities from SDLs, by (re-)use of standards, as follows:
1. Item-level metadata – use of extended DC for metadata description, instead of just DC.
2. Group-level metadata – use of a DDC topic as a defined Set identifier.
3. Library-level metadata – use of extended DC for the library description field in the OAI-PMH Identify verb.

The RIDDLE Prototype
Provides for regular creation of FDLs.
Enables creation of HDLs by harvesting/filtering the relevant SDLs.
Supports HDL aggregation based on DDC hierarchy.
The user search results return not only items matching the query but also HDLs and SDLs related to the indicated topic.
The user can search the HDLs hierarchy (by textual or directory search) for a specific HDL and further down the aggregated HDLs tree.

RIDDLE entry page

Sample results page, first entry an HDL

HDL aggregation
The HDL aggregation capability is based on:
  – use of the DDC topics hierarchy.
  – assigning each HDL a suitable DDC topic identifier.
  – providing it with an OAI-PMH interface, similar to what data providers have, thus enabling and supporting an HDL hierarchy.
  – supporting both offline and online construction and corresponding search.
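The aggregation idea can be sketched with a toy model in which each HDL is keyed by a three-digit DDC class and a higher-level HDL collects everything held by the HDLs beneath it. This is an assumption-laden simplification, not the RIDDLE implementation: it ignores DDC decimal subdivisions, derives the parent purely from the digits (574 under 570 under 500), and uses hypothetical item identifiers and function names.

```python
def ddc_parent(code):
    """Parent of a three-digit DDC class, or None for a main class such as '500'."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected a three-digit DDC class, got %r" % code)
    if code[2] != "0":
        return code[:2] + "0"  # section -> division, e.g. 574 -> 570
    if code[1] != "0":
        return code[0] + "00"  # division -> main class, e.g. 570 -> 500
    return None


def aggregate_hdls(hdl_items):
    """Map each DDC class to the items of its own HDL plus all lower-level HDLs."""
    aggregated = {}
    for code, items in hdl_items.items():
        node = code
        while node is not None:
            aggregated.setdefault(node, set()).update(items)
            node = ddc_parent(node)
    return aggregated


# Hypothetical harvested items per topic HDL.
hdls = {
    "574": {"bio-1", "bio-2"},
    "570": {"life-1"},
    "550": {"earth-1"},
}
for code, items in sorted(aggregate_hdls(hdls).items()):
    print(code, sorted(items))
```

In this sketch the aggregated HDL for class 500 exists even though no HDL was harvested for it directly, which mirrors the goal of widening topic coverage by composing lower-level HDLs.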
Directory search with topics

RIDDLE Experimentation
Several tests were carried out, as follows:
1. The quality of information retrieval when using a specific HDL vs. use of several FDLs.
2. Ease of discovering and using the aggregated HDLs.
3. User preferences in searching several FDLs vs. use of aggregated HDLs.
Initial testing indicates that use of HDLs and aggregated HDLs is more efficient than use of separate FDLs.

Efficiency measures for RIDDLE

Contents
SEs vs. DLs?!
DL Definition/Types
How to tilt the balance of SE/DL use?
SELFDL Model/Architecture
RIDDLE Model/Architecture
Future directions

Future directions
Better locating, identification and ranking of DLs and their categories/types.
Conduct wider, more significant, tests using SELFDL and RIDDLE.
Publish a beta Web version of SELFDL and RIDDLE for public use/feedback.
Better integration between SELFDL and RIDDLE.
Investigate awareness and discovery of DLs on the Web.

References
Sharon, T. & Frank, A., “Digital Libraries on the Internet”, IFLA'00 66th IFLA Council and General Conference, 13-18, Jerusalem, Israel, August 2000, http://www.ifla.org/IV/ifla66/papers/029-142e.htm
Hanani, U. & Frank, A., “The Parallel Evolution of Search Engines and Digital Libraries: their Convergence to the Mega-Portal”, ICDL'00 Kyoto Intl. Conf. on Digital Libraries: Research and Practice, 269-276, Kyoto, Japan, November 2000, http://csdl.computer.org/comp/proceedings/kyotodl/2000/1022/00/10220211abs.htm
Yom Tov, N. & Frank, A., “Harnessing Search Engine Technologies to Increase Awareness and Discovery of Digital Libraries”, 4th IEEE Intl. Conf. on IT: Research and Education (ITRE), Tel-Aviv, October 2006.
Kadury, A. & Frank, A., “Harvesting and Aggregation of Digital Libraries in the OAI Framework”, WEBIST 2007, 3rd Intl. Conf. on Web Information Systems and Technologies, 441-446, Barcelona, Spain, March 2007.

Bibliography
Arms, W. Y., Digital Libraries, MIT Press, Cambridge, 2000.
Hill, L., Buchel, O., Janée, G. & Lei, Z. M., “Integration of Knowledge Organization Systems into Digital Library Architectures”, Position Paper for 13th ASIS&T SIG/CR Workshop, “Reconceptualizing Classification Research”, 62-68, Philadelphia, PA, 2002.
Pace, A. K., The Ultimate Digital Library, American Library Association, Chicago, 2003.
Lossau, N., “Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet”, D-Lib Magazine, Vol. 10, No. 6, June 2004.
Summann, F. & Lossau, N., “Search Engine Technology and Digital Libraries: Moving from Theory to Practice”, D-Lib Magazine Online, Vol. 10, No. 9, September 2004.
Lippincott, J. K., “Net Generation Students and Libraries”, EDUCAUSE Review, Vol. 40, No. 2, March/April 2005.

Still around :-?)