Web sources and library & information services
Finding, evaluating and using a variety of Web sources for searching and reference
© Tefko Saracevic, Rutgers University

Similarities between Web searching & IR & reference
Basic principles of approach are the same:
– human-human interaction (interview): social, organizational, cognitive, affective aspects to explore, including task, need …
– preparation of search concepts, terms, logic
– determination of range, restrictions
– estimation of relevance

Differences
Vastly different sources
– as to contents, authority, reliability, persistence
– variation in amounts, depth, breadth
Very different organization
– little standardization, few if any fields
Quite different search engines & capabilities (basic & advanced)
– also different from engine to engine
Differing search strategies needed

Also: invisible Web
Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes)
You cannot find these materials through general search engines
Contains a vast amount of information
– much of it authoritative and of high quality

Why search engines miss sources
Size: the Web is huge, engines cannot cover it all
Economics: associated costs are high
– also pay per crawl & rank
Technical: still limited capabilities
Spam: eliminating the bad also loses some of the good
Restrictions: some sites do not let crawlers in
Deep structure: some sites are complex

Needed for Web searching
Knowledge & competencies
– variety of Web sources
– their organization
– search engines
– Web search strategies
– search dynamics, feedback
Keeping up & up & up
– constant updates, changes, innovations
– many domain/subject specific

Web size - who knows?
Estimated over 16 million web servers (Lawrence & Giles, 1999)
– but only a fraction of direct search relevance
Domains of sites:
– 83% commercial; 6% scientific or educational; 3% health; 2.5% personal; 2% societies; 1.5% government; about 1% each community, religion; 1.5% pornographic
Web Characterization Project - OCLC
– statistics, trends, reports, links …
– for 2001 reports 8.5 million web sites
– http://wcp.oclc.org/

Organization of sources
No standardization across sources
Major approaches in search engines
– classification: many directory types used
– statistical analyses of terms, links
Metatags in sources
– to enable retrieval by fields
– HTML “keywords”, “description”: 34% of sites use them
– Dublin Core: 0.3% of sites use it
Organization: a hindrance to retrieval
– also faked contents to force retrieval
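
To make the metatag point concrete, here is a minimal sketch of how field-style metadata might be read out of a page using only Python's standard library. The sample page, the `MetaTagParser` class and the `extract_meta` helper are hypothetical illustrations, not part of the original slides; `DC.title` follows the common convention for embedding Dublin Core elements in HTML meta tags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags carrying both a name and a content attribute matter here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            # Field names are lowercased so lookups are case-insensitive.
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping lowercased meta names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical page, invented for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<meta name="keywords" content="information retrieval, web searching">
<meta name="description" content="A hypothetical page about searching the Web.">
<meta name="DC.title" content="Sample page">
</head><body>...</body></html>"""

fields = extract_meta(SAMPLE_PAGE)
print(fields.get("keywords"))
print(fields.get("description"))
print(fields.get("dc.title"))
```

A page without such tags simply yields an empty dictionary, which is the situation the slide describes for most sites.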

Sources & search engines
Indexed by search engines (publicly indexed)
– by terms, selection, links, registration
Not publicly indexed
– many domain sources will not be found, e.g. digital libraries, online journals, reference
– many commercial sites will hardly be found
Differing approaches to inclusion/selection
– mostly automatic; also generic source providers
– increasingly added human evaluation & selection

Search engine coverage
No engine covers more than 16% of the WWW
Relative to the combined coverage of the top 11 engines:
– Northern Light 38.3%; Snap 37.1; AltaVista 37.1; HotBot 27.1; MS 20.3; Infoseek 19.2; Google 18.6; Yahoo 17.6; Excite 13.5; Lycos 5.9; EuroSeek 5.2
– HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases
Northern Light has a ‘special collection’: documents not part of the publicly indexable web
Hard to discern & compare coverage
Many national search engines with their own coverage
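
The two kinds of figures on this slide (a 16% ceiling on the whole Web, and shares of the combined coverage) can be related by simple arithmetic. The sketch below assumes that the 16% ceiling belongs to the engine with the largest relative share (Northern Light at 38.3%); the slide does not state this, so treat the derived numbers as rough estimates. The variable and function names are illustrative.

```python
# Shares of the combined coverage of the 11 engines, as listed on the slide (percent).
RELATIVE_COVERAGE = {
    "Northern Light": 38.3, "Snap": 37.1, "AltaVista": 37.1, "HotBot": 27.1,
    "MS": 20.3, "Infoseek": 19.2, "Google": 18.6, "Yahoo": 17.6,
    "Excite": 13.5, "Lycos": 5.9, "EuroSeek": 5.2,
}

# ASSUMPTION: the "no more than 16% of the WWW" figure is the top engine's absolute coverage.
TOP_ENGINE_ABSOLUTE_PERCENT = 16.0

# If the top engine covers 16% of the Web and that equals 38.3% of the combined
# coverage, the combined coverage is 16 / 0.383 percent of the Web.
top_relative = max(RELATIVE_COVERAGE.values())
combined_percent = TOP_ENGINE_ABSOLUTE_PERCENT / (top_relative / 100.0)


def absolute_coverage(relative_percent):
    """Convert a share of the combined coverage into an estimated share of the whole Web."""
    return relative_percent / 100.0 * combined_percent


print(f"Estimated combined coverage: {combined_percent:.1f}% of the Web")
for engine, share in RELATIVE_COVERAGE.items():
    print(f"{engine}: about {absolute_coverage(share):.1f}% of the Web")
```

Under this assumption even the 11 engines together reach well under half of the Web, which is the practical point of the slide.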

Search features among engines
Some search features are the same across all engines, but details differ, particularly in advanced search
– Boolean available, but the default is sometimes AND, sometimes OR
– Differences may be found in: phrases, proximity, truncation, case sensitivity, relevance feedback, field searching, special features
   term expansion to concepts (latent semantic indexing)
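
The effect of the default Boolean operator can be shown with a toy inverted index. This is a minimal sketch with invented documents; `build_index` and `search` are hypothetical helpers and do not reproduce any particular engine's behavior.

```python
# Three invented documents, keyed by document id.
DOCS = {
    1: "web searching strategies for reference librarians",
    2: "reference interview techniques",
    3: "searching online databases",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, default_operator="AND"):
    """Combine the postings of all query terms with the engine's default operator."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if default_operator == "AND":
        return set.intersection(*postings)  # every term must occur
    if default_operator == "OR":
        return set.union(*postings)         # any term may occur
    raise ValueError("default_operator must be 'AND' or 'OR'")


index = build_index(DOCS)
and_hits = search(index, "reference searching", default_operator="AND")
or_hits = search(index, "reference searching", default_operator="OR")
print(sorted(and_hits))
print(sorted(or_hits))
```

The same two-term query returns one document under an AND default and all three under an OR default, so a searcher needs to know which default an engine applies.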

Search strategies & outputs
Geared toward very short searches
– big majority of searches use 2-3 terms (av. 2.5); in IR the av. is 7-14, making a big difference
Directory browsing a big component (not so in IR)
Geared toward limited top outputs
Ranking output by relevance predominates
– relevance calculations differ & are proprietary (secret)
– except Google: they published their method
– affects search strategy: you guess how it is done
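
The published Google method is generally identified as PageRank (Brin & Page, 1998), which scores a page by the links pointing to it. Below is a minimal sketch of the commonly taught normalized variant, computed by repeated iteration over a tiny invented link graph. The damping value of 0.85, the treatment of pages without outgoing links, and all names are assumptions made for illustration; a production ranking combines many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # ASSUMPTION: a page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph, invented for illustration.
HYPOTHETICAL_LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores = pagerank(HYPOTHETICAL_LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph page C collects links from three pages and ends up ranked first, while D, which nobody links to, keeps only the baseline share.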

Meta search engines
Search engines that cover search engines: many around, e.g.
– All4one http://all4one.com/
   four windows, good for comparison
– CNET Search.com http://www.search.com/
   meta engine of meta engines, customization
Search Engines Worldwide
– 174 countries, over 1300 engines
– http://www.twics.com/~takakuwa/search/search.html
More on the horizon & differing
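
What a meta search engine does with the lists it gets back can be sketched as a merge step. The example below removes duplicate URLs and orders them by summed reciprocal rank, which is only one of many possible merging schemes and is not claimed to be what All4one or Search.com do. The engine names, URLs and the `merge_results` helper are invented for illustration.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one de-duplicated ranking."""
    scores = {}
    sources = {}
    for engine, urls in results_by_engine.items():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:
                continue  # count each URL once per engine
            seen.add(url)
            # A URL ranked highly by several engines accumulates a larger score.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
            sources.setdefault(url, []).append(engine)
    # Highest score first; ties are broken alphabetically by URL.
    ordered = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, scores[url], sources[url]) for url in ordered]


# Hypothetical result lists, invented for illustration.
HYPOTHETICAL_RESULTS = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/d"],
}

merged = merge_results(HYPOTHETICAL_RESULTS)
for url, score, engines in merged:
    print(url, round(score, 2), engines)
```

Keeping the list of contributing engines next to each URL mirrors the comparison use mentioned on the slide: the searcher can see which engines found a source and which missed it.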

Specialized meta engines
Selective, with directories & a large number of databases & search engines
– Complete Planet http://completeplanet.com
– Invisible Web http://invisibleweb.com
U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess
– Federal Bulletin Board (file libraries for download from many agencies): http://fedbbs.access.gpo.gov

Reference (expert) services
Reference services: several models
– Q&A, directories, email answers etc., e.g.
– Martindale’s Reference Desk: comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html
– Ask Jeeves!: most popular http://www.ask.com/
– Ask ERIC: education questions, email answers http://www.askeric.org/Qa/
– Information Please: almanac type questions http://www.infoplease.com/
Academic libraries developing reference models: a new service area

Libraries as Web sources
Academic libraries providing open collections & services; models vary
– Rutgers libraries: big long term effort http://www.libraries.rutgers.edu/
– various sources & links involved; for domain information & sources go to:
   Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science
– University of California, Berkeley: a most elaborate effort, together with Sun Corporation http://sunsite.berkeley.edu/

Virtual libraries on the Web
Libraries emerging only on the Web
– more & more libraries & organizations involved
Examples of academic & public libraries
– Virtual Library (Switzerland, US, UK & other countries): ‘oldest virtual library on the Web’ http://vlib.org
– Toronto Public Library http://vrl.tpl.toronto.on.ca/
– Internet Public Library, Michigan http://www.ipl.org/

Domain sites
Many domain/issue specific sites
– rich & often unique coverage & services
– different approaches & requirements
Examples in health related domains:
– Medscape (registration required) http://www.medscape.com/
– Rxlist: The Internet Drug Index http://www.rxlist.com/
– Mayo Clinic HealthOasis http://www.mayohealth.org/

Societies, organizations, publishers
Great many rich sources for searching
– differences in requirements, depth, richness
Examples from a variety of organizations:
– Assoc. for Computing Machinery http://www.acm.org/
   Digital Library; subscription or registration
– State Department http://www.state.gov/
   about the U.S. & other countries
– R.R. Bowker http://www.bowker.com/
   Free Resources from Bowker; Library Resource Guide
– Genealogy: http://www.familysearch.org/

Language barriers on the Web
English still the major language
– but declining, now slightly over 50%
Multilingual retrieval search engines
– Euroseek: searches 40 languages http://www.euroseek.com/
– All the Web: 45 languages http://www.alltheweb.com/
– in both, a search in a given language covers primarily sources in that language

Language barriers: translations
A number of translation sites
– machine aided, i.e. plug in terms, phrases, sentences in one language & review them in the other, but effectiveness???
– Free Translations http://www.freetranslations.com
– Babel Fish http://babelfish.altavista.com/tr
– Travlang: great for travelers, phrases http://www.travlang.com

Key professional competencies
Knowledge of SOURCES in area of interest
– search engines are not enough: not too helpful in finding these other sources; structure hard to discern
Evaluation of sources
– a key professional skill!
– standard criteria: quality, veracity, coverage etc.
– plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability
– http://www.otterbein.edu/learning/libpages/subeval.htm

Competencies …
Knowledge of users & use
Knowledge of searching
Use of technology
Adaptability, flexibility
Integration with other resources
Teaching others
Constant learning & update