
The Invisible Web

Tefko Saracevic, PhD
Rutgers University
http://www.scils.rutgers.edu/~tefko (also contains a list of sites relevant to the topic and to this presentation)

© Tefko Saracevic, Rutgers University

What is the invisible Web?
Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes)
You cannot find them through general search engines
Contains a vast amount of information
  – much of it authoritative and of high quality
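
To make the definition concrete, here is a minimal Python sketch of a link-following crawler run against a hypothetical in-memory site (the example.org URLs and page contents are invented, and no real HTTP is involved). Because it follows only `<a href>` links and never fills in query forms, the page that exists only as the answer to a search form stays outside its index: invisible in the sense used here. Real crawlers are far more elaborate; this only illustrates the principle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical toy site (URL -> HTML). The record page is reachable only by
# submitting the search form, never through a plain hyperlink.
SITE = {
    "http://example.org/": (
        '<a href="/about">About</a>'
        '<form action="/search"><input name="q"></form>'
    ),
    "http://example.org/about": '<a href="/">Home</a>',
    "http://example.org/search?q=saracevic": "<p>Database record</p>",
}


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags; <form> elements are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Breadth-first crawl that follows hyperlinks only."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url not in SITE:
            continue  # already visited, or no such page on the toy site
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(SITE[url])
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen


indexed = crawl("http://example.org/")
invisible = set(SITE) - indexed
print(sorted(invisible))  # ['http://example.org/search?q=saracevic']
```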

Why do search engines miss it?
Size: the Web is huge; no engine can cover all of it
Economics: the associated costs are high
  – also paid schemes: pay per crawl & rank
Technical: capabilities are still limited
Spam: eliminating the bad also loses some of the good
Restrictions: some sites do not let crawlers in
Deep structure: some sites are complex
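
One common form of the "Restrictions" item is a site's robots.txt file, which asks crawlers to stay out of parts of the site. The sketch below uses Python's standard `urllib.robotparser` on an invented robots.txt; the site (example.org), the `/catalog/` path and the `ExampleBot` agent name are all hypothetical. It shows how a well-behaved crawler decides what it may fetch, and so why the disallowed pages never reach its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site: it keeps all crawlers out
# of its database interface while leaving the public pages open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse the rules locally; no network access

for url in ("http://example.org/about.html",
            "http://example.org/catalog/record?id=42"):
    # A polite crawler checks the rules before fetching each page.
    print(url, rp.can_fetch("ExampleBot", url))
```

Compliance with robots.txt is voluntary; the file only states the site's wishes.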

Web size: who knows?
Estimated at over 16 million Web servers (Lawrence & Giles, 1999)
  – but only a fraction are of direct search relevance
Domains of sites: 83% commercial; 6% scientific or educational; 3% health; 2.5% personal; 2% societies; 1.5% government; about 1% each community and religion; 1.5% pornographic
Web Characterization Project, OCLC: statistics, trends, reports, links
  – its 2001 report gives 8.5 million Web sites
  – http://wcp.oclc.org/

Organization of sources
No standardization across sources
Major approaches in search engines
  – classification: many directory types used
  – statistical analyses of terms & links
Metatags in sources
  – to enable retrieval by fields
  – HTML "keywords" & "description": 34% of sites use them
  – Dublin Core: 0.3% of sites use it
Poor organization: a hindrance to retrieval
  – also faked contents to force retrieval
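
Retrieval by fields depends on a page actually carrying such metatags. As a rough illustration, this sketch uses Python's standard `html.parser` to pull HTML `keywords`/`description` tags and Dublin Core elements out of a hypothetical page header. The page and its values are invented; the `DC.` prefix follows the common convention for embedding Dublin Core in HTML `<meta>` tags.

```python
from html.parser import HTMLParser

# Hypothetical page header using HTML metatags plus Dublin Core elements.
PAGE = """
<html><head>
  <meta name="keywords" content="invisible web, search engines">
  <meta name="description" content="Sources that search engines miss">
  <meta name="DC.title" content="The Invisible Web">
  <meta name="DC.creator" content="Tefko Saracevic">
</head><body>...</body></html>
"""


class MetaFields(HTMLParser):
    """Collects <meta name=... content=...> pairs into a field dictionary."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, not attribute values.
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                # Lower-case the field name so "DC.Title" and "dc.title" match.
                self.fields[name.lower()] = content


parser = MetaFields()
parser.feed(PAGE)
print(parser.fields["keywords"])  # invisible web, search engines
print(parser.fields["dc.title"])  # The Invisible Web
```

A page without such tags yields an empty dictionary, which is the situation for most of the sites counted on this slide.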

Sources & search engines
Indexed by search engines (publicly indexed)
  – by terms, selection, links, registration
Not publicly indexed
  – many domain sources will not be found, e.g., digital libraries, online journals, reference sources
  – many commercial sites will hardly be found
Differing approaches to inclusion/selection
  – mostly automatic; also generic source providers
  – increasingly, human evaluation & selection are added

Search engine coverage
No engine covers more than 16% of the WWW
Relative to the combined coverage of the top 11 engines:
  – Northern Light 38.3%; Snap 37.1%; AltaVista 37.1%; HotBot 27.1%; MS 20.3%; Infoseek 19.2%; Google 18.6%; Yahoo 17.6%; Excite 13.5%; Lycos 5.9%; EuroSeek 5.2%
  – HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases
Northern Light has a ‘special collection’: documents not part of the publicly indexable Web
Coverage is hard to discern & compare
Many national search engines, each with its own coverage
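
The percentages above are relative to the union of what all the engines index. Because the engines overlap, the figures add up to well over 100%, and because the union itself covers only part of the Web, each relative figure is much larger than the absolute one (at most 16%). A minimal sketch of that ratio, using three imaginary engines and made-up page sets; this is one simple reading of "combined coverage", not a reconstruction of how the cited figures were measured.

```python
# Hypothetical index contents for three imaginary engines (not real data).
indexes = {
    "EngineA": {"u1", "u2", "u3", "u4"},
    "EngineB": {"u3", "u4", "u5"},
    "EngineC": {"u1", "u6"},
}

# Combined coverage: every page indexed by at least one engine.
combined = set().union(*indexes.values())

# Each engine's index size as a percentage of the combined coverage.
relative = {
    name: 100 * len(pages) / len(combined)
    for name, pages in indexes.items()
}

for name, pct in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}% of combined coverage")
```

Here the three shares are 66.7%, 50.0% and 33.3%, summing to 150% because pages u1, u3 and u4 are counted by more than one engine.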

Meta search engines
Search engines that search other search engines; many around, e.g.:
  – All4one http://all4one.com/ : four windows, good for comparison
  – CNET Search.com http://www.search.com/ : a meta engine of meta engines, with customization
Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html
  – 174 countries, over 1300 engines
More are on the horizon, and they differ
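
A meta search engine has to merge ranked lists from several engines into one. Actual services use their own, generally undisclosed, merging strategies; the sketch below shows just one simple option, round-robin interleaving with duplicate removal, over invented result lists for three hypothetical engines.

```python
from itertools import zip_longest

# Hypothetical ranked result lists returned by three engines for one query.
results = {
    "EngineA": ["a.com", "b.com", "c.com"],
    "EngineB": ["b.com", "d.com"],
    "EngineC": ["e.com", "a.com", "f.com", "g.com"],
}


def merge_round_robin(ranked_lists):
    """Interleave ranked lists rank by rank, keeping the first copy of each URL."""
    merged, seen = [], set()
    # zip_longest pads shorter lists with None so every rank tier is visited.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(merge_round_robin(results.values()))
# ['a.com', 'b.com', 'e.com', 'd.com', 'c.com', 'f.com', 'g.com']
```

Round-robin treats every engine as equally reliable; a weighted scheme could instead favor engines known to cover a given domain better.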

Major source for the invisible Web
Book: Chris Sherman & Gary Price (2001). Invisible Web: Uncovering information sources search engines can’t see. Information Today.
Site: www.invisible-web.net

Specialized meta engines
Selective, with directories & large numbers of databases & search engines
  – Complete Planet http://completeplanet.com
  – Invisible Web http://invisibleweb.com
U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess
  – Federal Bulletin Board (file libraries for download from many agencies): http://fedbbs.access.gpo.gov

Reference (expert) services
Reference services: several models
  – Q&A, directories, email answers, etc., e.g.:
  – Martindale’s Reference Desk: comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html
  – Ask Jeeves!: most popular http://www.ask.com/
  – Ask ERIC: education questions, email answers http://www.askeric.org/Qa/
  – Information Please: almanac-type questions http://www.infoplease.com/
Academic libraries are developing reference models: a new service area

Libraries as Web sources
Academic libraries provide open collections & services; models vary
  – Rutgers libraries: a big long-term effort http://www.libraries.rutgers.edu/
  – for domain information & sources, go to: Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science
  – University of California, Berkeley: a most elaborate effort, together with Sun Microsystems http://sunsite.berkeley.edu/

Virtual libraries on the Web
Libraries existing only on the Web
  – more & more libraries & organizations involved
Examples of academic & public libraries
  – Virtual Library: Switzerland, US, UK & other countries; ‘oldest virtual library on the Web’ http://vlib.org
  – Toronto Public Library
  – Internet Public Library, Michigan http://www.ipl.org/

Domain sites
Many domain/issue-specific sites
  – rich & often unique coverage & services
  – different approaches & requirements
Examples in health-related domains:
  – Medscape: registration required http://www.medscape.com/
  – RxList: The Internet Drug Index http://www.rxlist.com/
  – Mayo Clinic HealthOasis http://www.mayohealth.org/

Societies, organizations, publishers
A great many rich sources for searching
  – differences in requirements, depth, richness
Examples from a variety of organizations:
  – Association for Computing Machinery http://www.acm.org/ : Digital Library; subscription or registration
  – State Department http://www.state.gov/ : about the U.S. & other countries
  – R.R. Bowker http://www.bowker.com/ : Free Resources from Bowker; Library Resource Guide
  – Genealogy: http://www.familysearch.org/

Language barriers on the Web
English is still the major language, but declining: now slightly over 50%
Multilingual retrieval search engines
  – Euroseek: searches 40 languages http://www.euroseek.com/
  – All the Web: 45 languages http://www.alltheweb.com/
  – in both, a search in a given language covers primarily sources in that language

Language barriers: translations
A number of translation sites
  – machine aided, i.e., plug in terms, phrases or sentences in one language & review the result in the other; but effectiveness is questionable
  – Free Translations http://www.freetranslations.com
  – Babel Fish http://babelfish.altavista.com/tr
  – Travlang: great for travelers; phrases http://www.travlang.com

News sources about the Web, visible & invisible
  – The Virtual Acquisition Shelf & News Desk http://resourceshelf.blogspot.com/
  – Free Pint http://www.freepint.com/
  – ResearchBuzz http://www.researchbuzz.com/index.shtml
  – Internet Resources Newsletter http://www.hw.ac.uk/libwww/irn/
  – Search Engine Watch http://www.searchenginewatch.com/

Sample of great sources for the invisible Web
  – Direct Search http://gwis2.circ.gwu.edu/~gprice/direct.htm
  – eLibrary http://ask.elibrary.com/
  – The Scout Report http://scout.cs.wisc.edu/
  – Museum of Online Museums http://www.coudal.com/archives/museum.html
  – Librarians’ Index to the Internet http://www.lii.org/
  – Profusion http://www.profusion.com/
  – Research Index http://www.researchindex.com/
  – Cybercafe Search Engine http://www.cybercaptive.com

Needed for Web searching in general
Knowledge & competencies
  – variety of Web sources
  – their organization
  – search engines
  – Web search strategies
  – search dynamics, feedback
Keeping up & up & up
  – constant updates, changes, innovations
  – many domain/subject specific

Needed for Web searching by professionals
Knowledge of SOURCES in the area of interest
  – search engines are not enough: of limited help in finding these other sources, whose structure is hard to discern
Evaluation of sources: a key professional skill!
  – standard criteria: quality, veracity, coverage, etc.
  – plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability

Competencies (continued)
Knowledge of users & use
Knowledge of searching
Use of technology
Adaptability, flexibility
Integration with other resources
Teaching others
Constant learning & updating
