© Tefko Saracevic, Rutgers University. The Invisible Web - finding things that are hard to find - Tefko Saracevic, PhD, Rutgers University


2 The Invisible Web - finding things that are hard to find
Tefko Saracevic, PhD, Rutgers University
http://www.scils.rutgers.edu/~tefko (also contains a list of sites relevant to the topic and this presentation)

3 What is the “Invisible Web”?
- Materials that general search engines cannot or WILL not include in their collections of web pages (indexes)
- Materials you cannot find through general search engines
- Contains a vast amount of information
  – much of it authoritative, qualitative
  – much of it specialized

4 Why search engines miss
- Size: the Web is huge; engines cannot cover it all
- Economics: associated costs are high
  – also pay per crawl & rank
- Technical: capabilities are still limited
- Spam: eliminating the bad also loses the good
- Restrictions: some sites do not let crawlers in
- Deep structure: some sites are complex
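One common way a site keeps crawlers out is a robots.txt file. The slide does not name a mechanism, so treat this as an illustrative assumption. The sketch below uses Python's standard urllib.robotparser; the rules, the crawler name "ExampleBot" and the example.org URLs are all made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A well-behaved crawler skips the restricted page, so it never reaches an index.
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/private/report.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/public/index.html"))
```

Pages excluded this way stay on the Web but remain invisible to any engine that honors the rules.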

5 Web size - who knows?
- Web Characterization Project - OCLC
  – provides statistics about the web
  – 1998: 2.8 million; 2002: 9.04 million web sites (IP addresses)
  – in 2002: 35% public, 29% private, 36% provisional sites
  – public sites (2002): 55% US, 7% German, 6% Japanese, 3% each French, Spanish, 2% each Italian, Dutch, Chinese, 1% each Korean, Russian, Polish, Portuguese
  – adult sites (2002): 3.3%
  – IP address volatility - all sites (disappearance pattern): 13% of sites in 2002 were also in 1998; 51% in 2001
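Taking the OCLC figures at face value, a little arithmetic puts them in perspective: the number of sites grew roughly 3.2-fold between 1998 and 2002, which works out to about 34% a year if growth was steady (an assumption; the slide gives only the two endpoints), and 35% of the 2002 total is about 3.2 million public sites.

```python
# Figures quoted on the slide (OCLC Web Characterization Project).
sites_1998 = 2.8e6
sites_2002 = 9.04e6
public_share_2002 = 0.35

years = 2002 - 1998
growth_factor = sites_2002 / sites_1998

# Compound annual growth rate, assuming steady growth between the two endpoints.
annual_growth = growth_factor ** (1 / years) - 1

public_sites_2002 = public_share_2002 * sites_2002

print(f"growth factor 1998-2002: {growth_factor:.2f}")
print(f"implied annual growth:   {annual_growth:.1%}")
print(f"public sites in 2002:    {public_sites_2002:,.0f}")
```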

6 How search engines work
- Crawlers, spiders: go out to find
  – new & changed sites; periodic, not for each query
- Databases, caches:
  – gather content; could be submitted, bought
- Indexing: creating appropriate entries
  – various, mostly proprietary algorithms
- Retrieval engine: searching on the basis of a query
- Interface: gathers query, displays results
  – could be ordered by pay
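The indexing and retrieval steps can be sketched with a toy inverted index. This is a minimal illustration, not how any particular engine works (the slide notes that real algorithms are mostly proprietary). The page ids, texts and the functions build_index and search are invented for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Indexing: map each lowercased word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Retrieval: return ids of pages that contain every query word (boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Stand-in for content a crawler has already gathered.
pages = {
    "a": "invisible web sources",
    "b": "web search engines",
    "c": "library catalogs",
}
index = build_index(pages)
print(sorted(search(index, "web")))
print(sorted(search(index, "web search")))
```

The point for the Invisible Web is that a query only ever consults the index: a page the crawler never gathered cannot be retrieved, however relevant it is.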

7 Search engines differ
- Substantial differences among search engines on each aspect
- Information about search engines:
  - Search Engine Watch
    – ratings, news, statistics, charts
  - Search Engine Showdown
    – run by a librarian, news links, ratings
  - Extreme Searcher
    – update of a popular book

8 Search engine coverage
- No engine covers more than 16% of the WWW
- Hard to discern & compare coverage
- Many national search engines - own coverage
- Many topical search engines - own coverage
- Many comprehensive sources independent of search engines

9 Specialized sources
- Meta search engines
- Specialized engines & catalogs
- Domain (subject) engines & catalogs
- Reference sources
- Libraries as web sources
- Virtual libraries
- Subject databases
- Societies, organizations

10 Meta search engines
Search engines that cover search engines
- Search Engine Colossus
  – international meta engine
- Dogpile
  – results from a number of search engines
- Surfwax
  – gives statistics and text sources
- Search Engine Guide
  – categorized by topic; other engine information
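How a meta engine might combine results from several engines can be sketched as follows. This is an assumed, simplified scheme (reciprocal-rank scoring with de-duplication), not the method of Dogpile or any other engine named on the slide; the engine lists and example.org URLs are invented.

```python
from collections import defaultdict

def merge_results(ranked_lists):
    """Merge ranked URL lists from several engines into one ranking.

    Each URL earns 1/rank from every engine that returned it, so pages
    ranked highly by several engines rise to the top. Ties break by URL.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two underlying engines.
engine_a = ["http://example.org/x", "http://example.org/y"]
engine_b = ["http://example.org/y", "http://example.org/z"]

merged = merge_results([engine_a, engine_b])
for url in merged:
    print(url)
```

A meta engine can only surface what the underlying engines already index, so it widens coverage without reaching material that none of them crawl.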

11 Meta engines … (cont.)
- Vivisimo
  – clusters results; innovative
- Complete Planet
  – over 100,000 databases & search engines
- Webbrain
  – results in tree structure – fun to use

12 Domain engines & catalogs
Cover general & specific areas
- Open Directory Project
  – large edited catalog of the web – global, run by volunteers
- BUBL LINK
  – selected Internet resources covering all academic subject areas – UK
- Profusion
  – search in categories

13 Domain engines …
Exist in many domains & subjects – rich!
- Psychcrawler – American Psychological Association
  – web index for psychology
- Entrez PubMed – National Library of Medicine
- CiteSeer – NEC Research Center
  – scientific literature, citations index - free
- Think Quest – an international organization
  – education resources, programs

14 Domain engines …
- KIRKE - Katalog der Internetressourcen für die Klassische Philologie aus Erlangen (catalog of Internet resources for classical philology, from Erlangen)
  – a variety of resources
- Perseus Digital Library – Tufts University
  – covers antiquity to the Renaissance
- School of Slavonic & East European Studies, University College London
  – includes country resources, e.g. Croatia
- U Mich Documents Center
  – official documents from all over the world

15 Reference services
Reference services - several models
  – Q&A, directories, email answers etc.
- Ask Jeeves!
  – most popular, commercial
- Information Please
  – almanac type questions

16 Reference …
Digital reference - new service area for libraries
- QuestionPoint – Library of Congress & OCLC
  – project for a global reference network
- Virtual Reference Desk – Library of Congress
  – compilation of web reference sites
- LiveRef – maintained at Iowa State U
  – a registry of real time digital reference services

17 Libraries as web sources
Academic libraries providing open collections & services; models vary
- Rutgers libraries
  – big long term effort
- University of California, Berkeley
  – a most elaborate effort together with Sun Corporation
- Bibliothèque Nationale de France
  – includes virtual exhibitions, among others

18 Virtual libraries on the Web
Libraries emerging only on the Web
- Virtual Library
  – Switzerland, US, UK & other countries – ‘oldest virtual library on the Web’
- Internet Public Library – Michigan
  – also a long term effort
- Librarians Index to the Internet
  – very popular and comprehensive

19 Virtual libraries …
- Academic Info Digital Library
  – many links to digital collections & resources in various subjects
- Gabriel
  – Gateway to European National Libraries
- Museum of online museums
  – a delight

20 Subject databases
Many subject specific sites
  – rich & often unique coverage & services
  – different approaches & requirements
Examples in health related domains:
- WebMDHealth – news, medical information
- Rxlist – The Internet Drug Index
- Mayo Clinic HealthOasis – health advice

21 Societies, organizations
A great many rich sources for searching
  – differences in requirements, depth, richness
Examples from a variety of organizations:
- Assoc. for Computing Machinery
  – Digital Library; subscription or registration
- US State Department
  – about the U.S. & other countries
- Genealogy – Church of Jesus Christ of Latter-day Saints
  – most comprehensive historical list of records

22 Language barriers on the Web
- English still the major language
  – but declining, now slightly over 50%
- Multilingual retrieval search engines
  - Euroseek
    – searches in a number of languages
  - All the Web
    – results in 45 languages

23 Language barriers: translations
A number of translation sites
  – machine aided – i.e. plug in terms, phrases, sentences in one language & review in the other, but effectiveness???
- Free Translations
  – from and to English, & 8 other languages
- Babel Fish
  – from and to English and 9 languages, translates URLs
- Travlang
  – great for travelers, but annoying commercials

24 Web news; keeping up
What is going on on the Web? Some major sources of news and evaluations:
- Free Pint – newsletter, articles, links
- Internet Resources Newsletter – UK based
- ResearchBuzz – daily updates; many aspects
- About.com Web Search – tools, Web Search Forum
- Resource Shelf – newsletter with archive

25 Keeping up …
Information Today
  – trade & professional monthly newspaper & web site
  – industry news
  – searcher columns
  – general analyses of trends

26 Evaluations, ratings
Many sources evaluate web sites:
- The Scout Report
  – librarians’ BIBLE! Annotations. Comprehensive.
- Medical Library Assoc. – ten most useful sites
- MLA user guide
  – for health information, recommendations
- Web 100 – commercial, user ratings, news
- Evaluating web pages – UC Berkeley
  – tutorial and guide

27 Archiving the web
- Internet Archive – a large undertaking
  – includes web archive & lots more, publicly available & free
  – 10 billion web pages archived from 1996 to a few months ago
  – Wayback Machine – search to look at old versions of web pages
- But there is more, e.g.:
  – Million Book Project
  – International Children’s Digital Library

28 Needed for Web searching
- Knowledge & competencies on
  – variety of web sources & their organization
  – search engines
  – web search strategies
  – search dynamics, feedback
- Keeping up & up & up
  – constant updates, changes, innovations
  – many domain/subject specific

29 Needed for Web searching by professionals
- Knowledge of SOURCES in area of interest
  – search engines not enough
  – not too helpful in finding these other sources; structure hard to discern
- Evaluation of sources – a key professional skill!
  – standard criteria & Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability

30 Needed competencies …
- Knowledge of users & use
- Knowledge of searching
- Use of technology
- Adaptability, flexibility
- Integration with other resources
- Teaching others
- Constant learning & update
  – keeping up, keeping up, keeping up

31 But now really: How to do it? (diagram labels: information, WWW)


34 P.S. a few weird sites…
- SelectSmart.com – all kinds of quizzes for you
- James Dean official web site
- Deaducated – Dead Librarians’ Society
- Livejournal – blogs & authoring tools

35 Sources
- About.com Web Search http://websearch.about.com
- Academic Info Digital Library http://www.academicinfo.net/digital.html
- All the Web http://www.alltheweb.com/
- Ask Jeeves! http://www.ask.com/
- Assoc. for Computing Machinery http://www.acm.org/
- Babelfish http://babelfish.altavista.com/tr
- Bibliothèque Nationale de France http://www.bnf.fr/
- BUBL LINK http://bubl.ac.uk/link/
- CDNET Search.com http://www.search.com/
- CiteSeer http://citeseer.nj.nec.com/
- CompletePlanet http://completeplanet.com
- Deaducated http://www.geocities.com/deadlibrarians/
- Dogpile http://www.dogpile.com/
- Entrez PubMed http://www.ncbi.nlm.nih.gov/PubMed/
- Extreme Searcher http://www.extremesearcher.com/
- Free Pint http://www.freepint.com/

36 Sources …
- Free Translations http://www.freetranslations.com
- Gabriel http://www.kb.nl/gabriel/
- Genealogy http://www.familysearch.org/
- Information Please http://www.infoplease.com//
- International Children’s Digital Library http://www.icdlbooks.org/
- Internet Archive http://www.archive.org/
- Internet Public Library, Michigan http://www.ipl.org/
- Internet Resources Newsletter http://www.hw.ac.uk/libwww/irn/
- James Dean http://www.jamesdean.com/
- KIRKE http://www.phil.uni-erlangen.de/~p2latein/ressourc/ressourc.html
- Librarians Index to the Internet http://lii.org/
- Live Journal http://www.livejournal.com/
- LiveRef http://www.public.iastate.edu/~CYBERSTACKS/LiveRef.htm
- Mayo Clinic http://www.mayohealth.org/

37 Sources …
- Medical Library Assoc. ten top sites http://www.mlanet.org/resources/medspeak/topten.html
- Medical Library Assoc. user guide for health inf. http://www.mlanet.org/resources/userguide.html
- Medscape http://www.medscape.com/
- Million Book Project http://www.archive.org/texts/collection.php?collection=millionbooks
- Museum of online museums http://www.coudal.com/moom.php
- OCLC Web Characterization Project http://wcp.oclc.org/
- Open Directory Project http://dmoz.org
- Perseus Digital Library http://www.perseus.tufts.edu/
- Profusion http://www.profusion.com/
- Psychcrawler http://www.psychcrawler.com/
- QuestionPoint http://www.questionpoint.org/
- ResearchBuzz http://www.researchbuzz.com/index.shtml
- Resource Shelf http://resourceshelf.blogspot.com/
- Rutgers Libraries http://www.libraries.rutgers.edu/
- RxList http://www.rxlist.com/

38 Sources …
- Sch of East Eur & Slavonic Studies http://www.ssees.ac.uk/dirctory.htm
- Search Engine Colossus http://www.searchenginecolossus.com/
- Search Engine Guide http://www.searchengineguide.com/
- Search Engine Showdown http://searchengineshowdown.com/
- Search Engine Watch http://searchenginewatch.com/
- Select Smart.com http://www.selectsmart.com/home.html
- Surfwax http://www.surfwax.com/
- The Scout Report http://scout.cs.wisc.edu/
- Think Quest http://www.thinkquest.org/
- Travlang http://www.travlang.com
- U California Berkeley http://sunsite.berkeley.edu/
- U Mich Documents Center http://www.lib.umich.edu/govdocs/
- US State Department http://www.state.gov/
- Virtual Library http://vlib.org
- Virtual Reference Desk http://www.loc.gov/rr/askalib/virtualref.html
- Vivisimo http://vivisimo.com
- Web 100 http://www.web100.com
- Webbrain http://www.webbrain.com/html/default_win.html
- WebMD http://my.webmd.com/webmd_today/home/default

