2
Principles of Searching
Web searching & the invisible Web
Finding things that are hard to find
Tefko Saracevic
tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/
3
Central ideas
  The Web has a great many information resources for searching, some of them indispensable
    many of these are hard to find, forming the “invisible Web”
  A variety of Web resources have to be accessed directly & not through common search engines, as explored here
4
ToC
  1. Invisible Web
  2. Invisible Web searching
  3. General sources
  4. Domain sources
  5. Reference sources and services
  6. Digital libraries
5
1. Invisible Web
Definition, characteristics, reasons
6
A few definitions
  World Wide Web: Internet-connected files
    the very large set of linked documents and other files located on computers connected through the Internet and used to access, manipulate, and download data and programs
  Invisible (Encarta)
    hidden from view; not readily noticed or detected
  Invisible Web (also deep Web, hidden Web, as opposed to visible Web or surface Web)
    "Invisible Web" is the term used to describe all the information available on the World Wide Web that is not found by using general-purpose search engines.
7
What is the “invisible Web”?
  Materials that general search engines cannot or will not include in their collection of web pages
    you cannot find them through general search engines
  Contains a vast amount of information resources
    much of it authoritative & of higher quality than the visible Web
      quality becomes a main issue
    much of it specialized
    a lot of it also fluid, streaming or real time
      “You can’t step in the same river twice”
    much of it free
  Many times larger than the visible Web
8
Size & characteristics of the Invisible Web (a lot from the CUNY library)
  Even the best search engines can access only about 16% of the available information on the WWW
    therefore 84% of the information is excluded = Invisible Web
  By another estimate, the Invisible Web is 500 times larger than the Surface Web
  95% of the Invisible Web is publicly accessible information
  More than half of the Invisible Web resides in topic-specific databases
  Thus, a lot of professional Web searching concentrates on the invisible Web. So will we in this lecture.
9
in other words…
  “Infobesity” refers to the belief that searching Google for information provides a junk information diet, not concerned about quality
    coined by James Morris, the Dean of the School of Computer Science at Carnegie Mellon University
    Brophy & Bawden (2005)
  There is much more to the Web than … or … (images not preserved)
10
Why do search engines not cover everything?
  Size: the Web is huge, cannot cover all
  Economics: associated costs are high
    engines support themselves mostly by ads
    some engines charge for rank & for crawl updates, providing paid listings first & mostly
  Technical: still a challenge & limited capabilities
    also some file formats are hard to cover
  Spam: eliminating the bad also loses the good
  Restrictions: some sites do not let crawlers in
    e.g. login required
  Deep structure: some sites are complex
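One common mechanism behind the "Restrictions" point is the Robots Exclusion Protocol (a site's robots.txt file). The slide does not name it, so treat it here as an illustrative assumption. The sketch below uses Python's standard `urllib.robotparser` to check which paths a well-behaved crawler would skip. The robots.txt content, the `example.org` host, the `ExampleBot` agent name, and the `allowed_paths` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""


def allowed_paths(robots_txt, base_url, paths, agent="ExampleBot"):
    """Return {path: True/False} telling whether `agent` may crawl each path."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, base_url + path) for path in paths}


if __name__ == "__main__":
    result = allowed_paths(
        SAMPLE_ROBOTS_TXT,
        "https://example.org",
        ["/index.html", "/members/report.pdf", "/search?q=tides"],
    )
    for path, ok in result.items():
        print(path, "crawlable" if ok else "excluded")
```

Pages behind the excluded paths (a members area, or results generated by a site's own search form) are the kind of material that ends up in the invisible Web even though a person visiting the site can often reach it directly.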
11
Not found …
  From: Characteristics of Invisible Web content
12
Search engine coverage
  Hard (impossible) to discern & compare coverage
  Many national search engines have their own coverage, orientation, governance
  Many topical or domain search engines have their own coverage geared to the subject of interest
  But: there are many comprehensive sources independent of search engines
    some are search engines in specialized domains
    others are compilations of evaluated web sources
13
Search engines differ
  Substantial differences among search engines in coverage
    hard to impossible to discern
  Substantial differences in how they work
    relatively easy to find out, e.g. from Search Engine Showdown
    Search Engine Showdown: “The users’ guide to web searching” - run by a librarian; news links, ratings
14
Be aware: search engines are not only about search
  Yes, search is (still) their core, but they are in many other businesses built upon search
    & these affect the what & how of searching for us
  They are corporations, commercial entities
    have to make money, mostly by ads & placements
    but provide many other services
      selling, licensing software
      email, messenger
      add-on utilities – desktop search functions, toolbars …
  Most of the additional stuff is provided free, but there is no such thing as a free lunch
    it is about how search engines can get us to continue to use their service
15
2. Invisible web searching
A few pointers
16
Invisible web searching: basic approach
  User
    The first step in determining the best approach for searching the invisible web is to have a clear idea of what you’re seeking
    extensive user modeling
  Resources
    Limit your search to appropriate resources & tools for the particular type of information you’re looking for
    know your sources
    know how to find appropriate sources
      shades of “Knowledge is of two kinds…”
17
Advanced searching on the Web
  Applies to searching of the invisible Web as well
  Needs to be adapted to differences
    coverage not specified; vastly different from one source or engine to another
    no controlled vocabulary
    output ranked by unknown methods & criteria for “relevance”
    building blocks may be indicated by “similar pages” or “more from this site” or some such
    some provide clusters to narrow searches
    features, capabilities, specifics differ
18
Specialized sources - particularly for the invisible web
  Large scholarly search engines & directories
  Domain sources, databases
  Reference sources
  Libraries as web sources
  Virtual libraries
19
A few tips for Web searching, including the invisible kind
  Advanced web searc – Univ. of California, Berkeley
  Four NETS for better searching – Bernie Dodge
  Web search tutorial – Searchenginez
  Finding information: search engines – Phil Bradley
  Google Guide – Nancy Blachman
20
3. General sources
A selection of a few (of a great many) sources for the invisible web
21
Characteristics
  Many oriented toward scholarly, research & professional, technical & related information
    include sources mostly not covered by general search engines
    majority of these are trustworthy
    quality much higher; some carefully selected, some edited
    origins vary widely, from commercial to voluntary to government sponsored
  Popular in many disciplines
22
Large scholarly search engines & directories - sample
  Infomine - a comprehensive virtual library and reference tool for academic and scholarly Internet resources, including Web sites, databases
    covers a wide range of scholarly resources by fields
  Scirus – “it allows researchers to search for not only journal content but also scientists' homepages, courseware, pre-print server material, patents and institutional repository and website information.”
    by Elsevier, run in conjunction with Scopus and Science Direct, but this one is free
  Google Scholar – “Stand on the shoulders of giants” (but Newton and John of Salisbury said it better)
    searches for scholarly articles & resources, but sources not disclosed (no idea what it covers)
23
Large edited sites
  Open Directory Project
    large edited catalog of the web – global, run by volunteers
  BUBL LINK
    selected Internet resources covering all academic subject areas; organized by Dewey Decimal System – from UK
24
Science, scholarship engines, not free – a sample
  In addition to freely accessible engines, many provide search free but access to full text paid by subscription or per item
    RUL provides access to these & many more
  General
    ScienceDirect – Elsevier: “world's largest electronic collection of science, technology and medicine full text and bibliographic information” [available at RUL]
  In a specific domain
    ACM Portal – Association for Computing Machinery: access to ACM Digital Library & Guide to Computing [available at RUL]
25
4. Domain sources
Hardly a field without it
26
Domain engines
  Cover specific subjects & topics
    from sciences, arts, humanities, to various media & interests – you name it
  Important tool for subject searches
    particularly for subject specialists
    valued by professional searchers
  Selection mostly hand-picked rather than by crawlers, following inclusion criteria
    often not readily discernible
    but content more trustworthy
  Usually well organized
27
in health & related fields …
  PubMed – National Library of Medicine
    biomedical literature from MEDLINE & health journals
  Psychcrawler - Amer. Psychological Association
    web index for psychology
  WebMDHealth
    news, medical information
  Rxlist
    The Internet Drug Index
  Mayo Clinic HealthOasis
    health advice
  Kidshealth
    sites for parents, kids, teens
28
in science …
  Ocean Planet
    NASA presentation of earth & its vast oceans
  ArXiv – Cornell U, National Science Foundation
    e-print service in the fields of physics, mathematics, computer science, and quantitative biology
    large; non-reviewed contributions by authors, comments later
  Athena Earth Sciences Resources
    not a search engine but a large, well organized directory
29
in education …
  Intute
    “Intute is a free online service providing you with a database of hand selected Web resources for education and research.”
  Think Quest – Oracle Education Foundation
    education resources, programs; web sites created by students
  Resource Discovery Network – UK
    “UK's free national gateway to Internet resources for the learning, teaching and research community”
30
in images, movies, video …
  Internet Movie Database
    treasure trove of movies
  Picsearch
    picture searching
  Blinkx
    claims to be the world's largest search engine for videos; it has indexed over 32 million hours worth of video footage, made searchable by automatically transcribing the speech content
  Moving Images Collections
    “MIC documents moving image collections around the world.” Part particularly oriented toward science educators. Now at Library of Congress, but developed at Rutgers.
31
in humanities …
  Shakespeare & Internet Search Tools & Resources
    great fun to navigate
  KIRKE - Katalog der Internetressourcen für die Klassische Philologie aus Erlangen
    German; a variety of resources for classics
  Perseus Digital Library – Tufts University
    covers antiquity to renaissance; one of the best subject sites on the web; affected the whole field
  School of Slavonic & East European Studies, University College London
    includes country resources, e.g. Croatia
  Diotima
    Materials for study of women and gender in the Ancient World
32
in music …
  Musipedia
    Not everything is text. This is “a searchable, editable, and expandable collection of tunes, melodies, and musical themes.” Great fun!
  All Music Guide
    resource about musicians, albums, and songs
33
governments …
  U Mich Document Center
    official documents from all over the world
  US government official web portal
    “Whatever you want or need from the U.S. government”
  US State Department
    about the U.S. & other countries
  FirstGov
    the US government official web portal
34
Evaluations, ratings
  Evaluating web sites: a prime responsibility of searchers & all information professionals
  Many sources evaluate web sites:
    The Scout Report – librarians’ BIBLE! Annotations. Comprehensive.
    Medical Library Association – ten most useful sites for consumer health
    MLA user guide – for finding & evaluating health information on the web
    Web 100 – commercial, user ranking & evaluation of web sites
    Evaluating web pages – UC Berkeley tutorial and guide
35
And, of course …
  Snoopy – The Official Peanuts Website
    also a domain resource
36
5. References
a few sources & services
37
Reference trends
  Transactions
    Live reference transactions in libraries falling off dramatically
    But new reference modes emerging
      chat, ask a librarian
      cooperative reference among groups of libraries
    Commercial reference growing strong
  Tools
    Most, if not all, reference tools have migrated to digital
      general & in many domains
    Many offer free online access
    Others licensed to libraries
    End-user oriented, but still an important source for searchers
38
Reference tools – open access
  Wikipedia
    web encyclopedia in many languages; user generated; very popular; but uneven & entries at times manipulated
  Stanford Encyclopedia of Philosophy
    a comprehensive encyclopedia; authoritative - maintained and kept up to date by experts in the field
  Bartleby.com – “Great books online”
    dozens of reference books; Harvard Classics; English usage; and more. Amazing, invaluable collection
39
At RUL
  100s of digital reference sources in all domains
40
Reference services
  Reference services - several models
    Commercial – relatively new & successful
  Ask (originally known as Ask Jeeves)
    most popular, commercial
  Information Please
    almanac type questions
  RefDesk
    access to a number of reference tools
  ChaCha - new service
    direct answers to questions routed to any device. “ChaCha’s advanced technology instantly routes it to the most knowledgeable person on that topic in our Guide community”
41
Cooperative reference & real time reference
  Digital reference - new service area for libraries
  QuestionPoint – L of Congress & OCLC
    project for a global 24/7 reference network
  Virtual Reference Desk – L of Congress
    large compilation of web reference sites
  LiveRef - maintained at Iowa State U
    a registry of real time digital reference services
42
6. Libraries & other institutions as web sources
Digital libraries, virtual libraries, museums, good old books
43
Libraries as web sources
  Academic, national libraries providing open collections & services; models vary
  Rutgers libraries
    big long term effort
  University of California, Berkeley
    a most elaborate effort, together with Sun Corporation
  LibWeb – by Webjunction, formerly at U California, Berkeley
    “lists currently over 7900 Web pages from libraries in 146 countries”
  Bibliothèque Nationale de France
    includes virtual exhibitions, among others
44
Virtual libraries
  Libraries “living” only on the Web
  Virtual Library – Switzerland, US, UK & other countries
    ‘oldest virtual library on the Web’
  Internet Public Library – Drexel, formerly at U of Michigan
    “the first public library of and for the Internet community”
  Librarians Index of the Internet – Drexel
    very popular and comprehensive directory
  Digital librarian – maintained by Margaret Vail Anderson, a librarian in Cortland, New York
    “a librarian's choice of the best of the Web” – large directory
45
Museums, societies…
  Growing number of resources in museums & a variety of societies – rich resource for searching
  Museum of online museums
    a delight
  MuseumStuff.com
    “We have 1000's of museums, zoos, historical societies and related organizations in our database”
  The State Hermitage Museum
    one of the greatest museums in the world, and one of the best museum sites – developed with IBM help
  National Museum of Science and Technology Leonardo da Vinci
    Guess where those pictures came from? A delight!
46
Archiving, books on the web
  Internet Archive – a large undertaking
    includes web archive & lots more
    publicly available & free
    10 billion web pages archived from 1996 to a few months ago
  Wayback Machine
    search to look at old versions of web pages
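Besides the search box, old versions of a page can usually be reached by address. The Wayback Machine accepts URLs of the form `https://web.archive.org/web/<timestamp>/<page address>` and redirects to the snapshot closest to that timestamp. A minimal sketch, assuming that address pattern; the `wayback_url` helper name and the example date are hypothetical, and no request is actually sent:

```python
from datetime import datetime
from urllib.parse import urlsplit

# Address prefix used by the Wayback Machine for archived snapshots.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, when):
    """Build a Wayback Machine address for `page_url` near the moment `when`."""
    # Add a scheme if the address was typed without one.
    if not urlsplit(page_url).scheme:
        page_url = "http://" + page_url
    # Snapshots are addressed by a 14-digit YYYYMMDDhhmmss timestamp.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Example: the author's page as it may have looked at the start of 2005.
    print(wayback_url("http://comminfo.rutgers.edu/~tefko/", datetime(2005, 1, 1)))
```

Whether a snapshot exists near the requested date depends on what the Internet Archive actually crawled; the sketch only constructs the address.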
47
Digital books on the Web - a sample of large projects
  Books on the web – searchable
  Million Book Project
    digitizing books and providing free access
  International Children’s Digital Library
    online children's books
  Digital books Index
    “"Meta-index" for most major eBook sites, along with thousands of smaller specialized sites.”
  Google Book Search
    large digitization effort; many large libraries cooperate; agreement reached with publishers; connected with Worldcat
48
Needed for Web searching
  Knowledge & competencies on
    variety of web sources & their organization
    search engines
    web search strategies
    search dynamics, feedback
  Keeping up & up & up
    Why? many reasons, such as:
      constant updates, changes, innovations
      many domain/subject specific
      fluidity very high
49
Needed for web searching by professionals
  Knowledge of SOURCES in area of interest
    search engines not enough
      not too helpful in finding these other sources; structure hard to discern
    find & use specialized sources
  Evaluation of sources
    a key professional skill!
    application of standard criteria & web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability
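The evaluation criteria listed above can be kept as a simple checklist while reviewing a source. The sketch below is only one possible way to record such a review: the criterion names come from the slide, while the 0 to 2 rating scale, the `SourceEvaluation` class, and its methods are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Criteria named on the slide.
CRITERIA = (
    "authority", "accuracy", "currency", "objectivity",
    "coverage", "persistence", "usability",
)


@dataclass
class SourceEvaluation:
    """Checklist for one web source; the 0-2 scale is a hypothetical choice."""
    name: str
    ratings: dict = field(default_factory=dict)  # criterion -> score

    def rate(self, criterion, score):
        # Reject criteria that are not on the list and scores outside the scale.
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if score not in (0, 1, 2):
            raise ValueError("score must be 0 (poor), 1 (fair) or 2 (good)")
        self.ratings[criterion] = score

    def unrated(self):
        """Criteria still to be checked, in the slide's order."""
        return [c for c in CRITERIA if c not in self.ratings]

    def total(self):
        """Sum of the ratings given so far."""
        return sum(self.ratings.values())


if __name__ == "__main__":
    review = SourceEvaluation("Hypothetical subject gateway")
    review.rate("authority", 2)
    review.rate("currency", 1)
    print(review.total(), review.unrated())
```

A numeric total is a convenience for comparing notes, not a substitute for professional judgment about a source.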
50
Needed competencies …
  Knowledge of users & use
  Knowledge of searching
  Use of technology
  Adaptability, flexibility
  Integration with other resources
  Teaching others
  Constant learning & update
    again: keeping up, keeping up, keeping up
    and again: keeping up, keeping up, keeping up
51
P.S. a few weird sites…
  SelectSmart.com
    all kinds of quizzes for you
  James Dean
    official web site
  Deaducated
    Dead Librarians’ Society
  Livejournal
    blogs & authoring tools; and many pathetic entries
52
But now really: How to do it?
  (diagram labels: information, WWW)
54
Images from the invisible web
55
and of course…