Tefko Saracevic 1 search engines digital libraries

Central ideas
Search engines
- while the structure & basic operation of search engines are similar, a great number & variety exist beyond Google
  - with their own features
  - many of them in specialized domains
Digital libraries
- they have rich & varied resources of use in
  - accessing & searching a variety of databases & reference tools in many domains
  - accessing journals for delivery of full texts in all fields
As a searcher you are also using these resources: knowing searching = also knowing these resources

ToC
1. Search engines
2. Digital libraries

1. Search engines
Definitions. How they work. Diversity

dictionary definitions
- search: COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information
- engine: something that supplies the driving force or energy to a movement, system, or trend
- search engine: a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet

about definition of search engines
oh well …
- search engines do not search only for keywords; some search for other stuff as well
- and they are really not “engines” in the classical sense
  - but then a mouse is not a “mouse” either

use of search engines … among others

How Search Engines Work (Sherman 2003)
[Diagram labels: The Web (URL1, URL2, URL3, URL4), Crawler, Indexer, Search Engine Database, Your Browser. Sample query “Eggs?” with ranked results: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%. Sample page: “All About Eggs” by S. I. Am]

how do search engines work? elaboration
crawlers, spiders: go out to find content
- in various ways go through the web looking for new & changed sites
- periodic, not for each query
  - no search engine works in real time
- some search engines do it for themselves, others do not
  - they buy content from other companies
- for a number of reasons crawlers do not cover all of the web – just a fraction
  - what is not covered is the “invisible web”

elaboration …
organizing content: labeling, arranging
- indexing for searching – automatic
  - keywords and other fields
  - arranging by URL popularity - PageRank, as in Google
- classifying as a directory
  - mostly human handpicked & classified
as a result of different organization we have basically several kinds of search engines:
- search – input is a query that is then searched & the results displayed
- directory – classified content – a class is displayed
- fused: directories now also have search capabilities & vice versa

elaboration (cont.)
databases, caches: storing content
- humongous files usually distributed over many computers
query processor: searching, retrieval, display
- takes your query as input
  - engines have differing rules for how it is handled
- displays ranked output
  - some engines also cluster output and provide visualization
at the other end is your browser
- in addition to Explorer a number of others exist
  - Mozilla Firefox for instance – became quite popular
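The parts described above (crawled content, automatic indexing, a query processor that displays ranked output) can be illustrated with a toy sketch. This is only an illustration under stated assumptions: the page contents, the names `PAGES`, `tokenize`, `build_index` and `search`, and the scoring by raw term counts are all made up for the example. Real engines use proprietary and far more elaborate methods.

```python
import re
from collections import Counter, defaultdict

# Hypothetical snapshot a crawler might have fetched (URLs and text are made up).
PAGES = {
    "url1": "All about eggs: eggs for breakfast",
    "url2": "Eggo waffles",
    "url3": "The ego and the id",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Indexer: map each term to the pages containing it, with term counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index

def search(index, query):
    """Query processor: score pages by summed term counts, best first."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

INDEX = build_index(PAGES)  # built once, ahead of any query
print(search(INDEX, "eggs"))
```

Note that the index is built before any query arrives, which mirrors the point that crawling & indexing are periodic, not done for each query. The sketch matches terms exactly, so a query for "eggs" does not retrieve the "Eggo" or "ego" pages.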

elaboration … similarities, differences
all search engines have these basic parts in common
BUT the actual processes – the methods by which they do it – are based on various algorithms & they differ
- most are proprietary with details kept secret, but based on well known principles from information retrieval or classification
- to some extent Google is an exception – they published their original method, but not later developments

case of Google
developed by Sergey Brin and Lawrence Page while students at Stanford
- in the beginning ran on Stanford computers
basic approach has been described in their famous paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”
- well written, simple language, has their pictures
- in the acknowledgements they cite support by NSF’s Digital Library Initiative, i.e. initially Google came out of government sponsored research
- describes their method PageRank - based on ranking hyperlinks as in citation indexing
- “We chose our system name, Google, because it is a common spelling of googol, or 10^100”
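The idea of ranking pages by their incoming hyperlinks, as citations are counted in citation indexing, can be sketched with a small power iteration. This is a minimal sketch of the commonly taught normalized variant, not Google's production method. The link graph, the function name `pagerank`, the damping value 0.85 and the even redistribution of rank from pages without outgoing links are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively estimate link-based rank for a small graph.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: A and B both link to C, C links back to A.
RANKS = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph C ends up ranked highest because two pages link to it, A follows because the highly ranked C links to it, and B, with no incoming links, comes last.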

coverage differences
no engine covers more than a fraction of WWW
- estimates: none more than 16%
- hard (even impossible) to discern & compare coverage, but they differ substantially in what they cover
in addition:
- many national search engines
  - own coverage, orientation, governance
- many specialized or domain search engines
  - own coverage geared to subject of interest
- many comprehensive sources independent of search engines
  - some have compilations of evaluated web sources

searching differences
substantial differences among search engines in searching, retrieval, display
- need to know how they work & differ in respect to
  - defaults in searching a query
  - searching of phrases, case sensitivity, categories
  - searching of different fields, formats, types of resources
  - advanced search capabilities and features
  - possibilities for refinement, using relevance feedback
  - display options
  - personalization options
Greg Notess’ chart & features describe differences

business model differences
several business models
- public good - have independent budget
  - e.g. PubMed, Librarians’ Index to Internet
- earn revenue from provision of information
  - all commercial search engines
- using search engines to promote their other activities
  - e.g. telephone directories

sponsorship differences
need to understand treatment of sponsorship – it influences what engines search & how they display results
- some list results from sponsored sites separately, so you are reasonably clear what is sponsored & what is not
- some have display-per-pay - showing first the sites that paid most & do not even tell you that
- some have pay per update of sites
imperative to find sources that explain these models for different engines, to know what is covered & what you are getting

limitations
every search engine has limitations as to
- coverage
  - meta engines just follow coverage limitations & have more of their own – have to be careful in their use
- search capabilities
- finding quality information
some have compromised search with economics
- becoming little more than advertisers
but search engines are also many times victims of spamdexing
- affecting what is included and how it is ranked

spamming a search engine
use of techniques that push rankings higher than they belong; also called spamdexing
- methods typically include textual as well as link-based techniques
- like spam, search engine spam is a form of adversarial information retrieval
  - the conflicting goals of accurate results of search providers & high positioning by content page rank
search engines are constantly battling this with their own special (& secret) tools

search engine features, reviews, tutorials
- Search Engine Showdown: lists, reviews, follows search engines; blog – look at Chart by Greg Notess (librarian) – book Teaching Web Search Skills has live links
- Recommended search engines by UC Berkeley: library workshop; lists features, evaluates
- Search Basics: Web Search Essentials: among others, has a large section on search engines
- Search features chart with explanations

how to find a search engine?
resources that list or categorize engines
- Search Engine Guide: engines categorized by topic; other engine information
- Search Engine Colossus: international directory of search engines by country, topic from 351 countries and territories; engines in many languages
- Phil Bradley’s country based search engines: “currently a total of 4,017 search engines and 222 countries, territories, islands and regions”

all questions are not created equal
what engine, what resource to use for what kind of question or information need?
- An exhaustive classification in: Finding information: search engines by Phil Bradley
- Sources for different topics: Choose the Best Search for Your Information Need by NoodleTools
- List of capabilities for major search engines: Best Search Tools Chart by Infopeople

meta search engines
meta engines search multiple engines
- getting combined results from a variety of engines
do not have their own databases
- but have their own business models affecting results
a number of techniques used
- interesting ones: clustering, statistical analyses
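How a meta engine might combine ranked lists from several engines can be sketched as follows. This is a hedged illustration only: the engine names and result lists are made up, and the merging rule used here (reciprocal rank fusion, with the conventional constant k = 60) is one well known technique chosen for the example, not necessarily what any particular meta engine does.

```python
from collections import defaultdict

def fuse(result_lists, k=60):
    """Merge ranked lists from several engines with reciprocal rank fusion.

    result_lists: dict mapping an engine name to its ranked list of URLs.
    Returns a list of (url, engines_that_returned_it), best first.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for engine, urls in result_lists.items():
        for position, url in enumerate(urls, start=1):
            # Higher positions contribute more; k dampens the differences.
            scores[url] += 1.0 / (k + position)
            sources[url].append(engine)
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sorted(sources[url])) for url in ranked]

# Hypothetical results from two hypothetical engines for the same query.
RESULTS = {
    "engine_a": ["x", "y", "z"],
    "engine_b": ["y", "w"],
}
FUSED = fuse(RESULTS)
for url, engines in FUSED:
    print(url, engines)
```

Keeping the list of source engines for each URL shows the overlap between engines, similar to what the slide on sample meta engines notes for Dogpile. The meta engine holds no database of its own here: it only reorders what the underlying engines returned, so it cannot exceed their combined coverage.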

sample of meta engines - with organized results
- Dogpile: results from a number of leading search engines; gives source, so overlap can be compared; has SearchSpy - listing searches that were performed
- Surfwax: gives text sources & linking to sources; for some terms gives related terms to focus
- Turbo10: provides results in clusters; engines searched can be edited
- Clusty: results grouped by topics or clusters for further sources

meta search engines (cont.)
large directory
- Complete Planet: directory of over 70,000 databases & specialty engines; classified
results with graphical displays
- Kartoo: results in display by topics of query
new kid on the block (not a meta engine, but a search engine)
- Cuil: claim: “Cuil searches more pages on the Web than anyone else – three times as many as Google and ten times as many as Microsoft”. Well … I do not know if it holds.

multilingual
English still the major language
- but declining, now slightly over 50%
multilingual retrieval search engines
- Euroseek
  - searches in a number of languages
- All the Web
  - results in 45 languages

where to find out?
information about search engines in sources that have updates, news, tips for searching and more – a MUST for searchers:
- Search Engine Watch
  - ratings, news, statistics, charts, explanations, tutorials
- Search Engine Showdown
  - “The users’ guide to web searching” - run by a librarian, news links, ratings
- Virtual Chase: a site about “Teaching Legal Professionals How To Do Research” - this section has very good tips and links for consideration of quality on the web

where? …
- SiteLines: a blog written by Rita Vine, a professional librarian & web search trainer; many evaluations in archive
- ResourceShelf: “Resources and News for Information Professionals,” edited by Gary Price, a librarian & author of Invisible Web – has extensive archive
- WebsearchAbout: not evaluative, but provides news, capabilities, sources, articles about web searching

art of searching search engines

part 2: digital libraries

definition
digital libraries are viewed from several perspectives
- technical: “Digital library is a managed collection of information, with associated services, where information is stored in digital format and accessible over a network.” (Arms, 2000)
- institutional: “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.” (Waters, 1998)

a bit of context
digital libraries have a short but volatile history
- research & development took off by the early to mid 1990s
- in the next decade phenomenal growth worldwide
- large investment in research, development, keeping up
number of communities involved
- computer science, primarily in research
- library & information science: operations, studies of users, use, usability
- many subjects: digital libraries in their domain
diversity is large
- many institutions, e.g. museums, developed their own

libraries & digital resources
libraries (particularly research, academic & special) invested massive & ongoing funding toward
- electronic journals
- databases
- reference sources
- digitization of parts of collection
thus becoming in effect digital libraries – or more accurately hybrid libraries
- with graphic and digital versions or types of resources
RUL (Rutgers University Libraries) has substantial holdings & expenditures in all of these

emphasis
here on large academic or research digital libraries that are also related to searching, including provision of
- search capabilities & access to databases
- electronic journals that provide full text of articles after a search
- digital reference sources
such libraries have also become search portals of sorts, essential for their users
- in education, research & related activities

sample
- New York Public Library Digital Collections: a gateway to rare and unique collections in digitized form & to databases; access to most searchable databases requires library card number
- U California Berkeley Digital Library SUNsite: digital collections and services
- The British Library: “The world’s knowledge.” Includes “Services for library and information Professionals.”
- Los Angeles Public Library Kids’ Path: resources for children; search through directory

sample …
- New Zealand Digital Library: searching of a number of digital collections, incl. humanitarian and UN collections; provision of free software for digital libraries
- Public Library of Science: “PLoS is a nonprofit organization of scientists and physicians committed to making the world's scientific and medical literature a public resource.” Publishes open access journals
closer to home:
- New Brunswick Free Public Library: has online resources, databases (some require library PIN), historical archives and more; an example of the great many public libraries that have databases for searching

Rutgers libraries – digital components
strategic planning in developing digital access
rich & complex content of digital resources
- several hundred indexes & databases for searching
- some 20,000 electronic journals
- a thousand or more digital reference sources
- subject research guides
- Searchpath & other tutorials
- electronic reserve
affected teaching, learning, research by the whole community

some critical issues for searching
no way yet to do effective federated searching in digital libraries (to search several indexes at the same time)
- RUL has Searchlight – searches only 8 major databases
each source has to be searched separately
- most have very different search features, capabilities
finding items in indexes does not mean that you are always able to get full text
thus, searching is time-consuming, chaotic
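Why federated searching is hard can be seen in a small sketch: each source needs its own adapter because record layouts and search features differ, and a hit in an index does not guarantee full text. Everything here is hypothetical and for illustration only: the two sources, their field names (`title`, `pdf`, `TI`, `has_full_text`) and the functions `search_a`, `search_b` and `federated_search` are made up, and this is not how Searchlight or any real library system works.

```python
# Two hypothetical sources with different record layouts (made-up data).
SOURCE_A = [
    {"title": "Digital libraries and users", "pdf": "a/1.pdf"},
    {"title": "Search engine coverage", "pdf": None},
]
SOURCE_B = [
    {"TI": "Evaluating digital libraries", "has_full_text": False},
]

def search_a(term):
    """Adapter for source A: title match, full text present if a PDF is listed."""
    return [
        {"source": "A", "title": rec["title"], "full_text": rec["pdf"] is not None}
        for rec in SOURCE_A
        if term.lower() in rec["title"].lower()
    ]

def search_b(term):
    """Adapter for source B: different field names, same normalized output."""
    return [
        {"source": "B", "title": rec["TI"], "full_text": rec["has_full_text"]}
        for rec in SOURCE_B
        if term.lower() in rec["TI"].lower()
    ]

def federated_search(term, adapters):
    """Send one query to every adapter and collect the normalized hits."""
    hits = []
    for adapter in adapters:
        hits.extend(adapter(term))
    return hits

HITS = federated_search("digital libraries", [search_a, search_b])
for hit in HITS:
    print(hit["source"], hit["title"], "full text" if hit["full_text"] else "index only")
```

Each additional source requires another adapter of this kind, which is one reason a federated tool may cover only a handful of major databases while the rest still have to be searched separately.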

where to find out?
information about digital libraries for searching
- LibWeb: Webjunction, formerly U California, Berkeley; “lists currently over 7900 pages from libraries in over 146 countries”
- Digital Library Federation: “a consortium of libraries and related agencies that are pioneering the use of electronic-information technologies to extend their collections and services”
- D-Lib Magazine: “a solely electronic publication with a primary focus on digital library research and development, including but not limited to new technologies, applications, and contextual social and economic issues”

where? …
- Ariadne (UK): “to report on information service developments and information networking issues worldwide, keeping the busy practitioner abreast of current digital library initiatives”
- Journal of Digital Information: “Publishing papers on the management, presentation and uses of information in digital environments”
- Tool Kit for the Expert Web Searcher: one of the wikis by Library Information and Technology Association, a division of the American Library Association
- Expert Web Search Tips: one of many informative articles from the Living Internet

in conclusion
search engines are great but you have to KNOW what is under the hood
- as to coverage, business model, search features, outputs …
- they are NOT for every kind of information need
digital libraries are great for searching but you have to KNOW requirements for searching different resources that are included
- as yet federated searching is limited

art of searching digital libraries

and rewards …