Exploring the Academic Invisible Web
Dr. Dirk Lewandowski, Heinrich-Heine-Universität Düsseldorf, Information Science

Research done in collaboration with Philipp Mayr, Bonn

Agenda
1. Introduction
2. The (Academic) Invisible Web defined
3. The size of the (Academic) Invisible Web
4. AIW relevant to...
5. Opening the AIW – different models

1 Introduction
Users expect their search services to be comprehensive and integrated. Up-to-dateness and completeness are important factors in research.

2 The Invisible Web defined
Definitions of the Invisible/Deep Web:
– "Text pages, files, or other often high-quality authoritative information available via the World Wide Web that general-purpose search engines cannot, due to technical limitations, or will not, due to deliberate choice, add to their indices of Web pages" (Sherman and Price 2001).
– "The deep Web - those pages do not exist until they are created dynamically as the result of a specific search" (Bergman 2001).

| Type of Invisible Web content | Why it's invisible |
|---|---|
| Disconnected page | No links for crawlers to find the page |
| Pages consisting primarily of images, audio, or video | Insufficient text for the search engine to "understand" what the page is about |
| Pages consisting primarily of PDF or PostScript, Flash, Shockwave, executables (programs) or compressed files (.zip, .tar, etc.) | Technically indexable, but usually ignored, primarily for business or policy reasons |
| Content in relational databases | Crawlers can't fill out required fields in interactive forms |
| Real-time content | Ephemeral data; huge quantities; rapidly changing information |
| Dynamically generated content | Customized content is irrelevant for most searchers; fear of "spider traps" |

Source: Sherman and Price 2001
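Two of the reasons listed above, disconnected pages and content in relational databases, follow directly from how a link-following crawler works. The sketch below is a toy illustration, not how any particular search engine crawls: the link graph, the page names and the `DATABASE_RECORDS` list are all hypothetical. The crawler only follows hyperlinks, so a page nobody links to and records reachable only by submitting a search form are never visited.

```python
from collections import deque

# Hypothetical toy Web: each page maps to the pages it links to.
LINKS: dict[str, list[str]] = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],    # an HTML query form; the crawler cannot submit queries
    "orphan-report": [],  # disconnected page: nothing links to it
}

# Hypothetical records that exist only as answers to form queries.
DATABASE_RECORDS = ["record-1", "record-2"]


def crawl(seed: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl that follows hyperlinks only and never fills in forms."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    visible = crawl("home", LINKS)
    all_content = set(LINKS) | set(DATABASE_RECORDS)
    print("visible:", sorted(visible))
    print("invisible:", sorted(all_content - visible))
```

Run as a script, it reports `home`, `about` and `search-form` as visible and `orphan-report`, `record-1` and `record-2` as invisible: the orphan because no link leads to it, the records because they sit behind the form.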

From the Invisible Web to the Academic Invisible Web
Today, the Invisible Web problem is mainly a problem of database content. For the academic sector, sources from the surface Web are as relevant as sources from the Invisible Web. The Academic Invisible Web (AIW) consists of the databases relevant to academia. Or, more narrowly: the AIW consists of the databases that libraries should index (using search engine technology).
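The narrower definition ties the AIW to databases that libraries should index with search engine technology. The central data structure of that technology is the inverted index, which maps each term to the records containing it. The following is a minimal sketch, assuming a handful of hypothetical bibliographic records (`RECORDS`) and simple Boolean AND retrieval; real search engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict

# Hypothetical bibliographic records, e.g. exported from a library database.
RECORDS = {
    "r1": "Deep Web databases and dynamically generated pages",
    "r2": "Invisible Web sources for academic research",
    "r3": "Library catalogues and the academic invisible Web",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of record IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec_id, text in records.items():
        for term in tokenize(text):
            index[term].add(rec_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND query: IDs of the records that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index(RECORDS)
    print(sorted(search(idx, "invisible web")))  # ['r2', 'r3']
```

Once database records are exported into such an index, they become searchable alongside surface Web content instead of being reachable only through each database's own query form.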

3 The size of the Invisible Web

Bergman's calculation
Average size of IW databases:
– 5.43 million documents (mean)
– 4,950 documents (median)
Total size: number of databases × 5.43 million documents = 543 billion documents (which implies about 100,000 databases).
Size of the surface Web: 1 billion documents (2001).
According to Bergman, the Invisible/Deep Web is therefore about 550 times larger than the surface Web.
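Written out as a worked equation, Bergman's extrapolation multiplies a database count $D$ by the mean size $\bar{n}$. The slide does not state $D$; the value used here, $D = 100\,000$, is inferred from its own figures ($543 \times 10^{9} / 5.43 \times 10^{6}$), and $N_{\text{surface}}$ is the 2001 surface Web size quoted above.

$$
N_{\text{IW}} = D \cdot \bar{n} = 100\,000 \times 5.43 \times 10^{6} = 5.43 \times 10^{11} \approx 543 \text{ billion documents}
$$

$$
\frac{N_{\text{IW}}}{N_{\text{surface}}} = \frac{5.43 \times 10^{11}}{1 \times 10^{9}} \approx 543 \approx 550
$$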

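Bergman's calculation: criticism
But: Bergman uses the mean, although the distribution of database sizes is highly skewed; the mean is more than 1,000 times the median.
– 5.43 million documents (mean)
– 4,950 documents (median)
The top 60 databases contain 85 billion documents (… GB). The top 2 alone contain … GB (more than 75% of the top 60).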
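How much the choice of average matters can be shown by running the same extrapolation with the median instead of the mean. This is only a sensitivity illustration: neither figure is offered as the true size of the Invisible Web, and the database count `N_DATABASES` is an assumption derived from the slide's 543 billion / 5.43 million, not a number the slides state.

```python
# Figures quoted from Bergman (2001) on the slides.
MEAN_DOCS = 5_430_000          # mean documents per database
MEDIAN_DOCS = 4_950            # median documents per database
SURFACE_WEB = 1_000_000_000    # surface Web size in 2001

# Assumption: database count implied by 543 billion / 5.43 million.
N_DATABASES = 543_000_000_000 // MEAN_DOCS


def extrapolate(per_database: int, n_databases: int = N_DATABASES) -> int:
    """Total size estimate: number of databases times a per-database figure."""
    return per_database * n_databases


if __name__ == "__main__":
    by_mean = extrapolate(MEAN_DOCS)
    by_median = extrapolate(MEDIAN_DOCS)
    print(f"databases: {N_DATABASES:,}")
    print(f"mean-based estimate:   {by_mean:,} documents")
    print(f"median-based estimate: {by_median:,} documents")
    print(f"mean/median ratio: {MEAN_DOCS / MEDIAN_DOCS:.0f}")
    print(f"median-based estimate / surface Web: {by_median / SURFACE_WEB:.3f}")
```

The mean-based run reproduces Bergman's 543 billion documents, while the median-based run gives 495 million, roughly half the size of the 2001 surface Web. The gap between the two is the effect of the skew: a few very large databases pull the mean far above the typical database.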

Contents of Bergman's top 60 (chart; basis: database sizes in GB)

Summary of the criticism of Bergman
– Database selection: database types, database content
– Calculation

Size comparison: Gale Directory of Databases
– Contains approx. … databases (2003) and covers all major academic databases.
– Total size estimate for all databases: 18.55 billion documents (including CD-ROM databases).
– The estimate is based on less than 10 percent of all databases.
– 5 percent of all databases contain more than 1 million documents, some more than 100 million.
– Some of the databases included in Bergman's top 60 are missing from Gale.

Will the AIW also show an exponential distribution?

Conclusion: Size of the Invisible Web
– Bergman's size of 550 billion documents is greatly overestimated.
– An exact calculation from the distribution of Bergman's top 60 is not possible.
– The size estimate from the Gale directory includes databases beyond the Web, but does not include all Web databases.
– The estimate from Gale is probably too low.

4 AIW relevant to scholars, searchers, librarians, and information professionals

Everything relevant to the scientific process:
– Literature (articles, dissertations, reports, books, …)
– Data
– Purely online content (e.g. open access)
Providers of AIW content:
– Database vendors (metadata) + human indexing
– Library content (OPACs, collections) + human indexing
– Publishers' content (full text) + mixed indexing
– Other repositories
Much of this material is not necessarily part of the AIW, but is in fact not covered by the main search engines and tools.

5 Opening the AIW – different models
Commercial search engines
– Google Scholar
– Scirus
Libraries & database vendors
– BASE (Bielefeld Academic Search Engine)
– Vascoda (integration of library and database collections)
Open Access repositories
– Citebase
– OpenROAR
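Library and repository services of the kind listed above commonly collect records by harvesting metadata from repositories over OAI-PMH; the slides do not say which protocols these specific services use, so treat that link as an assumption. The sketch below builds a standard OAI-PMH `ListRecords` request and parses a Dublin Core response with the Python standard library. The endpoint `repository.example.org` and the sample response are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical repository endpoint.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[dict[str, str]]:
    """Extract identifier and Dublin Core title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    return records


# Hand-written sample response standing in for a real harvest.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for r in parse_records(SAMPLE):
        print(r["identifier"], "|", r["title"])
```

A real harvester would also follow `resumptionToken` elements to page through large result sets before feeding the records into an index.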

Conclusion

Summary
– Existing search tools and approaches show potential to make the AIW visible.
– All stakeholders should work together:
– Commercial search engine providers, with their computing and financial power
– Librarians, with their experience in collection building and subject access (e.g. thesauri, classification, taxonomies)
– Publishers and database vendors, by opening their collections

Future research
– Building an AIW sample for further tests
– Better size estimates from this sample
– Classification of AIW content
– Distinction between the Academic Surface Web and the AIW

Thank you.

References
Bergman, M. K. (2001). The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing, 7(1).
Sherman, C., & Price, G. (2001). The Invisible Web: Uncovering Information Sources Search Engines Can't See. Medford, NJ: Information Today.