The web is referred to as a “massive collection of web pages stored on millions of computers across the world that are linked by the Internet” (Chowdhury, 2010, p. 381). It was created in 1989 by Tim Berners-Lee and his team of scientists at the European Laboratory for Particle Physics (CERN) in Geneva. The Hypertext Transfer Protocol (HTTP) was created to standardize communication between the clients and servers used by the web. Mosaic, the first widely used graphical web browser, was created in 1993 at the US National Center for Supercomputing Applications. It was followed by Netscape Navigator and Internet Explorer. Today there are several browsers, including Firefox, Chrome, and Safari. The web has grown exponentially from over 9 million websites in 2002 to over 1 billion in ... Today the number of indexed pages is 4.71 billion.
Distributed nature of the web - Web resources are distributed across millions of computers throughout the world, with different architectures, software, and standards. Traditional text retrieval systems deal with a defined set of documents and a specified set of hardware, software, and processing standards (e.g. MARC formats and OPAC).
Size and growth of the web - The rapid growth of the web makes indexing and retrieval complex and difficult. Traditional text retrieval systems are amenable to research and testing for the eventual handling of large volumes of data.
Deep versus surface web - The surface web is accessible to all. The deep web is larger and not openly accessible: it is password protected, or requires authorization or the use of a specified program.
Type and format of documents - The web holds a variety of data and document types, e.g. text and multimedia resources. Traditional text retrieval deals with text only.
Quality of information - The quality of web information is uncertain, since anyone can publish on the web. Text retrieval systems comprise published information resources with definite quality control.
Frequency of changes - Web information changes frequently. The contents of text retrieval systems are static and thus easy for a retrieval system to track and retrieve.
Ownership - Ownership of web resources varies: some are free, while others require permission or access rights, which poses a challenge to retrieval.
Distributed users - Traditional text retrieval systems know the nature, characteristics, information needs, and information-seeking behaviours of their users. Web users are distributed and far less well known, which poses a challenge to the designer of a web information retrieval system.
Multiple languages - The languages of both information resources and users are diverse, which poses a challenge. An ideal information retrieval system (IRS) must be able to retrieve the required information irrespective of the language of the query or of the source of information.
Resource requirements - The astronomical size of the web makes it difficult for a single body to fund and run a retrieval system for it effectively and efficiently, although the world desires a good IRS for accessing web information resources.
Make notes on issues and challenges of web information retrieval (Chowdhury, 2010, pp