Ontological Classification of Web Pages
Zafer Erenel

Many users rely on search engines to find and buy goods and services (for example, when choosing a vacation). Web pages on the internet do not conform to any data organization standard, and search engines offer users only primitive query capabilities for retrieving relevant data [1]. In addition, search engines do not list sites equally; they tend to favor more popular pages. These tendencies push many web pages aside and leave users with a limited number of alternatives. I created a lightweight domain ontology consisting of a taxonomic hierarchy and used it in an automated agent that classifies web pages on the internet. The agent discovers and classifies relevant pages with the help of the Yahoo and Google search engines.

Related Research
Desai and Spink presented a clustering scheme that groups documents into partially and substantially relevant pages using similarity measures and ranking heuristics [2]. They worked with end-user queries (which contain a limited number of terms) to obtain the relevance score. My automated agent, by contrast, works with an established ontology to discover and classify documents. Chiang, Chua, and Storey parsed the snippets of returned links and used the ratio of matching terms to rank the web pages for relevance [1]. I believe snippets contain too few words to judge a web page by. My agent scans the entire web page, which is more time-consuming but, in my view, more effective.
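To make the snippet-versus-full-page contrast concrete, here is a minimal Python sketch (the original agent was written in C#) that measures what fraction of a set of query terms occurs in a piece of text. The formula is an assumption for illustration, not necessarily the exact ratio Chiang, Chua, and Storey computed, and the query, snippet, and page strings are made up.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_ratio(text: str, query_terms: list[str]) -> float:
    """Fraction of distinct query terms that occur at least once in the text.

    Assumed formula for illustration only; returns 0.0 for an empty query.
    """
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    words = set(tokenize(text))
    return len(terms & words) / len(terms)


# Hypothetical query, snippet, and full-page text.
query = ["fiji", "scuba", "diving", "cheap"]
snippet = "Fiji holidays: resorts and beaches."
page = "Fiji holidays: resorts and beaches. Scuba diving trips. Cheap flights from Auckland."

print(match_ratio(snippet, query))  # 0.25
print(match_ratio(page, query))     # 1.0
```

The same four terms score 0.25 against the short snippet but 1.0 against the full page text, which illustrates why judging a page from its snippet alone can be misleading.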

Yahoo and Google search results contain many links. My lightweight domain ontology consists of 7 branches, each made up of predetermined terms. A web page's score in a given branch increases each time the agent encounters one of that branch's terms in the page's HTML code. I chose a country ontology because internet users' interest in a particular country can be quite high.
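A minimal Python sketch of the per-branch scoring described above (the original agent was written in C#). Only three of the seven branch names appear in the text (price, nature, activities); the term lists below are hypothetical placeholders, and matching whole lower-case words in the raw HTML is an assumption.

```python
import re
from collections import Counter

# Hypothetical ontology: the real one has 7 branches whose term lists are not given.
ONTOLOGY: dict[str, list[str]] = {
    "price": ["cheap", "discount", "insurance", "flights"],
    "nature": ["beach", "rainforest", "volcano", "adventure"],
    "activities": ["scuba", "diving", "surfing", "hiking"],
}


def score_page(html: str, ontology: dict[str, list[str]]) -> dict[str, int]:
    """Return one score per branch: how many times that branch's terms occur in the page.

    The raw HTML is scanned as-is, so tag names and attribute values are tokenized too.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", html.lower()))
    return {branch: sum(counts[term] for term in terms) for branch, terms in ontology.items()}


html = "<h1>Fiji</h1><p>Cheap flights and scuba diving. Scuba lessons on the beach.</p>"
print(score_page(html, ONTOLOGY))  # {'price': 2, 'nature': 1, 'activities': 3}
```

Because each branch keeps its own score, a single page can rank highly in several branches at once, which is what allows separate price, nature, and activities rankings.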

A page's rank within each cluster tells the user what the page is about. In addition, we can compare the result sets that different search engines (Yahoo and Google) return for the same queries, computing their intersection and complements to better understand how each engine behaves.
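Comparing two engines' result sets amounts to set operations over URLs. A minimal Python sketch (the original agent was written in C#); the normalization rules are an assumption, since the text does not say how URLs returned by the two engines were matched, and the example URLs are placeholders.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude normalization so the same page from two engines compares equal.

    Ignores the scheme, a leading 'www.', host letter case, a trailing slash,
    and any fragment; keeps the query string.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path + ("?" + parts.query if parts.query else "")


def compare(google: list[str], yahoo: list[str]) -> dict[str, set[str]]:
    """Intersection and both relative complements of two engines' result sets."""
    g = {normalize(u) for u in google}
    y = {normalize(u) for u in yahoo}
    return {
        "both": g & y,         # returned by both engines
        "only_google": g - y,  # Google's results minus Yahoo's
        "only_yahoo": y - g,   # Yahoo's results minus Google's
    }


google = ["http://www.example.com/fiji/", "https://example.net/deals"]
yahoo = ["https://example.com/fiji", "https://example.org/dive"]
result = compare(google, yahoo)
print(sorted(result["both"]))  # ['example.com/fiji']
```

These three sets correspond to the three regions of a two-circle Venn diagram of the engines' results.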

I used the web stream classes of the C# programming language to build my agent. A WebRequest is an object that requests the resource identified by a Uniform Resource Identifier (URI), such as the URL of a web page [3]. A WebRequest object produces a WebResponse object that encapsulates the resource the URI points to. When you retrieve that resource (e.g., a web page), what you get back is a stream containing the page.
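The same request, response, and stream pattern exists in Python's standard urllib.request module, used here as a stand-in for the C# classes: Request plays the role of WebRequest, and the object returned by urlopen plays the role of WebResponse, exposing the body as a readable byte stream. A minimal sketch; the User-Agent string and the timeout value are hypothetical choices.

```python
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Request the resource a URI identifies and return the response body as text."""
    # Request ~ WebRequest: describes what to fetch. The User-Agent value is made up.
    request = Request(url, headers={"User-Agent": "ontology-agent-sketch/0.1"})
    # The object urlopen returns ~ WebResponse: its body is read as a byte stream.
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In C#, the corresponding calls are WebRequest.Create(uri), GetResponse(), and GetResponseStream(); in current .NET versions WebRequest is marked obsolete in favor of HttpClient.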

I used this capability to read pages and extract the information I needed. I created two web requests using the search syntax given below.

The Google search engine returned 200 URLs, and I created 200 web requests to extract relevant information from each of those pages. I used the Yahoo search engine in the same way.
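Issuing 200 requests one after another is slow, because each one waits on the network. The text does not say whether the original C# agent fetched pages sequentially or concurrently; the minimal Python sketch below fetches a list of URLs with a thread pool. The worker count, the timeout, and decoding every page as UTF-8 are hypothetical choices, and failed requests map to None so that one bad page does not stop the batch.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from typing import Optional
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as UTF-8 (a simplifying assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def safe_fetch(url: str) -> Optional[str]:
    """Like fetch(), but return None instead of raising on network, protocol, or URL errors."""
    try:
        return fetch(url)
    except (OSError, ValueError, http.client.HTTPException):
        return None


def fetch_all(urls: list[str], workers: int = 16) -> dict[str, Optional[str]]:
    """Fetch every URL concurrently; map each URL to its page text, or None on failure."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its own page.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Threads suit this workload because the time is spent waiting on I/O rather than computing.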

In the price rankings, we find pages with information about student flights, travel insurance, vacation package discounts, cheap flights, and so on. In the nature rankings, we find web pages that offer adventure, among other things. If I want to go scuba diving on my vacation, the activities rankings show me that Hawaii and Fiji are among my options.
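Producing the price, nature, and activities rankings amounts to sorting pages by their score in one branch. A minimal Python sketch (the original agent was written in C#); the input format (a mapping from URL to per-branch scores), the example URLs and numbers, and the alphabetical tie-break are all assumptions made for illustration.

```python
def rank_by_branch(
    scores: dict[str, dict[str, int]], branch: str, top: int = 10
) -> list[tuple[str, int]]:
    """Rank pages by their score in one ontology branch, highest first.

    Pages that scored zero (or have no entry) in that branch are left out;
    ties are broken alphabetically by URL so the output is deterministic.
    """
    ranked = [(url, s[branch]) for url, s in scores.items() if s.get(branch, 0) > 0]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical per-page scores.
scores = {
    "example.com/fiji": {"price": 1, "activities": 5},
    "example.org/hawaii": {"price": 4, "activities": 5},
    "example.net/deals": {"price": 9, "activities": 0},
}
print(rank_by_branch(scores, "activities"))
```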

Interestingly, 19 web pages appear in both the Google and the Yahoo top-100 result lists.

In the top-200 result lists, 24 web pages appear in both.
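The overlap figures can be computed directly from two ranked lists. A minimal Python sketch (the original agent was written in C#), assuming both lists already contain normalized URLs and have at least k entries; the example lists are made up. Applied to the reported counts, 19 shared pages out of 100 is an overlap of 19%, and 24 out of 200 is 12%, so the relative overlap shrinks as the lists get longer.

```python
def overlap_at_k(a: list[str], b: list[str], k: int) -> tuple[int, float]:
    """Count URLs present in both top-k lists, and express that count as a fraction of k."""
    shared = len(set(a[:k]) & set(b[:k]))
    return shared, shared / k


# Hypothetical ranked lists.
google = ["a", "b", "c", "d"]
yahoo = ["c", "x", "a", "y"]
print(overlap_at_k(google, yahoo, 2))  # (0, 0.0)
print(overlap_at_k(google, yahoo, 4))  # (2, 0.5)
```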

In conclusion, ontologically organized clusters of web sites that offer information about a given country's vacation and travel alternatives serve our objective better, helping us find what we are looking for. In this work I used two search engines and a single ontology. The shortcomings of individual search engines could be mitigated by combining multiple engines with multiple ontologies, making it easier to find the most needed information on the internet. The Venn diagrams indicate that a single search engine is not very effective on its own.
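The text does not specify how multiple engines would be combined. One simple metasearch strategy, chosen here purely for illustration, is round-robin interleaving: take each engine's first result, then each engine's second, and so on, skipping URLs already seen. A minimal Python sketch (the original agent was written in C#):

```python
from itertools import zip_longest


def interleave(*ranked_lists: list[str]) -> list[str]:
    """Merge several engines' ranked lists round-robin, keeping each URL's first occurrence."""
    seen: set[str] = set()
    merged: list[str] = []
    # Each tier holds the i-th result of every engine; shorter lists are padded with None.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(interleave(["a", "b", "c"], ["b", "d"]))  # ['a', 'b', 'd', 'c']
```

Round-robin gives every engine equal weight; a score-based merge would be an alternative when engines report comparable relevance scores.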

References
[1] R.H.L. Chiang, C.E.H. Chua, V.C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering 38 (2001).
[2] M. Desai, A. Spink, An algorithm to cluster documents based on relevance, Information Processing and Management 41 (2005).
[3] J. Liberty, Programming C#, 3rd ed., O'Reilly, 2003.