Published by Earl Roland Ryan. Modified over 9 years ago.
Research paper: Web Mining Research: A Survey
SIGKDD Explorations, June 2000, Volume 2, Issue 1
Authors: R. Kosala and H. Blockeel
Introduction
Web Mining
Web Content Mining
Web Structure Mining
Web Usage Mining
Conclusion
The World Wide Web is a popular and interactive medium for disseminating information.
Information users may encounter four problems:
1. Finding relevant information
   a. Low precision
   b. Low recall
2. Creating new knowledge out of the information available on the web (a data-triggered process)
3. Personalizing the information: people differ in the content and presentation of information they prefer
4. Learning about consumers or individual users: mass customizing or even personalizing
Definition: web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from web data.
Four subtasks:
1. Resource finding: retrieving intended web documents
2. Information selection and pre-processing: selecting and pre-processing specific information
3. Generalization: discovering general patterns
4. Analysis: validation and/or interpretation of mined patterns
Web Mining and Information Retrieval
Definition: IR is the automatic retrieval of all relevant documents while at the same time retrieving as few of the non-relevant documents as possible.
Goal: indexing and searching for useful documents
Web Mining and Information Extraction
IE has the goal of transforming a collection of documents into information that is more readily digested and analyzed.
Compare IR and IE:
a. Aims
b. Fields
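The IR definition above (retrieve all relevant documents, and as few non-relevant ones as possible) corresponds to the usual recall and precision measures. The following is a minimal sketch, assuming documents are identified by simple ids; the function name and the example ids are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 5 relevant, 2 in common.
p, r = precision_recall({"d1", "d2", "d3", "d4"},
                        {"d2", "d4", "d5", "d6", "d7"})
print(p, r)
```

Low precision means many retrieved documents are irrelevant; low recall means many relevant documents were never retrieved.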
Web Mining and the Agent Paradigm
Web mining is often viewed from or implemented within an agent paradigm:
1. User interface agents
2. Distributed agents
3. Mobile agents
Two approaches used to develop intelligent agents:
1. Content-based approach
2. Collaborative approach
Web Content Mining
Definition: discovering useful information from web page contents, data, and documents
Several types of data: text, image, audio, video, hyperlinks
Types of data structure:
1. Unstructured: free text
2. Semi-structured: HTML
3. More structured: data in tables or database-generated HTML pages
IR view:
Unstructured documents
a. Bag of words to represent unstructured documents
b. Features: Boolean, frequency-based
c. Variations of the feature selection
d. Features could be reduced using different feature selection techniques
Semi-structured documents
a. Uses richer representations for features
b. Uses common data mining methods
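To make the bag-of-words representation concrete, here is a minimal sketch that turns a few documents into frequency-based and Boolean feature vectors. The tokenizer (lowercased alphanumeric runs) and the two sample documents are assumptions chosen for illustration, not something the paper prescribes.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, frequency vectors, Boolean vectors) for docs."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    freq = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # Word order is discarded; only the per-term counts are kept.
        freq.append([counts[term] for term in vocab])
    # Boolean features record only presence or absence of each term.
    boolean = [[1 if c > 0 else 0 for c in row] for row in freq]
    return vocab, freq, boolean


# Hypothetical sample documents.
docs = ["web mining mines the web", "usage mining uses logs"]
vocab, freq, boolean = bag_of_words(docs)
print(vocab)
print(freq)
print(boolean)
```

Feature selection, as mentioned on the slide, would then drop columns of these vectors, for example terms that occur in too few documents.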
DB view: tries to infer the structure of a web site or to transform a web site into a database
Methods:
a. Finding the schema of web documents
b. Building a web warehouse
c. Building a web knowledge base
d. Building a virtual database
Web Structure Mining
Interested in the structure of the hyperlinks within the web
Inspired by the study of social networks and citation analysis
Discovers specific types of pages based on their incoming and outgoing links
Applications:
a. Discovering micro-communities in the web
b. Measuring the completeness of a web site
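As a rough illustration of characterizing pages by their incoming and outgoing links, the sketch below counts in-links and out-links per page from a list of hyperlinks. It is only a degree count under assumed toy data, not a specific algorithm from the paper; the page names and the `link_degrees` helper are hypothetical.

```python
from collections import defaultdict


def link_degrees(links):
    """links: iterable of (source_page, target_page) pairs.

    Returns {page: (incoming_count, outgoing_count)}.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in links:
        out_deg[src] += 1
        in_deg[dst] += 1
    pages = set(in_deg) | set(out_deg)
    return {page: (in_deg[page], out_deg[page]) for page in pages}


# Hypothetical link graph: page "d" links out a lot, page "c" is linked to a lot.
links = [("a", "c"), ("b", "c"), ("d", "c"), ("d", "a"), ("d", "b")]
degrees = link_degrees(links)

most_cited = max(degrees, key=lambda page: degrees[page][0])
most_linking = max(degrees, key=lambda page: degrees[page][1])
print(most_cited, most_linking)
```

Pages with many incoming links play a role similar to highly cited papers in citation analysis, while pages with many outgoing links resemble directories or link lists.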
Web Usage Mining
Tries to predict user behavior from interaction with the web
Wide range of data
Two commonly used approaches:
a. Maps the usage data of the web server into relational tables before an adapted data mining technique is performed
b. Uses the log data directly by utilizing special pre-processing techniques
Problems:
a. Distinguishing among unique users, server sessions, and episodes in the presence of caching and proxy servers
b. Usage mining often relies on some background or domain knowledge
Applications
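The first approach and the session problem can be sketched together: parse server log lines into table-like rows, then group the rows into sessions. This sketch assumes Common Log Format input, treats each host address as one user, and uses a 30-minute inactivity timeout; all three are assumptions for illustration, and the host-as-user shortcut is exactly what caching and proxy servers undermine. The sample log lines are made up.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_RE = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def parse_line(line):
    """Turn one log line into a row (dict), or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    host, ts, method, url, status = m.groups()
    when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    return {"host": host, "time": when, "method": method,
            "url": url, "status": int(status)}


def sessionize(rows, timeout=timedelta(minutes=30)):
    """Group rows into sessions per host; a gap longer than timeout starts a new one."""
    sessions = []
    last_seen = {}  # host -> (time of last request, index into sessions)
    for row in sorted(rows, key=lambda r: r["time"]):
        prev = last_seen.get(row["host"])
        if prev is None or row["time"] - prev[0] > timeout:
            sessions.append([row])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx].append(row)
        last_seen[row["host"]] = (row["time"], idx)
    return sessions


# Hypothetical log excerpt.
log_lines = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:58:00 +0000] "GET /about.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2000:14:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:15:10:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
]
rows = [r for r in (parse_line(line) for line in log_lines) if r is not None]
sessions = sessionize(rows)
for s in sessions:
    print(s[0]["host"], [r["url"] for r in s])
```

The rows produced by `parse_line` could be loaded into a relational table as they are; `sessionize` stands in for the kind of pre-processing the second approach applies to the log directly.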
Survey of research in the area of web mining
Three web mining categories: content, structure, and usage mining
Connection between web mining categories and the related agent paradigm