Published by Clyde Chapman. Modified over 9 years ago.
1
WEB MINING
2
In recent years the growth of the World Wide Web has exceeded all expectations. Today there are several billion HTML documents, pictures, and other multimedia files available via the Internet, and the number is still rising. Given the sheer variety of the web, however, retrieving interesting content has become a very difficult task.
3
The Web is the single largest data source in the world
Due to heterogeneity and lack of structure of web data, mining is a challenging task
Multidisciplinary field: data mining, machine learning, natural language processing, statistics, databases, information retrieval, multimedia, etc.
4
Enormous wealth of information on the Web
Lots of data on user access patterns
Possible to mine interesting nuggets of information
5
Structured Data
Unstructured Data
OLE DB offers some solutions!
7
APPLICATIONS OF WEB MINING
E-commerce (Infrastructure)
  Generate user profiles
  Targeted advertising
  Fraud
  Similar image retrieval
Information retrieval (Search) on the Web
  Automated generation of topic hierarchies
  Web knowledge bases
  Extraction of schema for XML documents
Network Management
  Performance management
  Fault management
8
Service Provider Network
[Diagram: router, server]
Objective: To deliver content to users quickly and reliably
Traffic management
Fault management
9
Web Content Mining is the process of extracting knowledge from web contents
Examines the contents of web pages as well as the results of web searching
Can be thought of as extending the work performed by basic search engines
Search engines have crawlers to search the web and gather information, indexing techniques to store the information, and query processing support to provide information to the users
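The indexing and query processing steps mentioned above can be illustrated with a minimal sketch. It assumes the crawler has already fetched the pages; the page names and texts are made up for illustration, and `build_index` and `search` are hypothetical helper names, not part of any search engine's API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexing step: map every term to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Query processing step: return pages containing all query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled pages (stand-ins for what a crawler would gather).
pages = {
    "page_a": "Web mining and data mining",
    "page_b": "Web search engines",
}
index = build_index(pages)
print(search(index, "web mining"))
```

An inverted index like this only answers "which pages contain these words"; web content mining goes further by extracting knowledge from the page contents themselves.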
10
Many methods are designed to analyze structured data
If we can represent documents by a set of attributes, we will be able to use existing data mining methods
How to represent a document?
Vector based representation (referred to as "bag of words" as it is invariant to permutations)
Use statistics to add a numerical dimension to unstructured text
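A minimal sketch of the bag-of-words representation, assuming raw term counts as the statistic and a simple alphanumeric tokenizer (both are illustrative choices; the slides do not specify them). The function name `bag_of_words` and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Turn each document into a vector of term counts over a shared vocabulary."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# The second document is a permutation of the first, plus one repeated word.
docs = [
    "web mining extracts knowledge",
    "knowledge extracts mining web",
    "web web mining",
]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

The first two documents produce identical vectors because word order is discarded, which is the permutation invariance the slide refers to. Once documents are fixed-length numeric vectors, methods built for structured data can be applied to them.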
11
WEB USAGE MINING PROCESS