Published by Marian Dennis; modified over 8 years ago.
WEB USAGE MINING
Contents
- Web Mining
- Web Mining Taxonomy
- Web Usage Mining
- Web Analysis Tools
- Pattern Discovery Tools and Their Different Stages
- Pattern Analysis Tools and Techniques Employed
- Web Usage Mining Process
- Web Usage Mining Architecture
- Research Directions
- Conclusion
- References
Web Mining
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services.
Web mining research integrates ideas from several research communities, such as:
- Databases (DB)
- Information retrieval (IR)
- Sub-areas of machine learning (ML)
- Natural language processing (NLP)
Mining the World-Wide Web
The WWW is a huge, widely distributed, global information source for:
- Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
- Hyperlink information
- Access and usage information
- Web site contents and organization
Challenges in WWW Interactions
- Finding relevant information
- Creating knowledge from the information available
- Personalizing information
- Learning about customers and individual users
Web mining can play an important role!
Web Mining Taxonomy
Web Mining
- Web Content Mining
- Web Usage Mining
- Web Structure Mining
Web Usage Mining
Web usage mining, also known as Web log mining, applies data mining techniques to discover interesting usage patterns from secondary data derived from users' interactions while they surf the Web.
Web Usage Mining (contd.)
Organizations often generate and collect large volumes of data in their daily operations as users interact with their Web sites. Most of this information is generated automatically by Web servers and collected in server access logs. Other sources of user information include referrer logs, which contain information about the referring page for each page reference, and user registration or survey data gathered via tools such as CGI scripts. Analysis of server access logs and user registration data provides valuable information on how to better structure a Web site in order to create a more effective presence for the organization. Most existing Web analysis tools provide mechanisms for reporting user activity on the servers and various forms of data filtering.
Web Analysis Tools
Using these tools, it is possible to determine the number of accesses to the server and to individual files within the organization's Web space, the times or time intervals of visits, and the domain names and URLs of users of the Web server.
- Pattern Discovery Tools
- Pattern Analysis Tools
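A minimal sketch of how such per-file, per-host, and per-hour access counts can be computed, assuming the server writes the NCSA Common Log Format (an assumption; the slides do not name a log format). The regular expression, the function names `parse_line` and `summarize`, and the sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# NCSA Common Log Format (assumed): host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def summarize(lines):
    """Count accesses in total, per URL, per client host, and per hour of day."""
    records = [r for r in map(parse_line, lines) if r is not None]
    per_url = Counter(r["url"] for r in records)
    per_host = Counter(r["host"] for r in records)
    per_hour = Counter(r["time"].hour for r in records)
    return len(records), per_url, per_host, per_hour


# Hypothetical sample log lines
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /company/products/ HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:02 -0700] "GET /company/products/file1.html HTTP/1.0" 200 1042',
    'host.example.com - - [10/Oct/2000:14:01:10 -0700] "GET /company/products/ HTTP/1.0" 200 2326',
]
total, per_url, per_host, per_hour = summarize(sample)
```

Lines that do not match the pattern are skipped rather than raising, which is one reasonable choice for noisy real-world logs.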
Pattern Discovery Tools
Emerging tools for user pattern discovery use sophisticated techniques from AI, data mining, psychology, and information theory to mine knowledge from collected data.
- The WEBMINER system introduces a general architecture for Web usage mining. WEBMINER automatically discovers association rules and sequential patterns from server access logs.
- Pirolli et al. use information foraging theory to combine path traversal patterns, Web page typing, and site topology information to categorize pages for easier access by users.
Pattern Analysis Tools
Once access patterns have been discovered, analysts need appropriate tools and techniques to understand, visualize, and interpret them. Examples include:
- The WebViz system
- OLAP techniques, such as data cubes, for simplifying the analysis of usage statistics from server access logs
- The SQL-like query mechanism proposed by the WEBMINER system for querying the discovered knowledge
Pattern Discovery from Web Transactions
Preprocessing Tasks
- Data Cleaning
- Transaction Identification
Discovery Techniques on Web Transactions
- Path Analysis
- Association Rules
- Sequential Patterns
- Clustering and Classification
Preprocessing Tasks
Data Cleaning: techniques to clean a server log by eliminating irrelevant items. Irrelevant items can reasonably be eliminated by checking the suffix of the URL name; for example, all log entries with filename suffixes such as gif, jpeg, GIF, JPEG, jpg, JPG, and map can be removed.
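The suffix check described above can be sketched as a simple filter. Comparing in lower case makes `GIF`/`gif` and `JPG`/`jpg` equivalent; stripping any query string before the check is an added assumption, and the names `is_relevant` and `clean` are hypothetical.

```python
# Suffixes named on the slide; lower-casing the path makes GIF/gif, JPG/jpg, etc. equivalent.
IRRELEVANT_SUFFIXES = (".gif", ".jpeg", ".jpg", ".map")


def is_relevant(url):
    """Keep a log entry unless its path ends with an image or map suffix."""
    path = url.split("?", 1)[0]          # ignore any query string (assumption)
    return not path.lower().endswith(IRRELEVANT_SUFFIXES)


def clean(urls):
    """Return only the URLs that survive the suffix filter, in original order."""
    return [u for u in urls if is_relevant(u)]


requests = ["/company/index.html", "/img/logo.GIF", "/img/photo.jpg",
            "/nav/site.map", "/company/products/"]
print(clean(requests))  # ['/company/index.html', '/company/products/']
```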
Transaction Identification
Here, sequences of page references are grouped into logical units representing Web transactions or user sessions. Two types of transactions are defined:
- Navigation-content: each transaction consists of a single content reference and all of the navigation references in the traversal path leading to it. These transactions can be used to mine for path traversal patterns.
- Content-only: each transaction consists of all of the content references for a given user session. These transactions can be used to discover associations between the content pages of a site.
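A sketch of both transaction types, assuming the log has already been split into per-user sessions and that each page is already labelled as navigation or content (both are hypothetical inputs; how pages get classified is outside the slide). Where a "traversal path" begins is also an assumption here: it starts after the previous content reference.

```python
def content_only(session, content_pages):
    """Content-only transaction: every content reference in one user session."""
    return [page for page in session if page in content_pages]


def navigation_content(session, content_pages):
    """Navigation-content transactions: one per content reference, holding the
    navigation references seen since the previous content reference (an
    assumption about where a traversal path starts) plus that content page."""
    transactions, nav_path = [], []
    for page in session:
        if page in content_pages:
            transactions.append(nav_path + [page])
            nav_path = []
        else:
            nav_path.append(page)
    return transactions


# Hypothetical session and content/navigation labelling
session = ["/company", "/company/products", "/company/products/file1.html",
           "/company/products", "/company/products/file2.html"]
content = {"/company/products/file1.html", "/company/products/file2.html"}
print(navigation_content(session, content))
print(content_only(session, content))
```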
Discovery Techniques on Web Transactions
Path Analysis
A graph represents the physical layout of a Web site, with Web pages as nodes and hypertext links between pages as directed edges. Other graphs can be formed based on the types of Web pages, with edges representing similarity between pages, or with edges that give the number of users who go from one page to another. Path analysis can be used to determine the most frequently visited paths in a Web site.
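A minimal sketch of the "number of users who go from one page to another" graph and of frequent-path counting, treating each session as one user (an approximation) and each path as a run of consecutive page references of fixed length. The function names and sample sessions are hypothetical.

```python
from collections import Counter


def edge_weights(sessions):
    """Directed edge (a, b) -> number of sessions that move directly from a to b."""
    edges = Counter()
    for s in sessions:
        edges.update(set(zip(s, s[1:])))   # count each edge at most once per session
    return edges


def frequent_paths(sessions, length=3, top=3):
    """Most common contiguous page sequences of the given length."""
    paths = Counter()
    for s in sessions:
        for i in range(len(s) - length + 1):
            paths[tuple(s[i:i + length])] += 1
    return paths.most_common(top)


sessions = [
    ["/company", "/company/whatsnew", "/company/products", "/company/products/file1.html"],
    ["/company", "/company/whatsnew", "/company/products"],
    ["/company/products", "/company/products/file1.html"],
]
print(edge_weights(sessions)[("/company", "/company/whatsnew")])  # 2
print(frequent_paths(sessions, length=3, top=1))
```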
Path Analysis (contd.)
Other examples of information that can be discovered through path analysis:
- 70% of clients who accessed /company/products/file2.html did so by starting at /company and proceeding through /company/whatsnew, /company/products, and /company/products/file1.html;
- 80% of clients who accessed the site started from /company/products;
- 65% of clients left the site after four or fewer page references.
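Each of these statements is a percentage over sessions, so the arithmetic can be sketched with one helper. The sessions below are made up for illustration and do not reproduce the slide's 70%/80%/65% figures; the helper names are hypothetical.

```python
def share(sessions, predicate):
    """Percentage of sessions for which predicate(session) is true."""
    if not sessions:
        return 0.0
    return 100.0 * sum(1 for s in sessions if predicate(s)) / len(sessions)


def started_from(page):
    return lambda s: len(s) > 0 and s[0] == page


def left_within(n):
    return lambda s: len(s) <= n


def reached_via(sessions, target, path):
    """Among sessions that reach target, percentage whose pages before the
    first visit to target are exactly the given path."""
    reached = [s for s in sessions if target in s]
    return share(reached, lambda s: s[:s.index(target)] == path)


sessions = [  # hypothetical sessions
    ["/company/products", "/company/products/file1.html"],
    ["/company/products", "/company/products/file2.html", "/company/contact",
     "/company", "/company/whatsnew"],
    ["/company", "/company/whatsnew", "/company/products",
     "/company/products/file1.html", "/company/products/file2.html"],
    ["/company/products"],
]
print(share(sessions, started_from("/company/products")))  # 75.0
print(share(sessions, left_within(4)))                     # 50.0
print(reached_via(sessions, "/company/products/file2.html",
                  ["/company", "/company/whatsnew", "/company/products",
                   "/company/products/file1.html"]))        # 50.0
```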
Association Rules
This technique is generally applied to databases of transactions in which each transaction consists of a set of items. The problem is to discover all associations and correlations among data items. In Web usage mining, each transaction comprises the set of URLs accessed by a client in one visit to the server. For example, association rule discovery techniques can find correlations such as:
- 40% of clients who accessed the Web page with URL /company/products/product1.html also accessed /company/products/product2.html;
- 30% of clients who accessed /company/announcements/special-offer.html placed an online order in /company/products/product1.
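Statements of the form "X% of clients who accessed A also accessed B" correspond to the confidence of the rule A ⇒ B, while support is the fraction of all visits containing the items. A minimal sketch using the standard definitions; the visit data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    with_a = [t for t in transactions if a <= set(t)]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if both <= set(t)) / len(with_a)


P1 = "/company/products/product1.html"
P2 = "/company/products/product2.html"
visits = [                      # hypothetical: one set of URLs per client visit
    {P1, P2},
    {P1},
    {P1, P2, "/company/announcements/special-offer.html"},
    {"/company/announcements/special-offer.html"},
    {P1, "/company"},
]
print(support(visits, {P1, P2}))        # 0.4
print(confidence(visits, {P1}, {P2}))   # 0.5
```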
Association Rules (contd.)
Because such transaction databases usually contain extremely large amounts of data, current association rule discovery techniques prune the search space according to the support of the items under consideration. The support of an itemset is the fraction (or number) of user transactions in the log that contain it. Discovering such rules can help organizations engaged in electronic commerce develop effective marketing strategies.
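Apriori is the classic algorithm built on this support-based pruning; the slide does not name a specific algorithm, so treat this as a swapped-in illustration rather than the method used by any system mentioned here. The page names in the sample are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = [frozenset([item]) for item in set().union(*transactions)]
    frequent, k = {}, 1
    while candidates:
        # Count only the surviving candidates against the transaction log.
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = [c for c in joined
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return frequent


visits = [  # hypothetical visits
    {"/home", "/products", "/news"},
    {"/home", "/products"},
    {"/home", "/news"},
    {"/products", "/news"},
    {"/home", "/products", "/news"},
]
result = apriori(visits, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here every single page and every pair reaches the 0.6 threshold, but the three-page itemset (support 0.4) does not, so it is never reported.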
Sequential Patterns
The problem of discovering sequential patterns is to find inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp-ordered transaction set. By analyzing this information, a Web mining system can determine temporal relationships among data items, such as:
- 30% of clients who visited /company/products/ had done a search in Yahoo within the past week on keyword w;
- 60% of clients who placed an online order in /company/products/product1.html also placed an online order in /company1/products/product4 within 15 days.
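The second example can be checked with a simple "A followed by B within a time window" test per client. This sketch assumes each client's history is a list of (timestamp, URL) events, with orders modelled as events on the order URL; the data and function names are hypothetical, and this is a direct check of one pattern, not a general sequential-pattern mining algorithm.

```python
from datetime import datetime, timedelta


def followed_within(events, first, second, window):
    """True if some event on `first` is followed by an event on `second`
    strictly later but no more than `window` afterwards."""
    t_first = [t for t, url in events if url == first]
    t_second = [t for t, url in events if url == second]
    return any(t0 < t1 <= t0 + window for t0 in t_first for t1 in t_second)


def pattern_share(clients, first, second, window):
    """Among clients with an event on `first`, the percentage for whom
    `second` follows within `window`."""
    relevant = [ev for ev in clients.values() if any(url == first for _, url in ev)]
    if not relevant:
        return 0.0
    hits = sum(followed_within(ev, first, second, window) for ev in relevant)
    return 100.0 * hits / len(relevant)


P1 = "/company/products/product1.html"
P4 = "/company1/products/product4"
clients = {                                   # hypothetical (timestamp, URL) order events
    "c1": [(datetime(2000, 1, 1), P1), (datetime(2000, 1, 10), P4)],
    "c2": [(datetime(2000, 1, 1), P1), (datetime(2000, 2, 1), P4)],
    "c3": [(datetime(2000, 1, 5), P4)],
}
print(pattern_share(clients, P1, P4, timedelta(days=15)))  # 50.0
```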
Clustering and Classification
Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can then be used to classify new data items added to the database. Examples:
- clients from state or government agencies who visit the site tend to be interested in the page /company/products/product1.html;
- 50% of clients who placed an online order in /company/products/product2 were in the 20-25 age group and lived on the West Coast.
Clustering analysis allows one to group together clients or data items that have similar characteristics. Clustering client information or data items from Web transaction logs can facilitate the development and execution of future marketing strategies, both online and offline.
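For the clustering half, a minimal sketch groups clients by which pages they visited, using plain k-means on binary page-visit vectors. The slide does not name a clustering algorithm; k-means, the deterministic initialization (first k points), and all names and data below are assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_vector(pages, vocabulary):
    """Binary vector: 1.0 if the client visited that URL, else 0.0."""
    return [1.0 if url in pages else 0.0 for url in vocabulary]


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                       # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
    return labels, centroids


vocabulary = ["/products/p1", "/products/p2", "/jobs", "/about"]
clients = {"c1": {"/products/p1", "/products/p2"}, "c2": {"/jobs", "/about"},
           "c3": {"/products/p1"}, "c4": {"/jobs"}}
points = [to_vector(pages, vocabulary) for pages in clients.values()]
labels, _ = kmeans(points, k=2)
print(dict(zip(clients, labels)))  # {'c1': 0, 'c2': 1, 'c3': 0, 'c4': 1}
```

Product-oriented clients end up in one cluster and job/about visitors in the other; real deployments would typically weight pages and choose k more carefully.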
Analysis of Discovered Patterns
Web site administrators are extremely interested in questions like "How are people using the site?" and "Which pages are being accessed most frequently?". These questions require analysis of the structure of hyperlinks as well as the contents of the pages. The end products of such analysis might include:
1) the frequency of visits per document,
2) the most recent visit per document,
3) who is visiting which documents,
4) the frequency of use of each hyperlink, and
5) the most recent use of each hyperlink.
Techniques:
- Visualization Techniques
- OLAP Techniques
- Data & Knowledge Querying
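The five end products listed above can all be accumulated in one pass over the log. This sketch assumes records of the form (host, timestamp, url, referrer) and treats a request with a referrer as one use of the hyperlink referrer → url, since server logs do not record clicks directly; the record layout, function name, and data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def usage_report(records):
    """records: iterable of (host, timestamp, url, referrer) tuples.
    A request whose referrer is set counts as one use of the hyperlink
    referrer -> url (an assumption)."""
    visits, last_visit = Counter(), {}
    visitors = defaultdict(set)
    link_uses, last_link_use = Counter(), {}
    for host, ts, url, referrer in records:
        visits[url] += 1                                    # 1) frequency per document
        last_visit[url] = max(ts, last_visit.get(url, ts))  # 2) most recent visit
        visitors[url].add(host)                             # 3) who visits what
        if referrer:
            link = (referrer, url)
            link_uses[link] += 1                            # 4) hyperlink frequency
            last_link_use[link] = max(ts, last_link_use.get(link, ts))  # 5) most recent use
    return visits, last_visit, visitors, link_uses, last_link_use


records = [  # hypothetical log records
    ("h1", datetime(2000, 1, 1, 10, 0), "/company", None),
    ("h1", datetime(2000, 1, 1, 10, 1), "/company/products", "/company"),
    ("h2", datetime(2000, 1, 2, 9, 0), "/company", None),
    ("h2", datetime(2000, 1, 2, 9, 5), "/company/products", "/company"),
]
visits, last_visit, visitors, link_uses, last_link_use = usage_report(records)
```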
Visualization Techniques
Visualization has been used very successfully to help people understand various kinds of phenomena, both real and abstract. The WebViz system is used for visualizing WWW access patterns. WebViz lets the analyst selectively analyze the portion of the Web that is of interest by filtering out the irrelevant portions. The Web is visualized as a directed graph with cycles, where nodes are pages and edges are (inter-page) hyperlinks. The visualization is composed of two windows: the WebViz control window and the display window. The control window provides the analyst with controls to adjust the bindings, select a specific time to view, control the animation, and rearrange the layout. In the display window, a document's access frequency is represented by the width of its node, while the node's color represents its recency of access.
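The same visual encoding (node width for access frequency, color for recency) can be sketched by emitting Graphviz DOT text. This is not how WebViz itself is implemented; the width scaling, the color thresholds, and the names below are arbitrary assumptions for illustration.

```python
from datetime import datetime


def to_dot(links, hits, last_access, now, max_width=3.0):
    """Return Graphviz DOT text for a site graph. Node width grows linearly
    with access count; outline colour encodes recency (red <= 1 day,
    orange <= 7 days, grey otherwise). Thresholds are arbitrary choices."""
    top = max(hits.values(), default=0) or 1      # avoid dividing by zero
    lines = ["digraph site {"]
    for page in sorted(hits):
        width = 0.5 + (max_width - 0.5) * hits[page] / top
        age = (now - last_access[page]).days
        color = "red" if age <= 1 else "orange" if age <= 7 else "grey"
        lines.append(f'  "{page}" [width={width:.2f}, color={color}];')
    for src, dst in links:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


hits = {"/company": 10, "/company/products": 5}
last_access = {"/company": datetime(2000, 1, 10), "/company/products": datetime(2000, 1, 1)}
dot = to_dot([("/company", "/company/products")], hits, last_access, now=datetime(2000, 1, 10))
print(dot)
```

The output can be rendered with the Graphviz `dot` tool; an interactive viewer would also need the time filtering and animation the slide describes.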
OLAP Techniques
On-Line Analytical Processing (OLAP) is emerging as a very powerful paradigm for strategic analysis of databases in business settings. Key characteristics of strategic analysis include very large data volumes, explicit support for the temporal dimension, support for various kinds of information aggregation, and long-range analysis. This has led to the development of the data cube information model and techniques for its efficient implementation. Web usage data have much in common with data warehouse data, so OLAP techniques are quite applicable, although the issue needs further investigation.
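To show what a data cube contains, this sketch computes hit counts for every group-by over every subset of three dimensions. It is a naive in-memory illustration, not one of the efficient cube implementations the slide alludes to; the dimension names and rows are hypothetical.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("page", "day", "domain")   # hypothetical dimensions of a hit record


def data_cube(rows, dims=DIMENSIONS):
    """Hit counts for every group-by over every subset of dims
    (2**len(dims) cuboids). The empty group-by () holds the grand total."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            cube[group] = Counter(tuple(row[d] for d in group) for row in rows)
    return cube


rows = [
    {"page": "/p1", "day": "2000-01-01", "domain": ".edu"},
    {"page": "/p1", "day": "2000-01-01", "domain": ".com"},
    {"page": "/p2", "day": "2000-01-02", "domain": ".com"},
]
cube = data_cube(rows)
print(cube[()][()])                                    # 3
print(cube[("page",)][("/p1",)])                       # 2
print(cube[("day", "domain")][("2000-01-01", ".com")])  # 1
```

Rolling up (for example from page-by-day to page only) is then a lookup in a coarser cuboid rather than a new scan of the log.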
Data & Knowledge Querying
One reason for the great success of relational database technology has been the existence of a high-level, declarative query language, which allows an application to express what conditions must be satisfied by the data it needs, rather than having to specify how to get the required data. Such a declarative focus can be provided to the mining process in at least two ways. First, constraints may be placed on the database (perhaps in a declarative language) to restrict the portion that is mined. Second, querying may be performed on the knowledge that has been extracted by the mining process. An SQL-like querying mechanism has been proposed for the WEBMINER system.
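The second approach, querying extracted knowledge, can be sketched by storing discovered rules in a relational table and filtering them with ordinary SQL through Python's built-in `sqlite3` module. This is plain SQL, not WEBMINER's proposed query language; the table layout and support values are hypothetical, and the confidence values echo the slide's 40% and 30% examples.

```python
import sqlite3

# Hypothetical store of discovered association rules (values are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (antecedent TEXT, consequent TEXT, "
             "support REAL, confidence REAL)")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", [
    ("/company/products/product1.html", "/company/products/product2.html", 0.12, 0.40),
    ("/company/announcements/special-offer.html", "/company/products/product1", 0.05, 0.30),
    ("/company", "/company/whatsnew", 0.20, 0.15),
])

# Declarative query over the knowledge, not the raw log: state *which* rules are wanted.
strong_rules = conn.execute(
    """SELECT antecedent, consequent, confidence
       FROM rules
       WHERE support >= ? AND confidence >= ?
       ORDER BY confidence DESC""",
    (0.05, 0.30),
).fetchall()
for row in strong_rules:
    print(row)
```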
Web Usage Mining Process
Web Usage Mining Architecture
Research Directions
Web usage mining, which is just starting as an area of research, has a number of open issues. Some directions for future research:
- Data pre-processing for mining
- The mining process
- Analysis of mined knowledge
- WebSIFT example
WebSIFT Example
The Web Site Information Filter System (WebSIFT) is a Web usage mining framework that uses the content and structure information from a Web site to identify the interesting results from mining usage data.
- Input of the mining process: server logs (access, referrer, and agent), HTML files, and optional data.
- A prototypical Web usage mining system.
Conclusion
- Web usage mining, the use of data mining to find patterns in Web usage data, is a growing area as Web-based applications grow.
- Web usage data can be used to better understand how the Web is used and to apply this knowledge to better serve users.
- Web usage patterns and data mining can be the basis for a great deal of future research.
References
- J. Srivastava, R. Cooley, M. Deshpande, P.-N. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, Vol. 1, Issue 2, 2000. Dept. of CSE, University of Minnesota.
- Web Mining: Pattern Discovery from World Wide Web Transactions.
- R. Kosala, H. Blockeel. Web Mining Research: A Survey. Dept. of CS, Katholieke Universiteit Leuven.
- B. Mobasher, R. Cooley, J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.
- www.wikipedia.org
THANK YOU…