Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs
Data Engineering Lab, 성 유 진


Abstract
- Analyzing Web server log files can support server performance improvement, system performance improvement, and customer targeting in electronic commerce.
- Problem and difficulty: processing large volumes of raw log data is not easy, so the data must be reduced in size and in processing time.

WebLogMiner
- Current Web log analysis tools report only frequency counts, which is not enough. They are also slow, inflexible, and difficult to maintain.
- WebLogMiner (Virtual University / data mining) applies OLAP and data mining techniques to a multi-dimensional data cube. Its strengths are scalability, interactivity, variety, and flexibility.

Design of a Web Log Miner
A Web server log file records the following information for each request:
- the domain name of the request
- the user name
- the date and time of the request
- the method of the request (GET, POST)
- the name of the file requested
- the result of the request (success, failure, error, etc.)
- the size of the data sent back
- the URL of the referring page
- the identification of the client agent
Example:
    210.114.3.64 - - [01/Jul/1998:17:34:05 +0900] "GET /~yjsung/sign.html HTTP/1.1" 200 740
    210.114.3.64 - - [01/Jul/1998:17:38:44 +0900] "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352
- POST: used when the browser submits a filled-in form to the server.
- GET: used to request data from the server.
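The two example entries appear to follow the NCSA Common Log Format: host, identity, user, timestamp, request line, status, and bytes. The sketch below shows how those fields could be extracted in Python, assuming that format. It does not handle the referrer and agent fields of the extended "combined" format. The function name `parse_clf_line` is hypothetical.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "METHOD path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Parse one Common Log Format entry into a dict; return None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # e.g. "01/Jul/1998:17:34:05 +0900" -> timezone-aware datetime
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no body was sent back
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    line = ('210.114.3.64 - - [01/Jul/1998:17:34:05 +0900] '
            '"GET /~yjsung/sign.html HTTP/1.1" 200 740')
    rec = parse_clf_line(line)
    print(rec["host"], rec["method"], rec["path"], rec["status"], rec["size"])
```

Each parsed record is a plain dict, so it maps directly onto a row of the relational table built in the next stage.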

What the logged information can reveal:
- Sequence of requests: can be used to predict the next request => improve traffic.
- Cache information: frequent backtracking and reloading suggest a deficient site design. This requires a client-side log.
- Access count: not always a measure of interestingness. For example, a page that must be passed through to reach a particular document gets many accesses.
- Time and date: user interest can be evaluated from the time spent.
- Domain name
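The claim that a sequence of requests can predict the next request can be made concrete with a minimal sketch. It swaps in a first-order transition model, which counts which page follows which within sessions that are assumed to be already identified. It then predicts the most frequent successor. This is an illustrative technique, not necessarily the one used by WebLogMiner, and the page names are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is requested immediately after another, over all sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of `page`, or None if it was never followed by a request."""
    successors = transitions.get(page)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical sessions, each an ordered list of requested pages
    sessions = [
        ["/index.html", "/courses.html", "/syllabus.html"],
        ["/index.html", "/courses.html", "/grades.html"],
        ["/index.html", "/news.html"],
    ]
    model = build_transition_counts(sessions)
    print(predict_next(model, "/index.html"))
```

A server could use such a prediction to prefetch or cache the likely next page. That prefetching is how next-request prediction could improve traffic.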

WebLogMiner: 4 Stages
1. Filtering the data and creating a relational database
2. Data cube construction
3. OLAP is applied
4. Data mining techniques are applied

1. DATABASE CONSTRUCTION FROM SERVER LOG FILES
Data cleansing and transformation:
- Filter out requests for page graphics, but preserve sound and video.
- Two types of transformation:
  - Without knowledge about the site: for example, transforming the time into day, month, year, etc. is possible without any information about the server.
  - With knowledge about the site: associating a server request with the intended user action requires the site structure.
Relational database: the cleaned data is stored, and new implicit data is added.
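The cleansing step can be sketched as follows. Each record is assumed to be a dict with a `path` string and a `datetime` under `time`. The set of extensions treated as page graphics is an assumption. Only the site-independent transformation is shown, which derives year, month, day, weekday, and hour from the timestamp. Mapping requests to intended actions would also need the site structure.

```python
import os
from datetime import datetime

# Assumption: these extensions are treated as "page graphics"; sound and video files are kept.
GRAPHICS_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".xbm"}


def is_page_graphic(path):
    """True if the requested path (query string ignored) looks like an inline image."""
    bare_path = path.split("?", 1)[0]
    return os.path.splitext(bare_path)[1].lower() in GRAPHICS_EXTENSIONS


def clean_and_transform(records):
    """Drop page-graphics requests and add implicit time attributes derived from each timestamp."""
    rows = []
    for rec in records:
        if is_page_graphic(rec["path"]):
            continue
        ts = rec["time"]
        rows.append({**rec, "year": ts.year, "month": ts.month, "day": ts.day,
                     "weekday": ts.weekday(),  # 0 = Monday
                     "hour": ts.hour})
    return rows


if __name__ == "__main__":
    sample = [
        {"path": "/~yjsung/sign.html", "time": datetime(1998, 7, 1, 17, 34, 5)},
        {"path": "/images/logo.GIF", "time": datetime(1998, 7, 1, 17, 34, 6)},
        {"path": "/media/intro.wav", "time": datetime(1998, 7, 1, 17, 35, 0)},
    ]
    for row in clean_and_transform(sample):
        print(row["path"], row["year"], row["month"], row["day"], row["hour"])
```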

2. MULTI-DIMENSIONAL WEB LOG DATA CUBE CONSTRUCTION AND MANIPULATION
- The GROUP BY operator in SQL computes aggregates over a set of attributes. For example, the sum of sales by product P and customer C gives, for each product, a breakdown of how much of it was sold to each customer.
- CUBE is the n-dimensional generalization of GROUP BY: it computes the aggregates for every subset of the grouping attributes.
- This gives remarkable flexibility to manipulate and view the data. It also allows OLAP operations such as drill-down, roll-up, slice, and dice.

Attributes (cube dimensions), e.g.: URL, domain name, size of resource, time.
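CUBE computes one GROUP BY for every subset of the grouping attributes, so n dimensions yield 2^n aggregations. The sketch below is a minimal in-memory version over two of the attributes above, domain name and URL, and it sums the size of the resource. The record layout, the `*` marker for "all", and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away (SQL CUBE uses ALL/NULL)


def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, i.e. 2**len(dimensions) group-bys."""
    result = defaultdict(int)
    for k in range(len(dimensions) + 1):
        for group in combinations(dimensions, k):
            for row in rows:
                key = tuple(row[d] if d in group else ALL for d in dimensions)
                result[key] += row[measure]
    return dict(result)


if __name__ == "__main__":
    rows = [
        {"domain": "example.kr", "url": "/~yjsung/sign.html", "size": 740},
        {"domain": "example.kr", "url": "/cgi-bin/yjsung/sign", "size": 352},
        {"domain": "example.ca", "url": "/~yjsung/sign.html", "size": 740},
    ]
    for key, total in sorted(cube(rows, ["domain", "url"], "size").items()):
        print(key, total)
```

Rolling up from (domain, URL) to (domain, `*`) or drilling down the other way then means reading a different level of the same result. A real system would use the database's own CUBE support or precomputed materialized views rather than this loop.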

3. DATA MINING ON WEB LOG DATA CUBE AND WEB LOG DATABASE
- Data characterization: find rules that summarize a user-defined data set.
  ☞ e.g., the traffic on a Web server for a given type of media at a particular time of day
- Class comparison: discover discriminant rules.
  ☞ e.g., compare requests from two different Web browsers
- Association: discover patterns in which accesses to different resources consistently occur together.
- Prediction:
  ☞ e.g., accesses to a new resource on a given day can be predicted based on accesses to similar old resources on similar days
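The slide does not say which association algorithm is used. The sketch below illustrates the basic idea under that caveat. It counts how often two resources are requested in the same session and keeps pairs whose support reaches a threshold, where support is the fraction of sessions that contain both resources. This covers only the pair-counting core of Apriori-style mining, without confidence or larger itemsets. The sessions and paths are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Pairs of resources requested together in at least `min_support` of all sessions."""
    if not sessions:
        return {}
    counts = Counter()
    for pages in sessions:
        # set(): a resource requested twice in a session counts once for that session
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    sessions = [["/a", "/b", "/c"], ["/a", "/b"], ["/b", "/c"], ["/a", "/b", "/a"]]
    print(frequent_pairs(sessions, 0.5))
```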

Classification and Time-Series Analysis
- Classification: can be used to develop a better understanding of each class in the Web log database. It may also help restructure a Web site, or customize answers to requests based on classes of requests.
- Time-series analysis: analyze data along time sequences to discover time-related interesting patterns …
  ☞ e.g., disclose the patterns and trends in the improvement of the Web server's services
The focus will be on time-series analysis, because Web log records are highly time-related.
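A basic time-series view of a Web log is the number of requests per day, smoothed to expose a trend. The sketch below assumes a list of `datetime` request timestamps. It fills in days with no requests and applies a trailing moving average. The window size is an arbitrary choice, and the sketch is not presented as WebLogMiner's actual time-series method.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_counts(timestamps):
    """(day, request count) for every calendar day from the first to the last request."""
    counts = Counter(ts.date() for ts in timestamps)
    if not counts:
        return []
    start, end = min(counts), max(counts)
    return [(start + timedelta(days=i), counts[start + timedelta(days=i)])
            for i in range((end - start).days + 1)]


def moving_average(values, window):
    """Trailing moving average; the first window - 1 positions have no full window and are omitted."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


if __name__ == "__main__":
    stamps = [datetime(1998, 7, 1, 9), datetime(1998, 7, 1, 10), datetime(1998, 7, 3, 11)]
    series = daily_counts(stamps)
    print(series)
    print(moving_average([count for _, count in series], 2))
```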

Experiments with the Web Log Miner
- Virtual-U: six different major components.
- Goal: understand the usage and user behavior patterns.
- Data cleaning and transformations: all entries were mapped one-to-one into relational database fields, and site and user-action fields are added.
- Problems:
  - extraneous information => define those entries and eliminate them
  - multiple server requests generated by the same user action
  - the same server request generated by multiple user actions
  - local activities are not recorded
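One simple heuristic for the "multiple server requests by the same user action" problem is to merge consecutive requests from the same host that arrive within a short gap. This sketch rests on several assumptions. Records are dicts with `host`, `time`, and `path`. The 2-second gap is arbitrary, the `frame.html` path is hypothetical, and grouping by host ignores proxies and shared machines. It is not the procedure reported in the experiments.

```python
from datetime import datetime, timedelta
from itertools import groupby


def group_into_actions(records, max_gap=timedelta(seconds=2)):
    """Group each host's requests into bursts separated by more than `max_gap`.

    Heuristic: one burst approximates one user action that triggered several
    server requests (e.g. a page and its frames).
    """
    actions = []
    ordered = sorted(records, key=lambda r: (r["host"], r["time"]))
    for _, host_records in groupby(ordered, key=lambda r: r["host"]):
        current = []
        for rec in host_records:
            if current and rec["time"] - current[-1]["time"] > max_gap:
                actions.append(current)
                current = []
            current.append(rec)
        if current:
            actions.append(current)
    return actions


if __name__ == "__main__":
    records = [
        {"host": "210.114.3.64", "time": datetime(1998, 7, 1, 17, 34, 5), "path": "/~yjsung/sign.html"},
        {"host": "210.114.3.64", "time": datetime(1998, 7, 1, 17, 34, 6), "path": "/~yjsung/frame.html"},
        {"host": "210.114.3.64", "time": datetime(1998, 7, 1, 17, 38, 44), "path": "/cgi-bin/yjsung/sign"},
    ]
    for action in group_into_actions(records):
        print([r["path"] for r in action])
```

The opposite problem, where the same server request is generated by multiple user actions, cannot be resolved from timing alone. It would need the site structure or the referrer field.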

Multi-dimensional data cube construction and manipulation
- Summarization (group-bys on different dimensions): request, domain, event, session, bandwidth, error, referring organization, and browser summaries.
- Examples: Fig. 2) OLAP analysis of Web log

Fig. 3) Typical event sequence and user behavior pattern analysis
Fig. 4) Web traffic analysis of Web log

Fig. 6) Event trees of months one to four

Discussion and Conclusion
- WebLogMiner applies OLAP and data mining techniques to a multi-dimensional data cube. Its major strengths are scalability, interactivity, variety, and flexibility.
- Problems with current log files: the Web server should collect more information, and a new log structure is needed => this would simplify pre-processing.