Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web logs Data Engineering Lab 성 유 진.

1 Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web logs
Data Engineering Lab 성 유 진

2 Abstract
Web server log file analysis
- server performance improvement
- system performance improvement
- customer targeting in electronic commerce
Problem and difficulty
- processing large raw log data is not easy
- the data must be reduced to cut size and processing time

3 WebLogMiner
Current web log analysis tools
- slow, inflexible, difficult to maintain
- only frequency counts → not enough
WebLogMiner
- Virtual University / data mining
- OLAP and data mining techniques
- multi-dimensional data cube
- scalability, interactivity, variety, flexibility

4 Design of a Web log Miner
Web log: server log file information
- domain name of the request
- user name
- date and time of the request
- method of the request (GET, POST)
- name of the file requested
- result of the request (success, failure, error, etc.)
- size of the data sent back
- URL of the referring page
- identification of the client agent
Example
[01/Jul/1998:17:34: ] "GET /~yjsung/sign.html HTTP/1.1"
[01/Jul/1998:17:38: ] "POST /cgi-bin/yjsung/sign HTTP/1.1"
→ POST: used when the browser sends a filled-in form to the server
→ GET: used when requesting data from the server
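The fields listed above can be pulled out of a log line with a regular expression. The sketch below assumes the server writes the common "combined" log layout; the slide's own example lines are truncated, so the sample line, host name, and user name here are invented for illustration only.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method path protocol" status size
# optionally followed by "referrer" "agent". This is an assumption about
# the server configuration, not something stated on the slide.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %b expects English month abbreviations (default C locale).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical sample line, not taken from the slides.
sample = ('client.example.org - alice [02/Jul/1998:09:15:30 +0000] '
          '"GET /docs/index.html HTTP/1.1" 200 2048 '
          '"http://www.example.org/" "Mozilla/4.0"')
record = parse_line(sample)
print(record["method"], record["path"], record["status"], record["size"])
```

Lines that do not fit the assumed layout return None, so they can be counted and inspected separately rather than silently dropped.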

5
Cache information
- frequent backtracking and reload: deficient design
- client-side log
Access count
- not always a measure of interestingness
- some pages must be passed through in order to reach a particular document
Time and date
- evaluate user interest by time spent
Domain name
Sequence of requests
- can predict the next request → improve traffic
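One simple way to predict the next request from a sequence of requests is to count which page most often follows each page. This is a first-order transition count, chosen here only as an illustration; the slides do not say which prediction method is used, and the session data below is made up.

```python
from collections import Counter, defaultdict

def train_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if unseen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical sessions: each list is one user's ordered page requests.
sessions = [
    ["/index.html", "/courses.html", "/course1.html"],
    ["/index.html", "/courses.html", "/course2.html"],
    ["/index.html", "/courses.html", "/course1.html"],
    ["/index.html", "/help.html"],
]
model = train_transitions(sessions)
print(predict_next(model, "/index.html"))    # /courses.html
print(predict_next(model, "/courses.html"))  # /course1.html
```

A server could use such a prediction to prefetch or cache the likely next page, which is one reading of "improve traffic" on the slide.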

6 WebLogMiner 4 Stages
1. Filtering the data, creating a relational DB
2. Data cube construction
3. OLAP is used
4. Data mining techniques are used

7 1. DATABASE CONSTRUCTION FROM SERVER LOG FILES
Data cleansing and transformation
- page graphics (and sound and video) are usually filtered out, but are preserved here
- two types of transformation
  - without knowledge about the site (e.g. transforming time into day, month, year can be done without server information)
  - with knowledge about the site: associating a server request with its intended action requires the site structure
Relational database
- cleaned data and new implicit data are added
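The site-independent kind of transformation can be sketched as follows: time fields are derived from the timestamp, and a media type is derived from the file extension so that graphics, sound, and video entries are labeled rather than discarded. The extension table and field names are assumptions made for this example.

```python
import posixpath
from datetime import datetime

# Hypothetical extension-to-media mapping; a real site would need its own.
MEDIA_TYPES = {
    ".html": "page", ".htm": "page",
    ".gif": "image", ".jpg": "image",
    ".wav": "sound", ".au": "sound",
    ".mpg": "video", ".avi": "video",
}

def transform(entry):
    """Add implicit fields that need no knowledge of the site structure."""
    timestamp = entry["time"]
    extension = posixpath.splitext(entry["path"])[1].lower()
    return {
        **entry,
        "year": timestamp.year,
        "month": timestamp.month,
        "day": timestamp.day,
        "hour": timestamp.hour,
        "weekday": timestamp.weekday(),  # 0 = Monday
        "media": MEDIA_TYPES.get(extension, "other"),
    }

# Hypothetical cleaned entries.
entries = [
    {"path": "/docs/index.html", "time": datetime(1998, 7, 1, 17, 34)},
    {"path": "/images/logo.GIF", "time": datetime(1998, 7, 1, 17, 35)},
    {"path": "/cgi-bin/sign", "time": datetime(1998, 7, 2, 9, 0)},
]
rows = [transform(e) for e in entries]
for row in rows:
    print(row["path"], row["month"], row["day"], row["hour"], row["media"])
```

The site-dependent kind (mapping a request to the user's intended action) is not shown, since it requires a description of the site structure that the slides do not give.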

8 2. MULTI-DIMENSIONAL WEB LOG DATA CUBE CONSTRUCTION AND MANIPULATION
- the GROUP BY operator in SQL computes aggregates over a set of attributes
  - sum of sales by P, C: for each product, give a breakdown of how much of it was sold to each customer
- CUBE is the n-dimensional generalization of GROUP BY
- gives remarkable flexibility to manipulate and view the data
- allows OLAP operations such as drill-down, roll-up, slice and dice
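To make the "n-dimensional generalization of group-by" concrete, the sketch below computes a sum for every subset of the dimensions, using "*" to mean "all values" of a rolled-up dimension. It is a naive in-memory illustration with made-up sales rows, not how a real OLAP engine materializes a cube.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # placeholder meaning "aggregated over every value"

def cube(rows, dims, measure):
    """Sum `measure` for every group-by over every subset of `dims`."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

# Hypothetical sales by product (P) and customer (C).
sales = [
    {"product": "P1", "customer": "C1", "amount": 10},
    {"product": "P1", "customer": "C2", "amount": 5},
    {"product": "P2", "customer": "C1", "amount": 7},
]
result = cube(sales, ["product", "customer"], "amount")
print(result[("P1", "C1")])  # 10  (group by product, customer)
print(result[("P1", ALL)])   # 15  (roll-up over customer)
print(result[(ALL, "C1")])   # 17  (roll-up over product)
print(result[(ALL, ALL)])    # 22  (grand total)
```

Moving from ("P1", "*") to ("P1", "C1") corresponds to a drill-down, and the reverse to a roll-up; fixing one dimension to a single value corresponds to a slice.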

9 Attributes
- URL
- domain name
- size of resource
- time

10 3. DATA MINING ON WEB LOG DATA CUBE AND WEB LOG DATABASE
Data characterization
- find rules that summarize a user-defined data set
☞ the traffic on a web server for a given type of media at a particular time of day
Class comparison
- discover discriminant rules
☞ compare requests from two different web browsers
Association
- discover patterns in which accesses to different resources consistently occur together
Prediction
☞ access to a new resource on a given day can be predicted based on accesses to similar old resources on similar days
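As an illustration of the association task, the sketch below counts how often pairs of resources appear in the same session and reports support and confidence for each frequent pair. Pairwise counting is a simplification chosen for brevity; the slides do not name a specific association algorithm, and the sessions are invented.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b on page pairs."""
    total = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(set(session))  # ignore repeats within a session
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / total
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical sessions: pages requested by four visitors.
sessions = [
    ["/index.html", "/lecture.html", "/quiz.html"],
    ["/index.html", "/lecture.html", "/quiz.html"],
    ["/index.html", "/lecture.html"],
    ["/index.html"],
]
rules = pair_rules(sessions)
print(rules[("/quiz.html", "/lecture.html")])   # (0.5, 1.0)
print(rules[("/index.html", "/lecture.html")])  # (0.75, 0.75)
```

Here every session that requested the quiz also requested the lecture page (confidence 1.0), which is the kind of "consistently occurring together" pattern the slide describes.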

11
Classification
- can be used to develop a better understanding of each class in the web log database, and perhaps to restructure a web site or customize answers to requests based on classes of requests
Time-series analysis
- analyze data along time sequences to discover time-related interesting patterns …
☞ disclose the patterns and trends of the improvement of services of the web server
Focus will be on time-series analysis because web log records are highly time-related
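A very small example of analysis along the time dimension: count requests per day, then smooth the daily counts with a moving average to expose a trend. The moving average is just one common smoothing choice, used here as an assumption; the slides do not specify the time-series technique, and the numbers are made up.

```python
from collections import Counter
from datetime import date

def daily_counts(request_dates):
    """Return (day, number of requests) pairs sorted by day.

    Days with no requests are simply absent; they are not filled with zero.
    """
    return sorted(Counter(request_dates).items())

def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window <= 0 or window > len(values):
        return []
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

# Hypothetical request dates taken from parsed log entries.
dates = [date(1998, 7, 2), date(1998, 7, 1), date(1998, 7, 2)]
print(daily_counts(dates))

# Hypothetical hits per day over one week.
hits = [120, 130, 125, 160, 170, 180, 210]
print(moving_average(hits, 3))
```

A rising smoothed series would suggest growing use of the site over the period, while weekly dips would only become visible with a window shorter than seven days.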

12 Experiments with the web log miner
Virtual-U: six different major components
Goal
- understand the usage and user behavior patterns
Data cleaning and transformation
- all entries were mapped one-to-one into the relational database
- site and user action fields were added
Problems
- extraneous information => identify those entries and eliminate them
- multiple server requests caused by the same user action
- the same server request caused by multiple user actions
- local activities are not recorded

13

14 Multi-dimensional data cube construction and manipulation
Summarization (group-bys on different dimensions)
- request / domain / event / session / bandwidth / error / referring organization / browser summaries
Examples
Figure 2) OLAP analysis of Web log

15
Fig 3) Typical event sequence and user behavior pattern analysis
Fig 4) Web traffic analysis of Web log

16

17
Fig 6) Event trees of month one to four

18 Discussion and Conclusion
WebLogMiner
- OLAP and data mining techniques
- multi-dimensional data cube
Major strengths
- scalability, interactivity, variety, flexibility
Problems with current log files
- web servers should collect more information
- a new structure is needed ==> would simplify pre-processing

