Web-Mining …searching for the knowledge on the Internet… Marko Grobelnik Institut Jožef Stefan.



Outline
– What is Web-Mining?
– Typical problems, methods and solutions:
  – Customer profiling (most important task!)
  – Real time analysis of the data (stream mining)
  – Web visualization
– Who are the most important players in the area?
– Where to get more info on Web-Mining?
– …instead of conclusion…

What is Web-Mining?
– From information on the Web to knowledge of the Web!
– Web-Mining is currently one of the most prosperous subareas of Data-Mining
  – …the goal is to understand complex dynamic data in order to make our business e.g. more profitable or more efficient
– Web-Mining is defined by a set of typical problems, methods and solutions
  – …the most typical problem is the analysis and profiling of web customers based on web-server log files

Web-Customer profiling
– Customer profiling is the most important application of Web-Mining:
  – …the goal is to better understand the behavior of our web customers in order to optimize our e-services
  – The problem is how to get quality data about our users
  – …an even bigger problem is how to analyze the data…

Main source of the data: Log files
– The main source of data about the activity of our web server is its Log files
– Typical line of a Log file:
  – :13: W3SVC1 ASPIRE GET /KddGarden/Grouper/Grouper.zip HTTP/1.1 aspire.ijs.si Mozilla/4.0+(compatible;+MSIE+5.5;+Windows+NT+5.0) -
– E.g. Log files on WinNT/2000 reside in the \winnt\system32\logfiles\ system directory
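A minimal sketch of how such a Log file could be read, assuming the server writes the W3C extended log format, in which a "#Fields:" directive names the space-separated columns. The field list, date and client address in the sample are made up for illustration; only the request path and browser string come from the slide.

```python
def parse_w3c_log(lines):
    """Yield one dict per request from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Version, #Date, ...)
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed or truncated line; skip it
        yield dict(zip(fields, values))


# Hypothetical sample: the field selection, date and IP are assumptions.
sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent) cs(Referer)",
    "2001-05-14 10:13:02 192.0.2.7 GET /KddGarden/Grouper/Grouper.zip 200 "
    "Mozilla/4.0+(compatible;+MSIE+5.5;+Windows+NT+5.0) -",
]
records = list(parse_w3c_log(sample))
```

Each record is a plain dictionary keyed by field name, so later steps (identification, click-stream building) do not depend on the column order chosen by the server administrator.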

Customer identification
– The most common ways of identifying customers are:
  – Cookies – information saved by a foreign web server on the user's local disk, usually the first time the web service is used
  – Username and password (explicit identification) – information entered by the user at each use of the e-service
– …web customer identification cannot be solved optimally (for all situations)
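Cookie-based identification can be sketched roughly as follows, using Python's standard http.cookies module. The cookie name "visitor_id", the one-year lifetime and the function name are assumptions made for this illustration, not part of the original slides.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value or None).

    cookie_header is the raw value of the request's Cookie header
    (or None). A Set-Cookie value is returned only for new visitors.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None  # returning visitor

    # First visit: issue a random identifier and ask the browser to keep it.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # assumed lifetime
    out[COOKIE_NAME]["path"] = "/"
    return visitor_id, out[COOKIE_NAME].OutputString()


new_id, set_cookie = identify_visitor(None)
same_id, nothing = identify_visitor(COOKIE_NAME + "=" + new_id)
```

This also shows why identification is never perfect: a user who clears cookies or switches browsers simply receives a new identifier.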

Additional customer information
– What else do we know about the web customer/user?
  – The URL of the web page from which the user came to our web server, written in the Referrer field of the Log file
  – The sequence of URLs or web services visited by the user (click-stream data), based on the Referrer field or Session-Id
  – How much time the user spent on a web page
  – The contents of the web page read by the user (text)
  – …from additional sources we know the history of the users in the form of past actions (purchases, visits, habits)
  – …sometimes we have some demographic data etc.
– All the available information is hard to use in analysis
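The click-stream and time-on-page items above can be sketched as a small sessionization step. This is one possible approach, assuming each hit already carries a user identifier and a timestamp in seconds; the 30-minute inactivity gap is a common convention, not something stated in the slides.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that end a session (assumption)


def sessionize(hits, gap=SESSION_GAP):
    """Group hits into sessions.

    hits: iterable of (user_id, timestamp_seconds, url).
    Returns a list of (user_id, [(url, seconds_on_page or None), ...]).
    Time on page is the delay until the user's next hit; it is unknown
    (None) for the last page of a session.
    """
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))

    sessions = []
    for user, events in by_user.items():
        events.sort()
        current = []
        for i, (ts, url) in enumerate(events):
            next_ts = events[i + 1][0] if i + 1 < len(events) else None
            if next_ts is not None and next_ts - ts <= gap:
                current.append((url, next_ts - ts))
            else:
                # Last hit of the session: close it and start a new one.
                current.append((url, None))
                sessions.append((user, current))
                current = []
    return sessions


# Made-up hits for illustration.
example_hits = [
    ("u1", 0, "/a"), ("u1", 60, "/b"), ("u1", 5000, "/c"), ("u2", 10, "/a"),
]
example_sessions = sessionize(example_hits)
```

Note that the time spent on the last page of a session cannot be derived from the log alone, which is one reason the available information is hard to use in analysis.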

Data analysis methods
– Log files include sequences of events (click-streams):
  – …methods for analyzing event sequences are usually classical Data-Mining methods for very large databases, modified for this purpose
  – The basic methods are modified versions of association rule induction, clustering and decision trees
– Other analytic methods come from the areas of Text-Mining, Statistics and Machine-Learning
– …not enough time for details…

What kind of problems do we solve?
– Personalization of web services:
  – Preparing offers (discounts, products, contents) customized for each particular user
– Understanding what is going on at the web server:
  – Identification of customer groups and behavioral patterns
  – …the goal is to better organize web services
– Better selection of “Banner Ads” to increase the probability that the user clicks on them
  – …it is not hard to increase the probability by several hundred percent
– Building psychological profiles based on the texts read by the user
  – …to get more info about the user than he has about himself

Association rules in Web-logs
– Searching for rules that connect two or more events:
  – 60% of the users that visited URL/company/product also visited company/product/product1.html
  – 30% of the users that visited URL/company/special-offer/ also visited company/product2.html
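A rule such as "60% of the users that visited A also visited B" is a statement about the confidence of the rule A → B. The sketch below counts pairwise rules over sessions by brute force; it is an illustration of the idea rather than a scalable algorithm such as Apriori, and the thresholds, function name and sample sessions are assumptions.

```python
from collections import Counter
from itertools import permutations


def pair_rules(sessions, min_support=0.2, min_confidence=0.5):
    """Find rules "visited a -> also visited b" over a list of sessions.

    sessions: list of iterables of page URLs (one per user or visit).
    Returns (a, b, support, confidence) tuples, most confident first.
    """
    n = len(sessions)
    single = Counter()
    pair = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        single.update(pages)
        pair.update(permutations(sorted(pages), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n             # share of sessions with both pages
        confidence = count / single[a]  # share of a-sessions that also hit b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Made-up sessions: 3 of the 5 visitors of /company/product also saw product1.
example = [
    ["/company/product", "/company/product/product1.html"],
    ["/company/product", "/company/product/product1.html"],
    ["/company/product", "/company/product/product1.html"],
    ["/company/product"],
    ["/company/product", "/company/special-offer/"],
]
found = {(a, b): conf for a, b, _, conf in pair_rules(example)}
```

Support filters out rules that rest on very few sessions, while confidence is the percentage quoted in rules of the kind shown above.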

Profiling using time dimension
– Searching for rules that connect two or more events, taking the time dimension into account:
  – 30% of the users that visited URL/company/product/product1.html also searched for the words W1 and W2 on Yahoo in the last week
  – 60% of the users that ordered product1 also ordered product2 within the next 15 days
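The second kind of rule, where one event follows another within a time window, can be checked with a short sketch like the one below. The record layout (user, product, datetime) and the function name are assumptions for illustration; the 15-day window follows the example on the slide.

```python
from datetime import datetime, timedelta


def followup_rate(orders, first, then, window=timedelta(days=15)):
    """Share of users who ordered `then` within `window` after `first`.

    orders: iterable of (user, product, datetime).
    The window is measured from each user's earliest order of `first`.
    """
    first_times = {}
    then_times = {}
    for user, product, when in orders:
        if product == first:
            first_times[user] = min(when, first_times.get(user, when))
        elif product == then:
            then_times.setdefault(user, []).append(when)

    if not first_times:
        return 0.0
    hits = sum(
        1
        for user, t0 in first_times.items()
        if any(t0 < t <= t0 + window for t in then_times.get(user, []))
    )
    return hits / len(first_times)


# Made-up order history for five users.
start = datetime(2001, 1, 1)
orders = [(u, "product1", start) for u in "abcde"]
orders += [(u, "product2", start + timedelta(days=10)) for u in "abc"]
orders += [("d", "product2", start + timedelta(days=20))]  # too late
rate = followup_rate(orders, "product1", "product2")
```

With these made-up orders, three of the five product1 buyers order product2 inside the window, so the rule holds for 60% of them.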

Classification rules
– Identification of behavior for groups of users (additional information can be obtained from cookies, registration, etc.):
  – Users that frequently visit the page /company/products/product3.html are from educational institutions
  – 50% of the users that visited /company/products/product4.html are in the age group of … and live on the sea coast

Real-Time Data-Analysis
– At some web servers there are too many hits to be saved and analyzed off-line:
  – …we have a data stream with no time or space for off-line data analysis (e.g. search engines, shops, banks, news, …)
  – …we would like to understand what is going on in order to detect e.g. anomalies or changes in trends
– The solution is to use a special type of method for online event analysis:
  – The methods are able to analyze non-stationary data
  – At each moment the results (models) are in human-readable form (e.g. decision trees, rules, …)
  – …no need to save Log files
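The slides do not name a specific online method, so the following is only a stand-in: an exponentially weighted moving average and variance over hits per time interval, which keeps a constant-size model, adapts to slowly drifting (non-stationary) traffic and never stores the Log file. The class name, smoothing factor, threshold and warm-up length are all assumptions.

```python
class StreamMonitor:
    """Flag unusual values in a stream using an exponentially weighted
    mean and variance; memory use is constant, no history is stored."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight of the newest observation
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # observations to see before flagging
        self.mean = 0.0
        self.var = 0.0
        self.seen = 0

    def update(self, value):
        """Return True if `value` looks anomalous, then absorb it."""
        anomalous = False
        if self.seen >= self.warmup:
            std = max(self.var ** 0.5, 1e-9)
            anomalous = abs(value - self.mean) > self.threshold * std

        if self.seen == 0:
            self.mean = float(value)
        else:
            diff = value - self.mean
            incr = self.alpha * diff
            self.mean += incr
            # Standard incremental update of an exponentially weighted variance.
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.seen += 1
        return anomalous


# Made-up hits-per-minute counts followed by a sudden burst.
monitor = StreamMonitor()
normal_flags = [monitor.update(v) for v in [100, 102, 98, 101, 99, 100, 101, 99]]
burst_flag = monitor.update(500)
```

The model here is just two numbers (current mean and variance), which keeps it human-readable at every moment in the sense of the slide, though far simpler than incrementally built decision trees or rules.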

Web visualization
– Usually we try to solve two problems:
  – Network visualization
  – Web-Server contents visualization
– Network visualization is in general impossible; a good partial solution is hyperbolic visualization
– The contents of a large document set can be visualized by creating a knowledge map

Network visualization

Document contents visualization

Who are the most important players in the Web-Mining area?
– Several smaller companies solving partial, focused problems:
  – …
– Bigger companies started offering products only recently, usually as more expensive solutions:
  – Microsoft (Analysis Server – OLE DB for DM)
  – SAS (Enterprise Miner)
  – IBM (Intelligent Miner, DB2+extender)

Where to get more info on Web-Mining?
– Good overview of the companies from the area:
– WebKDD workshops with on-line accessible papers:
– Books:
  – Data Mining Your Website - Jesus Mena
  – Web-Mining for Profit: E-Business Optimization - Jesus Mena

…instead of conclusion…
– Web-Mining should be used by everybody who offers services on the web and is not satisfied with simple access statistics!
– The idea is to make something more out of the data already collected by your computer.
– It is expected that Web-Mining will soon become a standard part of a typical web solution.

Tnx!