An OLAM Framework for Web Usage Mining and Business Intelligence Reporting Xiaohua (Tony) Hu Drexel University Philadelphia, PA, 19104.


Outline
– Introduction
– Data Capture
– Data Webhouse Construction
– Mining, OLAP and Business Reporting
– Pattern Evaluation and Development
– Q&A

Benefits of Web Usage Mining
– Targeting customers based on usage behavior or profile (personalization)
– Adjusting web content and structure dynamically based on users’ page access patterns (adaptive web site)
– Enhancing service quality and delivery to the end user (cross-selling, up-selling)
– Improving web server system performance based on web traffic analysis
– Identifying hot areas/killer areas of the web site

Web Usage Mining Steps
1. Data capture (clickstream, sales, customers, products, promotions, shipping, etc.)
2. Data processing: ETL from OLTP to DW
3. Pattern discovery, OLAP cubes and reports
4. Pattern evaluation and deployment

Data Capture
– Web server logs recording the visitors’ clickstream behavior (page template, cookie, transfer log, time stamp, IP address, agent, referrer, etc.)
– Product information (product hierarchy, manufacturer, price, color, size, etc.)
– Content information of the web site (image, gif, video clip, etc.)
– Customer purchase data (quantity of the products, payment amount and method, shipping address, etc.)
– Customer demographic information (age, gender, income, education level, lifestyle, etc.)

Issues in Clickstream Capture
– Distinguish sessions
– Use cookies to track customers
– Tag templates
– Log business events
– Record query strings
– Detect crawlers

What Kind of Clickstream Information Needs to Be Recorded?
– Request (Click) Data:
  Template, Product, Assortment
  Time stamps for each click, compile and execution times
  Query string information, referring page information
  The request sequence number within a session
– Cookie Data:
  The cookie of the visitor (this ID is temporary if the user has cookies turned off)
– Session Data:
  Session length
  Browser (user agent) and IP address information for the client
  User’s Cookie ID
  User ID of the user if he/she logged in
  Whether or not the session timed out
  The total number of requests in the session
  Whether the session belongs to a user who “opts out”
  The total number of sessions that have come from users with this Cookie ID
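The request and session fields listed above can be derived from raw clicks once sessions are distinguished. The following is a minimal sketch, not the method used in the presentation: it assumes each click is a `(cookie_id, timestamp_seconds, template)` tuple and that a session ends after 30 minutes of inactivity. The tuple layout, the `sessionize` name and the timeout value are all illustrative assumptions.

```python
from collections import defaultdict

# Assumed inactivity cutoff; the slides do not state a timeout value.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(clicks, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (cookie_id, timestamp_seconds, template) clicks into sessions.

    A new session starts when the gap since the same visitor's previous
    click exceeds `timeout`. Every click gets a request sequence number
    within its session.
    """
    by_cookie = defaultdict(list)
    for cookie_id, ts, template in clicks:
        by_cookie[cookie_id].append((ts, template))

    sessions = []
    for cookie_id, events in by_cookie.items():
        events.sort()  # order each visitor's clicks by timestamp
        current = None
        last_ts = None
        for ts, template in events:
            if current is None or ts - last_ts > timeout:
                current = {"cookie_id": cookie_id, "requests": []}
                sessions.append(current)
            seq = len(current["requests"]) + 1
            current["requests"].append((seq, ts, template))
            last_ts = ts

    # Derive two of the session-level fields named on the slide.
    for s in sessions:
        s["session_length"] = s["requests"][-1][1] - s["requests"][0][1]
        s["request_count"] = len(s["requests"])
    return sessions


# Hypothetical clicks for illustration only.
clicks = [
    ("c1", 0, "main"),
    ("c1", 60, "pna"),
    ("c1", 4000, "main"),  # more than 30 minutes later: starts a new session
    ("c2", 10, "login"),
]
sessions = sessionize(clicks)
```

A cookie-based key only works when the visitor accepts cookies; as the slide notes, the ID is temporary otherwise, so a real pipeline would need a fallback such as IP address plus user agent.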

Web Log Data
– Designed for debugging purposes, not for analysis

Crawler Sessions
– Crawlers are programs that visit your site, such as search engine crawlers and shopping bots
– It is very important to filter out crawler sessions (on some of our clients’ sites, crawler sessions account for up to 30%)

Techniques to Identify Crawler Sessions
– Build a model to identify crawler sessions. Typical signals:
  crawlers commonly turn off images
  they have empty referrers
  friendly bots visit the robots.txt file
  the page hit rate is too fast
  the access pattern is a depth-first or breadth-first search of the site
  bots never purchase
– Create invisible links in the web page (a human visitor cannot see them, so typically only a crawler follows them)
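The signals above can be combined in many ways. The slide speaks of building a model; the sketch below substitutes a much simpler rule count, purely for illustration. The session field names (`image_requests`, `visited_robots_txt`, and so on), the hit-rate limit and the score threshold are hypothetical, and the depth-first/breadth-first pattern check is left out.

```python
# Assumed limit on page hits per second; not a value from the slides.
MAX_HITS_PER_SECOND = 2.0


def crawler_score(session):
    """Count how many of the listed crawler signals a session matches."""
    score = 0
    if session.get("image_requests", 0) == 0:
        score += 1  # crawlers commonly turn off images
    if not session.get("referrer"):
        score += 1  # empty referrer
    if session.get("visited_robots_txt", False):
        score += 1  # friendly bots fetch robots.txt
    duration = max(session.get("duration_seconds", 0), 1)  # avoid dividing by zero
    if session.get("page_hits", 0) / duration > MAX_HITS_PER_SECOND:
        score += 1  # page hit rate is too fast
    if session.get("followed_invisible_link", False):
        score += 1  # followed a link that human visitors cannot see
    if not session.get("purchased", False):
        score += 1  # bots never purchase
    return score


def is_probable_crawler(session, threshold=3):
    """Flag a session when at least `threshold` signals match (assumed cutoff)."""
    return crawler_score(session) >= threshold


# Hypothetical sessions for illustration only.
bot = {"image_requests": 0, "referrer": "", "visited_robots_txt": True,
       "page_hits": 200, "duration_seconds": 20, "purchased": False}
shopper = {"image_requests": 12, "referrer": "http://example.com",
           "page_hits": 8, "duration_seconds": 300, "purchased": True}
```

Counting signals treats every heuristic as equally strong. A trained classifier, as the slide suggests, would instead learn weights from sessions already labeled as crawler or human.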

OLTP vs DSS

| OLTP | DSS |
|---|---|
| Daily operation | Analysis |
| Many small transactions | Few large transactions |
| Need quick response | Very time consuming |
| Read & write (insert, delete, update) | Mostly read only |
| Session and product centric | Customer centric |

What is OLAM?
– OLAP (On-Line Analytical Processing): pre-calculates summary information to enable drilling, pivoting, slicing/dicing and filtering, so that the business can be analyzed from multiple angles or views (dimensions)
– OLAM (On-Line Analytical Mining): an integration of data mining, data warehousing and OLAP technologies
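To make roll-up, drill-down and slicing concrete, here is a small sketch in plain Python. It aggregates on demand rather than pre-calculating summaries as an OLAP server would, and the fact rows, dimension names and the `roll_up` and `slice_facts` helpers are invented for illustration.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_facts(facts, **fixed):
    """Keep only rows where each named dimension equals the given value."""
    return [r for r in facts if all(r[k] == v for k, v in fixed.items())]


# Hypothetical sales facts; none of these numbers come from the slides.
facts = [
    {"region": "East", "brand": "A", "month": 1, "revenue": 100.0},
    {"region": "East", "brand": "B", "month": 1, "revenue": 50.0},
    {"region": "West", "brand": "A", "month": 1, "revenue": 70.0},
    {"region": "East", "brand": "A", "month": 2, "revenue": 30.0},
]

by_region = roll_up(facts, ["region"], "revenue")                  # roll-up
by_region_brand = roll_up(facts, ["region", "brand"], "revenue")   # drill-down
january_by_brand = roll_up(slice_facts(facts, month=1), ["brand"], "revenue")  # slice
```

Drilling down is simply the same aggregation with one more dimension in the key, which is why a cube can answer questions from several angles over the same facts.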

Data Webhouse Construction
– Requirement Analysis of the Data Webhouse
– Data Webhouse Schema Design
– Dimensions, Fact Tables, Aggregation/Summary Tables

Requirement Analysis of the Data Webhouse
1. Web site activity (hourly, daily, weekly, monthly, quarterly, etc.)
2. Product sales (by region, by brand, by domain, by browser type, by time, etc.)
3. Customers (by type, by age, by gender, by region, buyer vs. visitor, heavy buyer vs. light buyer, etc.)
4. Vendors (by type, by region, by price range, etc.)
5. Referrers (by domain, by sale amount, by number of visits, etc.)
6. Navigational behavior patterns (top entry page, top exit page, killer page, hot page, etc.)
7. Click conversion ratio
8. Shipments (by regular mail, by express mail, etc.)
9. Payments (by cash, by credit card, by e-money, etc.)

Data Webhouse Schema Design
– Define the Source Data
– Choose the Grain of the Fact Tables
– Choose the Dimensions Appropriate for the Grain
– Choose the Facts Appropriate for That Grain

Appropriate Dimensions
– Session Dimension
– Page Dimension
– Time Dimension
– User Dimension
– Product Dimension
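The design steps (grain, dimensions, facts) can be illustrated with a tiny star schema. This sketch uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`click_fact`, `session_dim`, `page_dim`, `seconds_on_page`) are hypothetical, not the schema from the presentation, and the grain is assumed to be one row per page request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two dimension tables, reduced to a few attributes each.
CREATE TABLE session_dim (
    session_key   INTEGER PRIMARY KEY,
    referrer      TEXT,
    purchase_flag INTEGER
);
CREATE TABLE page_dim (
    page_key      INTEGER PRIMARY KEY,
    page_template TEXT,
    page_type     TEXT
);
-- Fact table at the assumed grain: one row per page request.
CREATE TABLE click_fact (
    session_key     INTEGER REFERENCES session_dim(session_key),
    page_key        INTEGER REFERENCES page_dim(page_key),
    request_seq     INTEGER,
    seconds_on_page INTEGER
);
""")

# Made-up rows for illustration only.
conn.executemany("INSERT INTO session_dim VALUES (?, ?, ?)",
                 [(1, "search", 1), (2, "", 0)])
conn.executemany("INSERT INTO page_dim VALUES (?, ?, ?)",
                 [(10, "main", "home"), (11, "checkout", "order")])
conn.executemany("INSERT INTO click_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 1, 20), (1, 11, 2, 40), (2, 10, 1, 5)])

# Join the fact table to its dimensions and aggregate the measures.
rows = conn.execute("""
    SELECT p.page_template, s.purchase_flag,
           COUNT(*) AS views, SUM(f.seconds_on_page) AS seconds
    FROM click_fact AS f
    JOIN page_dim AS p ON p.page_key = f.page_key
    JOIN session_dim AS s ON s.session_key = f.session_key
    GROUP BY p.page_template, s.purchase_flag
    ORDER BY p.page_template, s.purchase_flag
""").fetchall()
```

Fixing the grain first determines which dimensions can attach to the fact table: at page-request grain, session and page keys are meaningful on every row, whereas an order-level attribute would not be.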

Session Attributes
– Session Length
– Referrer
– Agent
– Host Name
– IP Address
– Cookie_id
– First Request Time
– Last Request Time
– Average Time Per Page
– Purchase Flag
– Time Out Flag
– Many more …

Customer Attributes
– Address: City, State/Province, Country
– Gender, Age, Profession, Education, Marital Status
– Contact Info: , Phone
– Repeat Visit Flag
– Frequent Buyer Flag
– Heavy Spender Flag
– Reader/Browser Flag
– Many more …

Page Attributes
– Page Template
– Page Location
– Page Type
– Page Category
– Page Description
– Registration Page Flag
– Shipping Page Flag
– Checkout Page Flag
– Many more …

Promotion Attributes
– Promotion Name
– Price Reduction Percentage
– Adv Type
– Coupon Type
– Begin Date
– End Date
– Promotion Region
– Many more …

Date Attributes
– Day, Week, Month, Quarter, Year
– Day Number in Month, Day Number in Quarter, Day Number in Year
– Week Number in Month, Week Number in Quarter, Week Number in Year
– Weekday Flag
– Weekend Flag
– Season
– Many more …

Time Attributes
– Second, Minute, Hour
– Early Morning Flag
– Late Afternoon Flag
– Lunch Time Flag
– Dinner Time Flag
– Late Evening Flag
– Many more …
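Most of the date and time attributes can be computed from a single timestamp when the dimension rows are loaded. The sketch below uses only the standard `datetime` module. The function names and the hour ranges behind the time-of-day flags are assumptions, since the slides do not define them, and the week number follows the ISO 8601 convention.

```python
from datetime import datetime


def date_attributes(dt):
    """Derive a subset of the date-dimension attributes from a datetime."""
    quarter = (dt.month - 1) // 3 + 1
    quarter_start = datetime(dt.year, 3 * (quarter - 1) + 1, 1)
    return {
        "day": dt.day,
        "month": dt.month,
        "quarter": quarter,
        "year": dt.year,
        "day_number_in_year": dt.timetuple().tm_yday,
        "day_number_in_quarter": (dt - quarter_start).days + 1,
        "week_number_in_year": dt.isocalendar()[1],  # ISO 8601 week number
        "weekday_flag": dt.weekday() < 5,   # Monday=0 ... Friday=4
        "weekend_flag": dt.weekday() >= 5,  # Saturday=5, Sunday=6
    }


def time_attributes(dt):
    """Derive time-of-day attributes; every hour range here is an assumption."""
    h = dt.hour
    return {
        "hour": h,
        "minute": dt.minute,
        "second": dt.second,
        "early_morning_flag": 5 <= h < 8,
        "lunch_time_flag": 11 <= h < 14,
        "late_afternoon_flag": 16 <= h < 18,
        "dinner_time_flag": 18 <= h < 21,
        "late_evening_flag": 21 <= h < 24,
    }


# Arbitrary example timestamps.
wednesday_noon = datetime(2003, 4, 16, 12, 30, 15)
saturday_evening = datetime(2003, 4, 19, 20, 0, 0)
d = date_attributes(wednesday_noon)
t = time_attributes(wednesday_noon)
```

Storing such flags as columns trades a little space for much simpler queries: a report on lunch-time traffic becomes a filter on one column instead of a range expression repeated in every query.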

OLAP
– View data from multiple views and angles
– Immediate response to business queries
– Ability to drill down and roll up the multidimensional data in the cube
– Analyze business measures such as profit, revenue and quantity from different angles, perspectives and factors

Some Fact Tables
– MINE_ORDERS_CLICKS_GIFTS: contains a row for each order line, clickstream request, and gift registry entry. It is the union of the MINE_ORDER_LINES, MINE_CLICK_LINES, and MINE_GIFT_LINES tables and is used as the fact table when mining a combination of order and clickstream data. Since different columns apply to different types of line items, each column is marked with the applicable type(s) (order, click, gift, or all).
– MINE_ORDERS_ACXIOM: MINE_ORDER_HEADERS joined with MINE_CUSTOMERS, MINE_ACXIOM, MINE_PROMOTIONS
– MINE_LINE_ITEMS: MINE_ORDER_LINES joined with MINE_CUSTOMERS, MINE_ORDER_HEADERS, MINE_PRODUCTS, MINE_ASSORTMENTS, MINE_PROMOTIONS

Some Dimension and Summary Tables in Webhouse

| Table | Contents |
|---|---|
| MINE_CLICK_LINES | a row for each Web page viewed |
| MINE_ACXIOM | a row for each customer for which the system was able to find Acxiom data |
| MINE_SESSIONS | a row for each Web session |
| MINE_ASSORTMENTS | a row for each assortment folder, assortment, and sub-assortment defined in the system |
| MINE_CUSTOMERS | a row for each customer |
| MINE_GIFT_HEADERS | a gift row for each customer |
| MINE_GIFT_LINES | a row for each gift registry item of each customer |
| MINE_ORDER_LINES | a row for each order line of each order |
| MINE_ORDER_HEADERS | a row for each order of each customer |
| MINE_PROMOTIONS | a row for each promotion folder and promotion defined in the system |

Search Argument Findings

Top 20 Paths Leading to Non-Purchase Sessions

| Path | Count |
|---|---|
| main | |
| main->main | 3731 |
| main->main->main | 790 |
| main->main->login | 329 |
| main->main->main->main | 303 |
| login | 274 |
| main->main->pna->pna | 216 |
| pna | 212 |
| main->main->pna->pna->pna | 192 |
| main->main->eDealer | 185 |
| mc | 180 |
| main->main->pna | 175 |
| main->main->pna->pna->pna->pna->pna | 169 |
| main->main->pna->pna->pna->pna->pna->pna | 166 |
| main->main->pna->pna->pna->pna->pna->pna->pna | 160 |
| main->main->pna->pna->pna->pna | 147 |
| main->main->mc->mc->mc->mc | 131 |
| main->main->pna->pna->pna->pna->pna->pna->pna->pna | 118 |
| main->main->mc->mc->mc | 111 |
| main->main->pna->pna->pna->pna->pna->pna->pna->pna->pna | 106 |

Top 20 Paths That Start at OF_Main.jsp and Exit at OF_Main.jsp

| Path | Count |
|---|---|
| OF_Main.jsp->splash.jsp->OF_Main.jsp | 154 |
| OF_Main.jsp->OF_Main.jsp | 122 |
| OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp | 52 |
| OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 28 |
| OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 25 |
| OF_Main.jsp->OF_Main.jsp->splash.jsp->OF_Main.jsp | 23 |
| OF_Main.jsp->splash.jsp->pna/pa_main.jsp->OF_Main.jsp | 16 |
| OF_Main.jsp->splash.jsp->login/ln_login.jsp->OF_Main.jsp | 15 |
| OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 13 |
| OF_Main.jsp->splash.jsp->mc/MC_main.jsp->OF_Main.jsp | 13 |
| OF_Main.jsp->splash.jsp->dealer_positioning.jsp->OF_Main.jsp | 11 |
| OF_Main.jsp->splash.jsp->pna/pa_main.jsp->pna/pa_family.jsp->OF_Main.jsp | 11 |
| OF_Main.jsp->splash.jsp->login/ln_login.jsp->login/ln_loginopp.jsp->login/ln_message.jsp->OF_Main.jsp | 10 |
| OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 9 |
| OF_Main.jsp->splash.jsp->cart/sc_listing.jsp->OF_Main.jsp | 7 |
| OF_Main.jsp->splash.jsp->login/ln_login.jsp->login/ln_login_step.jsp->OF_Main.jsp | 7 |
| OF_Main.jsp->browser_message.jsp->OF_Main.jsp | 6 |
| OF_Main.jsp->dealer_positioning.jsp->OF_Main.jsp | 5 |
| OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 5 |
| OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp | 5 |
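Path reports like the two above can be produced by joining each session's page sequence into a string and counting identical strings. This is one plausible way to do it, not necessarily how the presented figures were computed. The `top_paths` helper and the sample sessions are hypothetical.

```python
from collections import Counter


def top_paths(sessions, n=20, purchased=None):
    """Return the n most frequent page paths as (path, count) pairs.

    `sessions` is an iterable of (pages, did_purchase) pairs. When
    `purchased` is True or False, only sessions with that outcome count.
    """
    counts = Counter()
    for pages, did_purchase in sessions:
        if purchased is not None and did_purchase != purchased:
            continue
        counts["->".join(pages)] += 1
    return counts.most_common(n)


# Made-up sessions for illustration only.
sample_sessions = [
    (["main", "main"], False),
    (["main", "main"], False),
    (["main", "main", "login"], False),
    (["main", "pna", "cart"], True),
]
non_purchase_paths = top_paths(sample_sessions, purchased=False)
```

Filtering on entry and exit page, as in the second report, would only need an extra condition on `pages[0]` and `pages[-1]` before counting.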

Single/Multiple visitors/buyers

Web Usage Mining Methods
– Construct cubes from the data webhouse: roll up and drill down the OLAP cubes to find the top domains, top products, top hot spots, web activity, most frequently accessed time periods, etc.
– Perform data mining on the data webhouse: find association patterns for cross-sell and up-sell, build links between pages, find sequential patterns and trends in web access, and improve system design through web caching, web page prefetching and web page swapping

Mining the Web Data
– Association Rules
– Classification/Prediction
– Clustering

Data Mining - Association
– Path/link analysis: explore, understand and predict browsing patterns
– Shopping cart analysis: cross-sell and up-sell to increase wallet share

Gloss Example
Columns: Relations, Lift, Support(%), Confidence(%), Rule
Rules:
– Bloom ==> Dirty_Girl
– Dirty_Girl ==> Bloom
– Philosophy ==> Bloom
– Bloom ==> Philosophy
– Dirty_Girl ==> Blue_Q
– Blue_Q ==> Dirty_Girl
– Tony_And_Tina ==> Girl
– Philosophy ==> Tony_And_Tina
– Tony_And_Tina ==> Philosophy
– Demeter_Fragrances ==> Smell_This
– Girl ==> Tony_And_Tina
– Smell_This ==> Demeter_Fragrances
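The lift, support and confidence columns follow the standard definitions for a rule A ==> B: support is the fraction of baskets that contain both A and B, confidence is the fraction of baskets with A that also contain B, and lift is confidence divided by the overall fraction of baskets that contain B. The sketch below computes them for single-item rules. The baskets and item names are invented and do not reproduce the figures from the Gloss example.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Compute support, confidence and lift for antecedent ==> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_consequent = sum(1 for b in baskets if consequent in b)

    support = both / n
    confidence = both / with_antecedent if with_antecedent else 0.0
    # Lift > 1 means the two items co-occur more often than independence predicts.
    lift = confidence / (with_consequent / n) if with_consequent else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical shopping baskets; item names are placeholders.
baskets = [
    {"item_a", "item_b"},
    {"item_a", "item_b"},
    {"item_a"},
    {"item_c"},
]
metrics = rule_metrics(baskets, "item_a", "item_b")
```

Note that lift is symmetric (A ==> B and B ==> A share the same value) while confidence is not, which is why a report like the one above lists both directions of a pair as separate rules.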

Data Mining - Classification
– Understand customers via rules, trees, etc.
– Build prediction models for target-oriented marketing campaigns

Data Mining - Clustering
– Discover groups/segments of customers with similar behavior or profiles

Questions?