LinkSelector: Select Hyperlinks for Web Portals
Prof. Olivia Sheng, Xiao Fang
School of Accounting and Information Systems, University of Utah

2 Agenda
Introduction
Problem definition: hyperlink selection
Solution: LinkSelector
Evaluation
Collaboration

3 Introduction
Size of the WWW: more than 3 billion web pages (Google.com, 2001), with 1 million pages added daily (Lawrence and Giles, 1999).
How to find information on the Web: using search engines (best coverage: 38.3%; Lawrence and Giles, 1999) or clicking through hyperlinks.

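4 Introduction
Figure: finding a product by clicking through hyperlinks. Web Page 1 lists product categories A to F; clicking on A opens Web Page 2, which lists the products in category A (A1 to A5); clicking on A2 opens Web Page 3, which shows product A2 (price: 1000) with a detailed description.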

5 Introduction
A portal page is a specific web page that serves as the entrance to a website. Portal pages are important and consist mainly of hyperlinks.

6 Introduction
A web portal is a personalized entrance to a website (e.g., My Yahoo!).
A default web portal acts as a portal page: most My Yahoo! users never customize their default web portals (Manber et al., 2000).

7 Introduction
The homepage of a website as its portal page

8 Introduction
Not all hyperlinks in a website can be placed in the website's portal page.
The hyperlinks in a portal page are selected from a hyperlink pool, which is a set of hyperlinks pointing to top-level web pages (e.g., the hyperlinks in a site index page).

9 Portal page

10 Hyperlink pool

11 Portal page

12 Hyperlink pool

13 Introduction
Number of hyperlinks in a portal page: one to several dozen (e.g., 14 in My Yahoo!) (Neilson, 1999).
Number of hyperlinks in a hyperlink pool: one to several hundred (e.g., 102 in My Yahoo!).

14 Introduction
An exhaustive search over all possible portal pages is too computationally expensive (e.g., ).
Current practice of hyperlink selection is expert selection, based on domain experts' experience. It is subjective and slower to adapt.
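To make the cost of exhaustive search concrete: every subset of N hyperlinks drawn from the pool HP is a candidate portal page, so there are C(|HP|, N) candidates. The sketch below counts them using the My Yahoo! figures from slide 13 (14 portal hyperlinks, 102 in the pool) as illustrative inputs; the function name `count_candidate_portal_pages` is hypothetical.

```python
from math import comb


def count_candidate_portal_pages(pool_size: int, portal_size: int) -> int:
    """Number of distinct portal pages an exhaustive search would have to evaluate."""
    if not 0 <= portal_size <= pool_size:
        raise ValueError("portal_size must be between 0 and pool_size")
    # Every subset of portal_size hyperlinks drawn from the pool is a candidate.
    return comb(pool_size, portal_size)


# Illustrative inputs taken from slide 13: 14 portal hyperlinks, 102 in the pool.
candidates = count_candidate_portal_pages(102, 14)
print(f"{candidates:.2e} candidate portal pages")
```

The count is on the order of 10^16, before a single candidate has been scored against a web log.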

15 Introduction
Our approach is based on:
Web access patterns extracted from a web log, which are objective because they reflect web surfers' actual visiting behavior;
Web structural patterns extracted from an existing website, which are objective and dynamically adaptive.

16 Hyperlink Selection
Three metrics measure the quality of a portal page: effectiveness, efficiency, and usage.
The quality of a portal page is measured using a web log, which can be divided into sessions.
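The slides do not say how the web log is divided into sessions. A common heuristic, assumed here purely for illustration, starts a new session for a user whenever the gap since that user's previous request exceeds an inactivity timeout. The 30-minute timeout, the `(user_id, timestamp, url)` record layout, and the name `split_into_sessions` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inactivity timeout; the slides do not state the session rule.
SESSION_TIMEOUT_SECONDS = 30 * 60


def split_into_sessions(records, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (user_id, timestamp_seconds, url) records into sessions of URLs.

    A new session starts for a user whenever the gap since that user's
    previous request exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, url in records:
        by_user[user_id].append((timestamp, url))

    sessions = []
    for requests in by_user.values():
        requests.sort()  # chronological order within each user
        current = []
        last_time = None
        for timestamp, url in requests:
            if last_time is not None and timestamp - last_time > timeout:
                sessions.append(current)
                current = []
            current.append(url)
            last_time = timestamp
        if current:
            sessions.append(current)
    return sessions


log = [("u1", 0, "/a"), ("u1", 600, "/b"), ("u1", 5000, "/c"), ("u2", 100, "/a")]
sessions = split_into_sessions(log)
print(sessions)  # [['/a', '/b'], ['/c'], ['/a']]
```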

17 Hyperlink Selection
Effectiveness is the percentage of user-sought top-level web pages that can be easily accessed from a portal page.
Efficiency measures the usefulness of the hyperlinks placed in a portal page.
Usage measures how often a portal page is visited.
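The slides define effectiveness only in words. The sketch below computes one plausible reading of it from sessionized data, under two assumptions that are not stated in the slides: a top-level page is "sought" in a session if the session visits it, and it is "easily accessed" if the portal page links to it directly. Efficiency and usage are left out because their exact formulas are not given. The function name is hypothetical.

```python
def effectiveness(sessions, portal_links, hyperlink_pool):
    """Percentage of user-sought top-level pages that the portal page links to.

    Assumptions (hypothetical): a pool page is "sought" if a session visits it,
    and "easily accessed" if the portal page contains a hyperlink to it.
    """
    portal = set(portal_links)
    pool = set(hyperlink_pool)
    sought = 0
    reachable = 0
    for session in sessions:
        pages = set(session) & pool  # top-level pages sought in this session
        sought += len(pages)
        reachable += len(pages & portal)
    return 100.0 * reachable / sought if sought else 0.0


pool = ["/a", "/b", "/c", "/d"]
sessions = [["/a", "/b"], ["/c"], ["/a", "/x"]]  # "/x" is not a top-level page
print(effectiveness(sessions, portal_links=["/a", "/c"], hyperlink_pool=pool))  # 75.0
```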

18 Hyperlink Selection
Given the hyperlink pool of a website, HP, and the number of hyperlinks to be placed in the website's portal page, N, where N < |HP|:
construct the portal page by selecting N hyperlinks from the hyperlink pool HP.
Objective: optimize the effectiveness, efficiency, and usage of the resulting portal page.
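The same problem statement can be written as a constrained subset selection. The symbol P for the selected set of hyperlinks is introduced here for notation only; the slides do not say how the three objectives are combined, so they are listed side by side.

```latex
\begin{aligned}
&\text{Given: } HP \text{ and } N, \text{ with } N < |HP| \\
&\text{Find: } P \subseteq HP, \quad |P| = N \\
&\text{Objective: optimize } \bigl(\mathrm{Effectiveness}(P),\ \mathrm{Efficiency}(P),\ \mathrm{Usage}(P)\bigr)
\end{aligned}
```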

19 LinkSelector
LinkSelector is based on two kinds of relationships between hyperlinks in a hyperlink pool: structure relationships and access relationships.

20 LinkSelector: Structure Relationship
Figure: hyperlinks L1 to L8 distributed over Web page 1, Web page 2, and Web page 3.
Structure relationship: L1 → L2, where L1 is the initial hyperlink and L2 is the terminal hyperlink.
Other structure relationships: L1 → L4, L1 → L6, L1 → L8, L3 → L5, L3 → L7.
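One reading of the figure that reproduces exactly the relationships listed above: a structure relationship from an initial hyperlink to a terminal hyperlink exists when the terminal hyperlink appears on the page the initial hyperlink points to. This rule, and the page contents below, are assumptions made for illustration rather than the definition given in the slides; the function and dictionary names are hypothetical.

```python
def structure_relationships(link_targets, page_links):
    """Return (initial, terminal) hyperlink pairs.

    Assumed rule: initial -> terminal exists when the terminal hyperlink
    appears on the page that the initial hyperlink points to.
    link_targets: hyperlink -> page it points to
    page_links:   page -> hyperlinks that appear on that page
    """
    pairs = []
    for initial, target_page in link_targets.items():
        for terminal in page_links.get(target_page, []):
            pairs.append((initial, terminal))
    return pairs


# Hypothetical page layout consistent with the relationships on slide 20.
link_targets = {"L1": "Web page 2", "L3": "Web page 3"}
page_links = {
    "Web page 1": ["L1", "L3"],
    "Web page 2": ["L2", "L4", "L6", "L8"],
    "Web page 3": ["L5", "L7"],
}
for initial, terminal in structure_relationships(link_targets, page_links):
    print(f"{initial} -> {terminal}")
```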

21 LinkSelector: Access Relationship
A k-HS is a hyperlink set with k hyperlinks; e.g., {L1, L2} is a 2-HS.
The support of a k-HS is the percentage of sessions in which the hyperlinks in the k-HS are accessed together.
Example: if L1 and L2 are accessed together in 20 out of a total of 100 sessions, the support of the 2-HS {L1, L2} is 20%.
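The support definition translates directly into code: count the sessions that contain every hyperlink in the set and divide by the total number of sessions. The session list below is a toy log built to reproduce the slide's 20-out-of-100 example; the function name `support` is hypothetical.

```python
def support(hyperlink_set, sessions):
    """Fraction of sessions in which every hyperlink in hyperlink_set is accessed."""
    if not sessions:
        return 0.0
    target = set(hyperlink_set)
    hits = sum(1 for session in sessions if target <= set(session))
    return hits / len(sessions)


# Toy log: L1 and L2 are accessed together in 20 of 100 sessions.
sessions = [["L1", "L2"]] * 20 + [["L1"]] * 30 + [["L3"]] * 50
print(f"{support({'L1', 'L2'}, sessions):.0%}")  # 20%
```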

22 LinkSelector: Access Relationship
Definition: for a k-HS (where ), an access relationship exists among the hyperlinks in the k-HS if and only if its support is greater than a predefined threshold.
Example: if the threshold is 0.15 and the support of the 2-HS {L1, L2} is 0.2, then an access relationship exists between hyperlinks L1 and L2, and the support of that relationship is 0.2.
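The definition can be checked by brute force on a small pool: enumerate every k-HS and keep those whose support is strictly greater than the threshold, matching the "greater than" in the definition. This enumeration is only an illustration of the definition; slide 23 states that access relationships are discovered with association rule mining instead. The function names and toy data are hypothetical.

```python
from itertools import combinations


def support(hyperlink_set, sessions):
    """Fraction of sessions in which every hyperlink in hyperlink_set is accessed."""
    target = set(hyperlink_set)
    hits = sum(1 for session in sessions if target <= set(session))
    return hits / len(sessions) if sessions else 0.0


def access_relationships(pool, sessions, k, threshold):
    """Return {k-HS: support} for every k-HS whose support exceeds threshold.

    Brute force: checks every k-subset of the pool, so only suitable for tiny pools.
    """
    result = {}
    for hs in combinations(sorted(pool), k):
        s = support(hs, sessions)
        if s > threshold:
            result[frozenset(hs)] = s
    return result


sessions = [["L1", "L2"]] * 20 + [["L1", "L3"]] * 10 + [["L3"]] * 70
result = access_relationships({"L1", "L2", "L3"}, sessions, k=2, threshold=0.15)
print(result)  # only {L1, L2} qualifies, with support 0.2
```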

23 LinkSelector
Discover structure relationships: parse the existing website.
Discover access relationships: data preprocessing (web log cleaning and session identification), followed by association rule mining (Agrawal and Srikant, 1994).
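Agrawal and Srikant (1994) is the paper that introduced the Apriori algorithm. Below is a minimal sketch of Apriori's frequent-itemset phase applied to sessions, treating each session as a set of accessed hyperlinks; rule generation is omitted, and how LinkSelector then uses the resulting hyperlink sets is not shown. The strict "greater than" threshold follows the access-relationship definition on slide 22; function names and data are hypothetical.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset: support} for every hyperlink set with support > min_support."""
    transactions = [frozenset(s) for s in sessions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        supports = {c: support(c) for c in candidates}
        return {c: s for c, s in supports.items() if s > min_support}

    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(level)
    k = 1
    while level:
        # Join frequent k-sets into (k+1)-candidates, then prune any candidate
        # with an infrequent k-subset (the Apriori property).
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k + 1
            and all(frozenset(sub) in level for sub in combinations(a | b, k))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


sessions = (
    [["L1", "L2", "L3"]] * 20 + [["L1", "L2"]] * 10
    + [["L3", "L4"]] * 10 + [["L3"]] * 60
)
result = apriori(sessions, min_support=0.15)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```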

24 LinkSelector

25 Evaluation
Summary of data. Hyperlink pool: the site-index page of the UA website, with 110 links.

26 Evaluation
Summary of data. Web log: collected from the UA web server in Sep; M records (raw), 4.2 M records (clean).
Sessions: 344 K in total, split into 262 K sessions of training data (23 days) and 82 K sessions of testing data (7 days).
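A split into 23 training days and 7 testing days can be done by the day each session falls on. The sketch assumes a chronological split (the first 23 days for training) and represents each session as a `(day_index, session)` pair with days counted from 1; both are illustrative assumptions, and `split_by_day` is a hypothetical name.

```python
def split_by_day(sessions, training_days):
    """Split (day_index, session) pairs, with day_index counted from 1.

    Sessions from days 1..training_days become training data;
    sessions from later days become testing data.
    """
    training = [s for day, s in sessions if day <= training_days]
    testing = [s for day, s in sessions if day > training_days]
    return training, testing


# Toy log: one session per day over a 30-day month.
toy = [(day, [f"/page{day}"]) for day in range(1, 31)]
train, test = split_by_day(toy, training_days=23)
print(len(train), len(test))  # 23 7
```

The reported counts are internally consistent: 262 K + 82 K = 344 K sessions over 30 days, or roughly 11.5 K sessions per day.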

27 Evaluation
Average improvement: 12.7%. The improvement decreases from 22.1% to 8.4%.
Average number of sessions per day: 11.5 K.

28 Evaluation
Group II relationship: 0.2% of the training sessions.
Group I relationship: /shared/sports-entertain.shtml → /shared/athletics.shtml

29 Evaluation
Average improvement: 17.0%. The improvement decreases from 30.2% to 9.4%.
605 more user-sought top-level web pages per day can be easily accessed from the portal page constructed using LinkSelector than from those constructed using the other two approaches.

30 Evaluation
Average improvement: 16.9%. The improvement decreases from 30.2% to 9.3%.