Web Mining Research: A Survey. By Raymond Kosala & Hendrik Blockeel, Katholieke Universiteit Leuven, July 2000. Presented 4/18/2002.

Presentation transcript:

Web Mining Research: A Survey
By Raymond Kosala & Hendrik Blockeel, Katholieke Universiteit Leuven, July 2000
Presented 4/18/2002 by Caitlin C Coughlin

4/18/2002, Caitlin C Coughlin, University of Vermont

Overview
- Introduction
- Web Mining
- Web Content Mining
- Web Structure Mining
- Web Usage Mining
- Conclusions

Introduction
- The Web is huge, diverse, and dynamic, and thus raises scalability, multimedia data, and temporal issues, respectively.
- As a result, we are drowning in information and facing information overload. Information users can encounter problems when interacting with the Web.

More Introduction
Problems:
- Finding relevant information
- Creating new knowledge out of the information available on the Web
- Personalization of the information
- Learning about consumers or individual users

More Introduction
Web mining techniques can be used, directly or indirectly, to address the information overload problems described before.
- Directly: applying a Web mining technique addresses the problem itself.
- Indirectly: Web mining techniques are used as part of a bigger application that addresses the aforementioned problems.
Web mining is NOT the only useful tool. Other useful techniques come from:
- Databases (DB)
- Information Retrieval (IR)
- Natural Language Processing (NLP)
- The Web document community

Web Mining: Outline
- Overview of Web mining
- Describe some confusion in the use of the term “Web mining”
- Provide a classification
- Relate the classification to the agent paradigm

Web Mining: Overview
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services.
We suggest decomposing Web mining into these subtasks:
1. Resource finding: the task of retrieving intended Web documents.
2. Information selection and pre-processing: automatically selecting and pre-processing specific information from retrieved Web resources.
3. Generalization: automatically discovering general patterns at individual Web sites as well as across multiple sites.
4. Analysis: validation and/or interpretation of the mined patterns.
We’ll call this pattern , as we’ll later see, sometimes is also used.

Web Mining: Confusion
- Web mining is often associated with Information Retrieval (IR) or Information Extraction (IE), but it is different from both.
- IR is the automatic retrieval of all relevant documents while at the same time retrieving as few non-relevant ones as possible. [views documents as bag-of-words]
- IE has the goal of transforming a collection of documents, usually with the help of an IR system, into information that is more readily digested and analyzed. [interested in the structure or representation of a document]
- We argue that Web mining intersects with the application of machine learning on the Web.
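The IR goal stated above (retrieve all relevant documents, and as few non-relevant ones as possible) is usually quantified with recall and precision. The slides do not name these measures, so the following is only an illustrative sketch; the function name `precision_recall` and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(p, r)
```

Retrieving fewer non-relevant documents raises precision, while retrieving more of the relevant ones raises recall; the IR definition above asks for both at once.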

Web Mining: Classification
- Web content mining: describes the discovery of useful information from Web contents/data/documents. [IR and DB views]
- Web structure mining: tries to discover the model underlying the link structures of the Web.
- Web usage mining: tries to make sense of the data generated by the Web surfer’s sessions or behaviors.

Web Mining & the Agent Paradigm
Web mining is often viewed from or implemented within an agent paradigm. Thus, Web mining has a close relationship with software agents or intelligent agents.
Two relevant types of software agents:
- User interface agents: information retrieval agents, information filtering agents, and personal assistant agents
- Distributed agents: distributed agents for knowledge discovery or data mining [content-based or collaborative]

Web Mining & the Agent Paradigm (continued)

Web Content Mining: IR view
Information retrieval view for unstructured documents:
- Most of the research uses a “bag of words” to represent unstructured documents.
- It takes single words as features. Features can be Boolean or frequency based.
- See the table that follows.
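The bag-of-words representation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular system in the survey: the tokenizer (lowercase alphabetic words only), the function names, and the two sample documents are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, frequency vectors, Boolean vectors) for docs."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Single words are the features; word order is discarded.
    vocab = sorted(set().union(*counts))
    # Frequency-based features: how often each word occurs in a document.
    freq = [[c[w] for w in vocab] for c in counts]
    # Boolean features: whether each word occurs at all.
    boolean = [[int(v > 0) for v in row] for row in freq]
    return vocab, freq, boolean


# Hypothetical sample documents.
docs = ["Web mining mines the Web", "Data mining on Web data"]
vocab, freq, boolean = bag_of_words(docs)
for row in (vocab, freq, boolean):
    print(row)
```

Each document becomes one vector over the shared vocabulary, which is the form most learning algorithms in this view expect as input.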

Web Content Mining: IR view (continued)

Web Content Mining: DB view
- The database techniques on the Web are related to the problem of managing and querying the information on the Web.
- Three classes of tasks: modeling and querying the Web, information extraction and integration, and Web site construction and restructuring.
- This view tries to model the data on the Web and to integrate them so that more sophisticated queries than keyword-based search can be performed.
- Research in this area mainly deals with semi-structured data.

Web Content Mining: DB view (continued)

Web Structure Mining
- In Web structure mining we are interested in the structure of the hyperlinks within the Web itself (inter-document structure).
- This line of research is inspired by the study of social networks and citation analysis.
- Several algorithms have been proposed for this, such as HITS, PageRank, and improved HITS using content information and outlier filtering. [example coming up]
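To make one of the algorithms named above concrete, here is a minimal sketch of the basic HITS hub/authority iteration in its standard textbook form. It does not include the content-based or outlier-filtering improvements mentioned on the slide. The function name `hits`, the iteration count, and the toy link graph are assumptions made for illustration.

```python
import math


def hits(links, iterations=50):
    """Basic HITS. links maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy graph: pages a and b both point to c, and c points to d.
hub, auth = hits({"a": ["c"], "b": ["c"], "c": ["d"]})
print(max(auth, key=auth.get))  # page with the highest authority score
```

In this toy graph, c is pointed to by the two best hubs and so ends up as the top authority, which mirrors the citation-analysis intuition behind this line of research.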

Successful example of Web Structure Mining
- The heart of Google software is PageRank™, a system for ranking web pages developed by our founders Larry Page and Sergey Brin at Stanford University.
- PageRank uses the web’s link structure as an indicator of an individual page's value. Google interprets a link from page A to page B as a vote, by page A, for page B. Google also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
- Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. [they don’t specify]
- Google does not sell placement within the results themselves (i.e., no one can buy a higher PageRank).
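The voting idea described above, where votes from important pages count for more, can be sketched as a simple power iteration. This is an illustration of the published PageRank idea, not Google's production system. The function name `pagerank`, the damping factor of 0.85, the handling of pages without outgoing links, and the toy graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. links maps a page to the pages it links to."""
    pages = sorted(set(links) | {q for ts in links.values() for q in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a highly ranked page weighs more.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical toy graph: a -> b -> c -> a, plus d -> a.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})
print(max(ranks, key=ranks.get))  # page with the highest rank
```

Page d has no incoming links and keeps only the baseline share, while a collects votes from both c and d and ranks highest.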

Web Usage Mining
- Web usage mining focuses on techniques that could predict user behavior while the user interacts with the Web.
- Two commonly used approaches: 1) mapping the usage data of the Web server into relational tables before an adapted data mining technique is performed, and 2) using the log data directly with special pre-processing techniques.
- Applications of Web usage mining fall into two main categories: learning a user profile or user modeling in adaptive interfaces [personalized], and learning user navigation patterns [impersonalized].
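The first approach above, mapping server usage data into relational rows before mining, can be sketched as follows. The slides do not specify a log format or a session rule, so this sketch assumes Common Log Format lines, treats the client host as the user, and splits sessions after 30 minutes of inactivity. All names and sample log lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Turn one log line into a row (dict), or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {"host": m.group("host"), "time": when,
            "url": m.group("url"), "status": int(m.group("status"))}


def sessionize(rows, timeout=timedelta(minutes=30)):
    """Group rows into per-host sessions split by an inactivity timeout."""
    sessions = []
    last = {}  # host -> (time of last request, current session's URL list)
    for row in sorted(rows, key=lambda r: r["time"]):
        prev = last.get(row["host"])
        if prev is None or row["time"] - prev[0] > timeout:
            session = [row["url"]]  # start a new session for this host
            sessions.append((row["host"], session))
        else:
            session = prev[1]
            session.append(row["url"])
        last[row["host"]] = (row["time"], session)
    return sessions


# Hypothetical log lines, for illustration only.
LOG = [
    '10.0.0.1 - - [18/Apr/2002:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [18/Apr/2002:10:05:00 -0400] "GET /papers.html HTTP/1.0" 200 2210',
    '10.0.0.2 - - [18/Apr/2002:10:06:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [18/Apr/2002:11:30:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
]
rows = [r for r in map(parse_line, LOG) if r is not None]
for host, urls in sessionize(rows):
    print(host, urls)
```

The resulting rows and sessions are the kind of tabular input to which an adapted data mining technique, such as association or sequence mining over navigation paths, could then be applied.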

Conclusions
- We surveyed research in Web mining,
- clarified some confusion in the use of the term Web mining,
- explored the connection between Web mining categories and the agent paradigm,
- and suggested three Web mining categories and situated some current research with respect to these categories.
- The Web presents new challenges to the traditional data mining algorithms that work on flat data. We have seen that some of the traditional data mining algorithms have been extended, or new algorithms have been used, to work on Web data.

Questions?