Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997.

Slides:



Advertisements
Similar presentations
Presentation at Society of The Query conference, Amsterdam November 13-14, 2009 (original title: Learning from Google: software design as a methodology.
Advertisements

ELibrary Science Product Demonstration Get ready to experience science in a whole new way –eLibrary Science offers targeted science text and tools.
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Taxonomy & Ontology Impact on Search Infrastructure John R. McGrath Sr. Director, Fast Search & Transfer.
Web Mining.
Wincite Knowledge Warehousing and Networking Sophisticated Simplicity.
Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group.
WWW Challenges : Supporting Users in Search and Navigation Natasa Milic-Frayling Microsoft Research, Cambridge UK SOFSEM 2004 January 28, 2004.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Automatic Discovery and Classification of search interface to the Hidden Web Dean Lee and Richard Sia Dec 2 nd 2003.
The Web is perhaps the single largest data source in the world. Due to the heterogeneity and lack of structure, mining and integration are challenging.
WebMiningResearch ASurvey Web Mining Research: A Survey By Raymond Kosala & Hendrik Blockeel, Katholieke Universitat Leuven, July 2000 Presented 4/18/2002.
IST NeOn-project.org The Semantic Web is growing… #SW Pages Lee, J., Goodwin, R. (2004) The Semantic.
Enterprise Search With SharePoint Portal Server V2 Steve Tullis, Program Manager, Business Portal Group 3/5/2003.
1 MARG-DARSHAK: A Scrapbook on Web Search engines allow the users to enter keywords relating to a topic and retrieve information about internet sites (URLs)
Introduction to WEKA Aaron 2/13/2009. Contents Introduction to weka Download and install weka Basic use of weka Weka API Survey.
Hypertext Categorization using Hyperlink Patterns and Meta Data Rayid Ghani Séan Slattery Yiming Yang Carnegie Mellon University.
Overview of Web Data Mining and Applications Part I
Overview of Search Engines
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation SEASR Overview Loretta Auvil and Bernie Acs National.
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
Enterprise & Intranet Search How Enterprise is different from Web search What to think about when evaluating Enterprise Search How Intranet use is different.
Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
Chapter 7 DATA, TEXT, AND WEB MINING Pages , 311, Sections 7.3, 7.5, 7.6.
1 The BT Digital Library A case study in intelligent content management Paul Warren
Automatic Subject Classification and Topic Specific Search Engines -- Research at KnowLib Anders Ardö and Koraljka Golub DELOS Workshop, Lund, 23 June.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
Defining Text Mining Preprocessing Transforming unstructured data stored in document collections into a more explicitly structured intermediate format.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Markup and Validation Agents in Vijjana – A Pragmatic model for Self- Organizing, Collaborative, Domain- Centric Knowledge Networks S. Devalapalli, R.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
Data Mining By Dave Maung.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
1 Of Crawlers, Portals, Mice and Men: Is there more to Mining the Web? Jiawei Han Simon Fraser University, Canada ACM-SIGMOD’99 Web Mining Panel Presentation.
Oracle Database 11g Semantics Overview Xavier Lopez, Ph.D., Dir. Of Product Mgt., Spatial & Semantic Technologies Souripriya Das, Ph.D., Consultant Member.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
WEB MINING. In recent years the growth of the World Wide Web exceeded all expectations. Today there are several billions of HTML documents, pictures and.
Algorithmic Detection of Semantic Similarity WWW 2005.
Mining real world data Web data. World Wide Web Hypertext documents –Text –Links Web –billions of documents –authored by millions of diverse people –edited.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
Characteristics of Information on the Web Dania Bilal IS 530 Spring 2006.
Hypertext Categorization using Hyperlink Patterns and Meta Data Rayid Ghani Séan Slattery Yiming Yang Carnegie Mellon University.
Social Information Processing March 26-28, 2008 AAAI Spring Symposium Stanford University
Semantic (web) activity at Elsevier Marc Krellenstein VP, Search and Discovery Elsevier October 27, 2004
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Web Design Terminology Unit 2 STEM. 1. Accessibility – a web page or site that address the users limitations or disabilities 2. Active server page (ASP)
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents/services
WEB STRUCTURE MINING SUBMITTED BY: BLESSY JOHN R7A ROLL NO:18.
Data mining in web applications
Search User Behavior: Expanding The Web Search Frontier
Introduction C.Eng 714 Spring 2010.
Knowledge Management Systems
Federated & Meta Search
Taxonomies, Lexicons and Organizing Knowledge
Data Mining: Concepts and Techniques Course Outline
Web & Databases Dania Bilal IS 530 Fall 2006.
Data Warehousing and Data Mining
Web Mining Department of Computer Science and Engg.
International Marketing and Output Database Conference 2005
Information Retrieval and Web Design
Presentation transcript:

Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997

The Web as a Text Database BIG and doubling every year –70 million observations (urls) –50 million variables (words) –very sparse BAD and UGLY –uncontrolled quality, widely distributed, rapidly changing, heterogeneous/complex data types, no consistent semantics or structure within or across objects, etc.

Data Mining the Web Today: –Search and meta-search engines –Hand-crafted hierarchies –Special-purpose information discovery and extraction algorithms (e.g., home pages, authority pages, interesting pages, fun cities)

Data Mining the Web To Come: –Inter-document associations uncovered by: Automatic classification Generating fixed or ad hoc structures (e.g., clustering) Exploring similarity neighborhoods (e.g., visualization) –Highly interactive interfaces Analysis of interrelations among objects Interest specification/Query formulation problem

What we Need to Get There Better Text Mining Tools (for the Web) –Robust, scalable methods for feature selection - word statistics, learned indexing features, tags –Integration w/ databases –Web mining services (rich API to Web indices) –Model/Pattern specification and summarization Content/topical interests Patterns of interest - new, different, central

What we Need to Get There Going Beyond Text –Metadata Date, size, author, site, time etc. –Structure - reflects prior human knowledge Link structure (in-links, out-links) –People - individually and collectively Ratings/preferences User models, usage patterns –Integration of the above