GROUPER: A DYNAMIC CLUSTERING INTERFACE TO WEB SEARCH RESULTS
Erdem Sarıgil - 21000089
Oğuz Yılmaz - 21000082

Presentation transcript:

GROUPER: A DYNAMIC CLUSTERING INTERFACE TO WEB SEARCH RESULTS
Erdem Sarıgil, Oğuz Yılmaz

Grouper
- Interface to the results of HuskySearch
- Dynamically groups the search results into clusters using the Suffix Tree Clustering (STC) algorithm
- Goal: make search engine results easy to browse by clustering them
- Receives hits from different search engines and looks only at the top hits from each

Post-retrieval Clustering
- Based on the returned document set
- Produces better results than pre-retrieval clustering
- Key requirements:
  - Coherent clusters
  - Efficiently browsable
  - Speed: algorithmic speed and snippet tolerance

Suffix Tree Clustering (STC)
- Linear-time clustering algorithm
- STC has three logical steps:
  1. Document cleaning
  2. Identifying base clusters using a suffix tree
  3. Merging these base clusters into clusters
- STC has several novel characteristics:
  - Produces overlapping clusters
  - Uses shared phrases rather than a bag-of-words representation
  - Well suited for Web document clustering
  - Robust in "noisy" situations such as short snippets
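The three STC steps can be illustrated with a rough Python sketch. It is not the authors' implementation: it enumerates word n-grams instead of building a suffix tree (so it is not linear time), it does no stemming, and the names `clean`, `base_clusters`, `merge`, the stop-word list, and the 0.5 overlap threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and"}


def clean(text):
    """Step 1 (document cleaning): lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def base_clusters(docs, max_len=3, min_docs=2):
    """Step 2: map each phrase to the set of documents that contain it.

    STC reads these sets off a suffix tree; plain n-gram enumeration is
    used here as a simpler stand-in for phrases of up to max_len words.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = clean(text)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase = tuple(words[i:i + n])
                if all(w in STOP_WORDS for w in phrase):
                    continue  # skip phrases made only of stop words
                index[phrase].add(doc_id)
    return {p: d for p, d in index.items() if len(d) >= min_docs}


def merge(clusters, threshold=0.5):
    """Step 3: merge base clusters whose document sets overlap strongly.

    Two base clusters are linked when their shared documents exceed
    `threshold` of each set (assumed value); linked groups are merged
    with a small union-find. This pairwise pass is quadratic.
    """
    phrases = list(clusters)
    parent = list(range(len(phrases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            a, b = clusters[phrases[i]], clusters[phrases[j]]
            common = len(a & b)
            if common > threshold * len(a) and common > threshold * len(b):
                parent[find(i)] = find(j)

    merged = defaultdict(lambda: {"phrases": [], "docs": set()})
    for i, phrase in enumerate(phrases):
        root = find(i)
        merged[root]["phrases"].append(" ".join(phrase))
        merged[root]["docs"] |= clusters[phrase]
    return list(merged.values())


snippets = ["jaguar car dealer", "jaguar car price",
            "big cat habitat", "big cat zoo"]
for cluster in merge(base_clusters(snippets)):
    print(sorted(cluster["docs"]), cluster["phrases"])
```

Because a document can share phrases with several groups, it may end up in more than one merged cluster, which is how overlapping clusters arise in this sketch.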

User Interface

User Interface (cont'd)

Making the Clusters Easy to Browse
Three heuristics to identify redundant phrases:
1. Word overlap
2. Sub- and super-strings
3. Most general phrase with low coverage
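As a hedged illustration of the first heuristic only, the sketch below drops a phrase when most of its words already appear in a better-ranked phrase that was kept. The function name `word_overlap_filter`, the best-first input ordering, and the 0.6 threshold are assumptions, not details given on the slide.

```python
def word_overlap_filter(phrases, max_overlap=0.6):
    """Keep phrases (given best first) that are not redundant by word overlap.

    A phrase counts as redundant when more than `max_overlap` of its
    distinct words occur in a phrase that has already been kept.
    """
    kept = []
    for phrase in phrases:
        words = set(phrase.split())
        if not words:
            continue  # ignore empty phrases
        redundant = any(
            len(words & set(other.split())) / len(words) > max_overlap
            for other in kept
        )
        if not redundant:
            kept.append(phrase)
    return kept


print(word_overlap_filter(["vice president", "president", "white house"]))
```

Here "president" is dropped because its only word is already covered by "vice president", while "white house" shares no words with a kept phrase and stays.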

Speed
- Quality search
- Trade-off between time and quality
- Phrase trimming: "the vice president of" -> "vice president"
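The "vice president" example on this slide appears to show a phrase with its leading and trailing stop words removed. A minimal sketch of that trimming step follows; the stop-word list and the name `trim_stop_words` are hypothetical.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to"}


def trim_stop_words(phrase):
    """Remove stop words from both ends of a phrase, keeping inner ones."""
    words = phrase.split()
    start, end = 0, len(words)
    while start < end and words[start].lower() in STOP_WORDS:
        start += 1
    while end > start and words[end - 1].lower() in STOP_WORDS:
        end -= 1
    return " ".join(words[start:end])


print(trim_stop_words("the vice president of"))
```

Stop words inside a phrase are left alone, so "bill of rights" would pass through unchanged.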

Coherent Clusters

Comparison
- Number of documents followed
- Time spent
- Click distance

Comparison (cont'd)
