Beyond Basic Faceted Search Ori Ben-Yitzhak, …(10 people) IBM Research Lab & Yahoo! Research WSDM 2008 (ACM International Conference on Web Search and Data Mining)

Presentation transcript:

Beyond Basic Faceted Search
Ori Ben-Yitzhak, … (10 people), IBM Research Lab & Yahoo! Research
WSDM 2008 (ACM International Conference on Web Search and Data Mining)
February 9, 2010
Presented by Hyo-jin Song

Contents
• Introduction
• Basic Faceted Search
  – Lucene
  – Data Model and Document Ingestion
• Extending Multifaceted Search
  – Business Intelligence
  – Dynamic Facets
• Correlated Facets
• Conclusion
• SNU CSE Homepage Practice

Introduction (1/3)
• Faceted search overview
  – Used in search applications
  – Improves the precision of search results
  – Multidimensional or vertical browsing
• Definition: a technique for accessing a collection of information represented using a faceted classification (Wikipedia)
※ Nobel Prize Winners Search

Introduction (2/3)
• The two paradigms of web search
  – Navigational search
    • Uses a hierarchical structure (taxonomy)
    • Users browse the information space by iteratively narrowing the scope of their quest (e.g., Yahoo! Directory, DMOZ)
  – Direct search
    • Users write their queries as a bag of words in a text box
    • Popularized by Web search engines such as Google and Yahoo! Search
  – Recently a new approach has emerged that combines both paradigms: faceted search
  – Faceted search lets users explore a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension
※ Source: SIGIR 2006 Workshop on Faceted Search website

Introduction (3/3)
• A facet comprises aspects or properties of a class that are
  – Clearly defined
  – Mutually exclusive
  – Collectively exhaustive
  – E.g., in a collection of books: author, subject, and date facets
• Beyond Basic Faceted Search
  – Extends traditional faceted search to support richer information discovery tasks over more complex data models
  – Enables users to gain insight into their data
  – A faceted search engine that supports correlated facets
    • Associates a more complex information model with a document, one in which the values across multiple facets are not independent

Basic Faceted Search: Lucene (1/3)
• Apache Lucene overview
  – A popular open-source search library
  – High-performance, full-featured text search and faceted search
  – Written entirely in Java
  – Websites powered by Lucene
    • Apple, Disney, Eclipse, IBM, MIT DSpace, etc.
• Apache Solr overview
  – A popular, fast open-source enterprise search platform from the Apache Lucene project
  – Features powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (Word, PDF) handling

Basic Faceted Search: Lucene (2/3)
• Apache Lucene structure (1)
  – Lucene maintains an index that holds, for each term (word), a postings list: a list of the identifiers of the documents in which the term occurs, together with the word offsets within those documents
  – During search, Lucene uses these postings lists to quickly iterate over all matching documents
※ Example of a postings list (source: lecture slides from an IR class)
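The postings-list idea above can be sketched in a few lines of Python. This is a toy illustration, not Lucene's actual data structure or API: the function names `build_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a postings list of (doc_id, [word offsets]) entries."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        positions = defaultdict(list)
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for offset, term in enumerate(text.lower().split()):
            positions[term].append(offset)
        for term, offsets in positions.items():
            index[term].append((doc_id, offsets))
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    doc_sets = [{doc_id for doc_id, _ in index.get(term, [])} for term in terms]
    return sorted(set.intersection(*doc_sets))


# Hypothetical sample collection.
docs = [
    "faceted search with lucene",
    "lucene is a search library",
    "faceted browsing",
]
index = build_index(docs)
print(index["lucene"])                   # postings list for the term "lucene"
print(search(index, "faceted search"))   # ids of documents matching both terms
```

Because each postings list is keyed by term, a query only touches the lists of its own terms rather than scanning every document.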

Basic Faceted Search: Lucene (3/3)
• Apache Lucene structure (2)
  – Enabling faceted search requires some additional processing for each matching document: adding its contribution to its associated facets
  – Lucene makes it easy to plug such functionality into its iteration over the hits, since it can call a hit collector for each hit
-> The Lucene Stack
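The hit-collector hook can be illustrated with a small Python sketch. It does not use Lucene's real collector interface; the class `FacetCountingCollector`, its `collect` method, and the facet paths are hypothetical names chosen for the example. The idea is only that a callback runs once per matching document and adds that document's contribution to its facets.

```python
from collections import Counter


class FacetCountingCollector:
    """Hypothetical collector: invoked once for every matching document."""

    def __init__(self, doc_facets):
        self.doc_facets = doc_facets  # doc_id -> list of facet paths
        self.counts = Counter()

    def collect(self, doc_id):
        # Gather the document's paths and all their ancestors in a set,
        # so that each facet node is counted at most once per document.
        touched = set()
        for path in self.doc_facets.get(doc_id, []):
            parts = path.split("/")
            for depth in range(1, len(parts) + 1):
                touched.add("/".join(parts[:depth]))
        self.counts.update(touched)


# Hypothetical facet assignments and search hits.
doc_facets = {
    0: ["author/Mark Twain", "date/1885"],
    1: ["author/Mark Twain"],
    2: ["date/1885/07"],
}
hits = [0, 1]  # ids returned by the text query

collector = FacetCountingCollector(doc_facets)
for doc_id in hits:
    collector.collect(doc_id)
print(dict(collector.counts))
```

Document 2 is not a hit, so it contributes nothing to the counts even though it carries a date facet.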

Basic Faceted Search: Data Model and Document Ingestion (1/3)
• Taxonomy vs. folksonomy in faceted search
• Directed acyclic graph
  – Nodes represent facets
  – Directed edges denote the refinement relations between nodes
• Two approaches to ingesting documents into faceted collections
  – 1. The full taxonomy is given before indexing
    • Documents must specify the taxonomy nodes
  – 2. The taxonomy is not given and must be learned, while indexing, from the ingested documents
    • Documents specify the taxonomy paths to which they correspond

Basic Faceted Search: Data Model and Document Ingestion (2/3)
• The process of the second approach
  – The application must add taxonomy nodes or facet paths to each document before adding it to the index
  – The facet hierarchy is inferred from the plurality of paths encountered when indexing the individual documents
  – All encountered facet paths are collected into a forest-like graph
  – New facet paths can be introduced seamlessly, without any administrative action, as new documents are ingested
  – The inferred taxonomy automatically expands to accommodate the new data

Basic Faceted Search: Data Model and Document Ingestion (3/3)
• The process of the second approach
  – Figure 1 shows the output of the indexing process after ingesting two documents, doc1 and doc2
  – Each document is associated with the facet paths shown in Table 2
  – The resulting taxonomy is a forest of trees rather than a general DAG
  – The facet forest is maintained by the taxonomy index
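A minimal sketch of the second ingestion approach, assuming facet paths are slash-separated strings and the forest is stored as nested dictionaries. The helper names `add_path` and `ingest` and the example paths are hypothetical; they are not the contents of Table 2.

```python
def add_path(forest, path):
    """Insert one facet path into the forest, creating missing nodes."""
    node = forest
    for label in path.split("/"):
        node = node.setdefault(label, {})


def ingest(forest, doc_paths):
    """Add every facet path carried by a document being indexed."""
    for path in doc_paths:
        add_path(forest, path)


# The taxonomy starts empty and grows as documents arrive.
forest = {}
ingest(forest, ["author/Mark Twain", "date/2008/02"])   # hypothetical doc1
ingest(forest, ["author/Jane Austen", "date/2008/03"])  # hypothetical doc2
print(forest)
```

Each top-level key is the root of one tree, so the result is a forest; a path that introduces an unseen label simply adds a branch, with no administrative step required.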

Extending Multifaceted Search
• Business intelligence
  – A broad category of applications and technologies for gathering, providing access to, and analyzing data in order to help enterprise users make better business decisions
  – Figure 2 shows an example of aggregations when searching for “world wide web” over a subset of Amazon’s book catalog
• Multifaceted search model
  – A faceted query may specify any number of aggregation expressions to be calculated per target facet
  – The values of these aggregations are returned for each path of the corresponding facet set in the result set
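To make the aggregation model concrete, here is a small Python sketch in which a query carries several named aggregation expressions that are evaluated per facet value over the matching documents. The function `aggregate`, the field names `price` and `rating`, and the sample hits are illustrative assumptions, not data from Figure 2.

```python
from collections import defaultdict


def aggregate(hits, facet, expressions):
    """Evaluate every aggregation expression once per value of the target facet."""
    groups = defaultdict(list)
    for doc in hits:
        for value in doc["facets"].get(facet, []):
            groups[value].append(doc)
    return {
        value: {name: fn(docs) for name, fn in expressions.items()}
        for value, docs in groups.items()
    }


# Any number of aggregation expressions may be attached to the query.
expressions = {
    "count": len,
    "avg_price": lambda docs: sum(d["price"] for d in docs) / len(docs),
    "max_rating": lambda docs: max(d["rating"] for d in docs),
}

# Hypothetical documents matching a text query.
hits = [
    {"facets": {"format": ["paperback"]}, "price": 20.0, "rating": 4},
    {"facets": {"format": ["paperback"]}, "price": 30.0, "rating": 5},
    {"facets": {"format": ["hardcover"]}, "price": 50.0, "rating": 3},
]
result = aggregate(hits, "format", expressions)
print(result)
```

A plain facet count is the special case where the only expression is `len`.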

Dynamic Facets
• Dynamic facets
  – Typical faceted search operates over a set of predetermined, indexed facets
    • I.e., the facets and attributes associated with each document must be known at indexing time
    • One such attribute might be the date of a document
  – Dynamic facet search supports dynamic, time-based facets
    • For example, the expression “Sum { qtime - doc.time < 60*60*24*7 }” counts the results dated less than one week (60*60*24*7 seconds) before the query time
    • In a similar fashion, one can categorize search results into spatial dynamic facets
    • E.g., count the number of results within certain radii around a location specified by the query
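The time-based expression from this slide can be sketched as follows, assuming that `qtime` and each document time are timestamps in seconds. The helper names `count_within` and `time_facets`, the window labels, and the sample timestamps are hypothetical.

```python
DAY = 60 * 60 * 24
WEEK = DAY * 7  # the 60*60*24*7 constant from the slide


def count_within(doc_times, qtime, window):
    """Sum the boolean test qtime - doc_time < window over all hits."""
    return sum(qtime - doc_time < window for doc_time in doc_times)


def time_facets(doc_times, qtime, windows):
    """Evaluate one dynamic facet per time window at query time."""
    return {
        label: count_within(doc_times, qtime, window)
        for label, window in windows.items()
    }


# Hypothetical query time and document timestamps (in seconds).
qtime = 1_000_000_000
doc_times = [qtime - 3600, qtime - 3 * DAY, qtime - 10 * DAY, qtime - 40 * DAY]
windows = {"past day": DAY, "past week": WEEK, "past 30 days": 30 * DAY}

facets = time_facets(doc_times, qtime, windows)
print(facets)
```

Nothing about "past week" is stored in the index: the buckets depend on `qtime`, which is only known when the query arrives.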

Correlated Search
• Correlated search
  – The standard faceted search data model
    • Each document has a certain set of facet values
    • E.g., a product (represented by a document) has a certain color, size, and price
    • The model implies that the product is available in all combinations of its colors and sizes
    • I.e., in the cross product {color/red, color/blue} × {size/small, size/medium}
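The cross-product problem can be seen in a short Python sketch. The flat document below mirrors the color and size example from the slide; the tuple-based "correlated" representation is only one hypothetical way to record which combinations really exist and is not claimed to be the paper's implementation.

```python
from itertools import product

# Standard model: independent sets of facet values per document.
flat_doc = {"color": {"red", "blue"}, "size": {"small", "medium"}}


def matches_flat(doc, color, size):
    """A facet query on the flat model tests each facet independently."""
    return color in doc["color"] and size in doc["size"]


# Every combination the flat model implicitly claims to offer.
implied = set(product(flat_doc["color"], flat_doc["size"]))

# Hypothetical correlated model: store only the combinations that exist.
correlated_doc = {("red", "small"), ("blue", "medium")}


def matches_correlated(variants, color, size):
    """A facet query on the correlated model tests the combination itself."""
    return (color, size) in variants


print(len(implied))
print(matches_flat(flat_doc, "red", "medium"))
print(matches_correlated(correlated_doc, "red", "medium"))
```

If the product is sold only as red/small and blue/medium, the flat model still reports a match for red/medium, while the tuple-based model does not.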

Conclusion
• Faceted search
  – The more ambient the Web becomes, the greater the need for faceted search
  – Many complicated domains
  – A combination of many technologies
• Future work (some of which has already started)
  – Aggregating over cross products of dimensions (as in OLAP cubes)
  – Updating facet values and numeric attributes of documents without re-indexing the document
  – Faceted search across a distributed index

SNU CSE Homepage Practice
• Basic web document search
  – Crawl all documents on the CSE homepage and all laboratory homepages
  – Construct the inverted list
  – This is the core data source for faceted search and other features
• Extended search
  – Graph visualization
  – Faceted search
  – Domains: department, professor, laboratory, course
  – Web 2.0 technologies: AJAX, Reverse AJAX, Comet, etc.

Thank You! Any Questions or Comments?