Published by Bernard Caldwell. Modified over 9 years ago.
1
Beyond Basic Faceted Search
Ori Ben-Yitzhak, …(10 people)
IBM Research Lab & Yahoo! Research
WSDM 2008 (ACM International Conference on Web Search and Data Mining)
February 9, 2010
Presented by Hyo-jin Song
2
Contents
Introduction
Basic Faceted Search
–Lucene
–Data Model and Document Ingestion
Extending Multifaceted Search
–Business Intelligence
–Dynamic Facets
Correlated Facets
Conclusion
SNU CSE Homepage Practice
4
Introduction (1/3)
Faceted Search Overview
–A technique for accessing a collection of information represented using a faceted classification (Wikipedia)
–Used in search applications
–Improves the precision of the search results
–Supports multidimensional, or vertical, browsing
※ Example: Nobel Prize Winners Search (Flamenco), http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco
5
Introduction (2/3)
The two paradigms of web search
–Navigational search
  Uses a hierarchical structure (taxonomy).
  Users browse the information space by iteratively narrowing the scope of their quest (e.g. Yahoo! Directory, DMOZ).
–Direct search
  Users write their queries as a bag of words in a text box.
  Popularized by web search engines such as Google and Yahoo! Search.
–Recently a new approach has emerged that combines both paradigms: faceted search.
–It lets users explore a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension.
※ Source: SIGIR’2006 Workshop on Faceted Search website
6
Introduction (3/3)
Facets
–A facet comprises clearly defined, mutually exclusive, and collectively exhaustive aspects or properties of a class.
–E.g. in a collection of books: author, subject, and date facets.
Beyond Basic Faceted Search
–Extends traditional faceted search to support richer information discovery tasks over more complex data models.
–Enables users to gain insight into their data.
–Extends a faceted search engine to support correlated facets: a more complex information model in which the values associated with a document across multiple facets are not independent.
8
Basic Faceted Search: Lucene (1/3)
Apache Lucene overview
–A popular open-source search library
–High-performance, full-featured text search; faceted search
–Written entirely in Java
–Websites powered by Lucene: Apple, Disney, Eclipse, IBM, MIT DSpace, etc.
Apache Solr overview
–A popular, fast open-source enterprise search platform from the Apache Lucene project
–Features full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (Word, PDF) handling
9
Basic Faceted Search: Lucene (2/3)
Apache Lucene structure (1)
–Lucene maintains an index that holds, for each term (word), a postings list: a list of document identifiers and the word offsets within those documents at which the term occurs.
–During search, Lucene uses these postings lists to quickly iterate over the documents that match the query.
※ Example of a postings list (source: lecture slides from an IR class)
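The postings-list idea can be illustrated with a minimal Python sketch. This is not Lucene's actual data structure or API; the names build_postings and search_all and the three sample documents are assumptions made up for this example.

```python
from collections import defaultdict

def build_postings(docs):
    """Map each term to {doc_id: [word offsets]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id in sorted(docs):
        for offset, term in enumerate(docs[doc_id].lower().split()):
            # Record where in the document this term occurs.
            index[term].setdefault(doc_id, []).append(offset)
    return index

def search_all(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    doc_sets = [set(index.get(t.lower(), {})) for t in terms]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

# Hypothetical sample collection (not from the slides).
docs = {
    1: "faceted search over books",
    2: "text search with Lucene",
    3: "faceted browsing of books",
}
index = build_postings(docs)
matches = search_all(index, ["faceted", "books"])
```

Here index["search"] lists documents 1 and 2, each with the offset at which the word appears, and a multi-term query is answered by intersecting the document sets of the individual postings lists.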
10
Basic Faceted Search: Lucene (3/3)
Apache Lucene structure (2)
–Enabling faceted search requires some additional processing for each matching document: adding its contribution to its associated facets.
–Lucene makes it easy to plug such functionality into its iteration over the hits, since it can call a hit collector for each hit.
-> The Lucene Stack
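The hit-collector pattern can be sketched in Python as follows. This is only an illustration of the idea, not Lucene's Java collector interface; the class FacetCountingCollector, its collect method, and the sample facet paths are all hypothetical.

```python
from collections import Counter

class FacetCountingCollector:
    """Hypothetical collector: the search loop calls collect() once per hit."""

    def __init__(self, doc_facets):
        self.doc_facets = doc_facets  # doc_id -> list of facet paths
        self.counts = Counter()

    def collect(self, doc_id):
        # Gather each facet node the document contributes to, including
        # ancestors, in a set so a node is counted once per document.
        nodes = set()
        for path in self.doc_facets.get(doc_id, []):
            parts = path.split("/")
            for depth in range(1, len(parts) + 1):
                nodes.add("/".join(parts[:depth]))
        self.counts.update(nodes)

# Hypothetical facet paths per document (not from the slides).
doc_facets = {
    1: ["author/Mark Twain", "date/1884"],
    2: ["author/Mark Twain"],
    3: ["author/Jane Austen"],
}
collector = FacetCountingCollector(doc_facets)
for hit in [1, 2]:  # pretend the query matched documents 1 and 2
    collector.collect(hit)
```

After the loop, collector.counts holds the per-facet counts for the result set only; document 3 did not match, so its facets contribute nothing.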
11
Basic Faceted Search: Data Model and Document Ingestion (1/3)
Taxonomy vs. folksonomy in faceted search
Taxonomy as a directed acyclic graph (DAG)
–Nodes represent facets.
–Directed edges denote the refinement relations between nodes.
Two approaches to ingesting documents into faceted collections
–1. The full taxonomy is given before indexing.
  Documents must specify the taxonomy nodes to which they belong.
–2. The taxonomy is not given and has to be learned, while indexing, from the ingested documents.
  Documents specify the taxonomy paths to which they correspond.
12
Basic Faceted Search: Data Model and Document Ingestion (2/3)
The process of the second approach
–The application must add taxonomy nodes or facet paths to each document before adding it to the index.
–The facet hierarchy is inferred from the plurality of paths encountered when indexing the individual documents.
–All encountered facet paths are collected into a forest-like graph.
–New facet paths can be introduced seamlessly as new documents are ingested, without any administrative action.
–The inferred taxonomy automatically expands to accommodate the new data.
13
Basic Faceted Search: Data Model and Document Ingestion (3/3)
The process of the second approach (continued)
–Figure 1 shows the output of the indexing process after ingesting two documents, doc1 and doc2.
–Each document is associated with the facet paths shown in Table 2.
–The resulting taxonomy is a forest of trees rather than a general DAG.
–The facet forest is maintained by the taxonomy index.
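Inferring a facet forest from document paths can be sketched in a few lines of Python. The facet paths below are hypothetical stand-ins and do not reproduce Figure 1 or Table 2; add_path and build_forest are names invented for this sketch, and the forest is modeled as nested dictionaries.

```python
def add_path(forest, path):
    """Insert one facet path such as 'location/asia/korea' into the forest."""
    node = forest
    for label in path.split("/"):
        # Create the child node on first encounter, then descend into it.
        node = node.setdefault(label, {})

def build_forest(doc_paths):
    """Infer the forest from all facet paths seen while ingesting documents."""
    forest = {}
    for paths in doc_paths.values():
        for path in paths:
            add_path(forest, path)
    return forest

# Hypothetical documents and facet paths (assumed for illustration).
doc_paths = {
    "doc1": ["location/asia/korea", "date/2010"],
    "doc2": ["location/asia/japan", "location/europe"],
}
forest = build_forest(doc_paths)

# A later document introduces a new path; the taxonomy simply grows.
add_path(forest, "date/2011")
```

Each top-level key is the root of one tree, so the result is a forest rather than a general DAG, and no administrative step is needed when a previously unseen path arrives.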
15
Extending Multifaceted Search
Business Intelligence
–A broad category of applications and technologies for gathering, providing access to, and analyzing data to help enterprise users make better business decisions.
–Figure 2 shows an example of aggregations when searching for “world wide web” over a subset of Amazon’s book catalog.
Multifaceted search model
–A faceted query can specify any number of aggregation expressions to be calculated per target facet.
–The result set returns the values of these aggregations for each path of the corresponding facet.
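The per-facet aggregation model can be illustrated with a small Python sketch. The function aggregate, the expression names, and the sample book records are assumptions for this example; they do not reproduce the query syntax or the data behind Figure 2.

```python
from collections import defaultdict

def aggregate(hits, target_facet, expressions):
    """Evaluate each aggregation expression per path of the target facet."""
    groups = defaultdict(list)
    for doc in hits:
        for path in doc["facets"]:
            if path.startswith(target_facet + "/"):
                groups[path].append(doc)
    return {
        path: {name: fn(docs) for name, fn in expressions.items()}
        for path, docs in groups.items()
    }

# Hypothetical matching documents with a numeric attribute.
hits = [
    {"facets": ["subject/web", "format/paperback"], "price": 20.0},
    {"facets": ["subject/web"], "price": 40.0},
    {"facets": ["subject/databases"], "price": 30.0},
]

# Any number of aggregation expressions may be attached to the query.
expressions = {
    "count": len,
    "avg_price": lambda docs: sum(d["price"] for d in docs) / len(docs),
}
result = aggregate(hits, "subject", expressions)
```

Plain faceted search corresponds to the single expression "count"; adding expressions such as an average price turns the same facet navigation into a simple business-intelligence view.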
16
Extending Multifaceted Search
18
Dynamic Facets
–Typical faceted search operates over a set of predetermined, indexed facets, i.e. the facets and attributes associated with each document must be known at indexing time. One such attribute might be the date of a document.
–Dynamic facet search supports facets that depend on the query, such as time-based facets. For example, with times measured in seconds, the expression “Sum { qtime-doc.time < 60*60*24*7 }” counts the matching documents dated within one week of the query time.
–In a similar fashion, search results can be categorized into spatial dynamic facets, e.g. by counting the number of results within certain radii of a location specified by the query.
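The time-based expression above can be mirrored directly in Python. This is a sketch under the assumption that qtime and each document's time are timestamps in seconds; the function names, bucket labels, and sample timestamps are hypothetical.

```python
DAY = 60 * 60 * 24
WEEK = DAY * 7

def count_recent(hits, qtime, window=WEEK):
    """Mirror of Sum { qtime - doc.time < window }: each true test adds 1."""
    return sum(qtime - doc["time"] < window for doc in hits)

def time_facets(hits, qtime, buckets):
    """Count hits per dynamic time bucket, computed at query time."""
    return {label: count_recent(hits, qtime, window)
            for label, window in buckets.items()}

qtime = 1_000_000_000  # hypothetical query time in seconds
hits = [
    {"time": qtime - 3600},      # one hour old
    {"time": qtime - 3 * DAY},   # three days old
    {"time": qtime - 10 * DAY},  # ten days old
]
last_week = count_recent(hits, qtime)
buckets = time_facets(
    hits, qtime, {"past day": DAY, "past week": WEEK, "past month": 30 * DAY}
)
```

Because the buckets are defined relative to qtime, nothing about them has to be known at indexing time; only the document's own timestamp is stored.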
20
Correlated Search
–The standard faceted search data model
  Each document has a certain set of facet values.
  E.g. a product (represented by a document) has a certain color, size, and price.
  A product listed with several colors and sizes is effectively treated as available in every combination of them, i.e. in the cross-product {color/red, color/blue} x {size/small, size/medium}.
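The difference between independent and correlated facet values can be shown with a short Python sketch. The set of actually available combinations is an assumption made up for this example, and the two matching functions are hypothetical; they are not the engine's implementation.

```python
from itertools import product

colors = ["color/red", "color/blue"]
sizes = ["size/small", "size/medium"]

# Standard model: facet values are independent, so every pair is implied.
implied = set(product(colors, sizes))

# Hypothetical reality: only two of the four combinations exist.
available = {
    ("color/red", "size/small"),
    ("color/blue", "size/medium"),
}

def matches_independent(doc_colors, doc_sizes, color, size):
    """Standard model: each facet value is tested on its own."""
    return color in doc_colors and size in doc_sizes

def matches_correlated(combos, color, size):
    """Correlated model: the requested pair itself must be recorded."""
    return (color, size) in combos

false_hit = matches_independent(colors, sizes, "color/red", "size/medium")
true_miss = matches_correlated(available, "color/red", "size/medium")
```

In this example the standard model reports a match for red/medium even though that combination is not in the available set, which is the kind of case correlated facets are meant to handle.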
22
Conclusion
Faceted search
–The more pervasive the web becomes, the greater the need for faceted search.
–It applies to many complicated domains.
–It combines many technologies.
Future work (some of which has already started)
–Aggregate cross products of dimensions (as in OLAP cubes)
–Update facet values and numeric attributes of documents without requiring the re-indexing of the document
–Faceted search across a distributed index
24
SNU CSE Homepage Practice
Basic web document search
–Crawl all documents on the CSE homepage and all laboratory homepages
–Construct the inverted list
–This is the core source for faceted search and other features
Extension search
–Graph visualization
–Faceted search
–Domains: Department, Professor, Laboratory, Course
–Web 2.0 technology: AJAX, Reverse AJAX, COMET, etc.
25
Thank You! Any Questions or Comments?