Beyond Basic Faceted Search Ori Ben-Yitzhak, …(10 people) IBM Research Lab & Yahoo! Research WSDM 2008 (ACM International Conference on Web Search and.


1 Beyond Basic Faceted Search
Ori Ben-Yitzhak, …(10 people)
IBM Research Lab & Yahoo! Research
WSDM 2008 (ACM International Conference on Web Search and Data Mining)
February 9, 2010
Presented by Hyo-jin Song

2 Contents
• Introduction
• Basic Faceted Search
  – Lucene
  – Data Model and Document Ingestion
• Extending Multifaceted Search
  – Business Intelligence
  – Dynamic Facets
• Correlated Facets
• Conclusion
• SNU CSE Homepage Practice


4 Introduction (1/3)
• Faceted Search Overview
  – Definition: a technique for accessing a collection of information represented using a faceted classification (Wikipedia).
  – Used in search applications.
  – Improves the precision of the search results.
  – Supports multidimensional or vertical browsing.
※ Example: Nobel Prize Winners Search, http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco

5 Introduction (2/3)
• The two paradigms of web search
  – Navigational Search
    • Uses a hierarchical structure (taxonomy).
    • Users browse the information space by iteratively narrowing the scope of their quest (e.g. Yahoo! Directory, DMOZ).
  – Direct Search
    • Users write their queries as a bag of words in a text box.
    • Popularized by Web search engines such as Google and Yahoo! Search.
  – Recently a new approach has emerged that combines both paradigms: faceted search.
  – Faceted search lets users navigate a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension.
※ Source: SIGIR 2006 Workshop on Faceted Search website

6 Introduction (3/3)
• A facet comprises attributes of a class
  – Facets are clearly defined, mutually exclusive, and collectively exhaustive aspects or properties of a class.
  – E.g. in a collection of books: author, subject, and date facets.
• Beyond Basic Faceted Search
  – Extends traditional faceted search to support richer information discovery tasks over more complex data models.
  – Enables users to gain insight into their data.
  – Describes a faceted search engine that supports correlated facets
    • A more complex information model in which the values associated with a document across multiple facets are not independent.


8 Basic Faceted Search Lucene (1/3)
• Apache Lucene Overview
  – A popular open-source search library.
  – High-performance, full-featured text search and faceted search.
  – Written entirely in Java.
  – Websites powered by Lucene
    • Apple, Disney, Eclipse, IBM, MIT DSpace, etc.
• Apache Solr Overview
  – A popular, fast, open-source enterprise search platform from the Apache Lucene project.
  – Features full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (Word, PDF) handling.

9 Basic Faceted Search Lucene (2/3)
• Apache Lucene structure (1)
  – Lucene maintains an index that holds, for each term (word), a postings list: a list of document identifiers and the word offsets within those documents at which the term occurs.
  – During search, Lucene uses these postings lists to quickly iterate over all the documents that match the query.
※ Example of a postings list (source: lecture slides from an IR class)
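The postings-list idea can be illustrated with a minimal Python sketch. This is not Lucene's implementation or API; `build_index`, `search_and`, and the three sample documents are hypothetical names and data chosen only for illustration, and tokenization is assumed to be plain whitespace splitting.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a postings list of (doc_id, [word offsets])."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real analysis.
        for offset, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(offset)
    # Keep each postings list ordered by document identifier.
    return {term: sorted(postings.items()) for term, postings in index.items()}


def search_and(index, terms):
    """Return the ids of documents that contain every query term."""
    doc_sets = [{doc_id for doc_id, _ in index.get(t, [])} for t in terms]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []


# Hypothetical sample collection.
docs = {
    1: "faceted search with lucene",
    2: "lucene text search",
    3: "faceted browsing",
}
index = build_index(docs)
both = search_and(index, ["lucene", "search"])
```

Here `index["search"]` holds one entry per document containing the term, each with the word offsets inside that document, and `search_and` walks only the postings of the query terms rather than every document.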

10 Basic Faceted Search Lucene (3/3)
• Apache Lucene structure (2)
  – Enabling faceted search requires some additional processing for each matching document: adding its contribution to its associated facets.
  – Lucene makes it easy to plug such functionality into its iteration over the hits, since it can call a hit collector.
-> The Lucene Stack
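A rough sketch of the hit-collector idea follows, written in Python rather than against Lucene's Java API. The class `FacetCountingCollector`, the function `run_query`, and the facet paths are all hypothetical; the sketch only shows how a per-hit callback can accumulate facet counts while the engine iterates over matching documents.

```python
from collections import Counter


class FacetCountingCollector:
    """Hypothetical collector: collect() is called once per matching document."""

    def __init__(self, doc_facets):
        self.doc_facets = doc_facets  # doc_id -> list of facet paths
        self.counts = Counter()       # facet path -> number of matching docs
        self.hits = []

    def collect(self, doc_id):
        self.hits.append(doc_id)
        # Add this document's contribution to each of its facets.
        for path in self.doc_facets.get(doc_id, []):
            self.counts[path] += 1


def run_query(matching_doc_ids, collector):
    """Stand-in for the engine's iteration over the hits."""
    for doc_id in matching_doc_ids:
        collector.collect(doc_id)
    return collector


# Hypothetical facet assignments.
doc_facets = {
    1: ["author/Smith", "year/2007"],
    2: ["author/Smith", "year/2008"],
    3: ["author/Lee", "year/2008"],
}
collector = run_query([1, 2], FacetCountingCollector(doc_facets))
```

Because counting happens inside the same pass that gathers the hits, the facet counts always reflect exactly the documents that matched the query.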

11 Basic Faceted Search Data Model and Document Ingestion (1/3)
• Taxonomy vs. folksonomy in faceted search
• Directed acyclic graph
  – Nodes represent facets.
  – Directed edges denote the refinement relations between nodes.
• The two approaches to ingesting documents in faceted collections
  – 1. The full taxonomy is given before indexing.
    • Documents must specify the taxonomy nodes to which they correspond.
  – 2. The taxonomy is not given and must be learned, while indexing, from the ingested documents.
    • Documents specify the taxonomy paths to which they correspond.

12 Basic Faceted Search Data Model and Document Ingestion (2/3)
• The process of the second approach
  – The application must add taxonomy nodes or facet paths to each document prior to adding it to the index.
  – The facet hierarchy is inferred from the plurality of paths encountered when indexing the individual documents.
  – All encountered facet paths are collected into a forest-like graph.
  – New facet paths can be introduced seamlessly, without any administrative action, as new documents are ingested.
  – The inferred taxonomy automatically expands to accommodate the new data.

13 Basic Faceted Search Data Model and Document Ingestion (3/3)
• The process of the second approach (continued)
  – Figure 1 shows the output of the indexing process after ingesting two documents, doc1 and doc2, each associated with the facet paths shown in Table 2.
  – The resulting taxonomy is a forest of trees rather than a general DAG.
  – The facet forest is maintained by the taxonomy index.
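The second ingestion approach, inferring the taxonomy from the facet paths attached to each document, might be sketched as follows. The nested-dictionary representation, the `/` separator, and the example paths are assumptions made for illustration; they are not the contents of Figure 1 or Table 2.

```python
def add_path(forest, path):
    """Insert one facet path, e.g. 'location/asia/korea', into a nested-dict forest."""
    node = forest
    for part in path.split("/"):
        # Create the child node on first sight; reuse it afterwards.
        node = node.setdefault(part, {})


def infer_taxonomy(doc_paths):
    """Collect every facet path seen during ingestion into one forest."""
    forest = {}
    for paths in doc_paths.values():
        for path in paths:
            add_path(forest, path)
    return forest


# Hypothetical documents and facet paths.
doc_paths = {
    "doc1": ["location/asia/korea", "topic/search"],
    "doc2": ["location/asia/japan", "topic/search/faceted"],
}
forest = infer_taxonomy(doc_paths)
```

Each top-level key is the root of one tree, so the result is a forest. Ingesting a document with an unseen path simply adds new nodes, which mirrors the point that no administrative action is needed when new facet paths appear.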


15 Extending Multifaceted Search
• Business Intelligence
  – A broad category of applications and technologies for gathering, providing access to, and analyzing data in order to help enterprise users make better business decisions.
  – Figure 2 shows an example of aggregations when searching for “world wide web” over a subset of Amazon’s book catalog.
• Multifaceted search model
  – A faceted query can specify any number of aggregation expressions to be calculated per target facet.
  – The result set returns the values of these aggregations for each path of the corresponding facet set.
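As an illustration of per-facet aggregation expressions, the sketch below groups the matching documents by the values of a target facet and evaluates several named aggregations per group. The function `aggregate_by_facet`, the `format` facet, and the prices are hypothetical and are not taken from Figure 2 or from the engine's query syntax.

```python
from collections import defaultdict


def aggregate_by_facet(hits, facet_key, expressions):
    """Evaluate every named aggregation over the hits under each facet value."""
    groups = defaultdict(list)
    for doc in hits:
        for value in doc.get(facet_key, []):
            groups[value].append(doc)
    return {
        value: {name: fn(docs) for name, fn in expressions.items()}
        for value, docs in groups.items()
    }


# Any number of aggregation expressions can be attached to the target facet.
expressions = {
    "count": len,
    "avg_price": lambda docs: sum(d["price"] for d in docs) / len(docs),
    "max_price": lambda docs: max(d["price"] for d in docs),
}

# Hypothetical search results.
hits = [
    {"title": "A", "format": ["paperback"], "price": 20.0},
    {"title": "B", "format": ["hardcover"], "price": 50.0},
    {"title": "C", "format": ["paperback"], "price": 30.0},
]
result = aggregate_by_facet(hits, "format", expressions)
```

A plain document count is then just one aggregation among several, which is the sense in which this model extends basic faceted search.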

16 Extending Multifaceted Search


18 Dynamic Facets
• Dynamic facets
  – Typical faceted search operates over a set of predetermined, indexed facets.
    • I.e. the facets and attributes associated with each document must be known at indexing time.
    • One such attribute might be the date of a document.
  – Dynamic faceted search supports dynamic, time-based facets.
    • For example, the expression “Sum { qtime-doc.time < 60*60*24*7 }” counts the matching documents dated within one week of the query time (60*60*24*7 seconds, assuming times are measured in seconds).
    • In a similar fashion, one can categorize search results into spatial dynamic facets.
    • E.g. count the number of results within certain radii around a location specified by the query.
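The time-based expression can be read as summing a boolean over the matching documents. A minimal sketch, assuming timestamps are in seconds and using hypothetical names (`count_recent`, `time_facets`) and made-up bucket labels:

```python
WEEK = 60 * 60 * 24 * 7  # seconds in one week, as in the slide's expression


def count_recent(hits, qtime, window=WEEK):
    """Sum the boolean (qtime - doc.time < window) over the matching documents."""
    return sum(1 for doc in hits if qtime - doc["time"] < window)


def time_facets(hits, qtime, buckets):
    """Evaluate one dynamic facet per time window, all relative to the query time."""
    return {label: count_recent(hits, qtime, window) for label, window in buckets.items()}


# Hypothetical query time and document timestamps (seconds).
qtime = 1_000_000_000
hits = [
    {"time": qtime - 3600},        # one hour old
    {"time": qtime - 3 * 86400},   # three days old
    {"time": qtime - 30 * 86400},  # thirty days old
]
buckets = {"past day": 86400, "past week": WEEK, "past year": 365 * 86400}
facets = time_facets(hits, qtime, buckets)
```

Nothing about these buckets is stored in the index: they depend on `qtime`, which is known only when the query arrives, and that is what makes the facet dynamic.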


20 Correlated Search
• Correlated search
  – The standard faceted search data model
    • Each document has a certain set of facet values.
    • E.g. a product (represented by a document) has a certain color, size, and price.
    • Under this model, the product is treated as available in every combination of its colors and sizes,
    • i.e. in the full cross-product {color/red, color/blue} × {size/small, size/medium}.
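The limitation of the standard model can be made concrete with a short sketch. The `actual` combinations below are an assumption introduced for illustration (the slide lists only the facet values), and `matches_flat` and `matches_correlated` are hypothetical helper names.

```python
from itertools import product

colors = ["color/red", "color/blue"]
sizes = ["size/small", "size/medium"]

# Standard model: facet values are independent, so every combination is implied.
implied = set(product(colors, sizes))

# Correlated model (assumed data): record only the combinations that really exist.
actual = {("color/red", "size/small"), ("color/blue", "size/medium")}


def matches_flat(color, size):
    """Standard model: the document matches if it has each value separately."""
    return color in colors and size in sizes


def matches_correlated(color, size):
    """Correlated model: the document matches only on a recorded combination."""
    return (color, size) in actual
```

Under the assumed data, a query for a red item in medium matches in the flat model even though no such combination exists, while the correlated model rejects it.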


22 Conclusion
• Faceted Search
  – The more ambient the web becomes, the greater the need for faceted search.
  – Many complicated domains
  – A combination of many technologies
• Future work (some of which has already started)
  – Aggregating over cross products of dimensions (as in OLAP cubes)
  – Updating facet values and numeric attributes of documents without re-indexing the document
  – Faceted search across a distributed index


24 SNU CSE Homepage Practice
• Basic Web Documents Search
  – Crawling all documents in the CSE homepage and all laboratory homepages
  – Constructing the inverted list
  – The core source for faceted search and other features
• Extension Search
  – Graph visualization
  – Faceted search
  – Domains: Department, Professor, Laboratory, Course
  – Web 2.0 technology: AJAX, Reverse AJAX, COMET, etc.

25 Thank You! Any Questions or Comments?

